Scrape Similar

A powerful browser extension for extracting structured data from websites and exporting to Google Sheets, Excel (.xlsx), CSV, TSV, or the clipboard. Built with WXT, Vite, React 19, TypeScript, and Manifest V3 WebExtension APIs.

🚀 Installation

For end-users: Download from the Chrome Web Store

The extension has so far been tested only in Chrome. Pre-built packages for all major browsers (Chromium, Firefox, and Edge) will be published later. Until then, you can build the extension locally.

Build from source

See the Development section below for step-by-step instructions.

✨ Features

Core Functionality

  • Visual Element Picker: Point and click to select elements visually - the easiest way to scrape data
  • Web Scraping: Extract structured data from any website using XPath selectors
  • Google Sheets Export: Direct export with OAuth2 authentication
  • Multiple Export Formats: Excel (.xlsx), CSV, TSV, and clipboard export options
  • Preset Management: Save and load scraping configurations
  • System Presets: Pre-built configurations for common use cases
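
As an illustration of the CSV and TSV export options listed above, here is a minimal sketch of how scraped rows could be serialised. The names `escapeField` and `toDelimited` are hypothetical and are not taken from this repository; the quoting rules follow the common RFC 4180 convention, which the extension may or may not use.

```typescript
// Hypothetical helper, not the extension's actual code:
// serialise scraped rows to CSV or TSV text.
type Row = readonly string[];

function escapeField(field: string, delimiter: string): string {
  // Quote a field when it contains the delimiter, a double quote, or a line break.
  const needsQuoting =
    field.indexOf(delimiter) !== -1 ||
    field.indexOf('"') !== -1 ||
    /[\r\n]/.test(field);
  // Inside a quoted field, double quotes are escaped by doubling them.
  return needsQuoting ? `"${field.replace(/"/g, '""')}"` : field;
}

function toDelimited(
  headers: Row,
  rows: readonly Row[],
  delimiter: "," | "\t",
): string {
  return [headers, ...rows]
    .map((row) => row.map((f) => escapeField(f, delimiter)).join(delimiter))
    .join("\r\n");
}

// Usage with made-up sample data.
const sampleRows: Row[] = [["Docs, guides", "https://example.com/docs"]];
const csvText = toDelimited(["Text", "URL"], sampleRows, ",");
const tsvText = toDelimited(["Text", "URL"], sampleRows, "\t");
console.log(csvText);
console.log(tsvText);
```

The same function covers both formats because only the delimiter differs; clipboard export could reuse the TSV output, since spreadsheet applications generally accept tab-separated text on paste.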

User Experience

  • Visual Element Picker: Hover over elements to see matches highlighted in real-time, adjust selector specificity, and click to scrape
  • Side Panel Interface: Work alongside your browsing without interruption
  • Quick Scrape: Right-click any element to start scraping instantly
  • Keyboard Shortcuts:
    • ⌘+Shift+X (macOS) or Ctrl+Shift+X (Windows/Linux) - Toggle Visual Picker Mode
    • ⌘+Shift+S (macOS) or Ctrl+Shift+S (Windows/Linux) - Toggle Side Panel
  • Theme Support: Light and dark mode
  • Interactive Onboarding: Comprehensive guide for new users
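
In a Manifest V3 extension, shortcuts like the ones above are normally declared under the manifest's `commands` key. The sketch below shows what such a declaration could look like as a plain object. The command names `toggle-visual-picker` and `toggle-side-panel` are assumptions made for illustration; only the key combinations come from this README. In a WXT project, an object like this would typically be placed under `manifest.commands` in `wxt.config.ts`.

```typescript
// Shape of one entry in the manifest "commands" key.
interface CommandDefinition {
  suggested_key: { default: string; mac?: string };
  description: string;
}

// Command names are hypothetical; the key combinations are the documented ones.
const shortcutCommands: Record<string, CommandDefinition> = {
  "toggle-visual-picker": {
    suggested_key: { default: "Ctrl+Shift+X", mac: "Command+Shift+X" },
    description: "Toggle Visual Picker Mode",
  },
  "toggle-side-panel": {
    suggested_key: { default: "Ctrl+Shift+S", mac: "Command+Shift+S" },
    description: "Toggle Side Panel",
  },
};

console.log(Object.keys(shortcutCommands));
```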

Built-in Presets

  • Link Analysis: Nofollow, sponsored, UGC, and dofollow links
  • Content Elements: Headings (H1-H6), images, forms, buttons & CTAs
  • Navigation: Internal and external links, social media links
  • SEO Elements: Meta tags, structured data, and more
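
To make the Link Analysis categories concrete, here is a small sketch that classifies a link by the tokens in its `rel` attribute. It assumes that "dofollow" means a link carrying none of the `nofollow`, `sponsored`, or `ugc` tokens; the function name `classifyLink` is hypothetical, and the built-in presets themselves are expressed as XPath selectors rather than code like this.

```typescript
type LinkCategory = "nofollow" | "sponsored" | "ugc" | "dofollow";

// rel tokens that qualify a link; anything else is ignored.
const QUALIFIERS: readonly LinkCategory[] = ["nofollow", "sponsored", "ugc"];

function classifyLink(rel: string | null): LinkCategory[] {
  // rel is a space-separated, case-insensitive token list.
  const tokens = (rel ?? "").toLowerCase().split(/\s+/).filter(Boolean);
  const matched = QUALIFIERS.filter((q) => tokens.indexOf(q) !== -1);
  // Assumption: a link with no qualifying token counts as "dofollow".
  return matched.length > 0 ? matched : ["dofollow"];
}

console.log(classifyLink("nofollow sponsored"));
console.log(classifyLink("noopener"));
console.log(classifyLink(null));
```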

🛠️ Development

Prerequisites

  • Node.js >= 22.0.0
  • pnpm (recommended) or npm
  • A modern browser for testing (Chromium-based, such as Chrome or Edge, or Firefox)

For a deep dive on setting up a new project or migrating an existing one to WXT, see the official installation guide: WXT - Installation.

Installation

  1. Clone the repository:

    git clone https://github.com/zizzfizzix/scrape-similar.git
    cd scrape-similar
  2. Install dependencies and prepare WXT:

    pnpm install       # or npm install
    pnpm postinstall   # runs `wxt prepare` so the dev server recognises entrypoints

Development Commands

# Start development server with automatic browser-reloading (WXT dev)
pnpm dev            # or npm run dev

# Target a specific browser in dev mode (e.g. Firefox)
pnpm dev:firefox    # or npm run dev:firefox

# Production build
pnpm build          # or npm run build

# Production build for Firefox (adds sources.zip required by AMO)
pnpm build:firefox

# Create zipped packages for submission
pnpm zip            # generic zip
pnpm zip:firefox    # Firefox zip and sources zip

# Submit a release programmatically (requires .env.submit)
pnpm wxt submit --chrome-zip .output/*-chrome.zip --firefox-zip .output/*-firefox.zip

For the full list of available commands (build -b, zip -b, submit, etc.) consult the official WXT docs.

Browser Extension Development Mode

Most modern browsers allow loading unpacked extensions while in developer/debug mode:

  1. Open your browser’s extensions management page and enable developer/debug mode.
  2. Select “Load unpacked” (or the equivalent option) and choose the build directory generated by the dev or build command. WXT writes it inside .output, for example .output/chrome-mv3 for a Chrome build.
  3. The extension will now be loaded and ready for testing.

🏗️ Architecture

Project Structure

src/
├── entrypoints/
│   ├── background/      # Background service worker (wxt.entry.background)
│   ├── content/         # Content scripts (wxt.entry.content)
│   ├── options/         # Options page (React)
│   ├── sidepanel/       # Side panel (React)
│   └── onboarding/      # Interactive onboarding (React)
├── components/          # Shared React components
│   └── ui/              # shadcn-ui components
├── utils/               # Utility functions (storage, analytics, scraping, etc.)
├── assets/              # Static assets
└── wxt.config.ts        # WXT configuration

Key Technologies

  • Extension framework: WXT
  • Frontend: React 19, TypeScript, Tailwind CSS
  • UI Components: shadcn-ui (Radix UI primitives)
  • Build Tooling: Vite
  • Analytics: PostHog (privacy-focused)
  • WebExtension APIs: Manifest V3, service workers

Development Guidelines

Code Style

  • Use TypeScript for all code
  • Prefer functional components with React hooks
  • Use descriptive variable names with auxiliary verbs (e.g. isLoading, hasError)
  • Follow the established project structure

Web Extension Best Practices

  • Use Manifest V3 features and patterns
  • Abstract browser-specific extension APIs with WXT
  • Implement proper error handling for WebExtension APIs
  • Follow the principle of least privilege for permissions
  • Use event-driven service workers for background tasks
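
One way to approach the error-handling guideline above is to wrap promise-returning extension API calls so that failures come back as values instead of unhandled rejections. This is a minimal sketch under that assumption; `Result` and `tryCall` are hypothetical names, and the failing call is simulated so that the snippet runs outside a browser.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Runs a promise-returning call and converts a rejection into a Result.
async function tryCall<T>(call: () => Promise<T>): Promise<Result<T>> {
  try {
    return { ok: true, value: await call() };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error };
  }
}

// Simulated API call that fails the way a message to a closed tab might.
const failingCall = (): Promise<string> =>
  Promise.reject(new Error("Receiving end does not exist."));

tryCall(failingCall).then((result) => {
  if (!result.ok) {
    console.warn(`Extension API call failed: ${result.error}`);
  }
});
```

In the extension itself, the argument would be a real call, for example a function that sends a message to a tab, and the caller would decide whether to retry, surface the error in the UI, or ignore it.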

State Management

  • Use React’s useState and useReducer for component state
  • Persist data with the storage WebExtension API through WXT storage
  • Use message passing for cross-context communication
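
Cross-context message passing is easier to keep correct when the messages are typed. The sketch below uses a discriminated union and an exhaustive handler. The message names `START_SCRAPE` and `SCRAPE_RESULT`, and their payloads, are invented for illustration and are not the extension's actual protocol; in a real extension the handler would be registered with the runtime's `onMessage` listener.

```typescript
// Hypothetical message protocol between side panel, background, and content script.
type ScrapeMessage =
  | { type: "START_SCRAPE"; xpath: string }
  | { type: "SCRAPE_RESULT"; rows: string[][] };

type ScrapeResponse = { handled: boolean; detail: string };

function handleScrapeMessage(message: ScrapeMessage): ScrapeResponse {
  switch (message.type) {
    case "START_SCRAPE":
      return { handled: true, detail: `scrape requested for ${message.xpath}` };
    case "SCRAPE_RESULT":
      return { handled: true, detail: `${message.rows.length} rows received` };
    default: {
      // The compiler flags this line if a new message type is added but not handled.
      const unreachable: never = message;
      throw new Error(`Unknown message: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(handleScrapeMessage({ type: "START_SCRAPE", xpath: "//a" }).detail);
console.log(handleScrapeMessage({ type: "SCRAPE_RESULT", rows: [["a"], ["b"]] }).detail);
```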

🧪 Testing

Manual Testing

  • Test on various website types (e-commerce, blogs, portfolios)
  • Verify all export formats work correctly
  • Test Google Sheets integration with different data sizes
  • Verify visual picker functionality and keyboard shortcuts
  • Test context menu quick scrape functionality

Debugging

  • Use your browser’s DevTools for debugging
  • Inspect the background service worker console
  • Monitor network requests in the Network tab
  • Enable the extension’s built-in debug mode

📦 Building for Production

WXT wraps Vite so the standard build pipeline is still available, but the preferred workflow is via WXT’s CLI helpers:

# Multi-browser release build
pnpm build                # Chromium-compatible output in .output
pnpm build:firefox        # Adds Firefox-flavoured output

The resulting files are placed in the .output directory. Each variant can then be zipped with pnpm zip or pnpm zip:<browser> before uploading to the respective store.

🚀 Publishing

WXT ships with powerful automation for packaging and submitting new versions:

  1. Generate ZIP(s) for the browsers you plan to publish. Example:

    pnpm zip                # Chrome / Edge package
    pnpm zip:firefox        # Firefox package *and* mandatory sources zip
  2. (Optional) Use pnpm wxt submit to programmatically upload each ZIP to its store. See the Publishing guide for required environment variables and advanced CI setups.

  3. Safari is not yet fully automated. If you target Safari, follow Apple’s safari-web-extension-converter flow as described in the WXT Safari notes. WXT can still output a Safari-ready bundle via:

    pnpm wxt build -b safari

Need continuous delivery? Copy the GitHub Actions workflow in the WXT docs to zip and submit on every version tag.

🐛 Support & Issues

For bugs, feature requests, or questions:

  • Open an issue in this repository
  • Check existing issues for known problems
  • Review the privacy policy for data-handling concerns

📄 License

MIT License – see LICENSE for details

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📊 Analytics

This extension uses PostHog for anonymous analytics to improve the user experience. No personal data or scraped content is collected. See PRIVACY_POLICY.md for details.
