Model Context Protocol (MCP) server for extracting GitHub repositories and Colab notebooks into LLM-digestible formats. Supports intelligent filtering, binary detection, and integration with Cursor IDE, Claude Desktop, and other MCP clients.

sidxh/data-extractor

This is a Next.js project that provides a web UI for a code extraction tool, which converts GitHub repositories and Google Colab notebooks into LLM-digestible text.

Features

  • Web Interface: A clean, intuitive UI for extracting content from GitHub and Google Colab URLs.
  • MCP Server: The core logic is also packaged as an NPM module that can be used as a tool in AI assistants like Cursor and Claude.
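
Both features start from a URL that points at either GitHub or Google Colab. As a rough illustration of that first step, here is a minimal sketch of how such a URL might be classified. The names `classifySourceUrl`, `SourceKind`, and `SourceInfo` are hypothetical and do not come from this project; the actual parsing logic is not shown in this README and may differ.

```typescript
// Hypothetical helper for illustration only; not the project's actual code.
type SourceKind = "github" | "colab" | "unknown";

interface SourceInfo {
  kind: SourceKind;
  owner?: string; // set only for GitHub repository URLs
  repo?: string;  // set only for GitHub repository URLs
}

function classifySourceUrl(input: string): SourceInfo {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    // Not a parseable absolute URL.
    return { kind: "unknown" };
  }

  const host = url.hostname.toLowerCase();

  if (host === "colab.research.google.com") {
    return { kind: "colab" };
  }

  if (host === "github.com" || host === "www.github.com") {
    // The first two path segments are taken as owner and repository name.
    const [owner, repo] = url.pathname.split("/").filter(Boolean);
    if (owner && repo) {
      return { kind: "github", owner, repo: repo.replace(/\.git$/, "") };
    }
  }

  return { kind: "unknown" };
}

console.log(classifySourceUrl("https://github.com/sidxh/data-extractor"));
```

Returning an explicit `"unknown"` kind rather than throwing lets a UI show a validation message without wrapping every call in error handling.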

MCP Package

The code-extractor-mcp package provides command-line access to the extraction tools.
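
MCP clients such as Claude Desktop and Cursor typically register servers through an `mcpServers` map in a JSON configuration file. The sketch below builds such an entry. It assumes the package can be launched with `npx -y code-extractor-mcp`, which this README does not confirm; the server name `code-extractor` and the helper `buildMcpConfig` are likewise illustrative.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Assumption: the package exposes an executable that npx can run directly.
// Check the package's own documentation for the actual command and arguments.
function buildMcpConfig(serverName: string, packageName: string): McpClientConfig {
  return {
    mcpServers: {
      [serverName]: { command: "npx", args: ["-y", packageName] },
    },
  };
}

// Print the JSON that would go into the client's MCP configuration file.
console.log(JSON.stringify(buildMcpConfig("code-extractor", "code-extractor-mcp"), null, 2));
```

The printed JSON would be merged into the client's existing configuration; the file location depends on the client.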

Getting Started

To run the web interface locally:

First, install the dependencies:

npm install

Then, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev

Open http://localhost:3000 with your browser to see the result.

You can start editing the page by modifying app/page.tsx. The page auto-updates as you edit the file.

This project uses next/font to automatically optimize and load Geist, a new font family for Vercel.

Learn More

To learn more about Next.js, check out the Next.js GitHub repository. Your feedback and contributions are welcome!

Deploy on Vercel

The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.

Check out our Next.js deployment documentation for more details.
