A file storage and semantic search system built with Cloudflare Workers, R2, and Durable Objects. It supports file uploads to R2 and AI-powered semantic search over either the REST API or a WebSocket connection.
- File Upload to R2: Upload files directly to Cloudflare R2 object storage
- Semantic Search: AI-powered semantic search using Cloudflare AI embeddings
- Durable Objects: Persistent SQLite database for metadata and user information
- WebSocket Support: Real-time search results through WebSocket connections
- React UI: Beautiful, responsive React interface for file management
- User Tracking: SQLite database storing user information and search history
- Cloudflare Worker (src/worker.ts)
  - Handles HTTP requests and routing
  - Manages file uploads to R2
  - Serves the React application
  - Proxies WebSocket connections to Durable Objects
- Durable Object (src/durable-object.ts)
  - Manages SQLite database with three tables:
    - files: File metadata and embeddings
    - users: User information and activity
    - search_history: Search queries and results
  - Handles WebSocket connections for real-time search
  - Performs semantic search using cosine similarity
  - Generates embeddings using Cloudflare AI
- React UI (embedded in worker.ts)
  - File upload interface with drag-and-drop
  - Semantic search with real-time results
  - File browser with download capabilities
  - WebSocket status indicator
- GET / - Serves the React application
- POST /api/upload - Upload a file to R2
  - FormData: file, userId
- GET /api/files - List all files in R2
- GET /api/download/:filename - Download a file from R2
- POST /api/search - Perform semantic search
  - Body: { query: string, userId?: string }
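
As an illustration of how the endpoints above might be dispatched, here is a minimal route-matching sketch. The names `matchRoute`, `RouteName`, and `RouteMatch` are hypothetical and not taken from src/worker.ts; the actual worker may route requests differently. The `/api/ws` entry is the WebSocket path described in this README.

```typescript
// Hypothetical route table mirroring the documented HTTP endpoints.
type RouteName = "app" | "upload" | "listFiles" | "download" | "search" | "websocket";

interface RouteMatch {
  route: RouteName;
  params: Record<string, string>;
}

// Returns the matched route, or null when no documented endpoint applies.
function matchRoute(method: string, pathname: string): RouteMatch | null {
  const m = method.toUpperCase();
  if (m === "GET" && pathname === "/") return { route: "app", params: {} };
  if (m === "POST" && pathname === "/api/upload") return { route: "upload", params: {} };
  if (m === "GET" && pathname === "/api/files") return { route: "listFiles", params: {} };
  if (m === "POST" && pathname === "/api/search") return { route: "search", params: {} };
  if (m === "GET" && pathname === "/api/ws") return { route: "websocket", params: {} };

  // GET /api/download/:filename
  const prefix = "/api/download/";
  if (m === "GET" && pathname.indexOf(prefix) === 0 && pathname.length > prefix.length) {
    try {
      const filename = decodeURIComponent(pathname.slice(prefix.length));
      return { route: "download", params: { filename: filename } };
    } catch {
      // Malformed percent-encoding: treat as no match.
      return null;
    }
  }
  return null;
}
```

A worker's `fetch` handler could call `matchRoute(request.method, new URL(request.url).pathname)` and respond with 404 when the result is `null`.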
Connect to /api/ws for real-time search:
// Connect
ws.send(JSON.stringify({ type: 'register', userId: 'user123' }))
// Search
ws.send(JSON.stringify({ type: 'search', query: 'your query', userId: 'user123' }))
// Response
{ type: 'search_results', results: [...] }

- POST /index - Index a file with embeddings
- POST /search - Perform semantic search
- GET /user?userId=xxx - Get user information
- POST /user - Update user information
- GET /stats - Get usage statistics
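
The `register` and `search` messages sent over /api/ws could be validated on receipt along these lines. This is only a sketch: `ClientMessage` and `parseClientMessage` are hypothetical names, and treating `userId` as required on both message types is an assumption based on the examples in this README.

```typescript
// Message shapes taken from the WebSocket examples in this README.
type ClientMessage =
  | { type: "register"; userId: string }
  | { type: "search"; query: string; userId: string };

// Parses a raw WebSocket frame; returns null for anything that is not
// a well-formed register or search message.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const msg = data as Record<string, unknown>;
  const type = msg["type"];
  const userId = msg["userId"];
  const query = msg["query"];

  if (typeof userId !== "string") return null; // assumption: userId is required
  if (type === "register") return { type: "register", userId: userId };
  if (type === "search" && typeof query === "string") {
    return { type: "search", query: query, userId: userId };
  }
  return null;
}
```

Rejecting malformed frames up front keeps the search path from ever running with a missing query.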
CREATE TABLE files (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filename TEXT NOT NULL,
  text TEXT,
  userId TEXT NOT NULL,
  fileType TEXT,
  fileSize INTEGER,
  embedding TEXT,
  createdAt TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  userId TEXT UNIQUE NOT NULL,
  username TEXT,
  email TEXT,
  createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
  lastActive TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE search_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  userId TEXT NOT NULL,
  query TEXT NOT NULL,
  resultsCount INTEGER,
  searchedAt TEXT DEFAULT CURRENT_TIMESTAMP
);

- Node.js 18+
- Cloudflare account
- Wrangler CLI installed globally
- Clone the repository:
  git clone <repository-url>
  cd cfstore
- Install dependencies:
  npm install
- Configure your Cloudflare account in wrangler.toml:
  - Update the R2 bucket name if needed
  - Ensure AI binding is available in your account
- Create the R2 bucket:
  wrangler r2 bucket create cfstore-files
- Deploy the worker:
  npm run deploy

Run the development server:
  npm run dev
This will start a local development server at http://localhost:8787
name = "cfstore"
main = "src/worker.ts"
compatibility_date = "2024-11-01"
node_compat = true
[durable_objects]
bindings = [
{ name = "SEMANTIC_STORE", class_name = "SemanticStore" }
]
[[migrations]]
tag = "v1"
new_classes = ["SemanticStore"]
[[r2_buckets]]
binding = "FILE_BUCKET"
bucket_name = "cfstore-files"
[ai]
binding = "AI"

- User uploads a file through the React UI
- Worker receives the file and uploads it to R2
- Worker extracts text content from the file
- Worker sends metadata to Durable Object
- Durable Object generates embeddings using Cloudflare AI
- Durable Object stores metadata and embeddings in SQLite
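
The metadata handed from the Worker to the Durable Object in the upload flow above might look like the following sketch. `IndexPayload` and `buildIndexPayload` are hypothetical names; the fields mirror the columns of the `files` table, and the `text/` MIME-prefix check is an assumption standing in for whatever text detection the worker actually uses.

```typescript
// Fields mirror the non-generated columns of the files table.
interface IndexPayload {
  filename: string;
  text: string;
  userId: string;
  fileType: string;
  fileSize: number;
}

// Builds the payload for the Durable Object's POST /index endpoint.
// `content` is the file body already decoded as a string.
function buildIndexPayload(
  file: { name: string; type: string; size: number },
  userId: string,
  content: string
): IndexPayload {
  // Assumption: only text/* files have their content extracted.
  const isText = file.type.indexOf("text/") === 0;
  return {
    filename: file.name,
    text: isText ? content : "",
    userId: userId,
    fileType: file.type,
    fileSize: file.size,
  };
}
```

Non-text files still get a row with an empty `text` field, so they appear in listings even though they contribute nothing to semantic search.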
- User enters a search query
- Query is sent via WebSocket or REST API
- Durable Object generates embedding for the query
- Performs cosine similarity comparison with stored embeddings
- Returns top 10 most similar files
- Results are sent back to the client
- Search is logged in search_history table
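
The similarity comparison and top-10 selection in the search flow above can be sketched as follows. This is a minimal illustration rather than the code in src/durable-object.ts: `StoredFile`, `SearchResult`, and `rankBySimilarity` are hypothetical names, and the embedding is assumed to be stored as a JSON array string, as the README describes.

```typescript
interface StoredFile {
  filename: string;
  embedding: string; // JSON-encoded number[] as stored in SQLite
}

interface SearchResult {
  filename: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every stored file against the query and keeps the best `limit`.
function rankBySimilarity(query: number[], files: StoredFile[], limit: number = 10): SearchResult[] {
  return files
    .map((f) => ({
      filename: f.filename,
      score: cosineSimilarity(query, JSON.parse(f.embedding) as number[]),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

This is a linear scan over all stored embeddings, which is simple and adequate for small collections held in a single Durable Object.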
Uses Cloudflare AI's @cf/baai/bge-base-en-v1.5 model to generate embeddings:
- Text is limited to 5000 characters for embedding
- Embeddings are stored as JSON strings in SQLite
- Cosine similarity is used to compare embeddings
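
The 5000-character limit and JSON storage format could be handled by helpers like these. The function names are hypothetical, and "characters" is interpreted here as UTF-16 code units (what `String.prototype.slice` counts), which is an assumption about the original implementation.

```typescript
const MAX_EMBEDDING_CHARS = 5000;

// Truncates input text before it is sent to the embedding model.
function truncateForEmbedding(text: string, maxChars: number = MAX_EMBEDDING_CHARS): string {
  return text.slice(0, maxChars);
}

// Serializes an embedding vector for the TEXT `embedding` column.
function serializeEmbedding(vector: number[]): string {
  return JSON.stringify(vector);
}

// Parses a stored embedding, rejecting anything that is not an array
// of finite numbers.
function parseEmbedding(json: string): number[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("Embedding is not an array");
  for (const v of parsed) {
    if (typeof v !== "number" || !isFinite(v)) {
      throw new Error("Embedding contains a non-numeric value");
    }
  }
  return parsed as number[];
}
```

Validating on read guards the similarity computation against rows written by an older or buggy indexer.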
- Cloudflare Workers: Serverless edge computing
- Cloudflare R2: S3-compatible object storage
- Durable Objects: Stateful serverless objects with SQLite
- Cloudflare AI: AI inference at the edge
- React: UI framework (loaded via CDN)
- TypeScript: Type-safe development
The semantic search uses vector embeddings to find similar documents based on meaning rather than exact text matches. This allows for:
- Finding documents with similar concepts
- Language understanding beyond keyword matching
- Ranking results by semantic similarity
The Durable Object uses SQLite for persistent storage:
- ACID compliant transactions
- Efficient indexing for fast queries
- Relational data modeling
- Full SQL query support
Real-time bidirectional communication:
- Instant search results
- Connection status updates
- Automatic reconnection on disconnect
- Efficient for repeated queries
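
The README does not say how automatic reconnection is scheduled. One common approach is capped exponential backoff, sketched below; the function name, the 500 ms base delay, and the 30 s cap are all assumptions for illustration, not values taken from the UI code.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// base * 2^attempt, capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

A client would typically call this from its `close` handler, wait the returned delay with `setTimeout`, then reopen the socket and reset the attempt counter once the connection succeeds.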
- Text extraction is basic (only for text files)
- Embedding generation limited to 5000 characters
- Single Durable Object instance (scalable with sharding)
- R2 bucket name must be unique within your Cloudflare account
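
For the sharding mentioned above, one possible scheme is to hash the user ID to a shard name and route each user to a separate Durable Object instance. The sketch below is an assumption about how that could work, not part of the current code: `fnv1a32` and `shardNameFor` are hypothetical helpers, and the hash is FNV-1a applied to UTF-16 code units.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts to stay within 32 bits.
    hash = (hash + ((hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24))) >>> 0;
  }
  return hash >>> 0;
}

// Maps a user to one of `shardCount` shard names, e.g. "shard-2".
function shardNameFor(userId: string, shardCount: number): string {
  if (shardCount < 1 || Math.floor(shardCount) !== shardCount) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return "shard-" + (fnv1a32(userId) % shardCount);
}
```

The resulting name could be passed to the Durable Object namespace's `idFromName` to select an instance. Note that sharding by user means each search only sees files indexed in that user's shard.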
- Support for PDF and document parsing
- Batch file upload
- Advanced filtering and sorting
- User authentication
- File sharing and permissions
- Analytics dashboard
- Full-text search fallback
- Multiple R2 bucket support
License: MIT
For issues and questions, please open an issue on GitHub.