CFStore - Semantic File Storage

A file storage and semantic search system built with Cloudflare Workers, R2, and Durable Objects. Files are uploaded to R2, indexed with AI-generated embeddings, and searched by meaning through either the REST API or a WebSocket connection.

Features

File Upload to R2: Upload files directly to Cloudflare R2 object storage
Semantic Search: AI-powered semantic search using Cloudflare AI embeddings
Durable Objects: Persistent SQLite database for metadata and user information
WebSocket Support: Real-time search results through WebSocket connections
React UI: Responsive React interface for uploading, searching, and downloading files
User Tracking: SQLite database storing user information and search history

Architecture

Components

Cloudflare Worker (src/worker.ts)
- Handles HTTP requests and routing
- Manages file uploads to R2
- Serves the React application
- Proxies WebSocket connections to Durable Objects

Durable Object (src/durable-object.ts)
- Manages SQLite database with three tables:
  - files: File metadata and embeddings
  - users: User information and activity
  - search_history: Search queries and results
- Handles WebSocket connections for real-time search
- Performs semantic search using cosine similarity
- Generates embeddings using Cloudflare AI

React UI (embedded in worker.ts)
- File upload interface with drag-and-drop
- Semantic search with real-time results
- File browser with download capabilities
- WebSocket status indicator

API Endpoints

REST API

GET / - Serves the React application
POST /api/upload - Upload a file to R2
- FormData: file, userId
GET /api/files - List all files in R2
GET /api/download/:filename - Download a file from R2
POST /api/search - Perform semantic search
- Body: { query: string, userId?: string }
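The REST endpoints listed above can be read as a small routing table. The following is a minimal sketch of how a method and path might be mapped to a handler; the `RouteName` values and the `matchRoute` function are hypothetical names chosen for illustration, not the actual code in src/worker.ts.

```typescript
// Hypothetical route names; the real handlers live in src/worker.ts.
type RouteName = "app" | "upload" | "listFiles" | "download" | "search" | "notFound";

interface RouteMatch {
  route: RouteName;
  filename?: string; // only set for the download route
}

const DOWNLOAD_PREFIX = "/api/download/";

function matchRoute(method: string, pathname: string): RouteMatch {
  if (method === "GET" && pathname === "/") return { route: "app" };
  if (method === "POST" && pathname === "/api/upload") return { route: "upload" };
  if (method === "GET" && pathname === "/api/files") return { route: "listFiles" };
  if (method === "POST" && pathname === "/api/search") return { route: "search" };

  if (method === "GET" && pathname.indexOf(DOWNLOAD_PREFIX) === 0) {
    try {
      // The :filename segment may be percent-encoded in the URL.
      const filename = decodeURIComponent(pathname.slice(DOWNLOAD_PREFIX.length));
      if (filename.length > 0) return { route: "download", filename };
    } catch {
      // Malformed percent-encoding falls through to notFound.
    }
  }
  return { route: "notFound" };
}
```

For example, `matchRoute("GET", "/api/download/my%20notes.txt")` yields the download route with the decoded filename `my notes.txt`, while an unsupported method such as `DELETE /api/files` falls through to `notFound`.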

WebSocket API

Connect to /api/ws for real-time search:

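// Connect (local dev server shown; use wss:// and your own host in production)
const ws = new WebSocket('ws://localhost:8787/api/ws')

ws.addEventListener('open', () => {
  // Register
  ws.send(JSON.stringify({ type: 'register', userId: 'user123' }))

  // Search
  ws.send(JSON.stringify({ type: 'search', query: 'your query', userId: 'user123' }))
})

// Response
ws.addEventListener('message', (event) => {
  const message = JSON.parse(event.data) // { type: 'search_results', results: [...] }
})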
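On the client side it can help to give the WebSocket messages explicit types and to validate incoming frames before using them. The sketch below assumes only the three message shapes shown above; the helper names `encodeClientMessage` and `parseSearchResults` are hypothetical, and the items inside `results` are left untyped because this README does not specify their fields.

```typescript
// Message shapes taken from the examples above.
type ClientMessage =
  | { type: "register"; userId: string }
  | { type: "search"; query: string; userId: string };

interface SearchResultsMessage {
  type: "search_results";
  results: unknown[]; // item fields are not documented here
}

// Serialize an outgoing message for ws.send().
function encodeClientMessage(message: ClientMessage): string {
  return JSON.stringify(message);
}

// Parse a raw frame; returns null unless it is a well-formed search_results message.
function parseSearchResults(raw: string): SearchResultsMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as { type?: unknown; results?: unknown };
  if (candidate.type !== "search_results" || !Array.isArray(candidate.results)) {
    return null;
  }
  return { type: "search_results", results: candidate.results };
}
```

Returning `null` for anything unexpected lets a message handler ignore frames it does not understand instead of throwing.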

Durable Object Endpoints

POST /index - Index a file with embeddings
POST /search - Perform semantic search
GET /user?userId=xxx - Get user information
POST /user - Update user information
GET /stats - Get usage statistics

Database Schema

Files Table

CREATE TABLE files (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filename TEXT NOT NULL,
  text TEXT,
  userId TEXT NOT NULL,
  fileType TEXT,
  fileSize INTEGER,
  embedding TEXT,
  createdAt TEXT DEFAULT CURRENT_TIMESTAMP
)

Users Table

CREATE TABLE users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  userId TEXT UNIQUE NOT NULL,
  username TEXT,
  email TEXT,
  createdAt TEXT DEFAULT CURRENT_TIMESTAMP,
  lastActive TEXT DEFAULT CURRENT_TIMESTAMP
)

Search History Table

CREATE TABLE search_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  userId TEXT NOT NULL,
  query TEXT NOT NULL,
  resultsCount INTEGER,
  searchedAt TEXT DEFAULT CURRENT_TIMESTAMP
)
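As an illustration of how the search_history table might be written to, here is a minimal sketch of logging one search with bound parameters. It assumes the SQLite storage API exposed inside a Durable Object (`ctx.storage.sql.exec(query, ...bindings)`); the `SqlLike` interface is a hypothetical stand-in so the snippet is self-contained, and `logSearch` is an illustrative name rather than a function from this repository.

```typescript
// Hypothetical stand-in for the Durable Object SQL handle (ctx.storage.sql).
interface SqlLike {
  exec(query: string, ...bindings: unknown[]): unknown;
}

// Insert one row; id and searchedAt are filled in by the table defaults.
function logSearch(
  sql: SqlLike,
  userId: string,
  query: string,
  resultsCount: number
): void {
  sql.exec(
    "INSERT INTO search_history (userId, query, resultsCount) VALUES (?, ?, ?)",
    userId,
    query,
    resultsCount
  );
}
```

Using `?` placeholders rather than string concatenation keeps user-supplied search text out of the SQL statement itself.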

Setup

Prerequisites

Node.js 18+
Cloudflare account
Wrangler CLI installed globally

Installation

Clone the repository:

git clone <repository-url>
cd cfstore

Install dependencies:

npm install

Configure your Cloudflare account in wrangler.toml:
- Update the R2 bucket name if needed
- Ensure AI binding is available in your account

Create the R2 bucket:

wrangler r2 bucket create cfstore-files

Deploy the worker:

npm run deploy

Development

Run the development server:

npm run dev

This starts a local development server at http://localhost:8787.

Configuration

wrangler.toml

name = "cfstore"
main = "src/worker.ts"
compatibility_date = "2024-11-01"
# nodejs_compat replaces the legacy node_compat option
compatibility_flags = ["nodejs_compat"]

[durable_objects]
bindings = [
  { name = "SEMANTIC_STORE", class_name = "SemanticStore" }
]

[[migrations]]
tag = "v1"
# SQLite-backed Durable Objects must be declared with new_sqlite_classes
new_sqlite_classes = ["SemanticStore"]

[[r2_buckets]]
binding = "FILE_BUCKET"
bucket_name = "cfstore-files"

[ai]
binding = "AI"

How It Works

File Upload Flow

User uploads a file through the React UI
Worker receives the file and uploads it to R2
Worker extracts text content from the file
Worker sends metadata to Durable Object
Durable Object generates embeddings using Cloudflare AI
Durable Object stores metadata and embeddings in SQLite
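The text-extraction and metadata steps of this flow can be sketched as a pure function that turns an uploaded file into the payload sent to the Durable Object. Field names follow the files table in this README; the `UploadedFile` and `IndexPayload` shapes, the helper names, and the rule that "text file" means a MIME type starting with `text/` are all assumptions made for illustration.

```typescript
// Hypothetical input shape for an upload that has already been stored in R2.
interface UploadedFile {
  filename: string;
  userId: string;
  fileType: string; // MIME type, e.g. "text/plain"
  fileSize: number; // size in bytes
  content: string;  // file body, assumed to be already decoded as text
}

// Hypothetical payload for the Durable Object's /index endpoint.
interface IndexPayload {
  filename: string;
  userId: string;
  fileType: string;
  fileSize: number;
  text: string | null; // null when no text could be extracted
}

// Assumption: only MIME types beginning with "text/" are treated as text files.
function extractText(file: UploadedFile): string | null {
  return file.fileType.indexOf("text/") === 0 ? file.content : null;
}

function buildIndexPayload(file: UploadedFile): IndexPayload {
  return {
    filename: file.filename,
    userId: file.userId,
    fileType: file.fileType,
    fileSize: file.fileSize,
    text: extractText(file),
  };
}
```

A file with no extractable text still gets a metadata row under this sketch; it simply has nothing to embed, so it would not appear in semantic search results.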

Semantic Search Flow

User enters a search query
Query is sent via WebSocket or REST API
Durable Object generates embedding for the query
Performs cosine similarity comparison with stored embeddings
Returns top 10 most similar files
Results are sent back to the client
Search is logged in search_history table
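The comparison and ranking steps above can be sketched as follows. This is a minimal illustration of cosine similarity and top-N selection, not the code in src/durable-object.ts; the `StoredFile` and `ScoredFile` shapes and the function names are hypothetical.

```typescript
interface StoredFile {
  filename: string;
  embedding: number[];
}

interface ScoredFile {
  filename: string;
  score: number;
}

// cos(a, b) = (a . b) / (|a| * |b|); defined here as 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored file against the query and keep the best `limit` matches.
function topMatches(
  queryEmbedding: number[],
  files: StoredFile[],
  limit: number = 10
): ScoredFile[] {
  return files
    .map((file) => ({
      filename: file.filename,
      score: cosineSimilarity(queryEmbedding, file.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Scoring every row on each query is linear in the number of stored files, which is reasonable for a single Durable Object holding a modest collection.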

Embedding Generation

Uses Cloudflare AI's @cf/baai/bge-base-en-v1.5 model to generate embeddings:

Text is limited to 5000 characters for embedding
Embeddings are stored as JSON strings in SQLite
Cosine similarity is used to compare embeddings
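The preparation and storage steps around the model call can be sketched as below. In a Worker the embedding itself would come from the AI binding, for example `env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [input] })`. The response shape used here (one vector per input under `data`) and the choice to keep the first 5000 characters are assumptions, and all function names are hypothetical.

```typescript
const MAX_EMBEDDING_CHARS = 5000;

// Assumption: the limit keeps the first 5000 characters of the text.
function prepareForEmbedding(text: string): string {
  return text.slice(0, MAX_EMBEDDING_CHARS);
}

// Assumed response shape: one embedding vector per input string.
interface EmbeddingResponse {
  data: number[][];
}

function firstVector(response: EmbeddingResponse): number[] {
  if (response.data.length === 0) {
    throw new Error("Model returned no embeddings");
  }
  return response.data[0];
}

// Embeddings are stored in the TEXT column as a JSON array of numbers.
function serializeEmbedding(vector: number[]): string {
  return JSON.stringify(vector);
}

function deserializeEmbedding(stored: string): number[] {
  const parsed: unknown = JSON.parse(stored);
  if (!Array.isArray(parsed)) {
    throw new Error("Stored embedding is not an array");
  }
  const allNumbers = parsed.every((value: unknown) => typeof value === "number");
  if (!allNumbers) {
    throw new Error("Stored embedding contains non-numeric values");
  }
  return parsed as number[];
}
```

Validating on read guards the similarity calculation against rows whose embedding column was written incorrectly or left empty.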

Technologies Used

Cloudflare Workers: Serverless edge computing
Cloudflare R2: S3-compatible object storage
Durable Objects: Stateful serverless objects with SQLite
Cloudflare AI: AI inference at the edge
React: UI framework (loaded via CDN)
TypeScript: Type-safe development

Features in Detail

Semantic Search

The semantic search uses vector embeddings to find similar documents based on meaning rather than exact text matches. This allows for:

Finding documents with similar concepts
Language understanding beyond keyword matching
Ranking results by semantic similarity

SQLite Database

The Durable Object uses SQLite for persistent storage:

ACID compliant transactions
Efficient indexing for fast queries
Relational data modeling
Full SQL query support

WebSocket Support

Real-time bidirectional communication:

Instant search results
Connection status updates
Automatic reconnection on disconnect
Efficient for repeated queries
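This README does not describe how reconnection is scheduled. One common approach, shown here purely as an assumption, is exponential backoff with a cap; `reconnectDelayMs` and its default values are hypothetical.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000
): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
}
```

A client would typically pass the result to `setTimeout` in its `close` handler and reset the attempt counter once a connection opens successfully.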

Limitations

Text extraction is basic (only for text files)
Embedding generation limited to 5000 characters
All data lives in a single Durable Object instance (scaling further would require sharding)
R2 bucket name must be unique within your Cloudflare account
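If sharding were added, one option is to derive a stable shard name from the user ID and pass it to the Durable Object namespace (for example `env.SEMANTIC_STORE.idFromName(name)`). The sketch below is an assumption about how that could look, not part of the current code; `shardNameFor` and the `semantic-store-` prefix are hypothetical.

```typescript
// Map a user ID to one of `shardCount` shard names using a simple
// 32-bit string hash (hash * 31 + charCode, wrapped to int32).
function shardNameFor(userId: string, shardCount: number): string {
  if (shardCount < 1 || Math.floor(shardCount) !== shardCount) {
    throw new Error("shardCount must be a positive integer");
  }
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = ((hash << 5) - hash + userId.charCodeAt(i)) | 0;
  }
  // >>> 0 reinterprets the int32 as unsigned so the modulo is non-negative.
  const index = (hash >>> 0) % shardCount;
  return "semantic-store-" + index;
}
```

Under this scheme each user's files and history would live in one shard, so a search across all users would need to query every shard and merge the results.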

Future Enhancements

License

MIT

Support

For issues and questions, please open an issue on GitHub.

brokenashish/cfstore