This is a Bluesky labeler that automatically detects faces of public figures in post images and applies labels to those posts.
Example: When someone posts an image containing Donald Trump, the labeler automatically detects his face and applies the "trump" label to that post. Users who subscribe to this labeler will see these labels on posts in their feeds.
Starting scope: The labeler is currently configured to detect only Trump, as a proof of concept. It is straightforward to extend to other public figures.
This project requires familiarity with TypeScript, the command line, and Linux.
If you find this project helpful, consider supporting development:
Venmo: @Leif-Hancox-Li
- Node.js v22.11.0 (LTS) for the runtime
- npm (comes with Node.js) for package management
- Python 3.8+ for the face detection service
- pip3 for Python package management
- ~50MB disk space for face detection models and reference images
- 4+ CPU cores recommended for face detection
- ~1GB RAM for models and processing
Clone the repo and install Node.js dependencies:
git clone <your-repo-url>
cd bsky-face-labeller
npm install
The face detection service requires Python packages. Install them with:
cd python-service
pip3 install -r requirements.txt --break-system-packages
cd ..
Note: The --break-system-packages flag is only needed on systems where pip refuses to install into an externally managed Python environment; omit it elsewhere. For production deployments, consider using a virtual environment instead:
cd python-service
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
cd ..
If using a virtual environment, update scripts/start-services.sh to use python-service/venv/bin/python as the interpreter.
Run the Skyware labeler setup to convert an existing Bluesky account into a labeler:
npx @skyware/labeler setup
You can exit after converting the account; there's no need to add the labels with the wizard. We'll do that from code.
Copy the .env.example file to .env and fill in your values:
cp .env.example .env
Edit .env with your labeler credentials:
DID=did:plc:xxx
SIGNING_KEY=xxx
BSKY_IDENTIFIER=xxx
BSKY_PASSWORD=xxx
HOST=127.0.0.1
PORT=4100
METRICS_PORT=4101
FIREHOSE_URL=wss://jetstream1.us-east.bsky.network/subscribe
CURSOR_UPDATE_INTERVAL=10000
# Face detection configuration
FACE_CONFIDENCE_THRESHOLD=0.6
MAX_IMAGE_PROCESSING_TIME=10000
MAX_QUEUE_SIZE=100
PROCESS_ALL_POSTS=false
Important: Set PROCESS_ALL_POSTS=false initially to avoid overwhelming your server. You can enable it later after testing.
Download the required face-api.js models (~12MB):
npm run download-models
This will download models to the models/ directory.
Add 5-10 clear photos of Trump's face to the reference-faces/trumpface/ directory:
# Example: Download images to reference-faces/trumpface/
# Name them 001.jpg, 002.jpg, 003.jpg, etc.
See reference-faces/README.md for detailed guidelines on image quality and requirements.
You need to provide these images yourself. Search for high-quality, well-lit photos of Trump's face from different angles.
Run the label setup script to publish your label definitions to Bluesky:
npm run set-labels
This creates the "trump" label in the Bluesky labeler system.
The labeler keeps a cursor.txt file containing a timestamp in microseconds. You don't need to create it yourself; it is created automatically on first run.
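To make the unit concrete, here is a minimal sketch of how a microsecond cursor could be produced and validated. The helper names (`nowMicros`, `parseCursor`) are hypothetical and not taken from this repo, and the fallback behaviour is an assumption about what "created automatically" might look like.

```typescript
// Hypothetical helpers for illustration; the names are not from this repo.

/** Current Unix time in microseconds (Date.now() returns milliseconds). */
function nowMicros(nowMs: number = Date.now()): number {
  return nowMs * 1000;
}

/**
 * Parse the contents of cursor.txt.
 * Returns `fallback` when the text is empty or is not a positive integer.
 */
function parseCursor(contents: string, fallback: number): number {
  const trimmed = contents.trim();
  if (!/^\d+$/.test(trimmed)) return fallback;
  const value = Number(trimmed);
  return Number.isSafeInteger(value) && value > 0 ? value : fallback;
}
```

A caller could read cursor.txt, pass its contents to `parseCursor` with `nowMicros()` as the fallback, and write the result back periodically.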
The server connects to Jetstream, which provides a WebSocket endpoint that emits ATProto events in JSON. There are many public instances available:
| Hostname | Region |
|---|---|
| jetstream1.us-east.bsky.network | US-East |
| jetstream2.us-east.bsky.network | US-East |
| jetstream1.us-west.bsky.network | US-West |
| jetstream2.us-west.bsky.network | US-West |
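As an illustration of how a FIREHOSE_URL for one of these hosts can be assembled, the sketch below builds a subscribe URL using Jetstream's `wantedCollections` and `cursor` query parameters. `buildJetstreamUrl` is a hypothetical helper, not part of this repo, and whether the labeler itself passes these parameters is an assumption.

```typescript
// Sketch only: `buildJetstreamUrl` is a hypothetical helper, not part of this repo.
function buildJetstreamUrl(
  hostname: string,
  wantedCollections: string[],
  cursorMicros?: number,
): string {
  // Jetstream accepts one wantedCollections parameter per collection NSID.
  const params: string[] = wantedCollections.map(
    (collection) => `wantedCollections=${encodeURIComponent(collection)}`,
  );
  // The cursor is a Unix timestamp in microseconds to resume from.
  if (cursorMicros !== undefined) params.push(`cursor=${cursorMicros}`);
  const query = params.length > 0 ? `?${params.join("&")}` : "";
  return `wss://${hostname}/subscribe${query}`;
}
```

For example, `buildJetstreamUrl("jetstream2.us-west.bsky.network", ["app.bsky.feed.post"])` yields a URL that only streams post records from the US-West instance.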
The server needs to be reachable outside your local network using the URL you provided during the account setup (typically using a reverse proxy such as Caddy):
labeler.example.com {
    reverse_proxy 127.0.0.1:4100
}
Metrics are exposed on the defined METRICS_PORT for Prometheus. This dashboard can be used to visualize the metrics in Grafana.
The labeler consists of two services that must both be running:
Start the Python face detection service first:
cd python-service
python3 face_service.py
In a separate terminal, start the Node.js labeler:
npm run start
Alternatively, use the provided scripts to manage both services with PM2:
# Start both services via PM2
./scripts/start-services.sh
# Check status
./scripts/status.sh
# View logs
pm2 logs
# Restart services
./scripts/restart-services.sh
# Stop services
./scripts/stop-services.sh
You should see logs indicating:
- Python service loading reference faces
- Node.js service connecting to Python service
- Reference faces being loaded
- Connection to Jetstream
- "Face detection initialization complete"
You can check that the labeler is reachable by checking the /xrpc/com.atproto.label.queryLabels endpoint of your labeler's server. A new, empty labeler returns {"cursor":"0","labels":[]}.
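The reachability check can also be scripted. The sketch below only builds the endpoint URL and inspects a response body; the helper names are hypothetical, and fetching the URL (with curl, a browser, or `fetch`) is left to the caller.

```typescript
// Hypothetical helpers for illustration; not part of this repo.

/** Build the queryLabels endpoint URL for a labeler base URL. */
function queryLabelsUrl(baseUrl: string): string {
  // Strip trailing slashes so the path is not doubled up.
  return `${baseUrl.replace(/\/+$/, "")}/xrpc/com.atproto.label.queryLabels`;
}

/** True when the response body looks like that of a new labeler with no labels yet. */
function isEmptyLabeler(body: string): boolean {
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== "object" || parsed === null) return false;
  const labels = (parsed as { labels?: unknown }).labels;
  return Array.isArray(labels) && labels.length === 0;
}
```

With the example domain used above, `queryLabelsUrl("https://labeler.example.com")` gives the URL to request, and `isEmptyLabeler` can be applied to the text of the response.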
With PROCESS_ALL_POSTS=false (default), the labeler won't process any posts yet. To test:
- Temporarily set PROCESS_ALL_POSTS=true in your .env
- Restart the labeler
- Post an image containing Trump to Bluesky (or wait for someone else to)
- Watch the logs for face detection results
- Check if the label was applied
Warning: Setting PROCESS_ALL_POSTS=true will process ALL posts with images on the firehose, which can be 200-600 posts/second during peak times. Only enable this if your server can handle it, or add filtering logic first.
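One cheap form of filtering logic is to discard events that cannot contain an image before any download or face detection happens. The sketch below assumes the Jetstream JSON event shape (`kind`, `commit.operation`, `commit.collection`, `commit.record.embed`); the interface is a trimmed-down illustration and `isNewImagePost` is a hypothetical name, not a function from this repo.

```typescript
// Minimal subset of a Jetstream event; only the fields used below are typed.
interface JetstreamEvent {
  did: string;
  time_us: number;
  kind: string;
  commit?: {
    operation: string;
    collection: string;
    rkey: string;
    record?: { embed?: { $type?: string; media?: { $type?: string } } };
  };
}

/** True only for newly created posts that embed one or more images. */
function isNewImagePost(event: JetstreamEvent): boolean {
  const commit = event.commit;
  if (event.kind !== "commit" || commit === undefined) return false;
  if (commit.operation !== "create") return false;
  if (commit.collection !== "app.bsky.feed.post") return false;
  const embed = commit.record?.embed;
  if (embed === undefined) return false;
  if (embed.$type === "app.bsky.embed.images") return true;
  // Quote posts with attached media nest the images under `media`.
  return (
    embed.$type === "app.bsky.embed.recordWithMedia" &&
    embed.media?.$type === "app.bsky.embed.images"
  );
}
```

A check like this rejects text-only posts, likes, and deletes, so only a fraction of firehose traffic reaches the processing queue.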
Once Trump detection is working, you can add more people:
- Create a new directory in reference-faces/:
    mkdir reference-faces/biden
- Add 5-10 clear photos of that person to the directory (name them 001.jpg, 002.jpg, etc.)
- Add the label to src/constants.ts:
    {
      rkey: '',
      identifier: 'biden',
      locales: [
        {
          lang: 'en',
          name: 'Joe Biden',
          description: 'This post contains an image of Joe Biden',
        },
      ],
    }
- Publish the new label:
    npm run set-labels
- Restart the labeler to load the new reference faces
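If you add several people, a small helper can keep the label entries consistent. This is a sketch under assumptions: the interfaces mirror the fields shown in the example entry above, and `personLabel` is a hypothetical function that does not exist in this repo.

```typescript
// Shapes inferred from the example label entry; the real types in
// src/constants.ts may differ.
interface LabelLocale {
  lang: string;
  name: string;
  description: string;
}

interface LabelEntry {
  rkey: string;
  identifier: string;
  locales: LabelLocale[];
}

/** Build a label entry for one person (hypothetical helper). */
function personLabel(identifier: string, displayName: string): LabelEntry {
  return {
    rkey: "",
    identifier,
    locales: [
      {
        lang: "en",
        name: displayName,
        description: `This post contains an image of ${displayName}`,
      },
    ],
  };
}
```

Calling `personLabel("biden", "Joe Biden")` produces the same object as the hand-written entry in the steps above.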
- FACE_CONFIDENCE_THRESHOLD (default: 0.6) - Minimum confidence for a face match (0.0-1.0). Higher = more strict, fewer false positives.
- MAX_IMAGE_PROCESSING_TIME (default: 10000) - Maximum time in ms to process a single image before timeout.
- MAX_QUEUE_SIZE (default: 100) - Maximum number of posts in the processing queue. Posts are dropped when the queue is full.
- QUEUE_CONCURRENCY (default: 2) - Number of images to process in parallel. Set to 1 for stability on low-memory systems (recommended for ≤2GB RAM).
- PROCESS_ALL_POSTS (default: false) - Whether to process all posts with images. Set to false to use follower-based filtering instead.
- MIN_FOLLOWER_COUNT (default: 1000) - When PROCESS_ALL_POSTS=false, only process posts from accounts with at least this many followers. Set to 0 to process all posts. Higher values reduce server load by focusing on popular accounts.
- MAX_FACES_TO_PROCESS (default: 50) - Skip images with more faces than this limit to prevent memory issues with crowd photos.
- CACHE_MAX_AGE_DAYS (default: 30) - Evict cache entries not seen in this many days.
- CACHE_CLEANUP_INTERVAL (default: 86400000) - How often to run cache cleanup, in milliseconds (the default is 24 hours).
- HEARTBEAT_INTERVAL (default: 60000) - How often to check connection health, in milliseconds.
- HEARTBEAT_TIMEOUT (default: 300000) - Restart if no events are received for this many milliseconds (the default is 5 minutes).
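The interaction between PROCESS_ALL_POSTS and MIN_FOLLOWER_COUNT can be sketched as follows. This is an illustration of the documented behaviour, not the repo's actual config loader: `loadFilterConfig`, `shouldProcess`, and the `FilterConfig` shape are hypothetical, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical config shape covering a few of the options listed above.
interface FilterConfig {
  faceConfidenceThreshold: number;
  maxQueueSize: number;
  processAllPosts: boolean;
  minFollowerCount: number;
}

type Env = Record<string, string | undefined>;

/** Read a numeric variable, using the documented default when unset or invalid. */
function numberFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}

function loadFilterConfig(env: Env): FilterConfig {
  return {
    faceConfidenceThreshold: numberFrom(env, "FACE_CONFIDENCE_THRESHOLD", 0.6),
    maxQueueSize: numberFrom(env, "MAX_QUEUE_SIZE", 100),
    processAllPosts: env["PROCESS_ALL_POSTS"] === "true",
    minFollowerCount: numberFrom(env, "MIN_FOLLOWER_COUNT", 1000),
  };
}

/** Process everything when PROCESS_ALL_POSTS is on; otherwise apply the follower floor. */
function shouldProcess(config: FilterConfig, followerCount: number): boolean {
  return config.processAllPosts || followerCount >= config.minFollowerCount;
}
```

Note that with MIN_FOLLOWER_COUNT=0 every account passes the follower check, which matches the "set to 0 to process all posts" behaviour described above.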
If you're getting too many false positives:
- Increase FACE_CONFIDENCE_THRESHOLD to 0.7 or 0.8
- Add more varied reference images
If you're missing correct detections:
- Decrease FACE_CONFIDENCE_THRESHOLD to 0.5
- Add more reference images with similar angles/lighting
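To see why moving the threshold trades false positives against missed detections, consider this minimal sketch. It assumes the face detection service returns one confidence score between 0.0 and 1.0 per known person; the `Candidate` shape and `bestMatch` function are hypothetical and may not match how the Python service reports results.

```typescript
// Assumed result shape: one confidence score per reference person.
interface Candidate {
  label: string;
  confidence: number;
}

/** Return the highest-confidence label at or above the threshold, or null. */
function bestMatch(candidates: Candidate[], threshold: number): string | null {
  let best: Candidate | null = null;
  for (const candidate of candidates) {
    if (candidate.confidence < threshold) continue;
    if (best === null || candidate.confidence > best.confidence) {
      best = candidate;
    }
  }
  return best === null ? null : best.label;
}
```

A face scored at 0.65 is labeled with the default threshold of 0.6 but rejected at 0.7, so raising the threshold removes borderline matches whether they were correct or not.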
If processing is too slow:
- Reduce MAX_IMAGE_PROCESSING_TIME
- Add more CPU cores
- Process only specific posts (add filtering logic)
Metrics are available at http://localhost:4101/metrics:
- posts_processed_total - Total posts processed
- faces_detected_total - Total faces detected (by person)
- image_processing_duration_seconds - Processing time histogram
- processing_queue_size - Current queue size
- processing_errors_total - Error counts
- image_cache_hits_total - Number of cache hits (duplicate images)
- image_cache_misses_total - Number of cache misses (new images)
- image_cache_size - Total entries in cache database
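As an example of using these metrics without Grafana, the sketch below computes the cache hit ratio from the text returned by the metrics endpoint. It assumes the standard Prometheus text exposition format and uses a simplified parser (label values containing `}` are not handled); `readCounter` and `cacheHitRatio` are hypothetical helpers.

```typescript
/** Sum all samples of one metric in Prometheus text exposition output. */
function readCounter(exposition: string, metric: string): number {
  let total = 0;
  for (const rawLine of exposition.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comments.
    if (line === "" || line.startsWith("#")) continue;
    // name, optional {labels}, then the sample value.
    const match = /^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(line);
    if (match !== null && match[1] === metric) total += Number(match[3]);
  }
  return total;
}

/** Fraction of images served from the cache, or null before any lookups. */
function cacheHitRatio(exposition: string): number | null {
  const hits = readCounter(exposition, "image_cache_hits_total");
  const misses = readCounter(exposition, "image_cache_misses_total");
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```

Pass the response body from the metrics endpoint to `cacheHitRatio` to get a single number you can log or alert on.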
Use PM2 or grep to search logs for specific events:
View live logs:
pm2 logs labeler
pm2 logs python-service
Find posts that were successfully labeled:
pm2 logs labeler --nostream | grep "Labeled post"
Example output:
[2026-01-21 17:30:15] INFO: Labeled post at://did:plc:xyz/app.bsky.feed.post/abc123 with: trump
Find all face detections:
pm2 logs labeler --nostream | grep "Found.*face"
Example output:
[2026-01-21 17:30:14] INFO: Found 1 face(s) in image 1 (245ms): trump
Find cache hits (previously processed images):
pm2 logs labeler --nostream | grep "Cache hit"
Get the post URL from a labeled post:
When you see a post URI like at://did:plc:xyz/app.bsky.feed.post/abc123, you can view it in your browser:
https://bsky.app/profile/did:plc:xyz/post/abc123
Replace the DID and post rkey (the part after /post/) with the values from your log.
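The same conversion can be done in code when you are working through many log lines. `atUriToBskyUrl` below is a hypothetical helper, not part of this repo; it only handles post URIs of the form shown above.

```typescript
/** Convert an at:// post URI into a bsky.app URL, or null if it is not a post URI. */
function atUriToBskyUrl(atUri: string): string | null {
  // Capture the DID (or handle) and the post rkey.
  const match = /^at:\/\/([^/]+)\/app\.bsky\.feed\.post\/([^/]+)$/.exec(atUri);
  if (match === null) return null;
  return `https://bsky.app/profile/${match[1]}/post/${match[2]}`;
}
```

For the URI in the example log line, this returns `https://bsky.app/profile/did:plc:xyz/post/abc123`.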
Filter logs by date:
pm2 logs labeler --nostream --lines 1000 | grep "2026-01-21" | grep "Labeled post"
Save logs to file for analysis:
pm2 logs labeler --nostream --lines 10000 > labeler-logs.txt
If faces are not being detected:
- Check reference images are high quality and faces are clearly visible
- Review logs for face detection errors
- Lower confidence threshold
If there are too many false positives:
- Increase confidence threshold
- Add more reference images for better accuracy
- Check reference images don't include other people
If the server is overloaded:
- Set PROCESS_ALL_POSTS=false
- Add filtering logic to process fewer posts
- Reduce MAX_QUEUE_SIZE
If the labeler is running out of memory:
- Reduce MAX_QUEUE_SIZE
- Add more RAM
- Process fewer posts at once
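Reducing MAX_QUEUE_SIZE helps because the queue is bounded and drops new posts once it is full, which caps how much work can pile up. The sketch below illustrates that drop-when-full behaviour; `BoundedQueue` is a hypothetical class written for this example, not the queue implementation used by the labeler.

```typescript
/** A FIFO queue that rejects new items once it holds `maxSize` entries. */
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private readonly maxSize: number;
  private dropped = 0;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  /** Returns false (and counts a drop) when the queue is already full. */
  enqueue(item: T): boolean {
    if (this.items.length >= this.maxSize) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  /** Removes and returns the oldest item, or undefined when empty. */
  dequeue(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping at the enqueue step keeps memory use proportional to the queue limit rather than to the incoming post rate, at the cost of never labeling the dropped posts.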
- alice, creator of the Zodiac Sign Labels
- Juliet, author of the Pronouns labeler, whose code my labelers were originally based on
- futur, creator of the skyware libraries which make it easier to build things for Bluesky