boltzmann-brain/bsky-face-labeller

Bluesky Face Detection Labeler

This is a Bluesky labeler that automatically detects faces of public figures in post images and applies labels to those posts.

Example: When someone posts an image containing Donald Trump, the labeler automatically detects his face and applies the "trump" label to that post. Users who subscribe to this labeler will see these labels on posts in their feeds.

Starting scope: The labeler is currently configured to detect only Trump, as a proof of concept. It can be expanded to other public figures by adding reference images and a label definition.

This project requires familiarity with TypeScript, the command line, and Linux.

Support This Project

If you find this project helpful, consider supporting development:

Venmo: @Leif-Hancox-Li

Prerequisites

  • Node.js v22.11.0 (LTS) for the runtime
  • npm (comes with Node.js) for package management
  • Python 3.8+ for the face detection service
  • pip3 for Python package management
  • ~50MB disk space for face detection models and reference images
  • 4+ CPU cores recommended for face detection
  • ~1GB RAM for models and processing

Setup

1. Initial Setup

Clone the repo and install Node.js dependencies:

git clone <your-repo-url>
cd bsky-face-labeller
npm install

2. Install Python Dependencies

The face detection service requires Python packages. Install them with:

cd python-service
pip3 install -r requirements.txt --break-system-packages
cd ..

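Note: The --break-system-packages flag is only needed on systems where pip refuses to install into the system Python (an "externally managed environment"); omit it elsewhere. For production deployments, consider using a virtual environment instead: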

cd python-service
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
cd ..

If using a virtual environment, update scripts/start-services.sh to use python-service/venv/bin/python as the interpreter.

3. Setup Labeler Account

Run the Skyware labeler setup to convert an existing Bluesky account into a labeler:

npx @skyware/labeler setup

You can exit after converting the account; there's no need to add the labels with the wizard. We'll do that from code.

4. Configure Environment

Copy the .env.example file to .env and fill in your values:

cp .env.example .env

Edit .env with your labeler credentials:

DID=did:plc:xxx
SIGNING_KEY=xxx
BSKY_IDENTIFIER=xxx
BSKY_PASSWORD=xxx
HOST=127.0.0.1
PORT=4100
METRICS_PORT=4101
FIREHOSE_URL=wss://jetstream1.us-east.bsky.network/subscribe
CURSOR_UPDATE_INTERVAL=10000

# Face detection configuration
FACE_CONFIDENCE_THRESHOLD=0.6
MAX_IMAGE_PROCESSING_TIME=10000
MAX_QUEUE_SIZE=100
PROCESS_ALL_POSTS=false

Important: Set PROCESS_ALL_POSTS=false initially to avoid overwhelming your server. You can enable it later after testing.
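As a rough illustration of how the face detection settings above might be read in TypeScript, here is a minimal sketch. The names `FaceConfig`, `numberFromEnv`, and `loadFaceConfig` are hypothetical and not taken from this repo; the defaults mirror the example .env values.

```typescript
// Hypothetical config shape; field names are assumptions for illustration.
interface FaceConfig {
  faceConfidenceThreshold: number;
  maxImageProcessingTimeMs: number;
  maxQueueSize: number;
  processAllPosts: boolean;
}

// Parse a numeric variable, falling back to the default when unset or invalid.
function numberFromEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Accepts any string map, so it works with process.env or a plain test object.
function loadFaceConfig(env: Record<string, string | undefined>): FaceConfig {
  const threshold = numberFromEnv(env["FACE_CONFIDENCE_THRESHOLD"], 0.6);
  if (threshold < 0 || threshold > 1) {
    throw new Error("FACE_CONFIDENCE_THRESHOLD must be between 0.0 and 1.0");
  }
  return {
    faceConfidenceThreshold: threshold,
    maxImageProcessingTimeMs: numberFromEnv(env["MAX_IMAGE_PROCESSING_TIME"], 10000),
    maxQueueSize: numberFromEnv(env["MAX_QUEUE_SIZE"], 100),
    // Only the literal string "true" enables processing of every post.
    processAllPosts: env["PROCESS_ALL_POSTS"] === "true",
  };
}
```

Calling `loadFaceConfig(process.env)` at startup would surface an out-of-range threshold before any posts are processed.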

5. Download Face Detection Models

Download the required face-api.js models (~12MB):

npm run download-models

This will download models to the models/ directory.

6. Add Reference Face Images

Add 5-10 clear photos of Trump's face to the reference-faces/trumpface/ directory:

# Example: Download images to reference-faces/trumpface/
# Name them 001.jpg, 002.jpg, 003.jpg, etc.

See reference-faces/README.md for detailed guidelines on image quality and requirements.

You need to provide these images yourself. Search for high-quality, well-lit photos of Trump's face from different angles.

7. Publish Labels

Run the label setup script to publish your label definitions to Bluesky:

npm run set-labels

This creates the "trump" label in the Bluesky labeler system.

8. Create Cursor File

The labeler stores its position in the event stream in cursor.txt, as a timestamp in microseconds. The file is created automatically on first run, so no manual step is required.

The server connects to Jetstream, which provides a WebSocket endpoint that emits ATProto events in JSON. There are many public instances available:

Hostname                          Region
jetstream1.us-east.bsky.network   US-East
jetstream2.us-east.bsky.network   US-East
jetstream1.us-west.bsky.network   US-West
jetstream2.us-west.bsky.network   US-West
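Jetstream takes its cursor as a Unix timestamp in microseconds, passed as a `cursor` query parameter, and can filter events with `wantedCollections`. The sketch below shows how a subscription URL could be assembled; the helper names are hypothetical, and whether this repo passes exactly these parameters is an assumption.

```typescript
// Convert a millisecond timestamp (such as Date.now()) to microseconds.
function toCursorMicros(epochMs: number): number {
  return Math.floor(epochMs) * 1000;
}

// Build a Jetstream subscribe URL with optional collection filter and cursor.
function buildJetstreamUrl(
  baseUrl: string,
  collections: string[],
  cursorMicros?: number,
): string {
  const params: string[] = collections.map(
    (collection) => `wantedCollections=${encodeURIComponent(collection)}`,
  );
  if (cursorMicros !== undefined) {
    params.push(`cursor=${cursorMicros}`);
  }
  return params.length > 0 ? `${baseUrl}?${params.join("&")}` : baseUrl;
}
```

For example, `buildJetstreamUrl(FIREHOSE_URL, ["app.bsky.feed.post"], savedCursor)` would resume from the saved cursor and receive only post events.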

The server needs to be reachable outside your local network using the URL you provided during the account setup (typically using a reverse proxy such as Caddy):

labeler.example.com {
	reverse_proxy 127.0.0.1:4100
}

Metrics are exposed on the defined METRICS_PORT for Prometheus. This dashboard can be used to visualize the metrics in Grafana.

Running the Labeler

The labeler consists of two services that must both be running: the Python face detection service and the Node.js labeler.

Development (Local)

Start the Python face detection service first:

cd python-service
python3 face_service.py

In a separate terminal, start the Node.js labeler:

npm run start

Production (VPS)

Use the provided scripts to manage both services:

# Start both services via PM2
./scripts/start-services.sh

# Check status
./scripts/status.sh

# View logs
pm2 logs

# Restart services
./scripts/restart-services.sh

# Stop services
./scripts/stop-services.sh

You should see logs indicating:

  1. Python service loading reference faces
  2. Node.js service connecting to Python service
  3. Reference faces being loaded
  4. Connection to Jetstream
  5. "Face detection initialization complete"

You can check that the labeler is reachable by checking the /xrpc/com.atproto.label.queryLabels endpoint of your labeler's server. A new, empty labeler returns {"cursor":"0","labels":[]}.
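The reachability check can be scripted. The following sketch builds the endpoint URL and tests whether a response body matches the empty-labeler response quoted above; both function names are hypothetical, and labeler.example.com is a placeholder host.

```typescript
// Build the queryLabels URL for a labeler host (the host is a placeholder).
function queryLabelsUrl(host: string): string {
  return `https://${host}/xrpc/com.atproto.label.queryLabels`;
}

// True when the body matches the empty-labeler response: cursor "0", no labels.
function isEmptyLabelerResponse(body: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // Not JSON at all, e.g. a proxy error page.
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const record = parsed as { cursor?: unknown; labels?: unknown };
  return (
    record.cursor === "0" &&
    Array.isArray(record.labels) &&
    record.labels.length === 0
  );
}
```

The body can be fetched with any HTTP client (curl, or the `fetch` built into recent Node.js versions) and passed to `isEmptyLabelerResponse` as text.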

Testing

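With PROCESS_ALL_POSTS=false (default), the labeler only processes posts from accounts with at least MIN_FOLLOWER_COUNT followers (default: 1000), so posts from a small test account are skipped. To test end to end: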

  1. Temporarily set PROCESS_ALL_POSTS=true in your .env
  2. Restart the labeler
  3. Post an image containing Trump to Bluesky (or wait for someone else to)
  4. Watch the logs for face detection results
  5. Check if the label was applied

Warning: Setting PROCESS_ALL_POSTS=true will process ALL posts with images on the firehose, which can be 200-600 posts/second during peak times. Only enable this if your server can handle it, or add filtering logic first.
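The filtering described by PROCESS_ALL_POSTS and MIN_FOLLOWER_COUNT can be pictured as a small predicate. This is a sketch of the documented behavior, not the repo's actual code; `FilterSettings` and `shouldProcessPost` are hypothetical names.

```typescript
// Hypothetical settings object mirroring the two environment variables.
interface FilterSettings {
  processAllPosts: boolean;
  minFollowerCount: number;
}

// Decide whether a post should be queued for face detection.
function shouldProcessPost(
  settings: FilterSettings,
  imageCount: number,
  authorFollowerCount: number,
): boolean {
  // Posts without images have nothing to scan.
  if (imageCount === 0) return false;
  // PROCESS_ALL_POSTS=true bypasses the follower filter entirely.
  if (settings.processAllPosts) return true;
  // Otherwise only accounts at or above the follower minimum qualify.
  return authorFollowerCount >= settings.minFollowerCount;
}
```

Custom filtering logic (for example, an allowlist of accounts) would slot in as an additional check before the follower comparison.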

Adding More Public Figures

Once Trump detection is working, you can add more people:

  1. Create a new directory in reference-faces/:

    mkdir reference-faces/biden

  2. Add 5-10 clear photos of that person to the directory (name them 001.jpg, 002.jpg, etc.)

  3. Add the label to src/constants.ts:

    {
      rkey: '',
      identifier: 'biden',
      locales: [
        {
          lang: 'en',
          name: 'Joe Biden',
          description: 'This post contains an image of Joe Biden',
        },
      ],
    }

  4. Publish the new label:

    npm run set-labels

  5. Restart the labeler to load the new reference faces
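To show how the label entry from step 3 might sit alongside the existing one, here is a sketch of a typed label list with a duplicate check. The interface names, the `EXAMPLE_LABELS` constant, and the wording of the "trump" entry are assumptions for illustration; the actual contents of src/constants.ts may differ.

```typescript
// Hypothetical types matching the shape of the entry shown in step 3.
interface LabelLocale {
  lang: string;
  name: string;
  description: string;
}

interface LabelDefinition {
  rkey: string;
  identifier: string;
  locales: LabelLocale[];
}

// Illustrative list; the "trump" entry text is assumed, not copied from the repo.
const EXAMPLE_LABELS: LabelDefinition[] = [
  {
    rkey: "",
    identifier: "trump",
    locales: [
      { lang: "en", name: "Donald Trump", description: "This post contains an image of Donald Trump" },
    ],
  },
  {
    rkey: "",
    identifier: "biden",
    locales: [
      { lang: "en", name: "Joe Biden", description: "This post contains an image of Joe Biden" },
    ],
  },
];

// Return identifiers that appear more than once, each reported a single time.
function duplicateIdentifiers(labels: LabelDefinition[]): string[] {
  const seen: string[] = [];
  const duplicates: string[] = [];
  for (const label of labels) {
    if (seen.indexOf(label.identifier) === -1) {
      seen.push(label.identifier);
    } else if (duplicates.indexOf(label.identifier) === -1) {
      duplicates.push(label.identifier);
    }
  }
  return duplicates;
}
```

Running `duplicateIdentifiers` before `npm run set-labels` is one way to catch a copy-paste mistake when the list grows.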

Configuration

Environment Variables

  • FACE_CONFIDENCE_THRESHOLD (default: 0.6) - Minimum confidence for face match (0.0-1.0). Higher = more strict, fewer false positives.
  • MAX_IMAGE_PROCESSING_TIME (default: 10000) - Maximum time in ms to process a single image before timeout.
  • MAX_QUEUE_SIZE (default: 100) - Maximum number of posts in processing queue. Posts are dropped when queue is full.
  • QUEUE_CONCURRENCY (default: 2) - Number of images to process in parallel. Set to 1 for stability on low-memory systems (recommended for ≤2GB RAM).
  • PROCESS_ALL_POSTS (default: false) - Whether to process all posts with images. Set to false to use follower-based filtering instead.
  • MIN_FOLLOWER_COUNT (default: 1000) - When PROCESS_ALL_POSTS=false, only process posts from accounts with at least this many followers. Set to 0 to process all posts. Higher values reduce server load by focusing on popular accounts.
  • MAX_FACES_TO_PROCESS (default: 50) - Skip images with more faces than this limit to prevent memory issues with crowd photos.
  • CACHE_MAX_AGE_DAYS (default: 30) - Evict cache entries not seen in this many days.
  • CACHE_CLEANUP_INTERVAL (default: 86400000) - How often to run cache cleanup in milliseconds (default: 24 hours).
  • HEARTBEAT_INTERVAL (default: 60000) - How often to check for connection health in milliseconds.
  • HEARTBEAT_TIMEOUT (default: 300000) - Restart if no events received for this many milliseconds (default: 5 minutes).
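MAX_QUEUE_SIZE states that posts are dropped when the queue is full. A minimal sketch of such a bounded queue follows, assuming the incoming item is the one dropped (the README does not say which item is discarded); `BoundedQueue` is a hypothetical name, not the repo's implementation.

```typescript
// A FIFO queue that rejects new items once it holds maxSize entries.
class BoundedQueue<T> {
  private items: T[] = [];
  private dropped = 0;
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Returns true if the item was queued, false if it was dropped.
  enqueue(item: T): boolean {
    if (this.items.length >= this.maxSize) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes and returns the oldest item, or undefined when empty.
  dequeue(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping at the door keeps memory bounded under bursts, at the cost of missing some posts; a rising dropped count suggests raising QUEUE_CONCURRENCY or tightening the follower filter.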

Performance Tuning

If you're getting too many false positives:

  • Increase FACE_CONFIDENCE_THRESHOLD to 0.7 or 0.8
  • Add more varied reference images

If you're missing correct detections:

  • Decrease FACE_CONFIDENCE_THRESHOLD to 0.5
  • Add more reference images with similar angles/lighting

If processing is too slow:

  • Reduce MAX_IMAGE_PROCESSING_TIME
  • Add more CPU cores
  • Process only specific posts (add filtering logic)
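The effect of FACE_CONFIDENCE_THRESHOLD can be illustrated with a small filter over candidate matches. How the Python service computes confidence is not described here, so the `FaceMatch` shape and `labelsForMatches` function below are assumptions made purely for illustration.

```typescript
// Hypothetical match record: a person identifier and a 0.0-1.0 confidence.
interface FaceMatch {
  person: string;
  confidence: number;
}

// Return the distinct people whose match confidence meets the threshold.
function labelsForMatches(matches: FaceMatch[], threshold: number): string[] {
  const accepted: string[] = [];
  for (const match of matches) {
    if (match.confidence >= threshold && accepted.indexOf(match.person) === -1) {
      accepted.push(match.person);
    }
  }
  return accepted;
}
```

With a single candidate at confidence 0.65, a threshold of 0.6 yields a "trump" label while 0.7 yields none, which is the trade-off between false positives and missed detections described above.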

Monitoring

Metrics are available at http://localhost:4101/metrics:

  • posts_processed_total - Total posts processed
  • faces_detected_total - Total faces detected (by person)
  • image_processing_duration_seconds - Processing time histogram
  • processing_queue_size - Current queue size
  • processing_errors_total - Error counts
  • image_cache_hits_total - Number of cache hits (duplicate images)
  • image_cache_misses_total - Number of cache misses (new images)
  • image_cache_size - Total entries in cache database
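The two cache counters can be combined into a hit rate. The sketch below parses simple `name value` lines from the metrics output and computes hits / (hits + misses); it assumes the counters are exposed without labels or timestamps, and the function names are hypothetical.

```typescript
// Parse "name value" lines from Prometheus text output, skipping comments.
function parseMetricValues(text: string): Map<string, number> {
  const values = new Map<string, number>();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const splitAt = line.lastIndexOf(" ");
    if (splitAt < 0) continue;
    const value = Number(line.slice(splitAt + 1));
    if (Number.isFinite(value)) {
      values.set(line.slice(0, splitAt), value);
    }
  }
  return values;
}

// Fraction of image lookups served from cache, or null if there were none.
function cacheHitRate(values: Map<string, number>): number | null {
  const hits = values.get("image_cache_hits_total") ?? 0;
  const misses = values.get("image_cache_misses_total") ?? 0;
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```

In Grafana the same ratio is usually computed with a query over the two counters; this version is handy for a quick check from a script.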

Querying Logs

Use PM2 or grep to search logs for specific events:

View live logs:

pm2 logs labeler
pm2 logs python-service

Find posts that were successfully labeled:

pm2 logs labeler --nostream | grep "Labeled post"

Example output:

[2026-01-21 17:30:15] INFO: Labeled post at://did:plc:xyz/app.bsky.feed.post/abc123 with: trump

Find all face detections:

pm2 logs labeler --nostream | grep "Found.*face"

Example output:

[2026-01-21 17:30:14] INFO: Found 1 face(s) in image 1 (245ms): trump

Find cache hits (previously processed images):

pm2 logs labeler --nostream | grep "Cache hit"

Get the post URL from a labeled post:

When you see a post URI like at://did:plc:xyz/app.bsky.feed.post/abc123, you can view it in your browser:

https://bsky.app/profile/did:plc:xyz/post/abc123

Replace the DID and post rkey (the part after /post/) with the values from your log.
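The conversion above is mechanical, so it can be scripted. A minimal sketch, with `atUriToBskyUrl` as a hypothetical helper name:

```typescript
// Convert an app.bsky.feed.post AT URI into a bsky.app web URL.
// Returns null when the input is not a post URI.
function atUriToBskyUrl(atUri: string): string | null {
  const match = /^at:\/\/([^/]+)\/app\.bsky\.feed\.post\/([^/]+)$/.exec(atUri);
  if (match === null) return null;
  const did = match[1];
  const rkey = match[2];
  return `https://bsky.app/profile/${did}/post/${rkey}`;
}
```

For the example URI, `atUriToBskyUrl("at://did:plc:xyz/app.bsky.feed.post/abc123")` returns `https://bsky.app/profile/did:plc:xyz/post/abc123`.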

Filter logs by date:

pm2 logs labeler --nostream --lines 1000 | grep "2026-01-21" | grep "Labeled post"

Save logs to file for analysis:

pm2 logs labeler --nostream --lines 10000 > labeler-logs.txt
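Once logs are saved to a file, they can be summarized in code. The sketch below counts labels from "Labeled post" lines, assuming the log format shown in the example output above and assuming multiple labels, if any, are comma-separated; `countLabelsInLogs` is a hypothetical name.

```typescript
// Count how many times each label appears in "Labeled post ... with: ..." lines.
function countLabelsInLogs(logText: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const rawLine of logText.split("\n")) {
    const match = /Labeled post (\S+) with: (.+)$/.exec(rawLine.trim());
    if (match === null) continue;
    const labelList = match[2] ?? "";
    // Assumption: several labels on one post are separated by commas.
    for (const label of labelList.split(",")) {
      const key = label.trim();
      if (key !== "") {
        counts[key] = (counts[key] ?? 0) + 1;
      }
    }
  }
  return counts;
}
```

Reading labeler-logs.txt with Node's `fs.readFileSync(path, "utf8")` and passing the text to this function gives a per-label tally.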

Troubleshooting

No faces detected

  • Check reference images are high quality and faces are clearly visible
  • Review logs for face detection errors
  • Lower confidence threshold

False positives

  • Increase confidence threshold
  • Add more reference images for better accuracy
  • Check reference images don't include other people

High CPU usage

  • Set PROCESS_ALL_POSTS=false
  • Add filtering logic to process fewer posts
  • Reduce MAX_QUEUE_SIZE

Out of memory

  • Reduce MAX_QUEUE_SIZE
  • Add more RAM
  • Process fewer posts at once

Credits
