Bluesky Face Detection Labeler

This is a Bluesky labeler that automatically detects faces of public figures in post images and applies labels to those posts.

Example: When someone posts an image containing Donald Trump, the labeler automatically detects his face and applies the "trump" label to that post. Users who subscribe to this labeler will see these labels on posts in their feeds.

Starting scope: The labeler is currently configured to detect only Trump, as a proof of concept. It is easy to extend to other public figures (see Adding More Public Figures).

This project requires familiarity with TypeScript, the command line, and Linux.

Support This Project

If you find this project helpful, consider supporting development:

Venmo: @Leif-Hancox-Li

Prerequisites

Node.js v22.11.0 (LTS) for the runtime
npm (comes with Node.js) for package management
Python 3.8+ for the face detection service
pip3 for Python package management
~50MB disk space for face detection models and reference images
4+ CPU cores recommended for face detection
~1GB RAM for models and processing

Setup

1. Initial Setup

Clone the repo and install Node.js dependencies:

```
git clone <your-repo-url>
cd bsky-face-labeller
npm install
```

2. Install Python Dependencies

The face detection service requires Python packages. Install them with:

```
cd python-service
pip3 install -r requirements.txt --break-system-packages
cd ..
```

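Note: The --break-system-packages flag is needed on distributions that mark the system Python as externally managed (PEP 668), such as recent Debian and Ubuntu releases. For production deployments, consider using a virtual environment instead: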

```
cd python-service
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
cd ..
```

If using a virtual environment, update scripts/start-services.sh to use python-service/venv/bin/python as the interpreter.

3. Setup Labeler Account

Run the Skyware labeler setup to convert an existing Bluesky account into a labeler:

```
npx @skyware/labeler setup
```

You can exit after converting the account; there's no need to add the labels with the wizard. We'll do that from code.

4. Configure Environment

Copy the .env.example file to .env and fill in your values:

```
cp .env.example .env
```

Edit .env with your labeler credentials:

```
DID=did:plc:xxx
SIGNING_KEY=xxx
BSKY_IDENTIFIER=xxx
BSKY_PASSWORD=xxx
HOST=127.0.0.1
PORT=4100
METRICS_PORT=4101
FIREHOSE_URL=wss://jetstream1.us-east.bsky.network/subscribe
CURSOR_UPDATE_INTERVAL=10000

# Face detection configuration
FACE_CONFIDENCE_THRESHOLD=0.6
MAX_IMAGE_PROCESSING_TIME=10000
MAX_QUEUE_SIZE=100
PROCESS_ALL_POSTS=false
```

Important: Set PROCESS_ALL_POSTS=false initially to avoid overwhelming your server. You can enable it later after testing.

5. Download Face Detection Models

Download the required face-api.js models (~12MB):

```
npm run download-models
```

This will download models to the models/ directory.

6. Add Reference Face Images

Add 5-10 clear photos of Trump's face to the reference-faces/trumpface/ directory:

```
# Example: Download images to reference-faces/trumpface/
# Name them 001.jpg, 002.jpg, 003.jpg, etc.
```

See reference-faces/README.md for detailed guidelines on image quality and requirements.

You need to provide these images yourself. Search for high-quality, well-lit photos of Trump's face from different angles.
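Before starting the services, it can help to sanity-check a reference directory against the guidance above (5-10 images named 001.jpg, 002.jpg, ...). The sketch below is a hypothetical helper, not part of this repository; it assumes only three-digit `.jpg` names are expected, whereas reference-faces/README.md may allow other formats.

```typescript
import { readdirSync } from "node:fs";

// Hypothetical check based on this README's guidance: 5-10 images named 001.jpg, 002.jpg, ...
const MIN_IMAGES = 5;
const MAX_IMAGES = 10;
const NAME_PATTERN = /^\d{3}\.jpg$/;

// Returns human-readable problems; an empty list means the file names look fine.
function checkReferenceFileNames(files: string[]): string[] {
  const problems: string[] = [];
  const images = files.filter((f) => NAME_PATTERN.test(f));
  for (const f of files) {
    // Hidden files such as .DS_Store are ignored.
    if (!NAME_PATTERN.test(f) && !f.startsWith(".")) {
      problems.push(`unexpected file name: ${f}`);
    }
  }
  if (images.length < MIN_IMAGES) {
    problems.push(`only ${images.length} image(s); add at least ${MIN_IMAGES}`);
  } else if (images.length > MAX_IMAGES) {
    problems.push(`${images.length} images; ${MAX_IMAGES} is the suggested maximum`);
  }
  return problems;
}

// Reads a directory such as reference-faces/trumpface and checks its contents.
function checkReferenceDir(dir: string): string[] {
  return checkReferenceFileNames(readdirSync(dir)).map((p) => `${dir}: ${p}`);
}
```

For example, `checkReferenceDir("reference-faces/trumpface")` run from the project root returns an empty array when the directory follows the naming convention.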

7. Publish Labels

Run the label setup script to publish your label definitions to Bluesky:

```
npm run set-labels
```

This creates the "trump" label in the Bluesky labeler system.

8. Create Cursor File

The labeler records its position in the Jetstream feed in cursor.txt, which holds a Unix timestamp in microseconds. The file is created automatically on first run, so you normally do not need to create it yourself.
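If you ever want to create or reset the cursor by hand, the value is simply the current time in microseconds, i.e. `Date.now()` (milliseconds) multiplied by 1000. The sketch below assumes the file contains only that number as plain text; the helper names are hypothetical and the labeler's own cursor handling may differ.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Jetstream cursors are Unix timestamps in microseconds; Date.now() is in milliseconds.
function nowMicros(): number {
  return Date.now() * 1000;
}

// Returns the cursor stored in the file's text, or undefined if it is not a positive integer.
function parseCursor(text: string): number | undefined {
  const value = Number(text.trim());
  return Number.isSafeInteger(value) && value > 0 ? value : undefined;
}

// Hypothetical helper: reuse an existing valid cursor, otherwise write the current time.
function ensureCursorFile(path = "cursor.txt"): number {
  if (existsSync(path)) {
    const saved = parseCursor(readFileSync(path, "utf8"));
    if (saved !== undefined) return saved;
  }
  const fresh = nowMicros();
  writeFileSync(path, String(fresh), "utf8");
  return fresh;
}
```

A cursor set to the current time means events from before that moment are not replayed, so anything posted while the labeler was down would be skipped.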

The server connects to Jetstream, a service that provides a WebSocket endpoint emitting ATProto events as JSON. Set FIREHOSE_URL to the /subscribe endpoint of one of the public instances:

| Hostname | Region |
| --- | --- |
| `jetstream1.us-east.bsky.network` | US-East |
| `jetstream2.us-east.bsky.network` | US-East |
| `jetstream1.us-west.bsky.network` | US-West |
| `jetstream2.us-west.bsky.network` | US-West |
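Jetstream accepts query parameters on `/subscribe`, including `wantedCollections` (only records from matching collections are sent) and `cursor` (a microsecond timestamp to replay from). The sketch below shows how a hostname from the table and a saved cursor could be combined into a subscription URL; whether this labeler itself filters on `app.bsky.feed.post` is an assumption.

```typescript
// Public Jetstream instances from the table above.
const JETSTREAM_HOSTS = [
  "jetstream1.us-east.bsky.network",
  "jetstream2.us-east.bsky.network",
  "jetstream1.us-west.bsky.network",
  "jetstream2.us-west.bsky.network",
] as const;

// Builds a subscription URL. `wantedCollections` limits the stream to post records;
// `cursor` (Unix microseconds) asks Jetstream to replay events from that point.
function jetstreamUrl(host: string, cursorMicros?: number): string {
  const url = new URL(`wss://${host}/subscribe`);
  url.searchParams.set("wantedCollections", "app.bsky.feed.post");
  if (cursorMicros !== undefined) {
    url.searchParams.set("cursor", String(cursorMicros));
  }
  return url.toString();
}
```

Switching regions is then just a matter of passing a different entry from `JETSTREAM_HOSTS`.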

The server must be reachable from outside your local network at the URL you provided during account setup, typically through a reverse proxy such as Caddy:

```
labeler.example.com {
	reverse_proxy 127.0.0.1:4100
}
```

Metrics are exposed for Prometheus on METRICS_PORT (4101 in the example .env above) and can be visualized in Grafana.

Running the Labeler

The labeler consists of two services that must both be running:

Development (Local)

Start the Python face detection service first:

```
cd python-service
python3 face_service.py
```

In a separate terminal, start the Node.js labeler:

```
npm run start
```

Production (VPS)

Use the provided scripts to manage both services:

```
# Start both services via PM2
./scripts/start-services.sh

# Check status
./scripts/status.sh

# View logs
pm2 logs

# Restart services
./scripts/restart-services.sh

# Stop services
./scripts/stop-services.sh
```

You should see logs indicating:

Python service loading reference faces
Node.js service connecting to Python service
Reference faces being loaded
Connection to Jetstream
"Face detection initialization complete"

To verify that the labeler is reachable, request the /xrpc/com.atproto.label.queryLabels endpoint on your labeler's public URL. A new, empty labeler returns {"cursor":"0","labels":[]}.
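The same check can be scripted. The sketch below is hypothetical and assumes Node.js 18+ (for the built-in `fetch`); `labeler.example.com` is the placeholder hostname from the Caddy example, and the response type only covers the fields inspected here.

```typescript
// Response shape of com.atproto.label.queryLabels (only the fields checked here).
interface QueryLabelsResponse {
  cursor?: string;
  labels: unknown[];
}

// Builds the endpoint URL for a labeler's public hostname.
function queryLabelsUrl(host: string): string {
  return new URL("/xrpc/com.atproto.label.queryLabels", `https://${host}`).toString();
}

// Type guard: true when the body has a `labels` array.
function isQueryLabelsResponse(body: unknown): body is QueryLabelsResponse {
  return (
    typeof body === "object" &&
    body !== null &&
    Array.isArray((body as { labels?: unknown }).labels)
  );
}

// Fetches the endpoint and returns how many labels are in the first page of results.
async function checkLabeler(host: string): Promise<number> {
  const res = await fetch(queryLabelsUrl(host));
  if (!res.ok) {
    throw new Error(`queryLabels returned HTTP ${res.status}`);
  }
  const body: unknown = await res.json();
  if (!isQueryLabelsResponse(body)) {
    throw new Error("Response does not look like a queryLabels result");
  }
  return body.labels.length;
}
```

For a freshly set-up labeler, `checkLabeler("labeler.example.com")` (with your own hostname) should resolve to 0; an HTTP error usually points at the reverse proxy or a stopped service.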

Testing

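With PROCESS_ALL_POSTS=false (the default), only posts from accounts with at least MIN_FOLLOWER_COUNT followers (1000 by default) are processed, so a test post from a smaller account will be skipped. To test: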

Temporarily set PROCESS_ALL_POSTS=true in your .env
Restart the labeler
Post an image containing Trump to Bluesky (or wait for someone else to)
Watch the logs for face detection results
Check if the label was applied

Warning: Setting PROCESS_ALL_POSTS=true will process ALL posts with images on the firehose, which can be 200-600 posts/second during peak times. Only enable this if your server can handle it, or add filtering logic first.
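A rough capacity estimate shows why this is risky. The sketch below assumes a fixed 250 ms per image (an illustrative figure, close to the 245ms in the example log line under Querying Logs) and ignores image download time and cache hits; substitute timings from your own image_processing_duration_seconds metric.

```typescript
// Images per second that `concurrency` parallel workers can sustain
// when each image takes `msPerImage` milliseconds.
function imagesPerSecond(concurrency: number, msPerImage: number): number {
  return (concurrency * 1000) / msPerImage;
}

// Fraction of incoming images that cannot be kept up with (0 if capacity suffices).
function overflowFraction(incomingPerSecond: number, capacityPerSecond: number): number {
  return Math.max(0, 1 - capacityPerSecond / incomingPerSecond);
}

const capacity = imagesPerSecond(2, 250); // QUEUE_CONCURRENCY=2 (the default)
console.log(`capacity: ${capacity} images/s`); // capacity: 8 images/s
console.log(`backlog at 200/s: ${(overflowFraction(200, capacity) * 100).toFixed(0)}%`); // backlog at 200/s: 96%
```

Under these assumptions about 96% of images at 200 posts/second could not be processed and would be dropped once the queue fills (more if posts carry several images). Follower-based filtering (PROCESS_ALL_POSTS=false with MIN_FOLLOWER_COUNT) reduces this load.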

Adding More Public Figures

Once Trump detection is working, you can add more people:

Create a new directory in reference-faces/:
```
mkdir reference-faces/biden
```
Add 5-10 clear photos of that person to the directory (name them 001.jpg, 002.jpg, etc.)

Add the label to src/constants.ts:

```
// New entry, placed next to the existing 'trump' label definition
{
  rkey: '',
  identifier: 'biden',
  locales: [
    {
      lang: 'en',
      name: 'Joe Biden',
      description: 'This post contains an image of Joe Biden',
    },
  ],
},
```

Publish the new label:
```
npm run set-labels
```
Restart the labeler to load the new reference faces
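Label identifiers must follow the AT Protocol rule for custom label values: lowercase ASCII letters and hyphens only (`[a-z-]+`). The hypothetical helper below builds an entry in the same shape as the example above and rejects invalid identifiers; the `LabelDefinition` type is written out here for illustration and is not imported from this repository or from Skyware.

```typescript
// Shape of the entries in src/constants.ts, as shown in the example above.
interface LabelLocale {
  lang: string;
  name: string;
  description: string;
}

interface LabelDefinition {
  rkey: string;
  identifier: string;
  locales: LabelLocale[];
}

// Custom label values may only contain lowercase ASCII letters and '-'.
const LABEL_IDENTIFIER = /^[a-z-]+$/;

// Hypothetical helper that builds an English-only label entry for a public figure.
function figureLabel(identifier: string, displayName: string): LabelDefinition {
  if (!LABEL_IDENTIFIER.test(identifier)) {
    throw new Error(`Invalid label identifier "${identifier}": use only a-z and "-"`);
  }
  return {
    rkey: "",
    identifier,
    locales: [
      {
        lang: "en",
        name: displayName,
        description: `This post contains an image of ${displayName}`,
      },
    ],
  };
}
```

For instance, `figureLabel("biden", "Joe Biden")` produces the same object as the example entry, while an identifier such as `Joe_Biden` throws before anything is published.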

Configuration

Environment Variables

FACE_CONFIDENCE_THRESHOLD (default: 0.6) - Minimum confidence for face match (0.0-1.0). Higher = more strict, fewer false positives.
MAX_IMAGE_PROCESSING_TIME (default: 10000) - Maximum time in ms to process a single image before timeout.
MAX_QUEUE_SIZE (default: 100) - Maximum number of posts in processing queue. Posts are dropped when queue is full.
QUEUE_CONCURRENCY (default: 2) - Number of images to process in parallel. Set to 1 for stability on low-memory systems (recommended for ≤2GB RAM).
PROCESS_ALL_POSTS (default: false) - Whether to process all posts with images. Set to false to use follower-based filtering instead.
MIN_FOLLOWER_COUNT (default: 1000) - When PROCESS_ALL_POSTS=false, only process posts from accounts with at least this many followers. Set to 0 to process all posts. Higher values reduce server load by focusing on popular accounts.
MAX_FACES_TO_PROCESS (default: 50) - Skip images with more faces than this limit to prevent memory issues with crowd photos.
CACHE_MAX_AGE_DAYS (default: 30) - Evict cache entries not seen in this many days.
CACHE_CLEANUP_INTERVAL (default: 86400000) - How often to run cache cleanup in milliseconds (default: 24 hours).
HEARTBEAT_INTERVAL (default: 60000) - How often to check for connection health in milliseconds.
HEARTBEAT_TIMEOUT (default: 300000) - Restart if no events received for this many milliseconds (default: 5 minutes).
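The sketch below shows one way to read these variables with the documented defaults, and how PROCESS_ALL_POSTS and MIN_FOLLOWER_COUNT combine to decide whether a post is processed. It is hypothetical (the repository's own configuration code in src/ may differ) and takes the environment as a parameter, so you would call it as `loadConfig(process.env)`.

```typescript
type Env = Record<string, string | undefined>;

interface LabelerConfig {
  faceConfidenceThreshold: number;
  maxImageProcessingTime: number;
  maxQueueSize: number;
  queueConcurrency: number;
  processAllPosts: boolean;
  minFollowerCount: number;
  maxFacesToProcess: number;
}

// Reads a numeric variable, falling back to the documented default when unset or empty.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key]?.trim();
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function loadConfig(env: Env): LabelerConfig {
  const threshold = readNumber(env, "FACE_CONFIDENCE_THRESHOLD", 0.6);
  if (threshold < 0 || threshold > 1) {
    throw new Error("FACE_CONFIDENCE_THRESHOLD must be between 0.0 and 1.0");
  }
  return {
    faceConfidenceThreshold: threshold,
    maxImageProcessingTime: readNumber(env, "MAX_IMAGE_PROCESSING_TIME", 10000),
    maxQueueSize: readNumber(env, "MAX_QUEUE_SIZE", 100),
    queueConcurrency: readNumber(env, "QUEUE_CONCURRENCY", 2),
    processAllPosts: env["PROCESS_ALL_POSTS"]?.trim().toLowerCase() === "true",
    minFollowerCount: readNumber(env, "MIN_FOLLOWER_COUNT", 1000),
    maxFacesToProcess: readNumber(env, "MAX_FACES_TO_PROCESS", 50),
  };
}

// Follower-based filtering: with PROCESS_ALL_POSTS=false, only authors with at least
// MIN_FOLLOWER_COUNT followers are processed (a count of 0 lets every post through).
function shouldProcessPost(config: LabelerConfig, authorFollowers: number): boolean {
  return config.processAllPosts || authorFollowers >= config.minFollowerCount;
}
```

Failing fast on a malformed value (for example FACE_CONFIDENCE_THRESHOLD=1.5) is usually easier to debug than silently falling back to a default.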

Performance Tuning

If you're getting too many false positives:

Increase FACE_CONFIDENCE_THRESHOLD to 0.7 or 0.8
Add more varied reference images

If you're missing correct detections:

Decrease FACE_CONFIDENCE_THRESHOLD to 0.5
Add more reference images with similar angles/lighting

If processing is too slow:

Reduce MAX_IMAGE_PROCESSING_TIME
Add more CPU cores
Process only specific posts (add filtering logic)
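MAX_IMAGE_PROCESSING_TIME bounds how long a single image may take. A common way to enforce such a limit in TypeScript is to race the work against a timer, as in the sketch below; the helper is hypothetical and not necessarily how the labeler implements its timeout.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Settles with `work`, or rejects with TimeoutError after `ms` milliseconds,
// whichever happens first. The timer is cleared either way so it cannot keep the process alive.
function withTimeout<T>(work: Promise<T>, ms: number, what = "image processing"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`${what} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that `Promise.race` only stops waiting: the underlying detection call keeps running in the background unless it supports cancellation, so a very low MAX_IMAGE_PROCESSING_TIME frees the queue slot sooner but does not by itself reduce CPU work.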

Monitoring

Metrics are available at http://localhost:4101/metrics:

posts_processed_total - Total posts processed
faces_detected_total - Total faces detected (by person)
image_processing_duration_seconds - Processing time histogram
processing_queue_size - Current queue size
processing_errors_total - Error counts
image_cache_hits_total - Number of cache hits (duplicate images)
image_cache_misses_total - Number of cache misses (new images)
image_cache_size - Total entries in cache database
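These metrics use the Prometheus text format, so a quick health figure such as the cache hit rate can be computed without a full Prometheus setup. The parser below is a simplified, hypothetical sketch: it handles `name value` and `name{labels} value` sample lines and assumes label values contain no `}`; the `person` label name in the sample is also an assumption.

```typescript
// Sums every sample of `name` in Prometheus text output, across all label sets.
// Simplified: assumes label values contain no "}" characters.
function sumMetric(text: string, name: string): number {
  let total = 0;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks, HELP and TYPE lines
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)/.exec(line);
    if (match && match[1] === name) total += Number(match[2]);
  }
  return total;
}

// Share of image lookups served from the cache, or undefined before any lookups.
function cacheHitRate(text: string): number | undefined {
  const hits = sumMetric(text, "image_cache_hits_total");
  const misses = sumMetric(text, "image_cache_misses_total");
  const lookups = hits + misses;
  return lookups === 0 ? undefined : hits / lookups;
}

const sample = [
  "# TYPE image_cache_hits_total counter",
  "image_cache_hits_total 30",
  "image_cache_misses_total 10",
  'faces_detected_total{person="trump"} 4',
].join("\n");

console.log(cacheHitRate(sample)); // 0.75
console.log(sumMetric(sample, "faces_detected_total")); // 4
```

Against a running instance, fetch http://localhost:4101/metrics and pass the response text to `cacheHitRate`.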

Querying Logs

Use PM2 or grep to search logs for specific events:

View live logs:

```
pm2 logs labeler
pm2 logs python-service
```

Find posts that were successfully labeled:

```
pm2 logs labeler --nostream | grep "Labeled post"
```

Example output:

```
[2026-01-21 17:30:15] INFO: Labeled post at://did:plc:xyz/app.bsky.feed.post/abc123 with: trump
```

Find all face detections:

```
pm2 logs labeler --nostream | grep "Found.*face"
```

Example output:

```
[2026-01-21 17:30:14] INFO: Found 1 face(s) in image 1 (245ms): trump
```

Find cache hits (previously processed images):

```
pm2 logs labeler --nostream | grep "Cache hit"
```

Get the post URL from a labeled post:

When you see a post URI like at://did:plc:xyz/app.bsky.feed.post/abc123, you can view it in your browser:

```
https://bsky.app/profile/did:plc:xyz/post/abc123
```

Replace the DID and post rkey (the part after /post/) with the values from your log.
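The conversion can be automated when going through many log lines. The sketch below is hypothetical: it matches the "Labeled post" line format shown above and assumes that multiple labels, if any, would be comma-separated.

```typescript
interface LabeledPost {
  uri: string;
  labels: string[];
}

// Matches lines such as:
// [2026-01-21 17:30:15] INFO: Labeled post at://did:plc:xyz/app.bsky.feed.post/abc123 with: trump
const LABELED_LINE = /Labeled post (at:\/\/\S+) with: (.+)$/;

function parseLabeledLine(line: string): LabeledPost | undefined {
  const match = LABELED_LINE.exec(line.trim());
  if (!match) return undefined;
  const [, uri = "", labelList = ""] = match;
  // Assumption: multiple labels would be comma-separated.
  const labels = labelList.split(",").map((l) => l.trim()).filter((l) => l !== "");
  return { uri, labels };
}

// at://<did>/app.bsky.feed.post/<rkey>  ->  https://bsky.app/profile/<did>/post/<rkey>
function postUriToWebUrl(uri: string): string {
  const match = /^at:\/\/([^/]+)\/app\.bsky\.feed\.post\/([^/]+)$/.exec(uri);
  if (!match) throw new Error(`Not a post AT URI: ${uri}`);
  return `https://bsky.app/profile/${match[1]}/post/${match[2]}`;
}
```

Feeding the output of `pm2 logs labeler --nostream` through `parseLabeledLine` and then `postUriToWebUrl` gives a clickable link for each labeled post.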

Filter logs by date:

```
pm2 logs labeler --nostream --lines 1000 | grep "2026-01-21" | grep "Labeled post"
```

Save logs to file for analysis:

```
pm2 logs labeler --nostream --lines 10000 > labeler-logs.txt
```

Troubleshooting

No faces detected

Check reference images are high quality and faces are clearly visible
Review logs for face detection errors
Lower confidence threshold

False positives

Increase confidence threshold
Add more reference images for better accuracy
Check reference images don't include other people

High CPU usage

Set PROCESS_ALL_POSTS=false and raise MIN_FOLLOWER_COUNT to focus on fewer, more popular accounts
Add filtering logic to process fewer posts
Reduce MAX_QUEUE_SIZE

Out of memory

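Reduce MAX_QUEUE_SIZE
Add more RAM
Set QUEUE_CONCURRENCY=1 to process fewer images at once (recommended for systems with 2GB of RAM or less)
Lower MAX_FACES_TO_PROCESS so that crowd photos are skipped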
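MAX_QUEUE_SIZE and QUEUE_CONCURRENCY together bound memory use: the first caps how many posts wait, the second how many images are processed in parallel. The class below is a hypothetical sketch of a queue with the documented drop-when-full behaviour, not the labeler's actual implementation; it assumes MAX_QUEUE_SIZE counts only waiting jobs, not running ones.

```typescript
type Job = () => Promise<void>;

// Runs at most `concurrency` jobs at once; at most `maxSize` jobs may wait.
// Jobs pushed while the waiting list is full are dropped and counted.
class BoundedQueue {
  private readonly waiting: Job[] = [];
  private running = 0;
  private readonly maxSize: number;
  private readonly concurrency: number;
  dropped = 0;

  constructor(maxSize: number, concurrency: number) {
    this.maxSize = maxSize;
    this.concurrency = concurrency;
  }

  // Number of jobs waiting (not counting running ones).
  get size(): number {
    return this.waiting.length;
  }

  // Returns false if the job was dropped because the queue was full.
  push(job: Job): boolean {
    if (this.waiting.length >= this.maxSize) {
      this.dropped++;
      return false;
    }
    this.waiting.push(job);
    this.drain();
    return true;
  }

  // Starts waiting jobs until the concurrency limit is reached.
  private drain(): void {
    while (this.running < this.concurrency) {
      const job = this.waiting.shift();
      if (!job) break;
      this.running++;
      job()
        .catch((err) => console.error("job failed:", err))
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}
```

With `new BoundedQueue(100, 1)`, memory is bounded by one image in flight plus up to 100 queued posts; a hypothetical `queue.push(() => processPost(post))` returning false is the point at which a post is dropped.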

Credits

alice, creator of the Zodiac Sign Labels
Juliet, author of the Pronouns labeler, whose code my labelers were originally based on
futur, creator of the skyware libraries which make it easier to build things for Bluesky

