Context
I maintain getbased, a blood work dashboard that uses Evolu for cross-device sync. I self-host the relay at sync.getbased.health using the official Docker image. I've been running it in production for a few days and filed #660 about logging and silent failures after a DB wipe.
Since then, I built a standalone wrapper project (getbased-relay) that wraps @evolu/nodejs with fixes for several operational issues I hit. The project is MIT-licensed and I'm happy to contribute any of these improvements upstream. Sharing them here as structured feedback.
Issues and solutions
1. 1MB default quota is too small for real-world use
The official relay's isOwnerWithinQuota uses maxBytes = 1024 * 1024 (1MB). A single user with 3 profiles and chat data exceeds this quickly because CRDT ops accumulate over time. When the quota is exceeded, the relay returns ProtocolQuotaError, but the client caches it silently — sync just stops with no user feedback.
The 1MB default is hardcoded in apps/relay/src/index.ts with no way to change it without rebuilding the image.
Suggestion: Make the quota configurable via environment variable (e.g., QUOTA_PER_OWNER_MB, defaulting to something more generous like 10MB). Our wrapper does this via src/lib/config.ts — all settings come from env vars with sane defaults.
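To illustrate, here's a minimal sketch of the env-var parsing our wrapper uses. The QUOTA_PER_OWNER_MB name and the 10MB default are our wrapper's conventions, not part of the official relay:

```typescript
// Hypothetical sketch: parse QUOTA_PER_OWNER_MB with a fallback default.
// Any non-numeric or non-positive value falls back to the default.
const parseQuotaMb = (raw: string | undefined, fallbackMb: number): number => {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMb;
};

// 10MB default; the resulting byte count would replace the hardcoded
// maxBytes = 1024 * 1024 in the relay's quota check.
const maxBytes = parseQuotaMb(process.env.QUOTA_PER_OWNER_MB, 10) * 1024 * 1024;
```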
2. Logging is all-or-nothing
The relay has two modes: enableLogging: false (zero visibility — only the startup line) or enableLogging: true (dumps raw SQL queries via createRelayLogger, extremely noisy). There's no middle ground for operations.
Connection events, subscribe/unsubscribe, errors, broadcasts — all invisible unless you enable the SQL firehose. When sync breaks, there's nothing to diagnose (as described in #660).
Suggestion: The relay logger already emits structured events (connection, close, subscribe, broadcast, storage errors, etc.) via createRelayLogger. The issue is that all of these are gated behind a single enableLogging boolean. A leveled approach would help — e.g., connection lifecycle at info, message details at debug, errors always. Our wrapper implements this with a custom Console that intercepts relay logger calls and emits structured JSON at configurable levels (LOG_LEVEL=info|debug|warn|error). The key trick: we lock console.enabled = true via Object.defineProperty so all events reach our filter, then apply levels ourselves (src/lib/logger.ts).
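A sketch of the leveled filter itself (separate from the Console interception). The LOG_LEVEL convention and the event names shown are from our wrapper, not @evolu/nodejs:

```typescript
// Hedged sketch: map each relay logger event to a level, emit structured
// JSON only at or above a configured threshold.
type Level = "debug" | "info" | "warn" | "error";
const rank: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Returns a structured JSON line, or null when the event is below the threshold.
const makeFilter =
  (min: Level) =>
  (level: Level, event: string, data?: unknown): string | null =>
    rank[level] >= rank[min]
      ? JSON.stringify({ level, event, ...(data !== undefined ? { data } : {}) })
      : null;

// e.g. connection lifecycle at info, message details at debug, errors always.
const log = makeFilter((process.env.LOG_LEVEL ?? "info") as Level);
```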
3. No health check endpoint
The relay's HTTP server has zero request handlers — it only handles WebSocket upgrades. Health check probes (including the Docker image's own HEALTHCHECK, which just does a TCP connect) can't distinguish between "relay is running" and "relay is healthy and accepting sync."
Our client-side checkRelayConnection() tries to open a WebSocket to /ping, which causes WS_ERR_EXPECTED_MASK errors in the relay logs because there's no handler for it.
Suggestion: A simple /health HTTP endpoint on the relay port (or a separate admin port) returning {"status":"ok","uptime":...}. Our wrapper runs a separate admin HTTP server on a configurable port with /health (unauthenticated, for uptime monitors) and /metrics (token-gated). See src/lib/admin-server.ts.
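The endpoint itself is a few lines of plain node:http. This is a hedged sketch; the payload shape and the ADMIN_PORT name are our wrapper's choices:

```typescript
import { createServer, type IncomingMessage, type ServerResponse } from "node:http";

const started = Date.now();
const healthPayload = (): string =>
  JSON.stringify({ status: "ok", uptime: Math.floor((Date.now() - started) / 1000) });

// Unauthenticated /health for uptime monitors; everything else is a 404.
const handler = (req: IncomingMessage, res: ServerResponse): void => {
  if (req.url === "/health") {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(healthPayload());
  } else {
    res.writeHead(404);
    res.end();
  }
};

const adminServer = createServer(handler);
// adminServer.listen(Number(process.env.ADMIN_PORT ?? 8787));
```

The Docker HEALTHCHECK could then curl /health instead of doing a bare TCP connect.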
4. No usage metrics
The evolu_usage table tracks per-owner storedBytes, but there's no way to query it without direct DB access. For relay operators, basic questions are unanswerable: how many owners use the relay? How much storage does each use? What's the total DB size? Is the relay approaching disk limits?
Suggestion: Expose read-only metrics via an authenticated endpoint. Our wrapper opens the relay DB in readonly mode via better-sqlite3 and serves per-owner usage, owner count, total stored bytes, and DB file size via /metrics (src/lib/metrics.ts).
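The aggregation is trivial once the rows are out of the DB. A sketch, with a hypothetical row shape mirroring the evolu_usage columns (in the wrapper the rows come from a read-only better-sqlite3 query):

```typescript
// Hypothetical row shape; in practice these rows come from something like
// new Database(dbPath, { readonly: true }).prepare("SELECT ...").all().
interface UsageRow {
  ownerId: string;
  storedBytes: number;
}

// Answers the operator questions above: owner count, total bytes, per-owner usage.
const summarizeUsage = (rows: UsageRow[]) => ({
  ownerCount: rows.length,
  totalStoredBytes: rows.reduce((sum, r) => sum + r.storedBytes, 0),
  perOwner: rows.map((r) => ({ ownerId: r.ownerId, storedBytes: r.storedBytes })),
});
```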
5. isOwnerAllowed side-effect: rejects all non-ownerId connections
Providing isOwnerAllowed (even one that always returns true) activates parseOwnerIdFromOwnerWebSocketTransportUrl in the upgrade handler. This rejects every WebSocket connection that doesn't have a valid ownerId in the URL path — including health check probes and monitoring tools — with a 400 Bad Request. That side effect is surprising when all you want is activity tracking.
Suggestion: Allow owner tracking without requiring ownerId in the URL. Our wrapper tracks owners via subscribe events emitted by the relay logger instead of the isOwnerAllowed hook (src/lib/owner-tracker.ts).
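The tracker itself is simple state keyed by ownerId. A sketch (class and method names are ours, not Evolu's):

```typescript
// Sketch of owner activity tracking fed by the relay logger's subscribe
// events, avoiding the isOwnerAllowed hook entirely.
class OwnerTracker {
  private lastSeen = new Map<string, number>();

  // Called whenever the relay logger reports a subscribe for an owner.
  record(ownerId: string, now: number = Date.now()): void {
    this.lastSeen.set(ownerId, now);
  }

  // Owners seen within the given window, e.g. for an "active owners" metric.
  activeWithin(windowMs: number, now: number = Date.now()): string[] {
    return [...this.lastSeen.entries()]
      .filter(([, seen]) => now - seen <= windowMs)
      .map(([ownerId]) => ownerId);
  }
}
```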
6. No global disk quota
isOwnerWithinQuota handles per-owner limits, but nothing prevents the total relay storage from filling the disk. If a relay serves many owners, each within their individual quota, the aggregate can still exhaust disk space.
Suggestion: A global quota check alongside the per-owner check. Our wrapper checks total stored bytes against a configurable QUOTA_GLOBAL_MB in the same isOwnerWithinQuota callback (src/lib/quota.ts).
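Combined, the check is a two-clause predicate. A hedged sketch; the argument names and the QUOTA_GLOBAL_MB convention are our wrapper's, not the relay's callback signature:

```typescript
const mb = (n: number): number => n * 1024 * 1024;

// Both conditions must hold: the owner is within their individual quota,
// and the relay as a whole is within the global disk budget.
const isWithinQuotas = (
  ownerStoredBytes: number,
  totalStoredBytes: number,
  perOwnerMb: number,
  globalMb: number,
): boolean =>
  ownerStoredBytes <= mb(perOwnerMb) && totalStoredBytes <= mb(globalMb);
```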
7. subscribeSyncState / getSyncState commented out
In the Evolu client, both subscribeSyncState and getSyncState are commented out with "TODO: Update it for the owner-api". This means clients can't check whether they're actually syncing. I had to add a 30-second polling safety net because subscribeQuery doesn't reliably fire for remote changes in all cases. Having sync state observability would let clients show meaningful UI (syncing/synced/error/disconnected) instead of guessing.
Not filing this as a separate issue since it's marked as a TODO — just flagging it as something that would significantly help client-side UX.
8. Docker file ownership (minor)
The official Dockerfile creates data dirs as the evolu user (UID 1001). If you mount a host volume with different ownership, the relay can't write and fails with SQLITE_READONLY — but there's no error message, sync just silently stops.
What could be upstreamed
Good candidates for upstream PRs:
- Configurable quota via env vars (trivial change to apps/relay/src/index.ts)
- Leveled logging (or at minimum, a LOG_LEVEL env var that filters existing relay logger events)
- /health endpoint on the relay server
- Re-enabling subscribeSyncState / getSyncState
Patterns worth documenting:
- Separate admin server with metrics endpoint
- Owner activity tracking via sidecar file
- Global disk quota
- DB startup integrity checks
Reference implementation
All of the above is implemented in getbased-relay (~800 lines of TypeScript across 8 modules). It's a standalone TypeScript project that installs @evolu/nodejs from npm and wraps createNodeJsRelay — no fork of the Evolu monorepo needed.
Happy to open PRs for any of the upstream-friendly items if that would be useful. Thanks for building Evolu — the local-first CRDT layer is excellent, these are just operational rough edges from running it in production.