Add query fallback when Electric disconnects #3402
This commit implements a fallback mode that allows Electric to serve shape data even when logical replication is not available. When the replication client is not ready, shape requests will query the database directly (similar to initial snapshots) and return the data to clients.
Key changes:
Server-side:
- StatusMonitor: Track replication availability in status response
- Api: Detect fallback mode and serve data via direct DB queries
- Request/Response: Add fallback_mode field to track request state
- Response headers: Add 'electric-fallback-mode' header to indicate fallback polling mode to clients
Client-side:
- Add FALLBACK_MODE_HEADER constant for detecting fallback responses
Implementation details:
- When replication_client_ready is false, requests enter fallback mode
- Fallback requests use Shapes.query_subset to query DB directly
- Data is formatted as insert operations in the shape log format
- Responses include up_to_date=true and fallback_mode=true
- Clients receive 'electric-fallback-mode: true' header
This allows clients to continue receiving data during replication failures and provides a foundation for polling-based fallback mechanisms with configurable intervals.
Related to implementing status monitoring and graceful degradation when Electric cannot connect to logical replication.
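For illustration, the client-side header check could look roughly like the sketch below. The header name and the FALLBACK_MODE_HEADER constant come from the commit message; the `HeaderReader` type and the `isFallbackResponse` helper are hypothetical names introduced here, not the actual client code.

```typescript
// Header name as stated in the commit message.
const FALLBACK_MODE_HEADER = 'electric-fallback-mode'

// Minimal structural type (an assumption for this sketch) so the helper
// accepts fetch's Headers object or any simple test double.
interface HeaderReader {
  get(name: string): string | null
}

// Hypothetical helper: true only when the header is present and equals "true".
function isFallbackResponse(headers: HeaderReader): boolean {
  const value = headers.get(FALLBACK_MODE_HEADER)
  return value !== null && value.trim().toLowerCase() === 'true'
}
```

Treating a missing header as "not in fallback" keeps the check backwards compatible with servers that never send it.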
This commit adds a status endpoint and client-side support for
detecting and handling fallback mode with automatic recovery.
Server-side changes:
- StatusPlug: New /v1/status endpoint returning server status
- Cache-Control: 5-second caching for CDN efficiency
- Returns: status (live/fallback/starting), replication_available,
connection state, and shape subsystem state
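As a rough sketch of how a client might validate the status payload: the `status` values and `replication_available` come from the description above, but treating them as JSON keys is an assumption, and `StatusResponse` and `parseStatus` are hypothetical names. The connection and shape subsystem fields are left out because their names are not given here.

```typescript
type ServerStatus = 'live' | 'fallback' | 'starting'

// Assumed JSON body of /v1/status; only the two fields named above are modelled.
interface StatusResponse {
  status: ServerStatus
  replication_available: boolean
}

const KNOWN_STATUSES: readonly string[] = ['live', 'fallback', 'starting']

// Hypothetical validator: throws on anything that does not match the assumed shape.
function parseStatus(body: unknown): StatusResponse {
  if (typeof body !== 'object' || body === null) {
    throw new Error('status body must be a JSON object')
  }
  const record = body as Record<string, unknown>
  const status = record['status']
  const replicationAvailable = record['replication_available']
  if (typeof status !== 'string' || KNOWN_STATUSES.indexOf(status) === -1) {
    throw new Error(`unexpected status: ${String(status)}`)
  }
  if (typeof replicationAvailable !== 'boolean') {
    throw new Error('replication_available must be a boolean')
  }
  return { status: status as ServerStatus, replication_available: replicationAvailable }
}
```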
Client-side changes (TypeScript):
- Fallback mode detection: Reads electric-fallback-mode header
- Auto status polling: Polls /v1/status every 60 seconds when in
fallback mode
- Auto-recovery: Automatically switches back to live mode when
server replication is restored
- Cleanup: Stops polling on unsubscribe or reset
Client behavior:
- When fallback mode detected via header, starts status polling
- Status endpoint polled every 60s (configurable)
- When server returns to "live" status, triggers shape refresh
- Reconnects to live replication automatically
- CDN caches status responses for 5s to minimize server load
This provides a complete fallback solution:
1. Server detects replication unavailable → returns fallback data
2. Client detects fallback header → starts polling status
3. Server replication restored → status endpoint reflects change
4. Client polls status → detects live mode → auto-reconnects
5. Seamless transition back to real-time replication
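The client side of the five steps above can be modelled as a small state machine. This is a sketch under assumptions, not the code in this PR: `ClientState`, `ClientEvent`, and `step` are hypothetical names, and the real client also manages timers and network calls, which are left out here.

```typescript
type ServerStatus = 'live' | 'fallback' | 'starting'

// Events the client observes: a shape response (with or without the
// fallback header) or the result of polling the status endpoint.
type ClientEvent =
  | { kind: 'shape-response'; fallbackHeader: boolean }
  | { kind: 'status-poll'; status: ServerStatus }

interface ClientState {
  mode: 'live' | 'fallback'
  pollingStatus: boolean    // whether the status poller should be running
  refreshRequested: boolean // whether the shape should be re-fetched from live replication
}

const initialState: ClientState = { mode: 'live', pollingStatus: false, refreshRequested: false }

function step(state: ClientState, event: ClientEvent): ClientState {
  if (event.kind === 'shape-response') {
    // Step 2: the fallback header switches the client into fallback mode and starts polling.
    if (event.fallbackHeader && state.mode === 'live') {
      return { mode: 'fallback', pollingStatus: true, refreshRequested: false }
    }
    // Any other shape response consumes a pending refresh request.
    return { ...state, refreshRequested: false }
  }
  // Step 4: a "live" status while in fallback stops polling and requests a refresh.
  if (state.mode === 'fallback' && event.status === 'live') {
    return { mode: 'live', pollingStatus: false, refreshRequested: true }
  }
  // "fallback" or "starting" status: nothing changes, keep polling if already polling.
  return state
}
```

Keeping the transition logic pure makes the recovery path easy to unit test separately from the polling timer.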
Example usage:
```typescript
const stream = new ShapeStream({
  url: 'http://localhost:3000/v1/shape',
  params: { table: 'items' }
})
// Automatically handles fallback mode and recovery
stream.subscribe(messages => {
  // Receives data in both live and fallback modes
  console.log(messages)
})
```
CDN caching (5s cache-control) will handle the load, so clients can poll more frequently for faster recovery when replication is restored.
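To make the trade-off concrete, here is a back-of-the-envelope sketch. It assumes a simple model in which the CDN may serve a status response that is up to one cache lifetime old, and it ignores network latency; `worstCaseRecoveryDelayMs` is a hypothetical helper, not part of the client.

```typescript
// Upper bound on how long a client may keep seeing a stale "fallback" status
// after the server has recovered: one poll can land just before the cached
// response expires, so the next poll (one interval later) is the first to
// see "live". Assumed model only; network latency is ignored.
function worstCaseRecoveryDelayMs(pollIntervalMs: number, cacheMaxAgeMs: number): number {
  if (pollIntervalMs <= 0 || cacheMaxAgeMs < 0) {
    throw new RangeError('pollIntervalMs must be > 0 and cacheMaxAgeMs must be >= 0')
  }
  return pollIntervalMs + cacheMaxAgeMs
}

// 60 s polling with a 5 s cache: up to 65 s before recovery is noticed.
const defaultDelay = worstCaseRecoveryDelayMs(60000, 5000)
// 10 s polling with the same cache: up to 15 s.
const fasterDelay = worstCaseRecoveryDelayMs(10000, 5000)
```

Under this model, polling faster than the 5-second cache lifetime would shorten the delay by only a few seconds, since the cache term stays fixed.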
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #3402 +/- ##
==========================================
+ Coverage 69.70% 73.94% +4.23%
==========================================
Files 181 21 -160
Lines 9826 756 -9070
Branches 352 0 -352
==========================================
- Hits 6849 559 -6290
+ Misses 2975 197 -2778
+ Partials 2 0 -2
Add comprehensive RFC document explaining the query fallback mode feature:
- Problem statement and motivation
- Architecture and component design
- Detailed implementation specifics
- Data flow diagrams
- Trade-offs and alternatives considered
- Testing and deployment strategy
- Performance impact analysis
- Future work and extensibility
The RFC serves as both documentation and design rationale for the feature, covering server-side (Elixir) and client-side (TypeScript) components.
Mixed feelings about this one. Obviously it's nice to continue serving data, but wouldn't it provide an inconsistent experience? How does it work to retrieve snapshots but not have live updates? E.g. you get the first page, with no more live updates for it, but then you get a second page that is more recent in time. I understand we do progressive snapshots, but I presume we don't want to rely on that. Even if we do, it means we're putting more and more load on the database, with a fallback query for each page.
I feel that PG replication is as reliable as a PG read replica, with the exception that we don't yet have a read-only mode to serve data while the replication stream is inactive. If we knew that a replication slot exists and where we last left it, we could also create new shapes (i.e. just the snapshots) without replication active, since we know that we can "resume" them once we resume replication, and that brings us closer to read replica behaviour as well. In short, I think we should aim to provide what someone would expect from a read replica, but in change stream format.
Ooo yes! That's a great point.