
Commit 74f1f0b

Merge branch 'main' into ianwoodard/infreng-211-shard-stream-tracking

2 parents: 3aa2835 + bee0240
5 files changed: +270 −38 lines

README.md

Lines changed: 11 additions & 3 deletions
@@ -29,14 +29,14 @@ style C fill:#e1f5fe
 ### Stream Management

-Streams are automatically tracked and cleaned up after inactivity (configurable via `CLEANUP_STREAM_IDLE_SEC`, default 300s).
+Streams are automatically tracked and cleaned up after inactivity (configurable via `CLEANUP_STREAM_IDLE_SEC`, default 120s).
 This prevents memory leaks from:

 - Crashed or disconnected publishers
 - Streams that never reach Phase::End
 - Network failures during publishing

-A cleanup worker runs periodically (configurable via `CLEANUP_WORKER_INTERVAL_SEC`, default 300s), deleting streams that haven't received any publishes within the inactivity threshold. Active streams (receiving regular publishes) are kept alive indefinitely, supporting long-running or continuous streaming use cases.
+A cleanup worker runs periodically (configurable via `CLEANUP_WORKER_INTERVAL_SEC`, default 120s), deleting streams that haven't received any publishes within the inactivity threshold. Active streams (receiving regular publishes) are kept alive indefinitely, supporting long-running or continuous streaming use cases.

 **Note:** While streams themselves are unbounded in duration, client connections (on the Gateway service) may have separate timeout limits. This allows clients to reconnect to ongoing streams as needed.
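The cleanup behavior described in this hunk can be sketched roughly as follows. This is an illustrative, in-memory simulation, not the project's code: the real worker reads activity timestamps from Redis, and names like `sweep` and `last_activity` are invented for the sketch.

```python
CLEANUP_WORKER_INTERVAL_SEC = 120  # how often the worker wakes up
CLEANUP_STREAM_IDLE_SEC = 120      # idle threshold before a stream is deleted

def sweep(last_activity: dict[str, float], now: float) -> list[str]:
    """Delete streams idle longer than the threshold; return their ids.

    `last_activity` maps stream id -> timestamp of its most recent publish
    (a stand-in for the activity timestamps the worker reads from Redis).
    """
    expired = [sid for sid, ts in last_activity.items()
               if now - ts > CLEANUP_STREAM_IDLE_SEC]
    for sid in expired:
        del last_activity[sid]  # stand-in for deleting the Redis stream
    return expired

# Active streams (recent publishes) survive; idle ones are removed.
streams = {"active": 995.0, "idle": 700.0}
assert sweep(streams, now=1000.0) == ["idle"]
assert "active" in streams
```

Because only the last-activity timestamp matters, a stream that keeps receiving publishes is never deleted, which matches the "kept alive indefinitely" behavior described above.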

@@ -76,12 +76,20 @@ The `payload` field contains your application data and is typically used with DE
 **Size Limits:**

-- Maximum message size: 32KB (entire PublishRequest protobuf)
+- Maximum message size: 16KB (entire PublishRequest protobuf)
 - Messages exceeding this limit receive a 413 Payload Too Large response
 - For larger data, split into multiple DELTA messages

 The payload must be a valid JSON-like structure in the form of a `google/protobuf/struct.proto`.

+**Rate Limits:**
+
+- Maximum 20 requests per second per channel
+- Exceeding this returns `429 Too Many Requests` with a `Retry-After` header
+- High-frequency publishers should batch messages
+
+**Retention:** Streams hold up to approximately 1200 messages. At the maximum publish rate, this gives consumers ~60 seconds of history to catch up after brief disconnections.
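The retention and size figures in this hunk follow from simple arithmetic. As a sanity check (the constant names are illustrative, chosen to mirror the documented limits, and the ~19.2MB figure assumes 1KB = 1000 bytes):

```python
MAX_STREAM_LEN = 1200    # messages retained per stream (approximate)
MAX_RPS = 20             # publishes per second per channel
MSG_SIZE_BYTES = 16_000  # 16KB per PublishRequest, taking 1KB = 1000 bytes

# Seconds of history available to a reconnecting consumer at max rate.
history_sec = MAX_STREAM_LEN / MAX_RPS
# Upper bound on bytes held by one fully trimmed stream.
max_stream_bytes = MAX_STREAM_LEN * MSG_SIZE_BYTES

assert history_sec == 60.0               # ~60 seconds of history
assert max_stream_bytes == 19_200_000    # ~19.2MB per stream
```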
 #### Example: Streaming "hello, world!"

 ```http

services/publish/README.md

Lines changed: 22 additions & 4 deletions
@@ -8,7 +8,7 @@ Each publish updates a stream's activity timestamp in Redis. This allows automat
 ### Cleanup Worker

-A background worker runs periodically (configurable via `CLEANUP_WORKER_INTERVAL_SEC`, default 300s) and deletes streams with no activity for a configurable duration (via `CLEANUP_STREAM_IDLE_SEC`, default 300s).
+A background worker runs periodically (configurable via `CLEANUP_WORKER_INTERVAL_SEC`, default 120s) and deletes streams with no activity for a configurable duration (via `CLEANUP_STREAM_IDLE_SEC`, default 120s).

 ### Phase::End Behavior

@@ -22,18 +22,34 @@ This prevents memory leaks from crashed clients or incomplete streams while allo
 ## API Limits

+### Rate Limiting
+
+Publishers are limited to 20 requests per second using a fixed-window counter in Redis.
+
+- Key: `rate_limit:channel:{org_id}:{channel_id}`
+- Window: 1 second
+- Enforced via a Lua script for atomicity
+
+Rate limiting runs before tracking/publishing to avoid wasted work. The `Retry-After` header tells clients when to retry.
+
+Rate limiting intentionally fails closed: if Redis cannot keep up with rate-limit checks, it is unlikely to be able to handle the streams themselves.
+
+**Relationship to stream size:**
+
+At 20 requests/sec with 1200-message streams, consumers have ~60 seconds to recover from disconnections.
+
 ### Message Size

-Publish requests are limited to 32KB (configurable via `MAX_MESSAGE_SIZE_BYTES`).
+Publish requests are limited to 16KB.
 Requests exceeding this limit are rejected with `413 Payload Too Large`.

-Combined with the stream length limit of 500 messages, this bounds maximum stream size to approximately 16MB per stream.
+Combined with the stream length limit of 1200 messages, this bounds maximum stream size to approximately 19.2MB per stream.

 Publishers handling large data should chunk it into multiple DELTA messages within the START/DELTA/END streaming pattern.

 ### Stream Length

-Streams are automatically trimmed to approximately 500 messages (configurable via `MAX_STREAM_LEN`). Older messages are removed as new ones arrive.
+Streams are automatically trimmed to approximately 1200 messages. Older messages are removed as new ones arrive.

 ## Design Decisions
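The fixed-window check added in this hunk can be sketched in memory as follows. This is a simplified simulation for illustration only: the real implementation increments a Redis key (`rate_limit:channel:{org_id}:{channel_id}`) inside a Lua script, whereas the dict-based counter store and the `check` function here are invented stand-ins.

```python
RATE_LIMIT = 20  # requests allowed per window per channel
WINDOW_SEC = 1   # fixed window length in seconds

def check(counters: dict[tuple[str, int], int], key: str, now: float):
    """Return (allowed, retry_after_sec) for one publish attempt."""
    window = int(now // WINDOW_SEC)             # current 1-second bucket
    count = counters.get((key, window), 0) + 1  # INCR-equivalent
    counters[(key, window)] = count
    if count <= RATE_LIMIT:
        return True, 0.0
    # Rejected: Retry-After points at the start of the next window.
    retry_after = (window + 1) * WINDOW_SEC - now
    return False, retry_after

counters: dict[tuple[str, int], int] = {}
results = [check(counters, "org1:chan1", now=10.5)[0] for _ in range(21)]
assert results[:20] == [True] * 20   # first 20 requests in the window pass
assert results[20] is False          # the 21st is rejected with a Retry-After
```

In Redis, the same effect comes from INCR plus an EXPIRE of one window, done atomically in Lua so the counter and its TTL cannot diverge under concurrency.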
@@ -63,3 +79,5 @@ Both `CLEANUP_WORKER_INTERVAL_SEC` and `CLEANUP_STREAM_IDLE_SEC` can be tuned in
 ### Known Edge Cases

 - **Sorted sets bloat**: If untrack operations consistently fail, the sorted sets accumulate entries for already-deleted streams. The worker will attempt to delete non-existent streams (harmless), but the sorted sets grow. If this becomes a problem, we can add a periodic SCAN to remove ghost entries.
+
+- **Rate limit leak**: If EXPIRE fails after INCR succeeds, the rate limit key persists without a TTL. This is unlikely but not impossible.
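One common way to bound the blast radius of a lost TTL, sketched here as an illustrative mitigation rather than anything this commit implements: scope the counter key to the window number, so a key whose EXPIRE was lost simply stops being consulted once the window rolls over.

```python
def window_key(org_id: str, channel_id: str, now: float,
               window_sec: int = 1) -> str:
    """Build a window-scoped rate-limit key (hypothetical variant of the
    documented `rate_limit:channel:{org_id}:{channel_id}` key).

    The trailing window number means a key that keeps its data after a
    failed EXPIRE is never read again; it only wastes a little memory.
    """
    window = int(now // window_sec)
    return f"rate_limit:channel:{org_id}:{channel_id}:{window}"

assert window_key("org1", "chan1", now=10.5) == "rate_limit:channel:org1:chan1:10"
# A new window produces a new key, so stale counters are never consulted.
assert window_key("org1", "chan1", now=11.0) != window_key("org1", "chan1", now=10.5)
```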
