Is your feature request related to a problem? Please describe.
The source coordination store currently has only two implementations: `dynamodb` for production multi-node environments and `in_memory` for single-node development use. Running Data Prepper in a multi-node cluster requires Amazon DynamoDB, so there is no cloud-agnostic option for the coordination store backend.
Describe the solution you'd like
A new `jdbc` source coordination store plugin backed by a relational database. The coordination store's requirements (key-value CRUD with optimistic locking, status-based queries with priority ordering, and TTL for completed partitions) map naturally to standard SQL operations.
| DynamoDB concept | JDBC equivalent |
|---|---|
| Partition key + sort key | Composite primary key (`source_identifier`, `source_partition_key`) |
| Conditional put (`attribute_not_exists`) | Conditional insert (e.g. `INSERT ... WHERE NOT EXISTS`) |
| Conditional update (version check) | `UPDATE ... WHERE version = ?` |
| GSI query by status + priority | `SELECT ... WHERE source_status = ? ORDER BY partition_priority` with an index |
| TTL (automatic expiration) | Not included in the initial implementation (see below) |
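
The mappings in the table above can be sketched end to end. This is a minimal illustration, not the proposed plugin: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL/MySQL (the real plugin would be Java over JDBC). The column names come from the table. The table name `source_coordination`, the index name, the status strings, and the helper function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema. Column names follow the proposal; the table name,
# index name, and column types are assumptions for illustration only.
SCHEMA = """
CREATE TABLE source_coordination (
    source_identifier    TEXT    NOT NULL,
    source_partition_key TEXT    NOT NULL,
    source_status        TEXT    NOT NULL,
    partition_priority   TEXT,
    version              INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (source_identifier, source_partition_key)
);
CREATE INDEX idx_status_priority
    ON source_coordination (source_identifier, source_status, partition_priority);
"""


def try_create(conn, source_id, partition_key, status, priority):
    """Conditional insert: True only if the row did not already exist."""
    cur = conn.execute(
        "INSERT INTO source_coordination "
        "(source_identifier, source_partition_key, source_status, partition_priority) "
        "SELECT ?, ?, ?, ? WHERE NOT EXISTS ("
        "  SELECT 1 FROM source_coordination "
        "  WHERE source_identifier = ? AND source_partition_key = ?)",
        (source_id, partition_key, status, priority, source_id, partition_key),
    )
    return cur.rowcount == 1


def try_update_status(conn, source_id, partition_key, new_status, expected_version):
    """Optimistic update: succeeds only if the stored version is unchanged."""
    cur = conn.execute(
        "UPDATE source_coordination "
        "SET source_status = ?, version = version + 1 "
        "WHERE source_identifier = ? AND source_partition_key = ? AND version = ?",
        (new_status, source_id, partition_key, expected_version),
    )
    return cur.rowcount == 1


def next_partitions(conn, source_id, status, limit=1):
    """Status query ordered by priority, served by the composite index."""
    return conn.execute(
        "SELECT source_partition_key, version FROM source_coordination "
        "WHERE source_identifier = ? AND source_status = ? "
        "ORDER BY partition_priority LIMIT ?",
        (source_id, status, limit),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    created = try_create(conn, "s3-source", "obj-b", "UNASSIGNED", "2024-01-02")
    duplicate = try_create(conn, "s3-source", "obj-b", "UNASSIGNED", "2024-01-02")
    try_create(conn, "s3-source", "obj-a", "UNASSIGNED", "2024-01-01")
    first = next_partitions(conn, "s3-source", "UNASSIGNED")
    won = try_update_status(conn, "s3-source", "obj-a", "ASSIGNED", 0)
    lost = try_update_status(conn, "s3-source", "obj-a", "ASSIGNED", 0)  # stale version
    return created, duplicate, first, won, lost
```

In each helper the affected-row count plays the role of DynamoDB's conditional-check result: zero rows means another node got there first. With concurrent writers on a real database, the composite primary key remains the final guard against duplicate inserts, so a production implementation would likely also need to handle a key-violation error.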
Configuration example:
source_coordination:
  store:
    jdbc:
      url: "jdbc:postgresql://localhost:5432/dataprepper"
      username: "dp_user"
      password: "..."
TTL-based cleanup of completed partitions is not included in the initial implementation and will be addressed in a follow-up.
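For context, one possible shape for that follow-up is a periodic delete of expired rows. The sketch below is purely illustrative and is not part of this proposal: the `expiration_time` column (epoch seconds), the `source_coordination` table name, the `COMPLETED` status string, and the `purge_expired` helper are all assumptions. SQLite via Python's `sqlite3` stands in for the real database.

```python
import sqlite3


def purge_expired(conn, now_epoch_seconds):
    """Delete completed rows whose (hypothetical) expiration_time has passed."""
    cur = conn.execute(
        "DELETE FROM source_coordination "
        "WHERE source_status = 'COMPLETED' "
        "AND expiration_time IS NOT NULL AND expiration_time <= ?",
        (now_epoch_seconds,),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed by this sweep


def ttl_demo():
    conn = sqlite3.connect(":memory:")
    # Assumed minimal table; expiration_time is not defined by the proposal.
    conn.execute(
        "CREATE TABLE source_coordination ("
        " source_identifier TEXT NOT NULL,"
        " source_partition_key TEXT NOT NULL,"
        " source_status TEXT NOT NULL,"
        " expiration_time INTEGER,"
        " PRIMARY KEY (source_identifier, source_partition_key))"
    )
    rows = [
        ("src", "p1", "COMPLETED", 100),  # expired at the time of the sweep
        ("src", "p2", "COMPLETED", 500),  # not yet expired
        ("src", "p3", "ASSIGNED", None),  # still in progress, never purged
    ]
    conn.executemany("INSERT INTO source_coordination VALUES (?, ?, ?, ?)", rows)
    deleted = purge_expired(conn, 200)
    remaining = [
        r[0]
        for r in conn.execute(
            "SELECT source_partition_key FROM source_coordination "
            "ORDER BY source_partition_key"
        )
    ]
    return deleted, remaining
```

How such a sweep would be scheduled, and by which node, is left open here.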
The initial implementation will be tested with PostgreSQL and MySQL.
Describe alternatives you've considered (Optional)
Redis: Offers native TTL support and lightweight operation, but does not guarantee durability by default. The coordination store requires that a successful write is durable, since lost partition state updates could cause duplicate processing or missed partitions. An RDBMS provides this guarantee by default.
Additional context
The original source coordination design (#2412) noted that the store should be pluggable and mentioned several potential backends including "Remote/Local File DB, Apache Zookeeper, MySQL, DynamoDB, and more." The `SourceCoordinationStore` interface is already designed as a plugin, so adding a new implementation requires no changes to the existing framework or DynamoDB plugin.
I am willing to implement this.