Add JDBC-based source coordination store #6740

@lawofcycles


Is your feature request related to a problem? Please describe.

The source coordination store currently has only two implementations: dynamodb for production multi-node environments and in_memory for single-node development use. Running Data Prepper in a multi-node cluster requires Amazon DynamoDB, so there is no cloud-agnostic option for the coordination store backend.

Describe the solution you'd like

A new jdbc source coordination store plugin backed by a relational database. The coordination store's requirements (key-value CRUD with optimistic locking, status-based queries with priority ordering, and TTL for completed partitions) map naturally to standard SQL operations.
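One way those requirements could translate into a data model is a single table with one row per partition. The following is a minimal sketch of portable DDL held in Java constants, so a plugin could apply it at startup. The table name `source_partitions`, the index name, the column types, and the extra columns `partition_owner` and `partition_progress_state` are assumptions for illustration, not part of this proposal.

```java
import java.util.List;

// Hypothetical schema for a JDBC-backed coordination store.
// Table, index, and non-key column names are illustrative assumptions.
public final class CoordinationSchema {

    // VARCHAR, BIGINT, and TEXT are accepted by both PostgreSQL and MySQL.
    // Two VARCHAR(255) key columns stay within InnoDB's default 3072-byte
    // index limit under utf8mb4 (4 bytes per character).
    public static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS source_partitions ("
        + " source_identifier VARCHAR(255) NOT NULL,"
        + " source_partition_key VARCHAR(255) NOT NULL,"
        + " source_status VARCHAR(32) NOT NULL,"
        // Assumption: priority is stored as a sortable string.
        + " partition_priority VARCHAR(64) NOT NULL,"
        + " partition_owner VARCHAR(255),"
        + " partition_progress_state TEXT,"
        // Incremented on every successful update (optimistic locking).
        + " version BIGINT NOT NULL,"
        + " PRIMARY KEY (source_identifier, source_partition_key))";

    // Plays the role of the DynamoDB GSI used for status + priority lookups.
    public static final String CREATE_STATUS_INDEX =
        "CREATE INDEX idx_source_status_priority"
        + " ON source_partitions (source_identifier, source_status, partition_priority)";

    // Statements in the order they would be executed.
    public static List<String> ddl() {
        return List.of(CREATE_TABLE, CREATE_STATUS_INDEX);
    }

    private CoordinationSchema() {}
}
```

Leading the index with `source_identifier` reflects the assumption that every status query is scoped to a single source, so one index can serve the filter and the `ORDER BY` together.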

| DynamoDB concept | JDBC equivalent |
|---|---|
| Partition key + sort key | Composite primary key (`source_identifier`, `source_partition_key`) |
| Conditional put (`attribute_not_exists`) | Conditional insert (e.g. `INSERT ... WHERE NOT EXISTS`) |
| Conditional update (version check) | `UPDATE ... WHERE version = ?` |
| GSI query by status + priority | `SELECT ... WHERE source_status = ? ORDER BY partition_priority` with an index |
| TTL (automatic expiration) | Not included in the initial implementation (see below) |
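The conditional-write and query rows of this mapping can be exercised with plain `java.sql` calls. The sketch below assumes a table named `source_partitions` containing the columns named in the mapping; the class, record, and method names are hypothetical. It deliberately swaps the `INSERT ... WHERE NOT EXISTS` form for a plain `INSERT` that treats a duplicate-key error as "already exists". Under default isolation levels, two nodes can both pass a `NOT EXISTS` check, so the composite primary key is what ultimately rejects the second writer; the sketch simply handles that error directly.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

// Hypothetical helper; table and method names are illustrative, not Data Prepper API.
public final class PartitionSql {

    /** Key and version of a partition found by the status/priority query. */
    public record Candidate(String partitionKey, long version) {}

    static final String INSERT =
        "INSERT INTO source_partitions (source_identifier, source_partition_key,"
        + " source_status, partition_priority, version) VALUES (?, ?, ?, ?, 0)";

    // Optimistic locking: the row changes only if its version still matches.
    static final String UPDATE_STATUS_IF_VERSION =
        "UPDATE source_partitions SET source_status = ?, version = version + 1"
        + " WHERE source_identifier = ? AND source_partition_key = ? AND version = ?";

    // LIMIT is supported by both PostgreSQL and MySQL.
    static final String NEXT_BY_STATUS =
        "SELECT source_partition_key, version FROM source_partitions"
        + " WHERE source_identifier = ? AND source_status = ?"
        + " ORDER BY partition_priority LIMIT 1";

    /**
     * Inserts a new partition; returns false if the composite key already exists.
     * Assumes auto-commit: in PostgreSQL a failed statement also aborts an open transaction.
     */
    public static boolean tryCreate(Connection conn, String sourceId, String partitionKey,
                                    String status, String priority) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT)) {
            ps.setString(1, sourceId);
            ps.setString(2, partitionKey);
            ps.setString(3, status);
            ps.setString(4, priority);
            return ps.executeUpdate() == 1;
        } catch (SQLException e) {
            if (isDuplicateKey(e)) {
                return false;
            }
            throw e;
        }
    }

    /** PostgreSQL: SQLSTATE 23505. MySQL: SQLSTATE 23000 with vendor code 1062. */
    static boolean isDuplicateKey(SQLException e) {
        return "23505".equals(e.getSQLState())
            || ("23000".equals(e.getSQLState()) && e.getErrorCode() == 1062);
    }

    /** Returns false when no row matched: the version moved on or the row is gone. */
    public static boolean tryUpdateStatus(Connection conn, String sourceId, String partitionKey,
                                          String newStatus, long expectedVersion) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_STATUS_IF_VERSION)) {
            ps.setString(1, newStatus);
            ps.setString(2, sourceId);
            ps.setString(3, partitionKey);
            ps.setLong(4, expectedVersion);
            return ps.executeUpdate() == 1;
        }
    }

    /** First partition in the given status, in partition_priority order. */
    public static Optional<Candidate> nextCandidate(Connection conn, String sourceId,
                                                    String status) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(NEXT_BY_STATUS)) {
            ps.setString(1, sourceId);
            ps.setString(2, status);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                    ? Optional.of(new Candidate(rs.getString(1), rs.getLong(2)))
                    : Optional.empty();
            }
        }
    }

    private PartitionSql() {}
}
```

In this sketch, a node would call `nextCandidate` and then try to claim the result with `tryUpdateStatus`, passing the version it just read. A `false` result means another node updated the row first, so the caller can move on to the next candidate instead of overwriting that node's change.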

Configuration example:

```yaml
source_coordination:
  store:
    jdbc:
      url: "jdbc:postgresql://localhost:5432/dataprepper"
      username: "dp_user"
      password: "..."
```
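The plugin would need to bind these three settings to a configuration object and open connections from them. Below is a minimal sketch, assuming a hypothetical `JdbcStoreConfig` record (not an existing Data Prepper class) and that the PostgreSQL or MySQL JDBC driver is on the classpath so `DriverManager` can resolve the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Objects;

// Hypothetical holder for the jdbc store settings; field names mirror the YAML keys.
public record JdbcStoreConfig(String url, String username, String password) {

    // Fail fast on obviously invalid settings.
    public JdbcStoreConfig {
        Objects.requireNonNull(url, "url is required");
        if (!url.startsWith("jdbc:")) {
            throw new IllegalArgumentException("url must be a JDBC URL: " + url);
        }
    }

    /** Subprotocol after "jdbc:", e.g. "postgresql" or "mysql". */
    public String subprotocol() {
        int end = url.indexOf(':', 5);
        return end < 0 ? url.substring(5) : url.substring(5, end);
    }

    /** Opens a connection; the matching JDBC driver must be on the classpath. */
    public Connection openConnection() throws SQLException {
        return DriverManager.getConnection(url, username, password);
    }

    // Keep the password out of logs.
    @Override
    public String toString() {
        return "JdbcStoreConfig[url=" + url + ", username=" + username + ", password=****]";
    }
}
```

Exposing the subprotocol is one way the store could branch for the few statements whose syntax differs between PostgreSQL and MySQL. A production plugin would more likely draw connections from a pool than call `DriverManager` for every operation.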

TTL-based cleanup of completed partitions is not included in the initial implementation and will be addressed in a follow-up.

The initial implementation will be tested with PostgreSQL and MySQL.

Describe alternatives you've considered (Optional)

Redis: Offers native TTL support and lightweight operation, but does not guarantee durability by default. The coordination store requires every acknowledged write to be durable, since lost partition state updates could cause duplicate processing or missed partitions. A relational database provides this guarantee by default for committed transactions.

Additional context

The original source coordination design (#2412) noted that the store should be pluggable and mentioned several potential backends, including "Remote/Local File DB, Apache Zookeeper, MySQL, DynamoDB, and more." The SourceCoordinationStore interface is already designed as a plugin, so adding a new implementation requires no changes to the existing framework or to the DynamoDB plugin.

I am willing to implement this.

Labels: enhancement (New feature or request)

Status: Unplanned