Is your feature request related to a problem? Please describe.
Data Prepper has many push-based sources, such as http, otel_trace_source, etc. Distributing data across multiple instances of Data Prepper is easily solved with a load balancer.
However, pull-based sources in Data Prepper have no internal way to coordinate how work is divided between instances in a multi-node scenario. For example, pulling data from something like an OpenSearch cluster with 5 Data Prepper nodes would result in all 5 nodes pulling the entire dataset and processing it 5 times in total.
Describe the solution you'd like
A core Data Prepper mechanism for pull-based sources that distributes work between multiple instances of Data Prepper, plus a way to track the progress of pulled data so that duplicate data is skipped rather than reprocessed.
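One way such a core mechanism could look is a small coordination interface that sources call to atomically claim units of work and mark them complete. The sketch below is only an illustration of the idea under assumed names (SourceCoordinator, acquirePartition, completePartition are hypothetical, not existing Data Prepper APIs), with an in-memory map standing in for the distributed store:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical coordination API; all names here are assumptions for
// illustration, not Data Prepper's actual interfaces.
interface SourceCoordinator {
    // Atomically claim a unit of work; empty if another node owns it
    // or it has already been completed.
    Optional<String> acquirePartition(String partitionKey, String ownerId);

    // Record progress so other (or restarted) nodes skip finished work.
    void completePartition(String partitionKey);

    boolean isCompleted(String partitionKey);
}

// In-memory stand-in for the pluggable distributed store; in a real
// deployment this would be backed by ZooKeeper, MySQL, DynamoDB, etc.
class InMemorySourceCoordinator implements SourceCoordinator {
    private final Map<String, String> owners = new ConcurrentHashMap<>();
    private final Map<String, Boolean> completed = new ConcurrentHashMap<>();

    @Override
    public Optional<String> acquirePartition(String partitionKey, String ownerId) {
        if (completed.containsKey(partitionKey)) {
            return Optional.empty(); // already processed; skip duplicate work
        }
        // putIfAbsent is the atomic "claim" step; only one node wins.
        String current = owners.putIfAbsent(partitionKey, ownerId);
        return current == null ? Optional.of(partitionKey) : Optional.empty();
    }

    @Override
    public void completePartition(String partitionKey) {
        completed.put(partitionKey, Boolean.TRUE);
        owners.remove(partitionKey);
    }

    @Override
    public boolean isCompleted(String partitionKey) {
        return completed.containsKey(partitionKey);
    }
}
```

With this shape, if 5 nodes each try to acquire the same partition, exactly one succeeds and the other 4 move on to other partitions, avoiding the 5x duplicate processing described above.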
This solution could use a distributed store to coordinate work and track progress. The store could be pluggable and configured in data-prepper-config.yaml; store types could include a remote or local file DB, Apache ZooKeeper, MySQL, DynamoDB, and more.
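As a rough illustration of what the pluggable configuration might look like in data-prepper-config.yaml (every key name below is an assumption, not an existing schema):

```yaml
# Hypothetical sketch only; keys and structure are not an existing
# Data Prepper configuration schema.
source_coordination:
  store:
    dynamodb:
      table_name: data-prepper-source-coordination
      region: us-east-1
```

Swapping the store would then only mean replacing the dynamodb block with, say, a zookeeper or mysql block, without changing any pipeline definitions.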
Describe alternatives you've considered (Optional)
A clear and concise description of any alternative solutions or features you've considered.
Additional context