Feature Request: Prometheus Remote Write v1 Source for OpenSearch TSDB #6533

@srikanthpadakanti

Feature Request: Prometheus Remote Write v1 Source for Data Prepper

Is your feature request related to a problem?

Currently, there is no native way to ingest metrics from Prometheus servers directly into OpenSearch. Organizations that use Prometheus for metrics collection must build and maintain custom exporters that convert the Prometheus Remote Write format into documents OpenSearch can index, which is time-consuming and error-prone.

What solution would you like?

Implement a Prometheus Remote Write v1.0 source plugin for Data Prepper that:

  • Receives Prometheus Remote Write requests on a configurable HTTP endpoint (default: http://data-prepper:9090/api/v1/write)
  • Parses the protocol (Snappy decompression + Protocol Buffer parsing)
  • Converts to multiple output formats (TSDB, OTEL, OpenSearch, Prometheus)
  • Accepts metrics from OpenTelemetry Collectors via the Collector's prometheusremotewrite exporter
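
As a rough illustration of the first check a receiver could run before decompressing and parsing the body, the sketch below validates the headers that the Remote Write 1.0 specification requires senders to set. The function name and the problem messages are hypothetical and not part of Data Prepper. Snappy decompression and Protocol Buffer parsing are left out because they need third-party libraries.

```python
# Headers that the Remote Write 1.0 spec says senders MUST set.
REQUIRED_HEADERS = {
    "content-encoding": "snappy",
    "content-type": "application/x-protobuf",
    "x-prometheus-remote-write-version": "0.1.0",
}


def find_header_problems(headers):
    """Return a list of problems; an empty list means the headers look valid.

    `headers` is a plain dict of request headers (hypothetical input shape).
    """
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    for name, expected in REQUIRED_HEADERS.items():
        actual = normalized.get(name)
        if actual is None:
            problems.append(f"missing header: {name}")
        elif actual != expected:
            problems.append(f"unexpected {name}: {actual!r} (expected {expected!r})")
    return problems
```

Whether the source should reject such requests outright or only log them is a design decision this sketch does not settle.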

Configuration:

Prometheus side (2-line change):

# prometheus.yml
remote_write:
  - url: "http://data-prepper:9090/api/v1/write"

Data Prepper pipeline:

prometheus-pipeline:
    source:
      prometheus:
        port: 9090
        path: "/api/v1/write"
        # Source emits Metric events (Data Prepper internal model)

    sink:
      - opensearch:
          hosts: ["https://opensearch:9200"]
          index_type: tsdb  # Sink converts Metric → TSDB format
          index: metrics

Format Specifications

TSDB Format

OpenSearch TSDB expects documents whose labels are encoded as a single space-separated string:

Format:

{
  "labels": "__name__ http_requests_total method POST handler /api/items status 200",
  "timestamp": 1633072800000,
  "value": 1.1
}

Specification:
- https://github.com/opensearch-project/time-series-db
- Labels are space-separated key-value pairs: key1 value1 key2 value2
- See https://github.com/opensearch-project/time-series-db#index-some-metrics
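
The label encoding described above could be produced along these lines. This is a minimal sketch: the function name is hypothetical, label order is preserved as received, and rejecting labels that contain spaces is an assumption made here because the format as described has no escaping rule.

```python
def to_tsdb_document(labels, timestamp_ms, value):
    """Build a TSDB-style document from (name, value) label pairs and one sample."""
    parts = []
    for name, label_value in labels:
        # Assumption: no escaping rule is defined, so a space inside a name
        # or value would make the encoded string ambiguous. Reject it.
        if " " in name or " " in label_value:
            raise ValueError(f"label {name!r} contains a space")
        parts.extend([name, label_value])
    return {"labels": " ".join(parts), "timestamp": timestamp_ms, "value": value}


doc = to_tsdb_document(
    [("__name__", "http_requests"), ("method", "GET")], 1707523200000, 100
)
print(doc)
```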

Sink Configuration:
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      index_type: tsdb
      index: metrics

OTEL Format

OpenTelemetry Metrics format following the OTEL specification:

Format:
{
  "kind": "gauge",
  "name": "http_requests",
  "value": 100,
  "attributes": {"method": "GET"},
  "time": "2024-02-12T15:30:00.000Z"
}

Specification:
- https://opentelemetry.io/docs/specs/otel/metrics/data-model/
- Data Prepper's internal Metric interface aligns with OTEL model
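
A mapping from one Remote Write sample to the document shape shown above might look like the following sketch. The function name is hypothetical. Defaulting `kind` to `gauge` is an assumption that mirrors the example, since a v1 sample does not itself carry a metric type.

```python
from datetime import datetime, timedelta, timezone


def to_otel_style_document(labels, timestamp_ms, value, kind="gauge"):
    """Map (name, value) label pairs and one sample to the OTEL-style shape."""
    attributes = dict(labels)
    # The metric name travels as the reserved __name__ label.
    name = attributes.pop("__name__", "")
    # Split into whole seconds and milliseconds to avoid float rounding.
    seconds, millis = divmod(timestamp_ms, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=millis
    )
    return {
        "kind": kind,
        "name": name,
        "value": value,
        "attributes": attributes,
        "time": moment.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```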

Sink Configuration:
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      index_type: otel_metrics  # If supported
      index: metrics

OpenSearch Format

Standard OpenSearch document format with conventional field names:

Format:
{
  "@timestamp": "2024-02-12T15:30:00Z",
  "metric_name": "http_requests",
  "metric_value": 100,
  "labels": {"method": "GET"}
}

Specification:
- Standard OpenSearch document structure
- Compatible with existing dashboards and queries
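
To show how documents in this shape could reach a regular index, the sketch below builds a newline-delimited body for the OpenSearch `_bulk` API. The function name and the `(labels, timestamp_ms, value)` input tuple are hypothetical. The timestamp is truncated to whole seconds to match the format above.

```python
import json
from datetime import datetime, timezone


def to_bulk_body(samples, index="metrics"):
    """Build a _bulk request body from (labels, timestamp_ms, value) tuples."""
    lines = []
    for labels, timestamp_ms, value in samples:
        fields = dict(labels)
        metric_name = fields.pop("__name__", "")
        # Truncate to whole seconds, as in the format shown above.
        stamp = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({
            "@timestamp": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "metric_name": metric_name,
            "metric_value": value,
            "labels": fields,
        }))
    # The bulk API requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)
```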

Sink Configuration:
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      index: metrics  # Regular index

Prometheus Format

Preserve original Prometheus Remote Write structure for compatibility:

Format:
{
  "name": "http_requests",
  "labels": {
    "__name__": "http_requests",
    "method": "GET"
  },
  "value": 100,
  "timestamp": 1707523200000
}

Specification:
- https://prometheus.io/docs/specs/remote_write_spec/
- Used for debugging or forwarding to other Prometheus systems
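
A Remote Write TimeSeries can carry several samples, while the format above holds one value per document. One way to bridge that, sketched here with a hypothetical function name, is to emit one document per sample with the full label set repeated.

```python
def to_prometheus_documents(labels, samples):
    """Fan one time series out into one Prometheus-style document per sample.

    `labels` is a list of (name, value) pairs and `samples` a list of
    (value, timestamp_ms) pairs, mirroring the TimeSeries message fields.
    """
    label_map = dict(labels)
    name = label_map.get("__name__", "")
    return [
        # Copy the labels so the documents do not share one mutable dict.
        {"name": name, "labels": dict(label_map), "value": value, "timestamp": timestamp_ms}
        for value, timestamp_ms in samples
    ]
```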

Sink Configuration:
sink:
  - prometheus:
      url: "https://prometheus:9200"  # Amazon Managed Prometheus

Examples

Basic Example

Input: Prometheus Remote Write

TimeSeries {
  labels: [{name: "__name__", value: "http_requests"}, {name: "method", value: "GET"}]
  samples: [{value: 100, timestamp: 1707523200000}]
}

Output (TSDB format):

{"labels": "__name__ http_requests method GET", "timestamp": 1707523200000, "value": 100}

Output (OTEL format):

{"kind": "gauge", "name": "http_requests", "value": 100, "attributes": {"method": "GET"}, "time": "2024-02-10T00:00:00.000Z"}

Output (OpenSearch format):

{"@timestamp": "2024-02-10T00:00:00Z", "metric_name": "http_requests", "metric_value": 100, "labels": {"method": "GET"}}

Output (Prometheus format):

{"name": "http_requests", "labels": {"__name__": "http_requests", "method": "GET"}, "value": 100, "timestamp": 1707523200000}

Multi-Sink Example

Send to both Prometheus and OpenSearch TSDB simultaneously:

prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - prometheus:
        url: "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-xxx"
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index_type: tsdb
        index: metrics
