
Feature Request: OpenSearch Sink index_type: tsdb for TSDB Index Support #6644

@srikanthpadakanti

Description


Is your feature request related to a problem?

With the Prometheus Remote Write source (#6533, PR #6627), Data Prepper will be able to ingest Prometheus metrics as Metric events. But there's no way to write these to OpenSearch TSDB indices. The existing index_type options like metric-analytics produce OTel-format documents. TSDB expects a completely different structure: {labels, timestamp, value} with space-separated label strings.

Anyone wanting to use the OpenSearch TSDB plugin for metrics storage and M3QL queries has no path through Data Prepper today.

Proposed solution

A new index_type: tsdb in the OpenSearch sink. The sink converts Metric events to the TSDB document format directly, with no processor in between.

prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index_type: tsdb
        index: metrics

This keeps the pipeline simple and also enables multi-sink --> sending to both Prometheus and TSDB from one pipeline:

prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - prometheus:
        url: "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-xxx"
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index_type: tsdb
        index: metrics

TSDB Document Format

TSDB expects three fields per document:

{
  "labels": "__name__ http_requests_total method POST handler /api/items status 200",
  "timestamp": 1633072800000,
  "value": 1.1
}
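
Since space doubles as the label delimiter, assembling the labels string takes some care. A minimal sketch, assuming a hypothetical helper class TsdbLabels (the alphabetical label ordering is also an assumption for determinism, not something the format mandates):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds the space-delimited TSDB label string.
// Labels are sorted by name so the same series always yields the same string.
public final class TsdbLabels {
    public static String build(String metricName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("__name__ ").append(sanitize(metricName));
        for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
            sb.append(' ').append(sanitize(e.getKey()))
              .append(' ').append(sanitize(e.getValue()));
        }
        return sb.toString();
    }

    // Space is TSDB's delimiter, so spaces inside names/values become underscores.
    public static String sanitize(String s) {
        return s.replace(' ', '_');
    }
}
```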

How metric types map to TSDB documents

The sink takes standard Data Prepper Metric events (JacksonGauge, JacksonSum, JacksonHistogram, JacksonSummary) and converts them. TSDB stores one value per document, so complex types need expansion.

Gauge --> 1 document:

Input:  Gauge{name="cpu_temp", value=72.5, attributes={host: "server-01"}}
Output: {"labels":"__name__ cpu_temp host server-01","timestamp":1706869800000,"value":72.5}

Sum (Counter) --> 1 document:
For monotonic sums, the sink appends _total to the metric name if not already present (Prometheus convention for counters in TSDB).

Input:  Sum{name="http_requests", value=42100, isMonotonic=true, attributes={method: "GET"}}
Output: {"labels":"__name__ http_requests_total method GET","timestamp":1706869800000,"value":42100.0}
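
The naming rule above can be sketched in a few lines (CounterNaming is a hypothetical helper name, not an existing Data Prepper class):

```java
// Sketch of the monotonic-sum naming rule: append "_total" only when the
// metric name does not already end with it; non-monotonic sums are untouched.
public final class CounterNaming {
    public static String tsdbName(String metricName, boolean isMonotonic) {
        if (isMonotonic && !metricName.endsWith("_total")) {
            return metricName + "_total";
        }
        return metricName;
    }
}
```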

Histogram --> N+2 documents:
The sink treats bucketCountsList as per-bucket counts and converts to cumulative — which is what TSDB and M3QL expect (Prometheus convention). Each bucket becomes a separate document with a le label, plus _count and _sum documents.

Input:  Histogram{name="request_duration", bucketCountsList=[5, 7, 13, 5],
        explicitBounds=[0.1, 0.5, 1.0], count=30, sum=15.5, attributes={method: "GET"}}

Output (6 documents):
  {"labels":"__name__ request_duration_bucket le 0.1 method GET","timestamp":...,"value":5.0}
  {"labels":"__name__ request_duration_bucket le 0.5 method GET","timestamp":...,"value":12.0}
  {"labels":"__name__ request_duration_bucket le 1.0 method GET","timestamp":...,"value":25.0}
  {"labels":"__name__ request_duration_bucket le +Inf method GET","timestamp":...,"value":30.0}
  {"labels":"__name__ request_duration_count method GET","timestamp":...,"value":30.0}
  {"labels":"__name__ request_duration_sum method GET","timestamp":...,"value":15.5}
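
The running-sum conversion for the bucket documents can be sketched as follows (HistogramBuckets is an illustrative name; in the proposal this logic would live in the new TSDBDocumentBuilder):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: per-bucket (delta) counts -> cumulative counts, one (le, count)
// pair per bucket. The final bucket has no explicit bound and gets le="+Inf",
// per Prometheus convention.
public final class HistogramBuckets {
    public record Bucket(String le, double cumulativeCount) {}

    public static List<Bucket> toCumulative(long[] perBucketCounts, double[] explicitBounds) {
        List<Bucket> out = new ArrayList<>();
        long running = 0;
        for (int i = 0; i < perBucketCounts.length; i++) {
            running += perBucketCounts[i];
            String le = i < explicitBounds.length ? String.valueOf(explicitBounds[i]) : "+Inf";
            out.add(new Bucket(le, running));
        }
        return out;
    }
}
```

Running this on the example input (counts [5, 7, 13, 5], bounds [0.1, 0.5, 1.0]) reproduces the cumulative values 5, 12, 25, 30 shown above.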

Summary --> N+2 documents:
Each quantile becomes a separate document with a quantile label, plus _count and _sum documents.

Input:  Summary{name="rpc_latency", quantiles=[{0.5: 0.2}, {0.99: 0.8}],
        count=1000, sum=300.5, attributes={service: "api"}}

Output (4 documents):
  {"labels":"__name__ rpc_latency quantile 0.5 service api","timestamp":...,"value":0.2}
  {"labels":"__name__ rpc_latency quantile 0.99 service api","timestamp":...,"value":0.8}
  {"labels":"__name__ rpc_latency_count service api","timestamp":...,"value":1000.0}
  {"labels":"__name__ rpc_latency_sum service api","timestamp":...,"value":300.5}
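
The N+2 expansion can be sketched like this (SummaryExpansion and the plain-map document shape are illustrative assumptions; the sink would build actual bulk-request documents):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: one document per quantile plus _count and _sum documents,
// all in the three-field TSDB shape {labels, timestamp, value}.
public final class SummaryExpansion {
    public static List<Map<String, Object>> expand(
            String name, String baseLabels, long timestampMillis,
            double[] quantiles, double[] quantileValues, long count, double sum) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < quantiles.length; i++) {
            docs.add(Map.of(
                "labels", "__name__ " + name + " quantile " + quantiles[i] + " " + baseLabels,
                "timestamp", timestampMillis,
                "value", quantileValues[i]));
        }
        docs.add(Map.of("labels", "__name__ " + name + "_count " + baseLabels,
                "timestamp", timestampMillis, "value", (double) count));
        docs.add(Map.of("labels", "__name__ " + name + "_sum " + baseLabels,
                "timestamp", timestampMillis, "value", sum));
        return docs;
    }
}
```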

Implementation approach

Following the same pattern as metric-analytics, log-analytics, trace-analytics-raw:

  1. IndexType.java --> add TSDB("tsdb") enum value
  2. IndexConstants.java --> template filename + default alias metrics-tsdb-v1
  3. IndexConfiguration.java --> TSDB branch in readIndexTemplate()
  4. IndexManagerFactory.java --> case TSDB: with NoIsmPolicyManagement since TSDB manages its own lifecycle (head chunks → blocks, no ISM rollover needed)
  5. OpenSearchSink.java --> TSDB branch in doOutput() using a new TSDBDocumentBuilder
  6. TSDBDocumentBuilder.java --> new class: Metric → TSDB document conversion with histogram/summary expansion
  7. tsdb-index-template.json --> index template with TSDB mappings
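
For step 1, the enum addition follows the existing string-valued pattern. A hypothetical shape (Data Prepper's actual IndexType has more values and may differ in its lookup API):

```java
import java.util.Optional;

// Illustrative string-valued index-type enum with a lookup by config value,
// mirroring the pattern used for metric-analytics, log-analytics, etc.
public enum IndexType {
    METRIC_ANALYTICS("metric-analytics"),
    TSDB("tsdb");

    private final String value;

    IndexType(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    public static Optional<IndexType> fromValue(String value) {
        for (IndexType t : values()) {
            if (t.value.equals(value)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }
}
```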

Index template

{
  "version": 1,
  "mappings": {
    "properties": {
      "series_ref": {"type": "long", "doc_values": false},
      "labels": {"type": "keyword", "ignore_above": 4096},
      "value": {"type": "double", "doc_values": false},
      "timestamp": {"type": "date", "format": "epoch_millis"},
      "timestamp_range": {"type": "long_range"}
    }
  },
  "settings": {
    "index.translog.durability": "async",
    "index.translog.sync_interval": "1s",
    "refresh_interval": "1s"
  }
}

Notes on deviations from the TSDB README example:

  • value uses double instead of float --> Prometheus uses float64 internally, and OpenSearch float (32-bit) would silently lose precision on large counter values.
  • labels has ignore_above: 4096 --> guards against indexing failures if a metric has many labels. Typical Prometheus label strings are ~200 chars.
  • TSDB-engine settings (tsdb_engine.enabled, tsdb_store, labels.storage_type) are intentionally left out of the template; those are cluster-level settings the admin configures when installing the TSDB plugin. Keeping only standard OpenSearch mappings means the template works on any OpenSearch version.

Things worth noting

  • No processor needed --> histogram/summary expansion happens at the sink level in doOutput(), so the pipeline stays simple (source --> sink) and multi-sink works.
  • No routing --> TSDB's TSDBAutoRoutingActionFilter handles shard placement from labels automatically. The sink sends documents without routing.
  • No ISM --> TSDB manages lifecycle through head chunks and blocks. ISM rollover would conflict with that.
  • Label sanitization --> spaces in label values get replaced with underscores since space is TSDB's delimiter.
  • Histogram bucket assumption --> the sink treats bucketCountsList as per-bucket (delta) counts and converts to cumulative via running sum. This matches what the prometheus source (Feature Request: Prometheus Remote Write v1 Source for OpenSearch TSDB #6533) produces. If Metric events from other sources already have cumulative bucket counts, the conversion would need to be aware of that.

Compatibility

  • The TSDB plugin (built against OpenSearch 3.5.0-SNAPSHOT per its build.gradle) requires OpenSearch 3.5.0+. But our index template uses standard mappings only, so it works on any OpenSearch version. TSDB features become available when the plugin is installed.
  • All changes would be behind indexType == IndexType.TSDB. Existing index types are not affected.
  • The source (Feature Request: Prometheus Remote Write v1 Source for OpenSearch TSDB #6533) emits Data Prepper Metric events (JacksonGauge, JacksonSum, JacksonHistogram, JacksonSummary). The TSDB sink converts these to TSDB's {labels, timestamp, value} format, so no processor is needed between source and sink.

Labels: enhancement (New feature or request), plugin - sink (A plugin to write data to a destination)
