
Feature Request: OpenSearch Sink index_type: tsdb for TSDB Index Support #6644

@srikanthpadakanti

Description


Is your feature request related to a problem?

With the Prometheus Remote Write source (#6533, PR #6627), Data Prepper will be able to ingest Prometheus metrics as Metric events. But there's no way to write these to OpenSearch TSDB indices. The existing index_type options like metric-analytics produce OTel-format documents. TSDB expects a completely different structure: {labels, timestamp, value} with space-separated label strings.

Anyone wanting to use the OpenSearch TSDB plugin for metrics storage and M3QL queries has no path through Data Prepper today.

Proposed solution

A new index_type: tsdb in the OpenSearch sink. The sink converts Metric events to the TSDB document format directly, with no processor in between.

prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index_type: tsdb
        index: metrics

This keeps the pipeline simple and also enables multi-sink --> sending to both Prometheus and TSDB from one pipeline:

prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - prometheus:
        url: "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-xxx"
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index_type: tsdb
        index: metrics

TSDB Document Format

TSDB expects three fields per document:

{
  "labels": "__name__ http_requests_total method POST handler /api/items status 200",
  "timestamp": 1633072800000,
  "value": 1.1
}
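
Since space doubles as the label delimiter, assembling the labels string takes some care. A minimal sketch, assuming a hypothetical helper class TsdbLabels (the alphabetical label ordering is also an assumption for determinism, not something the format mandates):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds the space-delimited TSDB label string.
// Labels are sorted by name so the same series always yields the same string.
public final class TsdbLabels {
    public static String build(String metricName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("__name__ ").append(sanitize(metricName));
        for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
            sb.append(' ').append(sanitize(e.getKey()))
              .append(' ').append(sanitize(e.getValue()));
        }
        return sb.toString();
    }

    // Space is TSDB's delimiter, so spaces inside names/values become underscores.
    public static String sanitize(String s) {
        return s.replace(' ', '_');
    }
}
```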

How metric types map to TSDB documents

The sink takes standard Data Prepper Metric events (JacksonGauge, JacksonSum, JacksonHistogram, JacksonSummary) and converts them. TSDB stores one value per document, so complex types need expansion.

Gauge --> 1 document:

Input:  Gauge{name="cpu_temp", value=72.5, attributes={host: "server-01"}}
Output: {"labels":"__name__ cpu_temp host server-01","timestamp":1706869800000,"value":72.5}

Sum (Counter) --> 1 document:
For monotonic sums, the sink appends _total to the metric name if not already present (Prometheus convention for counters in TSDB).

Input:  Sum{name="http_requests", value=42100, isMonotonic=true, attributes={method: "GET"}}
Output: {"labels":"__name__ http_requests_total method GET","timestamp":1706869800000,"value":42100.0}
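
The naming rule above can be sketched in a few lines (CounterNaming is a hypothetical helper name, not an existing Data Prepper class):

```java
// Sketch of the monotonic-sum naming rule: append "_total" only when the
// metric name does not already end with it; non-monotonic sums are untouched.
public final class CounterNaming {
    public static String tsdbName(String metricName, boolean isMonotonic) {
        if (isMonotonic && !metricName.endsWith("_total")) {
            return metricName + "_total";
        }
        return metricName;
    }
}
```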

Histogram --> N+2 documents:
The sink treats bucketCountsList as per-bucket counts and converts to cumulative — which is what TSDB and M3QL expect (Prometheus convention). Each bucket becomes a separate document with a le label, plus _count and _sum documents.

Input:  Histogram{name="request_duration", bucketCountsList=[5, 7, 13, 5],
        explicitBounds=[0.1, 0.5, 1.0], count=30, sum=15.5, attributes={method: "GET"}}

Output (6 documents):
  {"labels":"__name__ request_duration_bucket le 0.1 method GET","timestamp":...,"value":5.0}
  {"labels":"__name__ request_duration_bucket le 0.5 method GET","timestamp":...,"value":12.0}
  {"labels":"__name__ request_duration_bucket le 1.0 method GET","timestamp":...,"value":25.0}
  {"labels":"__name__ request_duration_bucket le +Inf method GET","timestamp":...,"value":30.0}
  {"labels":"__name__ request_duration_count method GET","timestamp":...,"value":30.0}
  {"labels":"__name__ request_duration_sum method GET","timestamp":...,"value":15.5}
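
The running-sum conversion for the bucket documents can be sketched as follows (HistogramBuckets is an illustrative name; in the proposal this logic would live in the new TSDBDocumentBuilder):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: per-bucket (delta) counts -> cumulative counts, one (le, count)
// pair per bucket. The final bucket has no explicit bound and gets le="+Inf",
// per Prometheus convention.
public final class HistogramBuckets {
    public record Bucket(String le, double cumulativeCount) {}

    public static List<Bucket> toCumulative(long[] perBucketCounts, double[] explicitBounds) {
        List<Bucket> out = new ArrayList<>();
        long running = 0;
        for (int i = 0; i < perBucketCounts.length; i++) {
            running += perBucketCounts[i];
            String le = i < explicitBounds.length ? String.valueOf(explicitBounds[i]) : "+Inf";
            out.add(new Bucket(le, running));
        }
        return out;
    }
}
```

Running this on the example input (counts [5, 7, 13, 5], bounds [0.1, 0.5, 1.0]) reproduces the cumulative values 5, 12, 25, 30 shown above.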

Summary --> N+2 documents:
Each quantile becomes a separate document with a quantile label, plus _count and _sum documents.

Input:  Summary{name="rpc_latency", quantiles=[{0.5: 0.2}, {0.99: 0.8}],
        count=1000, sum=300.5, attributes={service: "api"}}

Output (4 documents):
  {"labels":"__name__ rpc_latency quantile 0.5 service api","timestamp":...,"value":0.2}
  {"labels":"__name__ rpc_latency quantile 0.99 service api","timestamp":...,"value":0.8}
  {"labels":"__name__ rpc_latency_count service api","timestamp":...,"value":1000.0}
  {"labels":"__name__ rpc_latency_sum service api","timestamp":...,"value":300.5}
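
The N+2 expansion can be sketched like this (SummaryExpansion and the plain-map document shape are illustrative assumptions; the sink would build actual bulk-request documents):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: one document per quantile plus _count and _sum documents,
// all in the three-field TSDB shape {labels, timestamp, value}.
public final class SummaryExpansion {
    public static List<Map<String, Object>> expand(
            String name, String baseLabels, long timestampMillis,
            double[] quantiles, double[] quantileValues, long count, double sum) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < quantiles.length; i++) {
            docs.add(Map.of(
                "labels", "__name__ " + name + " quantile " + quantiles[i] + " " + baseLabels,
                "timestamp", timestampMillis,
                "value", quantileValues[i]));
        }
        docs.add(Map.of("labels", "__name__ " + name + "_count " + baseLabels,
                "timestamp", timestampMillis, "value", (double) count));
        docs.add(Map.of("labels", "__name__ " + name + "_sum " + baseLabels,
                "timestamp", timestampMillis, "value", sum));
        return docs;
    }
}
```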

Implementation approach

Following the same pattern as metric-analytics, log-analytics, trace-analytics-raw:

  1. IndexType.java --> add TSDB("tsdb") enum value
  2. IndexConstants.java --> template filename + default alias metrics-tsdb-v1
  3. IndexConfiguration.java --> TSDB branch in readIndexTemplate()
  4. IndexManagerFactory.java --> case TSDB: with NoIsmPolicyManagement since TSDB manages its own lifecycle (head chunks → blocks, no ISM rollover needed)
  5. OpenSearchSink.java --> TSDB branch in doOutput() using a new TSDBDocumentBuilder
  6. TSDBDocumentBuilder.java --> new class: Metric → TSDB document conversion with histogram/summary expansion
  7. tsdb-index-template.json --> index template with TSDB mappings
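
For step 1, the enum addition follows the existing string-valued pattern. A hypothetical shape (Data Prepper's actual IndexType has more values and may differ in its lookup API):

```java
import java.util.Optional;

// Illustrative string-valued index-type enum with a lookup by config value,
// mirroring the pattern used for metric-analytics, log-analytics, etc.
public enum IndexType {
    METRIC_ANALYTICS("metric-analytics"),
    TSDB("tsdb");

    private final String value;

    IndexType(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    public static Optional<IndexType> fromValue(String value) {
        for (IndexType t : values()) {
            if (t.value.equals(value)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }
}
```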

Index template

{
  "version": 1,
  "mappings": {
    "properties": {
      "series_ref": {"type": "long", "doc_values": false},
      "labels": {"type": "keyword", "ignore_above": 4096},
      "value": {"type": "double", "doc_values": false},
      "timestamp": {"type": "date", "format": "epoch_millis"},
      "timestamp_range": {"type": "long_range"}
    }
  },
  "settings": {
    "index.translog.durability": "async",
    "index.translog.sync_interval": "1s",
    "refresh_interval": "1s"
  }
}

Notes on deviations from the TSDB README example:

  • value uses double instead of float --> Prometheus uses float64 internally, and OpenSearch float (32-bit) would silently lose precision on large counter values.
  • labels has ignore_above: 4096 --> guards against indexing failures if a metric has many labels. Typical Prometheus label strings are ~200 chars.
  • TSDB-engine settings (tsdb_engine.enabled, tsdb_store, labels.storage_type) are intentionally left out of the template; those are cluster-level settings the admin configures when installing the TSDB plugin. Keeping only standard OpenSearch mappings means the template works on any OpenSearch version.

Things worth noting

  • No processor needed --> histogram/summary expansion happens at the sink level in doOutput(), so the pipeline stays simple (source --> sink) and multi-sink works.
  • No routing --> TSDB's TSDBAutoRoutingActionFilter handles shard placement from labels automatically. The sink sends documents without routing.
  • No ISM --> TSDB manages lifecycle through head chunks and blocks. ISM rollover would conflict with that.
  • Label sanitization --> spaces in label values get replaced with underscores since space is TSDB's delimiter.
  • Histogram bucket assumption --> the sink treats bucketCountsList as per-bucket (delta) counts and converts to cumulative via running sum. This matches what the prometheus source (Feature Request: Prometheus Remote Write v1 Source for OpenSearch TSDB #6533) produces. If Metric events from other sources already have cumulative bucket counts, the conversion would need to be aware of that.

Compatibility

  • The TSDB plugin (built against OpenSearch 3.5.0-SNAPSHOT per its build.gradle) requires OpenSearch 3.5.0+. But our index template uses standard mappings only, so it works on any OpenSearch version. TSDB features become available when the plugin is installed.
  • All changes would be behind indexType == IndexType.TSDB. Existing index types are not affected.
  • The source (Feature Request: Prometheus Remote Write v1 Source for OpenSearch TSDB #6533) emits Data Prepper Metric events (JacksonGauge, JacksonSum, JacksonHistogram, JacksonSummary). The TSDB sink converts these to TSDB's {labels, timestamp, value} format, so no processor is needed between source and sink.

Labels: enhancement (New feature or request), plugin - sink (A plugin to write data to a destination)
