[Feature][Core] Support Traffic Dyeing (Sampling) and Context-Aware Metrics #10305

@corgy-w

Search before asking

  • I have searched the existing feature requests and found no similar one.

Description

Currently, SeaTunnel metrics are generally global. There is no way to distinguish metrics for a specific subset of data (e.g., sampled traffic, heartbeat packets) without invasively modifying connector/transform code to add tags manually.

This proposal aims to introduce a System-Level Traffic Dyeing (Sampling) mechanism. It allows:

  1. Marking specific SeaTunnelRows as "Sampled" or "Dyed" at the Source.
  2. Propagating this "Color" context implicitly throughout the execution engine (Source -> Transform -> Sink).
  3. Automatically routing metrics to different counters (e.g., sink_write_count vs sink_write_count_sampled) based on the context, without changing existing Connector code.

Proposed Solution / Architecture:

  1. Data Protocol: Add a long flags field to SeaTunnelRow (Core Data Structure). The field is a bitmask that carries system signals (e.g., IS_SAMPLED, IS_HEARTBEAT) with less serialization overhead than the existing options Map.
  2. Context Propagation: Introduce MetricTraceContext (based on ThreadLocal) in seatunnel-engine.
    • When SeaTunnelSourceCollector receives a row, it checks the flags and sets the MetricTraceContext.
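    • The context is cleared after the row is processed, so that it cannot leak into the next row handled by the same thread.
    • For async operations, the context must be captured on the submitting thread and replayed on the executing thread, because a ThreadLocal value does not follow a task across threads.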
  3. Metrics Integration: Enhance MetricsContext (or AbstractMetricsContext) to support "Context-Aware" or "Routing" metrics.
    • The Metric object (e.g., Counter) acts as a proxy.
    • It checks MetricTraceContext on every update (inc()) and routes the value to the appropriate underlying counter (e.g., standard vs. sampled).
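The three pieces above could fit together roughly as in the following minimal sketch. It is an illustration only, not existing SeaTunnel code: the bit positions, the `wrap` helper, and the `RoutingCounter` class are hypothetical names chosen for this example, and the row is reduced to its proposed `flags` value.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical bit layout for the proposed `long flags` field on SeaTunnelRow.
final class RowFlags {
    static final long IS_SAMPLED = 1L;        // bit 0 (assumed position)
    static final long IS_HEARTBEAT = 1L << 1; // bit 1 (assumed position)

    private RowFlags() {}

    static boolean has(long flags, long flag) {
        return (flags & flag) != 0;
    }
}

// Hypothetical ThreadLocal holder for the flags of the row being processed.
final class MetricTraceContext {
    private static final ThreadLocal<Long> CURRENT = ThreadLocal.withInitial(() -> 0L);

    private MetricTraceContext() {}

    static void set(long flags) { CURRENT.set(flags); }

    static long get() { return CURRENT.get(); }

    static void clear() { CURRENT.remove(); }

    // Captures the caller's flags now and replays them around the task,
    // restoring whatever the executing thread held before.
    static Runnable wrap(Runnable task) {
        final long captured = get();
        return () -> {
            final long previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                set(previous);
            }
        };
    }
}

// Proxy counter: connectors keep calling inc(); the context decides the target.
final class RoutingCounter {
    private final LongAdder standard = new LongAdder();
    private final LongAdder sampled = new LongAdder();

    void inc() {
        if (RowFlags.has(MetricTraceContext.get(), RowFlags.IS_SAMPLED)) {
            sampled.increment();
        } else {
            standard.increment();
        }
    }

    long standardCount() { return standard.sum(); }

    long sampledCount() { return sampled.sum(); }
}
```

In this sketch the collector side would call `MetricTraceContext.set(row flags)` before handing a row downstream and `MetricTraceContext.clear()` in a `finally` block afterwards. Sampled rows are routed exclusively to the sampled counter here; whether they should also be counted in the standard counter is a design decision the proposal leaves open.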

Usage Scenario

  1. Sampling Observation: In high-throughput scenarios (e.g., 1M QPS), calculating precise latency/success-rate for every record is expensive. Users can configure the Source to "sample" 1% of traffic (set the flag). The Metrics system will then automatically track metrics for this 1% separately, providing health visibility with low overhead.
  2. Heartbeat Monitoring: Distinguish "Heartbeat" rows (synthetic data for keep-alive) from actual business data in metrics.
  3. Trace/Debug: "Dye" specific rows to trace their flow and performance through the pipeline without mixing them with normal traffic stats.
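For the sampling scenario, the Source-side marking step might look like the sketch below. The `EveryNthSampler` class and its flag constants are hypothetical; a deterministic every-Nth policy is used here only because it is easy to reason about, and a real implementation could just as well sample randomly or by key.

```java
// Hypothetical source-side sampler: sets the IS_SAMPLED bit on every Nth row.
// Not thread-safe; assumes one instance per source reader thread.
final class EveryNthSampler {
    static final long IS_SAMPLED = 1L;        // assumed bit position
    static final long IS_HEARTBEAT = 1L << 1; // assumed bit position

    private final long interval;
    private long seen;

    EveryNthSampler(long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    // Returns the row's flags, with IS_SAMPLED added on every interval-th call.
    // Bits that are already set (e.g. IS_HEARTBEAT) are preserved.
    long mark(long flags) {
        seen++;
        return (seen % interval == 0) ? (flags | IS_SAMPLED) : flags;
    }
}
```

With `new EveryNthSampler(100)`, one row in every hundred carries the sampled bit, which corresponds to the 1% example above.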

Related issues

None.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
