[Donation Proposal]: OTEL Arrow Adapter #1332

@lquerel

Description

This project extends, in a compatible way, the existing OTLP protocol with a generic columnar representation for metrics, logs, and traces based on the cross-language Apache Arrow transport system for columnar data. This extension significantly improves the efficiency of the OTLP protocol for scenarios involving the transmission of large batches of OTLP entities. Results show a 2-3 times better compression rate for typical data, while Apache Arrow’s zero-copy serialization and deserialization techniques help lower overhead.
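To make the row-versus-column distinction concrete, here is a minimal sketch. It is not the adapter's code: plain Python lists and `zlib` stand in for Apache Arrow and the real compression pipeline, and the record fields (`service`, `severity`, `timestamp`, `latency_ms`) are hypothetical. The sizes it prints are not a reproduction of the 2-3 times figure; the sketch only shows how grouping values by column removes per-record repetition.

```python
import json
import zlib


def to_columns(rows):
    """Pivot a list of uniform dicts into a dict of equal-length lists."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def to_rows(columns):
    """Inverse of to_columns: rebuild the list of per-record dicts."""
    if not columns:
        return []
    keys = list(columns)
    length = len(columns[keys[0]])
    return [{key: columns[key][i] for key in keys} for i in range(length)]


def encoded_sizes(obj):
    """Return (raw, zlib-compressed) byte sizes of the compact JSON encoding."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


# Hypothetical log-like records; the field names are made up for illustration.
rows = [
    {
        "service": "checkout",
        "severity": "INFO",
        "timestamp": 1_700_000_000 + i,
        "latency_ms": 20 + (i * 7) % 13,
    }
    for i in range(1000)
]
columns = to_columns(rows)

row_raw, row_zip = encoded_sizes(rows)
col_raw, col_zip = encoded_sizes(columns)
print(f"row-oriented: raw={row_raw} bytes, compressed={row_zip} bytes")
print(f"columnar:     raw={col_raw} bytes, compressed={col_zip} bytes")
```

In the columnar form each field name is written once instead of once per record, and values of the same type sit next to each other, which is the property a columnar encoding builds on.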

The first phase of this project will deliver the following primary components, contained in the donated repository:

  1. Golang-based adapter reference implementation that implements the translation to and from OTLP-Arrow using the OTel Collector’s “pdata” interface.
  2. Protobuf definition for OTLP-Arrow that would be migrated into the opentelemetry-proto repo as it stabilizes. This includes a representation for multi-variate metrics which allows metrics with shared attributes to be compactly represented and processed.
  3. Arrow-enabled OTel Collector Exporter and Receiver components that are drop-in compatible with the OTLP Exporter and Receiver core components.
  4. Arrow-enabled OTel-Go SDK “shim” to the Arrow-enabled Exporter component, which is mainly useful for validation purposes.

In a second phase of the project, we propose developing new OpenTelemetry components and mechanisms for exchanging and processing data with the Apache Arrow ecosystem, including:

  1. Arrow-enabled Collector: an OTel Collector pipeline for processing Arrow batches directly
  2. Support for reading/writing Apache Parquet files, and other methods to leverage non-Golang Apache Arrow ecosystem components (e.g., the Arrow Query engine)
  3. Arrow-enabled OpenTelemetry SDKs (i.e., without a “shim” and adapter).

The repository for donation: https://github.com/f5/otel-arrow-adapter

Collector repo fork containing the drop-in compatible exporter and receiver components under development: https://github.com/open-telemetry/experimental-arrow-collector

More details are available in the associated OTEP text: https://github.com/lquerel/oteps/blob/main/text/0156-columnar-encoding.md. The OTEP is still pending and has not been merged.

Benefits to the OpenTelemetry community

Compared to the existing OpenTelemetry protocol, this compatible extension offers the following improvements:

  • Reduce the bandwidth requirements of the protocol. The two main levers are: 1) a better representation of the telemetry data based on a columnar layout, and 2) a stream-oriented gRPC endpoint that transmits batches of OTLP entities more efficiently.
  • Provide a more optimal representation for multivariate time-series data. With the current version of the OpenTelemetry protocol, users have to transform multivariate time series (i.e., multiple related metrics sharing the same attributes and timestamp) into a collection of univariate time series. This results in a large amount of duplication and additional overhead along the entire chain from exporters to backends.
  • Provide more advanced and efficient telemetry data processing capabilities. Increasing data volume, cost efficiency, and data minimization require additional data processing capabilities such as data projection, aggregation, and filtering.
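The duplication described in the multivariate bullet can be illustrated with a small sketch. The `MultivariatePoint` and `UnivariatePoint` types, the metric names, and the attribute values are all hypothetical; they are not the OTLP or OTLP-Arrow data model, only a simplified stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultivariatePoint:
    """Hypothetical point: several metric values share attributes and timestamp."""
    attributes: tuple  # tuple of (key, value) pairs
    timestamp: int
    values: tuple      # tuple of (metric_name, value) pairs


@dataclass(frozen=True)
class UnivariatePoint:
    """Hypothetical point: one metric value with its own copy of the attributes."""
    name: str
    attributes: tuple
    timestamp: int
    value: float


def to_univariate(point):
    """Expand one multivariate point into one univariate point per metric."""
    return [
        UnivariatePoint(name, point.attributes, point.timestamp, value)
        for name, value in point.values
    ]


def to_multivariate(points):
    """Regroup univariate points that share the same attributes and timestamp."""
    grouped = {}
    for p in points:
        grouped.setdefault((p.attributes, p.timestamp), []).append((p.name, p.value))
    return [
        MultivariatePoint(attrs, ts, tuple(values))
        for (attrs, ts), values in grouped.items()
    ]


point = MultivariatePoint(
    attributes=(("host", "web-1"), ("region", "eu")),
    timestamp=1_700_000_000,
    values=(("cpu.user", 0.42), ("cpu.system", 0.13), ("cpu.idle", 0.45)),
)
univariate = to_univariate(point)

# Each univariate point repeats the shared attributes and timestamp.
attribute_copies = sum(len(p.attributes) for p in univariate)
print(f"{len(univariate)} univariate points carry {attribute_copies} attribute "
      f"entries; the multivariate point carries {len(point.attributes)}")
```

The regrouping step in `to_multivariate` is the work a backend would otherwise have to redo to recover the relationship between the metrics.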

Reasons for donation

The proposed protocol is tightly integrated with the existing OTLP protocol. A fallback mechanism between the two protocols has been implemented via a new pair of OTLP Arrow Exporter/Receiver components that are drop-in compatible with the OTLP Exporter/Receiver core components. This tight coupling justifies integrating the extension directly into the upstream project.
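One plausible shape for such a fallback is sketched below. This is an assumption about the general pattern, not the behavior of the actual exporter: `FallbackExporter`, `ArrowUnsupported`, and the two transport callables are hypothetical names, and the real components may negotiate and downgrade differently.

```python
class ArrowUnsupported(Exception):
    """Hypothetical error raised when the peer cannot accept OTLP Arrow."""


class FallbackExporter:
    """Try the Arrow transport first, then downgrade to standard OTLP."""

    def __init__(self, send_arrow, send_otlp):
        self._send_arrow = send_arrow
        self._send_otlp = send_otlp
        self._arrow_available = True

    def export(self, batch):
        if self._arrow_available:
            try:
                return self._send_arrow(batch)
            except ArrowUnsupported:
                # Remember the downgrade so later batches skip the failed attempt.
                self._arrow_available = False
        return self._send_otlp(batch)


# Fake transports standing in for real network calls.
calls = []


def arrow_transport(batch):
    calls.append("arrow")
    raise ArrowUnsupported("peer does not speak OTLP Arrow")


def otlp_transport(batch):
    calls.append("otlp")
    return "sent via OTLP"


exporter = FallbackExporter(arrow_transport, otlp_transport)
first = exporter.export(["span-1"])
second = exporter.export(["span-2"])
print(first, second, calls)
```

Because the caller only ever sees `export`, a component built this way can replace a plain OTLP exporter without changes to the surrounding pipeline, which is the sense of "drop-in compatible" assumed here.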

Repository

https://github.com/f5/otel-arrow-adapter

Existing usage

The project is developed and maintained jointly by F5 and Lightstep. F5 seeks to align the open-source community built around NGINX with the open-source community built around OpenTelemetry; this effort will bring industrial-grade analytics capabilities to the telemetry data that F5's software components produce for its customers. Improving compression performance and the representation of multivariate time series are also among F5's goals. Lightstep, which wants to recommend OpenTelemetry collectors to its customers, seeks to improve the compression performance and efficiency that OTLP achieves when used for bulk data transport.

We are actively testing the components for donation on production data using an OpenTelemetry collector built from our drop-in compatible OTLP exporter and receiver. We are extending test coverage and eliminating gaps. We are planning to start beta tests with the community by providing documentation and tooling for benchmarking and troubleshooting.

Maintenance

F5 and Lightstep will continue to develop and maintain the project. We will encourage and help all new contributors to participate in this project. We are open to suggestions and ideas.

Our current roadmap is as follows:

  1. Continue to work on performance, reliability, and robustness by extending test coverage and testing with more production data.
  2. Phase 2 of this project
    a) New client SDK natively supporting OTLP Arrow and multivariate metrics.
    b) Continuing the migration to Apache Arrow to achieve end-to-end performance gains.
    c) Integrating processing, aggregation, and filtering capabilities that leverage the Apache Arrow ecosystem.
    d) Parquet integration.

Licenses

This project is licensed under the terms of the Apache 2.0 open source license.

Trademarks

The Apache Arrow project is used in this project.

Other notes

No response
