
[BISECTED] kafka exporter performance issue when encoding is not jaeger type #3162

@Frefreak

Description

Describe the bug
We noticed a huge performance drop in the kafka exporter in a newer version (v0.26.0): sent_span/s is around 50~60K, while the old version (v0.24.0) with the same config achieves 300K+ (the topic has 5 partitions).

After bisecting, the issue appears to have been introduced by this commit: 714702a, specifically this line: 714702a#diff-3201209f317435e2073589faab9b89e22e6eeee6d6e822376d62f08a2cc0c4a6R151.

This commit introduced keying by traceID for jaeger_span and jaeger_proto, but not for otlp_proto, in which case the message key in ProducerMessage is always an empty byte array, causing imbalanced writes (in our environment, only one partition is written to instead of 5). This results in a huge performance drop when using the default otlp_proto encoding. Commenting this line out makes the producer write evenly and restores good performance.
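To illustrate why a constant empty key collapses writes onto one partition: a hash-based partitioner (sarama's default hash partitioner uses FNV-1a) always maps the same key to the same partition. This is a simplified sketch of that behavior, not sarama's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// choosePartition mimics a hash-based partitioner: FNV-1a over the
// key, then modulo the partition count (made non-negative). The same
// key always lands on the same partition.
func choosePartition(key []byte, numPartitions int32) int32 {
	h := fnv.New32a()
	h.Write(key)
	p := int32(h.Sum32()) % numPartitions
	if p < 0 {
		p = -p
	}
	return p
}

func main() {
	// An empty key hashes to the same value every time, so every
	// message goes to one partition: a 5-partition topic degrades
	// to a single hot partition.
	for i := 0; i < 3; i++ {
		fmt.Println(choosePartition([]byte{}, 5))
	}
	// Distinct trace IDs, by contrast, spread across partitions.
	fmt.Println(choosePartition([]byte("traceid-a"), 5))
	fmt.Println(choosePartition([]byte("traceid-b"), 5))
}
```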

Steps to reproduce
Run with the config provided below.

What did you expect to see?
Same or better performance in the new version.

What did you see instead?
Memory usage accumulated and the collector eventually OOMed.

What version did you use?
Version: v0.25.0, v0.26.0

What config did you use?
Config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 64
      http:
        endpoint: 0.0.0.0:55681

processors:
  batch:
  memory_limiter:
    check_interval: 5s
    limit_mib: 6000
    spike_limit_percentage: 30

exporters:
  kafka:
    timeout: 15s
    retry_on_failure:
      enabled: false
    protocol_version: 2.2.0
    sending_queue:
      enabled: true
      num_consumers: 32
      queue_size: 10000
    brokers:
      - xxxx:9092
extensions:
  pprof:

service:
  extensions: [pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [kafka]

Environment
OS: CentOS 7 (kernel 3.10)
Compiler (if manually compiled): go1.16.3

Additional context
Maybe we can pass the traceID as the key for otlp_proto encoding too, or special-case otlp_proto to not set the key at all. I can open a PR for this if that's OK.

EDIT:
For otlp_proto, the current marshal implementation doesn't actually look inside each ExportTraceServiceRequest struct; it just sends it directly, which seems fast. If it had to extract the traceID, performance might still not be as good as the old version. So personally I would prefer a change to the sending part (i.e. don't set Key) rather than the marshal part (pass the traceID).
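A rough sketch of the "don't set Key" idea: when no key is present, a producer can fall back to spreading messages over all partitions (round-robin here for determinism), while keyed messages still hash as before. The `message` struct and `partition` function below are illustrative stand-ins, not sarama's API:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// message is a minimal stand-in for a producer message; only the
// fields relevant to this sketch.
type message struct {
	key   []byte // nil means "no key set"
	value []byte
}

// partition sketches the proposed behavior: with no key, rotate
// round-robin over all partitions; with a key (as the jaeger
// encodings set), hash it so spans of one trace stay together.
func partition(msg message, numPartitions int32, next *int32) int32 {
	if msg.key == nil {
		p := *next % numPartitions
		*next++
		return p
	}
	h := fnv.New32a()
	h.Write(msg.key)
	p := int32(h.Sum32()) % numPartitions
	if p < 0 {
		p = -p
	}
	return p
}

func main() {
	var next int32
	// Keyless otlp_proto messages rotate over all 5 partitions
	// instead of piling onto one.
	for i := 0; i < 5; i++ {
		fmt.Println(partition(message{value: []byte("span batch")}, 5, &next))
	}
}
```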
