Optimize Logs SDK BatchProcessor #6569

wmdanor · 2025-03-28T02:15:20Z

Towards: #6382

This PR optimizes the Logs SDK BatchProcessor.

Why?

The batch processor does not check whether the buffer exporter is full. Instead, it clones records into a buffer and then tries to push them to the buffer exporter, which returns false from EnqueueExport if it is full.

This does not cause any performance issues with a NOOP exporter, because it "exports" instantly, so the buffer exporter's input channel is almost never full.

Real exporter implementations do not behave that way: an export operation takes time, so the buffer exporter fills up fairly frequently. Each attempt to export to the buffer exporter costs a copy of the entire queue inside the batch processor.
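A minimal sketch of the idea, assuming a simplified model rather than the actual SDK code: the processor asks the buffer exporter whether it can accept a batch before copying the queue. The types `record`, `bufferExporter`, and `processor`, and the `Ready` and `tryFlush` methods, are hypothetical stand-ins for illustration only. Only the name EnqueueExport comes from the description above.

```go
package main

import "fmt"

// record is a hypothetical stand-in for the SDK's log record type.
type record struct{ body string }

// bufferExporter is a simplified, hypothetical model of a buffering
// exporter: a bounded channel of batches waiting to be exported.
type bufferExporter struct {
	input chan []record
}

// Ready reports whether the input channel currently has spare capacity.
// The answer is only advisory: the channel may fill up right after the check.
func (e *bufferExporter) Ready() bool {
	return len(e.input) < cap(e.input)
}

// EnqueueExport hands a batch over without blocking and reports
// whether the batch was accepted.
func (e *bufferExporter) EnqueueExport(batch []record) bool {
	select {
	case e.input <- batch:
		return true
	default:
		return false
	}
}

// processor is a hypothetical, single-goroutine model of a batch processor.
type processor struct {
	queue  []record
	exp    *bufferExporter
	copies int // how many times the queue was copied
}

// tryFlush copies the queue only when the exporter reports it is ready.
func (p *processor) tryFlush() bool {
	if !p.exp.Ready() {
		return false // exporter is full: skip the copy entirely
	}
	batch := make([]record, len(p.queue))
	copy(batch, p.queue)
	p.copies++
	if !p.exp.EnqueueExport(batch) {
		return false // exporter filled up after the check; records stay queued
	}
	p.queue = p.queue[:0]
	return true
}

func main() {
	p := &processor{exp: &bufferExporter{input: make(chan []record, 1)}}

	p.queue = append(p.queue, record{"a"})
	fmt.Println(p.tryFlush(), p.copies) // exporter has room, queue is copied once

	p.queue = append(p.queue, record{"b"})
	fmt.Println(p.tryFlush(), p.copies) // exporter is full, no additional copy
}
```

In this sketch the readiness check is an optimization, not a guarantee, so the enqueue result is still checked and the records remain queued when the hand-off fails.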

To catch this or similar performance bottlenecks in the future, a new benchmark (BatchSimulateExport) was added. It uses an exporter that simulates a non-instant export by sleeping for 5ms.

Benchmarks

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/log
cpu: AMD Ryzen 9 7900X 12-Core Processor
                                 │   base.txt   │               new.txt               │
                                 │    sec/op    │   sec/op     vs base                │
Processor/Simple-24                 340.9n ± 2%   343.5n ± 5%        ~ (p=0.971 n=10)
Processor/Batch-24                  667.6n ± 2%   666.3n ± 4%        ~ (p=0.896 n=10)
Processor/BatchSimulateExport-24   4441.5n ± 5%   526.4n ± 4%  -88.15% (p=0.000 n=10)
Processor/SetTimestampSimple-24     360.0n ± 4%   351.0n ± 1%        ~ (p=0.247 n=10)
Processor/SetTimestampBatch-24      669.8n ± 3%   647.0n ± 5%        ~ (p=0.052 n=10)
Processor/AddAttributesSimple-24    383.9n ± 1%   386.6n ± 5%        ~ (p=0.138 n=10)
Processor/AddAttributesBatch-24     759.9n ± 6%   734.5n ± 5%        ~ (p=0.481 n=10)
Processor/SetAttributesSimple-24    369.9n ± 2%   372.4n ± 3%        ~ (p=0.643 n=10)
Processor/SetAttributesBatch-24     684.2n ± 4%   674.0n ± 8%        ~ (p=0.529 n=10)
geomean                             639.9n        499.9n       -21.88%

                                 │   base.txt   │                new.txt                 │
                                 │     B/op     │     B/op      vs base                  │
Processor/Simple-24                  450.0 ± 0%     450.0 ± 0%        ~ (p=1.000 n=10) ¹
Processor/Batch-24                 1.153Ki ± 2%   1.125Ki ± 1%   -2.41% (p=0.001 n=10)
Processor/BatchSimulateExport-24     649.0 ± 1%     473.0 ± 0%  -27.12% (p=0.000 n=10)
Processor/SetTimestampSimple-24      450.0 ± 0%     450.0 ± 0%        ~ (p=1.000 n=10) ¹
Processor/SetTimestampBatch-24     1.145Ki ± 2%   1.122Ki ± 2%   -2.01% (p=0.003 n=10)
Processor/AddAttributesSimple-24     450.0 ± 0%     450.0 ± 0%        ~ (p=1.000 n=10) ¹
Processor/AddAttributesBatch-24    1.169Ki ± 2%   1.148Ki ± 1%   -1.80% (p=0.000 n=10)
Processor/SetAttributesSimple-24     450.0 ± 0%     450.0 ± 0%        ~ (p=1.000 n=10) ¹
Processor/SetAttributesBatch-24    1.141Ki ± 2%   1.128Ki ± 2%   -1.11% (p=0.030 n=10)
geomean                              719.3          688.7        -4.24%
¹ all samples are equal

                                 │  base.txt  │               new.txt               │
                                 │ allocs/op  │ allocs/op   vs base                 │
Processor/Simple-24                1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/Batch-24                 1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/BatchSimulateExport-24   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/SetTimestampSimple-24    1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/SetTimestampBatch-24     1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/AddAttributesSimple-24   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/AddAttributesBatch-24    1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/SetAttributesSimple-24   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Processor/SetAttributesBatch-24    1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                            1.000        1.000       +0.00%
¹ all samples are equal

Benchmarks from issue (external)

goos: linux
goarch: amd64
pkg: github.com/pellared/spanevents-vs-logs
cpu: AMD Ryzen 9 7900X 12-Core Processor
           │ spanevents.txt │              logs.base.txt              │             logs.new.txt             │
           │     sec/op     │    sec/op      vs base                  │   sec/op     vs base                 │
/OTLP-24        41.40µ ± 3%   2935.15µ ± 9%  +6989.05% (p=0.000 n=10)   84.19µ ± 4%  +103.34% (p=0.000 n=10)
/STDOUT-24      39.55µ ± 4%    623.45µ ± 1%  +1476.21% (p=0.000 n=10)   89.51µ ± 3%  +126.29% (p=0.000 n=10)
/NOOP-24       13.179µ ± 2%     4.950µ ± 1%    -62.44% (p=0.000 n=10)   4.932µ ± 1%   -62.58% (p=0.000 n=10)
geomean         27.84µ          208.4µ        +648.68%                  33.37µ        +19.86%

           │ spanevents.txt │             logs.base.txt              │             logs.new.txt             │
           │      B/op      │     B/op       vs base                 │     B/op      vs base                │
/OTLP-24       98.90Ki ± 8%   210.44Ki ± 0%  +112.79% (p=0.000 n=10)   52.01Ki ± 0%  -47.41% (p=0.000 n=10)
/STDOUT-24     91.01Ki ± 0%   286.62Ki ± 0%  +214.93% (p=0.000 n=10)   81.67Ki ± 1%  -10.27% (p=0.000 n=10)
/NOOP-24       29840.0 ± 0%      240.0 ± 0%   -99.20% (p=0.000 n=10)     240.0 ± 0%  -99.20% (p=0.000 n=10)
geomean        64.01Ki         24.18Ki        -62.23%                  9.985Ki       -84.40%

           │ spanevents.txt │            logs.base.txt             │            logs.new.txt            │
           │   allocs/op    │  allocs/op   vs base                 │ allocs/op   vs base                │
/OTLP-24        699.0 ± 20%   2174.5 ± 1%  +211.09% (p=0.000 n=10)   249.5 ± 1%  -64.31% (p=0.000 n=10)
/STDOUT-24      646.5 ±  1%   4912.5 ± 0%  +659.86% (p=0.000 n=10)   872.5 ± 2%  +34.96% (p=0.000 n=10)
/NOOP-24      303.000 ±  0%    3.000 ± 0%   -99.01% (p=0.000 n=10)   3.000 ± 0%  -99.01% (p=0.000 n=10)
geomean         515.4          317.6        -38.37%                  86.76       -83.17%

goos: linux
goarch: amd64
pkg: github.com/pellared/spanevents-vs-logs
cpu: AMD Ryzen 9 7900X 12-Core Processor
           │ logs.base.txt │            logs.new.txt             │
           │    sec/op     │   sec/op     vs base                │
/OTLP-24     2935.15µ ± 9%   84.19µ ± 4%  -97.13% (p=0.000 n=10)
/STDOUT-24    623.45µ ± 1%   89.51µ ± 3%  -85.64% (p=0.000 n=10)
/NOOP-24       4.950µ ± 1%   4.932µ ± 1%        ~ (p=0.342 n=10)
geomean        208.4µ        33.37µ       -83.99%

           │ logs.base.txt │              logs.new.txt              │
           │     B/op      │     B/op      vs base                  │
/OTLP-24     210.44Ki ± 0%   52.01Ki ± 0%  -75.28% (p=0.000 n=10)
/STDOUT-24   286.62Ki ± 0%   81.67Ki ± 1%  -71.51% (p=0.000 n=10)
/NOOP-24        240.0 ± 0%     240.0 ± 0%        ~ (p=1.000 n=10) ¹
geomean       24.18Ki        9.985Ki       -58.70%
¹ all samples are equal

           │ logs.base.txt │             logs.new.txt             │
           │   allocs/op   │ allocs/op   vs base                  │
/OTLP-24       2174.5 ± 1%   249.5 ± 1%  -88.53% (p=0.000 n=10)
/STDOUT-24     4912.5 ± 0%   872.5 ± 2%  -82.24% (p=0.000 n=10)
/NOOP-24        3.000 ± 0%   3.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean         317.6        86.76       -72.69%
¹ all samples are equal

sdk/log/exporter.go

pellared

Batch processor does not check if buffer exporter is full, instead it clones records to buffer, then tries to push them to buffer exporter, and then buffer exporter will return false from EnqueueExport if it is full.
[...]
But actual implementations of exporters are not going to act that way, they need some time for export operation. And so buffer exporter will get full fairly frequently. Each export attempt to buffer exporter costs us copying entire queue inside of batch processor.

This is a very good finding 👍

PS. I am happy that improving the logs batching processor does not seem to require making changes in the exported API surface 🎉

sdk/log/batch.go

pellared · 2025-03-28T10:51:12Z

@wmdanor, do you think it is ready for review?

sdk/log/batch.go

wmdanor · 2025-03-28T17:49:32Z

@wmdanor, do you think it is ready for review?

I noticed that CI failed this morning, so I put the PR back to draft, as I did not have enough time to fix it before going to work.

I saw that you helped me resolve those issues, thank you for that.

I have addressed your comment about b.exporter.IsQueueFull() { continue }, so it should be ready now.

Other than that, there is the race condition comment you left. Should I try to add some mutex locks for additional safety in this PR, or do we merge this one as good enough?

pellared · 2025-03-28T18:16:29Z

I think after this is addressed: #6569 (comment) it can be reviewed by others.

wmdanor · 2025-03-28T18:35:45Z

I think after this is addressed: #6569 (comment) it can be reviewed by others.

@pellared pushed the revert.

pellared

@MrAlias, PTAL

sdk/log/exporter.go

MrAlias · 2025-04-10T17:10:10Z

sdk/log/batch.go

+					}
+					return ok
+				})
+			}


Suggested change

-			}
+			} else {
+				qLen = b.q.Len()
+			}

Addresses #6569 (comment)

# Overview

Closes #6786

### Added

- Add exponential histogram support in `go.opentelemetry.io/otel/exporters/prometheus`. (#6421)
- The `go.opentelemetry.io/otel/semconv/v1.31.0` package. The package contains semantic conventions from the `v1.31.0` version of the OpenTelemetry Semantic Conventions. See the [migration documentation](./semconv/v1.31.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.30.0`. (#6479)
- Add `Recording`, `Scope`, and `Record` types in `go.opentelemetry.io/otel/log/logtest`. (#6507)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#6751)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`. (#6752)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6688)
- Add `ValuesGetter` in `go.opentelemetry.io/otel/propagation`, a `TextMapCarrier` that supports retrieving multiple values for a single key. (#5973)
- Add `Values` method to `HeaderCarrier` to implement the new `ValuesGetter` interface in `go.opentelemetry.io/otel/propagation`. (#5973)
- Update `Baggage` in `go.opentelemetry.io/otel/propagation` to retrieve multiple values for a key when the carrier implements `ValuesGetter`. (#5973)
- Add `AssertEqual` function in `go.opentelemetry.io/otel/log/logtest`. (#6662)
- The `go.opentelemetry.io/otel/semconv/v1.32.0` package. The package contains semantic conventions from the `v1.32.0` version of the OpenTelemetry Semantic Conventions. See the [migration documentation](./semconv/v1.32.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.31.0`. (#6782)
- Add `Transform` option in `go.opentelemetry.io/otel/log/logtest`. (#6794)
- Add `Desc` option in `go.opentelemetry.io/otel/log/logtest`. (#6796)

### Removed

- Drop support for [Go 1.22]. (#6381, #6418)
- Remove `Resource` field from `EnabledParameters` in `go.opentelemetry.io/otel/sdk/log`. (#6494)
- Remove `RecordFactory` type from `go.opentelemetry.io/otel/log/logtest`. (#6492)
- Remove `ScopeRecords`, `EmittedRecord`, and `RecordFactory` types from `go.opentelemetry.io/otel/log/logtest`. (#6507)
- Remove `AssertRecordEqual` function in `go.opentelemetry.io/otel/log/logtest`, use `AssertEqual` instead. (#6662)

### Changed

- ⚠️ Update `github.com/prometheus/client_golang` to `v1.21.1`, which changes the `NameValidationScheme` to `UTF8Validation`. This allows metrics names to keep original delimiters (e.g. `.`), rather than replacing with underscores. This can be reverted by setting `github.com/prometheus/common/model.NameValidationScheme` to `LegacyValidation` in `github.com/prometheus/common/model`. (#6433)
- Initialize map with `len(keys)` in `NewAllowKeysFilter` and `NewDenyKeysFilter` to avoid unnecessary allocations in `go.opentelemetry.io/otel/attribute`. (#6455)
- `go.opentelemetry.io/otel/log/logtest` is now a separate Go module. (#6465)
- `go.opentelemetry.io/otel/sdk/log/logtest` is now a separate Go module. (#6466)
- `Recorder` in `go.opentelemetry.io/otel/log/logtest` no longer separately stores records emitted by loggers with the same instrumentation scope. (#6507)
- Improve performance of `BatchProcessor` in `go.opentelemetry.io/otel/sdk/log` by not exporting when exporter cannot accept more. (#6569, #6641)

### Deprecated

- Deprecate support for `model.LegacyValidation` for `go.opentelemetry.io/otel/exporters/prometheus`. (#6449)

### Fixes

- Stop percent encoding header environment variables in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc` and `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6392)
- Ensure the `noopSpan.tracerProvider` method is not inlined in `go.opentelemetry.io/otel/trace` so the `go.opentelemetry.io/auto` instrumentation can instrument non-recording spans. (#6456)
- Use a `sync.Pool` instead of allocating `metricdata.ResourceMetrics` in `go.opentelemetry.io/otel/exporters/prometheus`. (#6472)

---------

Co-authored-by: Tyler Yahn
Co-authored-by: Robert Pająk

why

8e9863a

wmdanor mentioned this pull request Mar 28, 2025

sdk/log: Optimize BatchProcessor #6382

Closed

wmdanor marked this pull request as ready for review March 28, 2025 02:19

wmdanor requested review from MrAlias, XSAM, dashpole, dmathieu and pellared as code owners March 28, 2025 02:19

wmdanor commented Mar 28, 2025

sdk/log/exporter.go (resolved)

wmdanor and others added 2 commits March 28, 2025 02:24

update comment

93b55ea

Merge branch 'main' into logs-perf-actual

ec81485

wmdanor marked this pull request as draft March 28, 2025 07:54

pellared added 3 commits March 28, 2025 10:18

Update exporter.go

15b3f0e

Merge branch 'main' into logs-perf-actual

2d259b1

Update CHANGELOG.md

996346b

pellared reviewed Mar 28, 2025

Update CHANGELOG.md

2c7d8a6

pellared reviewed Mar 28, 2025

sdk/log/batch.go (outdated, resolved)

pellared added 2 commits March 28, 2025 11:19

Merge branch 'main' into logs-perf-actual

087f302

Update batch.go

0f8bc54

pellared approved these changes Mar 28, 2025

pellared self-requested a review March 28, 2025 10:55

pellared requested changes Mar 28, 2025

sdk/log/batch.go (outdated, resolved)

wmdanor and others added 3 commits March 28, 2025 17:39

fix typo

be4cc3d

change skip behaviour

3a8fb6b

Merge branch 'main' into logs-perf-actual

bdec7f0

wmdanor marked this pull request as ready for review March 28, 2025 17:50

wmdanor requested a review from pellared March 28, 2025 17:50

revert skip behaviour

12038f5

wmdanor mentioned this pull request Apr 5, 2025

Refactor Logs SDK Batch Processor wmdanor/opentelemetry-go#3

Draft

pellared approved these changes Apr 8, 2025

pellared reviewed Apr 8, 2025

sdk/log/exporter.go (outdated, resolved)

wmdanor and others added 2 commits April 8, 2025 18:52

replace isqueuefull with ready

668e54f

Merge branch 'main' into logs-perf-actual

49572b3

dmathieu approved these changes Apr 9, 2025

pellared and others added 2 commits April 9, 2025 12:11

Merge branch 'main' into logs-perf-actual

94db019

Merge branch 'main' into logs-perf-actual

61cb20d

dmathieu merged commit a5abd88 into open-telemetry:main Apr 9, 2025
28 of 29 checks passed

MrAlias reviewed Apr 10, 2025

pellared mentioned this pull request Apr 10, 2025

sdk/log: remove unnecessary b.q.Len() call #6641

Merged

pellared added a commit that referenced this pull request Apr 11, 2025

sdk/log: remove unnecessary b.q.Len() call (#6641)

a6e302f

Addresses #6569 (comment)

pellared mentioned this pull request Apr 11, 2025

SIG meeting notes #6648

Open

MrAlias added this to the v1.36.0 milestone Apr 11, 2025

wmdanor mentioned this pull request Apr 17, 2025

[POC] Refactor Logs SDK Batch Processor #6673

Draft

dmathieu mentioned this pull request May 19, 2025

Release v1.36.0/v0.58.0/v0.12.0 #6793

Merged

renovate bot mentioned this pull request Jun 1, 2025

Update go updates arcalot/arcaflow-container-toolkit#185

Merged
