Description
Component(s)
exporter/exporterhelper
What happened?
Describe the bug
When an exporter is configured with sending_queue.sizer: bytes and sending_queue.batch.max_size, and a single telemetry item is larger than sending_queue.batch.max_size, the collector starts consuming memory indefinitely and is eventually OOMKilled.
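For illustration, here is a minimal, hypothetical Go sketch of the failure mode I suspect; the request type and split function are made up for this demo and are not the actual exporterhelper code. If splitting moves only whole items into the outgoing batch, an item larger than max_size produces an empty head and an unchanged remainder, so the loop never makes progress and keeps allocating:

package main

import "fmt"

// request stands in for a queued exporter request; items holds the
// byte size of each telemetry item (hypothetical types for this demo).
type request struct{ items []int }

func (r request) size() int {
	total := 0
	for _, s := range r.items {
		total += s
	}
	return total
}

// split moves whole items into head until maxSize is reached. When the
// first item alone exceeds maxSize, head stays empty and rest is
// identical to the input: no progress is possible.
func split(r request, maxSize int) (head, rest request) {
	budget := maxSize
	for _, s := range r.items {
		if s > budget {
			break
		}
		head.items = append(head.items, s)
		budget -= s
	}
	rest.items = r.items[len(head.items):]
	return head, rest
}

func main() {
	req := request{items: []int{120}} // one item of 120 bytes
	const maxSize = 10                // mirrors batch.max_size: 10

	// Capped at 5 iterations for the demo; the real loop would spin
	// forever, allocating on every pass until the process is OOMKilled.
	for i := 0; req.size() > maxSize && i < 5; i++ {
		head, rest := split(req, maxSize)
		fmt.Printf("iteration %d: head=%d bytes, rest=%d bytes\n", i, head.size(), rest.size())
		req = rest
	}
}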
Steps to reproduce
Set up a collector configuration with an exporter that supports exporterhelper and include the following settings:

sending_queue:
  sizer: bytes
  batch:
    max_size: 10

Then run the collector and send telemetry data. For example, send a single trace to the OTLP receiver with telemetrygen traces --otlp-insecure --traces 1.
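To see why even a single span trips this limit, here is a small probe (an assumption on my part: I take the bytes sizer to measure the protobuf-encoded size, which ptrace.ProtoMarshaler's TracesSize reports). Even the most minimal single-span trace encodes to far more than the 10 bytes configured above:

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

func main() {
	// Build the smallest possible trace: one resource, one scope, one span.
	td := ptrace.NewTraces()
	span := td.ResourceSpans().AppendEmpty().
		ScopeSpans().AppendEmpty().
		Spans().AppendEmpty()
	span.SetName("probe")

	// Assumption: the queue's bytes sizer tracks the protobuf-encoded
	// size, which TracesSize reports without building the payload.
	m := &ptrace.ProtoMarshaler{}
	fmt.Printf("encoded size: %d bytes\n", m.TracesSize(td)) // well above max_size: 10
}

So any trace produced by telemetrygen is guaranteed to exceed max_size: 10.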
What did you expect to see?
The telemetry data should be dropped and an error should be logged.
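As a hedged sketch of that expectation (a hypothetical guard, not the actual exporterhelper API): an item that can never fit into a batch should be rejected up front with an error log, instead of being retried indefinitely:

package main

import (
	"errors"
	"log"
)

var errItemTooLarge = errors.New("item exceeds sending_queue.batch.max_size")

// enqueue is a hypothetical guard illustrating the expected behavior:
// drop oversized items immediately and surface an error, rather than
// looping on a split that can never succeed.
func enqueue(itemSize, maxSize int) error {
	if maxSize > 0 && itemSize > maxSize {
		log.Printf("dropping item: size=%d bytes exceeds max_size=%d bytes", itemSize, maxSize)
		return errItemTooLarge
	}
	return nil // small enough; hand off to the batcher as usual
}

func main() {
	if err := enqueue(120, 10); err != nil {
		log.Println("enqueue failed:", err)
	}
}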
What did you see instead?
I didn't see any error output. Instead, CPU and memory usage kept increasing, and the process was eventually OOMKilled.
dmesg showed that the otelcol-dev process was killed by the OOM killer:
> sudo dmesg -T | egrep -i 'killed process|oom.kill'
[Fri Apr 18 12:37:16 2025] qemu-system-x86 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=200
[Fri Apr 18 12:37:16 2025] oom_kill_process+0x118/0x280
[Fri Apr 18 12:37:16 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-281eff27-82db-4b5a-8dd7-c264f1daffe7.scope,task=otelcol-dev,pid=1542808,uid=2013
[Fri Apr 18 12:37:16 2025] Out of memory: Killed process 1542808 (otelcol-dev) total-vm:32036360kB, anon-rss:24105632kB, file-rss:516kB, shmem-rss:0kB, UID:2013 pgtables:55032kB oom_score_adj:100
[Fri Apr 18 13:14:50 2025] code invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=100
[Fri Apr 18 13:14:50 2025] oom_kill_process+0x118/0x280
[Fri Apr 18 13:14:50 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-41f0b67f-58b9-4eab-b419-24031351ed25.scope,task=builder,pid=1663701,uid=2013
[Fri Apr 18 13:14:51 2025] Out of memory: Killed process 1663701 (builder) total-vm:27390444kB, anon-rss:23558260kB, file-rss:1500kB, shmem-rss:0kB, UID:2013 pgtables:49208kB oom_score_adj:100
[Fri Apr 18 13:25:41 2025] wpa_supplicant invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[Fri Apr 18 13:25:41 2025] oom_kill_process+0x118/0x280
[Fri Apr 18 13:25:41 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=wpa_supplicant.service,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-1c691201-4657-4fc1-a2fd-746d93188ea9.scope,task=builder,pid=1693988,uid=2013
[Fri Apr 18 13:25:41 2025] Out of memory: Killed process 1693988 (builder) total-vm:39965976kB, anon-rss:24978248kB, file-rss:532kB, shmem-rss:0kB, UID:2013 pgtables:65084kB oom_score_adj:100

Collector version
v0.124.0
Environment information
Environment
OS: Ubuntu 24.04
Compiler: go1.24.1, ocb: v0.124.0
OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: localhost:14317
    tls:
      insecure: true
    sending_queue:
      sizer: bytes
      queue_size: 10000
      batch:
        flush_timeout: 1s
        max_size: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
  telemetry:
    logs:
      level: debug

Log output
go run ./otelcol-dev --config otel-collector-config.yaml
2025-04-18T11:52:52.486-0700 info [email protected]/service.go:199 Setting up own telemetry...
2025-04-18T11:52:52.492-0700 debug builders/builders.go:24 Stable component.
2025-04-18T11:52:52.496-0700 debug builders/builders.go:24 Stable component.
2025-04-18T11:52:52.496-0700 debug [email protected]/otlp.go:58 created signal-agnostic logger
2025-04-18T11:52:52.514-0700 info [email protected]/service.go:266 Starting otelcol-dev... {"Version": "", "NumCPU": 12}
2025-04-18T11:52:52.514-0700 info extensions/extensions.go:41 Starting extensions...
2025-04-18T11:52:52.515-0700 info [email protected]/clientconn.go:176 [core] original dial target is: "localhost:14317" {"grpc_log": true}
2025-04-18T11:52:52.517-0700 info [email protected]/clientconn.go:459 [core] [Channel #1]Channel created {"grpc_log": true}
2025-04-18T11:52:52.517-0700 info [email protected]/clientconn.go:207 [core] [Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/localhost:14317", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}} {"grpc_log": true}
2025-04-18T11:52:52.517-0700 info [email protected]/clientconn.go:208 [core] [Channel #1]Channel authority set to "localhost:14317" {"grpc_log": true}
2025-04-18T11:52:52.523-0700 info [email protected]/resolver_wrapper.go:210 [core] [Channel #1]Resolver state updated: {
"Addresses": [
{
"Addr": "localhost:14317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "localhost:14317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses) {"grpc_log": true}
2025-04-18T11:52:52.525-0700 info [email protected]/balancer_wrapper.go:122 [core] [Channel #1]Channel switches to new LB policy "pick_first" {"grpc_log": true}
2025-04-18T11:52:52.526-0700 info gracefulswitch/gracefulswitch.go:194 [pick-first-lb] [pick-first-lb 0xc000139f50] Received new config {
"shuffleAddressList": false
}, resolver state {
"Addresses": [
{
"Addr": "localhost:14317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "localhost:14317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} {"grpc_log": true}
2025-04-18T11:52:52.527-0700 info [email protected]/balancer_wrapper.go:195 [core] [Channel #1 SubChannel #2]Subchannel created {"grpc_log": true}
2025-04-18T11:52:52.528-0700 info [email protected]/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to CONNECTING {"grpc_log": true}
2025-04-18T11:52:52.529-0700 info [email protected]/clientconn.go:364 [core] [Channel #1]Channel exiting idle mode {"grpc_log": true}
2025-04-18T11:52:52.530-0700 info [email protected]/server.go:690 [core] [Server #3]Server created {"grpc_log": true}
2025-04-18T11:52:52.531-0700 info [email protected]/otlp.go:116 Starting GRPC server {"endpoint": "0.0.0.0:4317"}
2025-04-18T11:52:52.531-0700 info [email protected]/service.go:289 Everything is ready. Begin running and processing data.
2025-04-18T11:52:52.532-0700 info [email protected]/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING {"grpc_log": true}
2025-04-18T11:52:52.532-0700 info [email protected]/clientconn.go:1344 [core] [Channel #1 SubChannel #2]Subchannel picks a new address "localhost:14317" to connect {"grpc_log": true}
2025-04-18T11:52:52.533-0700 info [email protected]/server.go:886 [core] [Server #3 ListenSocket #4]ListenSocket created {"grpc_log": true}
2025-04-18T11:52:52.533-0700 info pickfirst/pickfirst.go:184 [pick-first-lb] [pick-first-lb 0xc000139f50] Received SubConn state update: 0xc00027e370, {ConnectivityState:CONNECTING ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}} {"grpc_log": true}
2025-04-18T11:52:52.548-0700 info [email protected]/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to READY {"grpc_log": true}
2025-04-18T11:52:52.548-0700 info pickfirst/pickfirst.go:184 [pick-first-lb] [pick-first-lb 0xc000139f50] Received SubConn state update: 0xc00027e370, {ConnectivityState:READY ConnectionError:<nil> connectedAddress:{Addr:localhost:14317 ServerName:localhost:14317 Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}} {"grpc_log": true}
2025-04-18T11:52:52.548-0700 info [email protected]/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to READY {"grpc_log": true}
2025-04-18T11:52:55.995-0700 info transport/http2_server.go:662 [transport] [server-transport 0xc0000f6000] Closing: read tcp 127.0.0.1:4317->127.0.0.1:46322: read: connection reset by peer {"grpc_log": true}
2025-04-18T11:52:55.997-0700 info transport/controlbuf.go:577 [transport] [server-transport 0xc0000f6000] loopyWriter exiting with error: transport closed by client {"grpc_log": true}

Additional context
No response