[BUG] regression: cgroup throttle metrics missing on ECS #42479

@lattwood

Describe what happened:

(docker|container).cpu.throttled and (docker|container).cpu.throttled.time never report non-zero values when running on ECS.

When cgroup parsing was removed (and partially re-added in 7.45 for ECS, in 5c80d06), the fix was incomplete: it didn't include fetching throttle metrics from the parent cgroup.

Because of how ECS builds its cgroup hierarchy, nr_periods, nr_throttled, and throttled_usec, all crucial for knowing when you need to increase CPU allocations, are only non-zero on the task-level ecstasks-SHA.slice cgroup, as that is where the CPU limit is defined. The container's own cgroup reports zeros for all three.
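
To illustrate the lookup I'd expect, here is a rough Python sketch (not the agent's actual code) that starts at a container's cgroup directory and walks up until it finds a cpu.stat with a non-zero nr_periods. It assumes cgroup v2 and the "key value" line format of cpu.stat. The function names parse_cpu_stat and throttle_stats, and the paths in the usage note, are hypothetical and only for illustration.

```python
from pathlib import Path

# The three throttle-related fields mentioned in this issue.
THROTTLE_KEYS = ("nr_periods", "nr_throttled", "throttled_usec")


def parse_cpu_stat(text):
    """Parse cpu.stat content ("key value" per line) into a dict of ints.

    Lines that do not match that shape are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_stats(cgroup_dir, root):
    """Return (cgroup_path, stats) for the nearest cgroup with throttle data.

    Starts at cgroup_dir and walks up towards root (hypothetical arguments).
    The first cgroup whose nr_periods is non-zero wins, on the assumption
    that this is where the CPU limit is enforced. Returns None if no
    cgroup up to and including root has a non-zero nr_periods.
    """
    current = Path(cgroup_dir)
    root = Path(root)
    while True:
        stat_file = current / "cpu.stat"
        if stat_file.is_file():
            stats = parse_cpu_stat(stat_file.read_text())
            if stats.get("nr_periods", 0) > 0:
                return current, {key: stats.get(key, 0) for key in THROTTLE_KEYS}
        # Stop at the configured root, or at the filesystem root as a safety net.
        if current == root or current.parent == current:
            return None
        current = current.parent
```

On an ECS host this would be called with something like throttle_stats(container_cgroup_path, "/sys/fs/cgroup"), where container_cgroup_path is whatever directory the container's cgroup resolves to. With the hierarchy described above, the returned path should be the ecstasks-SHA.slice directory rather than the container's own cgroup.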

Describe what you expected:

I expected (docker|container).cpu.throttled and (docker|container).cpu.throttled.time to have meaningful and actionable data instead of all zeroes.

Steps to reproduce the issue:

  • Use ECS
  • Use Datadog
  • Run a service with a tiny CPU allocation and beat it up
  • It doesn't cry uncle (aka raise throttle metrics)
