fix(loki.process): Make limit stage shutdown cancelable by boinger · Pull Request #6215 · grafana/alloy

boinger · 2026-05-09T20:13:39Z

Pull Request Details

When loki.process has stage.limit { drop = false }, the stage's
shouldThrottle calls rateLimiter.Wait, which blocks until the next
token. The Run goroutine only exits its range loop between iterations,
so a goroutine inside Wait doesn't observe close(in). With a low
rate, EntryHandler.Stop hangs for the full token interval.

This adds an optional Stopper interface (Stop()) called by
Pipeline.Stop before wg.Wait. limitStage cancels its internal
context to unblock Wait. Pipeline itself implements Stopper so
nested inner pipelines (via matcherStage) forward recursively.

Scope

The by-label code path (stage.limit { by_label_name = "..." }) is
not affected; validateLimitConfig requires drop = true with
by_label_name, so that path uses rl.Allow() and never reaches a
Wait call.

Issue(s) fixed by this Pull Request

None. Goroutine leak triggered only on shutdown with non-trivial rate limits.

Notes to the Reviewer

Repro: stage.limit { rate = 1, burst = 1, drop = false }, sustained
input, SIGTERM. Without the fix, alloy hangs.

Regression tests in limit_test.go and match_test.go share an
assertPipelineStopsPromptly helper using channel-based synchronization:
send one entry, receive it (signaling the first iteration completed),
send a second entry that blocks Wait, assert EntryHandler.Stop
returns within 2s. The match_test.go case exercises
matcherStage.Stop forwarding through its inner Pipeline.

Verified with go test -race -count=1 ./internal/component/loki/process/...
and make lint.

PR Checklist

Documentation added (n/a)
Tests updated
Config converters updated (n/a)

kalleep · 2026-05-11T07:00:46Z

Hey thanks for the pr, ideally you would create an issue before writing a pr and please keep that in mind in the future.

Yes this is a bug but there is a way easier solution that create an additional optional interface for pipeline stages. Stage interface includes a Cleanup function that is called whenever a pipeline is stopped and we should use that instead.

boinger · 2026-05-11T18:21:59Z

Thanks. I will open issues first in the future; I appreciate the steering.

I considered the Cleanup approach first; the current shutdown order has Cleanup running after wg.Wait, so a goroutine stuck in rateLimiter.Wait prevents wg.Wait from returning and Cleanup never gets called.

Should I reorder shutdown to call Cleanup before wg.Wait, or is there a path I'm missing?
Maybe call Cleanup in parallel with wg.Wait so it can serve as a cancellation signal?

Either option shifts the Cleanup invariant for all stages. Most are no-ops, but metric.go's Cleanup mutates state, so I want to make sure I'm not missing a constraint.

kalleep · 2026-05-12T07:36:53Z

I took another look and it's not actually safe to use Cleanup, that would break metrics stage because the actions need's to happen after we don't get any more entries.

Then I would prefer to have an optional interface for calling Stop() on a stage instead of BindContext, the latter is not that clear IMO. Stop should be called before we wait on wg. I am fine with adding comment where we think comments are needed but as for the other pr this is really excessive and just generates lots of noise, I am pretty sure you are doing most of these pr:s using AI. And while it's fine to use AI be mindful that it will consume our (alloy maintainers) time to review them. Hence why we ideally want issues before we get a bunch of pr:s so we can triage them.

rateLimiter.Wait in shouldThrottle blocks until the next token, and the Run goroutine's range loop only observes close(in) between iterations. EntryHandler.Stop hangs for the full token interval at low rates. Add an optional Stopper interface, called by Pipeline.Stop before wg.Wait. limitStage cancels its internal context to unblock Wait; Pipeline implements Stopper so matchers' inner pipelines forward.

boinger · 2026-05-16T19:32:55Z

Reworked per your Stop() direction. One note: Pipeline implements Stopper so matcherStage.Stop just forwards via m.pipeline.Stop(). Two regression tests added (TestLimitWaitPipelineShutdown and TestMatchNestedLimitShutdown), both using channel-based sync rather than time.Sleep.

kalleep

LGTM, thanks

boinger requested a review from a team as a code owner May 9, 2026 20:13

boinger force-pushed the fix/loki-process-limit-shutdown-hang branch from e94765d to 3bc7aca Compare May 16, 2026 19:26

kalleep approved these changes May 18, 2026

View reviewed changes

kalleep merged commit 20717b5 into grafana:main May 18, 2026
47 of 48 checks passed

boinger deleted the fix/loki-process-limit-shutdown-hang branch May 18, 2026 14:53

grafana-alloybot Bot mentioned this pull request May 20, 2026

chore(main): Release 1.17.0 #6111

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(loki.process): Make limit stage shutdown cancelable#6215

fix(loki.process): Make limit stage shutdown cancelable#6215
kalleep merged 1 commit into
grafana:mainfrom
boinger:fix/loki-process-limit-shutdown-hang

boinger commented May 9, 2026 •

edited

Loading

Uh oh!

kalleep commented May 11, 2026

Uh oh!

boinger commented May 11, 2026

Uh oh!

kalleep commented May 12, 2026

Uh oh!

boinger commented May 16, 2026

Uh oh!

kalleep left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

boinger commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Details

Scope

Issue(s) fixed by this Pull Request

Notes to the Reviewer

PR Checklist

Uh oh!

kalleep commented May 11, 2026

Uh oh!

boinger commented May 11, 2026

Uh oh!

kalleep commented May 12, 2026

Uh oh!

boinger commented May 16, 2026

Uh oh!

kalleep left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

boinger commented May 9, 2026 •

edited

Loading