
Performance optimization #2

Merged
umputun merged 9 commits into master from perf on Feb 14, 2025

Conversation

@umputun (Member) commented on Feb 14, 2025

This PR is the result of benchmarking the pool's performance against two typical approaches: errgroup and a manual pool of goroutines fed by a channel. Based on the results, I have made several changes:

  1. Dispatch to per-worker channels on Submit only when such dispatch is actually needed; otherwise, workers pull from a shared channel in the more typical way. The main reason for this change is to minimize possible sleeps on the worker side caused by the random nature of push distribution: not all tasks are equal, and in some cases such distribution hurts performance. Letting workers pull from a shared channel is simpler, more predictable, and balances the load automatically.
  2. Make batching smarter. To eliminate the need for a mutex when collecting batches, it uses either the shared channel or the per-worker channels, but in both modes collects batches per worker.
  3. Set the default batch size to 10, as it improves performance significantly by reducing channel communication.
  4. Add a set of performance tests and benchmarks. The performance tests also catch possible regressions.

The end result is not bad: the pool outperforms a same-sized errgroup or manual pool in every tested case. Please note: the traditional (manual) pool showed results almost identical to errgroup.

| Mode | Run 1 | Run 2 | Average Time | Improvement vs errgroup | Bench Time (ns/op) | Bench Improvement |
|------|-------|-------|--------------|-------------------------|--------------------|-------------------|
| errgroup | 3.71s | 3.70s | 3.71s | baseline | 37,121,126 | baseline |
| pool default | 1.94s | 1.96s | 1.95s | 1.90x faster | 22,531,142 | 1.65x faster |
| pool with chan=100 | 2.01s | 2.05s | 2.03s | 1.83x faster | 23,172,449 | 1.60x faster |
| pool with batch=100 | 1.67s | 1.80s | 1.74s | 2.13x faster | 18,053,358 | 2.06x faster |
| pool with batch+chunk | 1.74s | 1.73s | 1.74s | 2.13x faster | 18,163,251 | 2.04x faster |

Key observations:

  • Direct tests show up to a 2.13x improvement over errgroup
  • Benchmarks confirm a significant speedup, with 1.60x-2.06x improvements
  • Batching provides the best performance boost in both tests and benchmarks
  • Increasing the channel size alone gives a more modest 1.83x improvement
  • Chunking maintains the performance benefits of batching (2.13x faster)

Batching with size 100 provides the largest improvement, more than doubling performance compared to errgroup.
The results suggest that batching is the key performance optimization, while chunking adds functionality (predictable distribution) without a significant performance cost.

umputun merged commit 6a04124 into master on Feb 14, 2025
2 checks passed
umputun deleted the perf branch on February 14, 2025, 21:21