
Performance optimization #2

Merged
umputun merged 9 commits into master from perf on Feb 14, 2025

Conversation

@umputun (Member) commented on Feb 14, 2025

This PR is the result of benchmarking the pool's performance against two typical approaches: errgroup and a manual pool of goroutines fed by a channel. Based on the results, I have made several changes:

  1. Dispatch to per-worker channels on Submit only when such dispatch is actually needed; otherwise, workers pull from a shared channel in the more typical way. The main reason for this change is to minimize possible sleeps on the worker side caused by the random nature of push distribution: not all tasks are equal, and in some cases such distribution hurts performance. Letting workers pull from a shared channel is simpler, more predictable, and balances the load automatically.
  2. Make batching smarter. To eliminate the need for a mutex when collecting batches, it uses either the shared channel or the per-worker channels, but in both modes collects batches per worker.
  3. Set the default batch size to 10, as it improves performance significantly by reducing channel communication.
  4. Add a set of performance tests and benchmarks. The performance tests also catch possible regressions.

The end result is not bad: the pool outperforms a same-sized errgroup or manual pool in every tested case. Please note: the traditional (manual) pool showed results almost identical to errgroup.

| Mode | Run 1 | Run 2 | Average Time | Improvement vs errgroup | Bench Time (ns/op) | Bench Improvement |
|------|-------|-------|--------------|-------------------------|--------------------|-------------------|
| errgroup | 3.71s | 3.70s | 3.71s | baseline | 37,121,126 | baseline |
| pool default | 1.94s | 1.96s | 1.95s | 1.90x faster | 22,531,142 | 1.65x faster |
| pool with chan=100 | 2.01s | 2.05s | 2.03s | 1.83x faster | 23,172,449 | 1.60x faster |
| pool with batch=100 | 1.67s | 1.80s | 1.74s | 2.13x faster | 18,053,358 | 2.06x faster |
| pool with batch+chunk | 1.74s | 1.73s | 1.74s | 2.13x faster | 18,163,251 | 2.04x faster |

Key observations:

  • Direct tests show up to a 2.13x improvement over errgroup
  • Benchmarks confirm a significant speedup, with 1.60x-2.06x improvements
  • Batching provides the best performance boost in both tests and benchmarks
  • Increasing the channel size alone gives a more modest 1.83x improvement
  • Chunking maintains the performance benefits of batching (2.13x faster)

Batching with size 100 provides the largest improvement, more than doubling performance compared to errgroup.
The results suggest that batching is the key performance optimization, while chunking adds functionality (predictable distribution) without a significant performance cost.

umputun merged commit 6a04124 into master on Feb 14, 2025
2 checks passed
umputun deleted the perf branch on February 14, 2025, 21:21