Currently there is no way to benchmark pipelines and their components. It's all either manual (intuition) or done via logging and deriving results from traces/spans. While this works, it adds latency and misses the few milliseconds actually spent on network or function calls. A native way to benchmark pipelines would make it easy to compare results and statistically derive valuable metrics from a pipeline.
Describe the solution you'd like
- Benchmarking as part of `Pipeline()` itself
- No external dependencies (using default Python methods)
- Results should be reported at both the pipeline level and per component
- Use percentiles instead of averages, as they give a more accurate, user-centric view of how the pipeline actually performs in the real world. p50, p90, and p99 are a must, with avg and total also displayed.
- Display the benchmark results in a user-friendly way.
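A minimal stdlib-only sketch of what the proposed metrics could look like. The `benchmark` helper and its signature are hypothetical, not part of the Haystack API; it just shows that p50/p90/p99, avg, and total can be computed with `time.perf_counter` and `statistics.quantiles` alone, per component or for the whole pipeline.

```python
import statistics
import time


def benchmark(fn, *args, runs=100, **kwargs):
    """Time repeated calls to `fn` and report percentile latencies.

    Hypothetical helper sketching the proposed feature; `fn` could be
    a single component's run method or the whole Pipeline.run.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)

    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    q = statistics.quantiles(timings, n=100)
    return {
        "p50": q[49],
        "p90": q[89],
        "p99": q[98],
        "avg": statistics.fmean(timings),
        "total": sum(timings),
    }
```

Running this once per component and once for the full pipeline gives both levels of results without any external dependency.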
Describe alternatives you've considered
- Tracing the pipeline
- Retrieving timestamps from the tracer spans.
- Then computing per-component and pipeline-level metrics and deducing the results from those.
This isn't a good DX, and the metrics won't match real-world pipeline runs either.
Additional context
Raising a PR for review since I feel this is a good value addition to Haystack pipelines.