@baszalmstra
Collaborator

This PR replaces the spawn_blocking-based .conda extraction implementation with a fully async approach using astral-async-zip and async-compression. The new implementation eliminates the need for bridging between async and sync contexts, resulting in significant performance improvements for realistic package installation workloads.

This is built on top of #1808 and provides a similar performance gain in download and extraction speeds.

…izations

Replace spawn_blocking-based tar.bz2 extraction with fully async implementation
using astral-tokio-tar and async-compression, eliminating thread pool overhead.

Add benchmark for testing extraction performance across realistic scenarios:
- Pure extraction (disk-based)
- Download + extract (network + disk)
- Mixed workload (varying package sizes)

Tests at concurrency levels 8, 16, and 32 to evaluate scaling behavior
and validate async implementation improvements. Benchmark downloads test
packages spanning 18KB to 8MB to represent real-world package sizes.

- Move rt-multi-thread tokio feature from main dependencies to
  target-specific dev-dependencies since it's only needed for the benchmark
- Convert benchmark from [[bin]] to [[bench]] target with harness=false
- Fix clippy warning by adding #[cfg(unix)] to EXECUTABLE_MODE_BITS constant
  which is only used in unix-specific code
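
The `#[cfg(unix)]` fix above can be illustrated with a minimal sketch. The constant's value (`0o111`) and the `is_executable` helper are assumptions made for illustration and are not taken from this PR; the point is only that gating the constant with the same `cfg` as its sole user avoids an unused-item warning on non-unix targets.

```rust
// Assumed value: the three execute bits (owner, group, other).
// Gated so that non-unix builds never see an unused constant.
#[cfg(unix)]
const EXECUTABLE_MODE_BITS: u32 = 0o111;

/// Hypothetical helper: true if any execute bit is set in `mode`.
#[cfg(unix)]
fn is_executable(mode: u32) -> bool {
    mode & EXECUTABLE_MODE_BITS != 0
}

/// On non-unix targets the mode bits are ignored in this sketch,
/// so the constant above is not referenced at all.
#[cfg(not(unix))]
fn is_executable(_mode: u32) -> bool {
    false
}
```

If the constant were left ungated, a Windows build would compile it without any user, which is the kind of situation the lint reports.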

This extends the async extraction work from PR conda#1808 (tar.bz2) to .conda
archives using the astral_async_zip crate, achieving 46-61% performance
improvements for disk-based extraction.

Key changes:
- Add streaming async extraction using async_zip's stream API
- Add seek-based extraction for random-access sources
- Implement SpooledTempFile-based buffering fallback (5MB threshold)
- Share common extraction logic in new tokio/shared module
- Use futures::io traits with compat adapters where needed
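
The buffering fallback listed above can be sketched as follows. This is a synchronous, std-only stand-in and not the async-spooled-tempfile API: `SpooledBuffer`, its methods, and the caller-supplied spill path are hypothetical names, and reading the 5MB threshold as 5 * 1024 * 1024 bytes is an assumption. It shows only the core idea: data stays in memory until the threshold would be exceeded, then everything moves to a file on disk.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Assumed interpretation of the 5MB threshold from the PR description.
pub const SPOOL_THRESHOLD: usize = 5 * 1024 * 1024;

enum Spooled {
    Memory(Vec<u8>),
    Disk(File),
}

/// Hypothetical spooled buffer: in memory while small, on disk once large.
pub struct SpooledBuffer {
    inner: Spooled,
    threshold: usize,
    spill_path: PathBuf,
}

impl SpooledBuffer {
    pub fn new(threshold: usize, spill_path: PathBuf) -> Self {
        Self { inner: Spooled::Memory(Vec::new()), threshold, spill_path }
    }

    pub fn is_in_memory(&self) -> bool {
        matches!(self.inner, Spooled::Memory(_))
    }

    pub fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        // Would this write push the in-memory buffer past the threshold?
        let must_spill = match &self.inner {
            Spooled::Memory(buf) => buf.len() + data.len() > self.threshold,
            Spooled::Disk(_) => false,
        };
        if must_spill {
            let mut file = OpenOptions::new()
                .read(true)
                .write(true)
                .create(true)
                .truncate(true)
                .open(&self.spill_path)?;
            // Copy what was buffered so far, then switch to the file.
            if let Spooled::Memory(buf) = &self.inner {
                file.write_all(buf)?;
            }
            self.inner = Spooled::Disk(file);
        }
        match &mut self.inner {
            Spooled::Memory(buf) => {
                buf.extend_from_slice(data);
                Ok(())
            }
            Spooled::Disk(file) => file.write_all(data),
        }
    }

    /// Read everything back, rewinding the file first if data was spilled.
    pub fn into_bytes(self) -> io::Result<Vec<u8>> {
        match self.inner {
            Spooled::Memory(buf) => Ok(buf),
            Spooled::Disk(mut file) => {
                file.seek(SeekFrom::Start(0))?;
                let mut out = Vec::new();
                file.read_to_end(&mut out)?;
                Ok(out)
            }
        }
    }
}
```

With this shape, a small package never touches the disk before extraction, while a large one costs at most one extra copy of the bytes buffered so far. A real implementation would use an anonymous temporary file rather than a named path.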

Performance (vs sync bridge):
- Pure extraction: 46-61% faster for concurrent workloads
- Memory efficient: SpooledTempFile keeps small packages in memory
- Existing tests: 47 of 48 pass; the one failure is an unrelated Windows symlink issue

Dependencies added:
- astral_async_zip (already in workspace)
- async-spooled-tempfile 0.1.0

@baszalmstra force-pushed the feat/async-zip-conda-extraction branch from c0022db to 169af4a on November 18, 2025 12:01