Conversation

@lminer (Contributor) commented Oct 8, 2025

Add Optional GPU Acceleration using CuPy

Summary

This PR adds optional GPU acceleration to museval's BSS evaluation metrics using CuPy, yielding roughly 10-21x speedups in the benchmarks below while maintaining full backward compatibility.

Motivation

BSS evaluation metrics can be computationally expensive, especially for:

  • Long audio files (>30 seconds)
  • Multiple sources (4+)
  • Framewise evaluation with many windows
  • Batch processing of datasets

GPU acceleration makes these workloads significantly faster without changing the API or requiring users to modify existing code.

Changes

Core Implementation

1. Backend Abstraction Layer (museval/backends/)

  • __init__.py: Backend selection and initialization with graceful fallback
  • base.py: Abstract base class defining the backend interface
  • numpy_backend.py: CPU implementation (default, no changes for existing users)
  • cupy_backend.py: GPU implementation using CuPy
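The abstraction might look something like the following sketch. The class and method names (`Backend`, `NumpyBackend`, `asarray`, `dot`, `to_numpy`) are illustrative assumptions, not museval's actual API:

```python
# Hypothetical sketch of a backend abstraction layer like the one
# described above; names are illustrative, not museval's actual API.
from abc import ABC, abstractmethod

import numpy as np


class Backend(ABC):
    """Small set of array operations the metrics code calls, so the
    same metric code can run against NumPy or CuPy arrays."""

    @abstractmethod
    def asarray(self, x):
        """Move/convert input data to this backend's array type."""

    @abstractmethod
    def dot(self, a, b):
        """Matrix product on this backend."""

    @abstractmethod
    def to_numpy(self, x):
        """Convert a result back to a NumPy array for the caller."""


class NumpyBackend(Backend):
    """Default CPU backend: thin wrappers around NumPy."""

    def asarray(self, x):
        return np.asarray(x)

    def dot(self, a, b):
        return np.dot(a, b)

    def to_numpy(self, x):
        return np.asarray(x)
```

A CuPy backend would implement the same interface with `cupy` calls, with `to_numpy` doing the device-to-host copy.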

2. Refactored Core Metrics (museval/metrics.py)

  • Added backend='auto' parameter to all main functions
  • Updated all helper functions to use backend abstraction
  • Maintains 100% backward compatibility (default behavior unchanged)
  • Results always returned as NumPy arrays for API consistency

3. Dependencies (setup.py)

  • Added optional gpu extra: pip install museval[gpu]
  • Uses cupy-cuda12x which bundles CUDA libraries (users only need GPU drivers)
  • Optional dependency - CPU-only installation unchanged

4. Comprehensive Testing

  • tests/test_backends.py: Backend abstraction tests (11 tests)
  • tests/test_gpu_consistency.py: CPU/GPU numerical consistency tests (18 tests)
  • tests/benchmark_gpu.py: Performance benchmarking script
  • All tests pass, GPU tests automatically skip when GPU unavailable
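The skip-when-no-GPU pattern might look like the following sketch (the PR's tests use pytest; stdlib `unittest` is shown here to keep the example self-contained, and the test name is hypothetical):

```python
# Sketch of a CPU/GPU consistency test that skips itself when CuPy
# is not installed. Illustrative only; museval's actual tests use
# pytest and its own metric functions.
import importlib.util
import unittest

import numpy as np

HAS_CUPY = importlib.util.find_spec("cupy") is not None


class TestBackendConsistency(unittest.TestCase):
    @unittest.skipUnless(HAS_CUPY, "CuPy not installed; GPU test skipped")
    def test_gpu_matmul_matches_cpu(self):
        import cupy as cp

        a = np.random.default_rng(0).standard_normal((16, 16))
        cpu = a @ a
        gpu = cp.asnumpy(cp.asarray(a) @ cp.asarray(a))
        # Results should agree up to float64 round-off,
        # as the PR reports for the real metrics.
        np.testing.assert_allclose(gpu, cpu, atol=1e-12)
```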

5. Documentation & Examples

  • Updated README.md with GPU installation and usage instructions
  • Added examples/gpu_example.py demonstrating GPU usage
  • Clear documentation of performance characteristics

API Changes

Backward Compatible - No Breaking Changes!

All existing code continues to work without modification:

```python
# Existing code - still works exactly as before (uses CPU)
sdr, isr, sir, sar, perm = bss_eval(reference, estimated)
```

New Optional GPU Support

```python
# Enable GPU acceleration
sdr, isr, sir, sar, perm = bss_eval(reference, estimated, backend='cupy')

# Or use environment variable for global setting
# export MUSEVAL_BACKEND=cupy
sdr, isr, sir, sar, perm = bss_eval(reference, estimated)  # Uses GPU
```

All main functions support the backend parameter:

  • bss_eval()
  • bss_eval_sources()
  • bss_eval_images()
  • bss_eval_sources_framewise()
  • bss_eval_images_framewise()
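The `backend='auto'` resolution combined with the `MUSEVAL_BACKEND` environment variable and graceful fallback might look something like this sketch (the function name `resolve_backend` is hypothetical; only the environment variable name comes from the PR):

```python
# Illustrative sketch of backend selection with graceful fallback;
# resolve_backend is a hypothetical helper, not museval's actual code.
import os
import warnings


def resolve_backend(name="auto"):
    """An explicit backend argument wins, then the MUSEVAL_BACKEND
    environment variable, then the NumPy default. A request for CuPy
    falls back to NumPy with a warning if CuPy cannot be imported."""
    if name == "auto":
        name = os.environ.get("MUSEVAL_BACKEND", "numpy")
    if name == "cupy":
        try:
            import cupy  # noqa: F401
        except ImportError:
            warnings.warn("CuPy requested but not importable; using NumPy")
            return "numpy"
        return "cupy"
    return "numpy"
```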

Performance Results

Benchmarked on NVIDIA RTX A6000 with CUDA 12.4:

| Workload | CPU Time | GPU Time | Speedup |
| --- | --- | --- | --- |
| Short (1 s, 2 sources) | 1.05 s | 81 ms | 13.0x |
| Long (10 s, 2 sources) | 1.76 s | 125 ms | 14.1x |
| Long (10 s, 4 sources) | 13.28 s | 627 ms | 21.2x |
| Very long (30 s, 2 sources) | 6.07 s | 356 ms | 17.1x |
| Framewise (10 s, 2 s windows) | 1.69 s | 166 ms | 10.2x |
| **Average** | - | - | **15.6x** |

Numerical Consistency

CPU and GPU backends agree to within float64 round-off:

  • Max SDR difference: 1.78e-15
  • Max SIR difference: 1.07e-14
  • Max SAR difference: 2.13e-14
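Differences around 1e-14 are ordinary double-precision round-off, far below any practical significance. A consistency check along these lines (the helper name `max_abs_diff` is hypothetical) is all that is needed:

```python
import numpy as np


def max_abs_diff(cpu_vals, gpu_vals):
    """Largest elementwise difference between two result arrays.
    Values around 1e-14, as reported above, are ordinary float64
    round-off and comfortably pass an atol=1e-12 check."""
    a = np.asarray(cpu_vals, dtype=np.float64)
    b = np.asarray(gpu_vals, dtype=np.float64)
    return float(np.max(np.abs(a - b)))
```

For example, `max_abs_diff(sdr_cpu, sdr_gpu) < 1e-12` would assert the consistency reported above.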

Installation

CPU Only (Default - No Changes)

pip install museval

With GPU Support

pip install museval[gpu]

Requirements for GPU:

  • NVIDIA GPU with CUDA support
  • NVIDIA GPU drivers (CUDA Toolkit NOT required - bundled in wheel)

Testing

All tests pass:

```shell
pytest tests/
# 51 passed, 1 skipped (GPU test when CuPy unavailable)
```

Test Coverage

  • Backend abstraction: 11 tests
  • CPU/GPU consistency: 18 tests
  • All existing tests: Pass without modification
  • Total: 52 tests

Design Decisions

  1. Optional Dependency: GPU support is completely optional via extras, no impact on CPU-only users
  2. Graceful Fallback: Automatically falls back to CPU if GPU unavailable with clear warnings
  3. Backward Compatible: Default behavior unchanged, all existing tests pass
  4. Consistent API: Results always returned as NumPy arrays regardless of backend
  5. Clean Architecture: Backend abstraction allows future additions (e.g., JAX, other accelerators)

Checklist

  • All tests pass
  • Backward compatibility maintained
  • Documentation updated (README, examples)
  • Performance benchmarks included
  • GPU tests skip gracefully when GPU unavailable
  • Code follows existing style
  • No breaking changes

Future Enhancements (Not in this PR)

Potential future additions:

  • Multi-GPU support for batch processing
  • JAX backend as alternative
  • Mixed precision for memory savings
  • Automatic backend selection based on input size

Notes

  • The implementation uses the cupy-cuda12x wheel, which bundles the CUDA runtime libraries
  • Users only need NVIDIA GPU drivers, not the full CUDA Toolkit installation
  • CPU-only users see zero overhead - backend selection happens at runtime
  • All computations are performed on the selected backend, with minimal CPU-GPU transfers
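The last point - keeping all math on one backend with transfers only at the boundaries - can be illustrated with a toy example. `simple_sdr` below is a deliberately simplified stand-in, not museval's `bss_eval`; `xp` is the array module (NumPy or CuPy):

```python
# Toy illustration of the transfer-minimizing pattern: inputs are
# moved to the backend once, all math stays there, and only a scalar
# comes back at the end. Not museval's actual SDR implementation.
import numpy as np


def simple_sdr(reference, estimated, xp=np):
    """Passing xp=cupy would run the identical code on the GPU,
    with xp.asarray as the single upload per input and float() as
    the single scalar download at the end."""
    ref = xp.asarray(reference, dtype=xp.float64)  # one upload
    err = ref - xp.asarray(estimated, dtype=xp.float64)  # one upload
    # All intermediate arrays live on the chosen backend.
    return float(10.0 * xp.log10(xp.sum(ref**2) / xp.sum(err**2)))
```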

Testing Instructions for Reviewers

CPU-only (should work everywhere):

```shell
pip install -e .
pytest tests/ -v
```

With GPU (requires NVIDIA GPU):

```shell
pip install -e .[gpu]
pytest tests/ -v  # Should pass all 52 tests (0 skipped)
python tests/benchmark_gpu.py  # See actual speedup
```

@faroit (Member) commented Oct 17, 2025

@lminer we should probably see if this is much different to #84

