
v0.3.2: Q4_0×Q8_0 Integer SIMD + APR Performance

Released by @noahgift on 29 Dec, 23:39

Highlights

Performance milestone: Candle parity achieved! Realizar now matches the performance of HuggingFace's Candle for GGUF Q4_0 inference.

Added

  • Q4_0×Q8_0 Integer SIMD Matmul - 2x inference speedup for GGUF Q4_0 models (see the scalar sketch after this list)

    • Quantize activations to Q8_0 format for integer multiply-accumulate
    • Use _mm256_maddubs_epi16 for AVX2 SIMD acceleration
    • Sign trick algorithm matching llama.cpp's approach
    • 2-block loop unrolling with prefetch hints
  • APR SIMD Matmul - 5-7x inference speedup for APR transformer models (see the dispatch sketch after this list)

    • Trueno Matrix/Vector SIMD acceleration
    • Scalar fallback for edge cases
    • APR now achieves near-GGUF parity (within 1.4-6x of GGUF throughput, vs 6-10x before)
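
For reference, here is a minimal scalar sketch of the per-block arithmetic that the AVX2 kernel vectorizes with `_mm256_maddubs_epi16`. The block layouts follow GGUF's Q4_0/Q8_0 conventions (32 elements per block, 4-bit weights stored offset by 8), but the struct and function names are illustrative and the f16 block scales are simplified to f32:

```rust
/// Q4_0 weight block: 32 4-bit weights (stored offset by 8) plus a scale.
/// Mirrors GGUF's block_q4_0; the real scale is f16, simplified here to f32.
struct BlockQ40 {
    d: f32,
    qs: [u8; 16], // two 4-bit weights per byte
}

/// Q8_0 activation block: 32 signed 8-bit values plus a scale.
struct BlockQ80 {
    d: f32,
    qs: [i8; 32],
}

/// Quantize 32 f32 activations to Q8_0 so the inner loop can use
/// integer multiply-accumulate instead of f32 math.
fn quantize_q8_0(x: &[f32; 32]) -> BlockQ80 {
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let d = amax / 127.0;
    let inv_d = if d > 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; 32];
    for (q, &v) in qs.iter_mut().zip(x.iter()) {
        *q = (v * inv_d).round() as i8;
    }
    BlockQ80 { d, qs }
}

/// Integer dot product of one Q4_0 block with one Q8_0 block.
/// `_mm256_maddubs_epi16` multiplies unsigned-by-signed bytes, which is
/// why the SIMD kernel needs llama.cpp's sign trick; the sum it computes
/// is exactly this one.
fn dot_q4_0_q8_0(w: &BlockQ40, a: &BlockQ80) -> f32 {
    let mut acc: i32 = 0;
    for j in 0..16 {
        // Low nibbles hold elements 0..15, high nibbles elements 16..31.
        let lo = (w.qs[j] & 0x0F) as i32 - 8;
        let hi = (w.qs[j] >> 4) as i32 - 8;
        acc += lo * a.qs[j] as i32 + hi * a.qs[j + 16] as i32;
    }
    acc as f32 * w.d * a.d
}
```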

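The release notes don't show Trueno's Matrix/Vector API, so the following is only a generic sketch of the pattern the APR bullet describes: a runtime-dispatched AVX2 dot product with a scalar fallback for edge cases and non-x86 targets. All names here are illustrative, not Realizar's actual symbols:

```rust
/// Dot product with runtime SIMD dispatch and a scalar fallback.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: guarded by the runtime feature checks above.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}

/// Portable fallback, also used for targets without AVX2.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let mut acc = _mm256_setzero_ps();
    // Main loop: 8 f32 lanes per iteration with fused multiply-add.
    let chunks = n / 8;
    for i in 0..chunks {
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    // Horizontal reduction of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for lengths not divisible by 8.
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}
```
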
Changed

  • Aprender Dependency - Updated from 0.14 to 0.20.1
    • Latest TransformerLM and MoE support
    • Improved APR format handling
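
In a downstream project this amounts to a one-line version bump (a hypothetical manifest excerpt; the actual Cargo.toml isn't shown in these notes):

```toml
[dependencies]
aprender = "0.20.1"  # was "0.14"
```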

Performance

| Metric | Before | After |
| --- | --- | --- |
| GGUF Q4_0 | 4.2-7.1 tok/s | 8.4-11.9 tok/s (2x) |
| APR tiny_64x1 | 500 µs | 66 µs (7.5x) |
| APR medium_256x4 | 48 ms | 9.0 ms (5.3x) |
| vs Candle | 55-72% | 91-120% |
| vs llama.cpp | 10-16% | 20-26% |

The "vs Candle" and "vs llama.cpp" rows give Realizar's throughput as a percentage of each engine's.

Quality

  • All 806 tests pass (with the aprender-serve feature)
  • All falsification tests pass
  • Clippy: 0 warnings
