Summary
Extend the backend cost model (src/blis/backend_selection.rs) with a computational reducibility classifier that estimates whether a workload's computation is reducible (shortcuttable, parallelizable) or irreducible (inherently sequential, no speedup from GPU offload) before dispatching to SIMD vs GPU.
Motivation
Wolfram's recent ruliological analysis of P vs. NP (Jan 2026) formalizes a concept directly applicable to our backend selection: not all computations benefit from a larger/different compute model. His exhaustive enumeration of Turing machines shows:
- Some functions see exponential→constant speedup when moved to a "bigger machine" (analogous to CPU→GPU offload)
- Other functions see zero speedup regardless of machine size — they are computationally irreducible
Currently, BackendCostModel::select_backend() decides SIMD vs GPU based on:
- Arithmetic intensity vs ridge point (roofline model)
- Element count threshold (1M minimum for GPU)
- Hardware availability
This misses a critical dimension: whether the computation itself can exploit parallelism. A matmul is highly reducible (GPU wins). A sequential recurrence like x[n] = f(x[n-1]) is irreducible — shipping it to GPU wastes PCIe transfer time for no parallel gain.
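The contrast can be seen in miniature. This sketch is purely illustrative and independent of the crate's API; it only shows why one shape of computation parallelizes and the other cannot:

```rust
/// Elementwise square: every output element is independent of the others,
/// so the work can be split across SIMD lanes or GPU threads (reducible).
fn elementwise_square(xs: &[f64]) -> Vec<f64> {
    xs.iter().map(|x| x * x).collect()
}

/// Sequential recurrence x[n] = f(x[n-1]): step n cannot start until
/// step n-1 has finished, so extra parallel hardware buys nothing
/// (irreducible). Offloading this to a GPU only adds PCIe transfer cost.
fn recurrence(x0: f64, steps: usize, f: impl Fn(f64) -> f64) -> f64 {
    let mut x = x0;
    for _ in 0..steps {
        x = f(x);
    }
    x
}
```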
Proposed Change
Add a Reducibility enum and classification step before the existing roofline check:
```rust
pub enum Reducibility {
    /// Highly parallel, data-independent — GPU candidate
    Reducible,
    /// Mixed: some parallel, some sequential dependencies
    Partial { parallel_fraction: f64 },
    /// Inherently sequential — SIMD-only, skip GPU consideration
    Irreducible,
}
```

Classification heuristics (not exhaustive enumeration — we're practical, not ruliological):
- Data-parallel ops (elementwise, map, broadcast): `Reducible`
- Reductions with associative operators (sum, max, dot): `Reducible`
- Tiled matmul / convolution: `Reducible`
- Scan / prefix-sum: `Partial { parallel_fraction: 0.5 }` (Blelloch-parallelizable but with overhead)
- Recurrences / sequential dependencies: `Irreducible`
- Sparse irregular access patterns: `Partial` (memory-bound, not compute-bound on GPU)
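The heuristics above could be encoded as a simple mapping. The `OpKind` taxonomy and `classify` function below are hypothetical names, not part of the existing crate, and the sparse `parallel_fraction` is a placeholder value that would come from profiling:

```rust
/// Hypothetical op taxonomy; the real crate would derive this from its
/// own op representation rather than a standalone enum.
enum OpKind {
    Elementwise,
    AssociativeReduction,
    TiledMatmul,
    Convolution,
    Scan,
    Recurrence,
    SparseIrregular,
}

enum Reducibility {
    Reducible,
    Partial { parallel_fraction: f64 },
    Irreducible,
}

/// Map an op kind to a reducibility class per the heuristics above.
fn classify(op: &OpKind) -> Reducibility {
    match op {
        OpKind::Elementwise
        | OpKind::AssociativeReduction
        | OpKind::TiledMatmul
        | OpKind::Convolution => Reducibility::Reducible,
        // Blelloch-parallelizable, but with scan overhead
        OpKind::Scan => Reducibility::Partial { parallel_fraction: 0.5 },
        OpKind::Recurrence => Reducibility::Irreducible,
        // Memory-bound, not compute-bound on GPU; fraction is illustrative
        OpKind::SparseIrregular => Reducibility::Partial { parallel_fraction: 0.3 },
    }
}
```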
Integrate into select_backend():

```rust
pub fn select_backend(&self, m: usize, n: usize, k: usize, reducibility: Reducibility) -> ComputeBackend {
    match reducibility {
        Reducibility::Irreducible => {
            // Skip GPU entirely — no parallel gain, avoid PCIe cost
            self.select_simd_backend()
        }
        Reducibility::Partial { parallel_fraction } => {
            // Adjust effective arithmetic intensity by the parallel fraction
            // (Amdahl's law: speedup bounded by 1 / (1 - parallel_fraction))
            let effective_ai = self.arithmetic_intensity(m, n, k) * parallel_fraction;
            self.select_by_roofline(effective_ai, m * n * k)
        }
        Reducibility::Reducible => {
            // Existing logic: full roofline + size threshold
            self.select_by_roofline(self.arithmetic_intensity(m, n, k), m * n * k)
        }
    }
}
```

Impact
- Avoids GPU dispatch for irreducible workloads (saves PCIe round-trip)
- More accurate cost model for mixed workloads via Amdahl-adjusted AI
- Enables downstream crates (aprender, realizar) to annotate their ops with reducibility hints
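The Amdahl adjustment in the `Partial` arm can be checked in isolation. These helpers are standalone stand-ins for the cost model's internals, not the crate's actual methods:

```rust
/// Amdahl's law: maximum speedup when a fraction p of the work is parallel.
/// With p = 0.5 (the scan heuristic above), speedup is capped at 2x.
fn amdahl_speedup_bound(parallel_fraction: f64) -> f64 {
    1.0 / (1.0 - parallel_fraction)
}

/// Scale raw arithmetic intensity by the parallel fraction, as in the
/// Partial arm of select_backend. A half-parallel workload presents half
/// its raw intensity to the roofline check, biasing it toward SIMD.
fn effective_ai(raw_ai: f64, parallel_fraction: f64) -> f64 {
    raw_ai * parallel_fraction
}
```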
References
- Wolfram, "P vs. NP and the Difficulty of Computation: A Ruliological Approach" (Jan 2026) — empirical evidence that computational irreducibility is ubiquitous and speedup from larger machines is not guaranteed
- Gregg & Hazelwood (2011) — existing 5× PCIe rule already in backend_selection.rs
- Amdahl (1967) — parallel-fraction bound for the `Partial` case
Files
- src/blis/backend_selection.rs — primary integration point
- src/hardware.rs — may need `Reducibility` in the `HardwareCapability` profile
- src/brick/mod.rs — `ComputeBackend` dispatch