Skip to content

feat(backend): add reducibility classification to SIMD vs GPU dispatch #99

@noahgift

Description

@noahgift

Summary

Extend the backend cost model (src/blis/backend_selection.rs) with a computational reducibility classifier that estimates whether a workload's computation is reducible (shortcuttable, parallelizable) or irreducible (inherently sequential, no speedup from GPU offload) before dispatching to SIMD vs GPU.

Motivation

Wolfram's recent ruliological analysis of P vs. NP (Jan 2026) formalizes a concept directly applicable to our backend selection: not all computations benefit from a larger/different compute model. His exhaustive enumeration of Turing machines shows:

  • Some functions see exponential→constant speedup when moved to a "bigger machine" (analogous to CPU→GPU offload)
  • Other functions see zero speedup regardless of machine size — they are computationally irreducible

Currently, BackendCostModel::select_backend() decides SIMD vs GPU based on:

  1. Arithmetic intensity vs ridge point (roofline model)
  2. Element count threshold (1M minimum for GPU)
  3. Hardware availability

This misses a critical dimension: whether the computation itself can exploit parallelism. A matmul is highly reducible (GPU wins). A sequential recurrence like x[n] = f(x[n-1]) is irreducible — shipping it to GPU wastes PCIe transfer time for no parallel gain.

Proposed Change

Add a Reducibility enum and classification step before the existing roofline check:

pub enum Reducibility {
    /// Highly parallel, data-independent — GPU candidate
    Reducible,
    /// Mixed: some parallel, some sequential dependencies
    Partial { parallel_fraction: f64 },
    /// Inherently sequential — SIMD-only, skip GPU consideration
    Irreducible,
}

Classification heuristics (not exhaustive enumeration — we're practical, not ruliological):

  • Data-parallel ops (elementwise, map, broadcast): Reducible
  • Reductions with associative operators (sum, max, dot): Reducible
  • Tiled matmul / convolution: Reducible
  • Scan / prefix-sum: Partial { parallel_fraction: 0.5 } (Blelloch-parallelizable but with overhead)
  • Recurrences / sequential dependencies: Irreducible
  • Sparse irregular access patterns: Partial (memory-bound, not compute-bound on GPU)

Integrate into select_backend():

pub fn select_backend(&self, m: usize, n: usize, k: usize, reducibility: Reducibility) -> ComputeBackend {
    match reducibility {
        Reducibility::Irreducible => {
            // Skip GPU entirely — no parallel gain, avoid PCIe cost
            self.select_simd_backend()
        }
        Reducibility::Partial { parallel_fraction } => {
            // Adjust effective arithmetic intensity by parallel fraction
            // (Amdahl's law: speedup bounded by 1 / (1 - parallel_fraction))
            let effective_ai = self.arithmetic_intensity(m, n, k) * parallel_fraction;
            self.select_by_roofline(effective_ai, m * n * k)
        }
        Reducibility::Reducible => {
            // Existing logic: full roofline + size threshold
            self.select_by_roofline(self.arithmetic_intensity(m, n, k), m * n * k)
        }
    }
}

Impact

  • Avoids GPU dispatch for irreducible workloads (saves PCIe round-trip)
  • More accurate cost model for mixed workloads via Amdahl-adjusted AI
  • Enables downstream crates (aprender, realizar) to annotate their ops with reducibility hints

References

  • Wolfram, "P vs. NP and the Difficulty of Computation: A Ruliological Approach" (Jan 2026) — empirical evidence that computational irreducibility is ubiquitous and speedup from larger machines is not guaranteed
  • Gregg & Hazelwood (2011) — existing 5× PCIe rule already in backend_selection.rs
  • Amdahl (1967) — parallel fraction bounding for Partial case

Files

  • src/blis/backend_selection.rs — primary integration point
  • src/hardware.rs — may need Reducibility in HardwareCapability profile
  • src/brick/mod.rsComputeBackend dispatch

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions