
Almost always fastest? #196

@PallHaraldsson

Description


Hi, regarding the statement "which falls behind MKL's gemm beyond 70x70 or so": is that text mostly outdated? It isn't obviously true from the graph, and I noticed you reran the benchmarks last month (before the ArrayInterface upgrade; would 3.0 improve speed?). Also, I couldn't zoom in without going to:

https://github.com/chriselrod/LoopVectorization.jl/blob/5ba0d186bcd2d6f4fed09fd6ca9f7817e8dd29e2/docs/src/assets/bench_AmulB_v2.png

Yes, around that size and sometimes for bigger matrices, MKL is only slightly faster (from memory, MKL used to have a much bigger edge), so you might want to change the text to more positive language. I have been, and want to keep, pointing people to these graphs and your awesome work.

I just recently noticed:
https://github.com/JuliaLinearAlgebra/Octavian.jl

Is it fair to say OpenBLAS will soon be replaced? Or that it already could be? I know you target Intel with AVX512; do the concepts transfer to ARM and AMD, and is there even some code for AMD already?
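For context on the "replace OpenBLAS" question, here is a minimal sketch (not from the issue) of how one might compare the two side by side. It assumes the Octavian and BenchmarkTools packages are installed; `Octavian.matmul!` is Octavian's exported in-place gemm, and `LinearAlgebra.mul!` dispatches to the OpenBLAS build that ships with Julia.

```julia
# Quick sanity + speed comparison: pure-Julia Octavian vs. OpenBLAS-backed mul!.
# Assumes `] add Octavian BenchmarkTools` has been run in the active environment.
using LinearAlgebra, Octavian, BenchmarkTools

n = 200
A = rand(n, n); B = rand(n, n)
C1 = similar(A); C2 = similar(A)

mul!(C1, A, B)              # OpenBLAS gemm (Julia's default BLAS)
Octavian.matmul!(C2, A, B)  # pure-Julia gemm built on LoopVectorization
@assert C1 ≈ C2             # both should agree to floating-point tolerance

@btime mul!($C1, $A, $B)
@btime Octavian.matmul!($C2, $A, $B)
```

Relative timings will of course depend on the CPU (AVX512 vs. not) and the matrix size, which is exactly what the benchmark graphs sweep over.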

As with:
https://github.com/JuliaGPU/GemmKernels.jl

do you need no assembly? I mean assembly is involved at some level, but none is hand-written for the high-level (multiply) functions.

I didn't see (or expect) any code shared with yours there. I did notice it uses GPUifyLoops.jl, which is archived; should it use KernelAbstractions.jl instead?
