Hi, regarding "which falls behind MKL's gemm beyond 70x70 or so": is that text mostly outdated? It wasn't obviously true from the graph. I also noticed you reran the benchmarks last month (before the ArrayInterface upgrade; would 3.0 improve speed?), and I couldn't zoom in without going to:
https://github.com/chriselrod/LoopVectorization.jl/blob/5ba0d186bcd2d6f4fed09fd6ca9f7817e8dd29e2/docs/src/assets/bench_AmulB_v2.png
Yes, around that size, and sometimes for larger matrices, MKL is only slightly faster (from memory, MKL used to have a much bigger edge), but you might want to change to more positive language. I have been pointing people to these graphs and your awesome work, and want to keep doing so.
I just recently noticed:
https://github.com/JuliaLinearAlgebra/Octavian.jl
Is it fair to say OpenBLAS will soon be replaced? Or could it be already? I know you target Intel with AVX-512. Do the concepts transfer to ARM and AMD, and is there even some code for AMD already?
As with:
https://github.com/JuliaGPU/GemmKernels.jl
do you need no assembly? I mean there is some at some level, but not for the high-level (multiply) functions.
I didn't see (or expect) any code shared with yours. I did notice it uses GPUifyLoops.jl, which is archived; should it use KernelAbstractions.jl instead?