Reference for Rust ML frameworks
There is a performance improvement of 5% to 10% for cubecl (before => after):
5.1818~5.9137ms => 4.7552~5.0662ms
5.4166~6.1707ms => 5.1834~5.4955ms
m = 4096*4, n = 4096, k = 4096, dtype = bfloat16
hardware: RTX 4090D
Recap: cubecl's performance is not very stable; in the best case it outperforms candle.
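As a reference point for the numbers below, here is a minimal sketch of how this bf16 case can be timed with candle. The warmup count, iteration count, and the use of `Device::synchronize` are illustrative assumptions, not the benchmark's actual settings.

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    let (m, n, k) = (4096usize * 4, 4096usize, 4096usize);

    // Inputs created once on the GPU, then cast to bf16.
    let a = Tensor::randn(0f32, 1f32, (m, k), &device)?.to_dtype(DType::BF16)?;
    let b = Tensor::randn(0f32, 1f32, (k, n), &device)?.to_dtype(DType::BF16)?;

    // Warmup so kernel compilation/caching stays out of the measurement.
    for _ in 0..10 {
        let _ = a.matmul(&b)?;
    }
    device.synchronize()?;

    let iters: u32 = 100;
    let start = std::time::Instant::now();
    for _ in 0..iters {
        let _ = a.matmul(&b)?;
    }
    device.synchronize()?;
    println!("avg per matmul: {:?}", start.elapsed() / iters);
    Ok(())
}
```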
| language | method | cost |
|---|---|---|
| rust | cudarc+cublas | 3.5092ms |
| cpp | cublas | 3.5075ms |
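The cudarc+cublas row corresponds to calling cuBLAS GEMM through cudarc's safe wrappers. A hedged sketch, assuming cudarc's CudaDevice/CudaBlas API (the exact types have shifted across cudarc versions) and using f32 for brevity even though the measurement above is bf16:

```rust
use cudarc::cublas::sys::cublasOperation_t;
use cudarc::cublas::{CudaBlas, Gemm, GemmConfig};
use cudarc::driver::CudaDevice;

fn main() {
    let (m, n, k) = (4096usize * 4, 4096usize, 4096usize);
    let dev = CudaDevice::new(0).unwrap();
    let blas = CudaBlas::new(dev.clone()).unwrap();

    // Device buffers; the contents do not matter for a timing run.
    let a = dev.alloc_zeros::<f32>(m * k).unwrap();
    let b = dev.alloc_zeros::<f32>(k * n).unwrap();
    let mut c = dev.alloc_zeros::<f32>(m * n).unwrap();

    // cuBLAS is column-major, so the leading dimensions follow that layout.
    let cfg = GemmConfig {
        transa: cublasOperation_t::CUBLAS_OP_N,
        transb: cublasOperation_t::CUBLAS_OP_N,
        m: m as i32,
        n: n as i32,
        k: k as i32,
        alpha: 1.0f32,
        lda: m as i32,
        ldb: k as i32,
        beta: 0.0f32,
        ldc: m as i32,
    };

    let start = std::time::Instant::now();
    unsafe { blas.gemm(cfg, &a, &b, &mut c) }.unwrap();
    dev.synchronize().unwrap();
    println!("one GEMM: {:?}", start.elapsed());
}
```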
| language | method | cost |
|---|---|---|
| rust | candle | 3.5458ms |
| rust | cubecl | 5.1818~5.9137ms |
| python | compile | 3.5344ms |
| python | raw | 3.5300ms |
| language | method | cost |
|---|---|---|
| rust | candle | 5.7466ms |
| rust | cubecl | 5.4166~6.1707ms |
| python | compile | 3.9381ms |
| python | raw | 3.9343ms |
use matrix shapes optimized for the cache line:
A*B + C
A.shape: 235*256 x 4*256
B.shape: 4*256 x 5*256
C.shape: 1 x 5*256
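A minimal sketch of this A*B + C case, shown with candle purely for illustration (the tables below measure torch and burn as well):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    let a = Tensor::randn(0f32, 1f32, (235 * 256, 4 * 256), &device)?;
    let b = Tensor::randn(0f32, 1f32, (4 * 256, 5 * 256), &device)?;
    let c = Tensor::randn(0f32, 1f32, (1, 5 * 256), &device)?;

    // (235*256 x 4*256) @ (4*256 x 5*256), then the 1 x 5*256 row is
    // broadcast over the rows of the product.
    let out = a.matmul(&b)?.broadcast_add(&c)?;
    println!("{:?}", out.shape());
    Ok(())
}
```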
| framework | 4090d |
|---|---|
| torch | 1.78ms |
| torch+compile | 1.37ms |
| burn-tch | 2.10ms |
| burn-cubecl | 2.30ms |
reuse the device for the Rust frameworks; use non-compile mode for PyTorch (a device-reuse sketch follows the table below)
| framework | 4090d |
|---|---|
| torch | 1.22ms |
| torch+compile | 1.01ms |
| candle | 11.9ms |
| burn-tch | 3.52ms |
| burn-cubecl | 1.56ms |
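The device-reuse change amounts to hoisting the device handle out of the measured code. A minimal sketch with candle as an illustrative stand-in (the burn backends reuse their device handles the same way; the loop count is arbitrary):

```rust
use candle_core::{Device, Result, Tensor};

fn run_once(device: &Device) -> Result<Tensor> {
    // All tensors are placed on the shared device handle.
    let a = Tensor::randn(0f32, 1f32, (235 * 256, 4 * 256), device)?;
    let b = Tensor::randn(0f32, 1f32, (4 * 256, 5 * 256), device)?;
    a.matmul(&b)
}

fn main() -> Result<()> {
    // Created once, reused by every benchmark iteration, instead of
    // paying for device setup inside the measured loop.
    let device = Device::new_cuda(0)?;
    for _ in 0..100 {
        let _ = run_once(&device)?;
    }
    Ok(())
}
```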
cubecl does not use the new hardware GEMM instructions
add an explicit CUDA synchronize in the bench loop: the candle and burn-cubecl results do not change significantly, but the PyTorch time jumps to 2.68ms (a timing-loop sketch follows the table below)
| framework | 4090d |
|---|---|
| torch+compile | 0.90ms |
| candle | 16.4ms |
| burn-cubecl | 1.46ms |
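A sketch of the explicit-synchronize loop, again illustrated with candle and with arbitrary iteration counts. Synchronizing inside the loop makes each iteration wait for its queued GPU work, which is why frameworks that launch asynchronously can look slower once it is added:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    let a = Tensor::randn(0f32, 1f32, (235 * 256, 4 * 256), &device)?;
    let b = Tensor::randn(0f32, 1f32, (4 * 256, 5 * 256), &device)?;
    let c = Tensor::randn(0f32, 1f32, (1, 5 * 256), &device)?;

    let iters: u32 = 100;
    let start = std::time::Instant::now();
    for _ in 0..iters {
        let _ = a.matmul(&b)?.broadcast_add(&c)?;
        // Block until the GPU work queued by this iteration has finished.
        device.synchronize()?;
    }
    println!("avg: {:?}", start.elapsed() / iters);
    Ok(())
}
```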
use bf16 floating-point numbers
| framework | 4090d |
|---|---|
| torch+compile | 0.048ms |
| candle | 16.3ms |
| burn-cubecl | 1.51ms |
use f32 floating-point numbers
| framework | 4090d |
|---|---|
| torch+compile | 0.048ms |
| candle | 18.9ms |
| burn-cubecl | 2.95ms |
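The last two tables differ only in the element type. A sketch of the bf16/f32 switch, again in candle and purely illustrative:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn inputs(device: &Device, dtype: DType) -> Result<(Tensor, Tensor, Tensor)> {
    // Same shapes as above, cast to the requested element type.
    let a = Tensor::randn(0f32, 1f32, (235 * 256, 4 * 256), device)?.to_dtype(dtype)?;
    let b = Tensor::randn(0f32, 1f32, (4 * 256, 5 * 256), device)?.to_dtype(dtype)?;
    let c = Tensor::randn(0f32, 1f32, (1, 5 * 256), device)?.to_dtype(dtype)?;
    Ok((a, b, c))
}

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    for dtype in [DType::BF16, DType::F32] {
        let (a, b, c) = inputs(&device, dtype)?;
        let out = a.matmul(&b)?.broadcast_add(&c)?;
        device.synchronize()?;
        println!("{dtype:?} -> {:?}", out.dtype());
    }
    Ok(())
}
```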

