Updates from EricLBuehler/mistralrs #27
Merged
Conversation
Bumps [ring](https://github.com/briansmith/ring) from 0.17.11 to 0.17.13.
- [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md)
- [Commits](https://github.com/briansmith/ring/commits)

updated-dependencies:
- dependency-name: ring
  dependency-type: indirect

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* DSv3 fixes
* Just save the progress
* Fix launch of blockwise fp8 dequant (sketched below)
* It actually works
* Async ops
* Optimize non-mla with cat
* Fix non-cuda build
* Update build
* Add more CUDA_CHECK
* Works really now
* Working fully now with pagedattn
* Format everything
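The blockwise FP8 dequant referenced above pairs each fixed-size block of quantized weights with one per-block scale, and dequantization multiplies every element by its block's scale. A minimal CPU sketch of the idea; the block size, layout, and function name here are illustrative assumptions, not the repository's actual CUDA kernel:

```rust
/// Dequantize a row-major matrix stored in BLOCK x BLOCK tiles,
/// with one f32 scale per tile (assumed layout for illustration).
const BLOCK: usize = 128; // assumed block size

fn dequant_blockwise(q: &[f32], scales: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    // `q` holds FP8 values already widened to f32 for simplicity;
    // a real kernel decodes the 8-bit encoding on the fly.
    let blocks_per_row = (cols + BLOCK - 1) / BLOCK;
    let mut out = vec![0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            let block_id = (r / BLOCK) * blocks_per_row + c / BLOCK;
            out[r * cols + c] = q[r * cols + c] * scales[block_id];
        }
    }
    out
}
```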
* Refactor distributed mapper prep
* Support vision model TP (see the sharding sketch below)
* Update docs
* Add vision model TP for mllama
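Tensor parallelism for the vision models splits each linear layer's weight across ranks so every device holds and multiplies only a slice. A simplified column-sharding sketch; the rank/world-size naming and the concatenate-at-the-end step are assumptions for illustration, not mistral.rs's mapper API:

```rust
/// Take this rank's column slice of a weight matrix
/// (a simplified view of a tensor-parallel linear layer).
fn shard_columns(weight: &[Vec<f32>], rank: usize, world_size: usize) -> Vec<Vec<f32>> {
    let cols = weight[0].len();
    let shard = cols / world_size; // assume divisibility for the sketch
    weight
        .iter()
        .map(|row| row[rank * shard..(rank + 1) * shard].to_vec())
        .collect()
}
// Each rank computes x @ W_shard locally; concatenating the per-rank
// outputs (an all-gather in the distributed case) recovers the full
// activation.
```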
* Always pass _USE_MATH_DEFINES
* Cargo.lock
* Add UQFF text/vision model API
* Typos
* Implement Qwen 2.5 VL
* Reverse window index select
* Switch to rmsnorm
* Warn
* Fix config, loads now
* Fixes
* Complete qwen2_5vl feature

  Todo: calling set_use_matmul_via_f16(true) from "pipeline/inputs_processor" causes a significant loss of precision that is hard to track down in later debugging. In any case, globally setting matmul precision may not be an ideal solution. For now, change the precision back in mistralrs-core/src/vision_models/qwen2_5_vl/inputs_processor.rs.

  Qwen2_5vl feature is functional; start to clean code
  Add examples for lower_level_qwen2_5vl
  Fix: for deterministic sampling, top k should be Some(1) rather than None (see the sketch after this list)
  Clean code
  Rebase
  Clean code
  Fix cuda
* Fix Rustfmt and Clippy issues
* Clean code
* Merge branch 'main'

Co-authored-by: Eric Buehler <[email protected]>
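The deterministic-sampling fix above hinges on the difference between `top_k: None` (no truncation, so sampling can still draw any token) and `top_k: Some(1)` (the distribution collapses to the argmax). A self-contained sketch of that distinction; the function names and types are illustrative, not the crate's actual sampler:

```rust
// Illustrative sampler: Some(1) is a plain argmax and therefore
// deterministic; None leaves the full distribution in play.
fn sample(logits: &[f32], top_k: Option<usize>) -> usize {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    match top_k {
        // Some(1): only the argmax survives -- fully deterministic.
        Some(1) => idx[0],
        // Some(k): draw among the k highest-probability tokens.
        Some(k) => weighted_draw(&idx[..k.min(idx.len())], logits),
        // None: no truncation at all, so every token stays in play
        // and repeated runs can diverge -- hence the fix above.
        None => weighted_draw(&idx, logits),
    }
}

/// Stand-in for multinomial sampling over softmax(logits) restricted
/// to `candidates`; returns the first candidate to stay runnable.
fn weighted_draw(candidates: &[usize], _logits: &[f32]) -> usize {
    candidates[0]
}
```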
* Add config
* Add the text model
* Add inputs processor, loads/runs now
* It works!
* Add to APIs
* Add vision support for Gemma 3
* Implement image preprocessor and processor
* It works, kind of
* It works great
* Mask must be contiguous
* Update docs
* Format
* More models for tp
* Fix clippy
* Support text-only gemma3
* Add rotating kv cache
* Do not preallocate rotating kv cache
* Improve rotating kv cache set_len and more intelligent prefix cacher v2 (ring-buffer sketch below)
* Remove prefix cacher v1
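A rotating KV cache keeps only the last `window` positions for sliding-window attention, overwriting the oldest slot once the window fills. A minimal ring-buffer sketch of the idea; the struct and method names are assumptions, not the crate's cache API:

```rust
/// Minimal ring buffer standing in for a sliding-window KV cache.
struct RotatingKvCache {
    window: usize,
    keys: Vec<Vec<f32>>, // one entry per cached position
    head: usize,         // next slot to overwrite once full
}

impl RotatingKvCache {
    fn new(window: usize) -> Self {
        // Grown on demand rather than preallocated, matching the
        // "do not preallocate" change above.
        Self { window, keys: Vec::new(), head: 0 }
    }

    fn append(&mut self, k: Vec<f32>) {
        if self.keys.len() < self.window {
            self.keys.push(k); // still growing toward the window size
        } else {
            self.keys[self.head] = k; // rotate: overwrite the oldest
            self.head = (self.head + 1) % self.window;
        }
    }

    /// Current logical length; never exceeds the window.
    fn len(&self) -> usize {
        self.keys.len()
    }
}
```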
…#1212)
* Update hf_hub dep to not require openssl and add tests
* Update deps
* Fixes
* Undo 'fix' from clippy
* Ok maybe finally fix it
* Fixes for phi4 mini
* Fix causal mask
* Growable rotating kv cache
* Fix clippy
* Add mlx quantized kernels
* Add mlx quantized kernels
* Kernel launcher
* Add AFQ isq quant and dequant (see the group-quantization sketch below)
* Some quantmethod things
* Begin to implement the qmm caller
* Clippy
* Much faster
* Cache kernels
* Docs
* Clippy
* Add it to uqff
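AFQ-style (affine) quantization maps each small group of weights to low-bit integers with a per-group scale and bias, and dequantization inverts the affine map. A 4-bit sketch of one group; the bit width, grouping, and names are assumptions chosen for illustration, not the MLX kernels themselves:

```rust
const BITS: u32 = 4; // 4-bit example => quantized values in 0..=15

/// Quantize one weight group: w ≈ scale * q + bias.
/// Returns (q, scale, bias), with bias taken as the group minimum.
fn quant_group(w: &[f32]) -> (Vec<u8>, f32, f32) {
    let (min, max) = w
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    let levels = ((1u32 << BITS) - 1) as f32;
    let scale = (max - min).max(1e-8) / levels;
    let q = w
        .iter()
        .map(|&x| (((x - min) / scale).round() as u8).min(levels as u8))
        .collect();
    (q, scale, min)
}

/// Invert the affine map: each stored level becomes scale * q + bias.
fn dequant_group(q: &[u8], scale: f32, bias: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale + bias).collect()
}
```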
* Refactor quantizedconfig
* Support AFQ prequantized
* Update docs
* Update docs
…1266)
* Automatic isq
* typo
* Doc
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.44.1...tokio-1.44.2)

updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.44.2
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update the caller
* Wire things up
* Broadcast for afq gathermm (see the gather-matmul sketch below)
* Broadcast for afq gathermm
* Clippy
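Gather-matmul, as used for MoE experts, selects a per-token expert weight matrix before multiplying, with a single expert index broadcast across that token's whole row. A dense f32 stand-in for the quantized kernel; shapes and names here are illustrative assumptions:

```rust
/// Gather-then-matmul: for each token, pick its expert's weight
/// matrix and compute y = x @ W[expert].
fn gather_mm(
    x: &[Vec<f32>],            // [tokens][hidden]
    experts: &[Vec<Vec<f32>>], // [num_experts][hidden][out]
    ids: &[usize],             // expert id per token, broadcast per row
) -> Vec<Vec<f32>> {
    x.iter()
        .zip(ids)
        .map(|(row, &e)| {
            let w = &experts[e];
            let mut y = vec![0f32; w[0].len()];
            for (xi, wrow) in row.iter().zip(w) {
                for (yj, wij) in y.iter_mut().zip(wrow) {
                    *yj += xi * wij;
                }
            }
            y
        })
        .collect()
}
```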
* Implement Llama 4
* Implement the main changes for the text model
* Make chunked mask (see the mask sketch below)
* Wire things up
* Add some EP
* Initial sketch of inputs processor
* Runs
* Progress
* All-reduce MoEs
* It works!
* Some cleanup
* Faster moe block
* Add device map
* Make chunked matrix
* Fully working now!
* Reactivate cublaslt
* Fix shared mlp cublaslt
* Refactor to packed experts
* Complete merge
* It is a normal model now
* Fixes
* Set device for moe
* ISQ fixes
* Much faster sort kernel
* Faster loading!
* Faster loading!
* Fp8 cpu copy ops in candle backend
* Add the vision model
* Add mmproj layer
* Actually merge the inputs
* Sketch most of the image processor
* Add the rest of the image processor
* Implement the whole processor
* Add the loader
* Some fixes
* A batch of fixes
* Some fixes
* tmp
* Actually support isq
* Ok it works a bit
* Fix norm device
* It works
* A bit cleaner
* Support residual tensors
* Remove text loader
* Implement the device mapping system
* Fix auto device map
* Add examples
* Add model card
* Typo
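The chunked mask above restricts each token's attention to its own chunk (while staying causal inside it), which is the basic shape of Llama 4's chunked-attention layers. A minimal boolean-mask sketch; the chunk size and naming are assumptions, and the real code builds this as a tensor rather than nested Vecs:

```rust
/// Build a chunked causal mask: token i may attend to token j only
/// if both sit in the same chunk and j <= i. `true` = allowed.
fn chunked_causal_mask(seq_len: usize, chunk: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| j <= i && i / chunk == j / chunk)
                .collect()
        })
        .collect()
}
```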
* Serialize sharded uqff files (shard-planning sketch below)
* Loading
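Sharding a serialized artifact means cutting the stream of tensors into multiple files once a size cap is reached. A toy sketch of such a splitting policy; the cap, names, and grouping rule are assumptions for illustration and say nothing about the actual UQFF on-disk format:

```rust
/// Group serialized tensors into shards of at most `cap` bytes;
/// a single oversized tensor still gets a shard of its own.
fn plan_shards(tensor_sizes: &[(String, usize)], cap: usize) -> Vec<Vec<String>> {
    let mut shards: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut used = 0usize;
    for (name, size) in tensor_sizes {
        // Close the current shard before it would overflow the cap.
        if !current.is_empty() && used + size > cap {
            shards.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(name.clone());
        used += size;
    }
    if !current.is_empty() {
        shards.push(current);
    }
    shards
}
```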
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 71         3026         2622           81          323
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   19          557          516            2           39
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               49         4044            0         3071          973
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                16          549          464            0           85
 |- TOML                 2           75           63            0           12
 (Total)                           4904          748         3071         1085
-------------------------------------------------------------------------------
 Rust                  327       107130        95878         2149         9103
 |- Markdown           158         1794           25         1629          140
 (Total)                          108924        95903         3778         9243
===============================================================================
 Total                 491       118740        99211         7746        11783
===============================================================================
ewgenius approved these changes on Apr 18, 2025