
Conversation


@Jeadie Jeadie commented Apr 15, 2025

🗣 Description

🔨 Related Issues

🤔 Concerns

EricLBuehler and others added 30 commits March 4, 2025 20:36
Bumps [ring](https://github.com/briansmith/ring) from 0.17.11 to 0.17.13.
- [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md)
- [Commits](https://github.com/briansmith/ring/commits)

---
updated-dependencies:
- dependency-name: ring
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* DSv3 fixes

* Just save the progress

* Fix launch of blockwise fp8 dequant

* It actually works

* Async ops

* Optimize non-mla with cat

* Fix non-cuda build

* Update build

* Add more CUDA_CHECK

* Works really now

* Working fully now with pagedattn

* Format everything
* Refactor distributed mapper prep

* Support vision model TP

* Update docs

* Add vision model TP for mllama
* Always pass _USE_MATH_DEFINES

* Cargo.lock
* Add UQFF text/vision model API

* Typos
* Implement Qwen 2.5 VL

* Reverse window index select

* Switch to rmsnorm

* Warn

* Fix config, loads now

* Fixes

* Complete qwen2_5vl feature

Todo: set_use_matmul_via_f16(true) from "pipeline/inputs_processor" causes a significant loss of precision.
It’s hard to track down during subsequent debugging.
In any case, globally setting matmul precision may not be an ideal solution.
For now, change the precision back in mistralrs-core/src/vision_models/qwen2_5_vl/inputs_processor.rs
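One way to keep a precision override from leaking globally is an RAII guard that restores the previous value on drop. The sketch below is hypothetical: `MATMUL_VIA_F16` stands in for the real global toggled by `set_use_matmul_via_f16`, and the guard type is illustrative, not the crate's actual API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the real global matmul-precision flag.
static MATMUL_VIA_F16: AtomicBool = AtomicBool::new(false);

struct F16MatmulGuard {
    previous: bool,
}

impl F16MatmulGuard {
    // Set the flag, remembering the old value so it can be restored.
    fn set(value: bool) -> Self {
        let previous = MATMUL_VIA_F16.swap(value, Ordering::SeqCst);
        F16MatmulGuard { previous }
    }
}

impl Drop for F16MatmulGuard {
    fn drop(&mut self) {
        // Restore the previous precision setting when the guard goes out of scope.
        MATMUL_VIA_F16.store(self.previous, Ordering::SeqCst);
    }
}

fn main() {
    {
        let _guard = F16MatmulGuard::set(true);
        assert!(MATMUL_VIA_F16.load(Ordering::SeqCst));
        // ... run the precision-sensitive inputs processor here ...
    }
    // Flag is restored automatically once the guard drops.
    assert!(!MATMUL_VIA_F16.load(Ordering::SeqCst));
    println!("flag restored");
}
```

With this pattern, the per-pipeline f16 override cannot accidentally persist into other models' forward passes.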

Qwen2_5vl feature is functional, starting to clean code

Add examples for lower_level_qwen2_5vl

Fix: for deterministic sampling, top k SHOULD be Some(1) rather than None
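The distinction the fix above relies on can be sketched as follows: with `top_k: None` the sampler draws from the full distribution (stochastic), while `top_k: Some(1)` keeps only the argmax token, making decoding deterministic. The `sample` function here is a simplified stand-in for the real sampler, not the crate's API.

```rust
fn sample(logits: &[f32], top_k: Option<usize>) -> usize {
    match top_k {
        Some(1) => {
            // Greedy decoding: always pick the highest-logit token.
            logits
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
                .map(|(i, _)| i)
                .unwrap()
        }
        _ => {
            // top_k == None (or k > 1) means sampling from a distribution,
            // which is inherently stochastic; elided in this sketch.
            unimplemented!("stochastic sampling elided")
        }
    }
}

fn main() {
    let logits = [0.1_f32, 2.5, 0.3, 1.7];
    // Same input always yields the same token with top_k = Some(1).
    assert_eq!(sample(&logits, Some(1)), 1);
    assert_eq!(sample(&logits, Some(1)), 1);
    println!("deterministic pick: {}", sample(&logits, Some(1)));
}
```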

Clean code

Rebase

Clean code

Fix cuda

* Fix Rustfmt and Clippy issues

* Clean code

* Merge branch 'main'

---------

Co-authored-by: Eric Buehler <[email protected]>
* Add config

* Add the text model

* Add inputs processor, loads/runs now

* It works!

* Add to APIs
* Add vision support for Gemma 3

* Implement image preprocessor and processor

* It works, kind of

* It works great

* Mask must be contiguous

* Update docs

* Format
* More models for tp

* Fix clippy
* Support text-only gemma3

* Add rotating kv cache

* Do not preallocate rotating kv cache
* Improve rotating kv cache set_len and more intelligent prefix cacher v2

* Remove prefix cacher v1
…#1212)
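The rotating KV cache mentioned in the commits above is essentially a ring buffer: once capacity is reached, new tokens overwrite the oldest entries, bounding memory for long generations. The sketch below is illustrative (one float per token for brevity); field and method names are not the crate's actual API.

```rust
struct RotatingKvCache {
    buf: Vec<f32>,   // flattened per-token KV entries
    capacity: usize, // maximum number of cached tokens
    len: usize,      // number of valid entries (<= capacity)
    head: usize,     // next write position
}

impl RotatingKvCache {
    fn new(capacity: usize) -> Self {
        Self { buf: vec![0.0; capacity], capacity, len: 0, head: 0 }
    }

    // Append one token's KV entry, overwriting the oldest once full.
    fn push(&mut self, kv: f32) {
        self.buf[self.head] = kv;
        self.head = (self.head + 1) % self.capacity;
        if self.len < self.capacity {
            self.len += 1;
        }
    }
}

fn main() {
    let mut cache = RotatingKvCache::new(4);
    for t in 0..6 {
        cache.push(t as f32);
    }
    // Capacity bounds the cache; the oldest entries (0.0, 1.0) were overwritten.
    assert_eq!(cache.len, 4);
    assert_eq!(cache.buf, vec![4.0, 5.0, 2.0, 3.0]);
    println!("cache after 6 pushes: {:?}", cache.buf);
}
```

Avoiding preallocation (as one commit notes) means `buf` would grow lazily up to `capacity` rather than being allocated up front; the overwrite logic is the same.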

* Update hf_hub dep to not require openssl and add tests

* Update deps

* Fixes

* Undo 'fix' from clippy

* Ok maybe finally fix it
* Fixes for phi4 mini

* Fix causal mask

* Growable rotating kv cache

* Fix clippy
EricLBuehler and others added 21 commits April 4, 2025 15:59
)

* Play with varbuilder lifetimes

* Merge lora weights

* Clippy

* Lora works

* Support multiple loras

* Cleanup, remove adapter activation

* Complete merge
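The LoRA weight merge above follows the standard formulation W' = W + (alpha / r) * B·A, after which the adapter matrices can be discarded. This is a scalar sketch of that idea under common LoRA conventions, not the crate's merge code.

```rust
// Merge a rank-r LoRA adapter (B: [out, r], A: [r, in]) into the
// base weight W ([out, in]) in place, scaled by alpha / r.
fn merge_lora(w: &mut [Vec<f32>], a: &[Vec<f32>], b: &[Vec<f32>], alpha: f32) {
    let r = a.len();
    let scale = alpha / r as f32;
    for (o, row) in w.iter_mut().enumerate() {
        for (i, w_oi) in row.iter_mut().enumerate() {
            let mut delta = 0.0;
            for k in 0..r {
                delta += b[o][k] * a[k][i];
            }
            *w_oi += scale * delta;
        }
    }
}

fn main() {
    let mut w = vec![vec![1.0_f32, 0.0], vec![0.0, 1.0]];
    let a = vec![vec![1.0_f32, 1.0]];         // rank r = 1
    let b = vec![vec![0.5_f32], vec![0.5]];
    merge_lora(&mut w, &a, &b, 1.0);
    // Every entry gains 0.5 * 1.0 = 0.5 from the adapter.
    assert_eq!(w, vec![vec![1.5, 0.5], vec![0.5, 1.5]]);
    println!("{:?}", w);
}
```

Supporting multiple LoRAs amounts to summing several such deltas into the same base weight before inference.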
* Add mlx quantized kernels

* Add mlx quantized kernels

* Kernel launcher

* Add AFQ isq quant and dequant

* Some quantmethod things

* Begin to implement the qmm caller

* Clippy

* Much faster

* Cache kernels

* Docs

* Clippy

* Add it to uqff
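The AFQ quant/dequant kernels above are, at heart, affine quantization: map floats to small integers with a per-group scale and zero point. The real MLX kernels operate on packed bit-groups; this scalar Rust sketch only illustrates the roundtrip, and the function names are hypothetical.

```rust
// Affine-quantize a slice to `bits` bits: returns (codes, scale, zero_point).
fn quantize(x: &[f32], bits: u32) -> (Vec<u8>, f32, f32) {
    let max_q = ((1u32 << bits) - 1) as f32;
    let lo = x.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (hi - lo).max(f32::EPSILON) / max_q;
    let q = x.iter().map(|&v| ((v - lo) / scale).round() as u8).collect();
    (q, scale, lo)
}

// Reconstruct approximate floats from the codes.
fn dequantize(q: &[u8], scale: f32, zero: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale + zero).collect()
}

fn main() {
    let x = [0.0_f32, 0.5, 1.0, 1.5];
    let (q, scale, zero) = quantize(&x, 4);
    let y = dequantize(&q, scale, zero);
    // 4-bit affine quantization reconstructs this range closely.
    for (a, b) in x.iter().zip(y.iter()) {
        assert!((a - b).abs() < 0.1);
    }
    println!("roundtrip: {:?}", y);
}
```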
* Refactor quantizedconfig

* Support AFQ prequantized

* Update docs

* Update docs
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.44.1...tokio-1.44.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.44.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update the caller

* Wire things up

* Broadcast for afq gathermm

* Broadcast for afq gathermm

* Clippy
* Implement Llama 4

* Implement the main changes for the text model

* Make chunked mask

* Wire things up

* Add some EP

* Initial sketch of inputs processor

* Runs

* Progress

* all reduce moes

* It works!

* Some cleanup

* Faster moe block

* Add device map

* Make chunked matrix

* Fully working now!

* Reactivate cublaslt

* Fix shared mlp cublaslt

* Refactor to packed experts

* Complete merge

* It is a normal model now

* Fixes

* Set device for moe

* ISQ fixes

* Much faster sort kernel

* Faster loading!

* Faster loading!

* Fp8 cpu copy ops in candle backend

* Add the vision model

* Add mmproj layer

* Actually merge the inputs

* Sketch most of the image processor

* Add the rest of the image processor

* Implement the whole processor

* Add the loader

* Some fixes

* A batch of fixes

* Some fixes

* tmp

* Actually support isq

* Ok it works a bit

* Fix norm device

* It works

* A bit cleaner

* Support residual tensors

* Remove text loader

* Implement the device mapping system

* Fix auto device map

* Add examples

* Add model card

* Typo
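The "chunked mask" from the Llama 4 commits above restricts attention so a query token only sees keys in its own chunk, and only causally within it. This boolean-mask sketch is illustrative of that pattern; chunk size and layer interleaving details are assumptions, not taken from the implementation.

```rust
// Build a [seq_len x seq_len] mask where entry [q][k] is true iff
// key k is visible to query q: same chunk, and k <= q (causal).
fn chunked_causal_mask(seq_len: usize, chunk: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|q| {
            (0..seq_len)
                .map(|k| k <= q && q / chunk == k / chunk)
                .collect()
        })
        .collect()
}

fn main() {
    let mask = chunked_causal_mask(4, 2);
    // Token 2 starts a new chunk: it cannot attend to tokens 0 or 1.
    assert!(!mask[2][0] && !mask[2][1]);
    assert!(mask[2][2]);
    // Within a chunk, causality still holds: 1 sees 0, but 0 never sees 1.
    assert!(mask[1][0] && !mask[0][1]);
    println!("{:?}", mask);
}
```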
* Serialize sharded uqff files

* Loading
@Jeadie Jeadie self-assigned this Apr 15, 2025
@github-actions

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 71         3026         2622           81          323
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   19          557          516            2           39
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               49         4044            0         3071          973
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                16          549          464            0           85
 |- TOML                 2           75           63            0           12
 (Total)                           4904          748         3071         1085
-------------------------------------------------------------------------------
 Rust                  327       107130        95878         2149         9103
 |- Markdown           158         1794           25         1629          140
 (Total)                         108924        95903         3778         9243
===============================================================================
 Total                 491       118740        99211         7746        11783
===============================================================================
  

@Jeadie Jeadie changed the title Jeadie/25 04 15/updates Updates from EricLBuehler/mistralrs Apr 16, 2025
@Jeadie Jeadie merged commit 91a5ad7 into spiceai Apr 20, 2025
4 checks passed

8 participants