Updates from EricLBuehler/mistralrs #27
Merged
Conversation
Bumps [ring](https://github.com/briansmith/ring) from 0.17.11 to 0.17.13.
- [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md)
- [Commits](https://github.com/briansmith/ring/commits)

updated-dependencies:
- dependency-name: ring
  dependency-type: indirect

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* DSv3 fixes
* Just save the progress
* Fix launch of blockwise fp8 dequant (sketched below)
* It actually works
* Async ops
* Optimize non-mla with cat
* Fix non-cuda build
* Update build
* Add more CUDA_CHECK
* Works really now
* Working fully now with pagedattn
* Format everything
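The blockwise FP8 dequant referenced above pairs each fixed-size block of quantized weights with one per-block scale, and dequantization multiplies every element by its block's scale. A minimal CPU sketch of the idea; the block size, layout, and function name here are illustrative assumptions, not the repository's actual CUDA kernel:

```rust
/// Dequantize a row-major matrix stored in BLOCK x BLOCK tiles,
/// with one f32 scale per tile (assumed layout for illustration).
const BLOCK: usize = 128; // assumed block size

fn dequant_blockwise(q: &[f32], scales: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    // `q` holds FP8 values already widened to f32 for simplicity;
    // a real kernel decodes the 8-bit encoding on the fly.
    let blocks_per_row = (cols + BLOCK - 1) / BLOCK;
    let mut out = vec![0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            let block_id = (r / BLOCK) * blocks_per_row + c / BLOCK;
            out[r * cols + c] = q[r * cols + c] * scales[block_id];
        }
    }
    out
}
```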
* Refactor distributed mapper prep
* Support vision model TP (see the sharding sketch below)
* Update docs
* Add vision model TP for mllama
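Tensor parallelism for the vision models splits each linear layer's weight across ranks so every device holds and multiplies only a slice. A simplified column-sharding sketch; the rank/world-size naming and the concatenate-at-the-end step are assumptions for illustration, not mistral.rs's mapper API:

```rust
/// Take this rank's column slice of a weight matrix
/// (a simplified view of a tensor-parallel linear layer).
fn shard_columns(weight: &[Vec<f32>], rank: usize, world_size: usize) -> Vec<Vec<f32>> {
    let cols = weight[0].len();
    let shard = cols / world_size; // assume divisibility for the sketch
    weight
        .iter()
        .map(|row| row[rank * shard..(rank + 1) * shard].to_vec())
        .collect()
}
// Each rank computes x @ W_shard locally; concatenating the per-rank
// outputs (an all-gather in the distributed case) recovers the full
// activation.
```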
* Always pass _USE_MATH_DEFINES
* Cargo.lock
* Add UQFF text/vision model API
* Typos
* Implement Qwen 2.5 VL
* Reverse window index select
* Switch to rmsnorm
* Warn
* Fix config, loads now
* Fixes
* Complete qwen2_5vl feature

  Todo: calling set_use_matmul_via_f16(true) from "pipeline/inputs_processor" causes a significant loss of precision that is hard to track down in later debugging. In any case, globally setting matmul precision may not be an ideal solution. For now, change the precision back in mistralrs-core/src/vision_models/qwen2_5_vl/inputs_processor.rs.

  Qwen2_5vl feature is functional; start to clean code
  Add examples for lower_level_qwen2_5vl
  Fix: for deterministic sampling, top k should be Some(1) rather than None (see the sketch after this list)
  Clean code
  Rebase
  Clean code
  Fix cuda
* Fix Rustfmt and Clippy issues
* Clean code
* Merge branch 'main'

Co-authored-by: Eric Buehler <[email protected]>
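The deterministic-sampling fix above hinges on the difference between `top_k: None` (no truncation, so sampling can still draw any token) and `top_k: Some(1)` (the distribution collapses to the argmax). A self-contained sketch of that distinction; the function names and types are illustrative, not the crate's actual sampler:

```rust
// Illustrative sampler: Some(1) is a plain argmax and therefore
// deterministic; None leaves the full distribution in play.
fn sample(logits: &[f32], top_k: Option<usize>) -> usize {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    match top_k {
        // Some(1): only the argmax survives -- fully deterministic.
        Some(1) => idx[0],
        // Some(k): draw among the k highest-probability tokens.
        Some(k) => weighted_draw(&idx[..k.min(idx.len())], logits),
        // None: no truncation at all, so every token stays in play
        // and repeated runs can diverge -- hence the fix above.
        None => weighted_draw(&idx, logits),
    }
}

/// Stand-in for multinomial sampling over softmax(logits) restricted
/// to `candidates`; returns the first candidate to stay runnable.
fn weighted_draw(candidates: &[usize], _logits: &[f32]) -> usize {
    candidates[0]
}
```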
* Add config
* Add the text model
* Add inputs processor, loads/runs now
* It works!
* Add to APIs
* Add vision support for Gemma 3
* Implement image preprocessor and processor
* It works, kind of
* It works great
* Mask must be contiguous
* Update docs
* Format
* More models for tp
* Fix clippy
* Support text-only gemma3
* Add rotating kv cache
* Do not preallocate rotating kv cache
* Improve rotating kv cache set_len and more intelligent prefix cacher v2 (ring-buffer sketch below)
* Remove prefix cacher v1
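A rotating KV cache keeps only the last `window` positions for sliding-window attention, overwriting the oldest slot once the window fills. A minimal ring-buffer sketch of the idea; the struct and method names are assumptions, not the crate's cache API:

```rust
/// Minimal ring buffer standing in for a sliding-window KV cache.
struct RotatingKvCache {
    window: usize,
    keys: Vec<Vec<f32>>, // one entry per cached position
    head: usize,         // next slot to overwrite once full
}

impl RotatingKvCache {
    fn new(window: usize) -> Self {
        // Grown on demand rather than preallocated, matching the
        // "do not preallocate" change above.
        Self { window, keys: Vec::new(), head: 0 }
    }

    fn append(&mut self, k: Vec<f32>) {
        if self.keys.len() < self.window {
            self.keys.push(k); // still growing toward the window size
        } else {
            self.keys[self.head] = k; // rotate: overwrite the oldest
            self.head = (self.head + 1) % self.window;
        }
    }

    /// Current logical length; never exceeds the window.
    fn len(&self) -> usize {
        self.keys.len()
    }
}
```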
…#1212)
* Update hf_hub dep to not require openssl and add tests
* Update deps
* Fixes
* Undo 'fix' from clippy
* Ok maybe finally fix it
* Fixes for phi4 mini
* Fix causal mask
* Growable rotating kv cache
* Fix clippy
* Add mlx quantized kernels
* Add mlx quantized kernels
* Kernel launcher
* Add AFQ isq quant and dequant (see the group-quantization sketch below)
* Some quantmethod things
* Begin to implement the qmm caller
* Clippy
* Much faster
* Cache kernels
* Docs
* Clippy
* Add it to uqff
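AFQ-style (affine) quantization maps each small group of weights to low-bit integers with a per-group scale and bias, and dequantization inverts the affine map. A 4-bit sketch of one group; the bit width, grouping, and names are assumptions chosen for illustration, not the MLX kernels themselves:

```rust
const BITS: u32 = 4; // 4-bit example => quantized values in 0..=15

/// Quantize one weight group: w ≈ scale * q + bias.
/// Returns (q, scale, bias), with bias taken as the group minimum.
fn quant_group(w: &[f32]) -> (Vec<u8>, f32, f32) {
    let (min, max) = w
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    let levels = ((1u32 << BITS) - 1) as f32;
    let scale = (max - min).max(1e-8) / levels;
    let q = w
        .iter()
        .map(|&x| (((x - min) / scale).round() as u8).min(levels as u8))
        .collect();
    (q, scale, min)
}

/// Invert the affine map: each stored level becomes scale * q + bias.
fn dequant_group(q: &[u8], scale: f32, bias: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale + bias).collect()
}
```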
* Refactor quantizedconfig
* Support AFQ prequantized
* Update docs
* Update docs
…1266)
* Automatic isq
* typo
* Doc
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.44.1...tokio-1.44.2)

updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.44.2
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update the caller
* Wire things up
* Broadcast for afq gathermm (see the gather-matmul sketch below)
* Broadcast for afq gathermm
* Clippy
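Gather-matmul, as used for MoE experts, selects a per-token expert weight matrix before multiplying, with a single expert index broadcast across that token's whole row. A dense f32 stand-in for the quantized kernel; shapes and names here are illustrative assumptions:

```rust
/// Gather-then-matmul: for each token, pick its expert's weight
/// matrix and compute y = x @ W[expert].
fn gather_mm(
    x: &[Vec<f32>],            // [tokens][hidden]
    experts: &[Vec<Vec<f32>>], // [num_experts][hidden][out]
    ids: &[usize],             // expert id per token, broadcast per row
) -> Vec<Vec<f32>> {
    x.iter()
        .zip(ids)
        .map(|(row, &e)| {
            let w = &experts[e];
            let mut y = vec![0f32; w[0].len()];
            for (xi, wrow) in row.iter().zip(w) {
                for (yj, wij) in y.iter_mut().zip(wrow) {
                    *yj += xi * wij;
                }
            }
            y
        })
        .collect()
}
```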
* Implement Llama 4
* Implement the main changes for the text model
* Make chunked mask (see the mask sketch below)
* Wire things up
* Add some EP
* Initial sketch of inputs processor
* Runs
* Progress
* All-reduce MoEs
* It works!
* Some cleanup
* Faster moe block
* Add device map
* Make chunked matrix
* Fully working now!
* Reactivate cublaslt
* Fix shared mlp cublaslt
* Refactor to packed experts
* Complete merge
* It is a normal model now
* Fixes
* Set device for moe
* ISQ fixes
* Much faster sort kernel
* Faster loading!
* Faster loading!
* Fp8 cpu copy ops in candle backend
* Add the vision model
* Add mmproj layer
* Actually merge the inputs
* Sketch most of the image processor
* Add the rest of the image processor
* Implement the whole processor
* Add the loader
* Some fixes
* A batch of fixes
* Some fixes
* tmp
* Actually support isq
* Ok it works a bit
* Fix norm device
* It works
* A bit cleaner
* Support residual tensors
* Remove text loader
* Implement the device mapping system
* Fix auto device map
* Add examples
* Add model card
* Typo
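The chunked mask above restricts each token's attention to its own chunk (while staying causal inside it), which is the basic shape of Llama 4's chunked-attention layers. A minimal boolean-mask sketch; the chunk size and naming are assumptions, and the real code builds this as a tensor rather than nested Vecs:

```rust
/// Build a chunked causal mask: token i may attend to token j only
/// if both sit in the same chunk and j <= i. `true` = allowed.
fn chunked_causal_mask(seq_len: usize, chunk: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| j <= i && i / chunk == j / chunk)
                .collect()
        })
        .collect()
}
```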
* Serialize sharded uqff files (shard-planning sketch below)
* Loading
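Sharding a serialized artifact means cutting the stream of tensors into multiple files once a size cap is reached. A toy sketch of such a splitting policy; the cap, names, and grouping rule are assumptions for illustration and say nothing about the actual UQFF on-disk format:

```rust
/// Group serialized tensors into shards of at most `cap` bytes;
/// a single oversized tensor still gets a shard of its own.
fn plan_shards(tensor_sizes: &[(String, usize)], cap: usize) -> Vec<Vec<String>> {
    let mut shards: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut used = 0usize;
    for (name, size) in tensor_sizes {
        // Close the current shard before it would overflow the cap.
        if !current.is_empty() && used + size > cap {
            shards.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(name.clone());
        used += size;
    }
    if !current.is_empty() {
        shards.push(current);
    }
    shards
}
```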
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 71         3026         2622           81          323
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   19          557          516            2           39
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               49         4044            0         3071          973
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                16          549          464            0           85
 |- TOML                 2           75           63            0           12
 (Total)                           4904          748         3071         1085
-------------------------------------------------------------------------------
 Rust                  327       107130        95878         2149         9103
 |- Markdown           158         1794           25         1629          140
 (Total)                          108924        95903         3778         9243
===============================================================================
 Total                 491       118740        99211         7746        11783
===============================================================================
ewgenius approved these changes on Apr 18, 2025