Describe the bug
Hi there, there seems to be a regression introduced sometime after c116ce42.
Here's the compare: https://github.com/EricLBuehler/mistral.rs/compare/c116ce42..2b56c10
When running mistralrs-server built from 2b56c10, the second request panics:
mistralrs-server-panic.mp4
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/mod.rs:462:35:
Did not get any inputs. This is shocking.

The panicking line (mistral.rs/mistralrs-core/src/pipeline/mod.rs, line 462 in 2b56c10) is:

let l = l.expect("Did not get any inputs. This is shocking.");
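For illustration only, here is a minimal, hypothetical Rust sketch (not the actual mistral.rs code) of how this kind of expect can fire: collecting a batch of fallible inputs into a Result goes through core::iter::adapters::try_process (frame 3 in the backtraces below), and when the batch is empty the subsequent reduction yields None, so the expect panics with exactly this message:

fn first_input(batch: Vec<Result<u32, String>>) -> Result<u32, String> {
    // Collecting into Result<Vec<_>, _> short-circuits on the first Err;
    // internally this is core::iter::adapters::try_process.
    let l: Option<u32> = batch
        .into_iter()
        .collect::<Result<Vec<u32>, String>>()?
        .into_iter()
        .max();
    // With an empty batch, max() returns None and this panics with the
    // same message seen in the logs.
    Ok(l.expect("Did not get any inputs. This is shocking."))
}

fn main() {
    assert_eq!(first_input(vec![Ok(3), Ok(5)]), Ok(5)); // non-empty batch: fine
    let _ = first_input(vec![]); // empty batch: panics
}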
Full output:
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master git --no-pager log -1 --oneline
2b56c102 (HEAD -> master, origin/master, origin/HEAD, support-gemma-gguf) Include schemas needed for chatcompletions endpoint (#1353)
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master cd mistralrs-server
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs/mistralrs-server master cargo build --release --features metal
Finished `release` profile [optimized] target(s) in 0.54s
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs/mistralrs-server master cd ../
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master ./target/release/mistralrs-server --port 8888 plain -m meta-llama/Llama-3.2-1B-Instruct
2025-05-26T19:58:11.322726Z INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2025-05-26T19:58:11.322894Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2025-05-26T19:58:11.322956Z INFO mistralrs_server: Model kind is: normal (no adapters)
2025-05-26T19:58:11.323058Z INFO hf_hub: Using token file found "/Users/matt/.cache/huggingface/token"
2025-05-26T19:58:11.324290Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:11.324385Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:11.505796Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model.safetensors"]
2025-05-26T19:58:11.582024Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:11.692240Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:11.825444Z INFO mistralrs_quant::utils::log: Automatic loader type determined to be `llama`
2025-05-26T19:58:11.825508Z INFO mistralrs_core::pipeline::normal: Prompt chunk size is 1024.
2025-05-26T19:58:12.505522Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:58:12.510890Z INFO mistralrs_core::pipeline::loaders: Using automatic device mapping parameters: text[max_seq_len: 4096, max_batch_size: 1].
2025-05-26T19:58:12.510981Z INFO mistralrs_quant::utils::log: Model has 16 repeating layers.
2025-05-26T19:58:12.510995Z INFO mistralrs_quant::utils::log: Loading model according to the following repeating layer mappings:
2025-05-26T19:58:12.510998Z INFO mistralrs_quant::utils::log: Layers 0-15: metal[4294968389] (36 GB)
2025-05-26T19:58:12.511874Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:58:12.511903Z INFO mistralrs_core::pipeline::normal: Model config: Config { hidden_act: Silu, hidden_size: 2048, intermediate_size: 8192, vocab_size: 128256, num_hidden_layers: 16, num_attention_heads: 32, num_key_value_heads: 8, rms_norm_eps: 1e-5, rope_theta: 500000.0, max_position_embeddings: 131072, rope_scaling: Some(Llama3RopeConfig { factor: 32.0, low_freq_factor: 1.0, high_freq_factor: 4.0, original_max_position_embeddings: 8192, rope_type: Llama3 }), quantization_config: None, tie_word_embeddings: true }
2025-05-26T19:58:12.512490Z INFO mistralrs_core::pipeline::normal: Applying ISQ to None
2025-05-26T19:58:12.512608Z INFO mistralrs_core::utils::varbuilder_utils: Loading model using mmap strategy.
2025-05-26T19:58:13.507533Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<|begin_of_text|>", eos_toks = "<|eot_id|>", "<|end_of_text|>", "<|eom_id|>", unk_tok = `None`
2025-05-26T19:58:13.517271Z INFO mistralrs_server: Model loaded.
2025-05-26T19:58:13.517364Z INFO mistralrs_core: Beginning dummy run.
2025-05-26T19:58:13.518151Z INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
2025-05-26T19:58:15.524430Z INFO mistralrs_core: Dummy run completed in 2.007560459s.
2025-05-26T19:58:15.525855Z INFO mistralrs_server: Serving on http://0.0.0.0:8888.
2025-05-26T19:58:18.522281Z INFO mistralrs_core::engine::logger: Throughput (T/s) 0.80, Prefix cache hitrate 0.00%, 0 running, 0 waiting
2025-05-26T19:58:28.527891Z INFO mistralrs_core::engine::logger: Throughput (T/s) 10.00, Prefix cache hitrate 50.00%, 0 running, 0 waiting
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/mod.rs:462:35:
Did not get any inputs. This is shocking.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
^C
✘
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master RUST_BACKTRACE=1 ./target/release/mistralrs-server --port 8888 plain -m meta-llama/Llama-3.2-1B-Instruct
2025-05-26T19:58:48.115964Z INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2025-05-26T19:58:48.116009Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2025-05-26T19:58:48.116023Z INFO mistralrs_server: Model kind is: normal (no adapters)
2025-05-26T19:58:48.116050Z INFO hf_hub: Using token file found "/Users/matt/.cache/huggingface/token"
2025-05-26T19:58:48.116141Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:48.116184Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:48.265188Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model.safetensors"]
2025-05-26T19:58:48.342235Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:48.497978Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:58:48.566892Z INFO mistralrs_quant::utils::log: Automatic loader type determined to be `llama`
2025-05-26T19:58:48.566919Z INFO mistralrs_core::pipeline::normal: Prompt chunk size is 1024.
2025-05-26T19:58:48.572462Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:58:48.582439Z INFO mistralrs_core::pipeline::loaders: Using automatic device mapping parameters: text[max_seq_len: 4096, max_batch_size: 1].
2025-05-26T19:58:48.582494Z INFO mistralrs_quant::utils::log: Model has 16 repeating layers.
2025-05-26T19:58:48.582502Z INFO mistralrs_quant::utils::log: Loading model according to the following repeating layer mappings:
2025-05-26T19:58:48.582507Z INFO mistralrs_quant::utils::log: Layers 0-15: metal[4294968389] (36 GB)
2025-05-26T19:58:48.584154Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:58:48.584194Z INFO mistralrs_core::pipeline::normal: Model config: Config { hidden_act: Silu, hidden_size: 2048, intermediate_size: 8192, vocab_size: 128256, num_hidden_layers: 16, num_attention_heads: 32, num_key_value_heads: 8, rms_norm_eps: 1e-5, rope_theta: 500000.0, max_position_embeddings: 131072, rope_scaling: Some(Llama3RopeConfig { factor: 32.0, low_freq_factor: 1.0, high_freq_factor: 4.0, original_max_position_embeddings: 8192, rope_type: Llama3 }), quantization_config: None, tie_word_embeddings: true }
2025-05-26T19:58:48.585135Z INFO mistralrs_core::pipeline::normal: Applying ISQ to None
2025-05-26T19:58:48.585262Z INFO mistralrs_core::utils::varbuilder_utils: Loading model using mmap strategy.
2025-05-26T19:58:49.439811Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<|begin_of_text|>", eos_toks = "<|eot_id|>", "<|end_of_text|>", "<|eom_id|>", unk_tok = `None`
2025-05-26T19:58:49.449003Z INFO mistralrs_server: Model loaded.
2025-05-26T19:58:49.449051Z INFO mistralrs_core: Beginning dummy run.
2025-05-26T19:58:49.449692Z INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
2025-05-26T19:58:49.540551Z INFO mistralrs_core: Dummy run completed in 0.091494416s.
2025-05-26T19:58:49.541812Z INFO mistralrs_server: Serving on http://0.0.0.0:8888.
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/mod.rs:462:35:
Did not get any inputs. This is shocking.
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::option::expect_failed
3: core::iter::adapters::try_process
4: mistralrs_core::pipeline::Pipeline::step::{{closure}}
5: mistralrs_core::engine::Engine::run::{{closure}}
6: mistralrs_core::MistralRs::new::{{closure}}::{{closure}}::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2025-05-26T19:58:54.455043Z INFO mistralrs_core::engine::logger: Throughput (T/s) 10.80, Prefix cache hitrate 66.67%, 1 running, 0 waiting
2025-05-26T19:58:58.932596Z WARN mistralrs_core: Engine is dead, rebooting
2025-05-26T19:58:58.932635Z INFO mistralrs_core: Successfully rebooted engine and updated sender + engine handler
2025-05-26T19:58:58.932921Z INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
2025-05-26T19:59:03.937805Z INFO mistralrs_core::engine::logger: Throughput (T/s) 10.00, Prefix cache hitrate 0.00%, 0 running, 0 waiting
^C
✘
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master RUST_BACKTRACE=FULL ./target/release/mistralrs-server --port 8888 plain -m meta-llama/Llama-3.2-1B-Instruct
2025-05-26T19:59:13.516218Z INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2025-05-26T19:59:13.516262Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2025-05-26T19:59:13.516275Z INFO mistralrs_server: Model kind is: normal (no adapters)
2025-05-26T19:59:13.516307Z INFO hf_hub: Using token file found "/Users/matt/.cache/huggingface/token"
2025-05-26T19:59:13.516401Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:13.516447Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:13.641180Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model.safetensors"]
2025-05-26T19:59:13.714444Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:13.834169Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:13.980503Z INFO mistralrs_quant::utils::log: Automatic loader type determined to be `llama`
2025-05-26T19:59:13.980528Z INFO mistralrs_core::pipeline::normal: Prompt chunk size is 1024.
2025-05-26T19:59:13.984733Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:59:13.993146Z INFO mistralrs_core::pipeline::loaders: Using automatic device mapping parameters: text[max_seq_len: 4096, max_batch_size: 1].
2025-05-26T19:59:13.993210Z INFO mistralrs_quant::utils::log: Model has 16 repeating layers.
2025-05-26T19:59:13.993218Z INFO mistralrs_quant::utils::log: Loading model according to the following repeating layer mappings:
2025-05-26T19:59:13.993222Z INFO mistralrs_quant::utils::log: Layers 0-15: metal[4294968389] (36 GB)
2025-05-26T19:59:13.994495Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:59:13.994519Z INFO mistralrs_core::pipeline::normal: Model config: Config { hidden_act: Silu, hidden_size: 2048, intermediate_size: 8192, vocab_size: 128256, num_hidden_layers: 16, num_attention_heads: 32, num_key_value_heads: 8, rms_norm_eps: 1e-5, rope_theta: 500000.0, max_position_embeddings: 131072, rope_scaling: Some(Llama3RopeConfig { factor: 32.0, low_freq_factor: 1.0, high_freq_factor: 4.0, original_max_position_embeddings: 8192, rope_type: Llama3 }), quantization_config: None, tie_word_embeddings: true }
2025-05-26T19:59:13.995394Z INFO mistralrs_core::pipeline::normal: Applying ISQ to None
2025-05-26T19:59:13.995503Z INFO mistralrs_core::utils::varbuilder_utils: Loading model using mmap strategy.
2025-05-26T19:59:14.834038Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<|begin_of_text|>", eos_toks = "<|eot_id|>", "<|end_of_text|>", "<|eom_id|>", unk_tok = `None`
2025-05-26T19:59:14.843804Z INFO mistralrs_server: Model loaded.
2025-05-26T19:59:14.843864Z INFO mistralrs_core: Beginning dummy run.
2025-05-26T19:59:14.844093Z INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
2025-05-26T19:59:14.928812Z INFO mistralrs_core: Dummy run completed in 0.084942708s.
2025-05-26T19:59:14.929148Z INFO mistralrs_server: Serving on http://0.0.0.0:8888.
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/mod.rs:462:35:
Did not get any inputs. This is shocking.
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::option::expect_failed
3: core::iter::adapters::try_process
4: mistralrs_core::pipeline::Pipeline::step::{{closure}}
5: mistralrs_core::engine::Engine::run::{{closure}}
6: mistralrs_core::MistralRs::new::{{closure}}::{{closure}}::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2025-05-26T19:59:19.844485Z INFO mistralrs_core::engine::logger: Throughput (T/s) 10.80, Prefix cache hitrate 66.67%, 1 running, 0 waiting
^C
✘
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master RUST_BACKTRACE=full ./target/release/mistralrs-server --port 8888 plain -m meta-llama/Llama-3.2-1B-Instruct
2025-05-26T19:59:27.974382Z INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2025-05-26T19:59:27.974433Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2025-05-26T19:59:27.974448Z INFO mistralrs_server: Model kind is: normal (no adapters)
2025-05-26T19:59:27.974476Z INFO hf_hub: Using token file found "/Users/matt/.cache/huggingface/token"
2025-05-26T19:59:27.974579Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:27.974625Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:28.085527Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model.safetensors"]
2025-05-26T19:59:28.154177Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:28.448257Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `meta-llama/Llama-3.2-1B-Instruct`
2025-05-26T19:59:28.500695Z INFO mistralrs_quant::utils::log: Automatic loader type determined to be `llama`
2025-05-26T19:59:28.500717Z INFO mistralrs_core::pipeline::normal: Prompt chunk size is 1024.
2025-05-26T19:59:28.504446Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:59:28.512404Z INFO mistralrs_core::pipeline::loaders: Using automatic device mapping parameters: text[max_seq_len: 4096, max_batch_size: 1].
2025-05-26T19:59:28.512460Z INFO mistralrs_quant::utils::log: Model has 16 repeating layers.
2025-05-26T19:59:28.512468Z INFO mistralrs_quant::utils::log: Loading model according to the following repeating layer mappings:
2025-05-26T19:59:28.512473Z INFO mistralrs_quant::utils::log: Layers 0-15: metal[4294968389] (36 GB)
2025-05-26T19:59:28.513859Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-05-26T19:59:28.513881Z INFO mistralrs_core::pipeline::normal: Model config: Config { hidden_act: Silu, hidden_size: 2048, intermediate_size: 8192, vocab_size: 128256, num_hidden_layers: 16, num_attention_heads: 32, num_key_value_heads: 8, rms_norm_eps: 1e-5, rope_theta: 500000.0, max_position_embeddings: 131072, rope_scaling: Some(Llama3RopeConfig { factor: 32.0, low_freq_factor: 1.0, high_freq_factor: 4.0, original_max_position_embeddings: 8192, rope_type: Llama3 }), quantization_config: None, tie_word_embeddings: true }
2025-05-26T19:59:28.514678Z INFO mistralrs_core::pipeline::normal: Applying ISQ to None
2025-05-26T19:59:28.514787Z INFO mistralrs_core::utils::varbuilder_utils: Loading model using mmap strategy.
2025-05-26T19:59:29.326964Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<|begin_of_text|>", eos_toks = "<|eot_id|>", "<|end_of_text|>", "<|eom_id|>", unk_tok = `None`
2025-05-26T19:59:29.336539Z INFO mistralrs_server: Model loaded.
2025-05-26T19:59:29.336587Z INFO mistralrs_core: Beginning dummy run.
2025-05-26T19:59:29.337210Z INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
2025-05-26T19:59:29.424327Z INFO mistralrs_core: Dummy run completed in 0.087735458s.
2025-05-26T19:59:29.425554Z INFO mistralrs_server: Serving on http://0.0.0.0:8888.
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/mod.rs:462:35:
Did not get any inputs. This is shocking.
stack backtrace:
0: 0x1045275cc - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h217270392019d164
1: 0x1032a093c - core::fmt::write::he22fcab56bd3ec61
2: 0x104525ae8 - std::io::Write::write_fmt::hb32eaafcfd249a19
3: 0x104527484 - std::sys::backtrace::BacktraceLock::print::h115149c0b879e5c3
4: 0x1045262ac - std::panicking::default_hook::ha0b223ccc4379930
5: 0x1045258e0 - std::panicking::rust_panic_with_hook::h203f96c93e7ac62d
6: 0x10455a32c - std::panicking::begin_panic_handler::{{closure}}::hcc8f653f753c0254
7: 0x10455a29c - std::sys::backtrace::__rust_end_short_backtrace::h911de07218b69a6c
8: 0x10455b280 - _rust_begin_unwind
9: 0x1046ad2e0 - core::panicking::panic_fmt::h6a4014bec58fba4f
10: 0x1046ad5e4 - core::option::expect_failed::h064f2cf84916882a
11: 0x103c25cdc - core::iter::adapters::try_process::h15efef5839024646
12: 0x103f83308 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::h44952332cd29c76d
13: 0x103ff60d0 - mistralrs_core::engine::Engine::run::{{closure}}::haa04be666b79b1a7
14: 0x104006afc - mistralrs_core::MistralRs::new::{{closure}}::{{closure}}::{{closure}}::h79afb8c92fd88f28
15: 0x103e9ea64 - std::sys::backtrace::__rust_begin_short_backtrace::h62234e08cf8beae7
16: 0x104051610 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdfdc2f031a50fd04
17: 0x10455c6b4 - std::sys::pal::unix::thread::Thread::new::thread_start::h6d53b1b0c047a3b9
18: 0x19783ec0c - __pthread_cond_wait
2025-05-26T19:59:34.342502Z INFO mistralrs_core::engine::logger: Throughput (T/s) 10.80, Prefix cache hitrate 66.67%, 1 running, 0 waiting
^C
✘
But c116ce42 does not panic:
mistralrs-server-no-panic.mp4
Full output:
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master git show c116ce42
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs master git checkout c116ce42
Note: switching to 'c116ce42'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at c116ce42 Don't use mmap on cuda (#1336)
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs ➦ c116ce42 cd mistralrs-server
matt@Matts-MacBook-Pro-2024 ~/Code/matthewhaynes/mistral.rs/mistralrs-server ➦ c116ce42 cargo build --release --features metal
Compiling num-complex v0.4.6
Compiling mistralrs-quant v0.5.0 (/Users/matt/Code/matthewhaynes/mistral.rs/mistralrs-quant)
Compiling mistralrs-paged-attn v0.5.0 (/Users/matt/Code/matthewhaynes/mistral.rs/mistralrs-paged-attn)
Compiling mistralrs-core v0.5.0 (/Users/matt/Code/matthewhaynes/mistral.rs/mistralrs-core)
Compiling pulp v0.18.22
Compiling gemm-common v0.17.1
Compiling gemm-f32 v0.17.1
Compiling gemm-c32 v0.17.1
Compiling gemm-c64 v0.17.1
Compiling gemm-f64 v0.17.1
Compiling gemm-f16 v0.17.1
Compiling gemm v0.17.1
Compiling candle-core v0.8.0 (https://github.com/EricLBuehler/candle.git?rev=cb2d8f5#cb2d8f59)
Compiling candle-nn v0.8.0 (https://github.com/EricLBuehler/candle.git?rev=cb2d8f5#cb2d8f59)
Compiling mistralrs-vision v0.5.0 (/Users/matt/Code/matthewhaynes/mistral.rs/mistralrs-vision)
Compiling mistralrs-server v0.5.0 (/Users/matt/Code/matthewhaynes/mistral.rs/mistralrs-server)
Finished `release` profile [optimized] target(s) in 1m 42s
Here is the example curl:
curl -X 'POST' \
'http://localhost:8888/v1/chat/completions' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-3.2-1B-Instruct",
"messages": [{
"role": "user",
"content": "hi!"
}]
}'
I'd be happy to take a look if you have any thoughts.
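In case it helps, the same two-request reproduction as a Rust sketch, assuming the reqwest crate with the blocking and json features plus serde_json; on 2b56c10 the first POST succeeds and the second one hits the panic:

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let body = json!({
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{ "role": "user", "content": "hi!" }]
    });
    for attempt in 1..=2 {
        // The first request completes normally; the second panics the engine.
        let resp = client
            .post("http://localhost:8888/v1/chat/completions")
            .json(&body)
            .send()?;
        println!("attempt {attempt}: HTTP {}", resp.status());
    }
    Ok(())
}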
Latest commit or version
2b56c102