Describe the bug
I'm trying to use mistral.rs to run MiniCPM-O on Google Cloud Run (with an NVIDIA L4 GPU). I created a custom Dockerfile based on Dockerfile.cuda-all (linked at the end) and built the latest master branch. The main addition in my Dockerfile is downloading the model and storing it in the Docker image, roughly as sketched below.
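The gist at the end has the real file; the gist of the change is just baking the weights into the image at build time so Cloud Run doesn't download them on cold start. A minimal sketch (the model repo ID and target directory are illustrative, and this assumes Python/pip are available in the base image):

```dockerfile
# Sketch only: download the model at build time so it ships inside the image.
# Repo ID and paths are illustrative; see the linked gist for the real file.
RUN pip install "huggingface_hub[cli]" && \
    huggingface-cli download openbmb/MiniCPM-o-2_6 --local-dir /models/minicpmo_2_6
```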
Everything works fine when I send a request with text only, e.g.:

```python
response = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "minicpmo_2_6",
        "messages": [
            {
                "role": "user",
                "content": "What is your name?",
            }
        ],
        "max_tokens": 256,
        "frequency_penalty": 1.0,
        "top_p": 0.1,
        "temperature": 0,
    },
)
```

When I try sending the example from your docs, it fails. The request I'm sending is like this:
```python
response = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "minicpmo_2_6",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                    },
                ],
            }
        ],
        "max_tokens": 256,
        "frequency_penalty": 1.0,
        "top_p": 0.1,
        "temperature": 0,
    },
)
```

I also tried with a different image URL and with a base64-encoded image (sketched below).
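The base64 variant looked roughly like this (a sketch: the local file path is illustrative, and the full script is in the gist linked at the end):

```python
import base64

import requests

# Read a local image and encode it as a data URL (the file path is illustrative).
with open("mountain.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "minicpmo_2_6",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image?",
                    },
                ],
            }
        ],
        "max_tokens": 256,
    },
)
```

The result is always the same: an error like this in the logs: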
```
2025-02-28T17:16:20.454460Z ERROR mistralrs_core::engine: step - Model failed with error: WithBacktrace { inner: Msg("shape mismatch slot_mapping [49], expected 511"), backtrace: Backtrace [{ fn: "candle_core::error::Error::bt" }, { fn: "mistralrs_paged_attn::cuda::backend::paged_attention::reshape_and_cache" }, { fn: "mistralrs_core::paged_attention::layers::paged_attention::PagedAttention::forward" }, { fn: "mistralrs_core::models::qwen2::Model::forward_embed" }, { fn: "<mistralrs_core::vision_models::minicpmo::MiniCpmOModel as mistralrs_core::pipeline::loaders::vision_loaders::VisionModel>::forward" }, { fn: "<mistralrs_core::pipeline::vision::VisionPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "mistralrs_core::pipeline::Pipeline::step::{{closure}}" }, { fn: "mistralrs_core::engine::Engine::run::{{closure}}" }, { fn: "tokio::runtime::runtime::Runtime::block_on" }, { fn: "std::sys::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}" }, { fn: "std::sys::pal::unix::thread::Thread::new::thread_start" }] }
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/inputs_processor.rs:395:21:
Block table is too small (completion)! start_pos=510 block_size=32 table_len=2
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: mistralrs_core::pipeline::inputs_processor::text_models_inputs_processor::make_completion_chunk
   3: mistralrs_core::pipeline::inputs_processor::text_models_inputs_processor::get_completion_input
   4: <mistralrs_core::vision_models::minicpmo::inputs_processor::MiniCpmOImageProcessor as mistralrs_core::pipeline::inputs_processor::InputsProcessor>::process_inputs
   5: mistralrs_core::pipeline::Pipeline::step::{{closure}}
   6: mistralrs_core::engine::Engine::run::{{closure}}
   7: tokio::runtime::runtime::Runtime::block_on
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
Full log: downloaded-logs-20250228-092532.txt
Dockerfile + Cloud Build config: https://gist.github.com/jgonera/3c792ee3f44ec1fc12ba7ede7f723550
Full request-making code: https://gist.github.com/jgonera/326ff5d1612a72d0b80194636146f38c
Am I missing something obvious? I'd appreciate any help!