
Errors when providing an input image to MiniCPM-O 2.6 #1166

@jgonera

Description


Describe the bug

I'm trying to use mistral.rs to run MiniCPM-O on Google Cloud Run (with NVIDIA L4 GPU). I created a custom Dockerfile (see at the end) based on Dockerfile.cuda-all and built the latest master branch. The main addition in my Dockerfile is downloading the model and storing it in the Docker image.

Everything works fine when I send a text-only request, e.g.:

    response = requests.post(
        "http://localhost:9000/v1/chat/completions",
        json={
            "model":"minicpmo_2_6",
            "messages": [
                {
                    "role": "user",
                    "content": "What is your name?",
                }
            ],
            "max_tokens": 256,
            "frequency_penalty": 1.0,
            "top_p": 0.1,
            "temperature": 0,
        }
    )

When I try sending the example from your docs, it fails. The request I'm sending looks like this:

    response = requests.post(
        "http://localhost:9000/v1/chat/completions",
        json={
            "model":"minicpmo_2_6",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                            },
                        },
                        {
                            "type": "text",
                            "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                        },
                    ],
                }
            ],
            "max_tokens": 256,
            "frequency_penalty": 1.0,
            "top_p": 0.1,
            "temperature": 0,
        }
    )

I also tried a different image URL and a base64-encoded image. The result is always the same: an error like this in the logs:

2025-02-28T17:16:20.454460Z ERROR mistralrs_core::engine: step - Model failed with error: WithBacktrace { inner: Msg("shape mismatch slot_mapping [49], expected 511"), backtrace: Backtrace [{ fn: "candle_core::error::Error::bt" }, { fn: "mistralrs_paged_attn::cuda::backend::paged_attention::reshape_and_cache" }, { fn: "mistralrs_core::paged_attention::layers::paged_attention::PagedAttention::forward" }, { fn: "mistralrs_core::models::qwen2::Model::forward_embed" }, { fn: "<mistralrs_core::vision_models::minicpmo::MiniCpmOModel as mistralrs_core::pipeline::loaders::vision_loaders::VisionModel>::forward" }, { fn: "<mistralrs_core::pipeline::vision::VisionPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "mistralrs_core::pipeline::Pipeline::step::{{closure}}" }, { fn: "mistralrs_core::engine::Engine::run::{{closure}}" }, { fn: "tokio::runtime::runtime::Runtime::block_on" }, { fn: "std::sys::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}" }, { fn: "std::sys::pal::unix::thread::Thread::new::thread_start" }] }
thread '<unnamed>' panicked at mistralrs-core/src/pipeline/inputs_processor.rs:395:21:
Block table is too small (completion)! start_pos=510 block_size=32 table_len=2
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: mistralrs_core::pipeline::inputs_processor::text_models_inputs_processor::make_completion_chunk
3: mistralrs_core::pipeline::inputs_processor::text_models_inputs_processor::get_completion_input
4: <mistralrs_core::vision_models::minicpmo::inputs_processor::MiniCpmOImageProcessor as mistralrs_core::pipeline::inputs_processor::InputsProcessor>::process_inputs
5: mistralrs_core::pipeline::Pipeline::step::{{closure}}
6: mistralrs_core::engine::Engine::run::{{closure}}
7: tokio::runtime::runtime::Runtime::block_on
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
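If I'm reading the panic message correctly, the block table looks far too small for the sequence length: with a block size of 32, a sequence at position 510 should need 16 blocks, but the table only has 2. Here's the arithmetic as I understand it (this is my assumption about how the paged-attention block table maps token positions, based solely on the numbers in the panic):

```python
import math

# Figures taken directly from the panic message above.
start_pos = 510   # position of the token being generated
block_size = 32   # KV-cache block size
table_len = 2     # block table length reported in the panic

# Assumption: the table must cover every position up to and
# including start_pos, i.e. start_pos + 1 = 511 slots.
blocks_needed = math.ceil((start_pos + 1) / block_size)
positions_covered = table_len * block_size

print(blocks_needed)      # 16
print(positions_covered)  # 64 -- far short of the 511 needed
```

So it looks like the image tokens inflate the sequence length past what the allocated block table covers, which would also match the "shape mismatch slot_mapping [49], expected 511" error.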

Full log:
downloaded-logs-20250228-092532.txt
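For completeness, the base64 variant I mentioned was built roughly like this (a sketch; `to_data_url` is just a hypothetical helper name, and the full request-making code is in the gist linked in this issue):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as an OpenAI-style base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# The "image_url" content part in the request then becomes:
# {"type": "image_url", "image_url": {"url": to_data_url(raw_bytes)}}
```

Whether I pass a remote URL or a data URL built this way, the server fails with the same slot_mapping error.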

Dockerfile + Cloud Build config: https://gist.github.com/jgonera/3c792ee3f44ec1fc12ba7ede7f723550

Full request-making code: https://gist.github.com/jgonera/326ff5d1612a72d0b80194636146f38c

Am I missing something obvious? I'd appreciate any help!

Latest commit or version

e2f9648
