Port mtmd from mainline + Qwen2/2.5-VL support #798
Conversation
So does using it with the server require more implementation?
Nothing has been done for the server. You have a command line tool. You can use it like this. For the server I'm hoping someone else will do a PR.
Ok. That clears it up. I still have yet to use the command line tools :P We have progress!
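For reference, an invocation of the ported CLI might look like the following sketch; the model and projector file names here are placeholders, and the flags are assumed to match mainline's `llama-mtmd-cli`:

```shell
# Hypothetical example: file names are placeholders, and the flags
# (-m, --mmproj, --image, -p) are assumed to match mainline's llama-mtmd-cli.
./bin/llama-mtmd-cli \
    -m Qwen2.5-VL-7B-Instruct-Q8_0.gguf \
    --mmproj mmproj-Qwen2.5-VL-7B-Instruct-Q8_0.gguf \
    --image examples/mtmd/test-1.jpeg \
    -p "Describe this image."
```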
Impressive! Tested: Qwen2.5-VL-7B, gemma-3-12b and pixtral-12b on two systems with CPU+iGPU.
Does mainline support quantized mmproj files? To me it looked like the convolution and
Yes, `mmproj-Qwen2.5-VL-7B-Instruct-Q8_0.gguf` produces good results in mainline; it seems it needs
This PR is a port of mainline's `mtmd` library and multi-modal command line tool `llama-mtmd-cli`, along with an implementation of Qwen2/2.5-VL support. Based on my own testing it seems fully functional.
Please test and provide feedback!
Original WIP description
This is WIP to port `mtmd` and `mtmd-cli` from mainline.

Current state:
- compiles, but not functional (missing several `ggml` ops)
- `mtmd`-related ops have been added to the CPU and CUDA back-ends
- `mtmd` library and `mtmd-cli` tools have been ported
- `examples/mtmd/test-1.jpeg` produces a meaningful response

Here is an example run with the CPU back-end:
The same thing with mainline:
Interesting to see that image encoding/decoding in `ik_llama.cpp` is 2X the speed of mainline without me having done anything related to this part of the calculation.

TODO
- `LOG` vs `LOG_TEE`

More observations
- Something is not quite right on the CPU when the number of image tokens is larger than the u-batch size. For the passport photo it generates 1036 image tokens, and I get gibberish unless I set u-batch = batch = 2048. This is strange because if something was wrong with setting the token positions (different than usual for Qwen2-VL), it shouldn't be working on CUDA either, but it does. Fixed with last commit.
- Something is not quite right also on CUDA. The first image works fine, but any attempt to add a second image to the conversation leads to gibberish. Fixed with last commit.
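To see why an image can overflow the default u-batch, here is a rough sketch of the token arithmetic, assuming Qwen2-VL's usual 14-px ViT patches merged 2x2 into one LLM token (the function name and the example image size are mine):

```python
import math

def qwen2vl_image_tokens(width: int, height: int,
                         patch_size: int = 14, merge_size: int = 2) -> int:
    """Rough image-token count for Qwen2-VL: the ViT cuts the image into
    14x14 patches, then merges 2x2 patch groups into one LLM token, so
    each token covers a 28x28 pixel tile (assuming the preprocessor has
    already rounded the image to multiples of 28)."""
    tile = patch_size * merge_size          # 28 px per token side
    return math.ceil(width / tile) * math.ceil(height / tile)

# A hypothetical ~784x1036 px photo gives a 28x37 token grid, i.e. 1036
# image tokens -- more than a default u-batch of 512 can hold, which is
# why bumping u-batch (or fixing the split path) was needed.
print(qwen2vl_image_tokens(784, 1036))  # 1036
```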