# Qwen2-VL

LMDeploy supports the following models in the Qwen-VL series, as detailed in the table below:

| Model | Size | Supported Inference Engine |
| :----------: | :----: | :------------------------: |
| Qwen-VL-Chat | - | TurboMind, PyTorch |
| Qwen2-VL | 2B, 7B | PyTorch |
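
If you want to pin the inference engine explicitly instead of relying on the default backend selection, you can pass a backend config when building a pipeline. Below is a minimal sketch, not a required step; the model name and the `session_len` value are illustrative:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Qwen2-VL is served by the PyTorch engine (see the table above),
# so request it explicitly via backend_config.
pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct',
                backend_config=PytorchEngineConfig(session_len=8192))
```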

The following sections demonstrate how to deploy a Qwen-VL model using LMDeploy, with [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) as an example.

## Installation

Please install LMDeploy by following the [installation guide](../get_started/installation.md), and install the additional packages that Qwen2-VL requires:

```shell
pip install qwen_vl_utils
```
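
To confirm the environment is ready, you can run a quick import check. This is only a sanity-check sketch, not a required step:

```python
# Verify that LMDeploy and qwen_vl_utils are importable.
import lmdeploy
import qwen_vl_utils  # noqa: F401  (imported only to verify installation)

print(lmdeploy.__version__)
```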

Or, you can build a docker image to set up the inference environment. If the CUDA version on your host machine is `>=12.4`, you can run:

```shell
git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
docker build --build-arg CUDA_VERSION=cu12 -t openmmlab/lmdeploy:qwen2vl . -f ./docker/Qwen2VL_Dockerfile
```

Otherwise, you can build the image with CUDA 11:

```shell
docker build --build-arg CUDA_VERSION=cu11 -t openmmlab/lmdeploy:qwen2vl . -f ./docker/Qwen2VL_Dockerfile
```

## Offline inference

The following sample code shows the basic usage of the VLM pipeline. For detailed information, please refer to [VLM Offline Inference Pipeline](./vl_pipeline.md).

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
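
The pipeline also accepts a batch of prompt-image pairs and returns one response per pair, as described in the VLM pipeline guide linked above. A minimal sketch, reusing image URLs that appear elsewhere in this document:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct')

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
    'https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg',
]
# Each (prompt, image) tuple is handled as an independent request.
prompts = [('describe this image', load_image(url)) for url in image_urls]
responses = pipe(prompts)
print([r.text for r in responses])
```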

More examples are listed below:

<details>
  <summary>
    <b>multi-image multi-round conversation, combined images</b>
  </summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct', log_level='INFO')
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images.'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>

<details>
  <summary>
    <b>image resolution for performance boost</b>
  </summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct', log_level='INFO')

min_pixels = 64 * 28 * 28
max_pixels = 64 * 28 * 28
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images.'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>

## Online serving

You can launch the server with the `lmdeploy serve api_server` CLI:

```shell
lmdeploy serve api_server Qwen/Qwen2-VL-2B-Instruct
```

You can also start the service using the docker image built earlier:

```shell
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 23333:23333 \
    --ipc=host \
    openmmlab/lmdeploy:qwen2vl \
    lmdeploy serve api_server Qwen/Qwen2-VL-2B-Instruct
```

Docker Compose is another option. Create a `docker-compose.yml` configuration file in the root directory of the lmdeploy project as follows:

```yaml
version: '3.5'

services:
  lmdeploy:
    container_name: lmdeploy
    image: openmmlab/lmdeploy:qwen2vl
    ports:
      - "23333:23333"
    environment:
      HUGGING_FACE_HUB_TOKEN: <secret>
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    stdin_open: true
    tty: true
    ipc: host
    command: lmdeploy serve api_server Qwen/Qwen2-VL-2B-Instruct
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
```

Then, you can execute the startup command as below:

```shell
docker-compose up -d
```

If you see the following logs after running `docker logs -f lmdeploy`, the service has launched successfully.

```text
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO: Started server process [2439]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
```
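
Once the service is up, you can query it through its OpenAI-compatible endpoint. The snippet below is a minimal sketch; it assumes the `openai` Python package is installed, the server is reachable at `http://0.0.0.0:23333`, and no API key was configured on the server:

```python
from openai import OpenAI

# No API key is enforced by default, so any non-empty string works here.
client = OpenAI(api_key='none', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[dict(role='user', content=[
        dict(type='text', text='Describe this image.'),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')),
    ])],
    temperature=0.8,
)
print(response.choices[0].message.content)
```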

The arguments of `lmdeploy serve api_server` can be reviewed in detail with `lmdeploy serve api_server -h`.

More information about `api_server`, as well as how to access the service, can be found [here](api_server_vl.md).