
Commit 2d9a3ce: add 8x example in the readme
Parent: 104478a

File changed: examples/image-to-text/README.md (17 additions, 0 deletions)
@@ -204,3 +204,20 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_pipeline.py \
--use_flash_attention \
--flash_attention_recompute
```
## Multi-HPU inference

To enable multi-card inference, you must set the environment variable `PT_HPU_ENABLE_LAZY_COLLECTIVES=true`, as shown in the command below.
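Equivalently, you can export the variable once per shell session instead of prefixing every launch; a minimal sketch:

```bash
# Set once for the current shell; later launches in this session
# no longer need the PT_HPU_ENABLE_LAZY_COLLECTIVES=true prefix.
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
```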
### Inference with FusedSDPA on 8 HPUs

Use the following command to run Llava-v1.6-mistral-7b BF16 inference with FusedSDPA on 8 HPUs:
```bash
PT_HPU_ENABLE_LAZY_COLLECTIVES=true python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```
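Before launching, it can help to confirm that eight HPU devices are actually visible, since `--world_size` must not exceed the number of available cards. A minimal check, assuming the Habana driver's `hl-smi` tool and the `habana_frameworks` PyTorch package are installed (as these examples already require):

```bash
# Show the device table for all visible HPUs (hl-smi ships with the Habana driver).
hl-smi

# Or count devices programmatically; --world_size must not exceed this number.
python -c "import habana_frameworks.torch.hpu as hthpu; print(hthpu.device_count())"
```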
