--use_flash_attention \
--flash_attention_recompute
```

## Multi-HPU inference

To enable multi-card inference, you must set the environment variable `PT_HPU_ENABLE_LAZY_COLLECTIVES=true`.

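Rather than prefixing every launch command with the variable, you can export it once for the shell session. A minimal sketch (assumes a bash-compatible shell):

```shell
# Export once so every subsequent launch in this session inherits it
# (equivalent to prefixing each command with the assignment).
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
echo "$PT_HPU_ENABLE_LAZY_COLLECTIVES"   # prints: true
```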
### Inference with FusedSDPA on 8 HPUs

Use the following command to run Llava-v1.6-mistral-7b BF16 inference with FusedSDPA on 8 HPUs:
```bash
PT_HPU_ENABLE_LAZY_COLLECTIVES=true python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```
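As a rough mental model, the launcher with `--world_size 8` starts eight worker processes, one per HPU, each identified by its own rank. The toy loop below only mimics that per-rank numbering locally; it is not the actual `gaudi_spawn.py` or DeepSpeed launch logic:

```shell
# Illustrative only: spawn one subshell per "card", each seeing its own
# RANK environment variable, the way a distributed launcher numbers workers.
for rank in 0 1 2 3 4 5 6 7; do
  RANK=$rank sh -c 'echo "worker rank: $RANK"'
done
```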