--use_flash_attention \
--flash_attention_recompute
```

## Multi-HPU inference

To enable multi-card inference, you must set the environment variable `PT_HPU_ENABLE_LAZY_COLLECTIVES=true`.

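Rather than prefixing every launch command with the variable, you can export it once for the shell session. A minimal sketch (assumes a bash-compatible shell):

```shell
# Export once so every subsequent launch in this session inherits it
# (equivalent to prefixing each command with the assignment).
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
echo "$PT_HPU_ENABLE_LAZY_COLLECTIVES"   # prints: true
```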
### Inference with FusedSDPA on 8 HPUs

Use the following command to run Llava-v1.6-mistral-7b BF16 inference with FusedSDPA on 8 HPUs:
```bash
PT_HPU_ENABLE_LAZY_COLLECTIVES=true python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```
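As a rough mental model, the launcher with `--world_size 8` starts eight worker processes, one per HPU, each identified by its own rank. The toy loop below only mimics that per-rank numbering locally; it is not the actual `gaudi_spawn.py` or DeepSpeed launch logic:

```shell
# Illustrative only: spawn one subshell per "card", each seeing its own
# RANK environment variable, the way a distributed launcher numbers workers.
for rank in 0 1 2 3 4 5 6 7; do
  RANK=$rank sh -c 'echo "worker rank: $RANK"'
done
```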