Habana-LLM-Viewer is a tool that provides a Roofline model, LLM performance prediction, and memory analysis for the Intel Gaudi platform. Inspired by LLM-Viewer, Habana-LLM-Viewer can be used to estimate the performance of models such as Llama2-13B, Qwen-7B, and Mixtral-8x7B on the Intel Gaudi platform.
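The Roofline model bounds attainable throughput by the compute peak and by memory bandwidth times arithmetic intensity (FLOPs per byte moved). A minimal sketch of that idea, with illustrative numbers only (not official Gaudi specs) and a hypothetical helper that is not part of this tool:

```python
def roofline_tflops(peak_tflops, bandwidth_tbps, arithmetic_intensity):
    """Attainable TFLOP/s for a kernel with the given FLOP/byte ratio."""
    return min(peak_tflops, bandwidth_tbps * arithmetic_intensity)

# Illustrative device numbers only: assumed 400 TFLOP/s BF16 peak,
# 2.45 TB/s HBM bandwidth.
peak, bw = 400.0, 2.45
print(roofline_tflops(peak, bw, 4.0))     # low intensity: memory-bound
print(roofline_tflops(peak, bw, 1000.0))  # high intensity: compute-bound, hits peak
```

Decode steps of an LLM tend to sit in the memory-bound region, while prefill matmuls can reach the compute roof.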



- Simply run habana_viewer.py and the results will be shown on localhost.
- Simply run run_model_projection.py and the results will be saved to the folder "data/model".
```
python run_model_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --model Llama2-7B \
    --data-type BF16 \
    --batch-size BATCH_SIZE \
    --context-input CONTEXT_INPUT \
    --context-output CONTEXT_OUTPUT \
    --kvcache-bucket 256 \
    --vec-bmm
```
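The --kvcache-bucket 256 option presumably models KV-cache growth during decode in fixed-size buckets, i.e. the effective sequence length is rounded up to the next multiple of 256. A sketch of that assumption (round_to_bucket is a hypothetical helper, not code from this repo):

```python
def round_to_bucket(seq_len, bucket=256):
    """Round a sequence length up to the next bucket multiple."""
    return ((seq_len + bucket - 1) // bucket) * bucket

print(round_to_bucket(300))   # -> 512
print(round_to_bucket(1024))  # -> 1024 (already a multiple)
```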
- Simply run run_op_projection.py and the results will be saved to the folder "data/operation". As with model projection, one can modify proj_cfg in main.
```
python run_op_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --op Matmul \
    --data-type BF16 \
    --m-list m1 m2 ... \
    --n-list n1 n2 ... \
    --k-list k1 k2 ...
```
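For a Matmul of shape (m, k) x (k, n) in BF16 (2 bytes per element), the arithmetic intensity that drives a Roofline estimate can be sketched as below (matmul_intensity_bf16 is a hypothetical helper, not this repo's code):

```python
def matmul_intensity_bf16(m, n, k):
    """FLOPs per byte for an (m x k) @ (k x n) matmul in BF16."""
    flops = 2 * m * n * k                      # one multiply + one add per MAC
    bytes_moved = 2 * (m * k + k * n + m * n)  # read A and B, write C, 2 B/elem
    return flops / bytes_moved

print(matmul_intensity_bf16(1, 4096, 4096))     # decode-like GEMV: intensity ~1
print(matmul_intensity_bf16(1024, 4096, 4096))  # large GEMM: high intensity
```

This is why small-batch decode matmuls land in the memory-bound region of the roofline while large prefill GEMMs are compute-bound.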
| Op Name | Projected Data |
| ------- | -------------- |
| Matmul  | Link |
- Currently covers only single-card performance projection; multi-card / multi-node support is planned.
- More models and operations will be covered.