@@ -8,14 +8,29 @@ Overall, this microservice offers a streamlined way to integrate large language

 ## Validated LLM Models

-| Model                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
-| --------------------------- | --------- | -------- | ---------- |
-| [Intel/neural-chat-7b-v3-3] | ✓         | ✓        | ✓          |
-| [Llama-2-7b-chat-hf]        | ✓         | ✓        | ✓          |
-| [Llama-2-70b-chat-hf]       | ✓         | -        | ✓          |
-| [Meta-Llama-3-8B-Instruct]  | ✓         | ✓        | ✓          |
-| [Meta-Llama-3-70B-Instruct] | ✓         | -        | ✓          |
-| [Phi-3]                     | x         | Limit 4K | Limit 4K   |
+| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
+| ------------------------------------------- | --------- | -------- | ---------- |
+| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          |
+| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          |
+| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          |
+| [Phi-3]                                     | x         | Limit 4K | Limit 4K   |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          |
+
+### System Requirements for LLM Models
+
+| Model                                       | Minimum number of Gaudi cards |
+| ------------------------------------------- | ----------------------------- |
+| [Intel/neural-chat-7b-v3-3]                 | 1                             |
+| [meta-llama/Llama-2-7b-chat-hf]             | 1                             |
+| [meta-llama/Llama-2-70b-chat-hf]            | 2                             |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | 1                             |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | 2                             |
+| [Phi-3]                                     | x                             |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | 8                             |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | 4                             |

 ## Support integrations

@@ -166,9 +181,11 @@ curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
 <!-- Below are links used in these document. They are not rendered: -->

 [Intel/neural-chat-7b-v3-3]: https://huggingface.co/Intel/neural-chat-7b-v3-3
-[Llama-2-7b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
-[Llama-2-70b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
-[Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
-[Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
+[meta-llama/Llama-2-7b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
+[meta-llama/Llama-2-70b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
+[meta-llama/Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
+[meta-llama/Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
 [Phi-3]: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
 [HuggingFace]: https://huggingface.co/
+[deepseek-ai/DeepSeek-R1-Distill-Llama-70B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+[deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B