[KubeAI][Models] Added 2 new model files and updated model parameters #1150
poussa merged 4 commits into opea-project:main
* Added mistral and mixtral model YAML files
* Updated model parameters to more optimal values, based on benchmark-serving testing with default parameters: request-rate=800, input/output tokens=200/200

Signed-off-by: Rantala <valtteri.rantala@intel.com>
Also, add a short README.md to the models directory stating that the vLLM benchmark serving script was used to find the parameters for the models, with the arguments request-rate=800, input/output tokens=200/200, concurrency = xxx.
Signed-off-by: Rantala <valtteri.rantala@intel.com>
Added README.md file.
eero-t left a comment:
A few minor changes are still needed.
CI also seems to require adding the README to some "toctree". @poussa, any idea what or where that is?
When you add a README file, you need to add a reference to it from some other README. In this case, just add a link to it in kubeai/README.md, for example:
Signed-off-by: Rantala <valtteri.rantala@intel.com>
for more information, see https://pre-commit.ci
Added link to kubeai/models/README.md.
I think this can be merged despite
    maxReplicas: 8
    # Equals to max-num-seqs (batch-size)
    targetRequests: 512
    resourceProfile: gaudi-for-text-generation::1
Missed the double-colon typo...
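A typo like the one above could be caught before merge with a small lint step. The following is only a sketch, not part of this PR: it assumes that `resourceProfile` has the form `<name>:<count>`, that vLLM flags appear in an `args` list on the model spec, and that `targetRequests` is meant to match `--max-num-seqs` as the YAML comment says. The function name `check_model_spec` is hypothetical.

```python
import re

# Assumed format: "<profile-name>:<count>", e.g. "gaudi-for-text-generation:1".
RESOURCE_PROFILE_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*:[1-9][0-9]*$")


def check_model_spec(spec: dict) -> list[str]:
    """Return human-readable problems found in a model spec dict (hypothetical helper)."""
    problems = []

    # A doubled colon ("name::1") fails this match because ":" is not a name character.
    profile = str(spec.get("resourceProfile", ""))
    if not RESOURCE_PROFILE_RE.match(profile):
        problems.append(f"resourceProfile {profile!r} is not '<name>:<count>'")

    # Compare targetRequests with --max-num-seqs, if that flag is present in args.
    max_num_seqs = None
    for arg in spec.get("args", []):
        if arg.startswith("--max-num-seqs="):
            max_num_seqs = int(arg.split("=", 1)[1])
    target = spec.get("targetRequests")
    if max_num_seqs is not None and target != max_num_seqs:
        problems.append(f"targetRequests {target} != --max-num-seqs {max_num_seqs}")

    return problems


# The values from the reviewed snippet; the args entry is an illustrative assumption.
reviewed = {
    "maxReplicas": 8,
    "targetRequests": 512,
    "resourceProfile": "gaudi-for-text-generation::1",
    "args": ["--max-num-seqs=512"],
}
for problem in check_model_spec(reviewed):
    print(problem)
```

Run against the snippet as merged, this would report only the `resourceProfile` value; with a single colon it would report nothing.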
This PR was merged without CI passing, which will block the IO build CI in all other PRs: https://github.com/opea-project/GenAIExamples/actions/runs/16105452982/job/45440218033?pr=1922

Description
Added mistral-7b-instruct-v0.3 and mixtral-8x7b-instruct-v0.1 model YAML files. Updated the model parameters to more optimal values, based on benchmark-serving testing with default parameters: request-rate=800, input/output tokens=200/200.
Issues
n/a.
Type of change
Dependencies
n/a.
Tests
Ran KubeAI's benchmark-serving tests against all models.
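For reproducibility, the benchmark settings quoted above could be captured as a command line. This is a rough sketch that assumes the vLLM `benchmark_serving.py` script with its random dataset options; the script path, model name, and base URL are placeholders, and the concurrency value is left to the caller because it is not stated in this PR. The helper name `benchmark_args` is hypothetical.

```python
def benchmark_args(model: str, base_url: str, max_concurrency: int,
                   request_rate: int = 800,
                   input_len: int = 200,
                   output_len: int = 200) -> list[str]:
    """Build an argv list for vLLM's benchmark_serving.py (hypothetical helper)."""
    return [
        "python", "benchmark_serving.py",       # script location is an assumption
        "--model", model,
        "--base-url", base_url,
        "--dataset-name", "random",             # synthetic prompts of fixed length
        "--random-input-len", str(input_len),   # input tokens per request
        "--random-output-len", str(output_len), # output tokens per request
        "--request-rate", str(request_rate),    # requests per second
        "--max-concurrency", str(max_concurrency),
    ]


# Placeholder values for illustration only.
args = benchmark_args("example-model", "http://localhost:8000", max_concurrency=64)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` once the placeholders are replaced with the real endpoint and model.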