Update troubleshooting guide to include remediation for incorrect prefix cache scorer config #2040
Since v1.2, the plugin auto-tunes such configurations from the model server metrics, so no manual tuning is required (see #1748). We should recommend that users run v1.2 or later, and highlight that such tuning is only required before v1.2.
Done. I specified that from v1.2 onward, auto-tuning is supported as long as the model server exposes the required metrics, as vLLM does.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: BenjaminBraunDev, kfswain.
Customers have hit TTFT spikes caused by an incorrectly configured prefix cache scorer, so this PR adds the issue to the troubleshooting guide to help users diagnose and remediate similar problems.
In this case, it was unclear that the prefix cache config was causing the TTFT spikes until we noticed that it was not set to the right parameters for the model being served.
Does this PR introduce a user-facing change?