Implement caching for evaluated prompts #44
Closed
Description
The goal of this feature is to reduce latency for repeated calls to the `chat_completion` API by saving the kv_cache keyed by the prompt tokens.
The basic version is to simply save the kv_state after the prompt has been evaluated.
Additionally, we should investigate whether it's possible to save and restore the kv_state after the completion has been generated as well.
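The basic version described above could be sketched as a small LRU map from prompt tokens to an opaque saved state. This is a hypothetical illustration, not the project's actual implementation: the `PromptKVCache` class and the idea of treating kv_state as an opaque `bytes` blob are assumptions for the sketch; the real cache would hold whatever state object the underlying model exposes.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class PromptKVCache:
    """Hypothetical sketch: LRU cache of saved kv_state blobs keyed by prompt tokens."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        # Token tuple -> opaque saved state (assumed serializable to bytes here).
        self._store: "OrderedDict[Tuple[int, ...], bytes]" = OrderedDict()

    def get(self, tokens: List[int]) -> Optional[bytes]:
        """Return the saved state for an exact prompt-token match, or None."""
        key = tuple(tokens)
        state = self._store.get(key)
        if state is not None:
            self._store.move_to_end(key)  # mark as most recently used
        return state

    def put(self, tokens: List[int], state: bytes) -> None:
        """Save the kv_state after the prompt has been evaluated."""
        key = tuple(tokens)
        self._store[key] = state
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used entry
```

On a cache hit the model's state would be restored instead of re-evaluating the prompt; on a miss the prompt is evaluated once and its state saved via `put`. Extending this to the post-completion case would just mean calling `put` again with the prompt-plus-completion tokens as the key.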