Architecture for Jan to support multiple Inference Engines #1271
freelerobot started this conversation in Feature Ideas
Previous thread: #771

## Context

## Solution
I envision an architecture in Jan that has the following:

- **Models Extension**
  - Serves the `/models` API endpoint
- **Inference Extension**
  - Serves OpenAI-compatible endpoints (`/chat/completions`, later `/audio/speech`)
  - Routes each request to the right engine, based on the model's manifest (`model.json`); see the routing sketch after this list
- **Extension for each Inference Engine**
  - Implements the `/chat/completions` endpoint for its engine
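A minimal sketch of how the Inference Extension's routing could work in TypeScript. The `ModelManifest` and `InferenceEngineExtension` interfaces, the engine registry, and the manifest path are hypothetical illustrations, not Jan's actual extension API:

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical shapes for illustration; Jan's real extension API may differ.
interface ModelManifest {
  id: string;     // e.g. "llama2-70b-intel-bigdl"
  engine: string; // e.g. "nitro", "intel-bigdl", "openai"
}

interface ChatCompletionRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream?: boolean;
}

interface InferenceEngineExtension {
  // Each engine extension implements the OpenAI-compatible endpoint
  // and streams results back as SSE chunks.
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<string>;
}

// Engine extensions register themselves here, keyed by engine name.
const engines = new Map<string, InferenceEngineExtension>();

async function loadManifest(modelId: string): Promise<ModelManifest> {
  // Read /jan/models/<id>/model.json (path per the file tree below).
  const raw = await readFile(`/jan/models/${modelId}/model.json`, "utf8");
  return JSON.parse(raw) as ModelManifest;
}

// The Inference Extension's routing step: look up the engine named in
// model.json and hand the request to the matching engine extension.
async function routeChatCompletions(
  req: ChatCompletionRequest
): Promise<AsyncIterable<string>> {
  const manifest = await loadManifest(req.model);
  const engine = engines.get(manifest.engine);
  if (!engine) {
    throw new Error(`No inference engine registered for "${manifest.engine}"`);
  }
  return engine.chatCompletions(req);
}
```

The key design point is that the Inference Extension never knows engine specifics; it only reads `model.json` and dispatches.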
## Example

### File Tree

```
/jan
  /models
    /llama2-70b
      llama2-gguf-q4_k_m.bin   # uses Nitro
      model.json
    /llama2-70b-intel-bigdl
      # pytorch files
      model.json
  /engines
    /nitro
      engine.json
    /openai
      engine.json
```
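### model.json for gpt4-32k-1603

A minimal sketch of what the `model.json` manifest for `gpt4-32k-1603` could contain. The field names (`id`, `object`, `engine`, `parameters`) are assumptions for illustration, not a confirmed schema:

```json
{
  "id": "gpt4-32k-1603",
  "object": "model",
  "engine": "openai",
  "parameters": {
    "max_tokens": 4096,
    "temperature": 0.7
  }
}
```

The `engine` field is the hook the Inference Extension keys on when routing.

### engine.json example for Nitro

Likewise, a hedged sketch of an `engine.json` for Nitro; the endpoint URL, port, and field names are illustrative assumptions:

```json
{
  "id": "nitro",
  "object": "engine",
  "endpoint": "http://127.0.0.1:3928/inferences/llamacpp/chat_completion",
  "supported_formats": ["gguf"]
}
```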
### Execution Path

1. A `/chat/completions` request arrives for `llama2-70b-intel-bigdl`.
2. The Inference Extension loads the `model.json` for `llama2-70b-intel-bigdl` and sees the engine is `intel-bigdl`.
3. The Inference Extension routes the request to the `intel-bigdl` Inference Engine Extension.
4. The `intel-bigdl` Inference Engine Extension takes in the `/chat/completions` request, runs inference, and returns the result through SSE.
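To make the path concrete, here is a hedged client-side sketch that would exercise it end to end. The host and port are assumptions (wherever Jan serves its OpenAI-compatible API locally), and the streaming format is standard OpenAI-style SSE:

```typescript
// Hedged sketch: POST a streaming chat completion to Jan's local server.
// The host/port are assumptions; adjust to your Jan configuration.
async function streamChat(): Promise<void> {
  const response = await fetch("http://127.0.0.1:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2-70b-intel-bigdl", // routed to the intel-bigdl engine
      messages: [{ role: "user", content: "Hello!" }],
      stream: true, // results come back as SSE chunks
    }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed: ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each SSE event line looks like: "data: {...json chunk...}"
    process.stdout.write(decoder.decode(value));
  }
}

streamChat().catch(console.error);
```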