From 384947d37a3be89d1ab286e45f97e4b4529c22e1 Mon Sep 17 00:00:00 2001
From: 0xSage
Date: Tue, 21 Nov 2023 19:50:00 +0800
Subject: [PATCH] docs: polish models spec

---
 docs/docs/specs/models.md | 231 ++++++++++++-------------------------
 1 file changed, 72 insertions(+), 159 deletions(-)

diff --git a/docs/docs/specs/models.md b/docs/docs/specs/models.md
index 063dc5124f..8516264316 100644
--- a/docs/docs/specs/models.md
+++ b/docs/docs/specs/models.md
@@ -1,193 +1,106 @@
-# Models Spec v1
-:::warning
+---
+title: Models
+---

-Draft Specification: functionality has not been implemented yet.
+:::caution

-Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
+Draft Specification: functionality has not been implemented yet.

:::

## Overview

-Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.
-
-### Objectives
-
-- Users can download, import and delete models
-- Users can use remote models (e.g. OpenAI, OpenRouter)
-- Users can start/stop models and use them in a thread (or via Chat Completions API)
-- User can configure default model parameters at the model level (to be overridden later at `chat/completions` or `assistant`/`thread` level)
-
-## Design Principle
-- Don't go for simplicity yet
-- Underlying abstractions are changing very frequently (e.g. ggufv3)
-- Provide a minimalist framework over the abstractions that takes care of coordination between tools
-- Show direct system state for now
-
-## KIVs to Model Spec v2
-- OpenAI and Azure OpenAI
-- Importing via URL
-- Multiple Partitions
-
-## Models folder structure
-- Models in Jan are stored in the `/models` folder.
-- Models are stored and organized by folders, which are atomic representations of a model for easy packaging and version control.
-```sh
-/jan/                     # Jan root folder
-    /models/
-        llama2-70b-q4_k_m/
-            model-binary-1.gguf
-            model.json
-        mistral-7b-gguf-q3_k_l/
-            model.json
-            mistral-7b-q3-K-L.gguf
-        mistral-7b-gguf-q8_k_m./
-            model.json
-            mistral-7b-q8_k_k.gguf
-        random-model-q4_k_m/
-            random-model-q4_k_m.bin
-            random-model-q4_k_m.json   # (autogenerated)
-```
+In Jan, models are primary entities with the following capabilities:
+
+- Users can import, configure, and run models locally.
+- Models are served through an [OpenAI-compatible Model API](https://platform.openai.com/docs/api-reference/models) endpoint at `localhost:3000/v1/models`.
+- Supported model formats: `ggufv3`, with more to come.
+
+## Folder Structure
+
+- Models are stored in the `/models` folder.
+- Each model lives in its own folder containing the binaries and configurations needed to run it, which makes models easy to package and share.
+- Model folder names are unique and serve as the default `model_id` values.
+
+```bash
+jan/                          # Jan root folder
+  models/
+    llama2-70b-q4_k_m/        # Example: standard GGUF model
+      model.json
+      model-binary-1.gguf
+    mistral-7b-gguf-q3_k_l/   # Example: each quantization is a separate folder
+      model.json
+      mistral-7b-q3-K-L.gguf
+    mistral-7b-gguf-q8_k_m/   # Example: each quantization is a separate folder
+      model.json
+      mistral-7b-q8_k_k.gguf
+    llava-ggml-Q5/            # Example: model with multiple partitions
+      model.json
+      mmprj.bin
+      model_q5.ggml
+```

-## Model Object
-- Jan represents models as `json`-based Model Object files, known colloquially as `model.json`.
-- Jan aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object) with additional properties to support local models.
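+Assuming a Jan server is running, the folder names above are what surface as
+model IDs through the OpenAI-compatible endpoint described in the Overview.
+A rough sketch (the response shape mirrors OpenAI's list format and is
+illustrative, not final):
+
+```bash
+# List available models
+curl http://localhost:3000/v1/models
+
+# Hypothetical response: one entry per folder under /models
+# {
+#   "object": "list",
+#   "data": [
+#     { "id": "llama2-70b-q4_k_m", "object": "model" },
+#     { "id": "mistral-7b-gguf-q3_k_l", "object": "model" }
+#   ]
+# }
+```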
-- Jan's models follow a `model.json` naming convention, and are built to be extremely lightweight, with the only mandatory field being a `source_url` to download the model binaries.
-
-### Types of Models
+## `model.json`

-There are 3 types of models.
+- Each `model` folder contains a `model.json` file, which describes the model.
+- `model.json` contains the metadata and default parameters used to run the model.
+- The only required field is `source_url`.

-- [x] Local model, yet-to-be downloaded (we have the URL)
-- [x] Local model (downloaded)
+### GGUF Example

-## Examples
-### Local Model
+Here is a standard example `model.json` for a GGUF model:

-- Model has 1 binary `model-zephyr-7B.json`
-- See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)
+- `source_url`: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/

-#### `model.json`
```json
-"type": "model",
-"version": "1",
-"id": "zephyr-7b" // used in chat-completions model_name, matches folder name
-"name": "Zephyr 7B"
-"owned_by": "" // OpenAI compatibility
-"created": 1231231 // unix timestamp
-"description": "..."
-"state": enum[null, "downloading", "available"]
-// KIV: remote: // Subsequent
-// KIV: type: "llm" // For future where there are different types
-"format": "ggufv3", // State format, rather than engine
"source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
+"type": "model",        // Defaults to "model"
+"version": "1",         // Defaults to 1
+"id": "zephyr-7b",      // Defaults to the folder name
+"name": "Zephyr 7B",    // Defaults to the folder name
+"owned_by": "you",      // Defaults to "you"
+"created": 1231231,     // Defaults to the file creation time
+"description": "",
+"state": enum[null, "downloading", "ready", "starting", "stopping", ...],
+"format": "ggufv3",     // Defaults to "ggufv3"
-"settings" {
-    "ctx_len": "2048",
-    "ngl": "100",
-    "embedding": "true",
-    "n_parallel": "4",
-    // KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence",
-    // KIV:"user_prompt": "USER: ",
-    // KIV: "ai_prompt": "ASSISTANT: "
-}
+"settings": {           // Models are initialized with these settings
+  "ctx_len": "2048",
+  "ngl": "100",
+  "embedding": "true",
+  "n_parallel": "4",
+  // KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence",
+  // KIV: "user_prompt": "USER: ",
+  // KIV: "ai_prompt": "ASSISTANT: "
+},
-"parameters": {
-    "temperature": "0.7",
-    "token_limit": "2048",
-    "top_k": "0",
-    "top_p": "1",
-    "stream": "true"
-  },
-  "metadata": {}
-  "assets": [
-    "file://.../zephyr-7b-q4_k_m.bin",
-    "https://huggin"
-  ]
-```
+"parameters": {         // Models are called with these parameters
+  "temperature": "0.7",
+  "token_limit": "2048",
+  "top_k": "0",
+  "top_p": "1",
+  "stream": "true"
+},
+"metadata": {},         // Defaults to {}
+"assets": [             // Filepaths to model binaries; defaults to the current dir
+  "file://.../zephyr-7b-q4_k_m.bin"
+]
+```
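+Because `source_url` is the only required field, a minimal hand-written
+`model.json` needs just a single entry; every other field falls back to the
+defaults annotated above. A minimal sketch, pointing at the same Zephyr binary:
+
+```json
+{
+  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf"
+}
+```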
+"format": "ggufv3", // Defaults to "ggufv3" +"settings": { // Models are initialized with these settings + "ctx_len": "2048", + "ngl": "100", + "embedding": "true", + "n_parallel": "4", + // KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence", + // KIV:"user_prompt": "USER: ", + // KIV: "ai_prompt": "ASSISTANT: " } -"parameters": { - "temperature": "0.7", - "token_limit": "2048", - "top_k": "0", - "top_p": "1", - "stream": "true" - }, - "metadata": {} - "assets": [ - "file://.../zephyr-7b-q4_k_m.bin", - "https://huggin" - ] -``` - -### Deferred Download -```sh -models/ - mistral-7b/ - model.json - hermes-7b/ - model.json +"parameters": { // Models are called with these parameters + "temperature": "0.7", + "token_limit": "2048", + "top_k": "0", + "top_p": "1", + "stream": "true" +}, +"metadata": {} // Defaults to {} +"assets": [ // Filepaths to model binaries; Defaults to current dir + "file://.../zephyr-7b-q4_k_m.bin", +] ``` -- Jan ships with a default model folders containing recommended models -- Only the Model Object `json` files are included -- Users must later explicitly download the model binaries -### Multiple model partitions +## API Reference -```sh -llava-ggml-Q5/ - model.json - mmprj.bin - model_q5.ggml -``` - -### Locally fine-tuned/ custom imported model - -```sh -llama-70b-finetune/ - llama-70b-finetune-q5.json - .bin -``` +Jan's Model API is compatible with [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally. -## Models API - -| Method | API Call | OpenAI-equivalent | -| -------------- | ------------------------------- | ----------------- | -| List Models | GET /v1/models | true | -| Get Model | GET /v1/models/{model_id} | true | -| Delete Model | DELETE /v1/models/{model_id} | true | -| Start Model | PUT /v1/models/{model_id}/start | no | -| Stop Model | PUT /v1/models/{model_id}/start | no | -| Download Model | POST /v1/models/ | no | +See [Jan Models API](https://jan.ai/api-reference#tag/Models) ## Importing Models -:::warning - -- This has not been confirmed -- Jan should auto-detect and create folders automatically -- Jan's UI will allow users to rename folders and add metadata - -::: - -You can import a model by just dragging it into the `/models` folder, similar to Oobabooga. - -- Jan will detect and generate a corresponding `model.json` file based on model asset filename -- Jan will move it into its own `/model-id` folder once you define a `model-id` via the UI -- Jan will populate the model's `/model-id/model.json` as you add metadata through the UI - -### Jan Model Importers extension - :::caution -- This is only an idea, has not been confirmed as part of spec +This is current under development. ::: -Jan builds "importers" for users to seamlessly import models from a single URL. - -We currently only provide this for [TheBloke models on Huggingface](https://huggingface.co/TheBloke) (i.e. one of the patron saints of llama.cpp), but we plan to add more in the future. - -Currently, pasting a TheBloke Huggingface link in the Explore Models page will fire an importer, resulting in an: - -- Nicely-formatted model card -- Fully-annotated `model.json` file - -### ADR -- `.json`, i.e. the [Model Object](#model-object) -- Why multiple folders? - - Model Partitions (e.g. Llava in the future) -- Why a folder and config file for each quantization? 
## Importing Models

-:::warning
-
-- This has not been confirmed
-- Jan should auto-detect and create folders automatically
-- Jan's UI will allow users to rename folders and add metadata
-
-:::
-
-You can import a model by just dragging it into the `/models` folder, similar to Oobabooga.
-
-- Jan will detect and generate a corresponding `model.json` file based on model asset filename
-- Jan will move it into its own `/model-id` folder once you define a `model-id` via the UI
-- Jan will populate the model's `/model-id/model.json` as you add metadata through the UI
-
-### Jan Model Importers extension
-
:::caution

-- This is only an idea, has not been confirmed as part of spec
+This feature is currently under development.

:::

-Jan builds "importers" for users to seamlessly import models from a single URL.
-
-We currently only provide this for [TheBloke models on Huggingface](https://huggingface.co/TheBloke) (i.e. one of the patron saints of llama.cpp), but we plan to add more in the future.
-
-Currently, pasting a TheBloke Huggingface link in the Explore Models page will fire an importer, resulting in an:
-
-- Nicely-formatted model card
-- Fully-annotated `model.json` file
-
-### ADR
-- `.json`, i.e. the [Model Object](#model-object)
-- Why multiple folders?
-  - Model Partitions (e.g. Llava in the future)
-- Why a folder and config file for each quantization?
-  - Differently quantized models are completely different models
-- Milestone -1st December:
-  - Catalogue of recommended models, anything else = mutate the filesystem
-- [@linh] Should we have an API to help quantize models?
-  - Could be a really cool feature to have (i.e. import from HF, quantize model, run on CPU)
-- We should have a helper function to handle hardware compatibility
-  - POST model/{model-id}/compatibility
-- [louis] We are combining states & manifest
-  - Need to think through
\ No newline at end of file
+You can import a model by dragging its binary or GGUF file into the `/models` folder.
+
+- Jan automatically generates a corresponding `model.json` file based on the binary filename.
+- Jan automatically organizes the model into its own `/models/model-id` folder.
+- Jan automatically populates the `model.json` properties, which you can subsequently modify.
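+For example, dropping a hypothetical `llama-7b-q4.gguf` file into `/models`
+might produce a skeleton like this (illustrative values only, following the
+defaults described in the `model.json` section above):
+
+```json
+{
+  "id": "llama-7b-q4",     // Derived from the binary filename
+  "name": "llama-7b-q4",   // Defaults to the folder name
+  "format": "ggufv3",
+  "assets": [
+    "file://./llama-7b-q4.gguf"
+  ]
+}
+```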