# LlamaEdge-RAG API Server

## Endpoints

- `/v1/rag/embeddings` endpoint for converting text to embeddings in a single call.

- `/v1/rag/query` endpoint for querying the RAG model. It is equivalent to the [`/v1/chat/completions` endpoint](https://github.com/LlamaEdge/LlamaEdge/tree/main/api-server#v1chatcompletions-endpoint-for-chat-completions) defined in the LlamaEdge API server. Example requests for both endpoints are sketched below.

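As a quick sketch, here is what a request to the `/v1/rag/query` endpoint could look like with `curl`, assuming the server is listening on the default `0.0.0.0:8080` and the body follows the OpenAI-compatible chat schema of the `/v1/chat/completions` endpoint (the model name `Llama-2-7b` is a placeholder):

```console
curl -X POST http://localhost:8080/v1/rag/query \
    -H 'Content-Type: application/json' \
    -d '{"messages": [{"role": "user", "content": "What is LlamaEdge?"}], "model": "Llama-2-7b"}'
```

For `/v1/rag/embeddings`, the body below assumes an OpenAI-style embeddings schema with `model` and `input` fields; this shape is an assumption, so verify the exact schema against the server's documentation:

```console
# NOTE: assumed OpenAI-style request body; check the actual schema before use
curl -X POST http://localhost:8080/v1/rag/embeddings \
    -H 'Content-Type: application/json' \
    -d '{"model": "all-minilm", "input": ["LlamaEdge is a lightweight inference runtime."]}'
```
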
## CLI options for the API server

The `-h` or `--help` option lists the available CLI options of the `rag-api-server` wasm app:

```console
$ wasmedge rag-api-server.wasm -h

Usage: rag-api-server.wasm [OPTIONS] --model-name <MODEL-NAME> --prompt-template <TEMPLATE>

Options:
  -m, --model-name <MODEL-NAME>
          Sets names for chat and embedding models. The names are separated by comma without space, for example, '--model-name Llama-2-7b,all-minilm'.
  -a, --model-alias <MODEL-ALIAS>
          Sets model aliases [default: default,embedding]
  -c, --ctx-size <CTX_SIZE>
          Sets context sizes for chat and embedding models. The sizes are separated by comma without space, for example, '--ctx-size 4096,384'. The first value is for the chat model, and the second value is for the embedding model. [default: 4096,384]
  -r, --reverse-prompt <REVERSE_PROMPT>
          Halt generation at PROMPT, return control.
  -p, --prompt-template <TEMPLATE>
          Sets the prompt template. [possible values: llama-2-chat, codellama-instruct, codellama-super-instruct, mistral-instruct, mistrallite, openchat, human-assistant, vicuna-1.0-chat, vicuna-1.1-chat, vicuna-llava, chatml, baichuan-2, wizard-coder, zephyr, stablelm-zephyr, intel-neural, deepseek-chat, deepseek-coder, solar-instruct, gemma-instruct]
      --system-prompt <system_prompt>
          Sets global system prompt. [default: ]
      --qdrant-url <qdrant_url>
          Sets the url of Qdrant REST Service. [default: http://localhost:6333]
      --qdrant-collection-name <qdrant_collection_name>
          Sets the collection name of Qdrant. [default: default]
      --qdrant-limit <qdrant_limit>
          Max number of retrieved result. [default: 3]
      --qdrant-score-threshold <qdrant_score_threshold>
          Minimal score threshold for the search result [default: 0.4]
      --log-prompts
          Print prompt strings to stdout
      --log-stat
          Print statistics to stdout
      --log-all
          Print all log information to stdout
      --web-ui <WEB_UI>
          Root path for the Web UI files [default: chatbot-ui]
  -s, --socket-addr <IP:PORT>
          Sets the socket address [default: 0.0.0.0:8080]
  -h, --help
          Print help
  -V, --version
          Print version
```

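For reference, a launch command might look like the following. This is a minimal sketch: the GGUF file names are placeholders for a chat model and an embedding model you have downloaded, the `--nn-preload` aliases `default` and `embedding` match the default `--model-alias` values, and a Qdrant instance is assumed to be running at the default URL:

```console
# placeholders: substitute the GGUF files you actually downloaded
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:all-MiniLM-L6-v2-ggml-model-f16.gguf \
  rag-api-server.wasm \
  --model-name Llama-2-7b,all-minilm \
  --prompt-template llama-2-chat \
  --ctx-size 4096,384 \
  --qdrant-url http://localhost:6333
```
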
Please make sure that the port is not occupied by another process. If the specified port is available on your machine and the command succeeds, you should see the following output in the terminal:

```console
Listening on http://0.0.0.0:8080
```

If the Web UI is ready, you can navigate to `http://127.0.0.1:8080` to open the chatbot, which interacts with your server's API.