Releases: LlamaEdge/rag-api-server
LlamaEdge-RAG 0.11.0
Major changes:
- (BREAKING) Rename the VectorDB-related fields in the requests:
  - Rename url_vdb_server to vdb_server_url
  - Rename collection_name to vdb_collection_name
- (NEW) Add the vdb_api_key field to the requests to the /v1/create/rag, /v1/chat/completions, and /v1/retrieve endpoints. The field lets users access a VectorDB server that requires an API key. See vectordb.md for details.
- (NEW) Support setting the VectorDB API key via the VDB_API_KEY environment variable. See vectordb.md for details.
- Add vectordb.md, which explains how to interact with VectorDB
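For clients that still send the old field names, the rename can be sketched as a small filter over the request body. This is only an illustration: migrate_vdb_fields is a hypothetical helper, not part of rag-api-server, and the sample body is trimmed to the VectorDB fields shown in the 0.10.0 notes.

```shell
# Hypothetical helper: read a JSON request body on stdin and rename the
# pre-0.11.0 VectorDB field names to their 0.11.0 names.
# This is a plain textual rename of the quoted keys, not a JSON parser.
migrate_vdb_fields() {
  sed -e 's/"url_vdb_server"/"vdb_server_url"/g' \
      -e 's/"collection_name"/"vdb_collection_name"/g'
}

# Trimmed sample body (assumption: only the VectorDB fields are shown).
old_body='{"url_vdb_server": "http://127.0.0.1:6333", "collection_name": ["paris"], "limit": [3], "score_threshold": [0.7]}'

printf '%s\n' "$old_body" | migrate_vdb_fields
```

Because the filter works on text, it would also rewrite a string value that happens to equal one of the quoted old names; a JSON-aware tool is the safer choice for real payloads.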
LlamaEdge-RAG 0.10.0
Major changes:
- Support multiple collections (Fixes #28)
- Improve the --qdrant-collection-name, --qdrant-limit, and --qdrant-score-threshold CLI options to support both a single value and multiple comma-separated values, for example:

      wasmedge --dir .:. \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
        rag-api-server.wasm \
        ...
        --qdrant-url http://127.0.0.1:6333 \
        --qdrant-collection-name paris,paris2 \
        --qdrant-limit 2,3 \
        --qdrant-score-threshold 0.5,0.6 \
        ...
- For requests to both the /v1/chat/completions and /v1/retrieve endpoints, the url_vdb_server, collection_name, limit, and score_threshold fields support both single and multiple values. For example:
  - Multiple values

        curl --location 'http://localhost:8080/v1/retrieve' \
          --header 'Content-Type: application/json' \
          --data '{
            "messages": [ ... ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris","paris2"],
            "limit": [3,3],
            "score_threshold": [0.7,0.7],
            ...
          }'

  - Single value

        curl --location 'http://localhost:8080/v1/retrieve' \
          --header 'Content-Type: application/json' \
          --data '{
            "messages": [ ... ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris"],
            "limit": [3],
            "score_threshold": [0.7],
            ...
          }'

- Remove duplicated RAG search results (Fixes #27)
- Upgrade dependencies: llama-core v0.23.4, chat-prompts v0.18.1, endpoints v0.20.0
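The comma-separated CLI values in this release appear to line up by position, so that the first collection name goes with the first limit and the first score threshold. The sketch below only illustrates that reading; pair_collections is a hypothetical helper, the positional pairing is an assumption the notes do not state explicitly, and the sketch expects all three lists to have the same length.

```shell
# Hypothetical helper: print one line per collection, pairing the N-th name
# with the N-th limit and the N-th score threshold.
pair_collections() {
  names=$1
  limits=$2
  thresholds=$3
  while [ -n "$names" ]; do
    # ${var%%,*} keeps everything before the first comma (the head item).
    printf '%s limit=%s score_threshold=%s\n' \
      "${names%%,*}" "${limits%%,*}" "${thresholds%%,*}"
    # Drop the head item of each list; a list without a comma is exhausted.
    case $names in *,*) names=${names#*,} ;; *) names= ;; esac
    case $limits in *,*) limits=${limits#*,} ;; *) limits= ;; esac
    case $thresholds in *,*) thresholds=${thresholds#*,} ;; *) thresholds= ;; esac
  done
}

# Values taken from the wasmedge example in these notes.
pair_collections "paris,paris2" "2,3" "0.5,0.6"
```

With the values from the example command, this prints one line for paris (limit 2, threshold 0.5) and one for paris2 (limit 3, threshold 0.6).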
LlamaEdge-RAG 0.9.17
Major change:
- Upgrade dependencies: llama-core v0.23.3, chat-prompts v0.18.0, endpoints v0.19.0
LlamaEdge-RAG 0.9.16
Major changes:
- Upgrade to llama-core v0.23.0, chat-prompts v0.17.5, and endpoints v0.18.0
- (NEW) Allow updating the Qdrant settings in each chat completion and embedding request:
  - url_vdb_server: The URL of the VectorDB server.
  - collection_name: The name of the collection in VectorDB.
  - limit: Max number of retrieved results.
  - score_threshold: The score threshold for the retrieved results.
LlamaEdge-RAG 0.9.15
Major changes:
- New endpoints:
  - GET /v1/files/{file_id}: Retrieve information about a specific file by id
  - GET /v1/files/{file_id}/content: Retrieve the content of a specific file by id
  - GET /v1/files/download/{file_id}: Download a specific file by id
- Upgrade to llama-core v0.22.0
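As a quick reference, the three file endpoints can be expanded for a concrete server and file id. This is a minimal sketch: file_endpoints is a hypothetical helper, the base URL is the one used in the curl examples elsewhere in these notes, and file_123 is a placeholder id rather than a real one.

```shell
# Hypothetical helper: print the three file endpoint URLs for a given
# server base URL ($1) and file id ($2).
file_endpoints() {
  base=$1
  file_id=$2
  printf '%s/v1/files/%s\n' "$base" "$file_id"           # file information
  printf '%s/v1/files/%s/content\n' "$base" "$file_id"   # file content
  printf '%s/v1/files/download/%s\n' "$base" "$file_id"  # file download
}

# Placeholder file id; replace with an id returned by your own server.
file_endpoints "http://localhost:8080" "file_123"
```

Each printed URL is meant to be requested with GET, for example by passing it to curl.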
LlamaEdge-RAG 0.9.14
Major change:
- Support a dynamic number of the latest user messages used in context retrieval. The number is set by the context_window field of chat requests. (Fixes #25)
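The effect of context_window can be pictured as keeping only the most recent user messages before retrieval. The sketch below illustrates that selection on plain text lines; it is not the server's implementation, latest_user_messages is a hypothetical helper, and the sample messages are placeholders.

```shell
# Hypothetical helper: read user messages on stdin (oldest first, one per
# line) and keep only the latest N, where N ($1) plays the role of
# context_window.
latest_user_messages() {
  tail -n "$1"
}

# Placeholder conversation; with a window of 2 only the last two lines remain.
printf '%s\n' \
  "What is the capital of France?" \
  "How large is the city?" \
  "Which river runs through it?" | latest_user_messages 2
```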
LlamaEdge-RAG 0.9.13
Major change:
- Upgrade to llama-core v0.21.1
LlamaEdge-RAG 0.9.12
Major change:
- Upgrade to llama-core v0.21.0, chat-prompts v0.17.2, and endpoints v0.17.0
LlamaEdge-RAG 0.9.11
Major change:
- Upgrade to llama-core v0.20.0, chat-prompts v0.17.1, and endpoints v0.16.0
LlamaEdge-RAG 0.9.9
Major change:
- Upgrade to llama-core v0.19.2