Releases · LlamaEdge/rag-api-server

11 Dec 07:19

github-actions

0.11.0

db828c9

LlamaEdge-RAG 0.11.0

Major changes:

(BREAKING) Rename the VectorDB related fields in the requests
- Rename url_vdb_server to vdb_server_url
- Rename collection_name to vdb_collection_name
(NEW) Add the vdb_api_key field to the requests to /v1/create/rag, /v1/chat/completion, and /v1/retrieve endpoints. The field allows users to access the VectorDB server which requires an API key for access. See vectordb.md for details.
(NEW) Provide the support for setting VectorDB API key via the environment variable VDB_API_KEY. See vectordb.md for details.
Add vectordb.md for introducing how to interact with VectorDB

Assets 4

08 Dec 14:09

github-actions

0.10.0

5764426

LlamaEdge-RAG 0.10.0

Major changes:

Support multiple collections ( Fixes #28 )

Improve --qdrant-collection-name, --qdrant-limit, and --qdrant-score-threshold CLI options to support both single value and multiple comma-separated values, for example

wasmedge --dir .:. \
--nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
--nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
rag-api-server.wasm \
...
--qdrant-url http://127.0.0.1:6333 \
--qdrant-collection-name paris,paris2 \
--qdrant-limit 2,3 \
--qdrant-score-threshold 0.5,0.6 \
...

For the requests to both /v1/chat/completions and /v1/retrieve endpoints, url_vdb_server, collection_name, limit, and score_threshold fields support both single and multiple values. For example,

Multiple values

curl --location 'http://localhost:8080/v1/retrieve' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        ...
    ],
    ...,
    "url_vdb_server": "http://127.0.0.1:6333",
    "collection_name": ["paris","paris2"],
    "limit": [3,3],
    "score_threshold": [0.7,0.7],
    ...
}'

Single value

  curl --location 'http://localhost:8080/v1/retrieve' \
  --header 'Content-Type: application/json' \
  --data '{
      "messages": [
          ...
      ],
      ...,
      "url_vdb_server": "http://127.0.0.1:6333",
      "collection_name": ["paris"],
      "limit": [3],
      "score_threshold": [0.7],
      ...
  }'

Remove duplicated RAG search results ( Fixes #27 )
Upgrade dependencies:
- llama-core v0.23.4
- chat-prompts v0.18.1
- endpoints v0.20.0

Assets 4

29 Nov 07:47

github-actions

0.9.17

62b6b98

LlamaEdge-RAG 0.9.17

Major changes:

Upgrade dependencies:
- llama-core v0.23.3
- chat-prompts v0.18.0
- endpoints v0.19.0

Assets 4

22 Nov 14:05

github-actions

0.9.16

78ff862

LlamaEdge-RAG 0.9.16

Major change:

Upgrade to llama-core v0.23.0, chat-prompts v0.17.5, and endpoints v0.18.0
(NEW) Allow to update qdrant settings in each chat completion and embedding request:
- url_vdb_server: The URL of the VectorDB server.
- collection_name: The name of the collection in VectorDB.
- limit: Max number of retrieved results.
- score_threshold: The score threshold for the retrieved results.

Assets 4

12 Nov 07:24

github-actions

0.9.15

870b988

LlamaEdge-RAG 0.9.15

Major changes:

New endpoints
- GET /v1/files/{file_id}: Retrieve information of a specific file by id
- GET /v1/files/{file_id}/content: Retrieve the content of a specific file by id
- GET /v1/files/download/{file_id}: Download a specific file by id
Upgrade to llama-core v0.22.0

Assets 4

06 Nov 08:00

github-actions

0.9.14

b610b1c

LlamaEdge-RAG 0.9.14

Major change:

Support the dynamic number of latest user messages used in the context retrieval. The number is decided by the context_window field of chat requests. (Fixed #25 )

Assets 4