+[llmaz](https://github.com/InftyAI/llmaz) is an easy-to-use and advanced inference platform for large language models on Kubernetes, aimed at production use. It uses vLLM as the default model serving backend.
+
+Please refer to the [Quick Start](https://github.com/InftyAI/llmaz?tab=readme-ov-file#quick-start) for more details.
docs/source/features/structured_outputs.md (1 addition & 1 deletion)
@@ -16,7 +16,7 @@ The following parameters are supported, which must be added as extra parameters:
 - `guided_json`: the output will follow the JSON schema.
 - `guided_grammar`: the output will follow the context-free grammar.
 - `guided_whitespace_pattern`: used to override the default whitespace pattern for guided JSON decoding.
-- `guided_decoding_backend`: used to select the guided decoding backend to use.
+- `guided_decoding_backend`: used to select the guided decoding backend to use. Additional backend-specific options can be supplied in a comma-separated list following a colon after the backend name. For example, `"xgrammar:no-fallback"` will not allow vLLM to fall back to a different backend on error.

 You can see the complete list of supported parameters on the [OpenAI-Compatible Server](#openai-compatible-server) page.
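
To make the guided-decoding parameters above concrete, here is a minimal sketch (not part of the diff) that passes them to a running vLLM OpenAI-compatible server through the client's `extra_body`; the server address, model name, and schema are placeholders:

```python
# Minimal sketch: passing guided-decoding extra parameters through the
# OpenAI-compatible server. Base URL, model name, and schema are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    extra_body={
        "guided_json": person_schema,
        # Per the new text: error out instead of falling back to another backend.
        "guided_decoding_backend": "xgrammar:no-fallback",
    },
)
print(completion.choices[0].message.content)
```

Per the new `guided_decoding_backend` wording, the `:no-fallback` option makes the request fail outright if `xgrammar` cannot handle the request, rather than silently switching backends.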
docs/source/models/pooling_models.md (1 addition & 2 deletions)
@@ -108,8 +108,7 @@ A code example can be found here: <gh-file:examples/offline_inference/basic/clas
 ### `LLM.score`

 The {class}`~vllm.LLM.score` method outputs similarity scores between sentence pairs.
-It is primarily designed for [cross-encoder models](https://www.sbert.net/examples/applications/cross-encoder/README.html).
-These types of models serve as rerankers between candidate query-document pairs in RAG systems.
+It is designed for embedding models and cross-encoder models. Embedding models use cosine similarity, and [cross-encoder models](https://www.sbert.net/examples/applications/cross-encoder/README.html) serve as rerankers between candidate query-document pairs in RAG systems.

 :::{note}
 vLLM can only perform the model inference component (e.g. embedding, reranking) of RAG.
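
As an illustration of the `LLM.score` behavior described in this hunk, here is a minimal sketch (not part of the diff), assuming a cross-encoder checkpoint such as `BAAI/bge-reranker-v2-m3`:

```python
# Minimal sketch of LLM.score with a cross-encoder reranker.
# The model name is an example and not taken from the diff.
from vllm import LLM

llm = LLM(model="BAAI/bge-reranker-v2-m3", task="score")

# Score one query against two candidate documents; a cross-encoder
# returns a relevance score per pair.
outputs = llm.score(
    "What is the capital of France?",
    ["The capital of France is Paris.", "The capital of Brazil is Brasilia."],
)
for output in outputs:
    print(output.outputs.score)
```

With an embedding model in place of the reranker, the same call would return the cosine similarity of each pair, per the rewording above.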
-Our Score API applies a cross-encoder model to predict scores for sentence pairs.
+Our Score API can apply a cross-encoder model or an embedding model to predict scores for sentence pairs. When using an embedding model, the score corresponds to the cosine similarity between each embedding pair.

 Usually, the score for a sentence pair refers to the similarity between two sentences, on a scale of 0 to 1.

-You can find the documentation for these kind of models at [sbert.net](https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html).
+You can find the documentation for cross-encoder models at [sbert.net](https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html).
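
To show what the Score API change means in practice, here is a minimal sketch (not part of the diff) of posting a query against candidate documents to a running server's `/score` route; the host, port, and model name are placeholders:

```python
# Minimal sketch: querying the Score API of a running vLLM server.
# Host, port, and model name are placeholders.
import requests

response = requests.post(
    "http://localhost:8000/score",
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "text_1": "What is the capital of France?",
        "text_2": [
            "The capital of France is Paris.",
            "The capital of Brazil is Brasilia.",
        ],
    },
)
# Each entry carries the pair's score; for an embedding model this is
# the cosine similarity of the pair, per the change above.
print(response.json())
```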