Description
Hybrid-mode search (i.e. driven by findAndRerank, as opposed to plain vector ANN) returns "scores" that are not a deterministic function of the <query, result> pair alone: they also depend on the other results in the list (at least for the reranker models known to be used within Astra DB; presumably this is a general property).
Therefore, any usage of these scores that implicitly treats them as "actual scores" is faulty and should be discouraged with a notice to the end user.
Considering that (1) the abstractions in LangChain allow hooking up a vector store to a threshold-based retriever, and that (2) there is a "tower" of similarity-search methods ultimately using score-returning methods internally, here is the solution that was chosen:
- if the search is hybrid and the method returns a score, a warning is emitted, while the application continues working undisturbed;
- however, to emit the warning only when the user actually requests the score, the "tower" needs to be slightly modified to avoid warnings when the score-returning methods are used internally and the score is then discarded.
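The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ToyHybridStore`, its `hybrid` flag, the private `_search_with_score` helper and the hard-coded hits are all hypothetical names and data chosen for the example. The public method names merely mirror the usual LangChain vector store methods.

```python
import warnings


class ToyHybridStore:
    """Hypothetical stand-in for a vector store (not the real class)."""

    def __init__(self, hybrid: bool) -> None:
        self.hybrid = hybrid
        # Fixed (document, score) pairs standing in for a database call.
        self._hits = [("doc-a", 0.71), ("doc-b", 0.42), ("doc-c", 0.13)]

    def _search_with_score(self, query: str, k: int) -> list[tuple[str, float]]:
        # Internal layer of the "tower": returns scores but never warns.
        return self._hits[:k]

    def similarity_search_with_score(
        self, query: str, k: int = 2
    ) -> list[tuple[str, float]]:
        # Public score-returning method: the caller asked for scores,
        # so warn if they come from a hybrid (reranked) search.
        if self.hybrid:
            warnings.warn(
                "Scores from hybrid search depend on the whole result list "
                "and should not be treated as absolute similarities.",
                UserWarning,
                stacklevel=2,
            )
        return self._search_with_score(query, k)

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Scores are discarded here, so use the internal helper
        # and emit no warning.
        return [doc for doc, _score in self._search_with_score(query, k)]
```

With this layout, `similarity_search` on a hybrid store stays silent, while `similarity_search_with_score` returns the same hits and emits a `UserWarning` without raising.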
(Note: the fact that these scores are not in the [0, 1] interval, and hence break some expectations of LangChain, is secondary - it could be fixed if this were the intention.)