
A warning when asking for scores from hybrid search #128

@hemidactylus

Hybrid-mode search (i.e. driven by findAndRerank, as opposed to plain vector ANN) returns "scores" that are not a function of the <query, result> pair alone: they also depend on the other results in the list (at least in the known cases of reranker models used within Astra DB; presumably this is a general property).
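To illustrate why list-dependent scores are a problem for thresholding, here is a toy sketch. It is purely hypothetical and is not how the Astra DB rerankers compute their scores: the function `list_relative_scores` and the softmax normalization are assumptions chosen only because they make a score depend on the other candidates.

```python
import math
from typing import Dict


def list_relative_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Toy scorer (hypothetical): softmax-normalize raw relevances over the list.

    Each document's final score depends on every other candidate in the list.
    """
    total = sum(math.exp(value) for value in raw.values())
    return {doc: math.exp(value) / total for doc, value in raw.items()}


# Same query, same raw relevance for "doc1", but different companion results.
scores_a = list_relative_scores({"doc1": 2.0, "doc2": 0.0})
scores_b = list_relative_scores({"doc1": 2.0, "doc2": 0.0, "doc3": 3.0})

# A fixed threshold keeps "doc1" in one case and drops it in the other.
threshold = 0.5
kept_in_a = scores_a["doc1"] >= threshold
kept_in_b = scores_b["doc1"] >= threshold
```

With this toy scorer, `doc1` passes the threshold in the first list and fails it in the second, even though nothing about the <query, doc1> pair has changed.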

Therefore, any usage of these scores that implicitly treats them as "actual scores" is faulty and should be discouraged with a notice to the end user.

Considering that (1) the abstractions in LangChain allow hooking up a vector store to a threshold-based retriever, and that (2) there is a "tower" of similarity-search methods ultimately using score-returning methods internally, here is the solution that was chosen:

  1. if the search is hybrid, and the method is returning a score, a warning is emitted, while the application continues working undisturbed;
  2. however, to achieve this only when the user requires the score, the "tower" needs to be slightly modified to avoid warnings when the score-returning methods are used internally and the score is actually discarded.
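The two points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `ToyVectorStore`, `HybridScoreWarning`, the private helper `_search_with_score`, and the fake scores are all hypothetical. The public method names mirror LangChain's `similarity_search` and `similarity_search_with_score`.

```python
import warnings
from typing import List, Tuple


class HybridScoreWarning(UserWarning):
    """Hypothetical warning category for scores coming from hybrid search."""


class ToyVectorStore:
    """Hypothetical stand-in for a vector store, for illustration only."""

    def __init__(self, hybrid: bool) -> None:
        self.hybrid = hybrid

    def _search_with_score(self, query: str, k: int) -> List[Tuple[str, float]]:
        # Internal core of the "tower": returns (document, score) pairs
        # and never warns, so other methods can reuse it silently.
        return [(f"doc-{i} for {query!r}", 1.0 / (i + 1)) for i in range(k)]

    def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[str, float]]:
        # User-facing and score-returning: warn if the search is hybrid,
        # then carry on so the application keeps working.
        if self.hybrid:
            warnings.warn(
                "Scores from hybrid search depend on the other results in "
                "the list and should not be treated as absolute scores.",
                HybridScoreWarning,
                stacklevel=2,
            )
        return self._search_with_score(query, k)

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # The score is discarded here, so this goes through the internal
        # helper and no warning is emitted.
        return [doc for doc, _score in self._search_with_score(query, k)]
```

The design choice in this sketch is that only the outermost, user-called score-returning method emits the warning, while the score-discarding methods bypass it by calling the private helper.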

(Note: the fact that these scores are not in the [0, 1] interval, and hence break some expectations of LangChain, is secondary: it could be fixed if this were the intention.)
