
checks: add retrieval quality checks #2451

Open
harsh21234i wants to merge 3 commits into Giskard-AI:main from harsh21234i:feat/retrieval-quality-checks

Conversation

@harsh21234i
Contributor

Closes #2445

Summary

  • Add built-in strict retrieval checks: RecallAtK, PrecisionAtK, HitRateAtK, MRR, NDCGAtK, and InfAP
  • Support configurable threshold, JSONPath keys for relevant/retrieved IDs, and k where applicable
  • Export the new checks from giskard.checks
  • Add unit coverage for empty inputs, perfect retrieval, partial overlap, ranking sensitivity, sparse-label InfAP,
    duplicate retrieved IDs, missing keys, and registry validation
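
To make the strict exact-ID matching concrete, here is a minimal sketch of Recall@K and Precision@K as set-overlap metrics (standalone functions for illustration; the PR's actual checks are `Check` subclasses with JSONPath-configurable inputs, not these helpers):

```python
def recall_at_k(relevant_ids: list[str], retrieved_ids: list[str], k: int) -> float:
    """Fraction of relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)


def precision_at_k(relevant_ids: list[str], retrieved_ids: list[str], k: int) -> float:
    """Fraction of the top-k positions filled by relevant IDs (unique hits over k)."""
    if k <= 0:
        return 0.0
    hits = set(retrieved_ids[:k]) & set(relevant_ids)
    return len(hits) / k
```

Note that deduplicating the top-k slice means repeated retrieved IDs are only counted once, which matches the duplicate-ID behavior exercised in the tests.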

Scope

This PR implements the strict exact-ID matching strategy first. Cosine similarity, LLM-judged relevance, and
documentation updates can be added in follow-up PRs.
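Under the same strict exact-ID assumption, the rank-sensitive metrics reduce to short formulas. A sketch of MRR and NDCG@K with binary relevance (again illustrative standalone functions, not the PR's check classes):

```python
import math


def mrr(relevant_ids: list[str], retrieved_ids: list[str]) -> float:
    """Reciprocal rank of the first relevant ID, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevant_ids: list[str], retrieved_ids: list[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top-k ranking over the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved_ids[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

The "ranking sensitivity" tests in the PR presumably distinguish cases that set-based Recall@K cannot: swapping a relevant and an irrelevant document in the top-k changes MRR and NDCG but not recall.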

Testing

  • uv run -m pytest -q libs/giskard-checks/tests/builtin/test_retrieval.py
  • uv run -m pytest -q libs/giskard-checks/tests/builtin
  • uv run ruff check ...
  • uv run basedpyright ...

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive set of retrieval quality metrics, including Recall@K, Precision@K, HitRate@K, MRR, NDCG@K, and InfAP, along with corresponding unit tests. Feedback suggests refining the _as_sequence helper to handle None values correctly, adjusting the Precision@K calculation to use a standard denominator, and renaming the InfAP metric to AveragePrecision for better alignment with information retrieval terminology.

Comment thread libs/giskard-checks/src/giskard/checks/builtin/retrieval.py
Comment thread libs/giskard-checks/src/giskard/checks/builtin/retrieval.py Outdated
Comment on lines +214 to +215
@Check.register("inf_ap")
class InfAP[InputType, OutputType, TraceType: Trace]( # pyright: ignore[reportMissingTypeArgument]
Contributor


medium

The metric implemented here is standard Average Precision (AP). In IR literature, Inferred Average Precision (InfAP) refers to a specific estimator designed for incomplete relevance judgments (where some documents are unjudged). Since this implementation assumes strict exact-ID matching against a provided set (complete judgment), it should be renamed to AveragePrecision to avoid confusion with the specialized InfAP metric.
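For reference, standard Average Precision under complete binary judgments (what the reviewer says this class actually computes) is the mean of precision-at-rank over the ranks where a relevant document appears, averaged over all relevant documents:

```python
def average_precision(relevant_ids: list[str], retrieved_ids: list[str]) -> float:
    """Standard AP: mean of precision@i at each rank i holding a relevant ID."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Divide by |relevant|, so unretrieved relevant docs contribute 0.
    return precision_sum / len(relevant)
```

InfAP, by contrast, estimates this quantity when some documents are unjudged, which is why the reviewer argues the name is misleading for an exact-ID implementation.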

harsh21234i and others added 2 commits May 14, 2026 10:11
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Add built-in retrieval quality checks

1 participant