
Add hallucination judge check #2436

Open

mindbomber wants to merge 3 commits into Giskard-AI:main from mindbomber:codex/add-hallucination-check

Conversation

@mindbomber

Summary

  • Adds a new Hallucination LLM judge check registered as hallucination.
  • Adds a dedicated hallucination prompt that distinguishes fabricated factual claims from groundedness/omission failures.
  • Supports direct answer/context values, trace JSONPath extraction, and no-context mode (see the usage sketch after this summary).
  • Exports the new check from giskard.checks and giskard.checks.judges.
  • Adds unit coverage for pass, fail, no-context, and trace-extraction paths.

Fixes #2369.
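
For orientation, a minimal usage sketch of the new check. This is hypothetical: the constructor parameters (answer/answer_key, context/context_key) are inferred from the attributes visible in the reviewed diff and the summary above, and the JSONPath expressions are illustrative, not taken from the PR.

    # Hypothetical usage sketch; parameter names inferred from the diff
    # (self.answer, self.answer_key, self.context, self.context_key),
    # not from final documentation.
    from giskard.checks import Hallucination

    # Direct values: answer and grounding context passed explicitly.
    direct = Hallucination(
        answer="The Eiffel Tower is 330 metres tall.",
        context=["The Eiffel Tower stands 330 metres high."],
    )

    # Trace extraction: values resolved from the trace via JSONPath keys.
    from_trace = Hallucination(
        answer_key="$.output.answer",
        context_key="$.steps[*].retrieved_chunks",
    )

    # No-context mode: with neither context nor context_key set, the judge
    # evaluates the answer for fabricated claims without grounding documents.
    no_context = Hallucination(answer_key="$.output.answer")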

Verification

  • uv run pytest libs\giskard-checks\tests\builtin\test_hallucination.py libs\giskard-checks\tests\builtin\test_groundedness.py libs\giskard-checks\tests\core\test_jsonpath_enforcement.py -q -> 19 passed
  • uv run ruff check libs\giskard-checks\src\giskard\checks\judges\hallucination.py libs\giskard-checks\src\giskard\checks\judges\__init__.py libs\giskard-checks\src\giskard\checks\__init__.py libs\giskard-checks\tests\builtin\test_hallucination.py -> passed
  • uv run ruff format --check libs\giskard-checks\src\giskard\checks\judges\hallucination.py libs\giskard-checks\src\giskard\checks\judges\__init__.py libs\giskard-checks\src\giskard\checks\__init__.py libs\giskard-checks\tests\builtin\test_hallucination.py -> passed
  • git diff --check -> passed

AANA gate

  • Intake gate: pass, recommended action accept
  • Code-change guardrail: pass, recommended action accept
  • AIx score: 1.0
  • Violations: none

Scope note

This introduces the built-in judge interface and prompt behavior. It does not claim to certify factuality or replace external evidence verification; the check provides a reviewable hallucination signal for Giskard test workflows.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new Hallucination check to the Giskard checks library, designed to detect fabricated facts in AI agent answers. The implementation includes the Hallucination class, a Jinja2 prompt template, and comprehensive unit tests. Feedback was provided to improve the get_inputs method: ensuring type consistency with TraceType, formatting list-based context inputs with newlines, and handling None values safely so the LLM does not misinterpret them as content.

Comment on lines +57 to +77
@override
async def get_inputs(self, trace: Trace[InputType, OutputType]) -> dict[str, str]:
    inputs = {
        "answer": str(
            provided_or_resolve(
                trace,
                key=self.answer_key,
                value=provide_not_none(self.answer),
            )
        ),
        "context": "",
    }
    if self.context is not None or self.context_key is not None:
        inputs["context"] = str(
            provided_or_resolve(
                trace,
                key=self.context_key,
                value=provide_not_none(self.context),
            )
        )
    return inputs
Contributor


Severity: medium

The get_inputs implementation has a few areas for improvement:

  1. Type Consistency: It uses Trace[InputType, OutputType] instead of the generic TraceType defined in the class signature. While technically compatible here, using TraceType is consistent with the base class and allows for proper type resolution in subclasses.
  2. List Formatting: The context attribute supports list[str]. Using str() on a list results in a Python-style string representation (e.g., "['chunk1', 'chunk2']"), which is suboptimal for LLM prompts. Joining chunks with newlines is generally preferred.
  3. None Handling: If a value resolves to None, str(None) produces the string "None". This can be misinterpreted by the LLM as actual content. Defaulting to an empty string is safer.
    @override
    async def get_inputs(self, trace: TraceType) -> dict[str, str]:
        def _fmt(v) -> str:
            if isinstance(v, list):
                return "\n\n".join(map(str, v))
            return str(v) if v is not None else ""

        inputs = {
            "answer": _fmt(
                provided_or_resolve(
                    trace,
                    key=self.answer_key,
                    value=provide_not_none(self.answer),
                )
            ),
            "context": "",
        }
        if self.context is not None or self.context_key is not None:
            inputs["context"] = _fmt(
                provided_or_resolve(
                    trace,
                    key=self.context_key,
                    value=provide_not_none(self.context),
                )
            )
        return inputs
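
As a standalone illustration of the behaviors this suggestion addresses (not code from the PR):

    chunks = ["chunk1", "chunk2"]
    print(str(chunks))          # ['chunk1', 'chunk2'] -- Python repr, noisy inside a prompt
    print("\n\n".join(chunks))  # chunk1, blank line, chunk2 -- clean prompt text
    print(str(None))            # None -- reads to the LLM as literal content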

@@ -0,0 +1,44 @@
Your role is to evaluate whether an AI agent's answer contains hallucinated or fabricated factual claims.
Member


Hi, what did you base this prompt on? Any reference to research or other existing libraries would be great.

Member

@davidberenstein1957 left a comment


@mindbomber this looks nice. Would you be able to resolve the checks and conflict, and address my minor comment?

@mindbomber
Author

Thanks, addressed in 3e87688.

  • Merged latest main into the PR branch to clear the branch drift/conflict.
  • Added non-rendered prompt rationale comments with references to SelfCheckGPT and Ragas-style faithfulness/groundedness framing, plus the local Groundedness judge's omission handling.
  • Updated Hallucination.get_inputs to use TraceType, format list[str] context as newline-separated chunks, and avoid rendering None as prompt content.
  • Added coverage for list context formatting (a sketch of such a test appears below).
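
A minimal sketch of what such a list-formatting test might look like. This is hypothetical: the real tests in test_hallucination.py may construct the check and trace differently, and the sketch assumes pytest-asyncio is available and that get_inputs ignores the trace argument when direct values are supplied.

    import pytest
    from giskard.checks import Hallucination

    # Hypothetical test sketch: asserts that a list[str] context is joined
    # with blank lines rather than rendered as a Python repr like "['a', 'b']".
    @pytest.mark.asyncio
    async def test_list_context_is_newline_joined():
        check = Hallucination(
            answer="Paris is the capital of France.",
            context=["Paris is in France.", "It is the capital city."],
        )
        inputs = await check.get_inputs(trace=None)
        assert inputs["context"] == "Paris is in France.\n\nIt is the capital city."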

Verification:

  • uv run pytest libs\giskard-checks\tests\builtin\test_hallucination.py -q
  • uv run ruff check libs\giskard-checks\src\giskard\checks\judges\hallucination.py libs\giskard-checks\tests\builtin\test_hallucination.py
  • uv run ruff format --check libs\giskard-checks\src\giskard\checks\judges\hallucination.py libs\giskard-checks\tests\builtin\test_hallucination.py
  • git diff --check

The new CI run is queued now; the authorize gate may still need maintainer action.

