feat(giskard-checks): minimal OWASP LLM suite generator (LLM01 indirect injection)#2438

Merged
kevinmessiaen merged 29 commits into main from feat/minimal-suite-generator on May 13, 2026

Conversation

@kevinmessiaen (Member) commented May 7, 2026

Summary

  • Adds BaseLLMGenerator + LLMGenerator — a multi-turn input generator hierarchy mirroring the existing BaseLLMCheck/LLMJudge pattern
  • Refactors UserSimulator to extend BaseLLMGenerator (removes duplicated loop logic)
  • Adds ScenarioCategory enum + generate_suite() factory that loads predefined OWASP scenarios from JSONL datasets and injects agent description as an annotation
  • Ships one LLM01:2025 indirect injection scenario (JSONL + Jinja2 prompt template), with multiple_runs=5 for replayability
  • Adds InputGenerationException for generator-side errors (e.g. schema incompatibility)
  • Adds input_type support to InputGenerator.__call__ — generators can now produce structured BaseModel inputs, not just str
  • LLMGeneratorOutput[T] is now generic; the LLM is asked to produce a T-typed message via with_output(LLMGeneratorOutput[T])
  • Interact.generate() infers input_type at runtime from the target callable's first parameter annotation — no API change required at the call site
  • LLMGenerator gains as_template: bool = False — when True, renders the inline prompt as a Jinja2 template (enabling {{ _instr_output }} schema injection); default False guards against prompt injection from user-controlled strings
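The runtime input-type inference mentioned above can be sketched roughly as follows. The helper name and exact rules are assumptions, not the library's actual code: read the target callable's first parameter annotation and fall back to str when it is unannotated.

```python
import inspect
from typing import get_type_hints

# Hypothetical sketch of first-parameter input_type inference: not the
# actual giskard-checks implementation, just the idea described in the PR.
def infer_input_type(fn) -> type:
    hints = get_type_hints(fn)            # resolved annotations, if any
    params = list(inspect.signature(fn).parameters)
    if params and params[0] in hints:
        return hints[params[0]]           # e.g. a BaseModel subclass
    return str                            # unannotated targets default to str
```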

Usage (str target, no change needed):

def my_agent_adapter(input: str) -> str:
    return my_agent({"content": input, "role": "user"})

suite = generate_suite(
    categories=[ScenarioCategory.LLM01_INDIRECT_INJECTION],
    description="A documentation chatbot for Giskard",
)
suite.run(target=my_agent_adapter)

Usage (structured BaseModel target):

class UserMessage(BaseModel):
    role: str
    content: str

def my_agent(input: UserMessage) -> str:
    return call_llm(input)

suite = generate_suite(
    categories=[ScenarioCategory.LLM01_INDIRECT_INJECTION],
    description="A documentation chatbot for Giskard",
)
suite.run(target=my_agent)  # input_type=UserMessage inferred automatically

Test Plan

  • uv run pytest libs/giskard-checks/tests/ -q — 514 passed, 4 skipped
  • Import smoke test: from giskard.checks import BaseLLMGenerator, LLMGenerator, ScenarioCategory, generate_suite, UserSimulator, Suite, InputGenerationException
  • generate_suite() returns a Suite with 1 scenario, multiple_runs=5, and annotations["description"] injected
  • Scenario with BaseModel-annotated target: input_type inferred, LLM produces structured output
  • schema_issue set by LLM → InputGenerationException raised with "schema issue: ..." message
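The schema_issue behavior checked above can be illustrated with a small hedged sketch; the class body and helper function here are illustrative, not the library's exact code.

```python
# Illustrative sketch of the schema_issue handling: the real
# InputGenerationException lives in giskard-checks; this standalone
# version only mirrors the "schema issue: ..." message format.
class InputGenerationException(Exception):
    """Raised when the generator cannot produce a valid input."""

def raise_on_schema_issue(schema_issue):
    if schema_issue is not None:
        raise InputGenerationException(f"schema issue: {schema_issue}")
```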

🤖 Generated with Claude Code

kevinmessiaen and others added 10 commits May 7, 2026 13:22
Implement BaseLLMGenerator (abstract multi-turn LLM generator base) and LLMGenerator (configurable via inline prompt or template path), mirroring the BaseLLMCheck/LLMJudge pattern on the generator side. Register LLMGenerator under the 'llm_generator' kind discriminator and expose both classes from the generators package.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Remove @override from BaseLLMGenerator.__call__ (async generator protocol differs)
- Align LLMGenerator validator error messages to match LLMJudge style (hyphen, no trailing period)
- Remove dead second response in test_llm_generator_stops_at_max_steps
- Add tests for max_steps=0 and message=None-with-goal-not-reached cases
- Extract MockGenerator and LLMTrace to shared conftest.py; add __init__.py to enable relative imports

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Create scenarios subdirectory and llm01_indirect_injection.j2 template
- Template drives LLMGenerator to simulate indirect prompt injection attacks
- Support multi-turn interactions with configurable max_turns limit

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…_suite, ScenarioCategory

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…e generic parameters

- Changed the return type annotation of generate_suite() to Suite[str, Any] for improved type clarity.
kevinmessiaen and others added 9 commits May 7, 2026 15:45
…ype for structured output

Replaces the ValueError stub with a real implementation: parameterizes
LLMGeneratorOutput[T] from input_type, raises InputGenerationException on
schema_issue, and adds overloads for typed return. Tests cover BaseModel
output, schema_issue raising, schema inclusion, and backward-compatible str
output.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
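The generic output described in this commit can be sketched with a plain dataclass; the real class is presumably a Pydantic model, and the fields here are inferred from the PR description rather than copied from the source.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

# Hedged sketch of a generic generator output (stdlib dataclass standing in
# for the library's Pydantic model): message carries the T-typed input,
# goal_reached signals the simulated goal, schema_issue reports mismatches.
@dataclass
class LLMGeneratorOutput(Generic[T]):
    message: Optional[T] = None
    goal_reached: bool = False
    schema_issue: Optional[str] = None
```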
kevinmessiaen and others added 8 commits May 11, 2026 10:11
…mpts to enforce goal_reached and message rules on first turn
- Document schema_issue field in user_simulator.j2 and llm01_indirect_injection.j2 prompts
- Fix _infer_input_type to fall back to __call__ hints for callable-class targets
- Add tests for callable-class input type inference
- Change generate_suite() categories param to optional (None = all categories)
- Add docstring to InputGenerationException
- Remove trivial test_exceptions.py (covered by test_llm_generator.py)
- Add tests/scenarios/__init__.py for consistency

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…tances in Python 3.14+

- Import inspect to facilitate type hint inspection.
- Update fallback mechanism for callable instances to correctly retrieve parameter hints from __call__.
- Ensure compatibility with changes in get_type_hints behavior in Python 3.14+.
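The fallback in this commit might look roughly like the following sketch (the helper name and details are assumptions): for callable class instances, type hints are read from the class's __call__ method rather than the instance itself.

```python
import inspect
from typing import get_type_hints

# Hypothetical sketch of the __call__ fallback: get_type_hints on a callable
# instance is unreliable across Python versions, so inspect the class's
# __call__ method explicitly when the target is not a plain function/method.
def hints_for_target(target):
    if inspect.isfunction(target) or inspect.ismethod(target):
        return get_type_hints(target)
    call = getattr(type(target), "__call__", None)
    if call is None:
        raise TypeError("target is not callable")
    return get_type_hints(call)
```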
