Giskard-AI · kevinmessiaen · May 15, 2026 · May 15, 2026 · May 15, 2026 · May 15, 2026
diff --git a/.cursor/rules/guidelines.mdc b/.cursor/rules/guidelines.mdc
@@ -15,3 +15,16 @@ Do not remove unrelated code, functionalities, or comments without explicit reas
 ## No CODEMAP
 
 Never generate architecture maps, codemap files, or similar high-maintenance documentation that duplicates what is already in cursor rules, docstrings, and README. These files go stale fast and create maintenance burden.
+
+## Root Causes, No Temporary Fixes
+
+Trace failures to their source before making a change. Do not:
+- Add `# type: ignore` / `# pyright: ignore` without first confirming no structural fix is possible.
+- Modify test assertions to force a pass — if a test reveals a real bug, fix the bug.
+- Add `try/except` to suppress exceptions that should be fixed.
+
+If the root cause is in a different library than where the symptom appears, say so before proposing a fix.
+
+## Minimal Scope
+
+Only touch files directly required by the task. Do not reformat untouched lines, rename unrelated variables, or add type annotations to code you didn't otherwise change. Propose nearby improvements separately and wait for approval.
diff --git a/.cursor/skills/giskard-oss-pr-ready/SKILL.md b/.cursor/skills/giskard-oss-pr-ready/SKILL.md
@@ -48,6 +48,16 @@ If checks fail from missing tools, run **`make install`** and **`make install-to
 
    Use valid shell assignments in `.env` (e.g. `GEMINI_API_KEY=...`). If you point `TEST_MODEL` / `TEST_EMBEDDING_MODEL` at another provider, add whatever API keys that provider expects (LiteLLM / the underlying SDK).
 
+## Verification gate
+
+Before declaring a branch merge-ready, include the following evidence in your response:
+
+1. Last line(s) of `make check` output — must show no errors.
+2. Pytest summary from `make test-unit` — e.g. `52 passed, 0 failed`.
+3. If any step was skipped or failed, state it explicitly.
+
+"It should pass" is not evidence. Show actual output.
+
 ## Library-specific rules
 
 For package conventions (API patterns, tests layout), read `libs/<package>/.cursor/rules/`. Makefile targets are always invoked from the repository root as listed above.
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,69 +1,49 @@
 # AGENTS.md
 
-Agent-focused notes for the **giskard-oss** monorepo. Human-oriented docs live in [README.md](README.md) and [https://docs.giskard.ai/oss](https://docs.giskard.ai/oss).
-
-## Project overview
-
-- **Stack:** Python 3.12+ (CI also exercises 3.13–3.14), [uv](https://docs.astral.sh/uv/) for env and runs, [Ruff](https://docs.astral.sh/ruff/) for lint/format, [basedpyright](https://github.com/DetachHead/basedpyright) for types.
-- **Layout:** Libraries under `libs/` — `giskard-core`, `giskard-agents`, `giskard-checks`, `giskard-llm`. Work from the **repository root** unless a task is scoped to one package.
-- **Package-specific conventions:** See `libs/<package>/.cursor/rules/` (e.g. `giskard-checks` has project and development rules).
-- **Always invoke Python via `uv run`** — bare `python` or `pytest` will fail with `ModuleNotFoundError`.
-
-## Setup
-
-**Recommended for coding agents:** run `make setup-for-agents` from the repo root. It runs the same steps as `make setup`, writes `GISKARD_TELEMETRY_DISABLED=false` to `.env`, and sends a one-shot analytics event. Optional context (both optional):
-
-```bash
-make setup-for-agents AGENT_NAME="ClawBot" REASON="Working on issue #1234"
-```
-
-Human or minimal setup without that target:
-
-```bash
-make install          # uv sync — project dependencies
-make install-tools    # ruff, vermin, basedpyright, pre-commit (with pre-commit-uv)
-make pre-commit-install   # optional: git hooks
-# or all of the above:
-make setup
-```
-
-If `make check` fails with missing tools, run `make install` and `make install-tools` (same order as CI).
-
-## Commands (run from repo root)
-
-| Goal | Command |
-|------|---------|
-| Format (Ruff format + fix) | `make format` |
-| Lint | `make lint` |
-| Full gate (lint, format check, Python 3.12 compat, types, security, licenses) | `make check` |
-| Unit tests (all libs) | `make test-unit` |
-| Unit tests for one package | `make test-unit PACKAGE=giskard-checks` (also `giskard-core`, `giskard-agents`) |
-| All tests including functional | `make test` |
-| Functional tests only | `make test-functional` |
-
-CI (`.github/workflows/ci.yml`) runs `make install install-tools`, then `make check`, then `make test-unit` per package — mirror that before opening or updating a PR.
-
-## Functional / integration tests
-
-- `make test-functional` and full `make test` call live APIs; they are **not** the default PR gate in `ci.yml` (see `.github/workflows/integration-tests.yml` for secrets-driven runs).
-- Local: create a **repo-root** `.env` (gitignored). Export vars before pytest, e.g. `set -a && source .env && set +a` then `make test-functional PACKAGE=giskard-agents`. Typical vars include `GEMINI_API_KEY`; optional `TEST_MODEL`, `TEST_EMBEDDING_MODEL` (see `libs/giskard-agents/tests/conftest.py`).
-
-## PR / change discipline
-
-**Pre-flight checklist** (run in order before opening or updating a PR):
-
-```bash
-make format       # Ruff format + autofix
-make check        # lint, type-check, compat, security, licenses
-make test-unit    # or: make test-unit PACKAGE=giskard-checks
-```
-
-If `make check` fails with missing tools: run `make install && make install-tools` first.
-
-**Rules:**
-
-- End **agent-opened** PR titles with `🤖🤖🤖🤖` — required for the expedited-agent PR workflow. Do not omit.
-- Minimal diffs only: implement exactly what was asked; no drive-by refactors or unrelated file edits.
-- Do not add unsolicited features, architecture docs, or codemap files.
-
+giskard-oss — behavioral config for autonomous coding agents (no human in the loop). Human-oriented docs: [README.md](README.md). Always invoke Python via `uv run` — bare `python` or `pytest` will fail.
+
+## Workflow Orchestration
+
+### 1. Setup
+Run once: `make setup-for-agents AGENT_NAME="<name>" REASON="<issue or task>"`
+If tools are missing later: `make install && make install-tools`
+
+### 2. Plan Before Acting
+– Write approach to tasks/todo.md before touching any file
+– If scope is unclear, read `libs/<pkg>/.cursor/rules/` before proceeding
+– If something goes wrong, stop and re-plan
+
+### 3. Verification Before Done
+– Before any PR: `make format && make check && make test-unit PACKAGE=<affected-lib>`
+– Show `make check` and pytest summary output in the PR description
+– No `# type: ignore` without structural-fix explanation; no patching test assertions
+
+### 4. PR Rules
+– End PR titles with `🤖🤖🤖🤖` — required for the expedited-agent PR workflow
+– Minimal diffs: implement exactly what was asked
+
+### 5. Self-Improvement Loop
+– After any mistake: update lessons.md with what went wrong and what rule would have prevented it
+
+### 6. Clarify Before Acting
+– When an issue is ambiguous, contradictory, or missing acceptance criteria: **do not open a PR**. Post a comment on the issue asking the specific questions needed to proceed, then stop.
+– When you have a better approach than what was requested: **do not implement your alternative silently**. Comment on the issue explaining your suggestion and the trade-offs, then wait for confirmation.
+– When responding to a PR review comment: if the requested change is unclear, you disagree, or you see a better path: **comment back with your question or counter-proposal**, do not just apply the change blindly.
+– One comment is enough — do not loop. If no response comes, remain stopped.
+– When scope is unambiguous and you have no better suggestion: proceed directly without asking.
+
+## Task Management
+1. Plan First — write plan to tasks/todo.md
+2. Verify Plan — review before starting; proceed if unambiguous
-2. Verify Plan — review before starting; proceed if unambiguous
+5. Document Results — add review section to tasks/todo.md
-2. Verify Plan — review before starting; proceed if unambiguous
+5. Document Results — add review section to tasks/todo.md
+3. Track Progress — mark items complete as you go
+4. Explain Changes — high-level summary at each step
+5. Document Results — add review section to todo.md
+6. Capture Lessons — update lessons.md after corrections
+
+## Core Principles
+– Simplicity First: make every change as simple as possible; prefer deleting lines over adding them
+– No Laziness: find root causes; no band-aids, no temporary fixes; senior developer standards
+– Minimal Impact: only touch what's necessary; no side effects; no reformatting untouched lines
+
+---
 This file follows the open [AGENTS.md](https://agents.md/) convention for coding agents.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,44 +1,47 @@
 # CLAUDE.md
 
-Claude-specific notes for the **giskard-oss** monorepo. Human-oriented docs live in [README.md](README.md); agent-oriented operational notes live in [AGENTS.md](AGENTS.md).
-
-## Project overview
-
-- **Stack:** Python 3.12+, [uv](https://docs.astral.sh/uv/) for env/runs, [Ruff](https://docs.astral.sh/ruff/) for lint/format, [basedpyright](https://github.com/DetachHead/basedpyright) for types.
-- **Layout:** Libraries under `libs/` — `giskard-core`, `giskard-agents`, `giskard-checks`, `giskard-llm`. Work from the repo root unless a task is strictly scoped to one package.
-
-## Setup
-
-```bash
-make setup            # uv sync + install tools + pre-commit hooks
-# or, for coding agents:
-make setup-for-agents
-```
-
-## Commands
-
-| Goal | Command |
-|------|---------|
-| Format | `make format` |
-| Lint | `make lint` |
-| Full gate | `make check` |
-| Unit tests (all) | `make test-unit` |
-| Unit tests (one package) | `make test-unit PACKAGE=giskard-checks` |
-| All tests | `make test` |
-
-Before opening or updating a PR: `make format && make check && make test-unit`.
-
-## PR discipline
-
-- End agent-opened PR titles with `🤖🤖🤖🤖` — required for the expedited-agent PR workflow.
-- Always run `uv run` (not bare `python` or `pytest`).
-
----
-
-## Core principles
-
-**Make every change as small as possible.** Prefer deleting lines over adding them. The best fix often removes code entirely.
-
-**Find the root cause.** No band-aids. If a symptom has an underlying cause, address that — not the symptom.
-
-**Only touch what's necessary.** No side effects. Leave every file you didn't need to change exactly as you found it.
+giskard-oss — behavioral config for Claude Code (interactive assistant with a human in the loop).
-giskard-oss — behavioral config for Claude Code (interactive assistant with a human in the loop).
+giskard-oss — behavioral config for Claude Code (interactive assistant with a human in the loop). Human-oriented docs: [README.md](README.md). Always invoke Python via uv run — bare python or pytest will fail.
-giskard-oss — behavioral config for Claude Code (interactive assistant with a human in the loop).
+giskard-oss — behavioral config for Claude Code (interactive assistant with a human in the loop). Human-oriented docs: [README.md](README.md). Always invoke Python via uv run — bare python or pytest will fail.
+
+## Workflow Orchestration
+
+### 1. Plan Mode Default
+– Enter plan mode for ANY non-trivial task (3+ steps or touches multiple libs)
+– Write the approach to tasks/todo.md before touching files
+– If something goes sideways, STOP and re-plan immediately
+
+### 2. Subagent Strategy
+– Use subagents to keep the main context window clean
+– One focused task per subagent
+– For complex cross-lib problems, spawn parallel subagents
+
+### 3. Self-Improvement Loop
+– After ANY correction: update lessons.md
+– Write a rule that prevents the same mistake
+– Propose additions to CLAUDE.md in the format "Add rule: X"
+
+### 4. Verification Before Done
+– Never mark a task complete without proving it works
+– Run: `make format && make check && make test-unit PACKAGE=<affected-lib>`
+– Show actual output. Never assume tests pass.
+– Ask yourself: "Would a staff engineer approve this?"
+
+### 5. Demand Elegance
+– Pause and ask "is there a more elegant solution?"
+– Skip for simple fixes — don't over-engineer
+
+### 6. Autonomous Bug Fixing
+– When given a bug report with clear scope: just fix it
+– No `# type: ignore`, no patched test assertions — fix the root cause
+
+## Task Management
+1. Plan First — write plan to tasks/todo.md
+2. Verify Plan — check in before starting
+3. Track Progress — mark items complete as you go
+4. Explain Changes — high-level summary at each step
+5. Document Results — add review section to todo.md
-5. Document Results — add review section to todo.md
+5. Document Results — add review section to tasks/todo.md
-5. Document Results — add review section to todo.md
+5. Document Results — add review section to tasks/todo.md
+6. Capture Lessons — update lessons.md after corrections
+
+## Core Principles
+– Simplicity First: make every change as simple as possible; prefer deleting lines over adding them
+– No Laziness: find root causes; no band-aids, no temporary fixes; senior developer standards
+– Minimal Impact: only touch what's necessary; no side effects; no reformatting untouched lines
diff --git a/lessons.md b/lessons.md
@@ -0,0 +1,8 @@
+# Lessons
+
+One entry per corrected mistake. Format: date, what went wrong, rule that prevents it.
+
+---
+
+**2026-05-15** — Assumed CLAUDE.md needed to mirror AGENTS.md structure.
+Rule: CLAUDE.md and AGENTS.md serve different audiences and are independent — don't force them to share structure or cross-reference each other.