A Spec Kit extension that bakes quality and testing
into the core SDD workflow as a built-in step at every lifecycle event. Hooks attach to
/speckit-plan, /speckit-tasks, /speckit-analyze (optional), and /speckit-implement
so quality work happens automatically and in the right order.
It anchors on the stock Spec Kit artefacts (no custom spec format required) and defers
to your project constitution.md to decide how strict the gates are.
# From a Spec Kit project root, after `specify init`
specify extension add --force --from https://github.com/NDViet/speckit-test-extension/archive/refs/heads/master.zip EXTENSIONOr from a local folder (offline / development):
specify extension add --dev /path/to/speckit-test-extensionRequires Spec Kit >= 0.8.0.
Spec Kit's core workflow produces a rich spec.md (prioritized user stories with Acceptance
Scenarios, Functional Requirements, Success Criteria) and a tasks.md breakdown -- but the
testing layer is not enforced by the core commands themselves:
- Nothing in
/speckit-plandecides how each spec item will be tested - Nothing in
/speckit-tasksguarantees a test task per testable spec item - Tests get written from memory, not from spec -- traceability breaks
- There is no requirement-level measure of "which spec items are actually tested" before merge
- Pre-merge sign-off is ad-hoc and leaves no tracked record
- Non-functional changes (perf, security, race conditions, refactors) get the same generic unit-test treatment as a fresh feature, so QA effort lands in the wrong places
This extension closes those gaps by hooking quality work into the core workflow at every lifecycle event, and by adapting the test recipe to the type of change.
Every command fires as a hook on a non-skippable core command (/speckit-plan,
/speckit-tasks, /speckit-implement). Nothing here is invoked manually in the happy path.
A developer or PM running the standard Spec Kit workflow gets the quality scaffolding for
free.
| Artefact | Owned by | Other commands |
|---|---|---|
plan.md § Testing Strategy |
planaudit |
read-only |
tasks.md (test tasks) |
/speckit-tasks (+ tasksaudit --write) |
read-only |
test-plan.md |
qaprep |
read-only |
tests/** scaffolds |
qaprep (create only; never overwrites) |
dev/QA implement |
checklists/test.md |
qaprep (refresh; never unchecks) |
dev/QA tick + add Evidence |
review.md |
qareview (overwrite each run) |
read-only |
No command edits two files. No file has two writers. Reading is unrestricted.
Every generated test case, task, and stub traces to a real US{n}-AS{m}, FR-###, or
buildable SC-###. The chain
spec item -> plan.md Testing Strategy -> tasks.md test task -> test file -> test-plan.md matrix -> review.md is checkable end to end. Missing the item ID in the test header caps
qareview's rating at Weak.
planaudit runs a deterministic keyword scan of spec.md and plan.md Technical Context
once, then writes the result into plan.md § Testing Strategy header as
**Change Profile:** [tag, ...]. The other three commands read it from there; nobody
re-detects.
| Tag | Detection signal | Adds to the test recipe |
|---|---|---|
feature |
default | standard unit/contract per P1 AS/FR |
ui |
UI screens, UX flows, design files, user-facing UI stories | a11y + visual regression + cross-browser/device matrix |
api |
endpoints declared, contracts/ exists, SDK, schema |
contract tests, schema-compat, error envelopes, authz |
bugfix |
linked defect ID, FR phrased "MUST no longer ...", "regression" | reproducing test (fails pre-fix, passes post-fix) + root-cause guard |
refactor |
plan says "refactor / no behaviour change", no new FR | characterization tests on touched surface; new functional cases forbidden unless surface changes |
concurrency |
race / lock / deadlock / async / parallel / atomic | concurrent N-caller stress + invariants + deterministic-seed repeat-N |
performance |
buildable SC-### with latency/throughput/resource budget |
baseline-vs-new perf test with a regression budget (% over baseline = FAIL) |
security |
auth / PII / permissions / crypto / threat | authz negative tests, secrets handling, SAST + dep-scan |
data-migration |
schema change, backfill, ETL, migration files | up/down/idempotency + rollback drill |
Multi-tag is allowed; conflicts (e.g. refactor + new FR) are logged, not silenced.
The before_implement hook fires tasksaudit audit-only -- it can BLOCK but never silently
mutates tasks.md. Adding the missing tasks is a deliberate developer action
(/speckit-test-tasksaudit --write). This rule keeps automated hooks from editing files
behind the developer's back.
Your project's constitution.md may escalate which layers block, but cannot silently waive
a gate. Waiving requires an explicit --advisory flag, which is recorded.
The core Spec Kit workflow is
/speckit-plan -> /speckit-tasks -> /speckit-analyze (optional) -> /speckit-implement.
Because /speckit-analyze is skippable, every mandatory quality gate anchors to one of the
non-skippable commands.
| Command | Hook | Mode | What it does | Expected outcome |
|---|---|---|---|---|
speckit.test.planaudit |
after_plan |
mandatory | Detects the Change Profile; appends ## Testing Strategy to plan.md mapping every P1 AS / FR to concrete fail-first unit/contract cases plus profile-specific additions (e.g. authz-negative for security, characterization for refactor, reproducing test for bugfix). Idempotent. |
plan.md now contains a Testing Strategy section that /speckit-tasks will materialize. Includes profile tag, framework, test root, and per-item case rows. |
speckit.test.qaprep |
after_tasks |
mandatory | Writes FEATURE_DIR/test-plan.md with QA-grade Test Case Catalogue (manual + automated TCs with preconditions / steps / expected / test data / env), traceability matrix, env matrix, risk register, entry/exit, append-only Test Execution Log; scaffolds higher-layer + profile-driven tests as failing stubs; seeds checklists/test.md with Evidence-link discipline. Idempotent; never overwrites existing tests or manual edits between markers. |
test-plan.md is the QA source of truth -- manual testers can execute from §3 without re-reading spec, automation engineers can scaffold from it. Stubs fail until implemented, so missing coverage is visible in CI. |
speckit.test.tasksaudit |
before_implement |
mandatory gate | Audit-only. BLOCKS /speckit-implement if any P1 spec item lacks a unit/contract test task; escalates the blocking set per active Change Profile (e.g. performance makes SC perf tasks gated; bugfix requires a reproducing test task). With --write (manual, opt-in), adds the missing tasks in place. |
Either Gate: PASS (proceed) or Gate: BLOCKED with the exact task lines to add. The hook never silently mutates tasks.md. |
speckit.test.qareview |
after_analyze (advisory) + after_implement (mandatory) |
both | Reads all artefacts, lifts /speckit-analyze's findings (does not re-derive), computes per-item coverage (Strong / Medium / Weak / Stub / Missing), runs stub scan + traceability + constitution + checklist + execution-evidence + Change-Profile checks, emits Gate: PASS / BLOCKED to chat and to FEATURE_DIR/review.md (overwritten each run). |
The after_analyze run is advisory (pre-impl readiness check). The after_implement run is the formal pre-merge gate -- it requires recorded test runs in §8 of test-plan.md, evidence links on every ticked checklist row, and profile-specific pass conditions. |
Command files are named
speckit.test.*.md; Spec Kit surfaces them with dots -> hyphens, so they appear as/speckit-test-*in chat. You rarely invoke them manually -- the hooks handle it.
/speckit-constitution defines whether/which tests are mandatory
/speckit-specify -> /speckit-clarify spec items: US{n}-AS{m}, FR-###, SC-###
/speckit-plan
|
+-- after_plan *mandatory -> planaudit
detects Change Profile;
writes ## Testing Strategy to plan.md
/speckit-tasks
|
+-- after_tasks *mandatory -> qaprep
writes test-plan.md (incl. TC Catalogue + Exec Log)
+ scaffolds higher-layer + profile-driven tests
+ seeds checklists/test.md
/speckit-analyze (OPTIONAL in Spec Kit; skippable)
|
+-- after_analyze advisory -> qareview
refreshes review.md with pre-impl read of
consistency + readiness + Catalogue completeness;
NEVER gates implement
/speckit-implement
|
+-- before_implement *mandatory -> tasksaudit (gate, audit-only)
| BLOCKS if any P1 unit/contract task missing
| (plus Change-Profile escalations)
|
+-- after_implement *mandatory -> qareview
refreshes review.md with the FORMAL pre-merge
Gate PASS / BLOCKED, incl. real test results +
execution evidence + profile checks
One line:
/speckit-plan -> /speckit-tasks -> [/speckit-analyze] -> /speckit-implement
planaudit qaprep qareview tasksaudit (gate)
(advisory) + qareview (formal)
| Path | Produced by | When | Lifecycle |
|---|---|---|---|
FEATURE_DIR/plan.md § Testing Strategy |
planaudit |
after /speckit-plan |
Refreshed in place |
FEATURE_DIR/tasks.md (test tasks) |
/speckit-tasks materializes the Strategy; tasksaudit --write adds gaps |
after /speckit-tasks |
Edited additively |
FEATURE_DIR/test-plan.md |
qaprep |
after /speckit-tasks |
Refreshed; §5, 6 preserve QA edits between markers; §8 Exec Log is append-only |
tests/{integration,e2e,regression,perf,a11y,concurrency,security,migration}/ stubs |
qaprep |
after /speckit-tasks |
Created if absent; never overwrites |
tests/manual/<slug>.runbook.md |
qaprep (when TC type = manual) |
after /speckit-tasks |
Created if absent; manual testers fill in evidence |
FEATURE_DIR/checklists/test.md |
qaprep |
after /speckit-tasks |
Refreshed; ticked rows + Evidence links preserved |
FEATURE_DIR/review.md |
qareview |
after /speckit-analyze (advisory) and after /speckit-implement (formal) |
Overwritten each run; git log review.md shows verdict evolution |
The hooks automate everything they can deterministically, but the following touchpoints require a human. The artefacts make these touchpoints explicit so they cannot be forgotten.
| Touchpoint | Who | What | Where it surfaces |
|---|---|---|---|
| Confirm framework choice | Developer | If planaudit could not infer the test framework, it tags the section NEEDS CONFIRMATION |
plan.md § Testing Strategy |
| Add missing test tasks | Developer | tasksaudit BLOCKS on the hook; developer runs --write then reviews the additions |
tasks.md after --write |
| Implement failing stubs | Developer (dev-layer) + QA (higher layers) | qaprep writes stubs that fail; implementation turns them green |
tests/** |
| Fill the Test Case Catalogue | QA / Test author | Preconditions, Steps, Expected, Test data ref, Env per TC -- automated TCs link to the test file, manual TCs link to a runbook | test-plan.md § 3 |
| Edit Entry/Exit and Risk Register | QA lead | qaprep regenerates §1, 2, 3, 4, 7, 9 but preserves §5 (Entry/Exit) and §6 (Risk Register) between QA-edit markers | test-plan.md § 5 + § 6 |
| Tick the checklist with Evidence | QA / Reviewer | Every tick MUST carry an Evidence link (run URL / screenshot path / defect ID / commit SHA); bare ticks are flagged as Major by qareview |
checklists/test.md |
| Record each test run | QA / CI | Append a row to the Test Execution Log: date, runner, build/commit, TC IDs run, pass/fail/skip, defects filed | test-plan.md § 8 |
| Triage defects | QA + Dev | When qareview reports a P1 TC FAIL, file a defect and link the ID in §8 -- a fail with no linked defect is a Blocker |
defect tracker + test-plan.md § 8 |
| Sign off the review | Tech lead + QA lead | review.md is overwritten each run; the PR body should link to the latest review.md at the merge commit |
PR description |
| Waive a gate | Tech lead with constitution authority | Only with explicit --advisory, recorded in the section banner; never silent |
plan.md / tasks.md / test-plan.md banners |
The rule of thumb: anything that requires judgement (was the bug really fixed, is this visual diff acceptable, is the perf regression worth shipping) is a human decision and the artefact has a slot for the human to record it. Anything mechanical (does an item have a test, is a stub still failing, is the schema backwards-compatible) is checked by the hooks.
Commit everything that is part of the spec-driven record of the change. Ignore everything that is a transient run output.
These are the durable artefacts that make the change reviewable and the verdict reproducible.
specs/**/plan.md
specs/**/tasks.md
specs/**/test-plan.md
specs/**/checklists/**
specs/**/review.md
tests/** # incl. qaprep stubs (failing stubs document missing coverage)
tests/manual/** # manual runbooks are part of the spec
review.md is intentionally overwritten on every qareview run. Commit it alongside the
PR -- git log review.md is the verdict history.
Fixture data: small, deterministic fixtures belong in tests/fixtures/ and should be
committed. Large or generated fixtures belong in a fixture store and should be referenced
by URL/SHA from test-plan.md § 7, not committed.
Secrets: never commit credentials, even for test environments. test-plan.md § 7 lists
accounts and references, not values. Use .env.example for shape and an ignored .env
for values.
Approved baselines (visual, perf, schema): commit them under
tests/{visual,perf,contract}/baselines/. Diff output is ignored; the baseline is
checked in so the regression budget is reviewable.
spec.md itself: always committed -- it is the input to the entire chain.
The formal pre-merge gate is whatever qareview writes to FEATURE_DIR/review.md on the
after_implement run -- the run that completes after /speckit-implement finishes, when
real test results are available. The file has a metadata header (feature, timestamp,
verdict, mode, Change Profile) followed by the Blocker / Major / Minor tables and a single
CI-grepable line:
SPECTEST QAREVIEW: 12 items, 7 Strong, 3 Blockers, 2 Majors, 1 Minor -- FAIL
CI can grep that line; the PR description links to review.md at the merge commit. To
refresh the verdict after fixing a Blocker, re-run /speckit-implement (its
after_implement hook fires qareview again).
If the developer also runs /speckit-analyze before implementing, review.md gets an
earlier advisory verdict (consistency + Catalogue readiness, no test results yet) -- useful
as a sanity check but never the formal gate.
| Event | Command | Mode | Behaviour |
|---|---|---|---|
after_plan |
planaudit |
mandatory | Detects Change Profile; appends Testing Strategy to plan.md |
after_tasks |
qaprep |
mandatory | Writes test-plan.md (incl. TC Catalogue + Exec Log), scaffolds higher-layer + profile-driven tests, seeds checklists/test.md |
before_implement |
tasksaudit |
mandatory | Audit-only gate; BLOCKS if any P1 unit/contract task missing (plus profile escalations) |
after_analyze |
qareview |
advisory | Refreshes review.md with pre-implementation readiness verdict; never gates implement |
after_implement |
qareview |
mandatory | Refreshes review.md with the formal pre-merge verdict, incl. real test results + execution evidence + profile checks |
| Convention | Detail |
|---|---|
| Feature directory | Resolved via .specify/scripts/powershell/check-prerequisites.ps1 -Json (FEATURE_DIR) |
| Testable items | US{n}-AS{m}, FR-###, buildable SC-### -- drawn from stock spec.md |
| Change Profile tags | feature, ui, api, bugfix, refactor, concurrency, performance, security, data-migration -- detected once by planaudit, read everywhere else |
| Test tasks | Identified by ### Tests for User Story N subsection or a test path -- not by [P] |
[P] marker |
Means parallelizable (Spec Kit semantics); never used to identify tests |
[US#] label |
Maps a task to its user story |
| TC-ID | Stable test-case ID in test-plan.md § 3, referenced from test file headers and Execution Log |
| Evidence link | Every ticked checklist row needs a run URL / screenshot / defect ID / commit SHA; bare ticks = Major in qareview |
| Tests mandatory? | Unit/contract (TDD) tests are mandatory by default. constitution.md can escalate which layers block; --advisory is the only opt-out, recorded explicitly |
| Verdict file | FEATURE_DIR/review.md -- written by qareview on after_analyze (advisory) and after_implement (formal). Overwritten each run |
| Test plan | FEATURE_DIR/test-plan.md (written by qaprep on after_tasks); § 5 + § 6 preserve QA edits; § 8 Exec Log is append-only |
| Checklist | FEATURE_DIR/checklists/test.md (seeded by qaprep on after_tasks) |
| Stub detection | .skip / .todo / xit / xdescribe, @pytest.mark.skip, throw new Error('TODO'), raise NotImplementedError, expect(true).toBe(true), bare toBeTruthy() / assert True, empty bodies, unmodified qaprep stubs |
| Confidence rating | Strong (test cites item ID), Medium (keyword match), Weak (file maps to story only), Stub (0%), Missing (no test) |
| Out of scope | Read from spec.md § Assumptions (Spec Kit has no ## Out of scope heading) |
MIT