---
name: hypothesis-tests
description: Generate property-based tests using Hypothesis. Builds input strategies in tests/strategies.py that model the valid search space for each function, then writes minimal, behaviour-focused tests.
argument-hint: "[file, directory, or description of what to test]"
disable-model-invocation: true
---
# Property-Based Tests with Hypothesis
You are now in property-based test authoring mode. Your job is to read
production code, design Hypothesis strategies that model the valid input space
for each function, and write property-based tests that exercise the function's
core behavioural contracts.
## Scope
$ARGUMENTS
- If the user names specific files or directories, scope your work to those.
- If no argument is given, work through the Python source files in the current
project, prioritising modules with complex logic and no existing tests.
- For large codebases, use `AskUserQuestion` to let the user choose which
modules to start with. Don't try to do everything at once.
## Consulting the Hypothesis docs
Hypothesis is a large library with many features — strategies, settings,
stateful testing, database configuration, third-party extensions, `target()`,
`note()`, health checks, and more. **Do not rely on memory alone.** When you
need to use a feature you aren't fully confident about, look it up:
- Use `WebSearch` to find the relevant Hypothesis docs page (e.g.
`"hypothesis python st.composite site:hypothesis.readthedocs.io"`).
- Use `WebFetch` to read the page and extract the exact API, parameters, and
usage examples.
This applies especially to:
- Less common strategies (`st.from_type`, `st.from_regex`, `st.recursive`,
`st.deferred`, `st.runner`, `st.data`)
- `@st.composite` — the `draw` callable interface, returning vs. drawing
- `register_type_strategy` and `st.from_type` interactions
- `settings` profiles and deadline / suppressed health check configuration
- Stateful testing (`RuleBasedStateMachine`) if the user's code has stateful
APIs
Getting the API right matters more than speed. A strategy that misuses
`@st.composite` or silently generates invalid data is worse than taking an
extra minute to check the docs.
## Workflow
1. **Survey** — Glob for `*.py` files in scope. Read each file and build a
mental model of its functions: what they accept, what they return, what
invariants they maintain. Note any existing tests in `tests/`.
2. **Design strategies** — For each function worth testing, define a Hypothesis
strategy that generates valid inputs. Write all strategies to
`tests/strategies.py`. See the strategy design guidance below. Look up
Hypothesis docs for any strategy combinators you're not certain about.
3. **Write tests** — Create test files (e.g. `tests/test_<module>.py`) that
import strategies from `tests/strategies.py` and use `@given` to test
behavioural properties. See the test design guidance below.
4. **Verify** — Run the tests with `pytest`. Fix any failures that reveal bugs
in your strategies or tests (not bugs in the production code — report those
to the user). Run `pytest --co -q` first to check collection before running
the full suite.
Use `TaskCreate` to track progress across modules when there are more than a
handful of files.
## Strategy Design (`tests/strategies.py`)
The strategies module is the heart of this skill. A strategy is not just "some
data that has the right type" — it is a **model of the function's valid input
space**.
### Principles
1. **Start from the function, not the type.** Read the function body. Look for
guards, assertions, early returns, conditional branches, and error paths.
These reveal constraints that the type signature doesn't capture. A parameter
typed `str` might actually need to be a non-empty string, a valid identifier,
a path with a specific extension, or one of a fixed set of values.
2. **Ask: could we sample this value from the distribution we've defined?** For
every strategy, mentally sample a few values and trace them through the
function. Would they hit an unguarded code path that raises? Would they
produce a meaningless result that no real caller would ever see? If so,
tighten the strategy.
3. **Encode constraints, don't filter.** Prefer `st.integers(min_value=1)` over
`st.integers().filter(lambda x: x > 0)`. Prefer `st.from_regex(r"[a-z_]\w*",
fullmatch=True)` over `st.text().filter(str.isidentifier)`. Filtering
discards generated values and slows the search; encoding constraints produces
valid values directly.
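
   A sketch of the same idea for a bounded float (names are illustrative):

   ```python
   from hypothesis import strategies as st

   # Filtering discards the many draws that are NaN, infinite, or out of
   # range, and can trip the filter_too_much health check:
   slow_probability = st.floats().filter(lambda p: 0.0 <= p <= 1.0)

   # Encoding the constraint generates only valid values; the bounds
   # alone already exclude NaN and infinity:
   probability = st.floats(min_value=0.0, max_value=1.0)
   ```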
4. **Mirror the production domain.** If the function processes a list of records,
the strategy should produce records that look like real records — with
realistic field relationships, not random noise. Use `st.builds()` and
`@st.composite` to construct structured objects.
5. **Compose from small pieces.** Build a vocabulary of reusable atomic
strategies (e.g. `valid_name`, `positive_int`, `nonempty_text`) and compose
them into more complex structures. This makes the strategies readable and
keeps `tests/strategies.py` a useful reference for what constitutes valid
input throughout the codebase.
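
   A minimal sketch of such a vocabulary (all names are illustrative):

   ```python
   from hypothesis import strategies as st

   # Atoms -- small, well-named, reusable.
   valid_name = st.from_regex(r"[a-z_][a-z0-9_]*", fullmatch=True)
   positive_int = st.integers(min_value=1)
   nonempty_text = st.text(min_size=1)

   # Composed structures built from the atoms above.
   config_strategy = st.dictionaries(
       keys=valid_name, values=nonempty_text, min_size=1
   )
   ```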
6. **`st.builds()` for simple constructors, `@st.composite` for everything
else.** `st.builds()` is clean when each argument maps directly to an
independent strategy:
   ```python
   task_strategy = st.builds(
       Task,
       title=st.text(min_size=1, max_size=200),
       priority=st.sampled_from(Priority),
       due_date=st.dates(min_value=date(2020, 1, 1)),
   )
   ```
But as soon as there are **dependencies between fields**, conditional logic,
or you need to build intermediate values, switch to `@st.composite`. It
gives you an imperative `draw()` callable that makes complex generation
readable:
   ```python
   @st.composite
   def valid_date_range(draw):
       start = draw(st.dates(min_value=date(2020, 1, 1)))
       end = draw(st.dates(min_value=start))
       return DateRange(start=start, end=end)

   @st.composite
   def valid_pipeline(draw):
       n_steps = draw(st.integers(min_value=1, max_value=10))
       steps = draw(st.lists(step_strategy, min_size=n_steps, max_size=n_steps))
       # Ensure step names are unique within a pipeline
       names = draw(
           st.lists(valid_name, min_size=n_steps, max_size=n_steps, unique=True)
       )
       for step, name in zip(steps, names):
           step.name = name
       return Pipeline(steps=steps)
   ```
`@st.composite` is usually the right choice for domain objects. Don't
contort `st.builds()` with `.flatmap()` chains when `@st.composite`
would be clearer. Check the Hypothesis docs if you're unsure about the
`draw()` interface.
7. **Register strategies for your types.** If a type appears in many strategies
(e.g. a core domain model), register a strategy for it so that
`st.from_type(MyType)` works automatically:
   ```python
   from hypothesis import strategies as st

   st.register_type_strategy(Task, st.builds(
       Task,
       title=st.text(min_size=1, max_size=200),
       priority=st.sampled_from(Priority),
   ))
   ```
This is especially useful when Hypothesis needs to infer strategies from
type annotations (e.g. `st.builds(func)` with no explicit keyword
strategies). Place registrations at the end of `tests/strategies.py`, after
the strategy definitions they reference. Consult the Hypothesis docs for
`register_type_strategy` and `st.from_type` to understand how resolution
works — especially with generic types and forward references.
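
   With the registration above in place, inference resolves the type
   automatically; a hedged sketch (`summarise` is a stand-in, not a real
   function from the codebase):

   ```python
   from hypothesis import given
   from hypothesis import strategies as st

   @given(task=st.from_type(Task))  # resolved via register_type_strategy
   def test_summary_mentions_title(task):
       assert task.title in summarise(task)
   ```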
8. **Don't over-constrain.** The point of property-based testing is to explore
the input space. If you constrain the strategy so tightly that it generates
only a handful of values, you've written a parameterised test, not a
property-based test. Find the balance: tight enough that every value is valid,
loose enough that Hypothesis can surprise you.
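
   As a contrast (both strategies are illustrative):

   ```python
   from hypothesis import strategies as st

   # Over-constrained: three hand-picked values -- a parameterised test
   # in disguise.
   too_tight = st.sampled_from(["a", "ab", "abc"])

   # Balanced: every draw is a valid identifier, yet the space stays
   # large enough for Hypothesis to surprise you.
   identifier = st.from_regex(r"[a-z_][a-z0-9_]{0,30}", fullmatch=True)
   ```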
### Structure of `tests/strategies.py`
```python
"""Hypothesis strategies for <project name>.

Each strategy models the valid input space for a function or group of
functions. Strategies are named after the domain concept they represent,
not the function they're used with.
"""
from hypothesis import strategies as st

# -- Atomic strategies (reused across composed strategies) --

# -- Composed / domain strategies (@st.composite for dependent fields) --

# -- Type registrations (so st.from_type(T) resolves automatically) --
```
Group strategies by domain concept. Add a brief comment above each strategy
(or group) explaining what it models and why the constraints exist.
## Test Design
### What to test
Write tests that express **core behavioural contracts**: things that must be
true for all valid inputs. Good properties include:
- **Roundtrip / inverse:** `decode(encode(x)) == x`
- **Idempotence:** `f(f(x)) == f(x)`
- **Invariant preservation:** `len(merge(a, b)) <= len(a) + len(b)`
- **Monotonicity / ordering:** `x <= y implies f(x) <= f(y)`
- **Equivalence to a reference:** `fast_path(x) == naive_impl(x)`
- **Commutativity / associativity:** algebraic laws when applicable
- **No crash (smoke):** the function returns without raising for all valid
inputs — but only as a last resort when no stronger property exists.
If the function has no checkable contract, leave a `# TODO` and move on.
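
For example, the first two properties above might look like this (the strategy and function names are stand-ins for whatever the module actually exposes):

```python
from hypothesis import given

from tests.strategies import record_strategy       # hypothetical strategy
from mypackage.codec import decode, encode, normalise  # hypothetical module


@given(record=record_strategy)
def test_roundtrip(record):
    assert decode(encode(record)) == record


@given(record=record_strategy)
def test_normalise_is_idempotent(record):
    once = normalise(record)
    assert normalise(once) == once
```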
### What NOT to test
- **Structural trivia.** Don't assert that the output has a particular key or
field unless that's the actual contract. A test that says
`assert "name" in result` is testing the output schema, not the behaviour.
If you need schema tests, use Pydantic or a JSON schema validator — not
Hypothesis.
- **Reimplementing the function.** If your test computes the expected output by
running the same logic as the production code, it proves nothing. Test
properties, not point values.
- **Side effects of the current implementation.** If the function happens to
sort its output today but the docstring doesn't promise that, don't test for
it. Test the contract, not the accident.
- **Functions that are just glue.** If a function's only job is to call three
other functions and return the result, testing it end-to-end duplicates
coverage. Test the leaf functions instead. Leave a `# TODO: integration test`
comment if warranted.
### Mocks
- **Minimise mocks.** If you can test a function by passing real (generated)
data, do that. Mocks obscure what's actually being tested and make tests
brittle.
- **Never mock the function under test.** If you're mocking so much of a
function's environment that the test is mostly mocks, the function isn't
unit-testable in its current form. Leave a `# TODO: needs refactoring for
testability` comment and move on.
- **Acceptable mocks:** external I/O (network, filesystem, database) that would
make the test slow or non-deterministic. Use `unittest.mock.patch` or
dependency injection, not monkeypatch hacks.
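
A sketch of an acceptable boundary mock (the module path, `publish`, and `report_strategy` are illustrative):

```python
from unittest.mock import patch

from hypothesis import given

from tests.strategies import report_strategy  # hypothetical strategy
from mypackage.publisher import publish       # hypothetical function


@given(report=report_strategy)
def test_publish_posts_exactly_once(report):
    # The patch is created inside the test body so each generated
    # example gets a fresh mock; only the network boundary is replaced.
    with patch("mypackage.publisher.requests.post") as post:
        post.return_value.status_code = 200
        publish(report)
    post.assert_called_once()
```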
### Structure of test files
```python
"""Property-based tests for <module>."""
from hypothesis import given, settings, assume
from hypothesis import strategies as st

from tests.strategies import <relevant strategies>
from mypackage.module import <functions under test>


class TestFunctionName:
    """Properties of function_name."""

    @given(...)
    def test_<property_name>(self, ...):
        ...
```
- One test class per function (or per closely related group).
- One test method per property.
- Name tests after the property they check: `test_roundtrip`,
`test_idempotent`, `test_length_invariant` — not `test_function_works`.
- Use `@settings(max_examples=...)` only when the default (100) is too slow.
Don't lower it just to make tests pass faster — that defeats the purpose.
- Use `assume()` sparingly and only for constraints that are hard to encode in
the strategy. Every `assume()` is a missed opportunity to improve the
strategy.
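
When `assume()` really is justified, it is usually for a cross-parameter constraint that two independently drawn values can't encode. An illustrative sketch (`insert` is a stand-in):

```python
from hypothesis import assume, given
from hypothesis import strategies as st

from mypackage.store import insert  # hypothetical function


@given(a=st.text(min_size=1), b=st.text(min_size=1))
def test_distinct_keys_stay_independent(a, b):
    # a != b spans two independently drawn values, so a cheap residual
    # assume() is acceptable here.
    assume(a != b)
    mapping = insert(insert({}, a, 1), b, 2)
    assert mapping[a] == 1 and mapping[b] == 2
```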
### Coverage discipline
Every test you write will show up in coverage reports. A low-value test that
merely calls a function and checks it doesn't crash creates the illusion of
coverage. The person reading the coverage report will assume the function's
behaviour is verified when it isn't.
**Rules:**
- If you can't identify a meaningful property, don't write the test. Leave a
`# TODO: no obvious property — needs manual test or refactoring` comment in
the test file so it shows up in searches but doesn't inflate coverage.
- If a function is trivial (a one-liner, a simple delegation), don't test it.
Coverage of trivial code is noise.
- If a function's interesting behaviour is in an external dependency (e.g. it
calls `requests.post` and returns the response), don't mock the dependency
just to get line coverage. That tests your mock, not your code.
- Prefer fewer, stronger tests over many weak ones. One test with a genuine
roundtrip property is worth more than five tests that each check a different
output field.
## Presenting Changes
For each file you create or modify, write a short summary like:
> **`tests/strategies.py`** — Added `valid_task` and `priority_value`
> strategies. `valid_task` generates `Task` objects with non-empty titles
> (1-200 chars), valid priority enums, and dates after 2020-01-01. Constraints
> based on the `Task.__init__` validation in `models.py:34`.
>
> **`tests/test_task_manager.py`** — 3 property tests for `merge_tasks()`:
> roundtrip with split/merge, length invariant, idempotence of deduplication.
> Left TODO for `sync_tasks()` (requires network mock with complex
> state — better as integration test).
## Critical Rules
- **Read before testing.** Never write strategies for code you haven't read.
You must understand the function's actual constraints, not just its type
signature.
- **Valid inputs only.** Strategies must generate values the function is designed
to handle. Testing with invalid inputs to trigger error paths is a different
activity — don't conflate it with property-based testing of the happy path.
If you want to test error handling, do that in separate, explicit tests (not
with `@given`).
- **No spurious coverage.** If you can't state the property you're testing in
one sentence, don't write the test. Better to leave a TODO than to pad the
coverage report.
- **Strategies go in `tests/strategies.py`.** Keep them separate from tests so
they can be reused and so a reader can understand the valid input space
without wading through assertions.
- **Keep tests minimal.** Each test should assert one property. Don't bundle
multiple checks into a single test method — it makes failures harder to
diagnose and obscures which property was violated.
- **Ask when uncertain.** If you're unsure whether a function has a testable
property, or whether a constraint belongs in the strategy, use
`AskUserQuestion`.