Status quo
Test reordering is implemented by TestSuiteSorter and driven by two orthogonal knobs plus one toggle:
- a main order:
default, reverse, random, duration-ascending, duration-descending, size-ascending, size-descending
- a defects overlay: on/off
- a dependency resolver: on/off
Both the XML executionOrder attribute and the CLI --order-by option accept a comma-separated list where each token flips one of these three knobs. Under the hood, TestSuiteSorter::sort() applies them in a hard-coded three-phase pipeline:
- main order
- defects-first overlay (stable, so the main order is preserved inside
each defect tier)
- dependency resolution
Recent fixes
Three real bugs in TestSuiteSorter have just been corrected:
cmpDefectPriorityAndTime() used to fall through to a duration-based tie-break for any pair of equally-weighted items. Combined with PHP's stable usort(), this could invert the order produced by the main phase, breaking the implicit "unit suite before end-to-end suite" ordering when both contained equally-weighted defects. The comparator now returns 0 for equal weights and the preceding phase's order is preserved.
- Skipped and incomplete tests were being incorrectly treated as defects. They are no longer reordered by the "defects first" strategy.
cmpSize treated every non-TestCase, non-DataProviderTestSuite input as unknown, which meant a plain TestSuite wrapping a TestCase class was always sorted as unknown rather than by the size of the tests it contained. The comparator now recurses into child tests and takes the maximum size weight.
What is still "awkward"
Even with the bug fixes, several things about the current design feel wrong:
- Phase order is hard-coded. The user cannot express "resolve dependencies first, then apply defects on top of the result", or "defects first, then randomize the non-defective tail", even though both are sensible asks. The comma-separated token list looks like a pipeline but is really just three flags whose precedence is fixed in
sort().
- Contradictory tokens are silently accepted.
duration-ascending,duration-descending resolves to "last wins" without a warning, and reverse,random is similarly under-specified. A misspelled XML token is ignored entirely.
default means two different things. As a token it resets every knob to its default; as an enum value it means "no main order". This conflates "I have an opinion about the default" with "I don't".
- The "defects" notion is binary. Every defective status collapses into "hoist to front, ordered by severity". There is no way to say "prioritize only failures and errors", or "put warnings last, before success".
- The result cache drives ordering, but the cache is opaque to the user. There is no way to see the priorities the sorter is about to apply without reading the JSON by hand, and no diagnostic when the cache is stale or absent.
- Suites and test cases are treated the same by the comparator. A top-level
TestSuite aggregates priority as max() of its direct children, and that aggregate then competes with leaf TestCase weights in the same comparator. This is why the original bug only surfaced in the mixed unit-vs-end-to-end case.
- No single source of truth for "what reordering applied". The
Test Suite Sorted event fires with no payload; users cannot inspect the effective pipeline after configuration is resolved.
Proposal
The goal should be an ordering subsystem where the user expresses an ordered pipeline of stages, each stage is a well-defined transform over a list of tests, and the configuration surface (XML + CLI) round-trips cleanly to that pipeline.
Sketch of the new model
- A
ReorderStage interface with a single method: apply(list<Test> $tests, Context $context): list<Test>.
- Concrete stages:
ByDuration(Direction), BySize(Direction), Reverse, Random
DefectsFirst(WeightPolicy) — weight policy lets callers exclude specific statuses or remap weights (this subsumes the current "skipped is not a defect" rule)
ResolveDependencies
- A
ReorderPipeline is an ordered list of stages. Applied at each TestSuite level, exactly as today, but the phase order is whatever the user declared instead of the hard-coded three-phase sequence.
Context carries the ResultCache, the current TestSuite, and a way for a stage to emit diagnostics (for a future --explain-order feature).
Configuration
- XML
executionOrder and CLI --order-by both accept an ordered, comma-separated list of stage tokens. Precedence is token order, not hard-coded phase order.
- Contradictory tokens (two main orders,
depends and no-depends together, etc.) raise a configuration error instead of silently resolving to last-wins.
- Unknown tokens raise a configuration error, not silent skip.
- The existing short tokens (
default, depends, no-depends, defects, reverse, random, duration-ascending, size-ascending, etc.) map onto the new stage types. default stays as a reset word but is disambiguated from "no main order".
- Deprecated tokens (
duration, size) keep emitting their existing deprecation events during the transition and are removed in PHPUnit 14 as already announced.
Diagnostics
The "Test Suite Sorted" event gains a payload describing the effective pipeline and, optionally, the per-test weights that were applied. A new --explain-order CLI flag (or extension to --debug) prints the effective pipeline once at startup. This makes the "why did test X run at position Y?" question answerable without reading the result cache JSON.
Backward compatibility
- Single-token inputs (
defects, depends, duration-ascending, ...) behave exactly as today.
- Multi-token inputs that happen to match the existing three-phase interpretation (
depends,defects, depends,defects,duration-ascending, ...) continue to produce the same order, now pinned by the end-to-end tests added in the bug-fix commit.
- Multi-token inputs that previously relied on the silent last-wins behavior (e.g.
duration-ascending,duration-descending) will start raising configuration errors. This is a deliberate, user-visible change; it should go through the usual deprecation cycle and be released in a major version.
Open questions
- Should
DefectsFirst be weight-customizable via XML/CLI at all, or should the "skipped and incomplete are not defects" decision stay as a hard-coded framework policy? Exposing it invites bikeshedding; hiding it loses flexibility.
- Should the top-level
TestSuite comparator aggregate children by max() (current behavior), by sum(), or by a count of leaf defects? The current aggregation is why this bug surfaced only under specific conditions, and the new design is a good opportunity to revisit it.
- Can the
ResolveDependencies stage be generalized so that a test's declared #[Depends] can influence the sort order even when depends is not explicitly enabled? In practice, violating a declared dependency is always wrong, and making it opt-out feels safer than making it opt-in.
Status quo
Test reordering is implemented by
TestSuiteSorterand driven by two orthogonal knobs plus one toggle:default,reverse,random,duration-ascending,duration-descending,size-ascending,size-descendingBoth the XML
executionOrderattribute and the CLI--order-byoption accept a comma-separated list where each token flips one of these three knobs. Under the hood,TestSuiteSorter::sort()applies them in a hard-coded three-phase pipeline:each defect tier)
Recent fixes
Three real bugs in
TestSuiteSorterhave just been corrected:cmpDefectPriorityAndTime()used to fall through to a duration-based tie-break for any pair of equally-weighted items. Combined with PHP's stableusort(), this could invert the order produced by the main phase, breaking the implicit "unit suite before end-to-end suite" ordering when both contained equally-weighted defects. The comparator now returns0for equal weights and the preceding phase's order is preserved.cmpSizetreated every non-TestCase, non-DataProviderTestSuiteinput asunknown, which meant a plainTestSuitewrapping aTestCaseclass was always sorted asunknownrather than by the size of the tests it contained. The comparator now recurses into child tests and takes the maximum size weight.What is still "awkward"
Even with the bug fixes, several things about the current design feel wrong:
sort().duration-ascending,duration-descendingresolves to "last wins" without a warning, andreverse,randomis similarly under-specified. A misspelled XML token is ignored entirely.defaultmeans two different things. As a token it resets every knob to its default; as an enum value it means "no main order". This conflates "I have an opinion about the default" with "I don't".TestSuiteaggregates priority asmax()of its direct children, and that aggregate then competes with leafTestCaseweights in the same comparator. This is why the original bug only surfaced in the mixed unit-vs-end-to-end case.Test Suite Sortedevent fires with no payload; users cannot inspect the effective pipeline after configuration is resolved.Proposal
The goal should be an ordering subsystem where the user expresses an ordered pipeline of stages, each stage is a well-defined transform over a list of tests, and the configuration surface (XML + CLI) round-trips cleanly to that pipeline.
Sketch of the new model
ReorderStageinterface with a single method:apply(list<Test> $tests, Context $context): list<Test>.ByDuration(Direction),BySize(Direction),Reverse,RandomDefectsFirst(WeightPolicy)— weight policy lets callers exclude specific statuses or remap weights (this subsumes the current "skipped is not a defect" rule)ResolveDependenciesReorderPipelineis an ordered list of stages. Applied at eachTestSuitelevel, exactly as today, but the phase order is whatever the user declared instead of the hard-coded three-phase sequence.Contextcarries theResultCache, the currentTestSuite, and a way for a stage to emit diagnostics (for a future--explain-orderfeature).Configuration
executionOrderand CLI--order-byboth accept an ordered, comma-separated list of stage tokens. Precedence is token order, not hard-coded phase order.dependsandno-dependstogether, etc.) raise a configuration error instead of silently resolving to last-wins.default,depends,no-depends,defects,reverse,random,duration-ascending,size-ascending, etc.) map onto the new stage types.defaultstays as a reset word but is disambiguated from "no main order".duration,size) keep emitting their existing deprecation events during the transition and are removed in PHPUnit 14 as already announced.Diagnostics
The "Test Suite Sorted" event gains a payload describing the effective pipeline and, optionally, the per-test weights that were applied. A new
--explain-orderCLI flag (or extension to--debug) prints the effective pipeline once at startup. This makes the "why did test X run at position Y?" question answerable without reading the result cache JSON.Backward compatibility
defects,depends,duration-ascending, ...) behave exactly as today.depends,defects,depends,defects,duration-ascending, ...) continue to produce the same order, now pinned by the end-to-end tests added in the bug-fix commit.duration-ascending,duration-descending) will start raising configuration errors. This is a deliberate, user-visible change; it should go through the usual deprecation cycle and be released in a major version.Open questions
DefectsFirstbe weight-customizable via XML/CLI at all, or should the "skipped and incomplete are not defects" decision stay as a hard-coded framework policy? Exposing it invites bikeshedding; hiding it loses flexibility.TestSuitecomparator aggregate children bymax()(current behavior), bysum(), or by a count of leaf defects? The current aggregation is why this bug surfaced only under specific conditions, and the new design is a good opportunity to revisit it.ResolveDependenciesstage be generalized so that a test's declared#[Depends]can influence the sort order even whendependsis not explicitly enabled? In practice, violating a declared dependency is always wrong, and making it opt-out feels safer than making it opt-in.