fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory by jgamaraalv · Pull Request #10466 · vitest-dev/vitest

jgamaraalv · 2026-05-27T17:15:16Z

Description

Two or more vitest run --coverage processes pointed at the same working directory (the default coverage.reportsDirectory) delete each other's reports. Both clean() (start of a run) and cleanAfterRun() (end of a run) operate on the shared coverage/ directory, so this surfaced intermittently as a raw ENOENT: lstat coverage/.tmp crash — and even when it didn't crash, the runs silently clobbered each other's final reports.

What changed

The first approach in this PR isolated each run's temp directory (.tmp-<nanoid>). As raised in review, that stopped the crash but was not enough: the final reports still share coverage/, so concurrent same-directory runs keep destroying each other's results.

So instead of isolating, this PR makes concurrent same-directory runs fail fast with an actionable error:

BaseCoverageProvider now acquires a cross-process lock on coverage.reportsDirectory for the duration of the run. The lock lives in the OS temp directory, keyed by the resolved reports directory, so it is not wiped by the directory cleanup it guards.
If a second live process targets the same reports directory, it fails immediately with an error pointing the user to a unique --coverage.reportsDirectory, instead of crashing or losing reports.
The lock is re-entrant for the owning process (watch-mode reruns).
Stale locks left by processes that no longer exist are reclaimed automatically (PID liveness check).
The temp directory name is restored to the deterministic .tmp[-shard].
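
As a rough illustration of a lock path that is keyed by the resolved reports directory and stored in the OS temp directory, the sketch below derives a file name from a hash of the resolved path. The helper name `lockFileFor`, the `vitest-coverage-` prefix, and the use of SHA-256 are assumptions made for this example; the PR description does not show the actual naming scheme.

```typescript
import { createHash } from 'node:crypto'
import { tmpdir } from 'node:os'
import { join, resolve } from 'node:path'

// Hypothetical helper: maps a reports directory to a lock file outside that directory.
// Assumption: hashing the resolved path gives a short, filesystem-safe key.
function lockFileFor(reportsDirectory: string): string {
  // Resolve first so 'coverage' and './coverage' map to the same lock.
  const resolved = resolve(reportsDirectory)
  const key = createHash('sha256').update(resolved).digest('hex').slice(0, 16)
  // The lock lives in the OS temp directory, so cleaning the reports directory cannot delete it.
  return join(tmpdir(), `vitest-coverage-${key}.lock`)
}

console.log(lockFileFor('coverage'))
```

Because the path is resolved before hashing, two processes started in the same working directory with the default reports directory would compute the same lock file, which is what lets the second one detect the first.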

The fix lives entirely in BaseCoverageProvider, so both the v8 and istanbul providers inherit it.

Running coverage for several processes in parallel is still done the supported way — giving each run its own coverage.reportsDirectory — which is exactly what the new error message tells users to do.

Related to #10111.

Please don't delete this checklist!

Before submitting the PR, please make sure you do the following:

It's really useful if your PR references an issue where it is discussed ahead of time.
- References Coverage chunk writes should ensure reportsDirectory/.tmp exists before writing coverage-*.json #10111 (related)
Ideally, include a test that fails without this PR but passes with it.
- Unit/regression tests in test/coverage-test/test/temporary-files.unit.test.ts
Please, don't make changes to pnpm-lock.yaml unless you introduce a new test example.
- Not touched
Please check Allow edits by maintainers

Tests

Run the tests with pnpm test:ci
Unit tests cover: lock acquire/release, re-entrancy (watch reruns), fail-fast when another live process holds the lock, and automatic reclaim of a stale lock.

Documentation

N/A — bug fix. The new error message is self-documenting and points to coverage.reportsDirectory.

Changesets

Changes in changelog are generated from PR name.
- fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory

…same-dir runs don't ENOENT

clean() and cleanAfterRun() removed the shared reportsDirectory recursively, deleting a concurrent run's in-flight .tmp dir. clean() now preserves .tmp* entries (only prior reports are removed) and cleanAfterRun() uses a non-recursive rmdir, so 3 parallel 'vitest run --coverage' in one cwd all exit 0.

…ts from orphan temp dirs

Add unit tests proving clean()/cleanAfterRun() preserve a concurrent run's temp dir; make temporary-files and empty-coverage-directory tests start from a clean reportsDirectory now that clean() no longer sweeps other runs' .tmp* dirs.

clean() and cleanAfterRun() read reportsDirectory with readdirSync guarded only by a separate preceding existsSync, leaving a TOCTOU window where a concurrent same-cwd run can remove the directory between the check and the read and surface a raw ENOENT, the exact failure class this feature promises to eliminate. Replace the existsSync-then-readdirSync pattern with an ENOENT-safe safeReaddir helper in both call sites. Also drop the task-added explanatory comments on the rewritten lines per the repo's no-explanatory-comments rule.
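
A minimal sketch of what an ENOENT-safe read could look like, assuming the helper simply treats a missing directory as empty. Only the name `safeReaddir` comes from the commit message; the body and the demo directory names are guesses for illustration, not the code from the PR.

```typescript
import { mkdtempSync, readdirSync, rmSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Assumed shape of the helper: a single readdirSync call with no preceding existsSync,
// so there is no gap between "check" and "read" for another process to exploit.
function safeReaddir(directory: string): string[] {
  try {
    return readdirSync(directory)
  }
  catch (error) {
    // A directory removed by a concurrent run is reported as "nothing to read".
    if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
      return []
    }
    // Anything else (EACCES, ENOTDIR, ...) is still a real error.
    throw error
  }
}

const directory = mkdtempSync(join(tmpdir(), 'coverage-demo-'))
writeFileSync(join(directory, 'coverage-0.json'), '{}')
const whileItExists = safeReaddir(directory)
rmSync(directory, { recursive: true, force: true })
const afterRemoval = safeReaddir(directory)
console.log(whileItExists, afterRemoval)
```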

Remove the task-id (T-01, AC-1, AC-3) and explanatory comments added to the coverage temp-dir tests; they leak internal planning ids into the suite and violate the repo's no-explanatory-comments rule. Assertions unchanged.

github-actions · 2026-05-27T17:15:38Z

Hello @jgamaraalv. Your PR has been labeled maybe automated because it appears to have been fully generated by AI with no human involvement.

To keep your PR open, please follow these steps:

Confirm that you are a real human. If you are an automated agent, disclose that
Confirm you've read, reviewed and stand behind its content
Confirm you've read the full issue along with all of its comments, as well as any linked issues and their comments
Make sure it follows our contribution guidelines and uses the correct GitHub template
Disclose any AI tools you used (e.g. Claude, Copilot, Codex)

Please, do not generate or format the response with AI. If you do not speak English, reply in your native language or use translation software like Google Translate or Deepl. If the response is generated, the PR will be closed automatically.

These measures help us reduce maintenance burden and keep the team's work efficient. See our AI contributions policy for more context.

netlify · 2026-05-27T17:16:31Z

✅ Deploy Preview for vitest-dev ready!

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`ea8a383`
🔍 Latest deploy log	https://app.netlify.com/projects/vitest-dev/deploys/6a391b8fd06aa400085139a9
😎 Deploy Preview	https://deploy-preview-10466--vitest-dev.netlify.app

jgamaraalv · 2026-05-27T17:29:29Z

The problem I ran into before I saw issue #10111:

v8 coverage: concurrent runs in the same project dir crash with ENOENT lstat coverage/.tmp

Environment: vitest 4.0.18, @vitest/coverage-v8 4.0.18, Node 22.14.0, pnpm 10.30.2, macOS (Darwin 25.3.0)
What happens: Running two or more vitest run --coverage processes concurrently in the same working directory intermittently fails with: Error: ENOENT: no such file or directory, lstat '<cwd>/coverage/.tmp' { errno: -2, code: 'ENOENT', syscall: 'lstat', path: '<cwd>/coverage/.tmp' }
One process's coverage cleanup/merge empties coverage/.tmp while another process is still reading it. A single run is always clean.

Repro:

In any project with coverage.provider: 'v8' and default reportsDirectory:
rm -rf coverage
for i in 1 2 3; do vitest run --coverage & done; wait # ENOENT on coverage/.tmp
vs. sequential:
vitest run --coverage → always exit 0

Expected:

Concurrent same-dir runs should either isolate their tmp dir per process or fail with a clear "coverage dir already in use" message, instead of an unhandled ENOENT. Workaround: give each run a unique --coverage.reportsDirectory (e.g. coverage-$PPID) or serialize the runs.
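
The workaround can be sketched as a small script-side helper that builds the CLI arguments for each run with its own reports directory. `coverageArgs` and the `coverage-<id>` naming are hypothetical; only the `vitest run --coverage` command and the `--coverage.reportsDirectory` flag come from the discussion above.

```typescript
import { randomUUID } from 'node:crypto'
import { join } from 'node:path'

// Hypothetical helper, not part of Vitest: returns the arguments for one coverage run
// whose reports directory is unique to that run.
function coverageArgs(baseDirectory: string, runId: string = randomUUID()): string[] {
  const reportsDirectory = join(baseDirectory, `coverage-${runId}`)
  return ['run', '--coverage', `--coverage.reportsDirectory=${reportsDirectory}`]
}

// Three parallel runs would each get a different directory.
const runs = [coverageArgs('.'), coverageArgs('.'), coverageArgs('.')]
const named = coverageArgs('.', 'shard-a')
console.log(runs, named)
```

A caller could pass each argument list to whatever process launcher it already uses; the point is only that no two concurrent runs share a directory.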

jgamaraalv · 2026-05-27T17:51:09Z

Hello @jgamaraalv. Your PR has been labeled maybe automated because it appears to have been fully generated by AI with no human involvement.

To keep your PR open, please follow these steps:

Confirm that you are a real human. If you are an automated agent, disclose that

Confirm you've read, reviewed and stand behind its content

Confirm you've read the full issue along with all of its comments, as well as any linked issues and their comments

Make sure it follows our contribution guidelines and uses the correct GitHub template

Disclose any AI tools you used (e.g. Claude, Copilot, Codex)

Please, do not generate or format the response with AI. If you do not speak English, reply in your native language or use translation software like Google Translate or Deepl. If the response is generated, the PR will be closed automatically.

These measures help us reduce maintenance burden and keep the team's work efficient. See our AI contributions policy for more context.

I confirm that I am a human.
I read, reviewed, tested, and retested everything, from the original issue to the final solution.
I read the whole issue after hitting the same problem and seeing that someone had already run into something similar.
I followed all the contribution guidelines and the template.
I used Claude Code for AI-assisted development.

AriPerkkio

Two or more vitest run --coverage processes running in the same working directory

Each of them will also remove ./coverage during start, removing each other's possible results. This fix is not enough.

The proper fix is on the user's side: use vitest run --coverage --coverage.reportsDirectory=some-unique-path.

jgamaraalv · 2026-05-27T19:42:02Z

Two or more vitest run --coverage processes running in the same working directory

Each of them will also remove ./coverage during start, removing each other's possible results. This fix is not enough.

The proper fix is on the user's side: use vitest run --coverage --coverage.reportsDirectory=some-unique-path.

@AriPerkkio You're right — isolating only the temp dir wasn't enough. Even with separate .tmp dirs, both runs still write their final reports into the same coverage/ directory and clean() wipes the other run's results, so the data loss stayed; I had only hidden the ENOENT crash.

I pushed a different approach. Instead of isolating, Vitest now takes a lock on coverage.reportsDirectory (stored in the OS temp dir, keyed by the resolved path) for the duration of the run.

If a second process targets the same directory while another is still running, it fails right away with an actionable error telling the user to give each run its own --coverage.reportsDirectory, which is exactly the fix you described, just surfaced by Vitest instead of an intermittent crash.

The lock is re-entrant for the same process (watch reruns) and stale locks from dead processes are reclaimed automatically. The temp dir name is back to the deterministic .tmp[-shard].

So this doesn't try to make same-directory concurrent runs "work" — it makes them fail fast with a clear message and points to the unique-directory fix.

If you'd prefer not to add this and just document the unique reportsDirectory requirement instead, let me know.

…irectory

Two or more `vitest run --coverage` processes pointed at the same reportsDirectory delete each other's reports, which previously surfaced as an intermittent raw `ENOENT: lstat coverage/.tmp` crash. Per-run temp-dir isolation stopped the crash but still let concurrent runs clobber each other's final reports, so it was not enough.

Replace it with a cross-process lock, keyed by the resolved reportsDirectory and stored in the OS temp dir (outside the reports directory it guards):

- A second live process targeting the same reportsDirectory now fails immediately with an actionable error pointing to `--coverage.reportsDirectory`, instead of crashing or silently losing reports.
- The lock is re-entrant for the owning process (watch mode reruns).
- Stale locks left by processes that no longer exist are reclaimed automatically.

The temp directory name is restored to the deterministic `.tmp[-shard]`.

AriPerkkio · 2026-05-28T05:33:57Z

[...] Vitest now takes a lock on coverage.reportsDirectory (stored in the OS temp dir, keyed by the resolved path) for the duration of the run.

If a second process targets the same directory while another is still running, it fails right away with an actionable error telling the user to give each run its own --coverage.reportsDirectory, which is exactly the fix you described, just surfaced by Vitest instead of an intermittent crash.

I like this approach, but need some time to think through it before commenting about the implementation. 👍

Maybe the lockfiles could be stored in .vitest/lockfiles instead of in tmpdir(). We might want a similar solution for other reporters too, not sure yet.

AriPerkkio

Overall looks good. This will prevent two simultaneous Vitest runs from deleting each other's results. Some minor comments below.

Addressing review feedback on the reportsDirectory lock:

- Simplify acquisition from a 10-iteration loop to a single exclusive create plus one retry, which is only reached after reclaiming a stale lock. The bounded retry was the only reason to loop at all.
- Reclaim a stale lock with an atomic rename instead of an unconditional remove-then-recreate. Two processes racing to reclaim the same stale lock could previously both succeed and run coverage concurrently in the same directory; now only the process whose rename wins removes the file, the loser retries the exclusive create and fails fast.
- Release the lock only when this process can confirm it owns it, so a transient unreadable read can no longer delete another process's lock.
- Document the signal-0 liveness check and drop the practically unreachable secondary throw in favour of a single shared error helper.
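
For readers unfamiliar with the signal-0 liveness check mentioned above, here is a minimal sketch. In Node.js, `process.kill(pid, 0)` delivers no signal and only tests whether the target process exists. The function name `isProcessAlive` is hypothetical, and treating every error other than `ESRCH` as "still alive" is an assumption made for this example rather than something this commit message states.

```typescript
import { spawnSync } from 'node:child_process'

// Hypothetical helper: reports whether a process with the given pid currently exists.
function isProcessAlive(pid: number): boolean {
  try {
    // Signal 0 performs the existence and permission checks without sending anything.
    process.kill(pid, 0)
    return true
  }
  catch (error) {
    // ESRCH means "no such process". Assumption: any other error (for example EPERM,
    // where the process exists but belongs to another user) is treated as alive.
    return (error as NodeJS.ErrnoException).code !== 'ESRCH'
  }
}

// A child that has already exited stands in for the owner of a stale lock.
const exitedChild = spawnSync(process.execPath, ['-e', ''])
const selfAlive = isProcessAlive(process.pid)
const childAlive = isProcessAlive(exitedChild.pid)
console.log({ selfAlive, childAlive })
```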

Replace the module-level cleanups array and afterEach hook with onTestFinished registrations inside the helpers that allocate the temp reports directory, lock file, and child process. Cleanup now lives next to the resource it tears down. Assertions unchanged.

AriPerkkio

Looks good to me. @hi-ogawa & @sheremet-va any concerns?

Move the cross-process lock logic out of BaseCoverageProvider into a standalone ReportsDirectoryLock class with acquire()/release(), and drop the path-returning getter in favor of the lock owning its own lockFile. Addresses review feedback that the lock implementation cluttered the provider and the getter felt unjustified.

… locks

Address review findings on the cross-process reportsDirectory lock:

- Close a write-window TOCTOU: publish the lock atomically by writing the full payload to a temp file then fs.link()-ing it into place, so the lock file is never observable empty/partial and can't be stolen mid-write.
- Release the lock on process exit (best-effort, owner-checked) so a throwing report path, reportOnFailure, watch keepResults, or graceful close no longer leaks a stale lock into the OS temp directory.
- Make stale-lock reclaim race-safe: rename the contested lock to a private path, re-read that inode, and only discard it when it is still the stale owner observed; otherwise restore it (never destroying a displaced live owner's copy) and fail closed. Document the residual, inherent 3+-process advisory-lock window.
- Validate the full owner identity (pid + reportsDirectory + timestamp + per-acquire nonce) before honoring or releasing a lock, defeating PID reuse and hash-collision false matches.
- Treat only ESRCH as a dead process in the liveness check; let unexpected readOwner I/O errors (EACCES/EMFILE) propagate instead of silently stealing an unreadable lock.
- Tolerate EPERM/EBUSY/EACCES on the reclaim rename (Windows) and surface the lock timestamp in the actionable in-use error.
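
The write-then-link publish described in the first bullet can be sketched as follows. This is an illustration under assumptions: `publishLock`, the staging-file naming, and the payload format are invented for the example, and a later commit in this thread removes this mechanism again in favour of a plain exclusive create.

```typescript
import { linkSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical helper: returns true if this call published the lock, false if one already existed.
function publishLock(lockFile: string, payload: string): boolean {
  // Write the complete payload to a private staging file first.
  const staging = `${lockFile}.${process.pid}.staging`
  writeFileSync(staging, payload)
  try {
    // link() fails with EEXIST if lockFile already exists, so the lock appears
    // either fully written or not at all.
    linkSync(staging, lockFile)
    return true
  }
  catch (error) {
    if ((error as NodeJS.ErrnoException).code === 'EEXIST') {
      return false
    }
    throw error
  }
  finally {
    // The staging name is no longer needed whether or not the link succeeded.
    rmSync(staging, { force: true })
  }
}

const demoDirectory = mkdtempSync(join(tmpdir(), 'lock-publish-'))
const demoLock = join(demoDirectory, 'reports.lock')
const firstPublish = publishLock(demoLock, JSON.stringify({ pid: process.pid }))
const secondPublish = publishLock(demoLock, JSON.stringify({ pid: -1 }))
const published = readFileSync(demoLock, 'utf8')
rmSync(demoDirectory, { recursive: true, force: true })
console.log({ firstPublish, secondPublish, published })
```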

jgamaraalv · 2026-06-11T15:56:05Z

@hi-ogawa @AriPerkkio @sheremet-va

Last two review comments (resolved in e5199e2)

Two nits are addressed by e5199e2 (refactor(coverage): extract reportsDirectory lock into a dedicated class):

The reportsDirectoryLockFile getter that "didn't feel called for" is gone. The lock logic now lives in a dedicated ReportsDirectoryLock class that owns its own lockFile (computed in its constructor) and exposes acquire() / release().
The lock implementation no longer clutters BaseCoverageProvider — the provider just holds a ReportsDirectoryLock instance, which also resolves the "split this into a standalone class" suggestion.

About the follow-up commit 4d81d3a

Heads up that I pushed one more commit after the approval — fix(coverage): harden reportsDirectory lock against races and corrupt locks (4d81d3a). While revisiting the extracted lock I ran it through a deeper correctness/concurrency pass and tightened a few things on the lock primitive itself (not the public behaviour):

atomic lock publish (write temp + fs.link) so the lock file is never observable empty/partial and can't be stolen mid-write;
release the lock on process exit so a throwing report path / reportOnFailure / watch / graceful close no longer leaks a stale lock into the OS temp dir;
race-safe stale-lock reclaim (rename-aside + re-read + fail-closed, never destroying a displaced live owner's copy);
full owner-identity check (pid + reportsDirectory + timestamp + per-acquire nonce) to defeat PID reuse and hash-collision false matches;
only ESRCH counts as a dead process; unreadable-lock I/O errors propagate instead of silently stealing the lock;
tolerate EPERM/EBUSY/EACCES on the reclaim rename (Windows).

This sits on top of an already-reviewed/approved feature, so if you'd rather keep the PR minimal and scoped to just the requested fixes, I'm happy to drop 4d81d3a and leave only the review-comment resolution — just let me know your preference. Equally happy to keep it if the extra hardening is welcome.

jgamaraalv · 2026-06-18T17:48:19Z

@AriPerkkio any news on this issue?

AriPerkkio

This PR started as a small improvement, but now the LLM behind this code is getting quite verbose. I'm starting to consider whether we should just discard this PR and let a human rewrite it without the LLM's wall-of-text comments and never-going-to-happen edge-case handling. In the end we will be the ones maintaining this for years. 😬

Maybe this lockfile stuff and all possible edge cases should be extracted into its own npm package even.

Drop the speculative machinery that crept into the lock (atomic write+link publish, nonce/timestamp owner identity, rename-aside stale reclaim with restore, process-exit handler) in favour of a plain create-with-wx lock that reclaims a dead owner's lock on the next run. Same fail-fast behaviour for concurrent same-directory runs, far less to maintain. A leaked lock from a hard-killed process is cleared by the next run's dead-pid check, so the exit handler was redundant.

jgamaraalv · 2026-06-22T19:25:34Z

Yeah, fair, and you're right. That last hardening commit got away from me. I went down a rabbit hole guarding race windows that don't really happen (the 3-process SIGKILL reclaim especially) and ended up with comments longer than the code they were explaining.

Pushed 888edb5 that rips all of it back out. What's left is about as small as this lock gets:

write the lockfile with wx (create-or-fail), that's the actual mutual exclusion
already there + owner pid alive -> fail fast with the actionable error
already there + owner pid dead -> remove it, try once more
release in cleanAfterRun
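
The four steps above can be sketched roughly as below. This is not the code from 888edb5: the function names, the pid-only lock payload, the error wording, and the same-process shortcut (carried over from the re-entrancy mentioned in the PR description) are all assumptions for illustration.

```typescript
import { spawnSync } from 'node:child_process'
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Signal 0 only checks for existence; ESRCH means the owner is gone.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  }
  catch (error) {
    return (error as NodeJS.ErrnoException).code !== 'ESRCH'
  }
}

// Hypothetical acquire: exclusive create, then at most one retry after removing a dead owner's lock.
function acquire(lockFile: string, mayRetry = true): void {
  try {
    // 'wx' fails with EEXIST if the file already exists; this is the mutual exclusion.
    writeFileSync(lockFile, String(process.pid), { flag: 'wx' })
  }
  catch (error) {
    if ((error as NodeJS.ErrnoException).code !== 'EEXIST') {
      throw error
    }
    const owner = Number(readFileSync(lockFile, 'utf8'))
    if (owner === process.pid) {
      return // assumption: a rerun in the same process keeps its lock
    }
    if (mayRetry && !isAlive(owner)) {
      rmSync(lockFile, { force: true })
      return acquire(lockFile, false)
    }
    // Illustrative wording only, not Vitest's actual message.
    throw new Error(`Coverage reports directory is in use by process ${owner}. Give each run its own --coverage.reportsDirectory.`)
  }
}

// Hypothetical release, standing in for the call made from cleanAfterRun.
function release(lockFile: string): void {
  rmSync(lockFile, { force: true })
}

const lockFile = join(mkdtempSync(join(tmpdir(), 'vitest-lock-')), 'reports.lock')
const exitedPid = spawnSync(process.execPath, ['-e', '']).pid
writeFileSync(lockFile, String(exitedPid)) // simulate a lock leaked by a process that is gone
acquire(lockFile) // dead owner: the stale lock is removed and re-created
release(lockFile)
```

The single retry keeps the control flow bounded: a process that loses the race after reclaiming a stale lock fails fast instead of looping.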

No more nonce/timestamp identity, no write+link publish, no rename-aside/restore dance, no exit handler. coverage.ts is ~280 lines lighter and the tests dropped the cases that only existed to prove the machinery I just deleted.

Honestly I'd rather keep it this small than spin it into its own package, there isn't much left to extract now. If you still want a human to rewrite it from scratch I won't argue, but I think this version is one you'd actually be ok maintaining for years. Sorry for the noise on the way here.

jgamaraalv · 2026-06-22T19:47:17Z

By the way, I want to stress that no change made with LLM assistance went in without my own review and tests confirming that the implementations actually worked. The extra comments were added after some questions came up about parts of the implementation, as I replied in one of your change requests, and once again I apologize for that.

Again, I consider the fixes in this PR 100% valid and would not discard them, but that is up to you and the team, whose work I respect a lot.

I'm wrapping up my contributions to this PR here.
Cheers! 🇧🇷

jgamaraalv added 6 commits May 27, 2026 11:43

fix(coverage): isolate per-run temp dir and harden cleanup so concurrent same-dir runs don't ENOENT

26d2b12

test(coverage): regression tests for per-run temp-dir isolation and ENOENT-safe cleanup

5e50d5c

github-actions Bot added the "maybe automated" label (User is likely an AI agent, or the content was generated by an AI assistant without user control) May 27, 2026

AriPerkkio requested changes May 27, 2026

jgamaraalv force-pushed the fix/coverage-concurrent-tmp-dir branch from 2d33ce7 to 231ef67 Compare May 27, 2026 19:44

jgamaraalv changed the title ~~fix(coverage): isolate per-run temp dir so concurrent same-dir coverage runs don't ENOENT~~ fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory May 27, 2026

jgamaraalv requested a review from AriPerkkio May 27, 2026 19:53

AriPerkkio removed the "maybe automated" label (User is likely an AI agent, or the content was generated by an AI assistant without user control) May 28, 2026

AriPerkkio requested changes Jun 3, 2026

Comment thread packages/vitest/src/node/coverage.ts Outdated

Comment thread packages/vitest/src/node/coverage.ts Outdated

Comment thread test/coverage-test/test/temporary-files.unit.test.ts Outdated

Comment thread packages/vitest/src/node/coverage.ts Outdated

jgamaraalv added 2 commits June 3, 2026 21:34

jgamaraalv requested a review from AriPerkkio June 4, 2026 00:37

AriPerkkio reviewed Jun 10, 2026

Comment thread test/coverage-test/test/temporary-files.unit.test.ts Outdated

fix: lint

f2eea11

AriPerkkio previously approved these changes Jun 10, 2026

AriPerkkio requested review from hi-ogawa and sheremet-va June 10, 2026 11:40

hi-ogawa reviewed Jun 11, 2026

Comment thread packages/vitest/src/node/coverage.ts Outdated

Comment thread packages/vitest/src/node/coverage.ts Outdated

jgamaraalv dismissed AriPerkkio’s stale review via e5199e2 June 11, 2026 15:01

jgamaraalv requested review from AriPerkkio and hi-ogawa June 11, 2026 15:57

refactor: move verbose lock class to end of file

ea8a383

AriPerkkio requested changes Jun 22, 2026

Comment thread packages/vitest/src/node/coverage.ts Outdated

Comment thread packages/vitest/src/node/coverage.ts Outdated

jgamaraalv requested a review from AriPerkkio June 22, 2026 19:36

hi-ogawa removed their request for review June 23, 2026 00:23
