Skip to content

fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory#10466

Open
jgamaraalv wants to merge 14 commits into
vitest-dev:mainfrom
jgamaraalv:fix/coverage-concurrent-tmp-dir
Open

fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory#10466
jgamaraalv wants to merge 14 commits into
vitest-dev:mainfrom
jgamaraalv:fix/coverage-concurrent-tmp-dir

Conversation

@jgamaraalv

@jgamaraalv jgamaraalv commented May 27, 2026

Copy link
Copy Markdown

Description

Two or more vitest run --coverage processes pointed at the same working directory (the default coverage.reportsDirectory) delete each other's reports. Both clean() (start of a run) and cleanAfterRun() (end of a run) operate on the shared coverage/ directory, so this surfaced intermittently as a raw ENOENT: lstat coverage/.tmp crash — and even when it didn't crash, the runs silently clobbered each other's final reports.

What changed

The first approach in this PR isolated each run's temp directory (.tmp-<nanoid>). As raised in review, that stopped the crash but was not enough: the final reports still share coverage/, so concurrent same-directory runs keep destroying each other's results.

So instead of isolating, this PR makes concurrent same-directory runs fail fast with an actionable error:

  • BaseCoverageProvider now acquires a cross-process lock on coverage.reportsDirectory for the duration of the run. The lock lives in the OS temp directory, keyed by the resolved reports directory, so it is not wiped by the directory cleanup it guards.
  • If a second live process targets the same reports directory, it fails immediately with an error pointing the user to a unique --coverage.reportsDirectory, instead of crashing or losing reports.
  • The lock is re-entrant for the owning process (watch-mode reruns).
  • Stale locks left by processes that no longer exist are reclaimed automatically (PID liveness check).
  • The temp directory name is restored to the deterministic .tmp[-shard].

The fix lives entirely in BaseCoverageProvider, so both the v8 and istanbul providers inherit it.

Running coverage for several processes in parallel is still done the supported way — giving each run its own coverage.reportsDirectory — which is exactly what the new error message tells users to do.

Related to #10111.


Please don't delete this checklist!

Before submitting the PR, please make sure you do the following:


Tests

  • Run the tests with pnpm test:ci
  • Unit tests cover: lock acquire/release, re-entrancy (watch reruns), fail-fast when another live process holds the lock, and automatic reclaim of a stale lock.

Documentation

  • N/A — bug fix. The new error message is self-documenting and points to coverage.reportsDirectory.

Changesets

  • Changes in changelog are generated from PR name.
    • fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory

…same-dir runs don't ENOENT

clean() and cleanAfterRun() removed the shared reportsDirectory recursively, deleting a concurrent run's in-flight .tmp dir. clean() now preserves .tmp* entries (only prior reports are removed) and cleanAfterRun() uses a non-recursive rmdir, so 3 parallel 'vitest run --coverage' in one cwd all exit 0.
…ts from orphan temp dirs

Add unit tests proving clean()/cleanAfterRun() preserve a concurrent run's temp dir; make temporary-files and empty-coverage-directory tests start from a clean reportsDirectory now that clean() no longer sweeps other runs' .tmp* dirs.
clean() and cleanAfterRun() read reportsDirectory with readdirSync
guarded only by a separate preceding existsSync, leaving a TOCTOU window
where a concurrent same-cwd run can remove the directory between the
check and the read and surface a raw ENOENT — the exact failure class
this feature promises to eliminate. Replace the existsSync-then-
readdirSync pattern with an ENOENT-safe safeReaddir helper in both call
sites.

Also drop the task-added explanatory comments on the rewritten lines per
the repo's no-explanatory-comments rule.
Remove the task-id (T-01, AC-1, AC-3) and explanatory comments added to
the coverage temp-dir tests; they leak internal planning ids into the
suite and violate the repo's no-explanatory-comments rule. Assertions
unchanged.
@github-actions github-actions Bot added the maybe automated User is likely an AI agent, or the content was generated by an AI assistant without user control label May 27, 2026
@github-actions

Copy link
Copy Markdown

Hello @jgamaraalv. Your PR has been labeled maybe automated because it appears to have been fully generated by AI with no human involvement.

To keep your PR open, please follow these steps:

  • Confirm that you are a real human. If you are an automated agent, disclose that
  • Confirm you've read, reviewed and stand behind its content
  • Confirm you've read the full issue along with all of its comments, as well as any linked issues and their comments
  • Make sure it follows our contribution guidelines and uses the correct GitHub template
  • Disclose any AI tools you used (e.g. Claude, Copilot, Codex)

Please, do not generate or format the response with AI. If you do not speak English, reply in your native language or use translation software like Google Translate or Deepl. If the response is generated, the PR will be closed automatically.

These measures help us reduce maintenance burden and keep the team's work efficient. See our AI contributions policy for more context.

@netlify

netlify Bot commented May 27, 2026

Copy link
Copy Markdown

Deploy Preview for vitest-dev ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit ea8a383
🔍 Latest deploy log https://app.netlify.com/projects/vitest-dev/deploys/6a391b8fd06aa400085139a9
😎 Deploy Preview https://deploy-preview-10466--vitest-dev.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

jgamaraalv

This comment was marked as duplicate.

@jgamaraalv

jgamaraalv commented May 27, 2026

Copy link
Copy Markdown
Author

My problem before saw issue #10111:

v8 coverage: concurrent runs in the same project dir crash with ENOENT lstat coverage/.tmp

  • Environment: vitest 4.0.18, @vitest/coverage-v8 4.0.18, Node 22.14.0, pnpm 10.30.2, macOS (Darwin 25.3.0)
  • What happens: Running two or more vitest run --coverage processes concurrently in the same working directory intermittently fails with: Error: ENOENT: no such file or directory, lstat '<cwd>/coverage/.tmp' { errno: -2, code: 'ENOENT', syscall: 'lstat', path: '<cwd>/coverage/.tmp' }
  • One process's coverage cleanup/merge empties coverage/.tmp while another process is still reading it. A single run is always clean.

Repro:

In any project with coverage.provider: 'v8' and default reportsDirectory:
rm -rf coverage
for i in 1 2 3; do (vitest run --coverage &) ; done; wait # ENOENT on coverage/.tmp
vs. sequential:
vitest run --coverage → always exit 0

Expected:

concurrent same-dir runs either isolate their tmp dir per process or fail with a clear "coverage dir already in use" message, instead of an unhandled ENOENT. Workaround: give each run a unique --coverage.reportsDirectory (e.g. coverage-$PPID) or serialize the runs.

@jgamaraalv

Copy link
Copy Markdown
Author

Hello @jgamaraalv. Your PR has been labeled maybe automated because it appears to have been fully generated by AI with no human involvement.

To keep your PR open, please follow these steps:

  • Confirm that you are a real human. If you are an automated agent, disclose that
  • Confirm you've read, reviewed and stand behind its content
  • Confirm you've read the full issue along with all of its comments, as well as any linked issues and their comments
  • Make sure it follows our contribution guidelines and uses the correct GitHub template
  • Disclose any AI tools you used (e.g. Claude, Copilot, Codex)

Please, do not generate or format the response with AI. If you do not speak English, reply in your native language or use translation software like Google Translate or Deepl. If the response is generated, the PR will be closed automatically.

These measures help us reduce maintenance burden and keep the team's work efficient. See our AI contributions policy for more context.

  • Confirmo que sou um humano.
  • Li, revisei, testei e retestei tudo, da issue originária até a solução final.
  • Li toda a issue após enfrentar o mesmo problema e ver que já existia alguém que passou por algo semelhante.
  • Segui todas as diretrizes de contribuição e o template.
  • Utilizei o Claude Code para desenvolvimento assistido por IA.

@AriPerkkio AriPerkkio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two or more vitest run --coverage processes running in the same working directory

Each of them will also remove ./coverage during start, removing each others possible results. This fix is not enough.

Proper fix is from user side to use vitest run --coverage --coverage.reportsDirectory=some-unique-path.

@jgamaraalv

Copy link
Copy Markdown
Author

Two or more vitest run --coverage processes running in the same working directory

Each of them will also remove ./coverage during start, removing each others possible results. This fix is not enough.

Proper fix is from user side to use vitest run --coverage --coverage.reportsDirectory=some-unique-path.

@AriPerkkio You're right — isolating only the temp dir wasn't enough. Even with separate .tmp dirs, both runs still write their final reports into the same coverage/ directory and clean() wipes the other run's results, so the data loss stayed; I had only hidden the ENOENT crash.

I pushed a different approach. Instead of isolating, Vitest now takes a lock on coverage.reportsDirectory (stored in the OS temp dir, keyed by the resolved path) for the duration of the run.

If a second process targets the same directory while another is still running, it fails right away with an actionable error telling the user to give each run its own --coverage.reportsDirectory, which is exactly the fix you described, just surfaced by Vitest instead of an intermittent crash.

The lock is re-entrant for the same process (watch reruns) and stale locks from dead processes are reclaimed automatically. The temp dir name is back to the deterministic .tmp[-shard].

So this doesn't try to make same-directory concurrent runs "work" — it makes them fail fast with a clear message and points to the unique-directory fix.

If you'd prefer not to add this and just document the unique reportsDirectory requirement instead let me know.

…irectory

Two or more `vitest run --coverage` processes pointed at the same
reportsDirectory delete each other's reports, which previously surfaced as
an intermittent raw `ENOENT: lstat coverage/.tmp` crash.

Per-run temp-dir isolation stopped the crash but still let concurrent runs
clobber each other's final reports, so it was not enough. Replace it with a
cross-process lock, keyed by the resolved reportsDirectory and stored in the
OS temp dir (outside the reports directory it guards):

- A second live process targeting the same reportsDirectory now fails
  immediately with an actionable error pointing to
  `--coverage.reportsDirectory`, instead of crashing or silently losing
  reports.
- The lock is re-entrant for the owning process (watch mode reruns).
- Stale locks left by processes that no longer exist are reclaimed
  automatically.

The temp directory name is restored to the deterministic `.tmp[-shard]`.
@jgamaraalv jgamaraalv force-pushed the fix/coverage-concurrent-tmp-dir branch from 2d33ce7 to 231ef67 Compare May 27, 2026 19:44
@jgamaraalv jgamaraalv changed the title fix(coverage): isolate per-run temp dir so concurrent same-dir coverage runs don't ENOENT fix(coverage): fail fast when concurrent runs share coverage.reportsDirectory May 27, 2026
@jgamaraalv jgamaraalv requested a review from AriPerkkio May 27, 2026 19:53
@AriPerkkio

AriPerkkio commented May 28, 2026

Copy link
Copy Markdown
Member

[...] Vitest now takes a lock on coverage.reportsDirectory (stored in the OS temp dir, keyed by the resolved path) for the duration of the run.

If a second process targets the same directory while another is still running, it fails right away with an actionable error telling the user to give each run its own --coverage.reportsDirectory, which is exactly the fix you described, just surfaced by Vitest instead of an intermittent crash.

I like this approach, but need some time to think through it before commenting about the implementation. 👍

Maybe the lockfiles could be stored in .vitest/lockfiles instead of in tmpdir(). We might want similar solution for other reporters too, not sure yet.

@AriPerkkio AriPerkkio removed the maybe automated User is likely an AI agent, or the content was generated by an AI assistant without user control label May 28, 2026

@AriPerkkio AriPerkkio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. This will prevent two simultaneous Vitest runs from deleting each other's results. Some minor comments below.

Comment thread packages/vitest/src/node/coverage.ts Outdated
Comment thread packages/vitest/src/node/coverage.ts Outdated
Comment thread test/coverage-test/test/temporary-files.unit.test.ts Outdated
Comment thread packages/vitest/src/node/coverage.ts Outdated
Addressing review feedback on the reportsDirectory lock:

- Simplify acquisition from a 10-iteration loop to a single exclusive
  create plus one retry, which is only reached after reclaiming a stale
  lock. The bounded retry was the only reason to loop at all.
- Reclaim a stale lock with an atomic rename instead of an unconditional
  remove-then-recreate. Two processes racing to reclaim the same stale
  lock could previously both succeed and run coverage concurrently in the
  same directory; now only the process whose rename wins removes the file,
  the loser retries the exclusive create and fails fast.
- Release the lock only when this process can confirm it owns it, so a
  transient unreadable read can no longer delete another process's lock.
- Document the signal-0 liveness check and drop the practically
  unreachable secondary throw in favour of a single shared error helper.
Replace the module-level cleanups array and afterEach hook with
onTestFinished registrations inside the helpers that allocate the temp
reports directory, lock file, and child process. Cleanup now lives next
to the resource it tears down. Assertions unchanged.
@jgamaraalv jgamaraalv requested a review from AriPerkkio June 4, 2026 00:37
Comment thread test/coverage-test/test/temporary-files.unit.test.ts Outdated
AriPerkkio
AriPerkkio previously approved these changes Jun 10, 2026

@AriPerkkio AriPerkkio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @hi-ogawa & @sheremet-va any concerns?

Comment thread packages/vitest/src/node/coverage.ts Outdated
Comment thread packages/vitest/src/node/coverage.ts Outdated
Move the cross-process lock logic out of BaseCoverageProvider into a
standalone ReportsDirectoryLock class with acquire()/release(), and
drop the path-returning getter in favor of the lock owning its own
lockFile. Addresses review feedback that the lock implementation
cluttered the provider and the getter felt unjustified.
… locks

Address review findings on the cross-process reportsDirectory lock:

- Close a write-window TOCTOU: publish the lock atomically by writing the
  full payload to a temp file then fs.link()-ing it into place, so the
  lock file is never observable empty/partial and can't be stolen
  mid-write.
- Release the lock on process exit (best-effort, owner-checked) so a
  throwing report path, reportOnFailure, watch keepResults, or graceful
  close no longer leaks a stale lock into the OS temp directory.
- Make stale-lock reclaim race-safe: rename the contested lock to a
  private path, re-read that inode, and only discard it when it is still
  the stale owner observed; otherwise restore it (never destroying a
  displaced live owner's copy) and fail closed. Document the residual,
  inherent 3+-process advisory-lock window.
- Validate the full owner identity (pid + reportsDirectory + timestamp +
  per-acquire nonce) before honoring or releasing a lock, defeating PID
  reuse and hash-collision false matches.
- Treat only ESRCH as a dead process in the liveness check; let
  unexpected readOwner I/O errors (EACCES/EMFILE) propagate instead of
  silently stealing an unreadable lock.
- Tolerate EPERM/EBUSY/EACCES on the reclaim rename (Windows) and surface
  the lock timestamp in the actionable in-use error.
@jgamaraalv

jgamaraalv commented Jun 11, 2026

Copy link
Copy Markdown
Author

@hi-ogawa @AriPerkkio @sheremet-va

Last two review comments (resolved in e5199e2)

Two nits are addressed by e5199e2 (refactor(coverage): extract reportsDirectory lock into a dedicated class):

  • The reportsDirectoryLockFile getter that "didn't feel called for" is gone. The lock logic now lives in a dedicated ReportsDirectoryLock class that owns its own lockFile (computed in its constructor) and exposes acquire() / release().
  • The lock implementation no longer clutters BaseCoverageProvider — the provider just holds a ReportsDirectoryLock instance, which also resolves the "split this into a standalone class" suggestion.

About the follow-up commit 4d81d3a

Heads up that I pushed one more commit after the approval — fix(coverage): harden reportsDirectory lock against races and corrupt locks (4d81d3a). While revisiting the extracted lock I ran it through a deeper correctness/concurrency pass and tightened a few things on the lock primitive itself (not the public behaviour):

  • atomic lock publish (write temp + fs.link) so the lock file is never observable empty/partial and can't be stolen mid-write;
  • release the lock on process exit so a throwing report path / reportOnFailure / watch / graceful close no longer leaks a stale lock into the OS temp dir;
  • race-safe stale-lock reclaim (rename-aside + re-read + fail-closed, never destroying a displaced live owner's copy);
  • full owner-identity check (pid + reportsDirectory + timestamp + per-acquire nonce) to defeat PID reuse and hash-collision false matches;
  • only ESRCH counts as a dead process; unreadable-lock I/O errors propagate instead of silently stealing the lock;
  • tolerate EPERM/EBUSY/EACCES on the reclaim rename (Windows).

This sits on top of an already-reviewed/approved feature, so if you'd rather keep the PR minimal and scoped to just the requested fixes, I'm happy to drop 4d81d3a and leave only the review-comment resolution — just let me know your preference. Equally happy to keep it if the extra hardening is welcome.

@jgamaraalv

Copy link
Copy Markdown
Author

@AriPerkkio any news on this issue?

@AriPerkkio AriPerkkio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR started as small improvement but now the LLM behind this code is getting quite verbose. I'm starting to consider whether we should just discard this PR and let human rewrite it without LLMs wall-of-text comments and never-going-to-happen-edge-case -handlings. In the end we will be the ones maintaining this for years. 😬

Maybe this lockfile stuff and all possible edge cases should be extracted into its own npm package even.

Comment thread packages/vitest/src/node/coverage.ts Outdated
Comment thread packages/vitest/src/node/coverage.ts Outdated
Drop the speculative machinery that crept into the lock (atomic
write+link publish, nonce/timestamp owner identity, rename-aside stale
reclaim with restore, process-exit handler) in favour of a plain
create-with-wx lock that reclaims a dead owner's lock on the next run.

Same fail-fast behaviour for concurrent same-directory runs, far less to
maintain. A leaked lock from a hard-killed process is cleared by the next
run's dead-pid check, so the exit handler was redundant.
@jgamaraalv

Copy link
Copy Markdown
Author

Yeah, fair, and you're right. That last hardening commit got away from me.. I went down a rabbit hole guarding race windows that don't really happen (the 3-process SIGKILL reclaim especially) and ended up with comments longer than the code they were explaining.

Pushed 888edb5 that rips all of it back out. What's left is about as small as this lock gets:

  • write the lockfile with wx (create-or-fail), that's the actual mutual exclusion
  • already there + owner pid alive -> fail fast with the actionable error
  • already there + owner pid dead -> remove it, try once more
  • release in cleanAfterRun

No more nonce/timestamp identity, no write+link publish, no rename-aside/restore dance, no exit handler. coverage.ts is ~280 lines lighter and the tests dropped the cases that only existed to prove the machinery I just deleted.

Honestly I'd rather keep it this small than spin it into its own package, there isn't much left to extract now. If you still want a human to rewrite it from scratch I won't argue, but I think this version is one you'd actually be ok maintaining for years. Sorry for the noise on the way here.

@jgamaraalv jgamaraalv requested a review from AriPerkkio June 22, 2026 19:36
@jgamaraalv

Copy link
Copy Markdown
Author

Yeah, fair, and you're right. That last hardening commit got away from me.. I went down a rabbit hole guarding race windows that don't really happen (the 3-process SIGKILL reclaim especially) and ended up with comments longer than the code they were explaining.

Pushed 888edb5 that rips all of it back out. What's left is about as small as this lock gets:

  • write the lockfile with wx (create-or-fail), that's the actual mutual exclusion
  • already there + owner pid alive -> fail fast with the actionable error
  • already there + owner pid dead -> remove it, try once more
  • release in cleanAfterRun

No more nonce/timestamp identity, no write+link publish, no rename-aside/restore dance, no exit handler. coverage.ts is ~280 lines lighter and the tests dropped the cases that only existed to prove the machinery I just deleted.

Honestly I'd rather keep it this small than spin it into its own package, there isn't much left to extract now. If you still want a human to rewrite it from scratch I won't argue, but I think this version is one you'd actually be ok maintaining for years. Sorry for the noise on the way here.

A propósito, quero frisar que nenhuma alteração feita com o auxilio de LLM foi feita sem a minha revisão e testes de que as implementações realmente funcionavam. Os comentários extras foram adicionados depois de surgirem algumas dúvidas sobre algumas implementações, conforme respondi em um dos seus request changes e mais uma vez peço desculpas por isso.

Novamente, acho as correções desse PR 100% válidas, não descartaria, mas fica ao seu critério e a critério do time, que respeito muito o trabalho.

Encerro as minhas contribuições nesse PR por aqui.
Abraços! 🇧🇷

@hi-ogawa hi-ogawa removed their request for review June 23, 2026 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants