Skip to content

test: ensure temporary directories are cleaned up after tests#5163

Merged
chenyukang merged 2 commits into
nervosnetwork:developfrom
jetjinser:4177
Apr 22, 2026
Merged

test: ensure temporary directories are cleaned up after tests#5163
chenyukang merged 2 commits into
nervosnetwork:developfrom
jetjinser:4177

Conversation

@jetjinser
Copy link
Copy Markdown
Contributor

@jetjinser jetjinser commented Apr 7, 2026

What problem does this PR solve?

After running make test, temporary directories with the ckb-tmp- prefix remained in $TMPDIR because background tasks held Arc<Shared> references, preventing TempDir cleanup.
This is fixed by broadcasting the global exit signal or passing weak Arc.

For make integration, the old temp_path function would create a TempDir, immediately drop it, and return a plain PathBuf. Obviously the TempDir was gone right after that function returned, but the test code would go ahead and recreate the directory at that same path later—and nobody ever cleaned it up. So ckb-it-* dirs just piled up.
The fix is to stop throwing away the TempDir handle: the Worker now owns one root TempDir for the whole run, and everything else gets a subpath under it. When the root goes out of scope, the entire tree gets wiped automatically.

Issue Number: close #4177

What is changed and how it works?

Unit tests (make test): RpcTestSuite got a Drop impl that calls ckb_stop_handler::broadcast_exit_signals() so background services actually stop before Shared(TempDir) goes away.

Integration tests (make integration): instead of temp_path throwing away the TempDir, the Worker creates a root TempDir upfront and passes the path down to Node::new and Net::new. All test subdirs live under that root, so cleanup is automatic and works.

Related changes

This PR should not have any impact except test.

Fixes the issue where hundreds of `ckb-tmp-*` directories accumulate in `TMPDIR`
after running `make test` or `make integration`.

The `Arc<SledBackend>` held by spawned tasks prevented the `TempDir` from being
dropped. Lets tasks exit and release their refs.
@jetjinser jetjinser requested a review from a team as a code owner April 7, 2026 23:45
@jetjinser jetjinser requested review from zhangsoledad and removed request for a team April 7, 2026 23:45
@jetjinser jetjinser marked this pull request as draft April 8, 2026 00:41
@jetjinser jetjinser marked this pull request as ready for review April 9, 2026 03:34
@quake quake requested a review from Copilot April 10, 2026 02:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses lingering temporary directories created during unit (make test) and integration (make integration) test runs by changing how test working directories are owned and by ensuring certain background tasks no longer keep strong references that prevent cleanup.

Changes:

  • Integration tests: introduce a root TempDir owned by the test Worker, and derive node/net working directories from it instead of generating “throwaway” temp paths.
  • Unit tests: adjust background behavior to allow tempdir cleanup (HeaderMap uses Weak, RPC tests trigger stop signaling on teardown).
  • Refactor test APIs: update Spec::before_run, Node::new, and Net::new signatures to accept a root directory.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/src/worker.rs Creates and retains a root TempDir; passes per-spec subpaths into test setup.
test/src/utils.rs Removes temp_path helper that previously discarded TempDir handles.
test/src/specs/mod.rs Updates Spec::before_run to accept a root_dir for per-spec setup.
test/src/specs/** Updates specs to pass root_dir into Node::new and Net::new.
test/src/node.rs Makes node working dirs deterministic under a provided root; stores root_dir on Node.
test/src/net.rs Makes net working dir deterministic under a provided root; updates constructor signature.
test/src/lib.rs Minor import adjustment.
shared/src/types/header_map/mod.rs Prevents background task from keeping HeaderMap alive by using Weak.
rpc/src/tests/mod.rs Adds Drop for RpcTestSuite that broadcasts stop signals on teardown.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/src/worker.rs Outdated
Comment on lines +169 to +182
@@ -162,7 +177,9 @@ impl Worker {
})
.unwrap();

let mut nodes = spec.before_run();
let nodes_root = spec_dir.join("nodes");
let mut nodes = spec.before_run(nodes_root);

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_spec builds spec_dir solely from spec.name(), and then all node/net working dirs become deterministic under that path. This changes previous behavior where temp_path generated a fresh random directory each time; with retries (or multiple nets/nodes in a spec), stale data can be reused and different specs with the same type-name suffix could collide. Consider making the per-spec root unique per attempt (e.g., include retried or a random suffix / TempDir per spec run) and/or removing any existing spec_dir before creating nodes.

Copilot uses AI. Check for mistakes.
Comment thread rpc/src/tests/mod.rs
Comment on lines +228 to +233
// Shut down background tasks to release `Arc<Shared>` and allow temp dir cleanup.
// It's should be safe in this test suite.
impl Drop for RpcTestSuite {
fn drop(&mut self) {
ckb_stop_handler::broadcast_exit_signals();
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling ckb_stop_handler::broadcast_exit_signals() in RpcTestSuite's Drop will cancel the global stop token for the entire test process and cannot be reset. Since many RPC tests create and drop their own RpcTestSuite, this risks prematurely stopping services in other tests (including tests that run after the first one), leading to flakiness or failures. Prefer an explicit, suite-local shutdown mechanism (e.g., stop/join the services started in setup_rpc_test_suite) or ensure the broadcast is performed once at the very end of the whole test binary rather than on every suite drop.

Suggested change
// Shut down background tasks to release `Arc<Shared>` and allow temp dir cleanup.
// It's should be safe in this test suite.
impl Drop for RpcTestSuite {
fn drop(&mut self) {
ckb_stop_handler::broadcast_exit_signals();
}
// Do not broadcast global exit signals from a suite-local `Drop`.
// `ckb_stop_handler::broadcast_exit_signals()` cancels a process-global stop token
// and cannot be reset, so calling it here would interfere with other tests that
// create and drop their own `RpcTestSuite`.
impl Drop for RpcTestSuite {
fn drop(&mut self) {}

Copilot uses AI. Check for mistakes.
Comment thread rpc/src/tests/mod.rs
@quake
Copy link
Copy Markdown
Member

quake commented Apr 10, 2026

thanks for the PR, good catch!

I have a small suggestion:

Net::new doesn't need a root_dir parameter. Its working_dir is only used internally for NetworkConfig::path (secret key, peer store, etc.), and Net::working_dir() is never called externally.

Instead of threading root_dir through Spec::before_run - Node - Net (which adds a root_dir field/method to Node and touches ~30 call sites), just let Net own a TempDir directly:

pub struct Net {
    // ...
    _temp_dir: TempDir, // auto-cleanup on drop
}
impl Net {
    // signature unchanged — no root_dir needed
    pub fn new(spec_name: &str, consensus: &Consensus, protocols: Vec<SupportProtocols>) -> Self {
        let temp_dir = tempfile::Builder::new()
            .prefix(&format!("ckb-it-{}-net-", spec_name))
            .tempdir()
            .expect("create tempdir failed");
        let working_dir = temp_dir.path().to_path_buf();
        // ...
    }
}

This eliminates the root_dir field and root_dir() method on Node, keeps Net::new's original signature, and avoids changing all ~30 call sites.

@jetjinser
Copy link
Copy Markdown
Contributor Author

jetjinser commented Apr 10, 2026

Instead of threading root_dir through Spec::before_run - Node - Net (which adds a root_dir field/method to Node and touches ~30 call sites), just let Net own a TempDir directly:

pub struct Net {
    // ...
    _temp_dir: TempDir, // auto-cleanup on drop
}
impl Net {
    // signature unchanged — no root_dir needed
    pub fn new(spec_name: &str, consensus: &Consensus, protocols: Vec<SupportProtocols>) -> Self {
        let temp_dir = tempfile::Builder::new()
            .prefix(&format!("ckb-it-{}-net-", spec_name))
            .tempdir()
            .expect("create tempdir failed");
        let working_dir = temp_dir.path().to_path_buf();
        // ...
    }
}

This eliminates the root_dir field and root_dir() method on Node, keeps Net::new's original signature, and avoids changing all ~30 call sites.

Thanks for the review and advice!

However, we cannot simply bind the lifetime of TempDir to Net. The NetworkStat owns a PathBuf from the TempDir, which DumpPeerStoreService uses to keep active file handles open. Since DumpPeerStoreService is dropped after Net, these unreleased handles prevent the TempDir from cleaning up the directory properly (i.e., it fails silently).

Ideally, this would be the best approach for cleanup—in fact, it’s exactly what I attempted before implementing the current solution in this PR.

Please let me know if I’ve overlooked any details or if you have further suggestions!

@quake
Copy link
Copy Markdown
Member

quake commented Apr 10, 2026

However, we cannot simply bind the lifetime of TempDir to Net. The NetworkStat owns a PathBuf from the TempDir, which DumpPeerStoreService uses to keep active file handles open. Since DumpPeerStoreService is dropped after Net, these unreleased handles prevent the TempDir from cleaning up the directory properly (i.e., it fails silently).

Ideally, this would be the best approach for cleanup—in fact, it’s exactly what I attempted before implementing the current solution in this PR.

Please let me know if I’ve overlooked any details or if you have further suggestions!

_async_runtime is declared before _working_dir, so the runtime drops first. This shuts down all spawned tasks including DumpPeerStoreService, which runs its Drop impl (writing peer store to disk) while the directory still exists. Only after that does _working_dir: TempDir drop and clean up the directory.

even if there's a async timing issue, it probably doesn't matter here. DumpPeerStoreService::drop() writes peer store data, which is irrelevant in a test context, we're about to delete the entire directory anyway. A failed write just produces a warn-level log.

Have you tried the simpler approach locally, letting Net own its own TempDir internally (keeping the current Net::new signature unchanged)?. If you haven't tried it, I can give it a shot on my machine to verify it works.

@jetjinser
Copy link
Copy Markdown
Contributor Author

jetjinser commented Apr 10, 2026

_async_runtime is declared before _working_dir, so the runtime drops first. This shuts down all spawned tasks including DumpPeerStoreService, which runs its Drop impl (writing peer store to disk) while the directory still exists. Only after that does _working_dir: TempDir drop and clean up the directory.

Yes! I completely forgot about that.

Based on your suggestion, I pushed a new commit.

Comment thread test/src/worker.rs
Comment on lines +169 to +172
// Used to extend the lifecycle of TempPathBuf
let node_paths = nodes
.iter()
.map(|node| node.working_dir())
.map(|node| node.owned_working_dir())
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the reason for the changes to the Notify struct: node_paths is now passed along with Notify to ensure that the TempDir within the Node isn't dropped prematurely when the current function ends. This prevents potential errors in subsequent operations.

Comment thread test/src/net.rs
Comment on lines +26 to +28
// _async_runtime MUST be declared before _working_dir
// to ensure the runtime stops and releases file handles
// before the TempDir attempts to clean up.
Copy link
Copy Markdown
Collaborator

@eval-exec eval-exec Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can combine Runtime and TempPathBuf to one struct? Cc @quake

@chenyukang chenyukang added this pull request to the merge queue Apr 21, 2026
@chenyukang chenyukang removed this pull request from the merge queue due to a manual request Apr 21, 2026
@chenyukang chenyukang added this pull request to the merge queue Apr 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 21, 2026
@chenyukang chenyukang merged commit 3416c8b into nervosnetwork:develop Apr 22, 2026
56 of 57 checks passed
@nervos-bot nervos-bot Bot mentioned this pull request Apr 22, 2026
eval-exec pushed a commit to eval-exec/ckb that referenced this pull request Apr 28, 2026
## 🤖 New release

* `ckb-constant`: 1.1.0 -> 1.1.1
* `ckb-types`: 1.1.0 -> 1.1.1
* `ckb-resource`: 1.1.0 -> 1.1.1
* `ckb-app-config`: 1.2.0 -> 1.2.1
* `ckb-network`: 1.2.0 -> 1.2.1
* `ckb-tx-pool`: 1.2.0 -> 1.2.1
* `ckb-shared`: 1.1.0 -> 1.1.1
* `ckb-chain`: 1.2.0 -> 1.2.1
* `ckb-rich-indexer`: 1.1.0 -> 1.1.1
* `ckb-rpc`: 1.2.1 -> 1.2.2

<details><summary><i><b>Changelog</b></i></summary><p>

## `ckb-constant`

<blockquote>

##
[1.1.1](nervosnetwork/ckb@ckb-constant-v1.1.0...ckb-constant-v1.1.1)
- 2026-04-24

### Changed

- update assume valid target (by @doitian)

### Contributors

- @doitian
</blockquote>

## `ckb-types`

<blockquote>

##
[1.1.1](nervosnetwork/ckb@ckb-types-v1.1.0...ckb-types-v1.1.1)
- 2026-04-24

### Fixed

- overhaul proposal selection and prioritization logic (nervosnetwork#5023) (by
@zhangsoledad)

### Contributors

- @zhangsoledad
</blockquote>

## `ckb-resource`

<blockquote>

##
[1.1.1](nervosnetwork/ckb@ckb-resource-v1.1.0...ckb-resource-v1.1.1)
- 2026-04-24

### Changed

- update testnet bootnodes (nervosnetwork#5176) (by @jiangxianliang007)
- sort WalkDir entries in resource bundling (nervosnetwork#5156) (by @eval-exec)

### Contributors

- @jiangxianliang007
- @eval-exec
</blockquote>

## `ckb-app-config`

<blockquote>

##
[1.2.1](nervosnetwork/ckb@ckb-app-config-v1.2.0...ckb-app-config-v1.2.1)
- 2026-04-24

### Fixed

- overhaul proposal selection and prioritization logic (nervosnetwork#5023) (by
@zhangsoledad)

### Contributors

- @zhangsoledad
</blockquote>

## `ckb-network`

<blockquote>

##
[1.2.1](nervosnetwork/ckb@ckb-network-v1.2.0...ckb-network-v1.2.1)
- 2026-04-24

### Changed

- upgrade deps (nervosnetwork#5132) (by @driftluo)

### Contributors

- @driftluo
</blockquote>

## `ckb-tx-pool`

<blockquote>

##
[1.2.1](nervosnetwork/ckb@ckb-tx-pool-v1.2.0...ckb-tx-pool-v1.2.1)
- 2026-04-24

### Fixed

- overhaul proposal selection and prioritization logic (nervosnetwork#5023) (by
@zhangsoledad)

### Contributors

- @zhangsoledad
</blockquote>

## `ckb-shared`

<blockquote>

##
[1.1.1](nervosnetwork/ckb@ckb-shared-v1.1.0...ckb-shared-v1.1.1)
- 2026-04-24

### Fixed

- ensure temporary directories are cleaned up after tests (nervosnetwork#5163) (by
@jetjinser)

### Contributors

- @jetjinser
</blockquote>

## `ckb-chain`

<blockquote>

##
[1.2.1](nervosnetwork/ckb@ckb-chain-v1.2.0...ckb-chain-v1.2.1)
- 2026-04-24

### Fixed

- overhaul proposal selection and prioritization logic (nervosnetwork#5023) (by
@zhangsoledad)

### Contributors

- @zhangsoledad
</blockquote>

## `ckb-rich-indexer`

<blockquote>

##
[1.1.1](nervosnetwork/ckb@ckb-rich-indexer-v1.1.0...ckb-rich-indexer-v1.1.1)
- 2026-04-24

### Fixed

- preserve leading zero bytes and handle all-0xFF overflow in
rich-indexer prefix search upper bound (nervosnetwork#5166) (by @Copilot)

### Contributors

- @Copilot
</blockquote>

## `ckb-rpc`

<blockquote>

##
[1.2.2](nervosnetwork/ckb@ckb-rpc-v1.2.1...ckb-rpc-v1.2.2)
- 2026-04-24

### Changed

- *(rpc)* correct PRC typo in docs and comments (nervosnetwork#5130) (by @eval-exec)

### Fixed

- ensure temporary directories are cleaned up after tests (nervosnetwork#5163) (by
@jetjinser)
- generate GitHub-compatible RPC doc anchors (nervosnetwork#5136) (by @eval-exec)
- overhaul proposal selection and prioritization logic (nervosnetwork#5023) (by
@zhangsoledad)

### Contributors

- @jetjinser
- @eval-exec
- @zhangsoledad
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: nervos-bot[bot] <46313439+nervos-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unit tests and Integration tests do not clean up temporary files

5 participants