Version
tokio 1.21.1, with grep results prettified:
- tokio v1.21.1
- tokio-util v0.7.4
- tokio-stream v0.1.11
- tokio-macros v1.8.0 (proc-macro)
- tokio-io-timeout v1.2.0
- tokio-rustls v0.23.4
- tokio-postgres v0.7.6 (https://github.com/neondatabase/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b)
- tokio-native-tls v0.3.0
- hyper v0.14.20
We use Rust 1.62.1.
Platform
Initially detected from a CI run on:
- amazon ec2
Linux hostname 5.10.144-127.601.amzn2.x86_64 #1 SMP Thu Sep 29 01:11:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Reproduced on:
- ubuntu 22.04
Linux hostname 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- AMD Ryzen 9 3900XT 12-Core Processor
Description
Assertion failure message with release build:
thread 'mgmt request worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:263:21
In our codebase I can reproduce this locally under load after a few tries. Load here means, for example, while true; do cargo clean && cargo build; done running in the background. I can try out some patches if needed.
I haven't been able to find an MVCE.
Full steps to reproduce in our codebase
# install all dependencies from repository README.md:
# https://github.com/koivunej/neon/tree/tokio_assertion_failure#running-local-installation
git clone --recursive --branch tokio_assertion_failure https://github.com/koivunej/neon.git
# release build is needed to reproduce
BUILD_TYPE=release CARGO_BUILD_FLAGS="--features=testing,profiling" make -s -j4
# install more dependencies
PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring ./scripts/pysync
# add external load in another terminal to this
while true; do NEON_BIN=target/release ./scripts/pytest test_runner/regress/test_gc_aggressive.py::test_gc_aggressive; done
Expect to see:
FAILED test_runner/regress/test_gc_aggressive.py::test_gc_aggressive - requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Then you will find the assertion failure in test_output/test_gc_aggressive/repo/pageserver.log. I have also copied the full stack trace below.
RUST_BACKTRACE=full of the assertion failure
thread 'mgmt request worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:263:21
stack backtrace:
0: 0x56374720a37d - std::backtrace_rs::backtrace::libunwind::trace::h8e036432725b1c57
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x56374720a37d - std::backtrace_rs::backtrace::trace_unsynchronized::h4f83092254c85869
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x56374720a37d - std::sys_common::backtrace::_print_fmt::h9728b5e056a3ece3
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:66:5
3: 0x56374720a37d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h48bb4bd2928827d2
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:45:22
4: 0x563747232e9c - core::fmt::write::h909e69a2c24f44cc
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/fmt/mod.rs:1196:17
5: 0x563747202061 - std::io::Write::write_fmt::h7f4b8ab8af89e9ef
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/io/mod.rs:1654:15
6: 0x56374720bcf5 - std::sys_common::backtrace::_print::hff4838ebf14a2171
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:48:5
7: 0x56374720bcf5 - std::sys_common::backtrace::print::h2499280374189ad9
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:35:9
8: 0x56374720bcf5 - std::panicking::default_hook::{{closure}}::h8b270fc55eeb284e
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:295:22
9: 0x56374720b969 - std::panicking::default_hook::h3217e229d6e9d13c
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:314:9
10: 0x56374720c3d8 - std::panicking::rust_panic_with_hook::h9acb8048b738d2e0
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:698:17
11: 0x56374720c249 - std::panicking::begin_panic_handler::{{closure}}::h70f3b839526af6dc
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:586:13
12: 0x56374720a834 - std::sys_common::backtrace::__rust_end_short_backtrace::h1ecf2cee857fbe0a
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:138:18
13: 0x56374720bfb9 - rust_begin_unwind
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
14: 0x563747230b63 - core::panicking::panic_fmt::h9f8393e7fd56d655
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
15: 0x5637472309ad - core::panicking::panic::h021666fc6a0f7b6b
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:48:5
16: 0x56374710c22b - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::{{closure}}::hd65847a1090ca025
17: 0x5637471062c5 - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::h42ae149038909fb7
18: 0x56374697512e - core::ptr::drop_in_place<tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset>::h1e6f731fa79d34ba
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ptr/mod.rs:486:1
19: 0x56374697512e - tokio::runtime::scheduler::multi_thread::worker::block_in_place::hda495eb5ef5a1acd
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:340:5
20: 0x563746b08340 - tokio::task::blocking::block_in_place::ha97b73b75ce70862
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/task/blocking.rs:77:9
21: 0x563746b08340 - pageserver::tenant::Tenant::gc_iteration::{{closure}}::hcc45b24d96148799
at /home/joonas/src/neon/neon/pageserver/src/tenant.rs:530:9
22: 0x563746b08340 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h308288025478c0c0
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
23: 0x56374687c68c - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::hda287b8f128780d0
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
24: 0x563746b0fcfd - pageserver::http::routes::timeline_gc_handler::{{closure}}::h0e56b6cccdfe75f6
at /home/joonas/src/neon/neon/pageserver/src/http/routes.rs:849:91
25: 0x563746b0fcfd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h4dee783785ea8184
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
26: 0x56374678f3b7 - <core::pin::Pin<P> as core::future::future::Future>::poll::h5dbc8583f5dbf765
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
27: 0x56374678f3b7 - routerify::route::Route<B,E>::process::{{closure}}::h7fffd52673600116
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/routerify-3.0.0/src/route/mod.rs:105:32
28: 0x56374678f3b7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::ha39752ecfad407be
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
29: 0x56374678f3b7 - routerify::router::Router<B,E>::process::{{closure}}::hc3d490240cd467ff
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/routerify-3.0.0/src/router/mod.rs:308:89
30: 0x56374678f3b7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h88afc17f6a7162c2
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
31: 0x56374678f3b7 - <routerify::service::request_service::RequestService<B,E> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}}::hf419aede28588ee7
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/routerify-3.0.0/src/service/request_service.rs:56:72
32: 0x5637467b93a5 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h2de32919bd847725
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
33: 0x5637467d596e - <core::pin::Pin<P> as core::future::future::Future>::poll::h3faa950168332df5
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
34: 0x5637467d596e - <hyper::proto::h1::dispatch::Server<S,hyper::body::body::Body> as hyper::proto::h1::dispatch::Dispatch>::poll_msg::hd5117f65306c4294
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:491:35
35: 0x5637467d596e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_write::hc55c2ea65eaff573
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:297:43
36: 0x5637467d596e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop::h214e07f7181a2707
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:161:21
37: 0x5637467d30fd - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_inner::h2b3d24b8f8211935
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:137:16
38: 0x5637467d30fd - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch::hfead020b3bd85cd6
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:120:28
39: 0x5637466f9f52 - <hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T> as core::future::future::Future>::poll::hb9d39bd98e716b09
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:424:9
40: 0x5637466f9f52 - <hyper::server::conn::ProtoServer<T,B,S,E> as core::future::future::Future>::poll::h7665d21f4b883402
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:952:47
41: 0x5637466f9f52 - <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll::hb96f5473d0574cb8
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:1012:30
42: 0x56374671e6bf - <hyper::common::drain::Watching<F,FN> as core::future::future::Future>::poll::hf0c8ec2a7a8ed8b0
43: 0x56374671e6bf - <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll::h846866b9a0929fda
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/hyper-0.14.20/src/server/server.rs:728:36
44: 0x5637467f65e7 - tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}::h9a58eefb1d854ebe
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/core.rs:184:17
45: 0x5637467f65e7 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::hbd0e5f206f1f3f6f
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/loom/std/unsafe_cell.rs:14:9
46: 0x5637467f65e7 - tokio::runtime::task::core::CoreStage<T>::poll::hbee48de80c4fcccd
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/core.rs:174:13
47: 0x56374685c61a - tokio::runtime::task::harness::poll_future::{{closure}}::h7ca64421cdeddcb2
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:480:19
48: 0x56374685c61a - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h2de3a15ff26ba160
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panic/unwind_safe.rs:271:9
49: 0x56374685c61a - std::panicking::try::do_call::hf6d7a880e62abda6
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
50: 0x56374685c61a - std::panicking::try::h531c1d3ec5cbe2b2
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
51: 0x56374685c61a - std::panic::catch_unwind::h4f0af80b22a9de64
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
52: 0x56374685c61a - tokio::runtime::task::harness::poll_future::h57ec7dda84531f03
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:468:18
53: 0x56374685c61a - tokio::runtime::task::harness::Harness<T,S>::poll_inner::heca3dd74238bdd7e
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:104:27
54: 0x56374685c61a - tokio::runtime::task::harness::Harness<T,S>::poll::he0f319957dba656d
at /home/joonas/.cargo/registry/src/github.zerozr99.workers.dev-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:57:15
55: 0x5637470e35c5 - std::thread::local::LocalKey<T>::with::h38aaa913b8a48d65
56: 0x563747107563 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::hf28064e32e379826
57: 0x563747106a80 - tokio::runtime::scheduler::multi_thread::worker::Context::run::hec211607b213b37b
58: 0x56374710a7b7 - tokio::macros::scoped_tls::ScopedKey<T>::set::hd7166d6799738ff0
59: 0x5637471064a9 - tokio::runtime::scheduler::multi_thread::worker::run::h958f4678849dd1fe
60: 0x5637470f575c - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h0ab71826e7387519
61: 0x5637470da7e9 - tokio::runtime::task::harness::Harness<T,S>::poll::h091e55b483c30575
62: 0x5637470f4f5a - tokio::runtime::blocking::pool::Inner::run::h3a91a3d2536a1c92
63: 0x5637470e6df2 - std::sys_common::backtrace::__rust_begin_short_backtrace::h6a13e50bb80c5a9b
64: 0x5637470e751f - core::ops::function::FnOnce::call_once{{vtable.shim}}::h81568063c1016e71
65: 0x563747212053 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h191d5c5ea3edb31d
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
66: 0x563747212053 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h42ef7cb2ae640a31
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
67: 0x563747212053 - std::sys::unix::thread::Thread::new::thread_start::he47f7169665dab60
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys/unix/thread.rs:108:17
68: 0x7f3b56aacb43 - start_thread
at ./nptl/./nptl/pthread_create.c:442:8
69: 0x7f3b56b3ea00 - clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
70: 0x0 - <unknown>
In this branch, I've sprinkled a lot of block_in_place calls around the blocking parts after I ran into a deadlock caused by the unstealable LIFO slot, which happened because there was blocking within the runtime threads. It is unlikely that I've caught every place that blocks within an async context.
If I ended up misusing block_in_place and block_on, I wish the assertion had a clear message about the misuse. However, since it only triggers under external load (and while being nice -20), I suspect it is a real tokio issue.
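To make the failing invariant easier to discuss, here is a minimal std-only model of what the panic message seems to say, based only on the names visible in the backtrace (a Reset guard created by block_in_place whose Drop impl asserts cx_core.is_none()). This is a sketch under assumptions, not tokio's implementation: Core, CX_CORE, block_in_place_model and refilled_slot_panics are hypothetical names, and the real scheduler hands the core to another thread instead of parking it in the guard.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the scheduler state owned by a worker thread.
#[derive(Debug, PartialEq)]
struct Core {
    id: u32,
}

thread_local! {
    /// Hypothetical per-thread slot holding the worker's core, if it has one.
    static CX_CORE: RefCell<Option<Core>> = RefCell::new(None);
}

/// Guard that puts the core back into the slot when the blocking section ends.
struct Reset {
    parked: Option<Core>,
}

impl Drop for Reset {
    fn drop(&mut self) {
        CX_CORE.with(|slot| {
            let mut cx_core = slot.borrow_mut();
            // The modeled invariant: the slot must still be empty when the
            // blocking section finishes, otherwise a core would be overwritten.
            assert!(cx_core.is_none(), "core slot was refilled while blocking");
            *cx_core = self.parked.take();
        });
    }
}

/// Toy model of block_in_place: give up the core, run `f`, then restore it.
fn block_in_place_model<R>(f: impl FnOnce() -> R) -> R {
    let parked = CX_CORE.with(|slot| slot.borrow_mut().take());
    let _reset = Reset { parked };
    f()
}

/// Returns true if refilling the slot during the blocking section panics.
fn refilled_slot_panics() -> bool {
    std::panic::catch_unwind(|| {
        block_in_place_model(|| {
            // Simulates some other code path installing a core in the meantime.
            CX_CORE.with(|slot| *slot.borrow_mut() = Some(Core { id: 2 }));
        })
    })
    .is_err()
}

fn main() {
    CX_CORE.with(|slot| *slot.borrow_mut() = Some(Core { id: 1 }));

    // Well-behaved case: the slot stays empty, so the guard restores the core.
    let n = block_in_place_model(|| 2 + 2);
    println!("blocking section returned {n}");

    // Violating case: the guard's assertion fires when it is dropped.
    println!("violation panicked: {}", refilled_slot_panics());
}
```

In this model the assertion can only fire if something installs a core in the slot while the blocking closure is still running. Whether the real failure comes from misuse on my side or from a race inside tokio is exactly the open question here.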