fix(build,ci): pyo3-build-config abi3-py311 + PYO3_NO_PYTHON env mirror across CircleCI (unbreaks GHA nightly + CircleCI cluster)#1881
Merged
Unbreaks the nightly cross-check matrix (all 8 targets failing since 2026-05-13) and any local build with `PYO3_NO_PYTHON=1` set.

Root cause
==========

PR #1833 (db8dbf6, merged 2026-05-13) added the workspace `pyo3-build-config` dependency to support the new zero-copy `send_output_raw` infrastructure:

    pyo3-build-config = { version = "0.28", features = ["resolve-config"] }

The feature list omitted `abi3-py3X`. pyo3-build-config 0.28's own build script asserts at compile time:

- If a Python interpreter is available: probe it, require interpreter version >= the workspace's abi3 minimum (3.11).
- If `PYO3_NO_PYTHON=1` is set (the nightly cross-check env from #1869, which can't always provision a 3.11+ interpreter inside cross-rs Docker images): require an `abi3-py3X` feature flag on pyo3-build-config itself so it knows which ABI to bake in.

Neither branch was satisfied: the env said "don't probe an interpreter" but the feature list said "I haven't told you what ABI to target." Result:

    error: failed to run custom build command for `pyo3-build-config v0.28.3`
    error: An abi3-py3* feature must be specified when compiling without a Python interpreter.

This fires once per native target (matrix.cross=false) and once per cross-rs container (matrix.cross=true), so all 8 cross-check jobs in the nightly matrix have been red continuously since 2026-05-13.

The other pyo3 consumers in the workspace (`apis/python/operator`, `libraries/extensions/ros2-bridge/python`, `binaries/runtime`, `binaries/cli` partially) already pin `abi3-py311` on their pyo3 dependency. The workspace `pyo3-build-config` dep just missed the same setting.

Fix
===

Add `"abi3-py311"` to the workspace `pyo3-build-config` feature list in `Cargo.toml`. One-line change. Matches the abi3 minimum already used by every other pyo3 consumer in the workspace.

Verification
============

Local reproduction + fix confirmation on macos-aarch64:

    # without fix:
    $ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
    error: failed to run custom build command for `pyo3-build-config v0.28.3`
    error: An abi3-py3* feature must be specified when compiling without a Python interpreter.

    # with fix:
    $ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
    Finished `dev` profile in 29.56s ✓

Full workspace check + fmt + clippy with PYO3_NO_PYTHON=1:

    cargo fmt --all -- --check ✓
    cargo check --all --exclude dora-{node-api,operator-api,ros2-bridge,cli-api}-python ✓
    cargo clippy --all --exclude dora-{node-api,operator-api,ros2-bridge}-python -- -D warnings ✓

Unblocks
========

- All 8 cross-check nightly targets (failing since 2026-05-13)
- Local builds in environments where Python 3.11+ isn't available on the build host (some CI images, sandboxed builds)

Does NOT address
================

The CLI Tests (macos-latest) nightly job (also failing today with a node-sender SIGTERM, exit code 143) is unrelated to this regression. Filed separately for investigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
heyong4725 added a commit that referenced this pull request on May 20, 2026
…idge

Closes the two non-cross-check nightly failures still red after #1881:

1. `CLI Tests (macos-latest)`: `python-dataflow` sender exits 143 (#1882).
2. `ROS2 Bridge Examples`: `typed::tests::test_python_array_code` fails with `ModuleNotFoundError: No module named 'numpy'`.

(1) Make python-dataflow sender well-behaved
============================================

`examples/python-dataflow/sender.py` ran a fixed `range(100) + sleep(0.1)` loop with no event polling, for a total runtime of ≈ 10 s, sitting right at the `--stop-after 10s` edge. In yesterday's failing nightly the sender was at message 90 of 100 when the daemon's stop message landed. On Linux the sender narrowly finishes first. On macOS the runner is just slow enough that the daemon's stop arrives first, the sender (deep in a `send_output` + `sleep` pair) never observes it, the daemon's grace window elapses, and the sender gets SIGTERMed. dora correctly reports exit code 143 as a node failure (per @phil-opp's design contract in #1882: SIGTERM signals cleanup may not have happened, which is information the user wants surfaced).

The fix is to drain pending events between sends so the soft-stop message lands and the sender exits via `return` before the grace window elapses:

    for i in range(100):
        for event in node.drain():
            if event["type"] == "STOP":
                return
        node.send_output("message", pa.array([i]))
        time.sleep(0.1)

`node.drain()` is dora's documented "give me all available events without blocking" API (`apis/python/node/src/lib.rs::drain`). The other senders in the tree (`examples/dynamic-add-remove/sender.py`, `examples/python-async/send_data.py`) already follow the canonical `for event in node:` pattern; this brings python-dataflow into the same shape.
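The control flow of that loop can be exercised without a running dataflow. The sketch below is illustrative only: `FakeNode`, `stop_after_sends`, and `run_sender` are hypothetical names, and the stand-in mimics only the two calls quoted above (`drain` and `send_output`), not dora's real behaviour. The `time.sleep(0.1)` call and the `pa.array` wrapper are left out so it runs instantly with the standard library.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class FakeNode:
    """Hypothetical stand-in for a dora node; it is not the real API."""

    def __init__(self, stop_after_sends: int) -> None:
        self.sent: List[Any] = []
        self._pending: Deque[Dict[str, str]] = deque()
        self._stop_after_sends = stop_after_sends

    def drain(self) -> List[Dict[str, str]]:
        # Return every queued event without blocking, then clear the queue.
        events = list(self._pending)
        self._pending.clear()
        return events

    def send_output(self, output_id: str, data: Any) -> None:
        self.sent.append(data)
        if len(self.sent) == self._stop_after_sends:
            # Simulate the daemon's soft-stop arriving mid-run.
            self._pending.append({"type": "STOP"})


def run_sender(node: FakeNode, total: int = 100) -> int:
    """Send up to `total` messages, returning early once STOP is observed."""
    for i in range(total):
        for event in node.drain():
            if event["type"] == "STOP":
                return i  # number of messages sent before stopping
        node.send_output("message", [i])
    return total


node = FakeNode(stop_after_sends=90)
print(run_sender(node))
```

With the simulated stop queued after the 90th send, the loop sees it on the next drain and returns instead of sending the remaining ten messages.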
(2) Install numpy in ROS2 bridge nightly job
============================================

The Rust test `typed::tests::test_python_array_code` at `libraries/extensions/ros2-bridge/python/src/typed/mod.rs:58` loads the embedded Python fixture at `libraries/extensions/ros2-bridge/python/test_utils.py`, which does `import numpy as np` at line 3. The `ros2-bridge` nightly job at `.github/workflows/nightly.yml` runs `pip install pyarrow` before `cargo test -p dora-ros2-bridge-python` but never installs numpy. Result: `ModuleNotFoundError`.

One-line fix:

    pip install pyarrow numpy

The two fixes are independent but bundled because both are part of the same "drive nightly to green after #1881 unblocked cross-check" sweep.

Closes #1882.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
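A missing fixture dependency like the numpy one can be caught before `cargo test` runs. A minimal sketch, assuming a preflight step is acceptable in the job; `missing_modules` is a hypothetical helper and not part of the repository:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The fixture imports numpy, and the job already installs pyarrow.
missing = missing_modules(["pyarrow", "numpy"])
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all fixture dependencies are importable")
```

The check only looks the modules up and does not import them, so it stays cheap even for large packages.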
heyong4725 added a commit that referenced this pull request on May 20, 2026
Closes the ROS2 Bridge Examples failure that remained red after #1881 unblocked the cross-check matrix. The other failing nightly job (`CLI Tests (macos-latest)` python-dataflow SIGTERM) is NOT addressed here; see #1882 for the deeper investigation now needed.

The ROS2 fix
============

`typed::tests::test_python_array_code` at `libraries/extensions/ros2-bridge/python/src/typed/mod.rs:58` loads the embedded Python fixture at `libraries/extensions/ros2-bridge/python/test_utils.py:3`, which does `import numpy as np`. The `ros2-bridge` nightly job at `.github/workflows/nightly.yml` runs `pip install pyarrow` before `cargo test -p dora-ros2-bridge-python` but never installs numpy. Result: `ModuleNotFoundError: No module named 'numpy'`.

One-line fix:

    - run: pip install pyarrow
    + run: pip install pyarrow numpy

Why the SIGTERM issue was reverted from this PR
================================================

The initial commit on this branch added a `node.drain()` STOP-poll to `examples/python-dataflow/sender.py`. Code review surfaced that the change did not actually make the nightly test pass: re-running the exact CI invocation (`dora run examples/python-dataflow/dataflow.yml --uv --stop-after 10s`) still produced ExitCode(143) for all three nodes.

Subsequent local investigation on macos-aarch64 confirmed the SIGTERM bug is deeper than the sender pattern:

* `receiver.py` and `transformer.py` already use the canonical `for event in node:` loop with explicit `break` on STOP. Both still report ExitCode(143).
* A trivial sender (10 messages × 100ms = ~1s natural runtime) with `--stop-after 5s` still produces ExitCode(143) on all three nodes 10s after the soft-stop is sent.
* The dora daemon code at `running_dataflow.rs:361-367` does send `NodeEvent::Stop` through each node's subscribe channel before the 10-second SIGTERM grace window.

But adding debug prints to `receiver.py` shows nothing reaches stdout between "starting" and the eventual SIGTERM-triggered flush, suggesting either:

- `node = Node()` is blocking longer than expected on macOS, or
- The daemon's stdout-capture buffers prints until process exit, masking what the receiver was actually doing.

Either way, the fix is on the dora-daemon side and out of scope for a one-line example tweak. #1882 stays open and a follow-up post on that thread documents the investigation trail.

This PR is now scoped narrowly to the ROS2 numpy fix, which IS verifiable: the test fails today with `ModuleNotFoundError`, and the one-line `pip install pyarrow numpy` change resolves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
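For reference, the 143 reported above matches the common convention of 128 plus the signal number (SIGTERM is 15 on Linux and macOS). A small sketch that decodes such codes, assuming the reported value follows that convention; `signal_from_exit_code` is a hypothetical helper:

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Name the signal behind a shell-style exit code above 128, if any."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The offset does not correspond to a known signal on this platform.
        return None


print(signal_from_exit_code(143))
```

An exit code of 0 or any other value at or below 128 yields `None`, since those are ordinary exit statuses and do not indicate a signal.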
trunk-io Bot pushed a commit that referenced this pull request on May 20, 2026
Summary
Unbreaks the pyo3 abi3 build regression across both CI providers: GHA nightly cross-check (all 8 targets) and CircleCI's workspace-building jobs (12 jobs). Two coordinated changes: the workspace `pyo3-build-config` feature addition (which fixes builds with `PYO3_NO_PYTHON=1`), plus mirroring the GHA nightly env (`PYO3_NO_PYTHON=1`, plus `CROSS_BUILD_ENV_PASSTHROUGH=PYO3_NO_PYTHON` for the cross-rs runs) into the CircleCI jobs that were missing it.

Root cause (regression bisection)

PR #1833 (`db8dbf60`) merged, adding the workspace `pyo3-build-config` dep with an incomplete feature list. It added `pyo3-build-config = { version = "0.28", features = ["resolve-config"] }`, missing any `abi3-py3X`. pyo3-build-config 0.28's own build script then has two failure modes:

- Interpreter present, version < abi3 minimum (3.11) → fails with `Python interpreter version (3.10) is less than abi3 minimum (3.11)`. Hits every CircleCI job whose container ships Python 3.10 (`cimg/rust:1.92.0` linux, the `cross-rs` Docker images, the older macOS Xcode 15.4 system Python, etc.).
- `PYO3_NO_PYTHON=1` set, no `abi3-py3X` feature on pyo3-build-config → fails with `An abi3-py3* feature must be specified when compiling without a Python interpreter.` Hits GHA nightly's cross-check matrix (which sets `PYO3_NO_PYTHON=1`, introduced in #1869 "fix(ci): provision Python 3.11+ before pyo3 abi3-py311 builds in nightly (closes #1866)", specifically to skip the probe in cross-rs Docker containers).

The two failure modes together explain every red job on this head: they're the same root cause hitting different env shapes.
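The two failure modes can be summarised as a small decision function. This is a simplified model of the behaviour described above, not the real pyo3-build-config build script; `check_abi3_config` and its parameters are hypothetical names, and the case of no interpreter without `PYO3_NO_PYTHON=1` is deliberately left unmodelled.

```python
from typing import Optional, Tuple

ABI3_MINIMUM = (3, 11)  # workspace abi3 minimum, per the description above


def check_abi3_config(
    no_python: bool,
    interpreter_version: Optional[Tuple[int, int]],
    abi3_feature: Optional[Tuple[int, int]],
) -> Optional[str]:
    """Return the error for a failing configuration, or None if it builds."""
    if no_python:
        # No interpreter probe: the ABI must come from an abi3-py3X feature.
        if abi3_feature is None:
            return ("An abi3-py3* feature must be specified when compiling "
                    "without a Python interpreter.")
        return None
    if interpreter_version is None:
        raise ValueError("this sketch does not model a missing interpreter")
    if interpreter_version < ABI3_MINIMUM:
        found = ".".join(map(str, interpreter_version))
        minimum = ".".join(map(str, ABI3_MINIMUM))
        return (f"Python interpreter version ({found}) is less than "
                f"abi3 minimum ({minimum})")
    return None


print(check_abi3_config(True, None, None))      # GHA nightly shape before the fix
print(check_abi3_config(False, (3, 10), None))  # container with Python 3.10
print(check_abi3_config(True, None, (3, 11)))   # both changes applied
```

In this model, only the combination of `PYO3_NO_PYTHON=1` and an `abi3-py311` feature builds regardless of which Python the container ships, which is why the PR pairs the `Cargo.toml` change with the CircleCI env change.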
Fix
1. `Cargo.toml`: add `abi3-py311` to the workspace `pyo3-build-config` features, so the no-interpreter path knows which ABI to target.
2. `.circleci/config.yml`: mirror the GHA nightly env (`PYO3_NO_PYTHON=1`) into every CircleCI job that builds the workspace, so the interpreter probe is skipped on containers without Python 3.11+. The `cross-check-cross` job additionally gets `CROSS_BUILD_ENV_PASSTHROUGH=PYO3_NO_PYTHON` so `cross-rs` propagates the env into the Docker container it spawns.

Jobs touched (all add `PYO3_NO_PYTHON: "1"` to `environment:`):

- `clippy` (was failing)
- `cross-check-native` (was failing, 1 target)
- `cross-check-macos` (defensive: macOS Xcode 15.4 ships older Python)
- `cross-check-cross` (was failing, 5 targets; also gets `CROSS_BUILD_ENV_PASSTHROUGH`)
- `msrv` (was failing)
- `test-linux` (was failing)
- `test-macos` (was failing)
- `examples-linux` (was failing)
- `bench-example` (was failing)
- `e2e` (defensive: depends on test-linux's workspace)
- `contract-tests` (defensive: same)
- `bench` (defensive: `cargo test --no-run`s the benchmarks)

Verification
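The CircleCI side can also be checked mechanically against the list of touched jobs. A minimal sketch, assuming the config has already been parsed into a dictionary (for example by a YAML library) and that each job sets the variable under a job-level `environment:` key; `jobs_missing_env` and `example_config` are hypothetical names, and the excerpt is made up for the demonstration:

```python
from typing import Dict, Iterable, List


def jobs_missing_env(config: Dict, job_names: Iterable[str],
                     var: str = "PYO3_NO_PYTHON", expected: str = "1") -> List[str]:
    """List the jobs whose `environment:` block does not set var to expected."""
    jobs = config.get("jobs", {})
    missing = []
    for name in job_names:
        env = (jobs.get(name) or {}).get("environment") or {}
        # Compare as strings so a YAML integer 1 and the string "1" both pass.
        if str(env.get(var)) != expected:
            missing.append(name)
    return missing


# Hypothetical excerpt of a parsed config; the real file has more keys per job.
example_config = {
    "jobs": {
        "clippy": {"environment": {"PYO3_NO_PYTHON": "1"}},
        "cross-check-cross": {
            "environment": {
                "PYO3_NO_PYTHON": "1",
                "CROSS_BUILD_ENV_PASSTHROUGH": "PYO3_NO_PYTHON",
            }
        },
        "msrv": {"environment": {}},  # deliberately left unset for the demo
    }
}

print(jobs_missing_env(example_config, ["clippy", "cross-check-cross", "msrv"]))
```

A job that is absent from the config is reported as missing too, which makes a renamed or deleted job show up in the same list.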
Local reproduction + fix confirmation on macos-aarch64:
Plus YAML validation on the CircleCI config change:
Test plan
- `cargo fmt --all -- --check`
- `PYO3_NO_PYTHON=1 cargo check --all --exclude dora-{node-api,operator-api,ros2-bridge,cli-api}-python` ✓
- `cargo clippy --all --exclude dora-{node-api,operator-api,ros2-bridge}-python -- -D warnings` ✓
- …`.circleci/config.yml:1003`, `:1018`) addressed: `cross-check-cross` now carries both env vars

Likely closes #1860

#1860 tracked the CircleCI environment cluster (clippy, cross-check-non-x86_64-linux, msrv) as failing across all PRs in this consolidation push. The root cause turned out to be the same pyo3-build-config regression, so this PR should close that issue. I'll confirm after CircleCI runs on this head.
What this does NOT address
The `CLI Tests (macos-latest)` GHA nightly job failure is unrelated (filed as #1882). The Python sender on macOS gets SIGTERMed on `--stop-after`, exit code 143. Different code path, different fix needed.

🤖 Generated with Claude Code