fix(build,ci): pyo3-build-config abi3-py311 + PYO3_NO_PYTHON env mirror across CircleCI (unbreaks GHA nightly + CircleCI cluster)#1881

Merged: heyong4725 merged 1 commit into main from fix/nightly-pyo3-build-config-abi3 on May 20, 2026

@heyong4725
Collaborator

@heyong4725 heyong4725 commented May 20, 2026

Summary

Unbreaks the pyo3 abi3 build regression across both CI providers: GHA nightly cross-check (all 8 targets) and CircleCI's workspace-building jobs (12 jobs). Two coordinated changes — the workspace pyo3-build-config feature addition (which fixes builds with PYO3_NO_PYTHON=1), plus mirroring the GHA nightly env (PYO3_NO_PYTHON=1, plus CROSS_BUILD_ENV_PASSTHROUGH=PYO3_NO_PYTHON for the cross-rs runs) into the CircleCI jobs that were missing it.

Root cause (regression bisection)

| Date | GHA Nightly | Notes |
|---|---|---|
| 2026-05-12 | ✅ green | Last known-good |
| 2026-05-13 | ❌ red | PR #1833 (db8dbf60) merged, adding the workspace pyo3-build-config dep with an incomplete feature list |
| 2026-05-14 → 2026-05-19 | ❌ red | Continuous failure |

PR #1833 added:

pyo3-build-config = { version = "0.28", features = ["resolve-config"] }

This feature list has no abi3-py3X entry. pyo3-build-config 0.28's own build script then has two failure modes:

  1. Interpreter present, version < abi3 minimum (3.11) → fails with Python interpreter version (3.10) is less than abi3 minimum (3.11). Hits every CircleCI job whose container ships Python 3.10 (cimg/rust:1.92.0 linux, the cross-rs Docker images, the older macOS Xcode 15.4 system Python, etc.).

  2. PYO3_NO_PYTHON=1 set, no abi3-py3X feature on pyo3-build-config → fails with An abi3-py3* feature must be specified when compiling without a Python interpreter. Hits GHA nightly's cross-check matrix, which has set PYO3_NO_PYTHON=1 since #1869 ("fix(ci): provision Python 3.11+ before pyo3 abi3-py311 builds in nightly (closes #1866)") specifically to skip the probe in cross-rs Docker containers.

The two failure modes together explain every red job on this head — they're the same root cause hitting different env shapes.
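
The two failure modes can be modelled as a small decision function. This is a minimal sketch of the behaviour described above, not pyo3-build-config's actual build script: `build_script_outcome` and its parameters are hypothetical names, and the error strings are the ones quoted in this PR.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]

ABI3_MIN: Version = (3, 11)  # abi3 minimum used by the workspace, per this PR


def build_script_outcome(
    no_python: bool,
    interpreter: Optional[Version],
    abi3_feature: Optional[Version],
) -> str:
    """Toy model of the two failure modes; returns "ok" or an error message."""
    if no_python:
        # PYO3_NO_PYTHON=1: the probe is skipped, so the ABI must come from a feature.
        if abi3_feature is None:
            return ("An abi3-py3* feature must be specified when compiling "
                    "without a Python interpreter.")
        return "ok"
    if interpreter is None:
        # Assumption: this case is not described in the PR, so it is only labelled here.
        return "no interpreter to probe (case not covered by this PR)"
    if interpreter < ABI3_MIN:
        return (f"Python interpreter version ({interpreter[0]}.{interpreter[1]}) "
                f"is less than abi3 minimum ({ABI3_MIN[0]}.{ABI3_MIN[1]})")
    return "ok"


# Failure mode 1: the container ships Python 3.10 and the probe is not skipped.
print(build_script_outcome(False, (3, 10), None))
# Failure mode 2: GHA nightly env before the Cargo.toml fix.
print(build_script_outcome(True, None, None))
# After both changes: probe skipped and abi3-py311 declared.
print(build_script_outcome(True, None, (3, 11)))
```

The model also shows why both changes are needed: adding the feature alone leaves the first branch failing wherever a 3.10 interpreter is probed, and setting the env var alone leaves the second branch failing.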

Fix

1. Cargo.toml: add abi3-py311 to the workspace pyo3-build-config features, so the no-interpreter path knows which ABI to target.

- pyo3-build-config = { version = "0.28", features = ["resolve-config"] }
+ pyo3-build-config = { version = "0.28", features = ["resolve-config", "abi3-py311"] }

2. .circleci/config.yml: mirror the GHA nightly env (PYO3_NO_PYTHON=1) into every CircleCI job that builds the workspace, so the interpreter probe is skipped on containers without Python 3.11+. The cross-check-cross job additionally gets CROSS_BUILD_ENV_PASSTHROUGH=PYO3_NO_PYTHON so cross-rs propagates the env into the Docker container it spawns.

Jobs touched (all add PYO3_NO_PYTHON: "1" to environment:):

  • clippy (was failing)
  • cross-check-native (was failing — 1 target)
  • cross-check-macos (defensive — macOS Xcode 15.4 ships older Python)
  • cross-check-cross (was failing — 5 targets; also gets CROSS_BUILD_ENV_PASSTHROUGH)
  • msrv (was failing)
  • test-linux (was failing)
  • test-macos (was failing)
  • examples-linux (was failing)
  • bench-example (was failing)
  • e2e (defensive — depends on test-linux's workspace)
  • contract-tests (defensive — same)
  • bench (defensive; runs cargo test --no-run on the benchmarks)
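
A check along these lines could guard against a future job being added without the variable. It is a sketch under assumptions: `jobs_missing_env` is a hypothetical helper, and the `config` dict is a hand-written stand-in for the parsed `.circleci/config.yml`. Only the `jobs.<name>.environment` layout is taken from CircleCI's config format.

```python
from typing import Dict, Iterable, List


def jobs_missing_env(
    config: Dict, jobs: Iterable[str], var: str = "PYO3_NO_PYTHON"
) -> List[str]:
    """Return the names in `jobs` whose `environment:` block does not set `var`."""
    defined = config.get("jobs", {})
    missing = []
    for name in jobs:
        # A job with no `environment:` key (or an undefined job) counts as missing.
        env = defined.get(name, {}).get("environment") or {}
        if var not in env:
            missing.append(name)
    return missing


# Stand-in for the parsed config; the values are illustrative, not the real file.
config = {
    "jobs": {
        "clippy": {"environment": {"PYO3_NO_PYTHON": "1"}},
        "cross-check-cross": {
            "environment": {
                "PYO3_NO_PYTHON": "1",
                "CROSS_BUILD_ENV_PASSTHROUGH": "PYO3_NO_PYTHON",
            }
        },
        "msrv": {},  # deliberately left without the variable
    }
}

print(jobs_missing_env(config, ["clippy", "cross-check-cross", "msrv"]))
```

Passing `var="CROSS_BUILD_ENV_PASSTHROUGH"` with only `cross-check-cross` in the job list would cover the second variable in the same way.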

Verification

Local reproduction + fix confirmation on macos-aarch64:

# Without the fix (Cargo.toml side, simulating GHA nightly env):
$ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
error: failed to run custom build command for `pyo3-build-config v0.28.3`
  error: An abi3-py3* feature must be specified when compiling
         without a Python interpreter.

# With the Cargo.toml fix + PYO3_NO_PYTHON=1:
$ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
Finished `dev` profile in 29.56s ✓

Plus YAML validation on the CircleCI config change:

$ python3 -c "import yaml; yaml.safe_load(open('.circleci/config.yml'))"
(no output, exit 0)  ✓

Test plan

  • cargo fmt --all -- --check
  • PYO3_NO_PYTHON=1 cargo check --all --exclude dora-{node-api,operator-api,ros2-bridge,cli-api}-python
  • cargo clippy --all --exclude dora-{node-api,operator-api,ros2-bridge}-python -- -D warnings
  • CircleCI YAML parses
  • Reviewer's pointer (.circleci/config.yml:1003, :1018) addressed — cross-check-cross now carries both env vars
  • CircleCI green on this head
  • Next GHA nightly green

Likely closes #1860

#1860 tracked the CircleCI environment cluster (clippy, cross-check-non-x86_64-linux, msrv) as failing across all PRs in this consolidation push. The root cause turned out to be the same pyo3-build-config regression, so this PR should close that issue. I'll confirm after CircleCI runs on this head.

What this does NOT address

The CLI Tests (macos-latest) GHA nightly job failure is unrelated (filed as #1882). Python sender on macOS gets SIGTERMed on --stop-after, exit code 143. Different code path, different fix needed.
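
Exit code 143 follows the usual shell convention of 128 plus the signal number, and SIGTERM is signal 15 on POSIX systems. A small sketch of that arithmetic, with `signal_name_from_exit_code` as a hypothetical helper name:

```python
import signal
from typing import Optional


def signal_name_from_exit_code(code: int) -> Optional[str]:
    """Map a shell-style exit code (128 + N) back to the signal name, if any."""
    if code <= 128:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(signal_name_from_exit_code(143))
```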

🤖 Generated with Claude Code

@trunk-io
Contributor

trunk-io Bot commented May 20, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@heyong4725 heyong4725 force-pushed the fix/nightly-pyo3-build-config-abi3 branch from c872a88 to 1ffbdb6 Compare May 20, 2026 03:30
@heyong4725 heyong4725 changed the title fix(build): add abi3-py311 to workspace pyo3-build-config (unbreaks nightly cross-check) fix(build,ci): pyo3-build-config abi3-py311 + PYO3_NO_PYTHON env mirror across CircleCI (unbreaks GHA nightly + CircleCI cluster) May 20, 2026
@heyong4725 heyong4725 force-pushed the fix/nightly-pyo3-build-config-abi3 branch from 1ffbdb6 to 1828521 Compare May 20, 2026 03:36
Unbreaks the nightly cross-check matrix (all 8 targets failing since
2026-05-13) and any local build with `PYO3_NO_PYTHON=1` set.

Root cause
==========

PR #1833 (db8dbf6, merged 2026-05-13) added the workspace
`pyo3-build-config` dependency to support the new zero-copy
`send_output_raw` infrastructure:

    pyo3-build-config = { version = "0.28", features = ["resolve-config"] }

The feature list omitted `abi3-py3X`. pyo3-build-config 0.28's own
build script asserts at compile time:

  - If a Python interpreter is available: probe it, require
    interpreter version >= the workspace's abi3 minimum (3.11).
  - If `PYO3_NO_PYTHON=1` is set (the nightly cross-check env from
    #1869, which can't always provision a 3.11+ interpreter inside
    cross-rs Docker images): require an `abi3-py3X` feature flag on
    pyo3-build-config itself so it knows which ABI to bake in.

Neither branch was satisfied: the env said "don't probe an
interpreter" but the feature list said "I haven't told you what ABI
to target." Result:

    error: failed to run custom build command for `pyo3-build-config v0.28.3`
      error: An abi3-py3* feature must be specified when compiling
             without a Python interpreter.

This fires once per native target (matrix.cross=false) and once per
cross-rs container (matrix.cross=true), so all 8 cross-check jobs in
the nightly matrix have been red continuously since 2026-05-13.

The other pyo3 consumers in the workspace (`apis/python/operator`,
`libraries/extensions/ros2-bridge/python`, `binaries/runtime`,
`binaries/cli` partially) already pin `abi3-py311` on their pyo3
dependency. The workspace `pyo3-build-config` dep just missed the
same setting.

Fix
===

Add `"abi3-py311"` to the workspace `pyo3-build-config` feature list
in `Cargo.toml`. One-line change. Matches the abi3 minimum already
used by every other pyo3 consumer in the workspace.

Verification
============

Local reproduction + fix confirmation on macos-aarch64:

  # without fix:
  $ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
  error: failed to run custom build command for `pyo3-build-config v0.28.3`
    error: An abi3-py3* feature must be specified when compiling
           without a Python interpreter.

  # with fix:
  $ PYO3_NO_PYTHON=1 cargo check -p dora-runtime
  Finished `dev` profile in 29.56s ✓

Full workspace check + fmt + clippy with PYO3_NO_PYTHON=1:

  cargo fmt --all -- --check                                    ✓
  cargo check --all --exclude dora-{node-api,operator-api,ros2-bridge,cli-api}-python  ✓
  cargo clippy --all --exclude dora-{node-api,operator-api,ros2-bridge}-python -- -D warnings  ✓

Unblocks
========

- All 8 cross-check nightly targets (failing since 2026-05-13)
- Local builds in environments where Python 3.11+ isn't available
  on the build host (some CI images, sandboxed builds)

Does NOT address
================

The CLI Tests (macos-latest) nightly job (also failing today with a
node-sender SIGTERM, exit code 143) is unrelated to this regression.
Filed separately for investigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@heyong4725 heyong4725 force-pushed the fix/nightly-pyo3-build-config-abi3 branch from 1828521 to f7bd6e0 Compare May 20, 2026 03:45
@heyong4725 heyong4725 merged commit 794bf4c into main May 20, 2026
24 of 36 checks passed
@heyong4725 heyong4725 deleted the fix/nightly-pyo3-build-config-abi3 branch May 20, 2026 03:53
heyong4725 added a commit that referenced this pull request May 20, 2026
…idge

Closes the two non-cross-check nightly failures still red after #1881:

1. `CLI Tests (macos-latest)` — `python-dataflow` sender exits 143
   (#1882).
2. `ROS2 Bridge Examples` — `typed::tests::test_python_array_code`
   fails with `ModuleNotFoundError: No module named 'numpy'`.

(1) Make python-dataflow sender well-behaved
============================================

`examples/python-dataflow/sender.py` ran a fixed `range(100) + sleep(0.1)`
loop with no event polling — total runtime ≈ 10 s, sitting right at
the `--stop-after 10s` edge. In yesterday's failing nightly the
sender was at message 90 of 100 when the daemon's stop message
landed; on Linux the sender narrowly finishes first, on macOS the
runner is just slow enough that the daemon's stop arrives first,
the sender (deep in a `send_output` + `sleep` pair) never observes
it, the daemon's grace window elapses, and the sender gets
SIGTERMed. dora correctly reports exit code 143 as a node failure
(per @phil-opp's design contract in #1882: SIGTERM signals cleanup
may not have happened, which is information the user wants
surfaced).

The fix is to drain pending events between sends so the soft-stop
message lands and the sender exits via `return` before the grace
window elapses:

    for i in range(100):
        for event in node.drain():
            if event["type"] == "STOP":
                return
        node.send_output("message", pa.array([i]))
        time.sleep(0.1)

`node.drain()` is dora's documented "give me all available events
without blocking" API
(`apis/python/node/src/lib.rs::drain`). The other senders in the
tree (`examples/dynamic-add-remove/sender.py`, `examples/python-
async/send_data.py`) already follow the canonical `for event in
node:` pattern; this brings python-dataflow into the same shape.

(2) Install numpy in ROS2 bridge nightly job
============================================

The Rust test `typed::tests::test_python_array_code` at
`libraries/extensions/ros2-bridge/python/src/typed/mod.rs:58`
loads the embedded Python fixture at
`libraries/extensions/ros2-bridge/python/test_utils.py`, which
does `import numpy as np` at line 3. The `ros2-bridge` nightly
job at `.github/workflows/nightly.yml` runs `pip install pyarrow`
before `cargo test -p dora-ros2-bridge-python` but never installs
numpy. Result: `ModuleNotFoundError`. One-line fix:

    pip install pyarrow numpy

The two fixes are independent but bundled because both are part of
the same "drive nightly to green after #1881 unblocked cross-check"
sweep.

Closes #1882.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
heyong4725 added a commit that referenced this pull request May 20, 2026
Closes the ROS2 Bridge Examples failure that remained red after #1881
unblocked the cross-check matrix. The other failing nightly job
(`CLI Tests (macos-latest)` python-dataflow SIGTERM) is NOT addressed
here — see #1882 for the deeper investigation now needed.

The ROS2 fix
============

`typed::tests::test_python_array_code` at
`libraries/extensions/ros2-bridge/python/src/typed/mod.rs:58`
loads the embedded Python fixture at
`libraries/extensions/ros2-bridge/python/test_utils.py:3`, which
does `import numpy as np`. The `ros2-bridge` nightly job at
`.github/workflows/nightly.yml` runs `pip install pyarrow`
before `cargo test -p dora-ros2-bridge-python` but never installs
numpy. Result: `ModuleNotFoundError: No module named 'numpy'`.

One-line fix:

    -        run: pip install pyarrow
    +        run: pip install pyarrow numpy
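
A pre-flight check of this kind would surface a missing Python dependency before `cargo test` runs. This is only a sketch: `missing_modules` is a hypothetical helper, not something in the dora repository, and the module list comes from the packages named above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages the fixture needs according to this commit message.
missing = missing_modules(["pyarrow", "numpy"])
print("missing Python modules:", ", ".join(missing) or "none")
```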

Why the SIGTERM issue was reverted from this PR
================================================

The initial commit on this branch added a `node.drain()` STOP-poll
to `examples/python-dataflow/sender.py`. Code review surfaced that
the change did not actually make the nightly test pass —
re-running the exact CI invocation
(`dora run examples/python-dataflow/dataflow.yml --uv --stop-after
10s`) still produced ExitCode(143) for all three nodes.

Subsequent local investigation on macos-aarch64 confirmed the
SIGTERM bug is deeper than the sender pattern:

* `receiver.py` and `transformer.py` already use the canonical
  `for event in node:` loop with explicit `break` on STOP. Both
  still report ExitCode(143).
* A trivial sender (10 messages × 100ms = ~1s natural runtime)
  with `--stop-after 5s` still produces ExitCode(143) on all
  three nodes 10s after the soft-stop is sent.
* The dora daemon code at `running_dataflow.rs:361-367` does
  send `NodeEvent::Stop` through each node's subscribe channel
  before the 10-second SIGTERM grace window. But adding debug
  prints to `receiver.py` shows nothing reaches stdout between
  "starting" and the eventual SIGTERM-triggered flush, suggesting
  either:
    - `node = Node()` is blocking longer than expected on macOS, or
    - The daemon's stdout-capture buffers prints until process
      exit, masking what the receiver was actually doing.

Either way, the fix is on the dora-daemon side and out of scope
for a one-line example tweak. #1882 stays open and a follow-up
post on that thread documents the investigation trail.

This PR is now scoped narrowly to the ROS2 numpy fix, which IS
verifiable: the test fails today with `ModuleNotFoundError`, the
one-line `pip install pyarrow numpy` change resolves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
trunk-io Bot pushed a commit that referenced this pull request May 20, 2026
Development

Successfully merging this pull request may close these issues.

CircleCI: chronic clippy/cross-check/msrv failures across recent PRs — env, not code (cc @phil-opp)