Summary
The CLI Tests (macos-latest) nightly job has been failing on the python-dataflow --stop-after 10s step for several nights. The Linux equivalent passes. Filed as a tracked regression so it doesn't get lost while the cross-check failures (#1881) take priority.
Symptom
In python-dataflow cli-test step (dora run dataflow.yml --uv --stop-after 10s):
sender: Sent message 90 (~10 seconds in, just before --stop-after fires)
dora_coordinator: Received destroy command
dora_daemon: received destroy command -> exiting
dora_coordinator: Coordinator took 206ms for handling event: Control
dora_coordinator: stopped
[ERROR]
Dataflow 019e40ae-d20f-755a-9796-993037f95238 failed:
Node `sender` failed: exited with code 143
Exit code 143 = 128 + SIGTERM. The sender was terminated by SIGTERM as part of the --stop-after-triggered destroy sequence, which is expected shutdown behavior, but dora reports the resulting exit code as a node failure.
What's going wrong (two candidate root causes)
Either:
-
dora's --stop-after shutdown path treats SIGTERM as a node failure — the destroy command sends SIGTERM to nodes, then the dataflow result aggregator interprets the resulting 143 as Node failed: exited with code 143. This would mean the same pattern would surface on Linux too, but Linux passes... so probably not this.
-
Python sender on macOS doesn't trap SIGTERM in time and gets force-killed — the dora node API on Python likely listens for the stop signal and exits cleanly via a KeyboardInterrupt-style mechanism on Linux, but the equivalent on macOS isn't catching SIGTERM fast enough (or at all), so the sender gets force-killed. Linux's signal delivery + Python interrupt handling tends to be faster than macOS's. This is more likely.
Examples to inspect:
examples/python-dataflow/sender.py:17 — the only line that's emitting messages
- The shutdown handling in
apis/python/node/src/lib.rs and apis/python/node/dora/__init__.pyi
Reproduction
Local (macos-arm64):
dora build examples/python-dataflow/dataflow.yml --uv
dora run examples/python-dataflow/dataflow.yml --uv --stop-after 10s
Expected: clean exit, zero error.
Actual: Node sender failed: exited with code 143.
Timeline
| Date |
CLI Tests (macos-latest) |
| 2026-05-12 |
✅ passing |
| 2026-05-13 onward |
mixed: some greens, but consistent failures of this exact step in failed nightly runs (mixed with the cross-check pyo3 regression that #1881 fixes) |
Need a clean bisect on this specifically since the cross-check failures were masking signal during the same window.
Not blocking
Filed during investigation of nightly failures alongside #1881.
Summary
The
CLI Tests (macos-latest)nightly job has been failing on thepython-dataflow--stop-after 10sstep for several nights. The Linux equivalent passes. Filed as a tracked regression so it doesn't get lost while the cross-check failures (#1881) take priority.Symptom
In
python-dataflowcli-test step (dora run dataflow.yml --uv --stop-after 10s):Exit code 143 = 128 + SIGTERM. The sender was terminated by SIGTERM as part of the
--stop-after-triggered destroy sequence, which is expected shutdown behavior, but dora reports the resulting exit code as a node failure.What's going wrong (two candidate root causes)
Either:
dora's
--stop-aftershutdown path treats SIGTERM as a node failure — the destroy command sends SIGTERM to nodes, then the dataflow result aggregator interprets the resulting 143 asNode failed: exited with code 143. This would mean the same pattern would surface on Linux too, but Linux passes... so probably not this.Python sender on macOS doesn't trap SIGTERM in time and gets force-killed — the dora node API on Python likely listens for the stop signal and exits cleanly via a
KeyboardInterrupt-style mechanism on Linux, but the equivalent on macOS isn't catching SIGTERM fast enough (or at all), so the sender gets force-killed. Linux's signal delivery + Python interrupt handling tends to be faster than macOS's. This is more likely.Examples to inspect:
examples/python-dataflow/sender.py:17— the only line that's emitting messagesapis/python/node/src/lib.rsandapis/python/node/dora/__init__.pyiReproduction
Local (macos-arm64):
Expected: clean exit, zero error.
Actual:
Node sender failed: exited with code 143.Timeline
Need a clean bisect on this specifically since the cross-check failures were masking signal during the same window.
Not blocking
docs/agentic-qa-policy.md). It runs on nightly only.Filed during investigation of nightly failures alongside #1881.