Skip to content

feat(uv): per-node managed Python venvs at build + runtime (rescue of #1515)#1820

Merged
trunk-io[bot] merged 10 commits into
mainfrom
rescue/python-uv-managed-env
May 15, 2026
Merged

feat(uv): per-node managed Python venvs at build + runtime (rescue of #1515)#1820
trunk-io[bot] merged 10 commits into
mainfrom
rescue/python-uv-managed-env

Conversation

@heyong4725
Copy link
Copy Markdown
Collaborator

@heyong4725 heyong4725 commented May 15, 2026

Rescues @Harsh-Sahu43's #1515 design (per-node uv-managed Python envs at build + runtime). Reimplemented against current main after 7+ weeks of build/daemon drift made a direct merge impossible (5 conflicts, including the post-#1716 CI rebalance).

Why this matters

Today's --uv is a half-measure: it just swaps pipuv pip in build commands. It buys uv's speed but none of uv's hermetic-env semantics. Python nodes still run against ambient site-packages, which means:

  1. Build/runtime drift — install at build, system Python changes, run uses different Python
  2. Cross-dataflow contamination — last dora build wins, no per-dataflow isolation
  3. Cross-node contamination within a dataflow — tensorflow node and pytorch node fight over shared site-packages
  4. No reproducibilitydora build today, dora run 3 weeks later, transitive deps drift silently

Industry baseline: Python isolation is table stakes for any production-grade Python framework (ROS2 underlay/overlay, Ray runtime_env, Bazel hermetic toolchains, Modal per-function envs).

What this PR does

For Python nodes (.py custom nodes with a build: block, and runtime nodes with Python operators) when --uv is set:

  • dora build --uv creates <working-dir>/.dora/python-envs/<node-id>/ per node via uv venv --clear
  • Installs the Dora Python API into it (preferring an in-tree workspace package over PyPI, idempotent via import dora probe)
  • Runs the node's build commands inside that venv with VIRTUAL_ENV set and the env's bin/ (or Scripts/ on Windows) prepended to PATH
  • dora start / dora run automatically reuse the same interpreter at spawn time
  • Failures are fail-closed: missing interpreter errors out rather than silently falling through to ambient Python

Custom Python nodes without a build: block keep using ambient uv run python — script-only nodes don't need isolation. Restart path re-derives the env dir deterministically so node restarts pick up the same venv.

Plus a dora doctor uv availability check so users without uv get a clear hint before they hit a cryptic build error.

Pieces

File Change
libraries/core/src/build/mod.rs New managed_python_env_dir, managed_python_bin_dir, managed_python_interpreter helpers; Builder::build_node_inner prepares the env when uv=true; new prepare_python_env orchestrator
libraries/core/src/build/build_command.rs New prepare_managed_python_env, ensure_managed_python_runtime, apply_managed_python_env, managed-PATH composition; run_build_command gains python_env_dir arg
binaries/cli/src/command/build/local.rs + binaries/daemon/src/lib.rs Capture python_env_dirs per node alongside node_working_dirs
binaries/daemon/src/spawn/spawner.rs + command.rs Spawn-side uses managed interpreter when one was prepared; falls back to uv run python for unbuilt nodes
binaries/cli/src/command/doctor.rs New check_uv() step in dora doctor
libraries/core/Cargo.toml Adds tokio io-util and macros features for build-command output piping
docs/cli.md + guide/src/operations/cli.md Documents the new --uv semantics

Per-node env path (race fix from pre-landing review)

Initially used <working-dir>/.dora/python-env (matching the original PR design). Pre-landing review caught that Builder.base_working_dir is shared across all nodes — under parallel builds (--mode parallel), two Python nodes sharing a working_dir would race on uv venv --clear against the same directory and silently clobber each other. Fixed to <working-dir>/.dora/python-envs/<node-id>/. Regression test added (env_dir_includes_node_id_so_parallel_builds_do_not_race).

BuildInfo backward compat

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct BuildInfo {
    pub node_working_dirs: BTreeMap<NodeId, PathBuf>,
    #[serde(default)]
    pub python_env_dirs: BTreeMap<NodeId, PathBuf>,
}

#[serde(default)] means existing build-info YAMLs still parse — verified by build_info_deserializes_without_python_env_dirs.

Tests added

In libraries/core/src/build/mod.rs::tests:

  • assigns_managed_env_dir_to_python_custom_nodes — Python .py node gets <working-dir>/.dora/python-envs/<id>
  • skips_managed_env_dir_for_non_python_custom_nodes — Rust/C++ nodes get None
  • env_dir_includes_node_id_so_parallel_builds_do_not_race — regression test for the race fix
  • build_info_deserializes_without_python_env_dirs — old build-info YAML still parses
  • managed_python_helpers_use_platform_layoutbin/python on Unix, Scripts/python.exe on Windows

Tests use YAML deserialization (not struct literals) to avoid coupling to the exact ResolvedNode field list.

Local validation

  • cargo fmt --all -- --check — clean
  • cargo clippy -p dora-core --features build -p dora-cli -p dora-daemon -- -D warnings — clean
  • cargo test -p dora-core --features build — 157 passing (5 new)
  • cargo test -p dora-cli — 145 passing
  • cargo test -p dora-daemon — 32 passing
  • dora doctor smoke-run on the host — confirmed PASS uv: uv 0.10.8 (...)
  • Authoritative integration test: needs uv on PATH + a real Python dataflow. Reviewer with uv installed can verify:
    dora build --uv examples/python-dataflow/dataflow.yml --local
    ls examples/python-dataflow/.dora/python-envs/   # expect one dir per Python node
    dora run examples/python-dataflow/dataflow.yml --uv --stop-after 5s
    Expected: dora spawns the managed bin/python interpreter rather than uv run python.

Follow-ups (not in this PR)

Closes

Closes #1515.

Co-Authored-By: Harsh-Sahu43

🤖 Generated with Claude Code

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented May 15, 2026

😎 Merged successfully - details.

heyong4725 added a commit that referenced this pull request May 15, 2026
…d race

Pre-landing review of #1820 caught a critical race condition in the
original PR design (#1515) that the rescue inherited.

Builder.base_working_dir is shared across all nodes (the CLI clones the
Builder per task in binaries/cli/src/command/build/local.rs:63). When
multiple Python nodes share a working_dir -- which they do whenever:

  - the dataflow has 2+ runtime nodes with Python operators, OR
  - the dataflow has 2+ custom Python nodes without git sources

...managed_python_env_dir computed the same `<base>/.dora/python-env`
for all of them. Under parallel builds (the --mode parallel flow at
local.rs:83), two concurrent prepare_managed_python_env calls would
both call `uv venv --clear` against the SAME directory, silently
clobbering each other's envs and producing nondeterministic build
artifacts.

Fix: include node.id in the path. Each node now gets
`<working-dir>/.dora/python-envs/<node-id>/` regardless of whether the
working_dir is shared. The spawn-side already re-derives the path
through the same function, so spawn-time interpreter lookup stays
consistent automatically.

Added a regression test
(env_dir_includes_node_id_so_parallel_builds_do_not_race) that asserts
two nodes sharing a working_dir get distinct env dirs.

Docs updated (docs/cli.md, guide/src/operations/cli.md) to show the
new path layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
heyong4725 and others added 3 commits May 14, 2026 21:35
…1515)

Today's --uv is a half-measure: it just swaps pip -> uv pip in build
commands. It buys uv speed but none of uv hermetic-env semantics --
Python nodes still run against ambient site-packages, which means:

- Build-time deps and runtime Python can diverge (system Python updates,
  user activates a different venv, etc.)
- Two dataflows installing conflicting versions of numpy/tensorflow/pytorch
  overwrite each other site-packages
- Two Python nodes in the same dataflow with incompatible deps fight in
  a single site-packages

This adds per-node managed envs to the --uv flow:

* dora build --uv now creates <working-dir>/.dora/python-env/ per Python
  node via uv venv --clear, installs the Dora Python API into it
  (preferring an in-tree workspace package over PyPI), then runs the
  build commands inside that venv with VIRTUAL_ENV set and PATH prepended.
* dora start / dora run automatically reuse the same interpreter when
  spawning the node, so build-time deps == runtime deps deterministically.
* Failures are fail-closed: if the managed interpreter is missing, we
  error out rather than silently picking ambient Python.
* Custom Python nodes without a build: block keep using ambient uv
  (script-only nodes do not need isolation).
* BuildInfo gains a python_env_dirs map (with #[serde(default)] so
  existing build-info YAMLs still parse).

Plumbing:

* libraries/core/src/build/mod.rs: new managed_python_env_dir,
  managed_python_bin_dir, managed_python_interpreter helpers (pub so the
  daemon spawn path can reuse them); Builder::build_node_inner prepares
  the env when uv=true.
* libraries/core/src/build/build_command.rs: new prepare_managed_python_env
  + ensure_managed_python_runtime + apply_managed_python_env / managed
  PATH composition; run_build_command gains python_env_dir arg.
* binaries/cli/src/command/build/local.rs + binaries/daemon/src/lib.rs:
  capture python_env_dirs per node alongside node_working_dirs.
* binaries/daemon/src/spawn/spawner.rs + command.rs: spawn-side uses the
  managed interpreter directly when one was prepared; falls back to
  uv run python for unbuilt nodes.
* Node restart path re-derives python_env_dir from node_working_dir
  deterministically so restarts pick up the same venv.

dora-core now requires the tokio io-util and macros features for the
build-command output piping.

Rescue of #1515 (Harsh-Sahu43) -- reimplemented against current main
(7+ weeks of build/daemon drift since the PR was opened) using the
original design as reference. Author marked tests + docs as TODO; this
adds both.

Co-Authored-By: Harsh-Sahu43 <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d race

Pre-landing review of #1820 caught a critical race condition in the
original PR design (#1515) that the rescue inherited.

Builder.base_working_dir is shared across all nodes (the CLI clones the
Builder per task in binaries/cli/src/command/build/local.rs:63). When
multiple Python nodes share a working_dir -- which they do whenever:

  - the dataflow has 2+ runtime nodes with Python operators, OR
  - the dataflow has 2+ custom Python nodes without git sources

...managed_python_env_dir computed the same `<base>/.dora/python-env`
for all of them. Under parallel builds (the --mode parallel flow at
local.rs:83), two concurrent prepare_managed_python_env calls would
both call `uv venv --clear` against the SAME directory, silently
clobbering each other's envs and producing nondeterministic build
artifacts.

Fix: include node.id in the path. Each node now gets
`<working-dir>/.dora/python-envs/<node-id>/` regardless of whether the
working_dir is shared. The spawn-side already re-derives the path
through the same function, so spawn-time interpreter lookup stays
consistent automatically.

Added a regression test
(env_dir_includes_node_id_so_parallel_builds_do_not_race) that asserts
two nodes sharing a working_dir get distinct env dirs.

Docs updated (docs/cli.md, guide/src/operations/cli.md) to show the
new path layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dora build --uv` and the managed Python env flow (#1820,
`<working-dir>/.dora/python-envs/<node-id>/`) depend on the `uv`
binary being on PATH. Without `uv`, the build fails with
`failed to spawn `uv venv ...`` which doesn't point users at the
install command.

Add a `check_uv()` to `dora doctor` that runs `uv --version`:

  - PASS: `uv: uv 0.10.8 (...)` with the version string
  - WARN: not installed; advises the install command
  - WARN: present but `--version` exited non-zero

WARN, not FAIL, because pure Rust/C++ users do not need `uv` and a
working dora install without `uv` is still healthy -- just can't run
Python dataflows under `--uv`. This mirrors the WARN-not-FAIL choice
for the /dev/shm check (#1818).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from pre-landing review of #1820:

Finding 1: silent fallback to ambient uv when build artifacts lack a
managed env for a node that needs one. This happens with a stale session
file, a prior non---uv build, or `dora start --uv` without rebuilding.
The spawn code at lib.rs:2413 passed None to spawn_node and the
spawn-side fell back to `uv run python` against the ambient environment.
For .py nodes with `build:` and runtime Python operators this silently
loses the isolation the user explicitly asked for by passing --uv.

Fix: at lib.rs spawn dispatch, when uv=true and python_env_dirs has no
entry for the node, compute the expected managed env dir. If the node
would need one, bail with a clear error pointing the user at
`dora build --uv` (or omit --uv).

Finding 2: spawn-side ran the managed interpreter directly but did not
set VIRTUAL_ENV or prepend the venv's bin/ to PATH. Python imports
resolved correctly (the interpreter was the managed one) but
subprocesses (subprocess.run([..]), python, pip, console scripts) and
`python -m pip` inside the node still resolved from the ambient
environment. Not actually hermetic.

Fix: new apply_managed_python_runtime_env helper in spawner.rs sets
VIRTUAL_ENV and composes PATH (managed bin first, then node.env's PATH
if any, then ambient). Called from both Custom Python spawn (Custom +
node.build.is_some()) and Runtime Python spawn paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heyong4725
Copy link
Copy Markdown
Collaborator Author

Both findings addressed in 54f126d. Reviewer was right on both — silent degradation in the missing-env case + the runtime not actually being hermetic.

Finding 1 — silent fallback to ambient uv

binaries/daemon/src/lib.rs (the spawn dispatch around the old line 2413) now computes the expected managed env dir for the node and bails with a clear error when:

  • uv=true, AND
  • python_env_dirs.get(&node_id) is None, AND
  • managed_python_env_dir(&node, &node_working_dir) returns Some (i.e., the node would need a managed env)

Error message points the user at the right next step:

node <node_id> is a Python node that needs a managed env under --uv, but no managed env was recorded for it during build. Re-run dora build --uv <dataflow> against the current build session, or omit --uv to run against the ambient Python.

Non-Python nodes are unaffected (the predicate function returns None for them, so the bail doesn't fire).

Finding 2 — runtime not hermetic

New apply_managed_python_runtime_env helper in binaries/daemon/src/spawn/spawner.rs:

  • Sets VIRTUAL_ENV = python_env_dir
  • Prepends <python_env_dir>/bin (or Scripts on Windows) to PATH
  • PATH composition order: managed bin first, then node.env's PATH if set, then daemon's ambient PATH

Called from both spawn paths after the existing env injection but before stdio setup:

  • Custom Python: when self.uv && python_env_dir.is_some() && n.build.is_some() — mirrors the condition in path_spawn_command that selects the managed interpreter
  • Runtime Python: when self.uv && python_env_dir.is_some() — Python runtime nodes always use the managed interpreter when one was prepared

Why both VIRTUAL_ENV and PATH (not just one): VIRTUAL_ENV is conventional metadata that some tools check; the actual resolution work is done by PATH. Setting both matches what uv itself sets when you uv run a script.

Local validation:

cargo fmt --all -- --check                                            clean
cargo clippy -p dora-core --features build -p dora-cli -p dora-daemon clean
cargo test -p dora-core --features build                              157 passing
cargo test -p dora-cli                                                145 passing
cargo test -p dora-daemon                                             32 passing

Caveat I'd flag separately: the runtime hermeticity fix is exercised end-to-end only by actually running dora start --uv with a Python node that does subprocess.run(["pip", ...]) and seeing the managed pip get invoked. No unit test for that — the env-application is straightforward (read PATH, prepend, write back), but the integration assertion would require spawning a real Python process. Worth a follow-up integration test once qa-examples gains a Python-isolation case.

Pre-landing review of #1820 found that the new fail-closed guard in
binaries/daemon/src/lib.rs rejected ALL .py custom nodes under --uv when
python_env_dirs has no entry. But:

- Build records managed envs only for Python custom nodes WITH `build:`
  (libraries/core/src/build/mod.rs Builder::build_node_inner filtered on
  `n.build.is_some()`).
- Spawn intentionally falls back to ambient `uv run python` for script-
  only Python custom nodes (binaries/daemon/src/spawn/command.rs:84,
  `python_env_dir.filter(|_| node.build.is_some())`).
- managed_python_env_dir returned Some for EVERY .py custom node, so the
  new guard at lib.rs:2419 over-fired: it would bail on script-only
  Python under `dora start --uv` / `dora run --uv`, breaking dataflows
  that worked before this PR.

Fix: pull the `build.is_some()` requirement into
node_requires_managed_python_env itself. managed_python_env_dir now
returns None for script-only Python custom nodes — matching the build
record AND the spawn-side fallback. The new guard, the build flow, and
the spawn flow now all use the same predicate; no callers need a local
`.filter(|_| n.build.is_some())` anymore.

Tests updated to cover both shapes:

- assigns_managed_env_dir_to_build_backed_python_custom_nodes (renamed
  + node yaml now includes `build:`)
- skips_managed_env_dir_for_script_only_python_custom_nodes (new
  regression test)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heyong4725
Copy link
Copy Markdown
Collaborator Author

Right, the fail-closed guard was over-fired. Fix in 40f1a7d.

Root cause — three call sites had drifted to different predicates:

Call site Predicate for "node needs managed env"
Builder::build_node_inner (Custom branch) is_python_custom_node(n) + outer `.filter(
path_spawn_command (spawn-side fallback) node.build.is_some() (implicit .py)
managed_python_env_dir just is_python_custom_node(n)
New guard at lib.rs:2419 managed_python_env_dir(...).is_some()

So managed_python_env_dir answered "is this a .py custom node?" but the actual predicate used by build+spawn was "is this a .py custom node with build:". The guard, which I built on top of managed_python_env_dir, inherited the looser predicate and over-fired.

Fix — pull build.is_some() into node_requires_managed_python_env itself:

fn node_requires_managed_python_env(node: &ResolvedNode) -> bool {
    match &node.kind {
        CoreNodeKind::Custom(custom) => {
            is_python_custom_node(custom) && custom.build.is_some()
        }
        CoreNodeKind::Runtime(runtime) => runtime
            .operators
            .iter()
            .any(|op| matches!(op.config.source, OperatorSource::Python(_))),
    }
}

managed_python_env_dir now answers the right question. The redundant .filter(|_| n.build.is_some()) at Builder::build_node_inner (mod.rs:87) was removed since it's now load-bearing on the function rather than the caller. The spawn-side filter at command.rs:84 and the new guard now both naturally match.

Behavior matrix after the fix:

Node Managed env?
Custom .py with build: Yes
Custom .py without build: (script-only) No — uses ambient uv run python
Custom non-.py No
Runtime with Python operator Yes
Runtime non-Python No

Tests added/updated:

  • assigns_managed_env_dir_to_build_backed_python_custom_nodes (renamed from assigns_managed_env_dir_to_python_custom_nodes; test node yaml now includes a build: line)
  • skips_managed_env_dir_for_script_only_python_custom_nodes — new regression test; would have caught this exact issue
  • env_dir_includes_node_id_so_parallel_builds_do_not_race updated to include build: (otherwise both test nodes would now return None and the test would have been a no-op)

Local validation:

cargo fmt --all -- --check                                            clean
cargo clippy -p dora-core --features build -p dora-cli -p dora-daemon clean
cargo test -p dora-core --features build                              158 passing (was 157)
cargo test -p dora-cli                                                145 passing
cargo test -p dora-daemon                                             32 passing

heyong4725 and others added 5 commits May 14, 2026 22:02
Audited every doc file that mentions `--uv`, Python isolation, or the
relevant CLI commands. The initial doc updates only covered the CLI
build flag in `docs/cli.md` and the mirror in
`guide/src/operations/cli.md`. Five gaps remained:

- CHANGELOG.md: no entry for the new feature. Added under Unreleased >
  Added, with separate bullets for the managed env work and the new
  `dora doctor` uv check.
- docs/yaml-spec.md + guide/src/concepts/dataflow-yaml.md: the `build:`
  field row said "pip/pip3 lines use uv when --uv is passed" -- accurate
  before but understates what `--uv` does now. Expanded to mention
  managed venvs + the on-disk layout, with a link to the CLI ref.
- docs/debugging.md: the `dora doctor` "Checks performed" list missed
  the new `uv` check. Added as item #2 between SHM and coordinator
  (matches the order checks run in).
- docs/python-guide.md: had no mention of `--uv` semantics at all. Added
  a "Reproducible Dependencies with `--uv`" section before "Next Steps"
  that lays out the four guarantees (build-time deps == runtime deps,
  no cross-node/cross-dataflow contamination, hermetic subprocesses),
  notes the script-only-Python carve-out, and cross-references the CLI
  ref and `dora doctor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in the CHANGELOG entries that the previous docs commit dropped
because the file is tracked as 'Changelog.md' in the git index while
the path on disk is 'CHANGELOG.md' (case mismatch on a case-insensitive
filesystem).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…solation

`examples/python-operator-dataflow/` has `dataflow_conda.yml` that
demonstrates per-operator Python isolation via `conda_env:`. After
PR #1820 lands, dora has a second way to do the same thing: pass
`--uv` and dora creates one managed venv per operator from its
`build:` line.

Add an "Alternative: managed venvs with --uv" subsection under
Installation in the example's README that:

- Shows `dora build --uv dataflow.yml && dora run --uv dataflow.yml`
- Notes the env layout (`<dataflow-dir>/.dora/python-envs/<node-id>/`)
- Explains when to pick which (`conda_env:` for pre-existing conda
  envs with explicit names; `--uv` for dora to provision from
  `pip install` build steps automatically)
- Cross-references `dora doctor` for the `uv` availability check

The example's existing dataflow.yml already has the per-operator
`build: pip install -r requirements.txt` lines that `--uv` consumes;
no YAML change is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
python-dataflow has two variants:

- dataflow.yml -- script-only Python nodes, no build: blocks
- dataflow_dynamic.yml -- two nodes with DIFFERENT git+ pip installs
  per node (opencv-video-capture vs opencv-plot)

The existing README claimed "Or use uv to manage the Python environment
automatically" for dataflow.yml, which overstated what --uv does in
that case: for script-only nodes the spawn just routes through
`uv run python` against the ambient uv env, not a dora-managed env.

Two updates:

1. Variants table now flags each file's build profile -- dataflow.yml as
   "script-only / runs against your ambient Python" and dataflow_dynamic.yml
   as "the natural fit for the --uv managed-env flow" (since each node
   has its own build: line with different deps).

2. New section "Per-node managed environments (--uv + build:)" uses
   dataflow_dynamic.yml as the demo: shows the build/run commands, the
   `.dora/python-envs/<node-id>/` layout, and what `dora doctor`
   guarantees.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-landing review of #1820 found that managed_python_env_dir returned
Some for any Runtime node with a Python operator, regardless of whether
that operator pins conda_env. Under --uv this caused two problems:

- Build wastefully created a uv venv at .dora/python-envs/<node-id>/
  for a node whose Python operator was going to run under `conda run`
  anyway.
- Spawn-side, my new apply_managed_python_runtime_env helper then
  overlaid VIRTUAL_ENV and prepended the uv venv's bin dir to PATH on
  top of the conda-run command. The conda interpreter would run, but
  any subprocess (subprocess.run, python -m pip, console scripts)
  would resolve from .dora/python-envs/... instead of the conda env
  the user asked for. Mixing two environment managers.

Fix in node_requires_managed_python_env: require the matched Python
operator to have conda_env.is_none(). This:

- Avoids creating wasted venvs at build time
- Returns None at spawn time -> apply_managed_python_runtime_env skips
  injection (its condition is `python_env_dir.is_some()`)
- Fail-closed guard in lib.rs sees expected = None -> does not bail
- Restart path re-derives None -> spawn falls through to conda path

Two regression tests added:

- skips_managed_env_dir_for_runtime_python_operators_with_conda_env
- assigns_managed_env_dir_for_runtime_python_operators_without_conda_env
  (confirms we didn't break the no-conda path)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heyong4725
Copy link
Copy Markdown
Collaborator Author

Good catch — the conda_env interaction was a real env-manager-collision bug. Fix in c0986b3.

Root cause: node_requires_managed_python_env (libraries/core/src/build/mod.rs) was any(|op| matches!(op.config.source, OperatorSource::Python(_))) for Runtime nodes — it didn't look at the operator's conda_env field. So a Python operator with conda_env: my-env still got a managed env dir, which meant:

  1. Build wastefully ran uv venv --clear .dora/python-envs/<node-id>/ and uv pip install dora-rs even though the runtime spawn was going to ignore that env entirely.
  2. Spawn-side, path_spawn_command correctly built conda run -n my-env python -uc "import dora..." for the operator, but my new apply_managed_python_runtime_env then injected VIRTUAL_ENV=.dora/python-envs/... and prepended .dora/python-envs/.../bin to PATH over the top. So the conda interpreter ran, but subprocess.run(["pip", ...]), console scripts, and python -m pip inside the operator would resolve from the uv venv — not the conda env the user asked for. Two env managers fighting.

Fix: require the matched Python operator's conda_env to be None for the Runtime branch of the predicate:

CoreNodeKind::Runtime(runtime) => runtime.operators.iter().any(|operator| {
    matches!(
        &operator.config.source,
        OperatorSource::Python(py) if py.conda_env.is_none()
    )
}),

Applying the fix at the predicate level fixes every downstream site in one shot:

  • Build (Builder::build_node_inner): managed_python_env_dir returns None, no venv created, no wasted disk
  • Spawn injection (apply_managed_python_runtime_env): its condition is python_env_dir.is_some(), which is now false → no VIRTUAL_ENV/PATH overlay on the conda command
  • Fail-closed guard (lib.rs): expected.is_some() is now false → no bail, conda path runs cleanly
  • Restart path: re-derives None deterministically → spawn falls through to the conda branch

Regression tests added in libraries/core/src/build/mod.rs::tests:

  • skips_managed_env_dir_for_runtime_python_operators_with_conda_env — the actual regression
  • assigns_managed_env_dir_for_runtime_python_operators_without_conda_env — confirms we didn't accidentally exclude plain Python operators too

Local validation:

cargo fmt --all -- --check                                            clean
cargo clippy -p dora-core --features build -p dora-cli -p dora-daemon clean
cargo test -p dora-core --features build                              160 passing (2 new)
cargo test -p dora-cli                                                145 passing
cargo test -p dora-daemon                                             32 passing

The pattern emerging across the last few reviews — these all came from managed_python_env_dir's predicate being a different shape than what the build/spawn flows actually expected. The predicate is now load-bearing in three places (build, spawn injection, fail-closed guard, restart) and explicitly enumerates the four real cases (custom .py + build, custom .py script-only, runtime Python + conda_env, runtime Python plain). The tests now cover all four shapes plus the parallel-build race regression.

@heyong4725
Copy link
Copy Markdown
Collaborator Author

/trunk merge

@trunk-io trunk-io Bot merged commit 6ae0fa9 into main May 15, 2026
40 of 42 checks passed
@trunk-io trunk-io Bot deleted the rescue/python-uv-managed-env branch May 15, 2026 04:01
@phil-opp
Copy link
Copy Markdown
Collaborator

phil-opp commented May 15, 2026

I reviewed the diff rather than the PR description. I see one compatibility risk and one testing gap worth considering.

Potential regression for Python operators without build:

The new managed-env predicate appears to give every runtime Python operator without conda_env: a managed venv, even when the operator has no build: command. That differs from custom .py nodes, where script-only nodes explicitly keep using the ambient uv env.

This could break existing flows like examples/python-operator-dataflow/dataflow_llm.yml, where operators use python: ... without build: and likely rely on dependencies installed in the caller's environment. After this change, dora run --uv would spawn those operators inside a fresh per-node env containing only the Dora runtime package, so imports for packages installed via requirements_llm.txt may fail.

A safer direction would be either:

  • only create managed envs for runtime Python operators that have build:, matching the custom-node behavior
  • treat this as an intentional breaking change and update docs/examples accordingly

Missing end-to-end coverage for the new contract

The added tests mostly cover path/predicate logic. Given this touches CLI build metadata, daemon spawn, runtime env injection, and Python behavior, I would expect at least one focused smoke/integration test that proves:

  • dora build --uv installs into a per-node env
  • dora run --uv reuses that interpreter
  • subprocess python/pip resolution comes from the managed env

Smaller note

dora doctor now warns about missing uv for everyone, including pure Rust/C++ users. Since it is only a warning, this is not a blocker, but the wording should stay clearly scoped to Python --uv flows to avoid noise.

(generated by codex)

heyong4725 added a commit that referenced this pull request May 17, 2026
The cli-python job's "Test Python template project" step has been
failing on every main commit since #1820 ("feat(uv): per-node
managed Python venvs at build + runtime", commit 6ae0fa9) --
5+ merged commits all red, blocking the Trunk merge queue alongside
the ros2-bridge regression that #1839 just resolved.

Root cause
==========

#1820 changed `dora build --uv` to install each node package into
its own managed venv at `.dora/python-envs/<node>/` (good for
deterministic per-node deps + no cross-node contamination). But the
`dora new --lang python` templates generated test stubs at
`tests/test_<node>.py` do:

    from <node>.main import main

The CI step pytest invocation is `uv run pytest` from the
test_python_project root, which resolves against the ambient venv
(VIRTUAL_ENV set by the earlier setup-python-uv step to the
repo-level .venv/). That venv has dora-rs + pyarrow + pytest + ruff
but no node packages -- those are in the per-node venvs. Pre-#1820,
`dora build` installed nodes into whatever venv was active, so the
imports worked.

Actual failure from CircleCI v1.1 API log on main HEAD (job 5882,
step "Test Python template project"):

    FAILED listener-1/tests/test_listener_1.py::test_import_main
      > from listener_1.main import main
      E ModuleNotFoundError: No module named 'listener_1'
    FAILED talker-1/tests/test_talker_1.py::test_import_main
      > from talker_1.main import main
      E ModuleNotFoundError: No module named 'talker_1'
    FAILED talker-2/tests/test_talker_2.py::test_import_main
      > from talker_2.main import main
      E ModuleNotFoundError: No module named 'talker_2'
    3 failed, 3 passed

Fix
===

After `dora build --uv` (which sets up the per-node venvs), also
install the three node packages into the active venv so `uv run
pytest` at the project root can resolve their imports:

    uv pip install -e listener-1 -e talker-1 -e talker-2

This preserves the per-node-venv isolation that #1820 introduced
(those venvs still drive `dora build` and `dora run`) while giving
the test pass a venv that sees all node packages. The root venv is
ephemeral and only used by this CI step, so no production
contamination.

Scope notes
===========

* CI-only fix; the `dora new` template itself is unchanged. Users
  running `pytest` in their own `dora new` projects will hit the
  same issue, but fixing that requires a template-level change
  (e.g. root pyproject.toml that path-depends on each node, or a
  documented "for testing, also pip install -e <node>/" step in
  the README). That is a separate follow-up; this PR keeps scope
  to "unblock the merge queue."
* The hardcoded node names listener-1, talker-1, talker-2 match
  the `dora new --lang python` template (see
  binaries/cli/src/template/python/mod.rs:126-130). If that
  template grows or shrinks the node set, this list needs to
  follow. A loop over `*/pyproject.toml` would be more robust but
  also more clever-shell; keeping the explicit list for clarity.

Validation
==========

Local repro is awkward (the CI step builds the whole dora CLI from
source first), but the failure mode is mechanically obvious from
the log and the fix is a one-line `uv pip install -e` of the same
packages the build step references. CircleCI will validate on this
PR.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants