Find available CPU for running ray AOI pipeline. by tcnichol · Pull Request #146 · awi-response/darts-nextgen

tcnichol · 2025-07-23T17:41:56Z

I added a few new features to the ray AOI pipeline.

Previously, more CPUs were available than the pipeline actually used; typically only a small number were in use during a run. As a first draft, I added code to make sure the pipeline uses the available CPUs. Other changes check the current network configuration, and a new method was added to utils.cuda.
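For readers following along, the CPU detection described above can be sketched roughly as follows. This is not the code from this PR; it is a minimal stdlib-only illustration. The function names (`available_cpus`, `safe_cpus`, `resolve_cpus`) and the "reserve one CPU" policy are assumptions made for the example.

```python
import os
from typing import Optional


def available_cpus() -> int:
    """Number of CPUs this process is allowed to run on."""
    try:
        # Linux: respects affinity masks set by tools such as taskset or a batch scheduler.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total CPU count.
        return os.cpu_count() or 1


def safe_cpus(available: int, reserve: int = 1) -> int:
    """CPUs to give the pipeline, keeping `reserve` free (hypothetical policy)."""
    return max(1, available - reserve)


def resolve_cpus(requested: Optional[int] = None) -> int:
    """Use the supplied value if there is one, otherwise the detected safe count."""
    if requested:
        return requested
    return safe_cpus(available_cpus())
```

Keeping "available" and "safe" as separate values makes it possible to log what the machine offers while still leaving headroom for the driver process.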

tcnichol · 2025-07-23T19:10:07Z

Note: to run this I used the cuda128 environment from the pixi.toml file. That matches our setup at NCSA.

Dockerfile

create_conda.sh

darts/src/darts/pipelines/_ray_wrapper.py

darts/src/darts/pipelines/ray_v2.py

relativityhd · 2025-07-31T15:31:19Z

darts/src/darts/pipelines/ray_v2.py

        @ray.remote
        def init_worker():
+            # Set critical CUDA variables before any imports
+            os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Or your device index


Same as above, shouldn't this be done automatically?
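As background for this question: Ray normally sets CUDA_VISIBLE_DEVICES itself for tasks and actors that request GPUs through `num_gpus`, so an unconditional assignment inside the worker can override that choice. A minimal sketch of a more defensive pattern, assuming hypothetical helper names (`visible_cuda_devices`, `ensure_cuda_devices`) that are not part of this PR, would set a default only when nothing upstream has chosen a device:

```python
import os
from typing import List, MutableMapping, Optional


def visible_cuda_devices(env: Optional[MutableMapping[str, str]] = None) -> Optional[List[str]]:
    """Return the device ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: CUDA would see every device
    return [d.strip() for d in raw.split(",") if d.strip()]


def ensure_cuda_devices(default: str = "0",
                        env: Optional[MutableMapping[str, str]] = None) -> List[str]:
    """Set CUDA_VISIBLE_DEVICES to `default` only if it is not already set."""
    env = os.environ if env is None else env
    # setdefault leaves an existing value (e.g. one assigned by the scheduler) untouched.
    env.setdefault("CUDA_VISIBLE_DEVICES", default)
    return visible_cuda_devices(env) or []
```

Passing a plain dict as `env` makes the behavior easy to check without touching the real process environment.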

darts/src/darts/pipelines/ray_v2.py

darts/src/darts/utils/cuda.py

pixi.toml

update_conda.sh

tcnichol added 7 commits July 16, 2025 14:16

this seems to be running, not done yet, but pit looks like new data i…

4a80626

…s being written

add TODO note for network check

f5955ac

add TODO note for network check

4468e74

Merge branch 'main' into ray-todd-scratch-2

8466541

add another TODO to finish before ready for review

a88daa3

add dependency netifaces

4b531ec

add method to get network interface

using value obtained for network dynamically, not hardcoded ens3

fac10cf

tcnichol requested a review from relativityhd July 23, 2025 17:41

tcnichol changed the title ~~Find Proper CPU and network interface for running in ray~~ Find available CPU for running ray AOI pipeline. Jul 23, 2025

tcnichol added 3 commits July 23, 2025 15:04

change 'available cpu' and 'safe cpu' are separate values

1fe51f9

adding more logging,

14cddbf

changes in concurrency (avoiding backpressured) adding psutil dependency

not sure these values are right, will work on this on another branch

160afae

relativityhd requested changes Jul 31, 2025

tcnichol added 16 commits August 11, 2025 14:18

deleting files

c30812f

remove psutil logging not needed

c84045b

change logging from info to debug

2b212cd

move netifaces import

84b7010

using same logger not ray wrapper logger

making logging the same, need to test and see if it shows same results

bc4aec5

if statement, for if we don't get a value for cpus or devices

d9a3e07

Merge branch 'main' into ray-todd-scratch-2

4597da9

Merge branch 'main' into ray-todd-scratch-2

0cee399

else case - use cpus and devices supplied (if any)

3510f3b

sticking with original values for concurrency

ff06d7a

commented out some things that did not work

a26614d

adding pixi instructions to read me

f74ca9e

removing unneeded logging

4bd8400

change in readme

ce4c7f5

adding commands to readme

cf76a7e

new info in readme

0e18a51

Find available CPU for running ray AOI pipeline. #146
tcnichol wants to merge 26 commits into main from ray-todd-scratch-2

2 participants
