fix: use math.ceil() instead of int() for n_local_cores to prevent ZeroDivisionError on single-core machines#91
Conversation
|
Hi! Could you reformat this PR using our existing PR template: https://github.com/mllam/mllam-data-prep/blob/main/.github/pull_request_template.md ? This guarantees that we follow the development workflows that we have in place. If you have questions about some parts of it feel free to ask! |
|
@joeloskarsson |
| [ | ||
| ["example.danra.yaml", "--dask-distributed-local-core-fraction", "1.0"], | ||
| ["example.danra.yaml", "--dask-distributed-local-core-fraction", "0.5"], | ||
| ["example.danra.yaml", "--dask-distributed-local-core-fraction", "0.0"], |
There was a problem hiding this comment.
How did this test pass if --dask-distributed-local-core-fraction 0.0 results in division by zero? Seems to potentially point to an issue with the test that should then be fixed here as well.
There was a problem hiding this comment.
Hello @joeloskarsson, I believe the reason the test passes is due to
args.dask_distributed_local_core_fraction > 0.0:
in the cli.py at 54th line file as when 0.0 is provided the above condition becomes False so the entire
dask block is skipped thus the test passes. But you are right I should have been more clear I will add comments for better clarity.
Thank you!
Describe your changes
When using
--dask-distributed-local-core-fractionwith a value between0and1on asingle-core machine (e.g. standard GitHub CI), the following line:
truncates towards zero. For example:
This causes a
ZeroDivisionErrorwhen computingmemory_per_worker:The fix replaces
int()withmath.ceil()to always round up to at least 1 worker:This ensures
n_local_cores >= 1regardless of the fraction or number of system cores.Changes:
cli.py— Addedimport mathand replacedint()withmath.ceil()for then_local_corescalculation.test_distributed.py— Added"0.5"as a parametrized test case to cover the fractional value scenario that triggered the bug.Testing — all 4 distributed tests pass:
Issue Link
Closes #32
Type of change
Checklist before requesting a review
pullwith--rebaseoption if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
added— when you have added new functionalitychanged— when default behaviour of the code has been changedfixes— when your contribution fixes a bugChecklist for assignee
added,changedorfixed)