avoid crash in test step of PyTorch easyblock if runtest is not a command #2883

akesandgren · 2023-02-09T07:47:19Z

(created using eb --new-pr)

Flamefire · 2023-02-09T08:13:52Z

The changeset is huge and it is not exactly clear what is going on here. What is the use case? Why/when would runtest not be a string?

How about changing tests_out, tests_ec = super(EB_PyTorch, self).test_step(return_output_ec=True) to:

test_result = super(EB_PyTorch, self).test_step(return_output_ec=True)
if test_result is None:
    # Test skipped
    return
tests_out, tests_ec = test_result

And maybe log a message instead of the "Test skipped" comment.
This leaves the decision of whether a test should be run to the super-test_step and only handles the case where it isn't run.
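A minimal, self-contained sketch of that early return, for illustration only. FakeParent and FakePyTorch are hypothetical stand-ins (not EasyBuild classes), and the rule that the parent returns None unless runtest is a non-empty string is an assumption made for the sketch, not the parent easyblock's actual logic.

```python
class FakeParent:
    """Hypothetical stand-in for the parent easyblock; not EasyBuild code."""

    def __init__(self, runtest):
        self.cfg = {'runtest': runtest}

    def test_step(self, return_output_ec=False):
        # Assumption: tests only run when runtest is a non-empty string;
        # in every other case nothing is run and None is returned.
        runtest = self.cfg['runtest']
        if isinstance(runtest, str) and runtest and return_output_ec:
            return "ran: " + runtest, 0
        return None


class FakePyTorch(FakeParent):
    """Hypothetical stand-in for EB_PyTorch."""

    def test_step(self):
        test_result = super().test_step(return_output_ec=True)
        if test_result is None:
            # Early return: without this check, unpacking None below
            # would raise a TypeError.
            print("Tests skipped")
            return None
        tests_out, tests_ec = test_result
        return tests_out, tests_ec
```

With this shape, runtest = False and runtest = True both take the early return instead of crashing on the tuple unpacking.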

But again: Why would you want that given that we have --ignore-test-failure and --skip-test-step?

akesandgren · 2023-02-09T08:16:26Z

Because someone might get it into their head (like me) to set runtest = False, which would then result in a crash...
So it's mostly about making sure that bad things don't happen unexpectedly.

Flamefire · 2023-02-09T08:20:51Z

> Because someone might get it into their head (like me) to set runtest = False, which would then result in a crash... So it's mostly about making sure that bad things don't happen unexpectedly.

Then maybe add an if self.cfg['runtest'] is False check at the very top of the function to make this explicit, but IMO the proposed solution is better. Or raise a clear error when it is not a string_type.

But I'd really use an early return here. Avoids all that indentation and complications

akesandgren · 2023-02-09T08:28:47Z

Well runtest = True also crashes, but the
test_result = super(EB_PyTorch, self).test_step(return_output_ec=True)
version is probably easier to get right.

Flamefire · 2023-02-09T08:35:53Z

> Well runtest = True also crashes,

True (no pun intended). Then maybe, instead of logging "Tests skipped", we should simply throw an EasyBuildError if the result is None, with some handling of the potential values. E.g. runtest=False -> refer to --skip-test-step; runtest set (if runtest:) -> likely an invalid format; otherwise -> unknown error in the super-method.
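The three cases listed above could be sketched roughly as follows. BuildError is a hypothetical stand-in for EasyBuildError so the snippet runs without EasyBuild installed, the helper names are made up for illustration, and the message wording is only a guess at what such errors might say.

```python
class BuildError(Exception):
    """Hypothetical stand-in for EasyBuildError (keeps the sketch standalone)."""


def missing_result_message(runtest):
    """Pick an error message for the case where the parent test step returned None."""
    if runtest is False:
        # Explicit False: point the user at the supported way to skip tests.
        return "Do not set runtest to False, use --skip-test-step instead."
    if runtest:
        # Truthy but the tests still did not run: probably not a usable command.
        return "Tests did not run; runtest is set but likely has an invalid format."
    # None, empty string, etc.: nothing more specific can be said.
    return "Tests did not run for an unknown reason in the parent test step."


def check_test_result(test_result, runtest):
    """Raise instead of silently skipping when no (output, exit code) pair came back."""
    if test_result is None:
        raise BuildError(missing_result_message(runtest))
    return test_result
```

Checking `runtest is False` first matters: it keeps the one case with a definite answer separate from the best-guess messages.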

akesandgren · 2023-02-09T09:59:06Z

Wouldn't it be better to pick up on this as early as possible... i.e. at init time or so...

Flamefire · 2023-02-09T11:24:30Z

> Wouldn't it be better to pick up on this as early as possible... i.e. at init time or so...

Not sure if we really want to duplicate the logic. E.g. the super-test_step does some checking e.g. against self.testcmd and only then decides whether to run the tests. And all official ECs have this right anyway. So I'd just improve the error, not preventatively avoid it.
My thinking is that if we change e.g. the conditions in (super-)test_step, or set self.testcmd somewhere in the PyTorch easyblock, we could forget to update the error handling in __init__, which could then become plain wrong. Hence I like the Python mantra "Easier to ask for forgiveness than permission": handle the error case if and when it appears, rather than trying to prevent it up front in case someone broke something.
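To illustrate the drift concern, here is a small sketch in which an up-front check duplicates the parent's rule and ends up disagreeing with it. Everything here is hypothetical: Parent, precheck and run_and_handle are made-up names, and the rule that testcmd takes precedence over runtest is an assumption for the example, not the real behaviour of the parent easyblock.

```python
class Parent:
    """Hypothetical parent; the only place that knows when tests really run."""

    def __init__(self, runtest, testcmd=None):
        self.runtest = runtest
        self.testcmd = testcmd  # assumption: may be set independently of runtest

    def test_step(self):
        # Assumed rule: an explicit testcmd wins, else a string runtest is used.
        cmd = self.testcmd or (self.runtest if isinstance(self.runtest, str) else None)
        return ("ran: " + cmd, 0) if cmd else None


def precheck(runtest):
    """Permission-style check; duplicates the parent's rule and can drift from it."""
    return isinstance(runtest, str)


def run_and_handle(block):
    """Forgiveness-style handling: run first, then react to the actual outcome."""
    result = block.test_step()
    if result is None:
        raise ValueError("Tests did not run. Make sure `runtest` is set to a command.")
    return result
```

For Parent(runtest=True, testcmd="pytest"), precheck rejects the configuration even though the tests would run, while run_and_handle only complains when nothing actually ran.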

akesandgren · 2023-02-09T11:33:28Z

Ok, makes sense. Just tired of waiting 30+ min for the build before the code hits the problem :-)

akesandgren · 2023-02-09T12:19:25Z

@Flamefire something like this?

Flamefire

Yes almost. I'd invert the condition though such that the "False" message is correct and the other a best guess.

Flamefire · 2023-02-09T13:19:15Z

easybuild/easyblocks/p/pytorch.py

+            if self.cfg['runtest'] or self.cfg['runtest'] is None:
+                msg = "runtest must be set to a command to run."
+            else:
+                msg = "Do not set runtest to False, use --skip-test-step instead."


Suggested change

-            if self.cfg['runtest'] or self.cfg['runtest'] is None:
-                msg = "runtest must be set to a command to run."
-            else:
-                msg = "Do not set runtest to False, use --skip-test-step instead."
+            if self.cfg['runtest'] is False:
+                msg = "Do not set runtest to False, use --skip-test-step instead."
+            else:
+                msg = "Tests did not run. Make sure `runtest` is set to a command."

changed it around now.

boegel

lgtm

boegel · 2023-03-03T03:42:23Z

Test report by @boegel

Overview of tested easyconfigs (in order)

FAIL (build issue) PyTorch-1.12.1-foss-2022a.eb (partial log available at https://gist.github.com/9121f45c935850102c496e262d160d51)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3118.skitty.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/bddfff9d5a0629e1915c2ed89e583698 for a full test report.

boegel · 2023-03-03T09:36:47Z

The fact that some PyTorch tests (still) fail is irrelevant to the changes in this PR, so I'll go ahead and merge this...

void pytorch test_step crashing if runtest is not a command

84c5610

akesandgren changed the title ~~void pytorch test_step crashing if runtest is not a command~~ avoid pytorch test_step crashing if runtest is not a command Feb 9, 2023

test return from super(EB_PyTorch, self).test_step and handle the result instead of trying to avoid the problem explicitly.

0a8ebd2

Flamefire suggested changes Feb 9, 2023

reverse test of runtest and clarify error message.

8fff3f6

Flamefire approved these changes Feb 9, 2023

akesandgren requested a review from boegel February 9, 2023 16:09

boegel changed the title ~~avoid pytorch test_step crashing if runtest is not a command~~ avoid crash in test step of PyTorch easyblock if runtest is not a command Feb 15, 2023

boegel added the bug fix label Feb 15, 2023

boegel added this to the next release (4.7.1?) milestone Feb 15, 2023

boegel approved these changes Mar 2, 2023

boegel merged commit 5b64b4c into easybuilders:develop Mar 3, 2023

akesandgren deleted the 20230209084122_new_pr_pytorch branch March 3, 2023 10:07
