Skip to content

Conversation

@akesandgren
Copy link
Contributor

@akesandgren akesandgren commented Feb 9, 2023

(created using eb --new-pr)

@Flamefire
Copy link
Contributor

The changeset is huge and it is not exactly clear what is going on here. What is the use case? Why/when would run_test be not a string?

How about changinging tests_out, tests_ec = super(EB_PyTorch, self).test_step(return_output_ec=True) to:

test_result = super(EB_PyTorch, self).test_step(return_output_ec=True)
if test_result is None:
    # Test skipped
    return
tests_out, tests_ec = test_result

And maybe log a message instead of the "Test skipped" comment.
This keeps all the handling IF a test should be run to the super-test_step and only handles the case where it isn't run

But again: Why would you want that given that we have --ignore-test-failure and --skip-test-step?

@akesandgren
Copy link
Contributor Author

Because someone might get into their heads (like me) to set runtest = False which would then result in a crash...
So it's mostly about making sure that bad things don't happen unexpectedly.

@Flamefire
Copy link
Contributor

Because someone might get into their heads (like me) to set runtest = False which would then result in a crash... So it's mostly about making sure that bad things don't happen unexpectedly.

Then maybe add to the very top of the function a if self.cfg['runtest'] is False to make this explicit but IMO the proposed solution is better. Or a clear error when it is not a string_type.

But I'd really use an early return here. Avoids all that indentation and complications

@akesandgren
Copy link
Contributor Author

Well runtest = True also crashes, but the
test_result = super(EB_PyTorch, self).test_step(return_output_ec=True)
version is probably easier to get right.

@Flamefire
Copy link
Contributor

Well runtest = True also crashes,

True (no pun intended). Then maybe instead of logging "Tests skipped" we should simply throw an EasyBuildError if the result is None maybe with some handling of the potential values. E.g. runtest=False -> Refer to --skip-test-step, runtest set (if runtest:) -> Likely invalid format, otherwise unknown error in the super-method

@akesandgren
Copy link
Contributor Author

Wouldn't it be better to pick up on this as early as possible... i.e. at init time or so...

@Flamefire
Copy link
Contributor

Wouldn't it be better to pick up on this as early as possible... i.e. at init time or so...

Not sure if we really want to duplicate the logic. E.g. the super-test_step does some checking e.g. against self.testcmd and only then decides whether to run the tests. And all official ECs have this right anyway. So I'd just improve the error, not preventatively avoid it.
My thinking is that if we change e.g. the conditions in (super-)test_step or set self.testcmd somewhere in pytorch we could miss that error handling in __init__ which could become plain wrong. Hence I like the Python mantra: "Better ask for forgiveness than permission" --> Handle the error case if and when it appears. Not avoid it in case someone broke something.

@akesandgren
Copy link
Contributor Author

Ok, makes sense. just tired of waiting for +30min of building before the code hits the problem :-)

@akesandgren akesandgren changed the title void pytorch test_step crashing if runtest is not a command avoid pytorch test_step crashing if runtest is not a command Feb 9, 2023
…ult instead of trying to avoid the problem explicitly.
@akesandgren
Copy link
Contributor Author

@Flamefire something like this?

Copy link
Contributor

@Flamefire Flamefire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes almost. I'd invert the condition though such that the "False" message is correct and the other a best guess.

Comment on lines 271 to 274
if self.cfg['runtest'] or self.cfg['runtest'] is None:
msg = "runtest must be set to a command to run."
else:
msg = "Do not set runtest to False, use --skip-test-step instead."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.cfg['runtest'] or self.cfg['runtest'] is None:
msg = "runtest must be set to a command to run."
else:
msg = "Do not set runtest to False, use --skip-test-step instead."
if self.cfg['runtest'] is False:
msg = "Do not set runtest to False, use --skip-test-step instead."
else:
msg = "Tests did not run. Make sure `runtest` is set to a command."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it around now.

@akesandgren akesandgren requested a review from boegel February 9, 2023 16:09
@boegel boegel changed the title avoid pytorch test_step crashing if runtest is not a command avoid crash in test step of PyTorch easyblock if runtest is not a command Feb 15, 2023
@boegel boegel added the bug fix label Feb 15, 2023
@boegel boegel added this to the next release (4.7.1?) milestone Feb 15, 2023
Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boegel
Copy link
Member

boegel commented Mar 3, 2023

Test report by @boegel

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3118.skitty.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/bddfff9d5a0629e1915c2ed89e583698 for a full test report.

@boegel
Copy link
Member

boegel commented Mar 3, 2023

The fact that some PyTorch tests (still) fail is irrelevant to the changes in this PR, so I'll go ahead and merge this...

@boegel boegel merged commit 5b64b4c into easybuilders:develop Mar 3, 2023
@akesandgren akesandgren deleted the 20230209084122_new_pr_pytorch branch March 3, 2023 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants