Fix issues with PyTorch test-results (XML files) parsing and add tests #3803
Conversation
force-pushed from 998aabb to 00d4922
force-pushed from 3753ef5 to af8491e
@boegelbot please test @ jsc-zen3

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3137222772 processed
Message to humans: this is just bookkeeping information for me,
@boegelbot please test @ jsc-zen3-a100

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3137822827 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
@Flamefire Are we now just detecting more failing tests?
Not by this PR at least, maybe a prior one. Does the EC include …? And/or we could increase the max-failures to something like 5...
Yes, since the easyconfigs used by the bot are the ones in
@Flamefire I'm actually in favor of bumping the default value of max-failures to 10. That seems like a reasonable number to me, since a warning will always be printed as soon as there's a single failed test. It's pretty trivial to be more strict locally by introducing a hook that lowers it again.
If that makes sense, please open a separate trivial PR for that (just so it stands out in the changelog, and so we can add some motivation to the PR description for that change).
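For reference, a minimal sketch of such a hook (a `hooks.py` passed to EasyBuild via `--hooks`), assuming the PyTorch easyblock's `max_failed_tests` easyconfig parameter; the value is illustrative:

```python
# hypothetical hooks.py: be stricter than the shipped default for PyTorch tests
def parse_hook(ec, *args, **kwargs):
    """Lower the tolerated number of failing tests for local PyTorch builds."""
    if ec.name == 'PyTorch':
        ec['max_failed_tests'] = 0  # tolerate no failing tests locally
```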
I'd argue the other way round: It is pretty trivial to increase it to 10 ;-)
I'd just increase the number for this single EC. But we could as well change the default in the easyblock from 0 to 10 and clean up the easyconfigs.
We seem to be fine with "3" (including the current failure here) for the majority, and I'd keep the number as low as possible, even though I've been more generous with the new 2.6 EC.
Then we'd need to look at the logs to see why it failed. But 2/2030 seems to be OK to ignore.
I've launched the install of … I guess this is way more than what you are getting, right?
Indeed. Might be some real issue there. Can you attach the log of the test step? Better in the easyconfig PR to keep it sorted and visible for others in the future. |
The main difference is that it's a bit unreasonable to expect everyone to introduce a hook to slightly increase the tolerance for failing tests, as opposed to letting people who pay close attention and know how to dig into what's causing those failing tests (like you) opt in to being more strict. I really feel 10 is a pretty reasonable default; 2 or 3 seems really strict to me, especially since we know there frequently are flaky tests.
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 0 out of 1 (1 easyconfigs in total)

@Flamefire More than before?
The failing install is
Likely due to
For that test the success isn't detected after the failure is detected. (A test for which the reports contain both a success and a failure should be merged into a success result by this parsing logic.)
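A minimal sketch of that merge rule (illustrative only, not the easyblock's actual code):

```python
# If any report for a given test recorded a success, the merged result should
# be a success, even when another report recorded a failure for the same test.
def merge_results(result_a, result_b):
    """Merge two results ('success'/'failure') reported for the same test."""
    if 'success' in (result_a, result_b):
        return 'success'
    return 'failure'
```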
The code parses class names if they start with the prefix 'test.' and then trims a prefix consisting of the common part. That common part specifically excludes the 'test.' part of the prefix, which hence needs to be re-added for the match with `startswith` to work.
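Roughly, the fix amounts to something like the following (hypothetical names, not the actual easyblock code):

```python
PREFIX = 'test.'

def strip_common_prefix(classname, common_prefix):
    """Strip the shared prefix from a reported class name, or return None."""
    if not classname.startswith(PREFIX):
        return None
    # the common part was computed without the leading 'test.', so it has to be
    # re-added here for the startswith() comparison to match
    full_prefix = PREFIX + common_prefix
    if classname.startswith(full_prefix):
        return classname[len(full_prefix):]
    return None
```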
Select some test reports from real runs, trim them down a bit and parse them in the test. Provide a script for automatic cleanup of test reports.
We can't rely on the "tests" attribute, as a single test might be counted in both the errors and failures count attributes. Seen in the provided test case: the 2nd, duplicated <testcase> contained an <error message="failed on teardown with "AssertionError: Scalars are not equal![...] after the first one contained a `<failure>`.
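A sketch of the more robust counting, based only on the <testcase> children of a JUnit-style report (illustrative, not the easyblock's code):

```python
import xml.etree.ElementTree as ET

def count_failed_testcases(xml_path):
    """Count failed tests without trusting the suite-level count attributes."""
    failed = set()
    for testcase in ET.parse(xml_path).getroot().iter('testcase'):
        # a duplicated <testcase> with both a <failure> and an <error> entry
        # still counts as only one failed test
        if testcase.find('failure') is not None or testcase.find('error') is not None:
            failed.add((testcase.get('classname'), testcase.get('name')))
    return len(failed)
```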
Current failures include:
> Parsing the test result files missed the following failed suites: distributed/algorithms/quantization/test_quantization
The suite name as contained in the XML results is:
> dist-nccl/distributed/algorithms/quantization/test_quantization
So if the suite name isn't found as-is (fast, due to dict hashing), also check for the name without the variant (rare). To avoid false positives, limit this to variants starting with `dist-`.
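The fallback lookup could look roughly like this (hypothetical helper; the real code lives in the easyblock):

```python
def find_suite(suite_name, known_suites):
    """Look up a reported suite name, tolerating a 'dist-*' variant prefix."""
    if suite_name in known_suites:  # fast path: plain dict lookup
        return known_suites[suite_name]
    variant, _, rest = suite_name.partition('/')
    # only strip variants starting with 'dist-' to avoid false positives
    if variant.startswith('dist-') and rest in known_suites:
        return known_suites[rest]
    return None
```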
…dependency options
Most of the options have a True/False value which we should set to False/0 when we don't have or don't use that dependency. This ensures that a) no system lib will be found and b) no warning will be shown. Also update the list with options added or removed up to PyTorch 2.7.
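As an illustration of the intent (the actual option list lives in the easyblock; `USE_NCCL`/`USE_MKLDNN` are just examples of PyTorch build switches):

```python
import os

def disable_missing_deps(dep_options, provided_deps):
    """Set PyTorch USE_* switches to 0 for dependencies that aren't provided."""
    for dep in dep_options:
        if dep not in provided_deps:
            # ensures no system library is picked up and no warning is printed
            os.environ['USE_%s' % dep.upper()] = '0'

disable_missing_deps(['NCCL', 'MKLDNN'], provided_deps={'MKLDNN'})
```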
As PyTorch is sensitive to specific NCCL versions, one approach is to use NCCL as a build dependency only, and add an rpath to it after copying it into a (non-standard) folder inside the PyTorch module. This is similar to the PyPI package, which depends on various nvidia-* packages and adds relative rpaths to ensure they are used when loading the torch package/libraries.
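A rough sketch of that approach, assuming a patchelf version that supports `--add-rpath`; all paths and library names below are illustrative:

```python
import os
import shutil
import subprocess

def bundle_nccl(nccl_libdir, torch_libdir):
    """Copy libnccl next to the torch libs and point them at it via an RPATH."""
    private_dir = os.path.join(torch_libdir, 'nccl')
    os.makedirs(private_dir, exist_ok=True)
    for lib in os.listdir(nccl_libdir):
        if lib.startswith('libnccl.so'):
            shutil.copy2(os.path.join(nccl_libdir, lib), private_dir)
    # add an $ORIGIN-relative RPATH so the bundled copy is found at load time
    for lib in ('libtorch_cuda.so', 'libtorch_python.so'):
        target = os.path.join(torch_libdir, lib)
        subprocess.run(['patchelf', '--add-rpath', '$ORIGIN/nccl', target], check=True)
```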
This reverts commit 906d8cf.
force-pushed from 499dc76 to a986775
Some test results are missing all tags except for 'time'. Just ignore those.
PyTorch reruns single tests by skipping the portions of the test file before them. If those other tests don't succeed, the parser will error out during merging, as it will see a test that was both skipped and failed. Handle that by ignoring the skipped test result during the merge.
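Extending the merge sketch from above (still illustrative, not the actual code):

```python
def merge_results(result_a, result_b):
    """Merge two results for the same test, ignoring rerun-induced skips."""
    results = {result_a, result_b} - {'skipped'}  # drop skipped rerun entries
    if not results:
        return 'skipped'
    if 'success' in results:
        return 'success'
    return 'failure'
```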
In PyTorch, `test_testing.py` runs a subtest via Python code, i.e. as `python -c`. This shows up in the test report path and as a missing `file` attribute on the <testcase> tag. `determine_suite_name` then fails in `reported_file = os.path.basename(file_attribute.pop())` with
> KeyError: 'pop from an empty set'
Simply ignore those.
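The guard amounts to something like this (sketch only; `determine_suite_name` itself belongs to the easyblock):

```python
import os

def reported_file_or_none(file_attributes):
    """Return the basename of the reported file, or None if none was reported."""
    if not file_attributes:
        # e.g. a subtest executed via 'python -c' carries no 'file' attribute
        return None
    return os.path.basename(next(iter(file_attributes)))
```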
@boegelbot please test @ jsc-zen3-a100

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3552321124 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 8 hours 35 mins 43 secs) (1 easyconfigs in total)
force-pushed from e56a39c to bcac200
Select some test reports from real runs, trim them down a bit and parse them in the test.
Provide a script for automatic cleanup of test reports:
The script is fully deterministic, so rerunning it won't change the files. The idea is to have realistic-looking files for debugging, keep enough reference to the original files (e.g. just trim, don't replace filenames), and get the file size down.
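The cleanup could look roughly like this (hypothetical sketch, not the actual script): keep only the first few testcases per suite and truncate long messages, so rerunning it on its own output is a no-op:

```python
import xml.etree.ElementTree as ET

MAX_TESTCASES = 3
MAX_MESSAGE_LEN = 200

def trim_report(path):
    """Deterministically shrink a JUnit-style XML test report in place."""
    tree = ET.parse(path)
    for suite in tree.getroot().iter('testsuite'):
        # drop everything beyond the first few testcases, keeping original names
        for extra in list(suite.findall('testcase'))[MAX_TESTCASES:]:
            suite.remove(extra)
        # shorten long failure/error tracebacks
        for node in suite.iter():
            if node.tag in ('failure', 'error') and node.text:
                node.text = node.text[:MAX_MESSAGE_LEN]
    tree.write(path, encoding='utf-8', xml_declaration=True)
```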
The test files are mostly randomly selected, with a few inclusions and modifications to cover (almost) all of the parsing code and reproduce past failures (tested with the initial version of the parsing code).
I also created a few artificial files to run into most error conditions checked by the parser.
In a recent run I discovered a too-strict condition (tests can be skipped before/after rerunning), which I fixed and included a test case for.
@boegel what do you think about this? I could try reducing the number of test files, but I think the current 30 files aren't too many given they are all text only.