Fix issues with PyTorch test-results (XML files) parsing and add tests #3803
Conversation
force-pushed from 998aabb to 00d4922
force-pushed from 3753ef5 to af8491e
@boegelbot please test @ jsc-zen3

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3137222772 processed
Message to humans: this is just bookkeeping information for me,
@boegelbot please test @ jsc-zen3-a100

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3137822827 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
@Flamefire Are we now just detecting more failing tests?
Not by this PR at least, maybe a prior one. Does the EC include …? And/or we could increase the max-failures to something like 5...
Yes, since the easyconfigs used by the bot are the ones in
@Flamefire I'm actually in favor of bumping the default value of max-failures to 10. That seems like a reasonable number to me, since a warning will always be printed as soon as there's a single failed test. It's pretty trivial to be more strict locally by introducing a hook that lowers it again.
If that makes sense, please open a separate trivial PR for that (just so it stands out in the changelog, and so we can add some motivation to the PR description for that change).
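For reference, a minimal sketch of such a hook (a `hooks.py` passed to EasyBuild via `--hooks`), assuming the PyTorch easyblock's `max_failed_tests` easyconfig parameter; the value is illustrative:

```python
# hypothetical hooks.py: be stricter than the shipped default for PyTorch tests
def parse_hook(ec, *args, **kwargs):
    """Lower the tolerated number of failing tests for local PyTorch builds."""
    if ec.name == 'PyTorch':
        ec['max_failed_tests'] = 0  # tolerate no failing tests locally
```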
I'd argue the other way round: It is pretty trivial to increase it to 10 ;-)
I'd just increase the number for this single EC. But we could as well change the default in the easyblock from 0 to 10 and clean up the easyconfigs.
We seem to be fine with "3" (including the current failure here) for the majority, and I'd keep the number as low as possible, even though I've been more generous with the new 2.6 EC.
Then we'd need to look at the logs to see why it failed. But 2/2030 seems to be OK to ignore.
I've launched the install of … I guess this is way more than what you are getting, right?
Indeed. Might be some real issue there. Can you attach the log of the test step? Better in the easyconfig PR to keep it sorted and visible for others in the future. |
The main difference is that it's a bit unreasonable to expect everyone to introduce a hook to slightly increase the tolerance for failing tests, as opposed to letting people who pay close attention and know how to dig into what's causing those failing tests (like you) opt in to being more strict. I really feel 10 is a pretty reasonable default; 2 or 3 seems really strict to me, especially since we know there frequently are flaky tests.
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 0 out of 1 (1 easyconfigs in total)

@Flamefire More than before?
The failing install is
Likely due to
For that test the success isn't detected after the failure is detected. (A test for which the reports contain both a success and a failure should be merged into a success result by this parsing logic.)
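A minimal sketch of that merge rule (illustrative only, not the easyblock's actual code):

```python
# If any report for a given test recorded a success, the merged result should
# be a success, even when another report recorded a failure for the same test.
def merge_results(result_a, result_b):
    """Merge two results ('success'/'failure') reported for the same test."""
    if 'success' in (result_a, result_b):
        return 'success'
    return 'failure'
```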
The code parses class names if they start with the prefix 'test.' and then trims a prefix consisting of the common part. That common part specifically excludes the 'test.' part of the prefix, which hence needs to be re-added for the match with `startswith` to work.
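Roughly, the fix amounts to something like the following (hypothetical names, not the actual easyblock code):

```python
PREFIX = 'test.'

def strip_common_prefix(classname, common_prefix):
    """Strip the shared prefix from a reported class name, or return None."""
    if not classname.startswith(PREFIX):
        return None
    # the common part was computed without the leading 'test.', so it has to be
    # re-added here for the startswith() comparison to match
    full_prefix = PREFIX + common_prefix
    if classname.startswith(full_prefix):
        return classname[len(full_prefix):]
    return None
```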
Select some test reports from real runs, trim them down a bit and parse them in the test. Provide a script for automatic cleanup of test reports.
We can't rely on the "tests" attribute, as a single test might be counted in both the errors and failures count attributes. Seen in the provided test case: the 2nd, duplicated <testcase> contained an <error message="failed on teardown with "AssertionError: Scalars are not equal![...] after the first one contained a `<failure>`.
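A sketch of the more robust counting, based only on the <testcase> children of a JUnit-style report (illustrative, not the easyblock's code):

```python
import xml.etree.ElementTree as ET

def count_failed_testcases(xml_path):
    """Count failed tests without trusting the suite-level count attributes."""
    failed = set()
    for testcase in ET.parse(xml_path).getroot().iter('testcase'):
        # a duplicated <testcase> with both a <failure> and an <error> entry
        # still counts as only one failed test
        if testcase.find('failure') is not None or testcase.find('error') is not None:
            failed.add((testcase.get('classname'), testcase.get('name')))
    return len(failed)
```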
Current failures include:
> Parsing the test result files missed the following failed suites: distributed/algorithms/quantization/test_quantization
The suite name as contained in the XML results is:
> dist-nccl/distributed/algorithms/quantization/test_quantization
So if the suite name isn't found as-is (fast, due to dict hashing), also check for the name without the variant (rare). To avoid false positives, limit this to variants starting with `dist-`.
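The fallback lookup could look roughly like this (hypothetical helper; the real code lives in the easyblock):

```python
def find_suite(suite_name, known_suites):
    """Look up a reported suite name, tolerating a 'dist-*' variant prefix."""
    if suite_name in known_suites:  # fast path: plain dict lookup
        return known_suites[suite_name]
    variant, _, rest = suite_name.partition('/')
    # only strip variants starting with 'dist-' to avoid false positives
    if variant.startswith('dist-') and rest in known_suites:
        return known_suites[rest]
    return None
```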
…dependency options
Most of the options have a True/False value which we should set to False/0 when we don't have or don't use that dependency. This ensures that a) no system lib will be found and b) no warning will be shown. Also update the list with options added or removed up to PyTorch 2.7.
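As an illustration of the intent (the actual option list lives in the easyblock; `USE_NCCL`/`USE_MKLDNN` are just examples of PyTorch build switches):

```python
import os

def disable_missing_deps(dep_options, provided_deps):
    """Set PyTorch USE_* switches to 0 for dependencies that aren't provided."""
    for dep in dep_options:
        if dep not in provided_deps:
            # ensures no system library is picked up and no warning is printed
            os.environ['USE_%s' % dep.upper()] = '0'

disable_missing_deps(['NCCL', 'MKLDNN'], provided_deps={'MKLDNN'})
```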
As PyTorch is sensitive to specific NCCL versions, one approach is to use NCCL as a build dependency only, and add an rpath to it after copying it into a (non-standard) folder inside the PyTorch module. This is similar to the PyPI package, which depends on various nvidia-* packages and adds relative rpaths to ensure they are used when loading the torch package/libraries.
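A rough sketch of that approach, assuming a patchelf version that supports `--add-rpath`; all paths and library names below are illustrative:

```python
import os
import shutil
import subprocess

def bundle_nccl(nccl_libdir, torch_libdir):
    """Copy libnccl next to the torch libs and point them at it via an RPATH."""
    private_dir = os.path.join(torch_libdir, 'nccl')
    os.makedirs(private_dir, exist_ok=True)
    for lib in os.listdir(nccl_libdir):
        if lib.startswith('libnccl.so'):
            shutil.copy2(os.path.join(nccl_libdir, lib), private_dir)
    # add an $ORIGIN-relative RPATH so the bundled copy is found at load time
    for lib in ('libtorch_cuda.so', 'libtorch_python.so'):
        target = os.path.join(torch_libdir, lib)
        subprocess.run(['patchelf', '--add-rpath', '$ORIGIN/nccl', target], check=True)
```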
This reverts commit 906d8cf.
force-pushed from 499dc76 to a986775
Some test results are missing all tags except for 'time'. Just ignore those.
PyTorch reruns single tests by skipping the portions of the test file before them. If those other tests don't succeed, the parser will error out during merging, as it will see a test that was both skipped and failed. Handle that by ignoring the skipped test result during the merge.
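Extending the merge sketch from above (still illustrative, not the actual code):

```python
def merge_results(result_a, result_b):
    """Merge two results for the same test, ignoring rerun-induced skips."""
    results = {result_a, result_b} - {'skipped'}  # drop skipped rerun entries
    if not results:
        return 'skipped'
    if 'success' in results:
        return 'success'
    return 'failure'
```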
In PyTorch, `test_testing.py` runs a subtest via Python code, i.e. as `python -c`. This shows up in the test report path and as a missing `file` attribute on the <testcase> tag. `determine_suite_name` then fails in `reported_file = os.path.basename(file_attribute.pop())` with
> KeyError: 'pop from an empty set'
Simply ignore those.
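The guard amounts to something like this (sketch only; `determine_suite_name` itself belongs to the easyblock):

```python
import os

def reported_file_or_none(file_attributes):
    """Return the basename of the reported file, or None if none was reported."""
    if not file_attributes:
        # e.g. a subtest executed via 'python -c' carries no 'file' attribute
        return None
    return os.path.basename(next(iter(file_attributes)))
```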
@boegelbot please test @ jsc-zen3-a100

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3552321124 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 8 hours 35 mins 43 secs) (1 easyconfigs in total)
force-pushed from e56a39c to bcac200
Select some test reports from real runs, trim them down a bit and parse them in the test.
Provide a script for automatic cleanup of test reports:
The script is fully deterministic, so rerunning it won't change the files. The idea is to have realistic-looking files for debugging, keep enough reference to the original files (e.g. just trim, don't replace filenames), and get the file size down.
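The cleanup could look roughly like this (hypothetical sketch, not the actual script): keep only the first few testcases per suite and truncate long messages, so rerunning it on its own output is a no-op:

```python
import xml.etree.ElementTree as ET

MAX_TESTCASES = 3
MAX_MESSAGE_LEN = 200

def trim_report(path):
    """Deterministically shrink a JUnit-style XML test report in place."""
    tree = ET.parse(path)
    for suite in tree.getroot().iter('testsuite'):
        # drop everything beyond the first few testcases, keeping original names
        for extra in list(suite.findall('testcase'))[MAX_TESTCASES:]:
            suite.remove(extra)
        # shorten long failure/error tracebacks
        for node in suite.iter():
            if node.tag in ('failure', 'error') and node.text:
                node.text = node.text[:MAX_MESSAGE_LEN]
    tree.write(path, encoding='utf-8', xml_declaration=True)
```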
The test files are mostly randomly selected, with a few inclusions and modifications to cover (almost) all of the parsing code and reproduce past failures (tested with the initial version of the parsing code).
I also created a few artificial files to run into most error conditions checked by the parser.
In a recent run I discovered a too-strict condition (tests can be skipped before/after rerunning), which I fixed and included a test case for.
@boegel what do you think about this? I could try reducing the number of test files, but I think the current 30 files aren't too many given they are all text only.