Conversation

@Flamefire
Contributor

Select some test reports from real runs, trim them down a bit and parse them in the test.

Provide a script for automatic cleanup of test reports:

  • Shorten unused values and filenames
  • Trim or remove stdout/stderr
  • Remove whitespace
  • Format XML

The script is fully deterministic, so rerunning it won't change the files. The idea is to have realistic-looking files for debugging, keep enough of a reference to the original files (e.g. filenames are just trimmed, not replaced), and get the file size down.

The test files are mostly randomly selected, with a few inclusions and modifications to cover (almost) all of the parsing code and reproduce past failures (tested against the initial version of the parsing code).
I also created a few artificial files to trigger most error conditions checked by the parser.

In a recent run I discovered an overly strict condition (tests can be skipped before/after rerunning), which I fixed and covered with a dedicated testcase.

@boegel what do you think about this? I could try reducing the number of test files, but I think the current 30 files aren't too many given they are all text-only.

@Flamefire Flamefire force-pushed the pytorch-log-parse-test branch 2 times, most recently from 998aabb to 00d4922 Compare June 26, 2025 15:22
@boegel boegel added this to the release after 5.1.1 milestone Jul 2, 2025
@Flamefire Flamefire changed the title Add test for PyTorch test-results (XML files) parsing Fix issues with PyTorch test-results (XML files) parsing and add tests Jul 29, 2025
@Flamefire Flamefire force-pushed the pytorch-log-parse-test branch from 3753ef5 to af8491e Compare July 29, 2025 10:30
@boegel
Member

boegel commented Jul 30, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="PyTorch-2.6.0-foss-2024a.eb --installpath /tmp/$USER/pr3803"
CORE_CNT=16

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3803 EB_ARGS="PyTorch-2.6.0-foss-2024a.eb --installpath /tmp/$USER/pr3803" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3803 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7417

Test results coming soon (I hope)...

- notification for comment with ID 3137222772 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Jul 30, 2025

@boegelbot please test @ jsc-zen3-a100
EB_ARGS="PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb --installpath /tmp/$USER/pr3803"
CORE_CNT=16

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3803 EB_ARGS="PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb --installpath /tmp/$USER/pr3803" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3803 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7421

Test results coming soon (I hope)...

- notification for comment with ID 3137822827 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/9849fbdfa2700b1743353ad90f5c8f8c for a full test report.

@boegel
Member

boegel commented Jul 31, 2025

@Flamefire Are we now just detecting more failing tests?

== 2025-07-31 05:00:52,030 build_log.py:434 WARNING 3 test failures, 0 test errors (out of 209700):
	dynamo/test_functions (1 failed, 165 passed, 2 skipped, 2 rerun)
	dynamo/test_dynamic_shapes (2 failed, 2028 passed, 52 skipped, 32 xfailed, 4 rerun)

@Flamefire
Contributor Author

Not by this PR at least, maybe a prior one. Does the EC include PyTorch-2.1.0_skip-dynamo-test_predistpatch.patch? That should fix the 2nd failure according to my notes.

And/or we can increase the max-failures to like 5...

@boegel
Member

boegel commented Jul 31, 2025

Not by this PR at least, maybe a prior one. Does the EC include PyTorch-2.1.0_skip-dynamo-test_predistpatch.patch? That should fix the 2nd failure according to my notes.

Yes, since the easyconfigs used by the bot are the ones in develop.

And/or we can increase the max-failures to like 5...

@Flamefire I'm actually in favor of bumping the default value of max_failed_tests to 10.

That seems like a reasonable number to me, since a warning will always be printed as soon as there's a single failed test.

It's pretty trivial to be more strict locally by introducing a hook that lowers max_failed_tests to 5, 2, or even 0...

If that makes sense, please open a separate trivial PR for that (just so it stands out in the changelog, and so we can add some motivation for that change to the PR description).
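For reference, such a hook could look like the following hypothetical sketch (the file name and the chosen value are made up; it assumes the PyTorch easyblock keeps exposing a `max_failed_tests` easyconfig parameter):

```python
# Hypothetical hooks.py, passed to EasyBuild via --hooks=hooks.py.
# parse_hook runs after an easyconfig has been parsed; here it forces a
# stricter max_failed_tests for PyTorch than whatever value is in effect.
def parse_hook(ec, *args, **kwargs):
    if ec.name == 'PyTorch':
        ec['max_failed_tests'] = 0  # fail the test step on any failed test
```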

@Flamefire
Contributor Author

It's pretty trivial to be more strict locally by introducing a hook that lowers max_failed_tests to 5, 2, or even 0...

I'd argue the other way round: It is pretty trivial to increase it to 10 ;-)

If that makes sense, please open a separate trivial PR for that

I'd just increase the number for this single EC. But we could as well change the default in the easyblock from 0 to 10 and clean up the easyconfigs.
Current PyTorch 2 ECs:

p/PyTorch/PyTorch-2.0.1-foss-2022a.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.0.1-foss-2022b.eb:max_failed_tests = 3
p/PyTorch/PyTorch-2.1.2-foss-2022a.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.1.2-foss-2022b.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.1.2-foss-2023a.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.1.2-foss-2023b.eb:max_failed_tests = 2
p/PyTorch/PyTorch-2.3.0-foss-2023b.eb:max_failed_tests = 6
p/PyTorch/PyTorch-2.6.0-foss-2024a.eb:max_failed_tests = 16

We seem to be fine with "3" (including the current failure here) for the majority and I'd keep the number as low as possible even though I've been more generous with the new 2.6 EC.

Not by this PR at least, maybe a prior one. Does the EC include PyTorch-2.1.0_skip-dynamo-test_predistpatch.patch? That should fix the 2nd failure according to my notes.

Yes, since the easyconfigs used by the bot are the ones in develop.

Then we'd need to look at the logs to see why it failed. But 2/2030 seems to be OK to ignore.

@arielzn

arielzn commented Jul 31, 2025

I've launched the install of PyTorch-2.6.0-foss-2024a.eb from easybuilders/easybuild-easyconfigs#22824, which was just merged yesterday.
I've put the usual --ignore-test-failure to let the install pass in case we are over the configured limit, but I got quite a few more test failures; I'm on EB 5.1.1:

WARNING: Test failure ignored: An error was raised during test step: 'Failing because not all failed tests could be determined. Tests failed to start, crashed or the test accounting in the PyTorch EasyBlock needs updating!
Missing: profiler/test_torch_tidy
You can check the test failures (in the log) manually and if they are harmless, use --ignore-test-failures to make the test step pass.
284 test failures, 0 test errors (out of 254109):
Failed tests (suites/files):
 dynamo/test_dynamic_shapes (4 failed, 1734 passed, 76 skipped, 0 errors)
 dynamo/test_inline_inbuilt_nn_modules (4 failed, 1213 passed, 30 skipped, 0 errors)
 dynamo/test_misc (3 failed, 522 passed, 22 skipped, 0 errors)
 functorch/test_aotdispatch (19 failed, 1626 passed, 424 skipped, 0 errors)
 functorch/test_ops (41 failed, 7411 passed, 2692 skipped, 0 errors)
 functorch/test_vmap (7 failed, 1807 passed, 307 skipped, 0 errors)
 inductor/test_aot_inductor_arrayref (2 failed, 83 passed, 98 skipped, 0 errors)
 inductor/test_binary_folding (1 failed, 2 passed, 0 skipped, 0 errors)
 inductor/test_cpu_select_algorithm (126 failed, 0 passed, 1211 skipped, 0 errors)
 inductor/test_mkldnn_pattern_matcher (2 failed, 102 passed, 10 skipped, 0 errors)
 inductor/test_torchinductor (1 failed, 718 passed, 61 skipped, 0 errors)
 inductor/test_torchinductor_dynamic_shapes (1 failed, 651 passed, 124 skipped, 0 errors)
 inductor/test_torchinductor_opinfo (2 failed, 3014 passed, 590 skipped, 0 errors)
 nn/test_convolution (7 failed, 359 passed, 222 skipped, 0 errors)
 test_ao_sparsity (3 failed, 85 passed, 0 skipped, 0 errors)
 test_autocast (2 failed, 12 passed, 6 skipped, 0 errors)
 test_expanded_weights (12 failed, 165 passed, 12 skipped, 0 errors)
 test_jit (2 failed, 2082 passed, 134 skipped, 0 errors)
 test_jit_autocast (1 failed, 8 passed, 45 skipped, 0 errors)
 test_jit_legacy (2 failed, 1663 passed, 105 skipped, 0 errors)
 test_jit_llga_fuser (2 failed, 104 passed, 1 skipped, 0 errors)
 test_jit_profiling (2 failed, 2082 passed, 134 skipped, 0 errors)
 test_mkldnn (13 failed, 32 passed, 2 skipped, 0 errors)
 test_mkldnn_fusion (1 failed, 7 passed, 0 skipped, 0 errors)
 test_modules (20 failed, 2926 passed, 659 skipped, 0 errors)
 test_ops (2 failed, 24469 passed, 9746 skipped, 0 errors)
 test_quantization (2 failed, 1042 passed, 72 skipped, 0 errors)
Could not count failed tests for the following test suites/files:
 profiler/test_torch_tidy (Undetected or did not run properly)'

I guess this is way more than what you are getting, right?

@Flamefire
Contributor Author

I guess this is way more than what you are getting, right?

Indeed. Might be some real issue there. Can you attach the log of the test step? Preferably in the easyconfig PR, to keep it organized and visible for others in the future.

@boegel
Member

boegel commented Jul 31, 2025

It's pretty trivial to be more strict locally by introducing a hook that lowers max_failed_tests to 5, 2, or even 0...

I'd argue the other way round: It is pretty trivial to increase it to 10 ;-)

The main difference being that it's a bit unreasonable to expect everyone to introduce a hook just to slightly increase the tolerance for failing tests, as opposed to letting people who are paying close attention and know how to dig into what's causing those failing tests (like you) opt in to being more strict.

I really feel 10 is a pretty reasonable default; 2 or 3 seems really strict to me, especially since we know there frequently are flaky tests.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c4.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/173029da55cbc0802070925510a8e8f1 for a full test report.

@boegel
Member

boegel commented Aug 2, 2025

12 test failures, 0 test errors (out of 254957):
Failed tests (suites/files):
	dynamo/test_dynamic_shapes (3 failed, 1735 passed, 76 skipped, 0 errors)
	dynamo/test_inline_inbuilt_nn_modules (3 failed, 1214 passed, 30 skipped, 0 errors)
	dynamo/test_misc (3 failed, 522 passed, 22 skipped, 0 errors)
	inductor/test_cpu_select_algorithm (2 failed, 124 passed, 1211 skipped, 0 errors)
	inductor/test_torchinductor_opinfo (1 failed, 3016 passed, 590 skipped, 0 errors)
Counted failures of tests from the following test suites/files that are not contained in the summary output of PyTorch:
inductor/test_torchinductor_opinfo (at easybuild/easyblocks/pytorch.py:747 in test_step)

@Flamefire More than before?

@Flamefire
Contributor Author

The failing install is

Failing because there were unexpected failures detected: inductor/test_torchinductor_opinfo

Likely due to

The following tests failed and then succeeded when run in a new process['test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_nn_functional_max_pool2d_cpu_float16']

Here the success isn't detected after the failure was recorded. (A test whose reports contain both a success and a failure should be merged into a success result by this parsing logic.)
I'd need the XML folder and/or to double-check why that was missed.
So this looks like a fluke only.
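The intended merge semantics can be sketched as follows (illustrative only, not the easyblock code; the function and outcome names are made up):

```python
# A rerun produces multiple reports for the same test. If any of them is a
# success (i.e. the rerun passed), the merged outcome should be a success;
# otherwise keep the last reported outcome.
def merged_outcome(outcomes):
    return 'success' if 'success' in outcomes else outcomes[-1]
```

With this rule, a failure followed by a rerun success merges to a success instead of being flagged as an unexpected failure.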

The code parses class names if they start with the prefix 'test.'
and then trims a prefix consisting of the common part.
That common part specifically excludes the 'test.' part of the prefix
which hence needs to be re-added to match with `startswith`.
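A minimal sketch of that fix (names are illustrative, not the easyblock verbatim):

```python
# Class names arrive as e.g. 'test.dynamo.test_functions.FunctionTests'.
# The common prefix computed from the file paths excludes the leading
# 'test.', so it must be re-added before the startswith comparison.
def trim_class_name(classname, common_prefix):
    full_prefix = 'test.' + common_prefix
    if classname.startswith(full_prefix):
        return classname[len(full_prefix):]
    return classname
```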
Select some test reports from real runs, trim them down a bit and parse
them in the test.
Provide a script for automatic cleanup of test reports.
We can't rely on the "tests"-attribute as a test might appear in the errors- and failures-count attribute.
Seen in the provided test case within the 2nd, duplicated <testcase>
after the first contained a `<failure>`:
  <error message="failed on teardown with &quot;AssertionError: Scalars are not equal![...]
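The effect can be reproduced with a few lines of ElementTree; the XML below is a synthetic example modelled on the description above, not an actual PyTorch report:

```python
import xml.etree.ElementTree as ET

# One logical test can show up as two <testcase> entries (one with a
# <failure>, a duplicate with an <error> from teardown), so the suite's
# tests="..." attribute can disagree with the element count.
xml = """<testsuite name="demo" tests="1" failures="1" errors="1">
  <testcase classname="C" name="t"><failure message="assert"/></testcase>
  <testcase classname="C" name="t"><error message="teardown"/></testcase>
</testsuite>"""

suite = ET.fromstring(xml)
declared = int(suite.get('tests'))           # what the attribute claims: 1
cases = suite.findall('testcase')            # what is actually present: 2
unique = {(tc.get('classname'), tc.get('name')) for tc in cases}
```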
Current failures include
> Parsing the test result files missed the following failed suites: distributed/algorithms/quantization/test_quantization

The suite name as contained in the XML results is:
> dist-nccl/distributed/algorithms/quantization/test_quantization

So if the suite name isn't found as-is (fast due to dict hashing)
also check for the name without the variant (rare).
To avoid false-positives limit to variants starting with `dist-`.
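A sketch of the two-step lookup (function and variable names are illustrative):

```python
# Try the exact suite name first (cheap dict hit), then retry without the
# variant prefix; restrict the fallback to 'dist-' variants to avoid
# false positives.
def find_suite(name, known_suites):
    if name in known_suites:
        return known_suites[name]
    if name.startswith('dist-'):
        stripped = name.split('/', 1)[-1]
        return known_suites.get(stripped)
    return None
```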
…dependency options

Most of the options have a True/False value which we should set to
False/0 when we don't have/use that dependency.
This ensures that a) no system lib will be found and b) no warning will
be shown.

Also update the list with options added or removed until PyTorch 2.7
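As an illustration only (the exact set of flags the easyblock manages is not shown here): PyTorch's CMake build reads `USE_*` switches from the environment, along the lines of:

```shell
# Illustrative sketch: disable dependencies we don't provide so the build
# neither picks up system libraries nor warns about missing ones, and
# explicitly enable the ones we do provide.
export USE_FFMPEG=0   # dependency not used: avoid finding a system lib
export USE_OPENCV=0   # likewise
export USE_MKLDNN=1   # dependency provided: enable explicitly
```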
As PyTorch is sensitive to specific NCCL versions one approach is to use
it as a build dependency only and add an rpath to it after copying it
into a (non-standard) folder inside the PyTorch module.
This is similar to the PyPI package that depends on various
nvidia-packages and adds relative rpaths to ensure they are used when
loading the torch package/libraries.
Some are missing all tags except for 'time'.
Just ignore those.
PyTorch reruns single tests by skipping portions of the test before that.
If those other tests don't succeed the parser will error out during
merging as it will see a test that was skipped and failed.
Handle that by ignoring the skipped test result during merge.
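The handling can be sketched like this (simplified; the outcome strings are illustrative):

```python
# During a rerun PyTorch may report surrounding tests as 'skipped' even
# though an earlier report marked them 'failure'. Drop the skip in that
# case so the merge doesn't error out on the skipped+failed combination.
def merge_rerun_results(results):
    if 'failure' in results and 'skipped' in results:
        return [r for r in results if r != 'skipped']
    return results
```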
PyTorch's `test_testing.py` runs a subtest via Python code, i.e. as `python -c`.
This shows up in the test report path and as not having a `file`
attribute for the <testcase> tag.
`determine_suite_name` fails in `reported_file = os.path.basename(file_attribute.pop())` with
> KeyError: 'pop from an empty set'

Simply ignore those.
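The guard can be sketched as follows (function name is illustrative):

```python
import os

# Testcases launched via `python -c` carry no `file` attribute, so the set
# of collected file attributes may be empty; return None instead of hitting
# `KeyError: 'pop from an empty set'`.
def determine_reported_file(file_attributes):
    if not file_attributes:
        return None  # ignore testcases without a file attribute
    return os.path.basename(file_attributes.pop())
```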
@boegel
Member

boegel commented Nov 19, 2025

@boegelbot please test @ jsc-zen3-a100
EB_ARGS="PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb --installpath /tmp/$USER/pr3803"
CORE_CNT=16

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3803 EB_ARGS="PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb --installpath /tmp/$USER/pr3803" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3803 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8837

Test results coming soon (I hope)...

- notification for comment with ID 3552321124 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb

Build succeeded for 1 out of 1 (total: 8 hours 35 mins 43 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 580.95.05, Python 3.9.21
See https://gist.github.com/boegelbot/9abee1bc32c8eaa580d2bec0e64b5423 for a full test report.

@Flamefire Flamefire force-pushed the pytorch-log-parse-test branch from e56a39c to bcac200 Compare December 1, 2025 07:51