{tools}[GCCcore/11.3.0] rocm-smi v5.6.0#18278

Merged
boegel merged 3 commits into easybuilders:develop from akesandgren:20230707095830_new_pr_rocm-smi560 on Feb 27, 2024

@akesandgren

Contributor

(created using eb --new-pr)

@akesandgren akesandgren added this to the 4.x milestone Jul 7, 2023
@akesandgren

Contributor Author

Test report by @akesandgren
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
b-cn1605.hpc2n.umu.se - Linux Ubuntu 22.04, x86_64, AMD EPYC 7313 16-Core Processor, Python 3.10.6
See https://gist.github.com/akesandgren/3c7b543bbac4f6028e221011510bc7c7 for a full test report.

@branfosj

Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0204u20a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/981b60a73efb209ec80d7de4e9963e16 for a full test report.

@jfgrimm

jfgrimm commented Jan 16, 2024

Member

Test report by @jfgrimm
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node051.viking2.yor.alces.network - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/749b0fd560edbe86ac48186504d184c7 for a full test report.

@jfgrimm jfgrimm modified the milestones: 4.x, release after 4.9.0 Jan 16, 2024
@jfgrimm

jfgrimm commented Jan 16, 2024

Member

@boegelbot: please test @ generoso

jfgrimm previously approved these changes Jan 16, 2024

@jfgrimm jfgrimm left a comment

Member


lgtm

@boegelbot

Collaborator

@jfgrimm: Request for testing this PR well received on login1

PR test command 'EB_PR=18278 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_18278 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12623

Test results coming soon (I hope)...


- notification for comment with ID 1893586324 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/4f2be31d66a73980cb3212d07ccb4845 for a full test report.

@jfgrimm

jfgrimm commented Jan 16, 2024

Member

@boegelbot: please test @ jsc-zen2

@boegelbot

Collaborator

@jfgrimm: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=18278 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_18278 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4095

Test results coming soon (I hope)...


- notification for comment with ID 1893590418 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/a3a7109de3538988ddeadce5d1913d1d for a full test report.

@boegel

boegel commented Feb 8, 2024

Member

@boegelbot please test @ jsc-zen3

@boegel

boegel commented Feb 8, 2024

Member

Test report by @boegel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3120.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/44dc3a6685354f8fd776608e68ae2b08 for a full test report.

@boegelbot

Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=18278 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_18278 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3579

Test results coming soon (I hope)...


- notification for comment with ID 1933737478 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/80ae244a66604324094cd8112b9ec041 for a full test report.

@boegel

boegel commented Feb 8, 2024

Member

@akesandgren I'm seeing a build error for refman.pdf, which suggests a missing (build) dependency:

[ 45%] Generating latex/refman.pdf
cd /tmp/vsc40023/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi/latex && make > /dev/null
make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[3]: *** [Makefile:6: refman.pdf] Error 1
make[2]: *** [rocm_smi/CMakeFiles/docs.dir/build.make:76: rocm_smi/latex/refman.pdf] Error 2

see also ROCm/rocm_smi_lib#93

Any ideas here?

@akesandgren

Contributor Author

What does your configure_step output say about latex? (The name is probably in uppercase.)

@boegel

boegel commented Feb 8, 2024

Member

> What does your configure_step output say about latex? (The name is probably in uppercase.)

-- Found Doxygen: /usr/bin/doxygen (found version "1.8.14") found components: doxygen dot
-- Found LATEX: /usr/bin/latex  found components: PDFLATEX
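
The two "-- Found" lines above show CMake resolving Doxygen and LaTeX to the host system, which is what switches on the docs target in this build. As a rough illustration, assuming the configure output is available as text, a small helper could flag tools that were picked up from system locations. The function name `host_tools_found` and the `system_prefixes` default are hypothetical and not part of EasyBuild or CMake.

```python
import re

# Matches CMake status lines such as
# "-- Found LATEX: /usr/bin/latex  found components: PDFLATEX"
FOUND_RE = re.compile(r"^-- Found (\w+): (\S+)")


def host_tools_found(configure_output, system_prefixes=("/usr/", "/bin/")):
    """Return {package: path} for tools CMake resolved to a system location.

    The prefixes are an assumption about what counts as "the host system".
    """
    found = {}
    for line in configure_output.splitlines():
        match = FOUND_RE.match(line.strip())
        if match and match.group(2).startswith(system_prefixes):
            found[match.group(1)] = match.group(2)
    return found


# Sample input copied from the configure output quoted in this thread.
sample = """\
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.14") found components: doxygen dot
-- Found LATEX: /usr/bin/latex  found components: PDFLATEX
"""

print(host_tools_found(sample))
```

A clean test environment, as described for the original build, would yield an empty dictionary here, which is consistent with the docs target never being attempted on that system.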

@akesandgren

Contributor Author

Thought so, my test-env is clean of things like that.

But if it does find both then as far as I can see in the CMakeLists.txt it should do the right thing.

What do the 20 or so lines around "Generating latex/refman.pdf" say? Especially from "Generating latex/refman.tex" onward.

@boegel

boegel commented Feb 12, 2024

Member

@akesandgren Here's some extra output (obtained using --parallel 1). It's not very helpful, I think; we may need to strip out the > /dev/null part...

[  3%] Generating latex/refman.tex
cd /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi && /usr/bin/doxygen /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi/Doxyfile
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:887: warning: Member RSMI_GPU_METRICS_API_CONTENT_VER_1 (macro definition) of file rocm_smi.h is not documented.
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:888: warning: Member RSMI_GPU_METRICS_API_CONTENT_VER_2 (macro definition) of file rocm_smi.h is not documented.
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:889: warning: Member RSMI_GPU_METRICS_API_CONTENT_VER_3 (macro definition) of file rocm_smi.h is not documented.
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:892: warning: Member RSMI_NUM_HBM_INSTANCES (macro definition) of file rocm_smi.h is not documented.
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:895: warning: Member CENTRIGRADE_TO_MILLI_CENTIGRADE (macro definition) of file rocm_smi.h is not documented.
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:2216: warning: Found unknown command `\utilization_counters'
/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h:3501: warning: Found unknown command `\accessible'
Searching for include files...
Searching for example files...
Searching for images...
Searching for dot files...
Searching for msc files...
Searching for dia files...
Searching for files to exclude
Searching INPUT for files to process...
Reading and parsing tag files
Parsing files
Reading /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/README.md...
Preprocessing /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h...
Parsing file /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/rocm_smi_lib-rocm-5.6.0/include/rocm_smi/rocm_smi.h...
Building group list...
Building directory list...
Building namespace list...
Building file list...
Building class list...
Associating documentation with classes...
Computing nesting relations for classes...
Building example list...
Searching for enumerations...
Searching for documented typedefs...
Searching for members imported via using declarations...
Searching for included using directives...
Searching for documented variables...
Building interface member list...
Building member list...
Searching for friends...
Searching for documented defines...
Computing class inheritance relations...
Computing class usage relations...
Flushing cached template relations that have become invalid...
Computing class relations...
Add enum values to enums...
Searching for member function documentation...
Creating members for template instances...
Building page list...
Search for main page...
Computing page relations...
Determining the scope of groups...
Sorting lists...
Freeing entry tree
Determining which enums are documented
Computing member relations...
Building full member lists recursively...
Adding members to member groups.
Computing member references...
Inheriting documentation...
Generating disk names...
Adding source references...
Adding xrefitems...
Sorting member lists...
Computing dependencies between directories...
Generating citations page...
Counting data structures...
Resolving user defined references...
Finding anchors and sections in the documentation...
Transferring function references...
Combining using relations...
Adding members to index pages...
Generating style sheet...
Generating search indices...
Generating example documentation...
Generating file sources...
Generating code for file rocm_smi.h...
Generating file documentation...
Generating docs for file rocm_smi.h...
Generating page documentation...
Generating docs for page md__tmp_easybuild_build_rocmsmi_5.6.0_GCCcore-11.3.0_rocm_smi_lib-rocm-5.6.0_README...
Generating docs for page deprecated...
Generating group documentation...
Generating class documentation...
Generating docs for compound id...
Generating docs for compound metrics_table_header_t...
Generating docs for compound rsmi_counter_value_t...
Generating docs for compound rsmi_error_count_t...
Generating docs for compound rsmi_evt_notification_data_t...
Generating docs for compound rsmi_freq_volt_region_t...
Generating docs for compound rsmi_frequencies_t...
Generating docs for compound rsmi_gpu_metrics_t...
Generating docs for compound rsmi_od_vddc_point_t...
Generating docs for compound rsmi_od_volt_curve_t...
Generating docs for compound rsmi_od_volt_freq_data_t...
Generating docs for compound rsmi_pcie_bandwidth_t...
Generating docs for compound rsmi_power_profile_status_t...
Generating docs for compound rsmi_process_info_t...
Generating docs for compound rsmi_range_t...
Generating docs for compound rsmi_retired_page_record_t...
Generating docs for compound rsmi_utilization_counter_t...
Generating docs for compound rsmi_version_t...
Generating namespace index...
Generating graph info page...
Generating directory documentation...
Generating index page...
Generating page index...
Generating module index...
Generating namespace index...
Generating namespace member index...
Generating annotated compound index...
Generating alphabetical compound index...
Generating hierarchical class index...
Generating member index...
Generating file index...
Generating file member index...
Generating example index...
finalizing index lists...
writing tag file...
lookup cache used 367/65536 hits=1962 misses=378
finished...
[  6%] Generating latex/refman.pdf
cd /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi/latex && make > /dev/null
make[3]: *** [Makefile:6: refman.pdf] Error 1
make[2]: *** [rocm_smi/CMakeFiles/docs.dir/build.make:76: rocm_smi/latex/refman.pdf] Error 2
make[2]: Leaving directory '/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj'
make[1]: *** [CMakeFiles/Makefile2:372: rocm_smi/CMakeFiles/docs.dir/all] Error 2
make[1]: Leaving directory '/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj'
make: *** [Makefile:159: all] Error 2
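
Because the generated command ends in "> /dev/null", the pdflatex output that would explain the failure is discarded. The following is only a minimal sketch of stripping that redirect, assuming the command is available as a plain string; where the redirect is actually defined in the rocm_smi_lib build files is not shown in this thread, and `strip_stdout_devnull` is a hypothetical helper name. The sample command uses a shortened path for readability.

```python
import re

# Matches a stdout redirect to /dev/null ("> /dev/null" or ">/dev/null"),
# but leaves stderr or combined redirects such as "2> /dev/null" and
# "&> /dev/null" untouched.
STDOUT_DEVNULL_RE = re.compile(r"\s*(?<![0-9&])>\s*/dev/null")


def strip_stdout_devnull(command):
    """Return the command with any stdout-to-/dev/null redirect removed."""
    return STDOUT_DEVNULL_RE.sub("", command)


# Shortened version of the failing command from the log above.
print(strip_stdout_devnull("cd rocm_smi/latex && make > /dev/null"))
```

With the redirect gone, the inner make prints the full pdflatex transcript, which is the kind of output needed to diagnose the failure.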

@boegel

boegel commented Feb 12, 2024

Member

@akesandgren Patching out the > /dev/null reveals the actual problem:

cd /tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi/latex && make
make[3]: Entering directory '/tmp/easybuild_build/rocmsmi/5.6.0/GCCcore-11.3.0/easybuild_obj/rocm_smi/latex'
rm -f *.ps *.dvi *.aux *.toc *.idx *.ind *.ilg *.log *.out *.brf *.blg *.bbl refman.pdf
pdflatex refman
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2018) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./refman.tex
LaTeX2e <2017-04-15>
Babel <3.17> and hyphenation patterns for 3 language(s) loaded.
(/usr/share/texlive/texmf-dist/tex/latex/base/book.cls
Document Class: book 2014/09/29 v1.4h Standard LaTeX document class
(/usr/share/texlive/texmf-dist/tex/latex/base/bk10.clo))
(/usr/share/texlive/texmf-dist/tex/latex/base/fixltx2e.sty

Package fixltx2e Warning: fixltx2e is not required with releases after 2015
(fixltx2e)                All fixes are now in the LaTeX kernel.
(fixltx2e)                See the latexrelease package for details.

) (/usr/share/texlive/texmf-dist/tex/latex/tools/calc.sty) (./doxygen.sty
(/usr/share/texlive/texmf-dist/tex/latex/base/alltt.sty)
(/usr/share/texlive/texmf-dist/tex/latex/tools/array.sty)
(/usr/share/texlive/texmf-dist/tex/latex/float/float.sty)
(/usr/share/texlive/texmf-dist/tex/latex/base/ifthen.sty)
(/usr/share/texlive/texmf-dist/tex/latex/tools/verbatim.sty)
(/usr/share/texlive/texmf-dist/tex/latex/xcolor/xcolor.sty
(/usr/share/texlive/texmf-dist/tex/latex/graphics-cfg/color.cfg)
(/usr/share/texlive/texmf-dist/tex/latex/graphics-def/pdftex.def)
(/usr/share/texlive/texmf-dist/tex/latex/colortbl/colortbl.sty))
(/usr/share/texlive/texmf-dist/tex/latex/tools/longtable.sty)

! LaTeX Error: File `tabu.sty' not found.

Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)

Enter file name:
! Emergency stop.
<read *>

l.14 \RequirePackage
                    {tabularx}^^M
!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on refman.log.
make[3]: *** [Makefile:6: refman.pdf] Error 1

This tells me it's better to simply disable building the docs, which may require a patch?
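
The decisive line in the transcript above is the "File ... not found" error. As a small illustration, assuming the pdflatex output has been captured as text, the missing files could be pulled out with a regular expression. The helper name `missing_latex_files` is hypothetical, and the pattern only covers the error wording seen in this log.

```python
import re

# Matches pdflatex errors of the form:
#   ! LaTeX Error: File `tabu.sty' not found.
MISSING_FILE_RE = re.compile(
    r"^! LaTeX Error: File `([^']+)' not found\.", re.MULTILINE
)


def missing_latex_files(log_text):
    """Return the file names pdflatex reported as not found, in order."""
    return MISSING_FILE_RE.findall(log_text)


# Excerpt from the pdflatex output quoted in this thread.
sample_log = """\
(/usr/share/texlive/texmf-dist/tex/latex/tools/longtable.sty)

! LaTeX Error: File `tabu.sty' not found.

Type X to quit or <RETURN> to proceed,
"""

print(missing_latex_files(sample_log))  # ['tabu.sty']
```

A result like this shows the failure comes from an incomplete host TeX installation rather than from the easyconfig itself, which supports the suggestion to skip the docs build.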

@akesandgren

Contributor Author

Yeah will need a patch, fixing it up...

@akesandgren

Contributor Author

This should work for you @boegel

@boegel

boegel commented Feb 27, 2024

Member

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3129.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/c674914b2598f1ed6a877c8d7789b1bc for a full test report.

@boegel boegel left a comment

Member


lgtm

@boegel

boegel commented Feb 27, 2024

Member

Going in, thanks @akesandgren!

@boegel boegel merged commit 3aa5823 into easybuilders:develop Feb 27, 2024
@akesandgren akesandgren deleted the 20230707095830_new_pr_rocm-smi560 branch February 27, 2024 11:43