Broken Hardware: MOC-R4PCC04U36 / Barcelona #1692

@tssala23

Node Details

  • Node Name: MOC-R4PCC04U36
  • Cluster Node Name (wrk-XY): moc-r4pcc04u36-nairr

Describe the issue the node is experiencing

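The node reports fewer logical CPUs than the other H100 nodes: 192 instead of 512. lscpu shows 48 cores per socket, even though the CPU model is a 128-core part (2 sockets x 128 cores x 2 threads = 512).
lscpu output from u36: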

Architecture:                            x86_64
  CPU op-mode(s):                          32-bit, 64-bit
  Address sizes:                           52 bits physical, 57 bits virtual
  Byte Order:                              Little Endian
  CPU(s):                                  192
  On-line CPU(s) list:                     0-191
  Vendor ID:                               AuthenticAMD
  BIOS Vendor ID:                          Advanced Micro Devices, Inc.
  Model name:                              AMD EPYC 9754 128-Core Processor
  BIOS Model name:                         AMD EPYC 9754 128-Core Processor
  CPU family:                              25
  Model:                                   160
  Thread(s) per core:                      2
  Core(s) per socket:                      48
  Socket(s):                               2
  Stepping:                                2
  Frequency boost:                         enabled
  CPU(s) scaling MHz:                      71%
  CPU max MHz:                             3100.3411
  CPU min MHz:                             1500.0000
  BogoMIPS:                                4499.71
  [...]
  NUMA node(s):                            2
  NUMA node0 CPU(s):                       0-47,96-143
  NUMA node1 CPU(s):                       48-95,144-191
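
The mismatch in the output above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses `lscpu` text and compares the reported topology against the core count advertised in the model name. The function names `parse_lscpu` and `topology_report` are hypothetical, and reading the per-socket core count from the "128-Core" substring of the model name is an assumption that holds for this EPYC naming but not for every CPU.

```python
import re

# Abbreviated copy of the relevant lscpu fields from this issue.
SAMPLE = """\
CPU(s):                                  192
Model name:                              AMD EPYC 9754 128-Core Processor
Thread(s) per core:                      2
Core(s) per socket:                      48
Socket(s):                               2
"""


def parse_lscpu(text):
    """Turn lscpu text into a {field: value} dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def topology_report(fields):
    """Compare online CPUs with what the sockets/cores/threads fields imply."""
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    online = int(fields["CPU(s)"])

    # Assumption: the model name carries the per-socket core count as "<N>-Core".
    match = re.search(r"(\d+)-Core", fields.get("Model name", ""))
    advertised = int(match.group(1)) if match else None

    return {
        "online_cpus": online,
        "topology_cpus": sockets * cores_per_socket * threads_per_core,
        "expected_cpus": (
            sockets * advertised * threads_per_core if advertised is not None else None
        ),
        "cores_per_socket": cores_per_socket,
        "advertised_cores_per_socket": advertised,
    }


report = topology_report(parse_lscpu(SAMPLE))
if report["expected_cpus"] is not None and report["online_cpus"] != report["expected_cpus"]:
    print(
        f"CPU count mismatch: {report['online_cpus']} online, "
        f"{report['expected_cpus']} expected from model name"
    )
```

For the values in this issue, the topology fields are self-consistent (2 x 48 x 2 = 192), so the shortfall lies in the cores exposed per socket (48 instead of 128), not in offlined CPUs or disabled SMT.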

Node Status

In cluster, can be removed

  • Check this box once this node is no longer in a cluster from a user perspective and can be rebooted and wiped as needed.

Vendor Ticket Information

  • A ticket has been opened with a vendor concerning this hardware

  • Ticket Vendor:

  • Ticket Number (Update the title with this number): ``

  • Serial #: ``
