@simrankaurb
Contributor

This PR aligns blueprint configurations for a3ultra and a4high machines.
(This is a draft PR)

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@simrankaurb simrankaurb requested review from a team and samskillman as code owners November 4, 2025 07:24
@simrankaurb simrankaurb marked this pull request as draft November 4, 2025 07:24
@gemini-code-assist
Contributor

Summary of Changes

Hello @simrankaurb, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the consistency and maintainability of the a3ultra and a4high machine blueprints. The primary goal is achieved by standardizing variable names and generalizing module IDs across both configurations, making them more uniform and less machine-specific. Additionally, minor script enhancements and formatting adjustments have been applied to improve robustness and readability of the configuration files.

Highlights

  • Standardized Variable Names: Variable names such as a3u_cluster_size and a4h_cluster_size have been generalized to cluster_size, and similar changes were applied to reservation and spot VM related variables, improving consistency across blueprints.
  • Generalized Module IDs: Module IDs like slurm-a3ultra-image, a3ultra-slurm-net-0, a3_ultra_nodeset, and their a4high counterparts have been simplified to generic names such as slurm-image, slurm-net-0, and nodeset, making the configurations more reusable.
  • Script and Configuration Enhancements: Minor improvements include adding the -x flag for command tracing in install_slurm.sh, including apt update -y before CUDA toolkit installation, and refining Ansible task names and indentation for better readability and robustness.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@simrankaurb simrankaurb added the release-chore label ("To not include into release notes") Nov 4, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does a great job of aligning the blueprint configurations for a3ultra and a4high machines, making them more consistent and easier to maintain. The variable and module ID renaming is a welcome improvement.

However, I've found a couple of critical YAML syntax errors related to indentation in both files. These errors would cause the Ansible playbooks embedded in the YAML to fail. Please see the specific comments for details and suggestions to fix them. Once these are addressed, the PR should be in good shape.

Comment on lines +545 to +547
- name: Reload SystemD
ansible.builtin.systemd:
daemon_reload: true

critical

The indentation of this handler definition is incorrect, which will cause a YAML parsing error. The list item (- name: ...) under handlers must be indented. It seems to have been incorrectly de-indented.

              - name: Reload SystemD
                ansible.builtin.systemd:
                  daemon_reload: true

Comment on lines +531 to +544
- name: Deploy /etc/profile.d/nccl-gib.sh
ansible.builtin.copy:
dest: /etc/profile.d/nccl-gib.sh
content: |
# Load NCCL/gIB environment
if [ -f "/usr/local/gib/scripts/set_nccl_env.sh" ]; then
source /usr/local/gib/scripts/set_nccl_env.sh
fi
# Ensure /usr/local/gib/lib64 is in LD_LIBRARY_PATH
if [ -d "/usr/local/gib/lib64" ]; then
export LD_LIBRARY_PATH="/usr/local/gib/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
fi
mode: '0644'

critical

The indentation of this task definition is incorrect. The list item (- name: ...) under tasks must be indented. It has been de-indented to the same level as the tasks: key, which is invalid YAML syntax.

              - name: Deploy /etc/profile.d/nccl-gib.sh
                ansible.builtin.copy:
                  dest: /etc/profile.d/nccl-gib.sh
                  content: |
                    # Load NCCL/gIB environment
                    if [ -f "/usr/local/gib/scripts/set_nccl_env.sh" ]; then
                      source /usr/local/gib/scripts/set_nccl_env.sh
                    fi

                    # Ensure /usr/local/gib/lib64 is in LD_LIBRARY_PATH
                    if [ -d "/usr/local/gib/lib64" ]; then
                      export LD_LIBRARY_PATH="/usr/local/gib/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
                    fi
                  mode: '0644'
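
The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion in the suggestion above appends a colon and the previous value only when the variable is already set and non-empty, so no trailing colon (an empty path entry) is left behind. A minimal sketch of that behavior follows; `prepend_path` is a hypothetical helper used only for illustration and is not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: build "dir" or "dir:current"
# using the same ${var:+word} expansion as the nccl-gib.sh snippet.
prepend_path() {
  dir=$1
  current=$2
  # ${current:+:${current}} expands to ":<current>" only when current
  # is set and non-empty; otherwise it expands to nothing.
  printf '%s\n' "${dir}${current:+:${current}}"
}

prepend_path /usr/local/gib/lib64 ""          # prints /usr/local/gib/lib64
prepend_path /usr/local/gib/lib64 /usr/lib64  # prints /usr/local/gib/lib64:/usr/lib64
```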

Comment on lines +546 to +548
- name: Reload SystemD
ansible.builtin.systemd:
daemon_reload: true

critical

The indentation of this handler definition is incorrect. The list item (- name: ...) under handlers must be indented. It has been de-indented to the same level as the handlers: key, which is invalid YAML syntax.

              - name: Reload SystemD
                ansible.builtin.systemd:
                  daemon_reload: true
