Merged
79 commits
4f703f2
added crio-size and coredns restart
Katakam-Rakesh Dec 3, 2025
30e5e09
introduce polling mechanism for slurm controller
VrindaMarwah Dec 4, 2025
9a737a3
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
VrindaMarwah Dec 4, 2025
5f4e360
introduce polling mechanism for slurm in login node and login compile…
VrindaMarwah Dec 4, 2025
09ea233
Merge branch 'pub/k8s_telemetry' of github.com:VrindaMarwah/omnia int…
VrindaMarwah Dec 4, 2025
646654a
modify install_cuda_toolkit path
VrindaMarwah Dec 4, 2025
84fa4f6
added arguments for kube-controller-manager
Katakam-Rakesh Dec 4, 2025
db66540
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
Katakam-Rakesh Dec 4, 2025
058e80b
Fix for Docker credential validation
pullan1 Dec 4, 2025
63e3179
moved import to top
pullan1 Dec 4, 2025
89a147d
added arguments for kube-controller-manager
Katakam-Rakesh Dec 4, 2025
78da47d
Merge branch 'pub/k8s_telemetry' of github.com:Katakam-Rakesh/omnia i…
Katakam-Rakesh Dec 4, 2025
1f748e9
Merge pull request #3774 from pullan1/pub/k8s_telemetry
snarthan Dec 4, 2025
4d42794
address pr comments
VrindaMarwah Dec 4, 2025
15c68e3
added arguments for kubelet config and coredns config map
Katakam-Rakesh Dec 5, 2025
237add6
fix cloud-init
Katakam-Rakesh Dec 5, 2025
5514ff9
updated cloud-int
Katakam-Rakesh Dec 5, 2025
a5aa022
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
Katakam-Rakesh Dec 5, 2025
caef853
added msg for informing cloud-init completion
Katakam-Rakesh Dec 5, 2025
6d5671c
Merge branch 'pub/k8s_telemetry' of github.com:Katakam-Rakesh/omnia i…
Katakam-Rakesh Dec 5, 2025
c7eff98
Update omnia_config.yml
Katakam-Rakesh Dec 5, 2025
e0b722b
move polling script to templates dir
VrindaMarwah Dec 5, 2025
090adef
omnia core and auth 1.0 update
abhishek-sa1 Dec 5, 2025
2694291
Update deploy_auth_service.yml
abhishek-sa1 Dec 5, 2025
e5f0400
Enable idrac telemtry service for iDRAC IP's
nethramg Dec 5, 2025
83958c7
Update deploy_auth_service.yml
abhishek-sa1 Dec 5, 2025
4198053
Ansible lint fixes
nethramg Dec 5, 2025
49bbf28
auth container update
abhishek-sa1 Dec 5, 2025
40fae1c
Update main.yml
abhishek-sa1 Dec 5, 2025
f3b31e9
Update deploy_auth_service.yml
abhishek-sa1 Dec 5, 2025
b82ae3b
Merge pull request #3776 from Katakam-Rakesh/pub/k8s_telemetry
snarthan Dec 5, 2025
6619433
Update deploy_auth_service.yml
abhishek-sa1 Dec 5, 2025
11875d3
updated kube-proxy config map
Katakam-Rakesh Dec 5, 2025
77bc3ec
Merge branch 'pub/k8s_telemetry' of github.com:Katakam-Rakesh/omnia i…
Katakam-Rakesh Dec 5, 2025
5ae9b2e
Merge pull request #3779 from Katakam-Rakesh/pub/k8s_telemetry
snarthan Dec 5, 2025
0fbc9b8
Enable runtime image download for diskless nodes
balajikumaran-c-s Dec 5, 2025
584693a
Enable runtime image download for diskless nodes
balajikumaran-c-s Dec 5, 2025
267aa5a
Update omnia.sh
abhishek-sa1 Dec 5, 2025
42dc017
Few more changes
nethramg Dec 6, 2025
3744329
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
VrindaMarwah Dec 6, 2025
52d3aed
Remove timezone variable from input/provision_config.yml
priti-parate Dec 8, 2025
f8a9fbb
lint fix and adding thread safe measures
nethramg Dec 8, 2025
0900f2e
Adding the idrac ips variable corrected
nethramg Dec 8, 2025
2ea6375
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
nethramg Dec 8, 2025
96fe13a
remove timezone from schema
priti-parate Dec 8, 2025
62064ac
pylint fixes
nethramg Dec 8, 2025
41244b7
Merge pull request #3781 from priti-parate/pub/k8s_telemetry
abhishek-sa1 Dec 8, 2025
06f92ff
update the kubelet files
sakshi-singla-1735 Dec 8, 2025
8a62c3d
updating csi poll value
sakshi-singla-1735 Dec 8, 2025
0e6d6e5
removed csi values part
sakshi-singla-1735 Dec 8, 2025
b808a94
update kubelet
sakshi-singla-1735 Dec 8, 2025
7054326
updating csi poll value
sakshi-singla-1735 Dec 8, 2025
310e3e0
Merge pull request #3782 from sakshi-singla-1735/pub/k8s_telemetry
snarthan Dec 8, 2025
9683a35
Adding the login node details in pxe mapping file
nethramg Dec 8, 2025
ccaf8d2
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
abhishek-sa1 Dec 8, 2025
c79404e
Merge pull request #3778 from nethramg/pub/k8s_telemetry
abhishek-sa1 Dec 8, 2025
eecd401
Timezone Fix
Milisha-Gupta Dec 8, 2025
ba9d3f9
Timezone Fix
Milisha-Gupta Dec 8, 2025
838536c
testing changes
Milisha-Gupta Dec 8, 2025
87cadeb
testing changes
Milisha-Gupta Dec 8, 2025
e285df2
testing changes
Milisha-Gupta Dec 8, 2025
f2edf2d
Update validate_oim_timezone.yml
Milisha-Gupta Dec 8, 2025
3aae5e8
Merge pull request #3783 from Milisha-Gupta/pub/k8s_telemetry
abhishek-sa1 Dec 8, 2025
4236181
Merge branch 'dell:pub/k8s_telemetry' into pub/k8s_telemetry
abhishek-sa1 Dec 8, 2025
61a46db
Merge pull request #3777 from abhishek-sa1/pub/k8s_telemetry
abhishek-sa1 Dec 9, 2025
187f53f
Merge pull request #3773 from VrindaMarwah/pub/k8s_telemetry
jagadeeshnv Dec 10, 2025
d85bd97
Merge branch 'pub/k8s_telemetry' into pub/k8s_telemetry
abhishek-sa1 Dec 10, 2025
9a49953
Update ci-group-slurm_control_node_x86_64.yaml.j2
balajikumaran-c-s Dec 10, 2025
ffe6ae0
Update ci-group-login_node_x86_64.yaml.j2
balajikumaran-c-s Dec 10, 2025
7f26e39
Update ci-group-login_compiler_node_x86_64.yaml.j2
balajikumaran-c-s Dec 10, 2025
127d1a2
Update ci-group-slurm_node_x86_64.yaml.j2
balajikumaran-c-s Dec 10, 2025
fa1f7e3
Update ci-group-service_kube_node_x86_64.yaml.j2
Katakam-Rakesh Dec 11, 2025
e383191
Update storage_config.json
jagadeeshnv Dec 11, 2025
3aa8d55
Merge pull request #3792 from Katakam-Rakesh/pub/k8s_telemetry
snarthan Dec 11, 2025
630a629
Merge pull request #3780 from balajikumaran-c-s/pub/k8s_telemetry
abhishek-sa1 Dec 11, 2025
483c874
Merge branch 'staging' into pub/k8s_telemetry
Katakam-Rakesh Dec 11, 2025
32623fe
Update process_parallel.py
Katakam-Rakesh Dec 11, 2025
e154148
Merge pull request #2 from Katakam-Rakesh/pub/k8s_telemetry
Katakam-Rakesh Dec 11, 2025
f1d5746
Update nfs_client.yml
jagadeeshnv Dec 11, 2025
@@ -86,12 +86,18 @@
"type": "string",
"pattern": "^(|/?([a-zA-Z0-9._-]+/)*[a-zA-Z0-9._-]+\\.yaml)$"

},
"k8s_crio_storage_size": {
"description": "Storage size for CRI-O in Gigabytes only (example: 10G, 15G, 100G)",
"type": "string",
"pattern": "^[1-9][0-9]*G$"
}
},
"required": [
"cluster_name",
"k8s_cni",
"k8s_service_addresses"
"k8s_service_addresses",
"k8s_crio_storage_size"
],
"allOf": [
{
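The schema now defines `k8s_crio_storage_size` as a string matching `^[1-9][0-9]*G$` and lists it under `required`. A positive whole number of gigabytes with an uppercase `G` is accepted, and a config that omits the key no longer validates. The following is a minimal Python sketch of those two rules. It assumes standard JSON Schema `pattern` and `required` semantics, and `check_crio_storage_size` is a hypothetical helper, not Omnia's validator:

```python
import re

# Pattern copied from the k8s_crio_storage_size property in the schema diff.
CRIO_SIZE_PATTERN = re.compile(r"^[1-9][0-9]*G$")


def check_crio_storage_size(config: dict) -> list:
    """Return validation errors for k8s_crio_storage_size (hypothetical helper)."""
    errors = []
    if "k8s_crio_storage_size" not in config:
        # The diff adds the key to "required", so omitting it is now an error.
        errors.append("k8s_crio_storage_size is required")
        return errors
    value = config["k8s_crio_storage_size"]
    # The schema declares type "string"; the pattern only applies to strings.
    if not isinstance(value, str):
        errors.append("k8s_crio_storage_size must be a string")
    elif not CRIO_SIZE_PATTERN.search(value):
        errors.append(f"invalid k8s_crio_storage_size: {value!r}")
    return errors


for candidate in ["10G", "100G", "0G", "10Gi", "10g", "10"]:
    print(candidate, check_crio_storage_size({"k8s_crio_storage_size": candidate}))
```

Under these rules, `10G` and `100G` pass. `0G`, `10Gi`, `10g`, and a bare `10` are rejected.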
@@ -31,11 +31,11 @@
},
"server_share_path": {
"type": "string",
"pattern": "^/(?:[^/]+(?:/[^/]+)*)?$"
"pattern": "^/(?:[^/]+(?:/[^/]+)*)?/?$"
},
"client_share_path": {
"type": "string",
"pattern": "^/(?:[^/]+(?:/[^/]+)*)?$"
"pattern": "^/(?:[^/]+(?:/[^/]+)*)?/?$"
},
"client_mount_options": {
"type": "string"
@@ -54,4 +54,4 @@
"required": [
"nfs_client_params"
]
}
}
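The only change to `server_share_path` and `client_share_path` is the `/?` added before `$`, so a single trailing slash is now accepted. One side effect is that `//` also matches, because the optional path group can be empty. The quick Python check below compares both patterns. It uses `re.search` to mirror JSON Schema's unanchored `pattern` keyword, since the anchors are already part of the patterns. `accepts` is a hypothetical helper written for this illustration:

```python
import re

# Before and after patterns from the NFS client schema diff.
OLD_PATH = re.compile(r"^/(?:[^/]+(?:/[^/]+)*)?$")
NEW_PATH = re.compile(r"^/(?:[^/]+(?:/[^/]+)*)?/?$")


def accepts(pattern: re.Pattern, path: str) -> bool:
    """True if the schema pattern would accept this share path."""
    return pattern.search(path) is not None


for path in ["/", "/mnt/share", "/mnt/share/", "//", "mnt/share", "/mnt//share"]:
    print(f"{path!r:16} old={accepts(OLD_PATH, path)} new={accepts(NEW_PATH, path)}")
```

With the new pattern, `/mnt/share/` is accepted. `/mnt//share` and relative paths are still rejected, and only the new pattern accepts `//`.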
211 changes: 211 additions & 0 deletions common/library/modules/enable_telemetry_service.py
@@ -0,0 +1,211 @@
# Copyright 2025 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Dell iDRAC Telemetry - enable the Redfish TelemetryService.

Enables the service and lists report definitions in parallel, one requests.Session per server.
Supports iDRAC 9 and iDRAC 10.
"""

import logging
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Optional, Any, Tuple
import requests
import urllib3
from ansible.module_utils.basic import AnsibleModule

# Suppress InsecureRequestWarning: the iDRAC requests below use verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
#####################################################
# ALL 37 TELEMETRY REPORTS (iDRAC 9 & 10)
#####################################################

ALL_REPORTS: List[str] = [
    "AggregationMetrics", "CPUMemMetrics", "CPURegisters",
    "CPUSensor", "MemoryMetrics", "MemorySensor",
    "NVMeSMARTData", "StorageDiskSMARTData", "StorageSensor",
    "NICSensor", "NICStatistics", "FCPortStatistics",
    "FCSensor", "SFPTransceiver", "InfiniBandStatistics",
    "PSUMetrics", "PowerMetrics", "PowerStatistics",
    "FanSensor", "ThermalMetrics", "ThermalSensor",
    "GPUMetrics", "GPUStatistics", "GPUSubsystemPower", "FPGASensor",
    "Sensor", "SerialLog", "SystemUsage", "x86SubsystemPower",
    "OME-ISM-MetricsData", "OME-PMP-Power-B",
    "OME-SFPTransceiver-Metrics", "OME-Telemetry-FCPortStatistics",
    "OME-Telemetry-GPU-Aggregate", "OME-Telemetry-GPU-Aggregate-1",
    "OME-Telemetry-NIC-Statistics", "OME-Telemetry-SMARTData",
]

def get_report_definitions(
    ip_address: str,
    user: str,
    password: str,
    session: requests.Session,
    timeout: int,
) -> Optional[List[str]]:
    """Fetch available report definitions from iDRAC."""
    url = f"https://{ip_address}/redfish/v1/TelemetryService/MetricReportDefinitions"
    try:
        response = session.get(
            url,
            auth=(user, password),
            verify=False,
            timeout=timeout,
        )
        if response.status_code == 200:
            data = response.json()
            return [
                member['@odata.id'].split('/')[-1]
                for member in data.get('Members', [])
            ]
    except (requests.exceptions.RequestException, ValueError, KeyError):
        pass
    return None


def configure_server(
    ip_address: str,
    user: str,
    password: str,
    timeout: int,
) -> Dict[str, Any]:
    """Enable the TelemetryService on one iDRAC and list its report definitions."""
    session = requests.Session()
    session.verify = False

    try:
        base_url = f"https://{ip_address}/redfish/v1/TelemetryService"

        # Enable Telemetry Service
        response = session.patch(
            base_url,
            json={"ServiceEnabled": True},
            auth=(user, password),
            timeout=timeout,
        )

        if response.status_code not in [200, 202, 204]:
            return {
                "ip": ip_address,
                "status": "failed",
                "message": f"TelemetryService PATCH returned HTTP {response.status_code}"
            }

        # Get available reports
        available_reports = get_report_definitions(
            ip_address, user, password, session, timeout
        )
        if not available_reports:
            return {
                "ip": ip_address,
                "status": "failed",
                "message": "Could not list metric report definitions"
            }

        return {
            "ip": ip_address,
            "status": "success",
            "enabled_reports": available_reports,  # listed, not individually enabled
        }

    except requests.exceptions.RequestException as e:
        return {
            "ip": ip_address,
            "status": "failed",
            "message": str(e)
        }

    finally:
        try:
            session.close()
        except OSError as close_error:
            logging.warning("Failed to close session for %s: %s", ip_address, close_error)

def run_parallel(
    idrac_ips: List[str],
    username: str,
    password: str,
    parallel_jobs: int,
    timeout: int,
) -> Tuple[List[Dict], List[Dict]]:
    """Configure iDRACs concurrently with max(1, min(cpu_count + 1, parallel_jobs)) workers."""
    success_results = []
    failed_results = []

    try:
        workers = max(1, min((os.cpu_count() or 1) + 1, parallel_jobs))
        with ThreadPoolExecutor(max_workers=workers) as executor:
            future_to_ip = {
                executor.submit(
                    configure_server, ip, username, password, timeout
                ): ip for ip in idrac_ips
            }

            for future in as_completed(future_to_ip):
                result = future.result()
                if result.get("status") == "success":
                    success_results.append(result)
                else:
                    failed_results.append(result)
    except (OSError, ValueError, requests.exceptions.RequestException) as exc:
        logging.warning("Error during parallel execution: %s", exc)

    return success_results, failed_results

def main():
    """Main function for Ansible module."""
    module_args = {
        "idrac_ips": {"type": "list", "required": True, "elements": "str"},
        "username": {"type": "str", "required": True},
        "password": {"type": "str", "required": True, "no_log": True},
        "parallel_jobs": {"type": "int", "default": 64},
        "timeout": {"type": "int", "default": 30},
    }

    module = AnsibleModule(argument_spec=module_args, supports_check_mode=True)

    idrac_ips = module.params["idrac_ips"]
    username = module.params["username"]
    password = module.params["password"]
    parallel_jobs = module.params["parallel_jobs"]
    timeout = module.params["timeout"]

    if module.check_mode:
        module.exit_json(changed=False, msg="Check mode - no changes made")

    if not idrac_ips:
        module.exit_json(msg="No iDRAC IPs provided")

    start_time = time.time()
    success_results, failed_results = run_parallel(
        idrac_ips, username, password, parallel_jobs, timeout
    )

    duration = time.time() - start_time

    module.exit_json(
        changed=len(success_results) > 0,
        success_count=len(success_results),
        failed_count=len(failed_results),
        duration_seconds=round(duration, 2),
        success_results=success_results,
        failed_results=failed_results,
        msg=f"Telemetry enabled on {len(success_results)}/{len(idrac_ips)} servers"
    )

if __name__ == "__main__":
    main()
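As written, `configure_server` PATCHes only `ServiceEnabled` on the TelemetryService and returns the definition IDs it finds. It never enables individual reports, and `ALL_REPORTS` is not referenced anywhere in the module. If per-report enablement is wanted, one option is to PATCH the DMTF Redfish `MetricReportDefinitionEnabled` property on each definition. The sketch below assumes the target iDRAC firmware accepts that PATCH. It also uses a duck-typed `session` (any object with a `requests`-style `patch` method) so it can run offline. `enable_report_definitions` and `FakeSession` are hypothetical names, not part of the module:

```python
from typing import Any, Dict, Iterable, List, Tuple

DEFINITIONS_PATH = "/redfish/v1/TelemetryService/MetricReportDefinitions"


def enable_report_definitions(
    session: Any,
    ip_address: str,
    report_ids: Iterable[str],
    auth: Tuple[str, str],
    timeout: int,
) -> Dict[str, List[str]]:
    """PATCH MetricReportDefinitionEnabled=True on each report (hypothetical helper)."""
    enabled: List[str] = []
    failed: List[str] = []
    for report_id in report_ids:
        url = f"https://{ip_address}{DEFINITIONS_PATH}/{report_id}"
        response = session.patch(
            url,
            json={"MetricReportDefinitionEnabled": True},
            auth=auth,
            verify=False,  # mirrors the module: iDRAC certificates are not verified
            timeout=timeout,
        )
        # Same success codes the module accepts for the TelemetryService PATCH.
        if response.status_code in (200, 202, 204):
            enabled.append(report_id)
        else:
            failed.append(report_id)
    return {"enabled": enabled, "failed": failed}


class _FakeResponse:
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


class FakeSession:
    """Offline stand-in that records calls and rejects the SerialLog report."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def patch(self, url: str, **kwargs: Any) -> _FakeResponse:
        self.calls.append({"url": url, **kwargs})
        return _FakeResponse(400 if url.endswith("/SerialLog") else 200)


fake = FakeSession()
outcome = enable_report_definitions(
    fake, "192.0.2.10", ["CPUSensor", "SerialLog"], ("user", "pass"), 30
)
print(outcome)
```

In the module, such a step would presumably follow `get_report_definitions` and reuse its IDs and the per-server session. The module would also need to decide whether a server with only some reports enabled counts as a success.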
2 changes: 2 additions & 0 deletions common/vars/openchami_image_cmd.yml
@@ -20,6 +20,8 @@ rhel_aarch64_base_image_name: "rhel-aarch64_base"
base_image_commands:
- "dracut --add 'dmsquash-live livenet network-manager' --install '/usr/lib/systemd/systemd-sysroot-fstab-check' --kver $(basename /lib/modules/*) -N -f --logfile /tmp/dracut.log 2>/dev/null" # noqa: yaml[line-length]
- "echo DRACUT LOG:; cat /tmp/dracut.log"
- "rm -f /var/lib/rpm/__db*"
- "rpmdb --rebuilddb"

# x86_64 compute commands
default_x86_64_compute_commands:
@@ -53,7 +53,7 @@
done
fi

- path: /root/install_cuda_toolkit.sh
- path: /usr/local/bin/install_cuda_toolkit.sh
permissions: '0755'
content: |
#!/bin/bash
@@ -162,22 +162,29 @@
{{ ip_name_map[key] }} {{ key }}
{% endfor %}

- path: /usr/local/bin/check_slurm_controller_status.sh
owner: root:root
permissions: '{{ file_mode_755 }}'
content: |
{{ lookup('template', 'templates/slurm/check_slurm_controller_status.sh.j2') | indent(12) }}

runcmd:
- /usr/local/bin/set-ssh.sh
- /root/install_cuda_toolkit.sh
- /usr/local/bin/install_cuda_toolkit.sh
- groupadd -r {{ slurm_group_name }}
- useradd -r -g {{ slurm_group_name }} -d {{ home_dir }} -s /sbin/nologin {{ user }}

- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge
- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge /var/log/track
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/log/slurm /var/log/slurm nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/slurm/epilog.d /etc/slurm/epilog.d nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/munge /etc/munge nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ trackfile_nfs_path }} /var/log/track nfs defaults,_netdev 0 0" >> /etc/fstab
- chmod {{ file_mode }} /etc/fstab
- mount -a
- yes | cp /etc/slurm/epilog.d/slurmd.service /usr/lib/systemd/system/
- /usr/local/bin/check_slurm_controller_status.sh
- chown -R {{ user }}:{{ slurm_group_name }} /var/log/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/run/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/spool
@@ -307,3 +314,4 @@

- /root/ldms_sampler.sh
{% endif %}
- echo "Cloud-Init has completed successfully."
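Each template now runs `/usr/local/bin/check_slurm_controller_status.sh` after `mount -a` and the `slurmd.service` copy. The script is rendered from `templates/slurm/check_slurm_controller_status.sh.j2`, which is not part of this diff. A polling gate of this kind generally retries a readiness check with a delay until the check succeeds or the attempts run out. The generic Python sketch below shows that pattern. `wait_until_ready`, `max_attempts`, `interval_s`, and the `check` callable are hypothetical stand-ins for whatever the real script probes:

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    max_attempts: int = 30,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or attempts run out (hypothetical)."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        if attempt < max_attempts:
            sleep(interval_s)  # wait before the next probe
    return False


# Offline usage: a fake check that succeeds on the third probe.
probes = iter([False, False, True])
naps = []
ready = wait_until_ready(
    lambda: next(probes), max_attempts=5, interval_s=2.0, sleep=naps.append
)
print(ready, naps)
```

Injecting `sleep` keeps the loop testable without real delays. A shell version would typically use `sleep` and a bounded loop counter in the same way.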
@@ -54,7 +54,7 @@
fi


- path: /root/install_cuda_toolkit.sh
- path: /usr/local/bin/install_cuda_toolkit.sh
permissions: '0755'
content: |
#!/bin/bash
@@ -164,22 +164,30 @@
{{ ip_name_map[key] }} {{ key }}
{% endfor %}

- path: /usr/local/bin/check_slurm_controller_status.sh
owner: root:root
permissions: '{{ file_mode_755 }}'
content: |
{{ lookup('template', 'templates/slurm/check_slurm_controller_status.sh.j2') | indent(12) }}

runcmd:
- /usr/local/bin/set-ssh.sh
- /root/install_cuda_toolkit.sh
- /usr/local/bin/install_cuda_toolkit.sh
- groupadd -r {{ slurm_group_name }}
- useradd -r -g {{ slurm_group_name }} -d {{ home_dir }} -s /sbin/nologin {{ user }}

- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge
- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge /cert /var/log/track
- echo "{{ cloud_init_nfs_path }}/cert /cert nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/log/slurm /var/log/slurm nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/slurm/epilog.d /etc/slurm/epilog.d nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/munge /etc/munge nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ trackfile_nfs_path }} /var/log/track nfs defaults,_netdev 0 0" >> /etc/fstab
- chmod {{ file_mode }} /etc/fstab
- mount -a
- yes | cp /etc/slurm/epilog.d/slurmd.service /usr/lib/systemd/system/
- /usr/local/bin/check_slurm_controller_status.sh
- chown -R {{ user }}:{{ slurm_group_name }} /var/log/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/run/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/spool
@@ -208,6 +216,8 @@
- systemctl start slurmd
- systemctl daemon-reexec
- systemctl restart sshd
- cp /cert/pulp_webserver.crt /etc/pki/ca-trust/source/anchors && update-ca-trust
- sed -i 's/^gpgcheck=1/gpgcheck=0/' /etc/dnf/dnf.conf

{% if hostvars['localhost']['openldap_support'] %}
- /usr/local/bin/update_ldap_conf.sh
@@ -309,3 +319,4 @@

- /root/ldms_sampler.sh
{% endif %}
- echo "Cloud-Init has completed successfully."
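Each of the three cloud-init templates in this diff echoes the same `/var/spool` NFS entry into `/etc/fstab` twice. The small Python sketch below reports mount points that are listed more than once, which catches this kind of duplicate before rendering. The sample entries follow the template's `echo` lines, but `nfs:/share/node01` is a hypothetical rendered value of `{{ cloud_init_nfs_path }}/$(hostname -s)`, and `duplicate_mount_points` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable, List


def duplicate_mount_points(fstab_lines: Iterable[str]) -> List[str]:
    """Return mount points (2nd fstab field) that appear more than once."""
    targets = []
    for line in fstab_lines:
        fields = line.split()
        # Skip blanks and comments; a usable entry has at least a source and a target.
        if len(fields) >= 2 and not fields[0].startswith("#"):
            targets.append(fields[1])
    counts = Counter(targets)
    return sorted(target for target, n in counts.items() if n > 1)


# Hypothetical rendered entries, in the order the template appends them.
entries = [
    "nfs:/share/node01/var/log/slurm /var/log/slurm nfs defaults,_netdev 0 0",
    "nfs:/share/node01/var/spool /var/spool nfs defaults,_netdev 0 0",
    "nfs:/share/node01/etc/slurm/epilog.d /etc/slurm/epilog.d nfs defaults,_netdev 0 0",
    "nfs:/share/node01/var/spool /var/spool nfs defaults,_netdev 0 0",
    "nfs:/share/node01/etc/munge /etc/munge nfs defaults,_netdev 0 0",
]
print(duplicate_mount_points(entries))
```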
@@ -85,20 +85,28 @@
{{ ip_name_map[key] }} {{ key }}
{% endfor %}

- path: /usr/local/bin/check_slurm_controller_status.sh
owner: root:root
permissions: '{{ file_mode_755 }}'
content: |
{{ lookup('template', 'templates/slurm/check_slurm_controller_status.sh.j2') | indent(12) }}

runcmd:
- /usr/local/bin/set-ssh.sh
- groupadd -r {{ slurm_group_name }}
- useradd -r -g {{ slurm_group_name }} -d {{ home_dir }} -s /sbin/nologin {{ user }}

- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge
- mkdir -p /var/log/slurm /var/run/slurm /var/spool /var/lib/slurm /etc/slurm/epilog.d /etc/munge /var/log/track
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/log/slurm /var/log/slurm nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/slurm/epilog.d /etc/slurm/epilog.d nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/var/spool /var/spool nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ cloud_init_nfs_path }}/$(hostname -s)/etc/munge /etc/munge nfs defaults,_netdev 0 0" >> /etc/fstab
- echo "{{ trackfile_nfs_path }} /var/log/track nfs defaults,_netdev 0 0" >> /etc/fstab
- chmod {{ file_mode }} /etc/fstab
- mount -a
- yes | cp /etc/slurm/epilog.d/slurmd.service /usr/lib/systemd/system/
- /usr/local/bin/check_slurm_controller_status.sh
- chown -R {{ user }}:{{ slurm_group_name }} /var/log/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/run/slurm
- chown -R {{ user }}:{{ slurm_group_name }} /var/spool
@@ -163,3 +171,4 @@

- /root/ldms_sampler.sh
{% endif %}
- echo "Cloud-Init has completed successfully."
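The new `write_files` entry embeds the rendered script with `{{ lookup('template', ...) | indent(12) }}` under `content: |`. With its default `first=False` and `blank=False`, Jinja2's `indent` filter leaves the first line alone, indents every later non-blank line, and leaves blank lines empty. The expression itself therefore has to start at the column the YAML block scalar expects. The stdlib re-implementation below is for illustration only. It reproduces those documented defaults but is not Jinja2's code, and `jinja_style_indent` is a hypothetical name:

```python
def jinja_style_indent(text: str, width: int = 4) -> str:
    """Mimic Jinja2's indent filter with first=False, blank=False (illustrative)."""
    prefix = " " * width
    # Append a newline before splitting so a trailing newline survives as an empty line.
    lines = (text + "\n").splitlines()
    head, rest = lines[0], lines[1:]
    if not rest:
        return head
    # First line untouched; later non-blank lines indented; blank lines left empty.
    return head + "\n" + "\n".join(prefix + line if line else line for line in rest)


script = "#!/bin/bash\nset -e\n\necho ready\n"
print(jinja_style_indent(script, 12))
```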