<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# NUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing,
where the memory access time depends on the memory location relative to the processor.
Under NUMA, a processor can access its own local memory faster than non-local memory
(memory local to another processor or memory shared between processors).
YARN containers can benefit from this NUMA design: by binding a container to a
specific NUMA node, all subsequent memory allocations are served by the same node,
reducing remote memory accesses. NUMA support for YARN containers should be enabled
only if the worker node machines have NUMA support.

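Before enabling the feature, it is worth confirming that a worker node actually exposes multiple NUMA nodes. The following is one quick check, assuming a Linux worker node; the `numactl --hardware` command shown later reports the same information in more detail:

```
# Count the NUMA nodes the Linux kernel exposes under /sys.
# A machine with NUMA support reports 2 or more here; 1 means a single node.
ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l
```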
# Enabling NUMA

### Prerequisites

- As of now, NUMA awareness works only with `LinuxContainerExecutor` (LCE)
- To use the NUMA awareness feature in the cluster, it must be enabled together with
  `LinuxContainerExecutor` (LCE)
- Steps to enable `LinuxContainerExecutor` for the cluster are documented [here](SecureContainer.md)

## Configurations

**1) Enable/Disable the NUMA awareness**

This property enables the NUMA awareness feature in the NodeManager
for the containers. By default, the value of this property is `false`, which means the
feature is disabled. The configurations below are applicable only when this property is
`true`; otherwise they are ignored.

In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.enabled</name>
    <value>true</value>
  </property>
```

**2) NUMA topology**

This property decides whether to read the NUMA topology from the system or from the
configurations. If this property is `true`, the topology is read from the system using the
`numactl --hardware` command on UNIX systems (and in a similar way on Windows).
If this property is `false`, the topology is read from the configurations below.
The default value is `false`, which means the NodeManager will read the NUMA topology
from the configurations below.

In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.read-topology</name>
    <value>false</value>
  </property>
```

**3) numactl command**

This command path is used when `yarn.nodemanager.numa-awareness.read-topology` is set to
`true`. It is recommended to verify that the `numactl` command is installed on every
Linux node.

Use `/usr/bin/numactl --hardware` to verify.
Sample output of `/usr/bin/numactl --hardware`:

```
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 191297 MB
node 0 free: 186539 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 191383 MB
node 1 free: 185914 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
```

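When the topology is configured manually (sections 4-6 below), the values come straight from this output. As an illustration, a small shell snippet (run here against the sample output above, embedded as a string) pulls out the per-node memory sizes:

```
# Extract per-node memory (MB) from sample `numactl --hardware` output.
# In practice you would pipe the real command's output instead of this sample.
sample='node 0 size: 191297 MB
node 1 size: 191383 MB'
echo "$sample" | awk '/size:/ {print "node " $2 ": " $4 " MB"}'
```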
In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.numactl.cmd</name>
    <value>/usr/bin/numactl</value>
  </property>
```

**4) NUMA node IDs**

This property is used to provide the NUMA node IDs as comma-separated values. It is read
only when `yarn.nodemanager.numa-awareness.read-topology` is `false`.

In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.node-ids</name>
    <value>0,1</value>
  </property>
```

**5) NUMA node memory**

This property is used to read the memory (in MB) configured for each NUMA node specified in
`yarn.nodemanager.numa-awareness.node-ids`, substituting the node id in place of
`<NODE_ID>`. It is read only when `yarn.nodemanager.numa-awareness.read-topology`
is `false`.

In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.<NODE_ID>.memory</name>
    <value>191297</value>
  </property>
```

The value passed is the memory available per node; from the sample output of
`numactl --hardware` above, the value passed for the property is the available memory,
i.e. `191297`.

**6) NUMA node CPUs**

This property is used to read the number of CPUs configured for each node specified in
`yarn.nodemanager.numa-awareness.node-ids`, substituting the node id in place of
`<NODE_ID>`. It is read only when `yarn.nodemanager.numa-awareness.read-topology` is `false`.

In `yarn-site.xml` add

```
  <property>
    <name>yarn.nodemanager.numa-awareness.<NODE_ID>.cpus</name>
    <value>48</value>
  </property>
```

Referring to the `numactl --hardware` output above, the number of CPUs in a node is `48`.

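Putting sections 4-6 together for the two-node sample above, the full manual-topology fragment of `yarn-site.xml` would look like the following. The memory and CPU values are taken from the sample `numactl --hardware` output and will differ on your hardware:

```
  <property>
    <name>yarn.nodemanager.numa-awareness.node-ids</name>
    <value>0,1</value>
  </property>
  <property>
    <name>yarn.nodemanager.numa-awareness.0.memory</name>
    <value>191297</value>
  </property>
  <property>
    <name>yarn.nodemanager.numa-awareness.0.cpus</name>
    <value>48</value>
  </property>
  <property>
    <name>yarn.nodemanager.numa-awareness.1.memory</name>
    <value>191383</value>
  </property>
  <property>
    <name>yarn.nodemanager.numa-awareness.1.cpus</name>
    <value>48</value>
  </property>
```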
**7) Passing java_opts for map/reduce**

Every container has to be made aware of NUMA, and the JVM can be notified by passing the
NUMA flag. Spark, Tez, and other YARN applications also need to set the container JVM
opts to leverage NUMA support.

In `mapred-site.xml` add

```
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-XX:+UseNUMA</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-XX:+UseNUMA</value>
  </property>
```

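As an illustration of the same idea for another framework, a Spark job could pass the flag through its driver and executor JVM options (shown as a sketch; check your framework's own documentation for the exact property names):

```
spark-submit \
  --conf spark.driver.extraJavaOptions=-XX:+UseNUMA \
  --conf spark.executor.extraJavaOptions=-XX:+UseNUMA \
  ...
```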
# Default configuration

| Property | Default value |
| --- | --- |
| yarn.nodemanager.numa-awareness.enabled | false |
| yarn.nodemanager.numa-awareness.read-topology | false |

# Enable NUMA balancing at OS level (Optional)

In Linux, automatic NUMA balancing is off by default. For a further performance
improvement, NUMA balancing can be turned on for all the nodes in the cluster:

```
echo 1 | sudo tee /proc/sys/kernel/numa_balancing
```

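To confirm the setting took effect, the current value can be read back (a sketch assuming a Linux node; the file is absent on kernels built without automatic NUMA balancing):

```
# Prints 1 when automatic NUMA balancing is on, 0 when off.
cat /proc/sys/kernel/numa_balancing 2>/dev/null || echo "not available"
```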
# Verify

**1) NodeManager log**

On any of the NodeManagers, grep the log file using the below command:

`grep "NUMA resources allocation is enabled," *`

Sample log message with `LinuxContainerExecutor` enabled:

```
<nodemanager_ip>.log.2022-06-24-19.gz:2022-06-24 19:16:40,178 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.numa.NumaResourceHandlerImpl (main): NUMA resources allocation is enabled, initializing NUMA resources allocator.
```

**2) Container log**

Grep the NodeManager log using the below command to check whether a container was assigned
NUMA node resources:

`grep "NUMA node" * | grep <container_id>`