@quanfuw quanfuw commented Oct 18, 2016

What changes were proposed in this pull request?

Add NUMA-aware support for the YARN-based deployment mode.
This patch optimizes memory allocation: the executors on a worker node are bound to NUMA nodes in round-robin order, so that memory allocation tries the local NUMA node first and falls back to remote nodes only when the local node does not have enough memory.
Before this patch, Spark was NUMA-unaware, so many memory allocations landed on remote nodes, and the resulting volume of remote memory accesses hurt performance considerably. We observed a significant performance improvement while evaluating the NUMA-aware patch.
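
The round-robin binding described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this patch: the function names `assign_numa_node` and `numa_launch_prefix` are hypothetical, and the use of `numactl` with `--cpunodebind` and `--preferred` is an assumption about how "local node first, remote nodes as fallback" could be expressed when launching an executor.

```python
from typing import List


def assign_numa_node(executor_index: int, num_numa_nodes: int) -> int:
    """Pick a NUMA node for an executor in round-robin order (hypothetical helper)."""
    if num_numa_nodes <= 0:
        raise ValueError("num_numa_nodes must be positive")
    if executor_index < 0:
        raise ValueError("executor_index must be non-negative")
    return executor_index % num_numa_nodes


def numa_launch_prefix(executor_index: int, num_numa_nodes: int) -> List[str]:
    """Build a command prefix that binds an executor to its NUMA node.

    Assumption: numactl is installed on the worker node.
    --cpunodebind pins the process to the CPUs of the chosen node;
    --preferred allocates memory on that node first and falls back to
    other nodes only when the preferred node has no memory left.
    """
    node = assign_numa_node(executor_index, num_numa_nodes)
    return ["numactl", f"--cpunodebind={node}", f"--preferred={node}"]


if __name__ == "__main__":
    # Four executors on a worker with two NUMA nodes alternate between nodes.
    for i in range(4):
        print(i, " ".join(numa_launch_prefix(i, 2)))
```

Using `--preferred` rather than `--membind` in this sketch mirrors the fallback behavior described in the text: a strict memory binding would fail allocations instead of spilling to a remote node.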

To Do:

  1. Add support for configuring the number of NUMA nodes, and test it.
  2. Add NUMA-aware support for the Mesos-based deployment mode, and test it.
  3. Add NUMA-aware support for the Standalone deployment mode, and test it.

How was this patch tested?

We observed a significant performance improvement during evaluation with BigBench. The evaluation is still in progress, and more detailed results will be posted as they become available.

Setup:
Cluster Topology: 1 Master + 4 Slaves (Spark on YARN)
CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (72 Cores)
Memory: 128 GB (2 NUMA Nodes)
NIC: 1 x 10 Gb/s
Disk: Write 1.5 GB/s, Read 5 GB/s
SW Version: Hadoop-5.7.0 + Spark-2.0.0

NUMA Introduction

As the diagram below depicts, in the UMA (Uniform Memory Access) model, all processors share one bus, and contention on that bus becomes very heavy as the number of processors scales up. The NUMA (Non-Uniform Memory Access) model scales better by dividing processors and memory blocks into nodes, which are interconnected by an additional bus.
Under NUMA, accessing memory on a remote node is much slower than accessing memory on the local node, whereas under UMA the cost of a memory access is uniform.

image

For more NUMA information, please refer to https://en.wikipedia.org/wiki/Non-uniform_memory_access.
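
To make the node layout concrete, here is a minimal sketch of how the NUMA nodes on a Linux worker could be discovered. It assumes the Linux sysfs layout, where each node appears as a `nodeN` directory under `/sys/devices/system/node`. The function name `list_numa_nodes` is hypothetical and is not part of this patch.

```python
import re
from pathlib import Path
from typing import List

# Directory names such as "node0", "node1" identify NUMA nodes in sysfs.
_NODE_DIR = re.compile(r"node(\d+)")


def list_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> List[int]:
    """Return the sorted NUMA node ids found under sysfs_root.

    Returns an empty list when the directory does not exist, for example
    on a non-Linux host or a kernel built without NUMA support.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    node_ids = []
    for entry in root.iterdir():
        match = _NODE_DIR.fullmatch(entry.name)
        if match and entry.is_dir():
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    print(f"{len(nodes)} NUMA node(s): {nodes}")
```

On the evaluation cluster described above, which has 2 NUMA nodes per machine, such a lookup would be expected to find two node directories.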

@AmplabJenkins

Can one of the admins verify this patch?

@quanfuw changed the title from "add numa aware support(WIP, not ready for review)" to "[Spark][JIRA: SPARK-17984][YARN, Mesos, Deploy][WIP] add support for numa aware feature" on Oct 18, 2016
@quanfuw changed the title from "[Spark][JIRA: SPARK-17984][YARN, Mesos, Deploy][WIP] add support for numa aware feature" to "[SPARK-17984][YARN][Mesos][Deploy][WIP] add support for numa aware feature" on Oct 18, 2016
@quanfuw changed the title from "[SPARK-17984][YARN][Mesos][Deploy][WIP] add support for numa aware feature" to "[SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for numa aware feature" on Oct 18, 2016
@quanfuw changed the title from "[SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for numa aware feature" to "[SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for NUMA aware feature" on Oct 18, 2016
@srowen
Member

srowen commented Oct 25, 2016

This should be closed in favor of #15579 at least

srowen added a commit to srowen/spark that referenced this pull request Oct 31, 2016
@asfgit asfgit closed this in 26b07f1 Oct 31, 2016
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#11610
Closes apache#15411
Closes apache#15501
Closes apache#12613
Closes apache#12518
Closes apache#12026
Closes apache#15524
Closes apache#12693
Closes apache#12358
Closes apache#15588
Closes apache#15635
Closes apache#15678
Closes apache#14699
Closes apache#9008

Author: Sean Owen <[email protected]>

Closes apache#15685 from srowen/CloseStalePRs.