use spin lock in auto growth allocator by wanghuancoder · Pull Request #34910 · PaddlePaddle/Paddle

wanghuancoder · 2021-08-16T02:40:18Z

PR types

Others

PR changes

Others

Describe

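Testing shows that lock contention occurs when multiple threads allocate and free GPU memory concurrently. When that happens, one thread goes to sleep and is woken up only once it acquires the lock. This sleep-and-wake cycle is expensive and directly hurts the executor's scheduling performance.
To address this, we replace std::mutex with a spin lock. When a spin lock finds the lock already held, the calling thread busy-waits instead of going to sleep. In our measurements it performs much better than std::mutex.
This PR only modifies the auto growth allocator. The dependencies among the other allocators are unclear to me, so they are left unchanged for now.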

paddle-bot-old · 2021-08-16T02:40:24Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

sneaxiy · 2021-08-17T03:17:41Z

Some comments are listed as follows. @zhiqiu @wanghuancoder

Can you show any testing example and data?
Linus has addressed that: spinlocks can only be used if you actually know you're not being scheduled while using them. See here.

wanghuancoder · 2021-08-17T06:27:02Z


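I found this problem while writing the GC for the new executor. Here is a description of my test:
There are 2 threads. One thread handles mutable_data + LaunchKernel. The other thread uses cudaEventSynchronize to wait for the kernel of a specific OP to finish, and then frees the GPU memory of the corresponding Tensor. The test model is PTB with BatchSize=20. Because PTB consists of small kernels, mutable_data and freeAllocation are called at high frequency. The statistics show that out of 7655806 lock acquisitions (both allocating and freeing GPU memory take the lock), 904919 hit lock contention (contention means one thread holds the lock while the other thread requests it). In this case the whole test takes 36 seconds, of which mutable_data takes 9.27 seconds and freeAllocation takes 9.96 seconds.

If a single thread does the work of both threads (so no lock contention occurs), the whole test takes 24 seconds, of which mutable_data takes 2.212 seconds and freeAllocation takes 1.557 seconds.

If 2 threads are still used and the Allocator's lock is changed to a spin lock, the whole test takes 27 seconds, of which mutable_data takes 3.83 seconds and freeAllocation takes 6.72 seconds.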
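The contention statistic described above could be collected with a thin wrapper around the lock. The following is only a sketch of one way to do it, not the instrumentation actually used in this test: `CountingMutex` and `ContentionRate` are hypothetical names, and because `std::mutex::try_lock` is allowed to fail spuriously, the contended count should be read as an approximation.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical wrapper (not from the PR): counts how many lock() calls
// found the mutex already held by another thread.
class CountingMutex {
 public:
  void lock() {
    total_.fetch_add(1, std::memory_order_relaxed);
    // Fast path: try to take the lock without blocking.
    if (!mu_.try_lock()) {
      // Someone else holds the lock (or try_lock failed spuriously):
      // record the contention, then fall back to a blocking lock().
      contended_.fetch_add(1, std::memory_order_relaxed);
      mu_.lock();
    }
  }

  void unlock() { mu_.unlock(); }

  uint64_t total() const { return total_.load(std::memory_order_relaxed); }
  uint64_t contended() const {
    return contended_.load(std::memory_order_relaxed);
  }

 private:
  std::mutex mu_;
  std::atomic<uint64_t> total_{0};
  std::atomic<uint64_t> contended_{0};
};

// Fraction of lock acquisitions that hit contention; 0 if nothing was locked.
inline double ContentionRate(uint64_t contended, uint64_t total) {
  return total == 0 ? 0.0 : static_cast<double>(contended) / total;
}
```

Since the wrapper provides `lock()` and `unlock()`, it can be used with `std::lock_guard<CountingMutex>`. Applied to the numbers reported above, `ContentionRate(904919, 7655806)` comes out at about 11.8% of acquisitions.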

sneaxiy

LGTM with some suggestions.

Please add a note about where the code in spin_lock.h comes from.
Try to implement a BasicLockable class instead of using macros everywhere.

class SpinLock {
  public:
    void lock();
    void unlock();
    DISABLE_COPY_AND_ASSIGN(SpinLock);
};
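To illustrate the suggestion, here is a minimal sketch of a BasicLockable spin lock built only on `std::atomic_flag`. It is not the code merged in this PR (the contents of spin_lock.h are not shown in this thread). Copying is disabled with deleted members in place of the `DISABLE_COPY_AND_ASSIGN` macro, and `CountWithSpinLock` is a hypothetical helper added only to exercise the lock.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a BasicLockable spin lock; usable with std::lock_guard.
class SpinLock {
 public:
  SpinLock() = default;
  // Stand-in for DISABLE_COPY_AND_ASSIGN(SpinLock).
  SpinLock(const SpinLock&) = delete;
  SpinLock& operator=(const SpinLock&) = delete;

  void lock() {
    // test_and_set returns the previous value: true means another thread
    // holds the lock, so keep spinning instead of sleeping.
    while (flag_.test_and_set(std::memory_order_acquire)) {
    }
  }

  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: several threads increment a shared counter while
// holding the spin lock through std::lock_guard.
inline long CountWithSpinLock(int num_threads, int iters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        std::lock_guard<SpinLock> guard(lock);
        ++counter;
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

With a class of this shape, call sites can hold the lock through `std::lock_guard<SpinLock>` rather than lock and unlock macros, which is in line with the later "use lock_guard" commits. The busy-wait loop is kept deliberately bare. A production version would likely add a pause or yield hint inside the loop, which is omitted here.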

use spin lock in auto growth allocator, test=develop

4b78c85

zhiqiu requested review from sneaxiy and zhiqiu August 16, 2021 05:01

wanghuancoder added 3 commits August 18, 2021 08:25

use pthread spin lock, test=develop

0a80831

use lock guard, test=develop

8441fd8

use malloc spin lock, test=develop

e7efd2c

sneaxiy previously approved these changes Aug 19, 2021


use lock_guard, test=develop

fca298e

wanghuancoder dismissed sneaxiy’s stale review via fca298e August 19, 2021 08:23

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

9890e1d

… spinlocks_for_allocator

sneaxiy approved these changes Aug 20, 2021


wanghuancoder merged commit 6bacfb0 into PaddlePaddle:develop Aug 20, 2021

chancezeus mentioned this pull request Aug 22, 2021

Regression: Fails to compile for ARM64 (Jetson) with latest develop branch #35065

Closed

wanghuancoder added a commit that referenced this pull request Aug 23, 2021

Revert "use spin lock in auto growth allocator (#34910)"

25f8a19

This reverts commit 6bacfb0.

wanghuancoder mentioned this pull request Aug 23, 2021

Revert "use spin lock in auto growth allocator" #35069

Merged

wanghuancoder added a commit that referenced this pull request Aug 23, 2021

Revert "use spin lock in auto growth allocator (#34910)" (#35069)

97fef01

This reverts commit 6bacfb0.

wanghuancoder added a commit that referenced this pull request Aug 24, 2021

Revert "Revert "use spin lock in auto growth allocator (#34910)" (#35069)"

4c90e45

This reverts commit 97fef01.

wanghuancoder merged 6 commits into PaddlePaddle:develop from wanghuancoder:spinlocks_for_allocator