
use spin lock in auto growth allocator #34910

Merged
wanghuancoder merged 6 commits into PaddlePaddle:develop from wanghuancoder:spinlocks_for_allocator
Aug 20, 2021

@wanghuancoder
Contributor

PR types

Others

PR changes

Others

Describe

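Testing shows that lock contention occurs when multiple threads allocate and free GPU memory concurrently. When that happens, one thread goes to sleep and is woken up only once it acquires the lock. This sleep-and-wake cycle is very time-consuming and directly hurts the executor's scheduling performance.
We therefore replace std::mutex with a spin lock. When a spin lock finds the lock already held, the calling thread spins and waits instead of going to sleep; in our measurements it performs much better than std::mutex.
This PR only modifies the auto growth allocator. The dependencies among the other allocators are unclear to me, so they are left unchanged for now.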
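
A minimal sketch of the spin-instead-of-sleep idea, for illustration only. It is not the code added in this PR: `ToySpinLock` and `count_with_two_threads` are hypothetical names, and the lock is a plain exchange loop on `std::atomic<bool>`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical minimal spin lock (not Paddle's spin_lock.h).
// A waiting thread keeps retrying in user space instead of being
// put to sleep and woken up later.
class ToySpinLock {
 public:
  void lock() {
    // exchange() returns the previous value; keep retrying until we
    // are the thread that flipped it from false to true.
    while (locked_.exchange(true, std::memory_order_acquire)) {
      // busy-wait
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

// Two threads increment a shared counter under the lock and the
// final value is returned.
long count_with_two_threads(int iters_per_thread) {
  assert(iters_per_thread >= 0);
  ToySpinLock lock;
  long counter = 0;
  auto work = [&] {
    for (int i = 0; i < iters_per_thread; ++i) {
      std::lock_guard<ToySpinLock> guard(lock);
      ++counter;
    }
  };
  std::thread a(work);
  std::thread b(work);
  a.join();
  b.join();
  return counter;
}
```

Because the class provides `lock()` and `unlock()`, it works with `std::lock_guard`, so swapping it in for `std::mutex` in a small critical section needs no other changes.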

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zhiqiu zhiqiu requested review from sneaxiy and zhiqiu August 16, 2021 05:01
@sneaxiy
Collaborator

sneaxiy commented Aug 17, 2021

Some comments are listed as follows. @zhiqiu @wanghuancoder

  1. Can you show any testing example and data?
  2. Linus has addressed that: spinlocks can only be used if you actually know you're not being scheduled while using them. See here.

@wanghuancoder
Contributor Author

Some comments are listed as follows. @zhiqiu @wanghuancoder

  1. Can you show any testing example and data?
  2. Linus has addressed that: spinlocks can only be used if you actually know you're not being scheduled while using them. See here.

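I found this problem while writing the GC for the new executor. Let me describe my test in words:
There are 2 threads. One thread is responsible for mutable_data + LaunchKernel. The other thread uses cudaEventSynchronize to wait for the kernel of a given OP to finish, and then frees the GPU memory of the corresponding Tensor. The test model is PTB with BatchSize=20. Because PTB consists entirely of small kernels, mutable_data and freeAllocation are called at high frequency. The statistics show that out of 7655806 lock acquisitions (both allocating and freeing GPU memory take the lock), 904919 hit a lock collision (a collision means one thread holds the lock while the other thread requests it). In this case the whole test takes 36 seconds, of which mutable_data takes 9.27 seconds and freeAllocation takes 9.96 seconds.

If a single thread does the work of both threads above (so no lock collisions occur), the whole test takes 24 seconds, with mutable_data taking 2.212 seconds and freeAllocation taking 1.557 seconds.

With 2 threads still, but with the Allocator's lock changed to a spin lock, the whole test takes 27 seconds, with mutable_data taking 3.83 seconds and freeAllocation taking 6.72 seconds.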
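
One way collision counts like the ones above could be gathered is sketched here. This is an assumption about the method, not the instrumentation actually used in the test: `CountingMutex`, `ContentionStats` and `measure_contention` are hypothetical names. The wrapper first calls `try_lock()` and records a collision when that fails.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical wrapper that counts how often the lock was already held
// when a thread asked for it. std::mutex::try_lock() is allowed to fail
// spuriously, so the collision count is an approximation.
class CountingMutex {
 public:
  void lock() {
    acquisitions_.fetch_add(1, std::memory_order_relaxed);
    if (!mu_.try_lock()) {
      collisions_.fetch_add(1, std::memory_order_relaxed);
      mu_.lock();  // fall back to a normal blocking acquire
    }
  }
  void unlock() { mu_.unlock(); }

  std::uint64_t acquisitions() const { return acquisitions_.load(); }
  std::uint64_t collisions() const { return collisions_.load(); }

 private:
  std::mutex mu_;
  std::atomic<std::uint64_t> acquisitions_{0};
  std::atomic<std::uint64_t> collisions_{0};
};

struct ContentionStats {
  std::uint64_t acquisitions;
  std::uint64_t collisions;
};

// Two threads repeatedly take the lock around a tiny critical section,
// loosely imitating one allocating thread and one freeing thread.
ContentionStats measure_contention(int iters_per_thread) {
  assert(iters_per_thread >= 0);
  CountingMutex mu;
  long shared = 0;
  auto work = [&] {
    for (int i = 0; i < iters_per_thread; ++i) {
      std::lock_guard<CountingMutex> guard(mu);
      ++shared;
    }
  };
  std::thread a(work);
  std::thread b(work);
  a.join();
  b.join();
  return ContentionStats{mu.acquisitions(), mu.collisions()};
}
```

The number of collisions depends on the machine and on thread scheduling, so only the acquisition count is deterministic.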

sneaxiy previously approved these changes Aug 19, 2021
Collaborator

@sneaxiy sneaxiy left a comment

LGTM, with some suggestions.

  • Please add a note about where the code in spin_lock.h comes from.
  • Try to implement a BasicLockable class instead of using macros everywhere.
class SpinLock {
  public:
    void lock();
    void unlock();
    DISABLE_COPY_AND_ASSIGN(SpinLock);
};
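
A fuller sketch of what such a BasicLockable class might look like, under stated assumptions: it uses `std::atomic<bool>` with a test-and-test-and-set loop, deletes the copy operations directly instead of using the `DISABLE_COPY_AND_ASSIGN` macro, and adds a `try_lock()` that the suggestion above does not ask for. `guarded_increment` is a hypothetical stand-in for an allocator method. This is not necessarily what the PR ended up with.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Sketch of a spin lock that satisfies BasicLockable (lock/unlock), so
// it can be used with std::lock_guard in place of std::mutex.
class SpinLock {
 public:
  SpinLock() = default;
  SpinLock(const SpinLock&) = delete;             // not copyable
  SpinLock& operator=(const SpinLock&) = delete;  // not assignable

  void lock() {
    for (;;) {
      // Try to take the lock: succeed if the previous value was false.
      if (!locked_.exchange(true, std::memory_order_acquire)) return;
      // Otherwise spin on a plain load until the lock looks free, which
      // avoids hammering the cache line with writes.
      while (locked_.load(std::memory_order_relaxed)) {
      }
    }
  }

  // Returns true if the lock was acquired without waiting.
  bool try_lock() {
    return !locked_.load(std::memory_order_relaxed) &&
           !locked_.exchange(true, std::memory_order_acquire);
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

// Hypothetical stand-in for an allocator method: the guard releases the
// lock automatically when it goes out of scope.
int guarded_increment(SpinLock& lock, int& value) {
  std::lock_guard<SpinLock> guard(lock);
  return ++value;
}
```

With a class like this, call sites would only need to change the lock type; `std::lock_guard` and `std::unique_lock` work the same way as with `std::mutex`.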
