@chenwhql
Contributor

PR types

Bug fixes

PR changes

Others

Describe

Try to fix the intermittent failure of test_multiprocess_reader_exception.

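Observations:

  1. Since records began, this unit test has failed 3 times, each time with a Timeout, and each time it passed on rerun. The failure is very infrequent and cannot be reproduced locally.
  2. A normal run of this unit test takes only a little over 10 seconds, but every Timeout occurred with the test still unfinished after 120 seconds. This essentially means the test hung, not that the test itself takes a long time to run.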
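
The reasoning in observation 2 (a run that normally takes about 10 seconds but is still going at 120 seconds has most likely hung) can be illustrated with a small watchdog sketch. This is not code from the PR. `classify_run` is a hypothetical helper, and the child snippets are stand-ins for a real test body. The inference only holds when the deadline is much larger than the normal runtime.

```python
import subprocess
import sys


def classify_run(child_code, deadline_s):
    """Run child_code in a fresh interpreter and report whether it beat the deadline.

    Hypothetical helper for illustration only; it is not part of Paddle or CTest.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", child_code],
            timeout=deadline_s,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no process is left behind.
        return "hung"
    return "finished"


if __name__ == "__main__":
    # A short-lived child finishes well inside a generous deadline.
    print(classify_run("import time; time.sleep(0.1)", deadline_s=10.0))
    # A child that far outlives the deadline is treated as hung and killed.
    print(classify_run("import time; time.sleep(60)", deadline_s=1.0))
```

A deadline close to the normal runtime would misclassify slow runs as hangs, which is why the large gap between roughly 10 seconds and 120 seconds matters to the argument.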

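Conclusions:

  • This unit test makes heavy use of the Python multiprocessing module. I ran into the same situation earlier when working on the multi-process dataloader unit tests: when unit tests are scheduled in parallel in the py3 CI, multiprocessing occasionally hangs. The fix for this is to run the unit test exclusively, so I changed this unit test's cmake configuration.
  • There is no need to set a timeout. Marking the test as exclusive is enough, and the impact on overall run time is small.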

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@chenwhql chenwhql merged commit c209751 into PaddlePaddle:develop Feb 24, 2021