@xiaoguoguo626807 xiaoguoguo626807 commented Aug 28, 2025

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

pcard-67164

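During load_merge_state_dict, tensors copied to GPU memory were not released promptly, which caused GPU memory usage to spike.

todo:
Currently, every processed piece of data is synchronized to the CPU and then copied back to the GPU. Merging a parameter therefore allocates GPU memory once per shard of that parameter, which is redundant.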
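To illustrate why releasing each device copy promptly matters, here is a minimal sketch using a toy allocator. It is not Paddle's implementation: `ToyDevice`, `merge_holding_copies`, and `merge_releasing_promptly` are hypothetical names, and the assumption that the merged buffer is allocated once up front is made only for illustration.

```python
class ToyDevice:
    """Hypothetical stand-in for a GPU allocator; tracks live and peak bytes."""

    def __init__(self):
        self.live = 0
        self.peak = 0

    def alloc(self, nbytes):
        # Record the allocation and update the high-water mark.
        self.live += nbytes
        self.peak = max(self.peak, self.live)
        return nbytes  # the "handle" is just the size in this toy model

    def free(self, handle):
        self.live -= handle


def merge_holding_copies(dev, shard_sizes):
    """Keep every staged shard copy alive until the whole merge finishes."""
    dev.alloc(sum(shard_sizes))  # merged parameter (assumed allocated once)
    staged = [dev.alloc(size) for size in shard_sizes]
    # ... shards would be copied into the merged buffer here ...
    for handle in staged:
        dev.free(handle)
    return dev.peak


def merge_releasing_promptly(dev, shard_sizes):
    """Free each staged shard copy as soon as it has been merged."""
    dev.alloc(sum(shard_sizes))  # merged parameter (assumed allocated once)
    for size in shard_sizes:
        handle = dev.alloc(size)
        # ... this shard would be copied into the merged buffer here ...
        dev.free(handle)
    return dev.peak


shard_sizes = [4, 4, 4, 4]
peak_hold = merge_holding_copies(ToyDevice(), shard_sizes)
peak_release = merge_releasing_promptly(ToyDevice(), shard_sizes)
print(peak_hold, peak_release)  # prints: 32 20
```

In this toy model, holding all staged copies doubles the peak, while prompt release bounds the overhead by the largest single shard. The todo above would go further by avoiding the per-shard staging allocation altogether.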

paddle-bot bot commented Aug 28, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter
Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e3ccc1e). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...distributed/flex_checkpoint/dcp/load_state_dict.py | 0.00% | 3 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #74953   +/-   ##
==========================================
  Coverage           ?    0.00%           
==========================================
  Files              ?        1           
  Lines              ?        3           
  Branches           ?        0           
==========================================
  Hits               ?        0           
  Misses             ?        3           
  Partials           ?        0           


@xiaoguoguo626807 xiaoguoguo626807 merged commit c9fbded into PaddlePaddle:develop Aug 29, 2025
75 of 78 checks passed