
Contributor

@zju-ys zju-ys commented Jul 22, 2025

Reference Issues/PRs

#1369
#1461

What does this implement/fix? Explain your changes.

1. Detached tensors in the log dictionary before appending them to the training/validation/testing_step_outputs lists. This fixes a memory leak caused by retaining the computation graph of every batch for an entire epoch.
2. Detached the loss tensor within the step() method before logging it.
3. Moved prediction results to the CPU to prevent VRAM from growing.
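The three changes follow a common pattern: anything kept past the end of a batch should be a detached copy, so it no longer holds a reference to that batch's autograd graph, while the original loss is still used for backward. Below is a minimal sketch in plain PyTorch, without a Lightning module. `detach_to_cpu` is a hypothetical helper written for illustration and is not the code merged in this PR. For simplicity the sketch moves every tensor to the CPU, whereas the PR only moves prediction results. In a LightningModule the equivalent of change 2 would be logging `loss.detach()` rather than `loss`.

```python
from typing import Any

import torch


def detach_to_cpu(obj: Any) -> Any:
    """Recursively detach tensors in a log structure and move them to CPU.

    Hypothetical helper for illustration only (not the PR's implementation).
    """
    if isinstance(obj, torch.Tensor):
        # detach() drops the link to the autograd graph (grad_fn becomes None);
        # cpu() copies a device tensor to host memory (no-op for CPU tensors)
        return obj.detach().cpu()
    if isinstance(obj, dict):
        return {key: detach_to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # namedtuples need their fields passed positionally
        return type(obj)(*(detach_to_cpu(value) for value in obj))
    if isinstance(obj, (list, tuple)):
        return type(obj)(detach_to_cpu(value) for value in obj)
    return obj


# Simulate one short "epoch" in which each batch builds its own graph
weights = torch.randn(4, requires_grad=True)
training_step_outputs = []  # stands in for the per-epoch output list

for _ in range(3):
    x = torch.randn(8, 4)
    prediction = x @ weights
    loss = prediction.pow(2).mean()
    loss.backward()  # the original, non-detached loss still drives training
    log = {"loss": loss, "prediction": prediction}
    # Storing `log` directly would keep each batch's graph alive until the list
    # is cleared; storing detached copies lets the graph be freed right away.
    training_step_outputs.append(detach_to_cpu(log))
```

Because detaching happens only on the copy that is stored, gradient computation is unaffected; only the epoch-long references to the graph go away.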

Did you add any tests for the change?

I ran my training code for 5 epochs under a memory profiler. Here are the two comparison plots:
Before: [memory profiler plot]
After: [memory profiler plot]

fkiraly previously approved these changes Jul 22, 2025
Collaborator

@fkiraly left a comment


Thanks!


codecov bot commented Jul 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@81b5303).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1924   +/-   ##
=======================================
  Coverage        ?   87.03%           
=======================================
  Files           ?      101           
  Lines           ?     8077           
  Branches        ?        0           
=======================================
  Hits            ?     7030           
  Misses          ?     1047           
  Partials        ?        0           
Flag Coverage Δ
cpu 87.03% <100.00%> (?)
pytest 87.03% <100.00%> (?)

Flags with carried forward coverage won't be shown.


@fkiraly merged commit a88a404 into sktime:main on Aug 5, 2025
34 of 35 checks passed

Labels

bug (Something isn't working), module:models

Projects

Status: Fixed/resolved


2 participants