[BUG] fixed memory leak in BaseModel by detach some tensor #1924

zju-ys · 2025-07-22T10:47:32Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

1. Detached the tensors in the log dictionary before appending them to the training/validation/testing_step_outputs lists. This fixes a memory leak caused by retaining the computation graph of every batch for the entire epoch.
2. Detached the loss tensor within the step() method before logging.
3. Moved prediction results to the CPU to prevent VRAM growth.
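
For readers unfamiliar with the pattern, the three changes above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the helper `detach_to_cpu`, the list `step_outputs`, and the toy one-weight "model" are hypothetical names chosen for illustration, and only plain tensors and dicts are handled.

```python
import torch


def detach_to_cpu(value):
    """Hypothetical helper: drop autograd history and move tensors to CPU."""
    if isinstance(value, torch.Tensor):
        # detach() cuts the link to the computation graph,
        # cpu() is a no-op for tensors that already live on the CPU.
        return value.detach().cpu()
    if isinstance(value, dict):
        return {key: detach_to_cpu(item) for key, item in value.items()}
    return value  # non-tensor values (ints, strings, ...) pass through


# Toy stand-in for a model: a single weight that requires gradients.
weight = torch.ones(3, requires_grad=True)
step_outputs = []  # plays the role of a *_step_outputs list

for _ in range(4):  # four "batches"
    batch = torch.randn(8, 3)
    prediction = (batch * weight).sum(dim=1)
    loss = (prediction ** 2).mean()
    loss.backward()
    weight.grad = None  # discard gradients, no optimizer in this sketch

    log = {"loss": loss, "prediction": prediction, "n_samples": batch.shape[0]}
    # Storing `log` directly would keep each batch's graph alive
    # until the list is cleared; store a detached CPU copy instead.
    step_outputs.append(detach_to_cpu(log))

# None of the stored tensors references a computation graph any more.
graph_free = all(
    out["loss"].grad_fn is None and out["prediction"].grad_fn is None
    for out in step_outputs
)
```

Without the `detach_to_cpu` call, every entry in `step_outputs` would hold a tensor whose `grad_fn` points back into that batch's graph, so memory would grow with the number of batches until the list is cleared.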

Did you add any tests for the change?

I ran my training code for 5 epochs under a memory profiler. Here are two comparison plots:
before

after

fkiraly

Thanks!

codecov · 2025-07-22T13:11:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@81b5303). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1924   +/-   ##
=======================================
  Coverage        ?   87.03%           
=======================================
  Files           ?      101           
  Lines           ?     8077           
  Branches        ?        0           
=======================================
  Hits            ?     7030           
  Misses          ?     1047           
  Partials        ?        0

| Flag | Coverage Δ |
|---|---|
| cpu | `87.03% <100.00%> (?)` |
| pytest | `87.03% <100.00%> (?)` |


zju-ys requested review from benHeid, fkiraly, fnhirwa, jdb78 and yarnabrina as code owners July 22, 2025 10:47

zju-ys force-pushed the main branch 4 times, most recently from f1d90ca to ef47701 Compare July 22, 2025 12:48

fkiraly added the bug and module:models labels Jul 22, 2025

github-project-automation bot added this to Bugfixing - pytorch-forecasting Jul 22, 2025

fkiraly previously approved these changes Jul 22, 2025

zju-ys dismissed fkiraly’s stale review via ef47701 July 26, 2025 08:14

zju-ys force-pushed the main branch from e1fdf9f to ef47701 Compare July 26, 2025 08:14

fkiraly mentioned this pull request Jul 26, 2025

[BUG] TFT predict with output_dir running out of memory on large validation sets #1461

Open

zju-ys force-pushed the main branch 10 times, most recently from 9541569 to ab2eb8d Compare August 4, 2025 09:29

[BUG] fixed memory leak in BaseModel by detach some tensor

9b4c363

zju-ys force-pushed the main branch from ab2eb8d to 9b4c363 Compare August 4, 2025 09:47

fkiraly merged commit a88a404 into sktime:main Aug 5, 2025
34 of 35 checks passed

github-project-automation bot moved this to Fixed/resolved in Bugfixing - pytorch-forecasting Aug 5, 2025
