Description
Bug description
I recently upgraded my environment from torch 2.7.1+cu128, lightning 2.5.2, and Python 3.13 to torch 2.10+cu130, lightning 2.6.1, and Python 3.14.
With the old environment and the "auto" strategy on a single GPU, training started right away with essentially no startup time. Now I have to wait a minute before the epoch starts. Once it starts, the first epoch is noticeably faster: reported it/s is up 6% for fp32 and 15% for fp16. In the following epochs, however, the freeze time is counted in the epoch time, which results in a net slowdown and erases any gains from the environment upgrade.
I'm not doing anything custom in my training; I simply call Trainer.fit on a single GPU with an h5-based LightningDataModule.
The run is also accompanied by this warning:
.../.pixi/envs/default/lib/python3.14/site-packages/pytorch_lightning/utilities/_pytree.py:21: FutureWarning:
`isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
From the other issue about this warning, I understand that it is not critical here.
What version are you seeing the problem on?
master
Reproduced in studio
No response
How to reproduce the bug
I can't easily share my related training code, but I could share the pixi environments for reproduction.
Old:
[workspace]
channels = ["https://prefix.dev/conda-forge"]
name = "x"
platforms = ["linux-64"]
version = "0.1.0"
[system-requirements]
cuda = "12.0"
[tasks]
[dependencies]
python = "~=3.13.0"
ipykernel = "*"
numpy = "*"
scipy = "*"
pandas = "*"
matplotlib = "*"
ruff = ">=0.12.4,<0.13"
ipympl = ">=0.9.7,<0.10"
[pypi-dependencies]
clearml = "*"
torch = { version = "*", index = "https://download.pytorch.org/whl/cu128" }
torchvision = { version = "*", index = "https://download.pytorch.org/whl/cu128" }
torchaudio = { version = "*", index = "https://download.pytorch.org/whl/cu128" }
neuraloperator = { git = "https://github.com/neuraloperator/neuraloperator.git" }
torch-harmonics = "==0.7.3"
pyroomacoustics = ">=0.8.4, <0.9"
soundfile = ">=0.13.1, <0.14"
lightning = ">=2.5.2, <3"
torchmetrics = ">=1.7.4, <2"
tensorboard = ">=2.20.0, <3"
New:
[workspace]
channels = ["https://prefix.dev/conda-forge"]
name = "x"
platforms = ["linux-64"]
version = "0.1.0"
[system-requirements]
cuda = "13.0"
[tasks]
[dependencies]
python = ">=3.14.3,<3.15"
ipykernel = ">=7.2.0,<8"
numpy = ">=2.4.2,<3"
scipy = ">=1.17.0,<2"
pandas = ">=3.0.1,<4"
matplotlib = ">=3.10.8,<4"
ruff = ">=0.15.2,<0.16"
ipympl = ">=0.10.0,<0.11"
[pypi-dependencies]
clearml = ">=2.1.3, <3"
torch = { version = "*", index = "https://download.pytorch.org/whl/cu130" }
torchvision = { version = "*", index = "https://download.pytorch.org/whl/cu130" }
torchaudio = { version = "*", index = "https://download.pytorch.org/whl/cu130" }
pyroomacoustics = ">=0.9.0, <0.10"
soundfile = ">=0.13.1, <0.14"
lightning = ">=2.6.1, <3"
torchmetrics = ">=1.8.2, <2"
tensorboard = ">=2.20.0, <3"
h5py = ">=3.15.1, <4"
Error messages and logs
# New first epoch start
Epoch 0: 3%|██▌ | 44/1688 [00:10<06:16, 4.36it/s, v_num=bf10, train/loss_step=1.020]
# New second epoch start
Epoch 1: 3%|█▋ | 49/1688 [01:07<37:29, 0.73it/s, v_num=bf10, train/loss_step=0.395, val/loss=0.388, train/loss_epoch=0.448]
# Old second epoch start
Epoch 1: 2%|▉ | 40/1688 [00:11<08:10, 3.36it/s, v_num=fd5c, train/loss_step=0.414, val/loss=0.384, train/loss_epoch=0.447]
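A minimal sketch for narrowing down the stall seen in the logs above, assuming (not confirmed) that it comes from data-loading startup rather than from the training step itself. One relevant fact: Python 3.14 changed the default multiprocessing start method on Linux from fork to forkserver, which can make launching DataLoader workers slower. The helper names `start_method`, `time_first_item` and `time_epochs` are hypothetical and only for illustration; they use the standard library and work on any iterable, including a DataLoader.

```python
import multiprocessing as mp
import time


def start_method():
    """Return the multiprocessing start method this interpreter uses by default."""
    return mp.get_start_method()


def time_first_item(iterable):
    """Return (seconds spent creating the iterator and fetching item 0, item 0)."""
    t0 = time.perf_counter()
    # For a DataLoader with num_workers > 0, worker processes are launched here.
    iterator = iter(iterable)
    first = next(iterator)
    return time.perf_counter() - t0, first


def time_epochs(iterable, epochs=2):
    """Time the first item of several consecutive passes over the same iterable."""
    return [time_first_item(iterable)[0] for _ in range(epochs)]
```

Calling `time_epochs(...)` on the training DataLoader in both environments, outside of the Trainer, would show whether the minute-long wait is reproducible without Lightning. If it is, passing `persistent_workers=True` or `multiprocessing_context="fork"` to the DataLoader are two things to try; whether either removes the stall in this setup is untested.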
Environment
Old environment
- CUDA:
- GPU:
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- available: True
- version: 12.8
- Lightning:
- lightning: 2.5.2
- lightning-utilities: 0.14.3
- pytorch-lightning: 2.5.2
- tensorly-torch: 0.5.0
- torch: 2.7.1+cu128
- torch_harmonics: 0.7.3
- torchaudio: 2.7.1+cu128
- torchmetrics: 1.7.4
- torchvision: 0.22.1+cu128
- Packages:
- Cython: 3.1.2
- GitPython: 3.1.44
- Jinja2: 3.1.6
- Markdown: 3.8.2
- MarkupSafe: 3.0.2
- PyJWT: 2.10.1
- PySide6: 6.9.1
- PyYAML: 6.0.2
- Pygments: 2.19.2
- Werkzeug: 3.1.3
- absl-py: 2.3.1
- aiohappyeyeballs: 2.6.1
- aiohttp: 3.12.14
- aiosignal: 1.4.0
- annotated-types: 0.7.0
- asttokens: 3.0.0
- attrs: 25.3.0
- certifi: 2025.7.14
- cffi: 1.17.1
- charset-normalizer: 3.4.2
- clearml: 2.0.2
- click: 8.2.1
- comm: 0.2.2
- configmypy: 0.2.0
- contourpy: 1.3.2
- cycler: 0.12.1
- debugpy: 1.8.15
- decorator: 5.2.1
- exceptiongroup: 1.3.0
- executing: 2.2.0
- filelock: 3.18.0
- fonttools: 4.59.0
- frozenlist: 1.7.0
- fsspec: 2025.7.0
- furl: 2.1.4
- gitdb: 4.0.12
- grpcio: 1.73.1
- h5py: 3.14.0
- idna: 3.10
- importlib_metadata: 8.7.0
- iniconfig: 2.1.0
- ipykernel: 6.29.5
- ipympl: 0.9.7
- ipython: 9.4.0
- ipython_pygments_lexers: 1.1.1
- ipywidgets: 8.1.7
- jedi: 0.19.2
- jsonschema: 4.24.1
- jsonschema-specifications: 2025.4.1
- jupyter_client: 8.6.3
- jupyter_core: 5.8.1
- jupyterlab_widgets: 3.0.15
- kiwisolver: 1.4.8
- lightning: 2.5.2
- lightning-utilities: 0.14.3
- matplotlib: 3.10.3
- matplotlib-inline: 0.1.7
- mpmath: 1.3.0
- multidict: 6.6.3
- munkres: 1.1.4
- nest_asyncio: 1.6.0
- networkx: 3.5
- neuraloperator: 1.0.2
- numpy: 2.3.1
- nvidia-cublas-cu12: 12.8.3.14
- nvidia-cuda-cupti-cu12: 12.8.57
- nvidia-cuda-nvrtc-cu12: 12.8.61
- nvidia-cuda-runtime-cu12: 12.8.57
- nvidia-cudnn-cu12: 9.7.1.26
- nvidia-cufft-cu12: 11.3.3.41
- nvidia-cufile-cu12: 1.13.0.11
- nvidia-curand-cu12: 10.3.9.55
- nvidia-cusolver-cu12: 11.7.2.55
- nvidia-cusparse-cu12: 12.5.7.53
- nvidia-cusparselt-cu12: 0.6.3
- nvidia-nccl-cu12: 2.26.2
- nvidia-nvjitlink-cu12: 12.8.61
- nvidia-nvtx-cu12: 12.8.55
- opt_einsum: 3.4.0
- orderedmultidict: 1.0.1
- packaging: 25.0
- pandas: 2.3.1
- parso: 0.8.4
- pathlib2: 2.3.7.post1
- pexpect: 4.9.0
- pickleshare: 0.7.5
- pillow: 11.3.0
- platformdirs: 4.3.8
- pluggy: 1.6.0
- prompt_toolkit: 3.0.51
- propcache: 0.3.2
- protobuf: 6.31.1
- psutil: 7.0.0
- ptyprocess: 0.7.0
- pure_eval: 0.2.3
- pybind11: 3.0.0
- pycparser: 2.22
- pydantic: 2.11.7
- pydantic_core: 2.33.2
- pyparsing: 3.2.3
- pyroomacoustics: 0.8.4
- pytest: 8.4.1
- pytest-mock: 3.14.1
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.5.2
- pytz: 2025.2
- pyzmq: 27.0.0
- referencing: 0.36.2
- requests: 2.32.4
- rpds-py: 0.26.0
- ruamel.yaml: 0.18.14
- ruamel.yaml.clib: 0.2.12
- ruff: 0.12.4
- scipy: 1.16.0
- sentry-sdk: 2.33.0
- setuptools: 80.9.0
- shiboken6: 6.9.1
- six: 1.17.0
- smmap: 5.0.2
- soundfile: 0.13.1
- stack_data: 0.6.3
- sympy: 1.14.0
- tensorboard: 2.20.0
- tensorboard-data-server: 0.7.2
- tensorly: 0.9.0
- tensorly-torch: 0.5.0
- torch: 2.7.1+cu128
- torch_harmonics: 0.7.3
- torchaudio: 2.7.1+cu128
- torchmetrics: 1.7.4
- torchvision: 0.22.1+cu128
- tornado: 6.5.1
- tqdm: 4.67.1
- traitlets: 5.14.3
- triton: 3.3.1
- typing-inspection: 0.4.1
- typing_extensions: 4.14.1
- tzdata: 2025.2
- urllib3: 2.5.0
- wandb: 0.21.0
- wcwidth: 0.2.13
- widgetsnbextension: 4.0.14
- yarl: 1.20.1
- zencfg: 0.3.0
- zipp: 3.23.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.13.5
- release: 6.14.0-37-generic
- version: #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 20 10:25:38 UTC 2
Current environment
- CUDA:
- GPU:
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- available: True
- version: 13.0
- Lightning:
- lightning: 2.6.1
- lightning-utilities: 0.15.2
- pytorch-lightning: 2.6.1
- torch: 2.10.0+cu130
- torchaudio: 2.10.0+cu130
- torchmetrics: 1.8.2
- torchvision: 0.25.0+cu130
- Packages:
- Cython: 3.2.4
- Jinja2: 3.1.6
- Markdown: 3.10.2
- MarkupSafe: 3.0.3
- PyJWT: 2.10.1
- PySide6: 6.10.2
- PyYAML: 6.0.3
- Pygments: 2.19.2
- Werkzeug: 3.1.6
- absl-py: 2.4.0
- aiohappyeyeballs: 2.6.1
- aiohttp: 3.13.3
- aiosignal: 1.4.0
- asttokens: 3.0.1
- attrs: 25.4.0
- certifi: 2026.1.4
- cffi: 2.0.0
- charset-normalizer: 3.4.4
- clearml: 2.1.3
- comm: 0.2.3
- contourpy: 1.3.3
- cuda-bindings: 13.0.3
- cuda-pathfinder: 1.3.4
- cycler: 0.12.1
- debugpy: 1.8.20
- decorator: 5.2.1
- executing: 2.2.1
- filelock: 3.24.3
- fonttools: 4.61.1
- frozenlist: 1.8.0
- fsspec: 2026.2.0
- furl: 2.1.4
- grpcio: 1.78.1
- h5py: 3.15.1
- idna: 3.11
- ipykernel: 7.2.0
- ipympl: 0.10.0
- ipython: 9.10.0
- ipython_pygments_lexers: 1.1.1
- ipywidgets: 8.1.8
- jedi: 0.19.2
- jsonschema: 4.26.0
- jsonschema-specifications: 2025.9.1
- jupyter_client: 8.8.0
- jupyter_core: 5.9.1
- jupyterlab_widgets: 3.0.16
- kiwisolver: 1.4.9
- lightning: 2.6.1
- lightning-utilities: 0.15.2
- matplotlib: 3.10.8
- matplotlib-inline: 0.2.1
- mpmath: 1.3.0
- multidict: 6.7.1
- munkres: 1.1.4
- nest_asyncio: 1.6.0
- networkx: 3.6.1
- numpy: 2.4.2
- nvidia-cublas: 13.1.0.3
- nvidia-cuda-cupti: 13.0.85
- nvidia-cuda-nvrtc: 13.0.88
- nvidia-cuda-runtime: 13.0.96
- nvidia-cudnn-cu13: 9.15.1.9
- nvidia-cufft: 12.0.0.61
- nvidia-cufile: 1.15.1.6
- nvidia-curand: 10.4.0.35
- nvidia-cusolver: 12.0.4.66
- nvidia-cusparse: 12.6.3.3
- nvidia-cusparselt-cu13: 0.8.0
- nvidia-nccl-cu13: 2.28.9
- nvidia-nvjitlink: 13.0.88
- nvidia-nvshmem-cu13: 3.4.5
- nvidia-nvtx: 13.0.85
- orderedmultidict: 1.0.2
- packaging: 26.0
- pandas: 3.0.1
- parso: 0.8.6
- pathlib2: 2.3.7.post1
- pexpect: 4.9.0
- pillow: 12.1.1
- platformdirs: 4.9.2
- prompt_toolkit: 3.0.52
- propcache: 0.4.1
- protobuf: 6.33.5
- psutil: 7.2.2
- ptyprocess: 0.7.0
- pure_eval: 0.2.3
- pybind11: 3.0.2
- pycparser: 3.0
- pyparsing: 3.3.2
- pyroomacoustics: 0.9.0
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.6.1
- pyzmq: 27.1.0
- referencing: 0.37.0
- requests: 2.32.5
- rpds-py: 0.30.0
- ruff: 0.15.2
- scipy: 1.17.0
- setuptools: 82.0.0
- shiboken6: 6.10.2
- six: 1.17.0
- soundfile: 0.13.1
- stack_data: 0.6.3
- sympy: 1.14.0
- tensorboard: 2.20.0
- tensorboard-data-server: 0.7.2
- torch: 2.10.0+cu130
- torchaudio: 2.10.0+cu130
- torchmetrics: 1.8.2
- torchvision: 0.25.0+cu130
- tornado: 6.5.3
- tqdm: 4.67.3
- traitlets: 5.14.3
- triton: 3.6.0
- typing_extensions: 4.15.0
- unicodedata2: 17.0.1
- urllib3: 2.6.3
- wcwidth: 0.6.0
- widgetsnbextension: 4.0.15
- yarl: 1.22.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.14.3
- release: 6.14.0-37-generic
- version: #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 20 10:25:38 UTC 2
More info
Sorry for not including any reproduction code, but I hope this is enough to pin down the regression.