Bug description
Noticed the other day that PyTorch Lightning was upgraded when building a new environment.
I trained a small LJ model up to 1000 epochs using the latest version and the previous version. I did not notice much difference in the training curves and output.
./make-everyvoice-env --conda -n EveryVoice_2026-02-17_ptl_latest
...
(EveryVoice_2026-02-17_ptl_latest) $ pip list | grep pytorch-lightning
pytorch-lightning 2.6.1
(EveryVoice_2026-02-17_ptl_latest) $
VS
pytorch-lightning 2.4.0
One thing I did notice was how the training progress is displayed in the output logs.
When I ran the training as a "job" on the GPSC, the output log does not show live progress when watched with `tail -f`.
We are also being presented new information.
cat PTL_latest.o971987
Done Loading... (0:00:15.19)
┏━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ Mode ┃ FLOPs ┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ loss │ FastSpeech2Loss │ 0 │ train │ 0 │
│ 1 │ text_input_layer │ Embedding │ 20.0 K │ train │ 0 │
│ 2 │ position_embedding │ PositionalEmbedding │ 0 │ train │ 0 │
│ 3 │ encoder │ Conformer │ 6.1 M │ train │ 0 │
│ 4 │ variance_adaptor │ VarianceAdaptor │ 1.6 M │ train │ 0 │
│ 5 │ decoder │ Conformer │ 6.1 M │ train │ 0 │
│ 6 │ mel_linear │ Linear │ 20.6 K │ train │ 0 │
│ 7 │ postnet │ PostNet │ 4.3 M │ train │ 0 │
└───┴────────────────────┴─────────────────────┴────────┴───────┴───────┘
Trainable params: 18.2 M
Non-trainable params: 510
Total params: 18.2 M
Total estimated model params size (MB): 72
Modules in train mode: 471
Modules in eval mode: 0
Total FLOPs: 0
BUT, I did notice that if I started the training directly on the command line in a "sleeper" job, I was able to see LIVE progress. The format did change.
Ex:
Epoch 3/499 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━ 46/56 0:00:05 • 0:00:02 8.30it/s v_num: base training/pitch_loss: 0.080 training/energy_loss: 0.071 training/duration_loss: 0.063 training/spec_loss: 2.155
Epoch 3/499 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━ 46/56 0:00:05 • 0:00:02 8.30it/s v_num: base training/pitch_loss: 0.068 training/energy_loss: 0.062 training/duration_loss: 0.063 training/spec_loss: 2.311
Epoch 3/499 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━ 47/56 0:00:06 • 0:00:02 8.30it/s v_num: base training/pitch_loss: 0.068 training/energy_loss: 0.062
training/duration_loss: 0.063 training/spec_loss: 2.311 training/postnet_lossEpoch 3/499 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━ 47/56 0:00:06 • 0:00:02 8.30it/s v_num: base training/pitch_loss: 0.063 training/energy_loss: 0.061
training/duration_loss: 0.059 training/spec_loss: 2.101 training/postnet_loss:
INFO - `Trainer.fit` stopped: `max_epochs=500` reached.
Epoch 499/499 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0:00:07 • 0:00:00 8.27it/s v_num: base training/pitch_loss: 0.006 training/energy_loss: 0.005
training/duration_loss: 0.005 training/spec_loss: 0.130
training/postnet_loss: 0.129 training/attn_ctc_loss: 0.275
training/attn_bin_loss: 0.033 training/total_loss: 0.582
Loading EveryVoice modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [1:13:01<00:00, 1095.46s/it]
Question: are we OK with this new behaviour?
How to reproduce the bug
Train from the command line and from a batch job; the output logs are not the same.
Error messages and logs
# Error messages and logs here please
Environment
Current environment
# Please paste the output of `everyvoice --diagnostic` here
# EveryVoice Diagnostic information
More info
No response