Had a question on Slack; copying it here to keep track / assign to @SamuelLarkin from the thread. (Thank you!)
I have a question about how we are doing checkpointing with `ckpt_epochs: 1` in EV.
For example, I have a test where I set `max_epochs: 100`, and training is done.
The checkpoint directory looks like this:
```text
(EveryVoice) [U20-GPSC7]:$ ll logs_and_checkpoints/FeaturePredictionExperiment/base/checkpoints/
total 1.5G
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=87-step=1144.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=93-step=1222.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=96-step=1261.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=97-step=1274.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=99-step=1300.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=99-step=1300-v1.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 last.ckpt
```
I generated an audio file using that `last.ckpt`, and this was the filename created after synthesis:

`synthesis_output/wav/made-certain-recomme-9229d5cf--default--eng--ckpt\=1300--v_ckpt\=5000000--pred.wav`
Question: is it a bug in `everyvoice synthesize` that the filename was not something like this instead?

`made-certain-recomme-9229d5cf--default--eng--ckpt\=100--v_ckpt\=2500000--pred.wav`

That is, `ckpt` should be 100 (or 99, depending on how you count), and `v_ckpt` should be 2.5M, since I used our Universal Vocoder (where I thought we took the 2.5M checkpoint).

Or should `ckpt=` be changed to `step=` when creating the filename at synthesis time, to be more precise?
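For reference, the numbers in the directory listing are internally consistent if the `ckpt=1300` in the filename is Lightning's global (optimizer) step rather than an epoch count: 100 epochs ending at step 1300 means 13 steps per epoch, and every checkpoint filename above fits that pattern. A quick sanity check, using only the values from the log:

```python
# Values read from the checkpoint filenames in the listing above.
final_epoch = 99   # Lightning epochs are 0-indexed, so this is the 100th epoch
final_step = 1300  # cumulative optimizer steps at the end of training

epochs_completed = final_epoch + 1
steps_per_epoch = final_step // epochs_completed
print(steps_per_epoch)  # 13

# Every other filename fits the same pattern, e.g. epoch=96-step=1261
# is the cumulative step count after 97 completed epochs:
for epoch, step in [(87, 1144), (93, 1222), (96, 1261), (97, 1274), (99, 1300)]:
    assert (epoch + 1) * steps_per_epoch == step
print("all filenames consistent")
```

So the synthesized filename is not picking up a wrong number; it is reporting the step where one might expect the epoch, which supports renaming the field rather than changing the value.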
Sam also had this comment: "so I feel it is more that we don't have a proper definition of epoch, ckpt, and step."

I agree, and I am also wondering what those numbers mean when training on multiple GPUs / nodes.

Sam was also wondering how many GPUs were used while training our vocoder, and whether we are doing gradient accumulation (because we are seeing both 2.5M and 5M in the model, and we are not sure which one is the right one to use).
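A hedged sketch of why 2.5M and 5M could both describe the same vocoder training run: under DDP each GPU consumes its own shard of the data, and with gradient accumulation several micro-batches share one optimizer step, so the recorded `global_step` shrinks with both factors while the amount of data seen stays the same. The function and numbers below are illustrative, not an EveryVoice API:

```python
def optimizer_steps(total_micro_batches: int, num_gpus: int,
                    accumulate_grad_batches: int) -> int:
    """Optimizer steps a trainer would record as `global_step`, assuming
    the data is sharded evenly across GPUs (DDP) and gradients are
    accumulated over `accumulate_grad_batches` micro-batches."""
    per_gpu = total_micro_batches // num_gpus
    return per_gpu // accumulate_grad_batches

# Hypothetical numbers: the same amount of data, counted three ways.
total = 5_000_000
print(optimizer_steps(total, num_gpus=1, accumulate_grad_batches=1))  # 5000000
print(optimizer_steps(total, num_gpus=2, accumulate_grad_batches=1))  # 2500000
print(optimizer_steps(total, num_gpus=1, accumulate_grad_batches=2))  # 2500000
```

So a 5M counter and a 2.5M counter can describe the same amount of training, which is why pinning down the definitions of epoch, step, and ckpt matters here.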