everyvoice synthesize default file name (ckpt or step?) #411

@marctessier

I had a question on Slack; copying it here to keep track of it and to assign it to @SamuelLarkin from the thread (thank you).


I have a question about how checkpointing works in EV with ckpt_epochs: 1.
For example, I have a test where I set max_epochs: 100, and training has finished.
The checkpoint directory looks like this:

(EveryVoice) [U20-GPSC7]:$ ll logs_and_checkpoints/FeaturePredictionExperiment/base/checkpoints/
total 1.5G
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=87-step=1144.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=93-step=1222.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=96-step=1261.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=97-step=1274.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=99-step=1300.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18 'epoch=99-step=1300-v1.ckpt'
-rw-r----- 1 tes001 nrc_ict 209M Apr 29 16:18  last.ckpt
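
As a quick sanity check on the listing above, the epoch and step values can be parsed out of the checkpoint filenames. This is only a minimal sketch: the regular expression and the helper names `parse_ckpt_name` and `steps_per_epoch` are illustrative assumptions, not part of EveryVoice. For this run it suggests that the epoch in the filename is zero-based and that each epoch took 13 steps, so `step=1300` corresponds to 100 completed epochs.

```python
import re

# Filenames copied from the directory listing above.
NAMES = [
    "epoch=87-step=1144.ckpt",
    "epoch=93-step=1222.ckpt",
    "epoch=96-step=1261.ckpt",
    "epoch=97-step=1274.ckpt",
    "epoch=99-step=1300.ckpt",
    "epoch=99-step=1300-v1.ckpt",
]

# Assumed pattern: "epoch=N-step=M.ckpt", optionally with a "-vK" suffix.
CKPT_RE = re.compile(r"epoch=(\d+)-step=(\d+)(?:-v\d+)?\.ckpt$")


def parse_ckpt_name(name):
    """Return (epoch, step) parsed from a checkpoint filename."""
    match = CKPT_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return int(match.group(1)), int(match.group(2))


def steps_per_epoch(name):
    """Steps per epoch, assuming the epoch in the name is zero-based."""
    epoch, step = parse_ckpt_name(name)
    return step / (epoch + 1)


# Every checkpoint in the listing gives the same ratio.
ratios = {steps_per_epoch(name) for name in NAMES}
print(ratios)
```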

I generated an audio file using last.ckpt, and synthesis created this filename:

synthesis_output/wav/made-certain-recomme-9229d5cf--default--eng--ckpt\=1300--v_ckpt\=5000000--pred.wav

Question: is it a bug in everyvoice synthesize, where the filename should have been something like this instead?
made-certain-recomme-9229d5cf--default--eng--ckpt\=100--v_ckpt\=2500000--pred.wav

Here ckpt would be 100 (or 99, depending on how you count),
and v_ckpt would be 2.5M, since I used our Universal Vocoder (which I thought was taken at the 2.5M checkpoint).
Or should "ckpt=" be changed to "step=" when creating the filename at synthesis time, to be more precise?
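
To make the two naming options concrete, here is a hedged sketch of how the checkpoint part of the name could be labelled. The function `output_name` and all of its parameters are hypothetical and do not reflect how EveryVoice actually builds the name; only the `--`-separated layout is taken from the filename above, and applying the same label to the vocoder field is an assumption.

```python
def output_name(text_slug, speaker, language, step, vocoder_step, label="ckpt"):
    """Build a synthesis output filename (hypothetical helper).

    `label` names the training-progress field: "ckpt" reproduces the
    current layout, "step" is the alternative proposed in this issue.
    """
    parts = [
        text_slug,
        speaker,
        language,
        f"{label}={step}",
        f"v_{label}={vocoder_step}",
        "pred",
    ]
    return "--".join(parts) + ".wav"


# Values taken from the filename reported above.
current = output_name("made-certain-recomme-9229d5cf", "default", "eng", 1300, 5000000)
proposed = output_name(
    "made-certain-recomme-9229d5cf", "default", "eng", 1300, 5000000, label="step"
)
print(current)
print(proposed)
```

With `label="step"`, the number 1300 is described as what it appears to be (a step count) rather than being read as an epoch or checkpoint index.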

Sam also had this comment: "so I feel it is more that we don't have the proper definition of epoch, ckpt and step"

I agree, and I am also wondering how those numbers behave when training on multiple GPUs or nodes.

Sam was also wondering how many GPUs were used to train our vocoder, and whether we are doing gradient accumulation (because we see both 2.5M and 5M in the model, and it is not clear which one is the right one to use).
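
To illustrate why the GPU count and gradient accumulation matter here, the sketch below estimates the number of optimizer steps per epoch. It assumes that the step counter counts optimizer steps, that data is split evenly across devices without dropping the last partial batch, and that a partial accumulation window at the end of an epoch still triggers a step. The function name and the dataset size of 208 samples with batch size 16 are hypothetical, chosen only so that the single-GPU case gives 13 steps per epoch, as in the listing above.

```python
import math


def optimizer_steps_per_epoch(
    num_samples, batch_size, num_devices=1, accumulate_grad_batches=1
):
    """Estimate optimizer steps per epoch (hypothetical helper)."""
    # Each device sees only its own share of the batches.
    batches_per_device = math.ceil(num_samples / (batch_size * num_devices))
    # One optimizer step is taken per accumulation window.
    return math.ceil(batches_per_device / accumulate_grad_batches)


# Hypothetical dataset: 208 samples, batch size 16.
single_gpu = optimizer_steps_per_epoch(208, 16)
two_gpus = optimizer_steps_per_epoch(208, 16, num_devices=2)
accumulated = optimizer_steps_per_epoch(208, 16, accumulate_grad_batches=2)
print(single_gpu, two_gpus, accumulated)
```

Under these assumptions, the same number of epochs yields roughly half as many steps with two GPUs or with an accumulation factor of 2, which is one way two different step counts could describe the same amount of training.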

Labels: bug, documentation
