Commit bd5176c
add linear lr warmup and lr decay scheduler (#23)
This PR adds a linear LR scheduler and includes some automation based on
current best practices (a sketch of the resulting schedule follows the list):
a - takes the user lr provided in args as lr_max, and computes the final
min_lr for the decay schedule as lr / 10, per the Chinchilla paper (i.e.
the total decay spans one order of magnitude).
b - computes an automated linear warmup schedule of 10% of total iters as
warmup, with a minimum warmup of 2 steps.
c - computes a linear decay schedule after warmup, declining from lr_max
to min_lr over the span from the end of warmup to the end of training
(per Aaron's latest paper, linear is the preferred schedule).
d - updated the learning rate to 8e-4, in order to provide more visible
per-iter results to the user assuming debugModel.
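A minimal sketch of the schedule described in a–c, built on PyTorch's stock `LambdaLR`; the function name and the `warmup_pct` / `min_lr_ratio` parameters are illustrative assumptions, not necessarily what the commit itself uses:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_warmup_linear_decay(optimizer, total_steps, warmup_pct=0.10, min_lr_ratio=0.1):
    # Warmup covers 10% of total iters, with a floor of 2 steps (item b).
    warmup_steps = max(2, int(warmup_pct * total_steps))
    decay_steps = max(1, total_steps - warmup_steps)

    def lr_lambda(step):
        if step < warmup_steps:
            # Linear warmup from near zero up to lr_max.
            return float(step + 1) / warmup_steps
        # Linear decay from lr_max down to min_lr = lr_max / 10,
        # i.e. one order of magnitude of total decay (items a and c).
        progress = float(step - warmup_steps) / decay_steps
        return 1.0 - (1.0 - min_lr_ratio) * min(progress, 1.0)

    return LambdaLR(optimizer, lr_lambda)

# The lr passed in args is treated as lr_max (8e-4 after this PR, item d).
model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=8e-4)
sched = linear_warmup_linear_decay(opt, total_steps=100)
```

With `LambdaLR`, the multiplier returned by `lr_lambda` scales the optimizer's base lr, so lr_max and min_lr never need to be stored explicitly.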
LR scheduling produces a much improved loss curve:
<img width="1052" alt="Screenshot 2024-01-28 at 6 39 34 PM"
src="https://github.com/pytorch-labs/torchtrain/assets/46302957/667e8520-809f-419e-bfdd-c3bb8f82ff95">
I added two log prints: the warmup schedule as one line, and then the
step and current lr at each iter.
Both could be disabled if they add too much info.
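Continuing the sketch above, the two prints might look roughly like this (the exact log format in the commit isn't reproduced here); `scheduler.get_last_lr()` is the standard PyTorch accessor for the current lr:

```python
# Printed once: the computed warmup schedule, as a single line.
total_steps, warmup_steps = 100, max(2, int(0.10 * 100))
print(f"warmup schedule: {warmup_steps} of {total_steps} steps")

# Printed per iteration: the step and the current lr.
for step in range(total_steps):
    opt.step()    # training work elided
    sched.step()
    print(f"step {step}: lr = {sched.get_last_lr()[0]:.2e}")
```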
2 files changed: +57 −5 lines changed
(First file: new file with 39 added lines; the rendered diff content was not captured in this extract.)
(Second file: 18 lines added and 5 removed across existing code; the rendered diff content was not captured in this extract.)