Description
I'm currently working on the implementation of the Momentum Optimizer and have a small question regarding the initialization of the momentum term. I would greatly appreciate your insights to help clarify my understanding.
Background
In the standard momentum update formula:
v(t) = β * v(t-1) + (1 - β) * grad
where β is the momentum coefficient, the lecture mentioned that the momentum term should be initialized to 0 at the first iteration, since there is no historical gradient information initially.
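To make the first-step behavior concrete, here is a minimal sketch (the values of `beta` and `grad` are illustrative, not from the assignment) showing what one application of the update formula produces when starting from v = 0:

```python
beta = 0.9   # momentum coefficient (illustrative value)
grad = 2.0   # current gradient (illustrative value)

v = 0.0                             # v(0): no historical gradients yet
v = beta * v + (1 - beta) * grad    # first update: v(1)

# After the first update, v equals (1 - beta) * grad,
# since the beta * v(0) term contributes nothing.
assert v == (1 - beta) * grad
```

So starting from 0 and then applying the standard update once gives exactly (1 − β) · grad at the end of the first iteration.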
My Question
While experimenting with the code, I noticed that manual initialization requires explicitly checking whether a key exists. For example:

```python
if id(w) not in self.u:
    self.u[id(w)] = 0.0
```

However, this does not pass the test case. When I instead initialize it as

```python
if id(w) not in self.u:
    self.u[id(w)] = (1 - self.momentum) * grad
```

it passes all the tests. My question is: with this initialization, the momentum term directly contains the current gradient during the first update, rather than 0 as required by the standard momentum algorithm. This seems to contradict the core idea of momentum, because the momentum term should reflect the accumulation of historical gradients rather than directly introducing a scaled copy of the current gradient.
If anyone sees this and is willing to answer my question, I would be very grateful. Thank you very much!