As noted in `hw2.ipynb`:

> Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting.
And as noted in this forum discussion, needle does not support implicit broadcasting because such broadcasts would not be tracked in the computational graph, which leads to wrong gradient computations in the backward pass.
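To make the gradient issue concrete, here is a small NumPy illustration (NumPy is used only to show the shapes; this is not needle code). The backward pass of a broadcast must sum the upstream gradient over the broadcast axes, and if the broadcast is not a node in the graph, that reduction never happens:

```python
import numpy as np

# y = x @ W + b, with b implicitly broadcast from (out,) to (batch, out).
# The correct gradient dL/db must SUM the upstream gradient over the
# broadcast (batch) axis; an untracked broadcast silently skips this.
x = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.random.randn(2)
dy = np.ones((4, 2))         # upstream gradient dL/dy

db_missing = dy              # shape (4, 2): what backward "sees" if the
                             # broadcast is not recorded as a graph node
db_correct = dy.sum(axis=0)  # shape (2,): the gradient b actually needs
```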
As a result, we need to explicitly broadcast the bias tensor in `nn.Linear.forward()`.
I guess there should be no restriction on the shape used to store the bias term after initialization, since we will broadcast it explicitly during the forward pass anyway. We can store the bias either as a 2D tensor of shape `(1, out_features)` or as a 1D tensor of shape `(out_features,)`.
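For reference, a minimal sketch of what the explicit broadcast in `nn.Linear.forward()` might look like; it assumes the usual hw2 scaffold (`self.weight`, `self.bias`, `self.out_features`, and the `reshape`/`broadcast_to` tensor methods). Because we reshape first, either storage shape for the bias works:

```python
# inside needle's nn.Linear (hw2 scaffold) -- a sketch, not the reference solution
def forward(self, X: Tensor) -> Tensor:
    out = X @ self.weight
    if self.bias is not None:
        # reshape to (1, out_features) -- valid whether the bias is stored
        # as (out_features,) or (1, out_features) -- then broadcast
        # explicitly so the op is tracked in the computational graph
        bias = self.bias.reshape((1, self.out_features)).broadcast_to(out.shape)
        out = out + bias
    return out
```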
However, there is an ambiguity in `test_nn_and_optim.py`:
- `test_nn_linear_bias_init_1()` asserts that the bias is initialized as a 2D tensor of shape `(1, out_features)`, but
- the `linear_forward()` and `linear_backward()` helpers (used in the `test_nn_linear_forward_*` and `test_nn_linear_backward_*` tests, respectively) assign a 1D tensor of shape `(out_features,)` to the bias: `f.bias.data = get_tensor(lhs_shape[-1])`.
Question: By the way, why do we call `get_tensor` to assign a new value to the bias term in these helpers? Can't we use the `f.bias` value that was created during initialization?
I think `test_nn_linear_bias_init_1()` should allow the bias to have either of the valid shapes (`(1, out_features)` or `(out_features,)`); a possible relaxed assertion is sketched below.
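Something like this would do it (the body of the current test is my guess; only the test name comes from the test file):

```python
def test_nn_linear_bias_init_1():
    f = ndl.nn.Linear(7, 4)
    # accept either storage convention for the bias term
    assert f.bias.shape in [(1, 4), (4,)]
```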