nn.Linear.bias shape inconsistency #8

Description

@navalnica

As noted in hw2.ipynb:

Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting.

And as noted in this forum discussion, needle does not support implicit broadcasting because such broadcasts would not be tracked in the computational graph, which leads to incorrect gradient computations in the backward pass.

As a result, we need to explicitly broadcast the bias tensor in nn.Linear.forward().
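For reference, a minimal sketch of such an explicit broadcast, assuming needle exposes matmul, reshape, and broadcast_to under needle.ops as in the course skeleton (exact names and layout may differ):

```python
import needle.ops as ops
from needle import Tensor

def linear_forward(X: Tensor, weight: Tensor, bias: Tensor) -> Tensor:
    """X @ weight + bias, with the bias broadcast explicitly."""
    # (batch, in_features) @ (in_features, out_features) -> (batch, out_features)
    out = ops.matmul(X, weight)
    if bias is not None:
        # reshape handles a bias stored as either (out_features,) or
        # (1, out_features); broadcast_to then records the broadcast as a
        # node in the computational graph so the backward pass is correct.
        out_features = out.shape[-1]
        bias = ops.reshape(bias, (1, out_features))
        out = out + ops.broadcast_to(bias, out.shape)
    return out
```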

I'd guess there should be no restriction on the shape used to store the bias term after initialization, since we broadcast it explicitly during the forward pass anyway. We could store the bias either as a 2D tensor of shape (1, out_features) or as a 1D tensor of shape (out_features,).
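For illustration, either convention could be produced at initialization time; a sketch, assuming needle's init.kaiming_uniform(fan_in, fan_out) helper and nn.Parameter from the homework skeleton (the particular initializer is incidental here):

```python
from needle import init
from needle.nn import Parameter

def make_bias(out_features: int, two_d: bool) -> Parameter:
    # kaiming_uniform(out_features, 1) yields out_features values;
    # reshape to whichever storage convention the layer uses.
    shape = (1, out_features) if two_d else (out_features,)
    return Parameter(init.kaiming_uniform(out_features, 1).reshape(shape))
```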

However, there is an ambiguity in test_nn_and_optim.py:

  • test_nn_linear_bias_init_1() asserts that the bias is initialized as a 2D tensor of shape (1, out_features),
  • but the linear_forward() and linear_backward() helpers (used in the test_nn_linear_forward_* and test_nn_linear_backward_* tests, respectively) assign a 1D tensor of shape (out_features,) to the bias: f.bias.data = get_tensor(lhs_shape[-1])

    Question: By the way, why do we call get_tensor() to assign a new value to the bias term at all? Couldn't we reuse the f.bias value that was created during initialization?

I think test_nn_linear_bias_init_1() should accept either of the valid shapes, (1, out_features) or (out_features,); a possible tweak is sketched below.
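A sketch of the relaxed assertion, with hypothetical layer sizes (the actual test's in_features/out_features may differ):

```python
import needle as ndl

def test_nn_linear_bias_init_1():
    f = ndl.nn.Linear(7, 4)  # sizes are placeholders for illustration
    # Accept both storage conventions for the bias term.
    assert f.bias.shape in [(1, 4), (4,)]
```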
