
Handling k-Dimension Divisibility in Backward Matrix Multiplication for kv_a_proj_with_mqa (din=7168, dout=576) #93

@HIT-cwh

Description


In the attention part of DeepSeek-V3, the kv_a_proj_with_mqa linear layer has din = 7168 and dout = 576. When computing dx in the backward pass, the matrix multiplication has shapes (seq_len, 576) @ (576, 7168), so the contraction dimension k is 576. However, DeepGEMM requires the k dimension to be divisible by 128 (reference: DeepGEMM gemm.py#L192), and 576 % 128 = 64, so this constraint is violated.

Is zero-padding the k dimension necessary for this matrix multiplication to meet the requirement, or is there a recommended alternative?
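For reference, zero-padding the shared k dimension is mathematically a no-op for the product, since the padded entries contribute zero to every dot product. The sketch below (a hypothetical helper using NumPy for illustration, not DeepGEMM's actual API) pads k = 576 up to the next multiple of 128, i.e. 640:

```python
import numpy as np

def pad_k_to_multiple(a, b, multiple=128):
    """Zero-pad the shared contraction dim of a (m, k) and b (k, n)
    up to the next multiple of `multiple`. Hypothetical helper for
    illustration; not part of DeepGEMM."""
    k = a.shape[1]
    pad = (-k) % multiple  # amount needed to reach the next multiple
    if pad:
        a = np.pad(a, ((0, 0), (0, pad)))  # pad columns of a with zeros
        b = np.pad(b, ((0, pad), (0, 0)))  # pad rows of b with zeros
    return a, b

m, k, n = 16, 576, 7168  # seq_len stands in for m here
a = np.random.randn(m, k).astype(np.float32)
b = np.random.randn(k, n).astype(np.float32)

ap, bp = pad_k_to_multiple(a, b)
# 576 % 128 = 64, so k is padded from 576 to 640
assert ap.shape == (m, 640) and bp.shape == (640, n)
# the zero rows/columns contribute nothing, so the product is unchanged
assert np.allclose(ap @ bp, a @ b, atol=1e-3)
```

The cost is extra memory traffic and flops proportional to the padding (64/576 ≈ 11% here), plus the copies for the padded buffers, which is why it would be good to know if DeepGEMM has a cheaper path.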
