Implement bi-directionality #52

Open
yair-schiff wants to merge 6 commits into state-spaces:main from yair-schiff:bidirectional

Conversation

@yair-schiff
Contributor

@yair-schiff yair-schiff commented Dec 13, 2023

Edit:

  • Implement bi-directionality by applying the Mamba module twice: (1) to the forward sequence and (2) to the backward (flipped) sequence.
  • Implement 3 strategies for combining the forward / backward Mamba hidden states:
    1. add: Add the states.
    2. concat: Concatenate the states. This doubles the hidden dimension, d_model, which also prevents weight tying between the embedding and lm_head weights.
    3. ew_multiply: Perform element-wise multiplication between the states.
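The three strategies can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: `fwd` and `bwd` are placeholders for the two directional blocks (here plain `nn.Linear` layers instead of Mamba blocks, and whether the two directions share weights is an assumption left open):

```python
import torch
import torch.nn as nn

class BiDirectional(nn.Module):
    """Run a sequence module forward and on the reversed sequence, then combine."""

    def __init__(self, fwd: nn.Module, bwd: nn.Module, strategy: str = "add"):
        super().__init__()
        assert strategy in ("add", "concat", "ew_multiply")
        self.fwd, self.bwd, self.strategy = fwd, bwd, strategy

    def forward(self, x):                     # x: (batch, length, d_model)
        h_f = self.fwd(x)
        h_b = self.bwd(x.flip(1)).flip(1)     # flip along the sequence dimension
        if self.strategy == "add":
            return h_f + h_b                  # same d_model
        if self.strategy == "ew_multiply":
            return h_f * h_b                  # same d_model
        return torch.cat([h_f, h_b], dim=-1)  # concat doubles d_model

dim = 16
block = BiDirectional(nn.Linear(dim, dim), nn.Linear(dim, dim), strategy="concat")
y = block(torch.randn(2, 8, dim))
print(y.shape)  # torch.Size([2, 8, 32])
```

With concat, downstream layers must account for the doubled d_model, which is why weight tying between the embedding and lm_head breaks for that strategy.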


@Skylion007 Skylion007 left a comment


Left some nits

@sentialx

What if the sequences have padding? E.g., the input is
[1 2 3 0 0 0]
so the flipped input would be
[0 0 0 3 2 1].
Shouldn't it be
[3 2 1 0 0 0]?
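The two behaviours can be compared directly. The length-aware flip below is an illustration of the fix being asked for (the index construction is my own, not code from the PR):

```python
import torch

x = torch.tensor([[1, 2, 3, 0, 0, 0]])
lengths = torch.tensor([3])           # number of valid (non-padding) tokens per row

naive = x.flip(1)                     # [[0, 0, 0, 3, 2, 1]] -- padding moves to the front

# Length-aware flip: reverse only the first `lengths` tokens, keep padding trailing.
idx = torch.arange(x.size(1))
flip_idx = torch.where(idx < lengths[:, None], lengths[:, None] - 1 - idx, idx)
aware = torch.gather(x, 1, flip_idx)  # [[3, 2, 1, 0, 0, 0]]
```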

@yair-schiff
Contributor Author

@sentialx , agreed. That's a good catch.

@jimmieliu

How does the speed compare to uni-directional?

@yair-schiff
Contributor Author

yair-schiff commented Jan 3, 2024

How does the speed compare to uni-directional?

@jimmieliu, it's about 2x the uni-directional runtime, since the Mamba block runs once per direction.

@albertfgu albertfgu mentioned this pull request Jan 11, 2024
@pengzhangzhi

@yair-schiff I am just curious, was this padding issue ever solved?

What if the sequences have paddings? E.g. Input is [1 2 3 0 0 0] So flipped input would be [0 0 0 3 2 1]. Shouldn't it be [3 2 1 0 0 0]?

@pengzhangzhi

I came up with a solution to the padding issue. Say a tensor [1,2,3,0,0], where 0 is the padding token. We flip it to get [0,0,1,2,3], pass it to the network, and flip the output back. Because we apply two flips, the result lines up with the original token order:

given: x
out = x + f(x.flip(1)).flip(1)  # flip along the sequence dimension
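A tiny runnable version of this double-flip trick, using `cumsum` as a stand-in for the causal sequence module `f` (an assumption for illustration; in the PR, `f` would be the backward Mamba block):

```python
import torch

x = torch.tensor([[1., 2., 3., 0., 0.]])  # trailing zeros are padding

f = lambda t: t.cumsum(dim=1)             # stand-in causal op (assumption)

out = x + f(x.flip(1)).flip(1)
# f(x.flip(1)) scans [0, 0, 3, 2, 1] -> [0, 0, 3, 5, 6];
# flipping back gives [6, 5, 3, 0, 0], so out = [7., 7., 6., 0., 0.]
```

The leading padding contributes nothing to the scan here because the accumulator starts at zero.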

@xuanwuji

I came up with a solution to the padding issue. Say a tensor [1,2,3,0,0], where 0 is the padding token. We flip it to get [0,0,1,2,3], pass it to the network and flip it back. Therefore, the flipped tensor information matches the original tensor order as we apply double flips.

given: x
out = x + f(x.flip()).flip()

Hi, your approach is clever! But I have a question: if you flip the input to [0,0,1,2,3], does the leading padding affect how the sequence's hidden features are learned? I.e., does it produce a different result (a worse representation of the sequence) than the input [3,2,1,0,0]?
I don't know enough about this; could you possibly give me some guidance? It would help me a lot. Thank you very much!

@Museum7432

Museum7432 commented Jul 14, 2024

@xuanwuji well, you can remove the leading paddings by shifting each row of x before flipping x. As for their effect: since the hidden state is initialized with 0, it should still be filled with 0 after scanning through the paddings, so those paddings shouldn't have any effect on the result. However, you can use the following function just to be sure.

import torch

def flip_padded_hidden_states(hidden_states, seq_lens):
    # hidden_states: (batch, seq_len, hidden_dim); seq_lens: (batch,) valid lengths.
    # Reverses each row's valid prefix while keeping the padding trailing;
    # output entries at padding positions are unspecified.
    batch_size, seq_len, hidden_dim = hidden_states.shape

    # Flat index of every position in the (batch, seq_len) grid.
    indices = torch.arange(batch_size * seq_len, device=hidden_states.device).reshape(
        batch_size, seq_len
    )

    # Shift each row left by its padding count (modulo the flat size), then flip,
    # so the reversed valid prefix lands at the front of the row.
    indices_offset = seq_len - seq_lens

    indices = (indices - indices_offset.unsqueeze(1)) % (seq_len * batch_size)

    indices = indices.flip(1)

    return hidden_states.reshape(batch_size * seq_len, hidden_dim)[indices]

To check the effect of paddings:

import torch
from torch.nn import functional as F

from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16

model = Mamba(
    d_model=dim, # Model dimension d_model
    d_state=16,  # SSM state expansion factor
    d_conv=4,    # Local convolution width
    expand=2,    # Block expansion factor
).to("cuda")

x = torch.randn(batch, length, dim).to("cuda")
padded_x = F.pad(x, (0, 0, 4, 0))  # prepend 4 zero (padding) steps along the length dim

y = model(x)
padded_y = model(padded_x)

unpadded_y = padded_y[:, 4:]  # drop the outputs at the padding positions

print(f'Output max diff: {(unpadded_y - y).abs().max().item()}')
print(f'Output mean diff: {(unpadded_y - y).abs().mean().item()}')

However, these errors do stack after multiple layers, so you should use the flip_padded_hidden_states function just to be certain.
