Description
Hi. Thank you for publishing this repository. Congratulations on the excellent work and well-written paper.
The paper says Mamba has higher throughput than a Transformer model. To check this, I ran a simple test measuring the number of tokens generated per second for the Mamba model and for the original model from the TinyStories paper, which is GPTNeo-based. The results vary slightly between runs, but I consistently observed GPTNeo having much faster inference than Mamba.
Both models have the same 33M parameters, yet GPTNeo generates ~6× more tokens per second than Mamba. Could you provide some insight into why Mamba is slower in this case? Perhaps there's something I may have missed?
The results are reproducible, and you can find more details in this gist (I tested it on Google Colab):
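For clarity, the measurement follows the usual timing-loop pattern sketched below. This is an illustrative outline, not the exact gist code: `dummy_generate` is a hypothetical stand-in for a real `model.generate(...)` call, and all names are placeholders.

```python
import time

def tokens_per_second(generate_fn, max_new_tokens, n_runs=3):
    """Time several generation runs and return average tokens/sec.

    generate_fn(max_new_tokens) should run one generation pass and
    return the number of tokens actually produced.
    """
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        produced = generate_fn(max_new_tokens)
        total_time += time.perf_counter() - start
        total_tokens += produced
    return total_tokens / total_time

# Hypothetical stand-in for a real model; an actual benchmark would
# call e.g. model.generate(input_ids, max_new_tokens=...) here.
def dummy_generate(max_new_tokens):
    time.sleep(0.01)  # simulated inference latency
    return max_new_tokens

rate = tokens_per_second(dummy_generate, max_new_tokens=64)
```

In a real comparison, the same prompt, `max_new_tokens`, batch size, and device should be used for both models so the tokens/sec figures are directly comparable.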
Thank you in advance!