Description
Hi. Thank you for publishing this repository. Congratulations on the excellent work and well-written paper.
The paper says Mamba has higher throughput than a Transformer model. To check this, I ran a simple test measuring the number of tokens generated per second for the Mamba model and for the original model from the TinyStories paper, which is GPTNeo-based. The results vary slightly between runs, but I consistently observed GPTNeo having much faster inference than Mamba.
Both models have the same 33M parameters, yet GPTNeo generates ~6× more tokens per second than Mamba. Could you provide some insight into why Mamba is slower in this case? Perhaps there's something I may have missed?
The results are reproducible, and you can find more details in this gist (I tested it on Google Colab):
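For clarity, the measurement follows the usual timing-loop pattern sketched below. This is an illustrative outline, not the exact gist code: `dummy_generate` is a hypothetical stand-in for a real `model.generate(...)` call, and all names are placeholders.

```python
import time

def tokens_per_second(generate_fn, max_new_tokens, n_runs=3):
    """Time several generation runs and return average tokens/sec.

    generate_fn(max_new_tokens) should run one generation pass and
    return the number of tokens actually produced.
    """
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        produced = generate_fn(max_new_tokens)
        total_time += time.perf_counter() - start
        total_tokens += produced
    return total_tokens / total_time

# Hypothetical stand-in for a real model; an actual benchmark would
# call e.g. model.generate(input_ids, max_new_tokens=...) here.
def dummy_generate(max_new_tokens):
    time.sleep(0.01)  # simulated inference latency
    return max_new_tokens

rate = tokens_per_second(dummy_generate, max_new_tokens=64)
```

In a real comparison, the same prompt, `max_new_tokens`, batch size, and device should be used for both models so the tokens/sec figures are directly comparable.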
Thank you in advance!