Version: 0.6.6
Describe the Bug
When running inference at very high token-per-second rates (around 1,500 tokens per second) through an API like Cerebras, Jan is not able to keep up. Jan cannot render more than 200 to 300 tokens per second, even on a powerful machine.
Steps to Reproduce
- Get a Cerebras API key at https://cloud.cerebras.ai/ (a free usage tier is available).
- Add Cerebras as a model provider in Jan with the base URL https://api.cerebras.ai/v1.
- Select any Cerebras model, e.g. llama-3.3-70b.
- Observe that generation is limited to 200 to 300 tokens per second. This is not a limitation of the Cerebras API; it is the Jan UI failing to render tokens that fast (see the sketch after this list for an independent check).
- Try Cerebras inference at https://inference.cerebras.ai/ to see how fast it normally generates tokens.
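
To confirm the ceiling is in Jan's renderer rather than the API, you can measure the raw streaming throughput outside Jan. Below is a minimal sketch, assuming the `openai` Python package (the Cerebras endpoint is OpenAI-compatible) and a `CEREBRAS_API_KEY` environment variable; it treats each streamed chunk as roughly one token, which is only an approximation.

```python
# Rough measurement of raw Cerebras streaming throughput, bypassing Jan.
# Assumptions: the `openai` package is installed and CEREBRAS_API_KEY is set.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a 500-word story."}],
    stream=True,
)

chunks = 0
start = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if start is None:
            # Start the clock at the first content chunk so connection
            # latency does not skew the throughput figure.
            start = time.perf_counter()
        chunks += 1

if start is not None:
    elapsed = time.perf_counter() - start
    print(f"~{chunks / max(elapsed, 1e-9):.0f} tokens/s over {chunks} chunks")
```

If this script reports well over 1,000 tokens per second while Jan stays capped around 200 to 300 with the same model, the bottleneck is in Jan's rendering path, not the network or the API.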
Screenshots / Logs

With the same model, the Cerebras inference UI renders generated tokens in real time at around 1,600 tokens per second, while Jan is stuck at 238.
Operating System