
bug: Jan UI Bottlenecks Token Rendering Speed to ~300 TPS Despite Faster Cerebras API Output #6199

@Jasper-256

Description


Version: 0.6.6

Describe the Bug

When running inference at very high token rates (around 1,500 tokens per second) through an API such as Cerebras, Jan cannot keep up: it renders no more than 200–300 tokens per second, even on a fast machine.

Steps to Reproduce

  1. Get a Cerebras API key at https://cloud.cerebras.ai/ (some usage is free).
  2. Add Cerebras as a model provider in Jan with the base URL https://api.cerebras.ai/v1.
  3. Select any model from Cerebras, such as llama-3.3-70b.
  4. Observe that generation is limited to 200–300 tokens per second. This is not a limitation of the Cerebras API; it is the Jan UI failing to render tokens that fast.
  5. Try Cerebras inference at https://inference.cerebras.ai/ to see how fast it normally generates tokens.
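One way to confirm the bottleneck is client-side is to time the raw stream with no UI in the loop. A minimal Python sketch (the client setup in the comment is an assumption — it supposes the `openai` package pointed at the Cerebras base URL and a `CEREBRAS_API_KEY` env var; one SSE chunk usually carries roughly one token, so chunk rate approximates token rate):

```python
import time

def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput in tokens per second; guards against a zero interval."""
    return token_count / elapsed_s if elapsed_s > 0 else 0.0

def measure_stream(chunks) -> tuple[int, float]:
    """Consume an iterable of streamed text deltas and return
    (chunk_count, elapsed_seconds). Timing the raw stream outside any
    UI isolates the API's real throughput from rendering overhead."""
    start = time.perf_counter()
    count = sum(1 for _ in chunks)
    return count, time.perf_counter() - start

# Hypothetical usage against the Cerebras OpenAI-compatible endpoint:
#
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.cerebras.ai/v1",
#                   api_key=os.environ["CEREBRAS_API_KEY"])
#   stream = client.chat.completions.create(
#       model="llama-3.3-70b",
#       messages=[{"role": "user", "content": "Write a long story."}],
#       stream=True,
#   )
#   deltas = (c.choices[0].delta.content or "" for c in stream)
#   n, dt = measure_stream(deltas)
#   print(f"{tokens_per_second(n, dt):.0f} chunks/s")
```

If this reports well above 300 chunks per second on the same machine and network, the slowdown is in Jan's rendering path, not the API.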

Screenshots / Logs

(Screenshots: Jan vs. the Cerebras playground, same model.) With the same model, the Cerebras UI shows generated tokens in real time at around 1,600 tokens per second, while Jan is stuck at 238.
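The gap in those numbers is consistent with a per-token re-render cap. A back-of-envelope model (illustrative only — the ~3.3 ms render cost, 60 Hz frame rate, and batching strategy are assumptions, not Jan's actual implementation):

```python
def per_token_render_tps(render_cost_s: float) -> float:
    """If the UI re-renders once per streamed token, displayed
    throughput is capped at one token per render: 1 / render_cost_s."""
    return 1.0 / render_cost_s

def batched_render_tps(arrival_tps: float, frame_interval_s: float,
                       render_cost_s: float) -> float:
    """If tokens are instead buffered and flushed once per UI frame,
    a single render displays the whole batch, so the display keeps up
    with the arrival rate as long as one render fits the frame budget."""
    if render_cost_s <= frame_interval_s:
        return arrival_tps  # renderer keeps pace with the stream
    # renderer slower than the frame budget: it falls behind
    return (arrival_tps * frame_interval_s) / render_cost_s

# With an assumed ~3.3 ms cost per re-render, per-token rendering caps
# out near the observed ~300 TPS, while per-frame batching at 60 Hz
# would keep pace with a 1500 TPS stream:
print(round(per_token_render_tps(1 / 300)))     # ~300
print(batched_render_tps(1500, 1 / 60, 0.003))  # 1500.0
```

Buffering deltas and flushing them once per animation frame is a common mitigation in streaming chat UIs.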

Operating System

  • MacOS
  • Windows
  • Linux

Metadata

Status: Done
No assignees, labels, milestone, or linked branches/pull requests.