Bug: T/S on server different than sweep bench? #1158

@Ph0rk0z

What happened?

I pulled today's commits and ran llama-server. PP is fine, but TG is only 23 t/s. I then reran sweep bench and got the same 30 t/s as before.

The number of tokens and the context size are similar in both runs, as are the settings. Yesterday the two results matched. Did something break?

Name and Version

main -head

What operating system are you seeing the problem on?

No response

Relevant log output

prompt eval time =    2466.85 ms /  1595 tokens (    1.55 ms per token,   646.57 tokens per second)
       eval time =   10721.79 ms /   249 tokens (   43.06 ms per token,    23.22 tokens per second)
      total time =   13188.64 ms /  1844 tokens


|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
|  1024 |    256 |      0 |    1.569 |   652.55 |    8.321 |    30.77 |
|  1024 |    256 |   1024 |    1.534 |   667.61 |    8.437 |    30.34 |
|  1024 |    256 |   2048 |    1.550 |   660.72 |    8.515 |    30.06 |
|  1024 |    256 |   3072 |    1.565 |   654.20 |    8.598 |    29.77 |
|  1024 |    256 |   4096 |    1.580 |   648.03 |    8.788 |    29.13 |
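
To put a number on the gap, here is a minimal sketch that recomputes TG throughput from the figures above and compares the server run against the sweep bench row closest in context size (N_KV = 1024). The helper name `tokens_per_second` is made up for illustration, and picking that row as the reference is an assumption, since the server prompt was 1595 tokens.

```python
def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput in tokens per second from a token count and elapsed milliseconds."""
    return n_tokens / (elapsed_ms / 1000.0)


# llama-server log: eval time = 10721.79 ms / 249 tokens
server_tg = tokens_per_second(249, 10721.79)

# sweep bench row N_KV = 1024: TG = 256 tokens, T_TG = 8.437 s
bench_tg = tokens_per_second(256, 8437.0)

# Fraction of bench throughput that is missing on the server
slowdown = 1.0 - server_tg / bench_tg

print(f"server TG: {server_tg:.2f} t/s")
print(f"bench  TG: {bench_tg:.2f} t/s")
print(f"server is {slowdown:.1%} slower than sweep bench")
```

PP can be checked the same way (1595 tokens in 2466.85 ms on the server), and it lands in the same range as the bench rows, so only TG differs.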
