Bug: T/S on server different than sweep bench? #1158
Closed
Description
What happened?
I pulled today's commits and ran llama-server. Prompt processing (PP) is fine, but token generation (TG) is only 23 t/s. I then reran sweep bench and got the same 30 t/s as before.
The number of tokens and the context size are similar in both runs, as are the settings. Yesterday the two matched. Did something break?
Name and Version
main -head
What operating system are you seeing the problem on?
No response
Relevant log output
prompt eval time = 2466.85 ms / 1595 tokens ( 1.55 ms per token, 646.57 tokens per second)
eval time = 10721.79 ms / 249 tokens ( 43.06 ms per token, 23.22 tokens per second)
total time = 13188.64 ms / 1844 tokens
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 1024 | 256 | 0 | 1.569 | 652.55 | 8.321 | 30.77 |
| 1024 | 256 | 1024 | 1.534 | 667.61 | 8.437 | 30.34 |
| 1024 | 256 | 2048 | 1.550 | 660.72 | 8.515 | 30.06 |
| 1024 | 256 | 3072 | 1.565 | 654.20 | 8.598 | 29.77 |
| 1024 | 256 | 4096 | 1.580 | 648.03 | 8.788 | 29.13 |
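To put a number on the gap, the throughput figures can be recomputed from the raw timings above. This is only a rough sketch: the helper name `tokens_per_second` is made up for illustration, and it compares the server's eval line against the first sweep bench row (N_KV = 0), which is not an exact match for the server's context of roughly 1595 tokens.

```python
def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and an elapsed time in milliseconds to tokens/s."""
    return n_tokens / (elapsed_ms / 1000.0)


# llama-server log: "eval time = 10721.79 ms / 249 tokens"
server_tg = tokens_per_second(249, 10721.79)

# sweep bench, first row: TG = 256 tokens, T_TG = 8.321 s
sweep_tg = tokens_per_second(256, 8.321 * 1000.0)

# Fraction by which the server's generation speed falls short of sweep bench
gap = 1.0 - server_tg / sweep_tg

print(f"server TG: {server_tg:.2f} t/s")
print(f"sweep  TG: {sweep_tg:.2f} t/s")
print(f"server is {gap:.1%} slower")
```

With these inputs the server comes out about 24.5% slower in TG, while PP is within a few percent (646.57 vs. 652.55 to 667.61 t/s), so the difference is confined to generation.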