UPSTREAM PR #17263: server : fix "can batch with" bug (#207)
Mirrored from ggml-org/llama.cpp#17263
While looking into #17260, I found an error in the logic: the slot referenced by `slot_batched` could end up being released (for example, if the prompt does not fit into the context). The fix is to set the `slot_batched` pointer only after we have actually queued tokens for that slot.