[WIP] fix check_batch_tkv_limit_cp conservatively #961
sducouedic merged 1 commit into torch-spyre:main
Conversation
…nt for all edge cases Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
👋 Hi! Thank you for contributing. We also recommend installing prek and configuring it to check your code before every local commit.
bot:bench
bot:bench
bot:test
bot:bench
new_req_max_tkv = round_up_to_block_size(new_req_tkv + request.max_tokens - 1)
# Extra block of slack: left-padding can push a sequence's runtime tkv up to
# one block past the scheduler's estimate when the batch re-aligns on admission.
new_req_max_tkv += self.block_size
Dang, yeah. My reasoning for only needing to round up was that padding only happens when a request actually reaches the next block, so if our math is correct here, you'd only reach that block if some request were going to end inside it; in that case that request would round up, covering every case of other requests being padded to somewhere within that block.
But clearly that reasoning isn't sound 🙃
Added an extra block of slack (left-padding can push a sequence's runtime tkv up to one block past the scheduler's estimate when the batch re-aligns on admission) for both new and decode requests.
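The rounding-plus-slack estimate discussed above can be sketched in isolation. This is a minimal, self-contained sketch, not the plugin's actual code: the `round_up_to_block_size` helper, the block size of 64, and the function/argument names are assumptions inferred from the diff snippet.

```python
BLOCK_SIZE = 64  # assumed block size for illustration


def round_up_to_block_size(tkv: int, block_size: int = BLOCK_SIZE) -> int:
    # Round a token count up to the next block boundary.
    return ((tkv + block_size - 1) // block_size) * block_size


def conservative_max_tkv(new_req_tkv: int, max_tokens: int,
                         block_size: int = BLOCK_SIZE) -> int:
    # Worst-case tkv once the request has decoded all of its tokens,
    # rounded up to a block boundary.
    estimate = round_up_to_block_size(new_req_tkv + max_tokens - 1, block_size)
    # Extra block of slack: left-padding can push the runtime tkv up to
    # one block past this estimate when the batch re-aligns on admission.
    return estimate + block_size
```

With a block size of 64, a request arriving with tkv 10 and 5 max tokens rounds 14 up to 64 and then adds one block of slack, giving 128. The check stays conservative: the scheduler may leave headroom unused, but it never admits a batch whose runtime tkv can exceed the limit.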