Commit 892c090
Change profile_run batch based on max_seq_len (vllm-project#415)
Changes the profile_run batch shape so that it is derived from the
maximum sequence length. This avoids padding during prepare_prompt,
which would otherwise break the constraint
batch_size * seq_len <= max_num_batch_tokens.

In the current logic for profile_run, max_batch_size takes precedence.
For example, with max_batch_size = 256, max_num_batch_tokens = 2048,
block_size = 128, and max_seq_len = 1024:

With the current logic, max_seq_len is reduced to 8 (2048 / 256).
However, in **prepare_prompt** seq_len is padded to 128, so
batch_size * seq_len becomes 256 * 128 = 32768, which is greater than
max_num_batch_tokens and violates the constraint above.

With the updated logic, max_batch_size is calculated as 2
(2048 / 1024). This avoids the padding in **prepare_prompt** and keeps
the constraint satisfied.
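
The arithmetic in the example above can be checked with a small sketch. This is not the code changed by this commit; the function names (`round_up`, `profile_shape_batch_first`, `profile_shape_seq_first`, `padded_tokens`) are hypothetical, and padding is assumed to round seq_len up to the next multiple of block_size.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple (assumed model of the padding)."""
    return -(-value // multiple) * multiple


def profile_shape_batch_first(max_batch_size: int, max_num_batch_tokens: int):
    """Previous behaviour as described: keep batch size, shrink seq_len."""
    seq_len = max_num_batch_tokens // max_batch_size
    return max_batch_size, seq_len


def profile_shape_seq_first(max_seq_len: int, max_num_batch_tokens: int):
    """Updated behaviour as described: keep seq_len, shrink batch size."""
    batch_size = max_num_batch_tokens // max_seq_len
    return batch_size, max_seq_len


def padded_tokens(batch_size: int, seq_len: int, block_size: int) -> int:
    """Tokens in the batch once seq_len is padded to a block multiple."""
    return batch_size * round_up(seq_len, block_size)


# Values taken from the example in this commit message.
MAX_BATCH_SIZE, MAX_TOKENS, BLOCK_SIZE, MAX_SEQ_LEN = 256, 2048, 128, 1024

old_shape = profile_shape_batch_first(MAX_BATCH_SIZE, MAX_TOKENS)
new_shape = profile_shape_seq_first(MAX_SEQ_LEN, MAX_TOKENS)

old_tokens = padded_tokens(*old_shape, BLOCK_SIZE)  # 256 * 128
new_tokens = padded_tokens(*new_shape, BLOCK_SIZE)  # 2 * 1024

print(old_shape, old_tokens, old_tokens <= MAX_TOKENS)
print(new_shape, new_tokens, new_tokens <= MAX_TOKENS)
```

Under these assumptions the batch-first shape exceeds max_num_batch_tokens after padding, while the sequence-first shape stays within it because 1024 is already a multiple of block_size.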
Fixes: HabanaAI#4051

parent 3af4b6c
1 file changed
Lines changed: 2 additions & 4 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1306 | 1306 | |
| 1307 | 1307 | |
| 1308 | 1308 | |
| 1309 | | - |
| 1310 | | - |
| 1311 | | - |
| 1312 | | - |
| | 1309 | + |
| | 1310 | + |
| 1313 | 1311 | |
| 1314 | 1312 | |
| 1315 | 1313 | |