-
Notifications
You must be signed in to change notification settings - Fork 634
Fix duplicated session_id when pipeline is used by multithreads #2134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lmdeploy/serve/async_engine.py
Outdated
| def batch_infer( | ||
| self, | ||
| prompts: Union[List[str], str, List[Dict], List[List[Dict]]], | ||
| session_ids: Union[List[int], int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing session_ids will make the API hard to understand.
Is there any way we can handle it internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stream_infer supports batch inputs, we need to match the inputs and outputs and we distinguish it by session_id currently. If we don't introduce the session_ids, we need to change the output like below which i present the ith input in the batch.
i, out = outputs.get(timeout=0.001)
if out is None:
break
yield i, out
|
@AllentDan Do you remember our motivation to support batch prompts inference in streaming mode? |
Required by XTuner and community. #636 |
|
Users tend to input only one prompt in each thread in multithread situations. Shall we provide an |
It didn't make sense to me. |
|
Maybe the user is not capable of using coroutine programming. I recommend they all use the |
|
That's not my point |
|
The root is we don't want users being bothered by session_ids in multithread scenarios. |
|
todo item after inner discussion:
|
lmdeploy/serve/async_engine.py
Outdated
| self.gens_set = set() | ||
| for i in range(self.instance_num): | ||
| self.gens_set.add(self.engine.create_instance()) | ||
| self._session_ids = count(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"self._session_id" is better. After all, it is only one int
AllentDan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
May merge main so as to do the test |
Motivation
When using pipeline.batch_infer / pipeline.stream_infer with multiple threads, the default session_ids are all start of zero which will makes batch inference impossible.