
Conversation

@irexyc
Collaborator

@irexyc irexyc commented Jul 25, 2024

Motivation

When using pipeline.batch_infer / pipeline.stream_infer from multiple threads, the default session_ids all start from zero, which makes batch inference impossible.

from threading import Thread
from lmdeploy import pipeline, GenerationConfig
pipe = pipeline('/mnt/140/InternLM/internlm2-chat-1_8b', log_level='INFO')


def work(ss):
  # every call relies on the default session_id, so all threads collide on id 0
  gen_config = GenerationConfig(ignore_eos=True, max_new_tokens=512)
  for i in range(10):
    for x in pipe.stream_infer('hello', gen_config=gen_config):
        pass


threads = []
for i in range(5):
  t = Thread(target=work, args=(i * 10, ))
  t.start()
  threads.append(t)

for t in threads:
  t.join()
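
The failure mode can be seen in isolation: outputs are routed by session_id, so two threads that both use the default id 0 cannot be told apart. A self-contained illustration (not LMDeploy code):

# Not LMDeploy code: outputs are routed by session_id, so two producers that
# both use the default id 0 yield results the consumer cannot distinguish.
import queue

outputs = queue.Queue()

def engine_side(session_id, text):
    # the engine tags every output with the session id it was given
    outputs.put((session_id, text))

engine_side(0, 'reply meant for thread A')
engine_side(0, 'reply meant for thread B')  # same id: indistinguishable downstream

while not outputs.empty():
    sid, text = outputs.get()
    print(sid, text)  # both lines report session id 0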

def batch_infer(
        self,
        prompts: Union[List[str], str, List[Dict], List[List[Dict]]],
        session_ids: Union[List[int], int] = None,
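
A rough usage sketch of the proposed argument (illustrative only; the id scheme below is an assumption, not part of the PR):

# Illustrative only: how the proposed session_ids argument might be used from
# multiple threads; the offset-based id scheme is an assumption.
from threading import Thread
from lmdeploy import pipeline

pipe = pipeline('/mnt/140/InternLM/internlm2-chat-1_8b')

def work(offset):
    prompts = ['hello', 'hi there']
    ids = [offset + i for i in range(len(prompts))]  # disjoint ids across threads
    pipe.batch_infer(prompts, session_ids=ids)

threads = [Thread(target=work, args=(i * 10,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
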
Collaborator

Introducing session_ids will make the API hard to understand.
Is there any way we can handle it internally?

Collaborator Author

stream_infer supports batch inputs, so we need to match the outputs to the inputs, and currently we distinguish them by session_id. If we don't introduce session_ids, we would need to change the output as below, where i is the index of the corresponding input in the batch.

                i, out = outputs.get(timeout=0.001)
                if out is None:
                    break
                yield i, out
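
For reference, a rough sketch (illustrative only, not part of this PR) of how a caller could regroup such (i, out) pairs per prompt:

# Illustrative only: regroup (index, output) pairs from a batched stream so that
# each prompt's chunks can be reassembled on the caller's side.
from collections import defaultdict

def collect(stream, num_prompts):
    per_prompt = defaultdict(list)
    for i, out in stream:  # stream yields (prompt index, output chunk)
        per_prompt[i].append(out)
    return [per_prompt[i] for i in range(num_prompts)]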

@lvhan028
Collaborator

@AllentDan Do you remember our motivation to support batch prompts inference in streaming mode?

@AllentDan
Collaborator

> @AllentDan Do you remember our motivation to support batch prompts inference in streaming mode?

Required by XTuner and community. #636

@AllentDan
Collaborator

AllentDan commented Jul 29, 2024

In multithreaded scenarios, users tend to input only one prompt per thread. Shall we provide an infer function for that case? The infer function would be a synchronous counterpart of the async generate. It would not offer batch inference, and users could control the session id themselves.

@lvhan028
Collaborator

> @AllentDan Do you remember our motivation to support batch prompts inference in streaming mode?
>
> Required by XTuner and community. #636

It didn't make sense to me.
If users request streaming output for multiple prompts, can't we recommend they use async_stream_infer and pass the prompts one by one?
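
For example, a rough sketch of that per-prompt pattern; the exact signature of async_stream_infer (single prompt, keyword session_id) is an assumption here:

# Sketch of the suggested per-prompt streaming pattern; the signature of
# async_stream_infer used below is an assumption, not the documented API.
import asyncio

async def stream_one(pipe, prompt, session_id):
    async for out in pipe.async_stream_infer(prompt, session_id=session_id):
        print(session_id, out)

async def stream_all(pipe, prompts):
    await asyncio.gather(
        *(stream_one(pipe, p, session_id=i) for i, p in enumerate(prompts)))

# asyncio.run(stream_all(pipe, ['hello', 'hi there']))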

@AllentDan
Collaborator

Maybe the user is not familiar with coroutine programming. I would recommend they use the generate function directly if possible.

@lvhan028
Collaborator

That's not my point.
My point is that streaming responses for a batch of prompts is probably inappropriate.
Is it supported by vLLM?

@lvhan028
Collaborator

The root issue is that we don't want users to be bothered by session_ids in multithreaded scenarios.

@lvhan028
Collaborator

lvhan028 commented Aug 1, 2024

To-do items after internal discussion:

  • do not expose session_ids in the pipeline API; instead maintain an internally incremented global session_id (a rough sketch follows this list)
  • add an index field to the response structure, indicating the position of the corresponding prompt in the prompt list
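
A rough sketch of how these two items could fit together (only the itertools.count counter mirrors the diff quoted below; every other name here is an assumption):

# Sketch only: internally incremented session ids plus a per-prompt index in the
# response; Response/Pipeline are placeholder names, not the real classes.
from dataclasses import dataclass
from itertools import count


@dataclass
class Response:
    text: str
    index: int  # position of the corresponding prompt in the input list


class Pipeline:
    def __init__(self):
        # process-wide, monotonically increasing session ids, never exposed to users
        self._session_ids = count(0)

    def batch_infer(self, prompts):
        responses = []
        for index, prompt in enumerate(prompts):
            session_id = next(self._session_ids)  # unique id for this request
            responses.append(Response(text=f'<reply to {prompt!r}>', index=index))
        return responses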

self.gens_set = set()
for i in range(self.instance_num):
    self.gens_set.add(self.engine.create_instance())
self._session_ids = count(0)
Collaborator

"self._session_id" is better. After all, it is only one int

Collaborator

@AllentDan AllentDan left a comment

LGTM

@lvhan028
Collaborator

lvhan028 commented Aug 8, 2024

Please merge main so that the tests can be run.

@lvhan028 lvhan028 changed the title "add session_ids arg for multithread use of pipeline.stream_infer" to "Fix duplicated session_id when pipeline is used by multithreads" Aug 8, 2024
@lvhan028 lvhan028 merged commit c685f77 into InternLM:main Aug 8, 2024