
Conversation

@tjohnson31415
Contributor

Adds pipeline parallel support for DeepSeek v2.

Tested with https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct using --tensor-parallel-size 1 --pipeline-parallel-size 2
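
For readers skimming the diff, the change follows the general pattern vLLM models use for pipeline parallelism: each PP rank owns a contiguous slice of the decoder layers, the first rank embeds the tokens, and every non-final rank hands its activations to the next stage as IntermediateTensors. The sketch below is illustrative only; the class name, signatures, and the `__init__` plumbing are simplified assumptions rather than the exact code in this PR.

```python
# Illustrative sketch of the pipeline-parallel forward pattern in vLLM models
# (simplified; not the exact code added by this PR).
from torch import nn

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors


class DeepseekV2ModelSketch(nn.Module):

    def forward(self, input_ids, positions, kv_caches, attn_metadata,
                intermediate_tensors=None):
        if get_pp_group().is_first_rank:
            # The first pipeline stage embeds the input tokens itself.
            hidden_states = self.embed_tokens(input_ids)
            residual = None
        else:
            # Later stages receive activations from the previous stage.
            hidden_states = intermediate_tensors["hidden_states"]
            residual = intermediate_tensors["residual"]

        # Each stage runs only the slice of decoder layers it owns.
        for i in range(self.start_layer, self.end_layer):
            hidden_states, residual = self.layers[i](
                positions, hidden_states,
                kv_caches[i - self.start_layer], attn_metadata, residual)

        if not get_pp_group().is_last_rank:
            # Non-final stages ship their activations to the next stage.
            return IntermediateTensors({
                "hidden_states": hidden_states,
                "residual": residual,
            })

        hidden_states, _ = self.norm(hidden_states, residual)
        return hidden_states
```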

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of a small, essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger a full CI run as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao
Member

can you test the correctness locally, using https://github.com/vllm-project/vllm/blob/main/tests/distributed/test_pipeline_parallel.py ?

@tjohnson31415
Contributor Author

Sure. I edited the file to set the model to "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct" and added --trust-remote-code to the launch args, and was able to see the test_pipeline_parallel tests passing:

$ pytest -s tests/distributed/test_pipeline_parallel.py
...
.INFO:     Shutting down


=============================== warnings summary ===============================
my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127
  /workspace/my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 5 passed, 1 warning in 393.58s (0:06:33) ===================
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [75971]
INFO 07-18 15:40:48 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
(VllmWorkerProcess pid=76043) INFO 07-18 15:40:48 multiproc_worker_utils.py:237] Worker exiting
INFO 07-18 15:40:48 async_llm_engine.py:51] Engine is gracefully shutting down.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
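
For anyone who wants to reproduce this run, here is a hypothetical sketch of the kind of edit described above; the constant names and argument layout are assumptions, since the real tests/distributed/test_pipeline_parallel.py is structured differently and has changed since:

```python
# Hypothetical edit to tests/distributed/test_pipeline_parallel.py
# (constant names and argument layout are assumptions, not the real file).
MODEL_NAME = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

SERVER_ARGS = [
    "--model", MODEL_NAME,
    "--trust-remote-code",            # DeepSeek v2 checkpoints need remote code
    "--tensor-parallel-size", "1",
    "--pipeline-parallel-size", "2",
]
```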

@tjohnson31415 force-pushed the deepseek-v2-pp branch 2 times, most recently from 568c1d9 to 2522798 on July 18, 2024 at 16:21
@tjohnson31415
Contributor Author

Rebased to resolve a conflict with main. Reran the tests and they still pass.

Comment on lines 409 to 436
Member


this lambda function will have prefix= shortly after #6515.
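
For context, the comment refers to the lambda that constructs the decoder layers for each pipeline stage. A rough sketch of what the call could look like once the prefix= plumbing from #6515 lands is below; the make_layers usage and the DeepseekV2DecoderLayer argument names are approximations, not the code on these lines.

```python
# Approximate sketch (a fragment of the model's __init__) of the layer
# construction once the prefix= argument from #6515 is threaded through.
# make_layers lives in vllm.model_executor.models.utils; argument names and
# order here are assumptions, not the exact code on these lines.
self.start_layer, self.end_layer, self.layers = make_layers(
    config.num_hidden_layers,
    lambda prefix: DeepseekV2DecoderLayer(
        config,
        cache_config=cache_config,
        quant_config=quant_config,
        prefix=prefix,  # new: per-layer module prefix passed down
    ),
    prefix=f"{prefix}.layers",
)
```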

@youkaichao (Member) left a comment

thanks for addressing my comments! please test the correctness locally.

@tjohnson31415
Contributor Author

please test the correctness locally.

I ran the updated test_pipeline_parallel.py pytest tests locally with the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct model. It took a few tries to run through the whole suite. A couple of times a single test failed with RuntimeError: Server exited unexpectedly., but it was a different test each time, and rerunning the failed test by itself would pass. On the third attempt, all 10 tests passed:

====================================================== 10 passed, 1 warning in 764.67s (0:12:44) ======================================================

@youkaichao
Member

Thanks, that might be caused by the flakiness of pp tests. I'll merge as this PR looks good to me now.

Thanks for your contribution!

@youkaichao merged commit 507ef78 into vllm-project:main on Jul 23, 2024
@tjohnson31415 deleted the deepseek-v2-pp branch on July 23, 2024 at 20:15
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025