Commit affeb88

Merge branch 'master' into autotp_training
2 parents 1e05996 + fd40516 commit affeb88

138 files changed: +496 -454 lines changed

.github/ISSUE_TEMPLATE/deepspeed_chat_bug_report.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
 **System info (please complete the following information):**
 - OS: [e.g. Ubuntu 18.04]
 - GPU count and types [e.g. two machines with x8 A100s each]
-- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
+- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
 - (if applicable) Hugging Face Transformers/Accelerate/etc. versions
 - Python version
 - Any other relevant info about your setup

.github/ISSUE_TEMPLATE/inference_bug_report.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ If applicable, add screenshots to help explain your problem.
 **System info (please complete the following information):**
 - OS: [e.g. Ubuntu 18.04]
 - GPU count and types [e.g. two machines with x8 A100s each]
-- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
+- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
 - (if applicable) Hugging Face Transformers/Accelerate/etc. versions
 - Python version
 - Any other relevant info about your setup

.github/workflows/nv-a6000.yml

Lines changed: 4 additions & 4 deletions
@@ -23,7 +23,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.03-py3
+image: nvcr.io/nvidia/pytorch:24.09-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -57,16 +57,16 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2' unit/ --torch_ver="2.3" --cuda_ver="12"
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2_ops' unit/ --torch_ver="2.3" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2' unit/ --torch_ver="2.5" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2_ops' unit/ --torch_ver="2.5" --cuda_ver="12"
 - name: MII unit tests
 run: |
 BRANCH="main"
 if [[ ! -z "${{ github.event.inputs.mii_branch }}" ]]; then
 BRANCH="${{ github.event.inputs.mii_branch }}"
 fi
 echo "Cloning DeepSpeed-MII branch: $BRANCH"
-git clone -b $BRANCH --depth=1 https://github.com/microsoft/DeepSpeed-MII.git
+git clone -b $BRANCH --depth=1 https://github.com/deepspeedai/DeepSpeed-MII.git
 cd DeepSpeed-MII
 pip install .[dev]
 cd tests

.github/workflows/nv-ds-chat.yml

Lines changed: 3 additions & 2 deletions
@@ -37,7 +37,7 @@ jobs:
 
 - name: Install pytorch
 run: |
-pip3 install -U --cache-dir $TORCH_CACHE torch --index-url https://download.pytorch.org/whl/cu121
+pip install -U --cache-dir $TORCH_CACHE torch torchvision --index-url https://download.pytorch.org/whl/cu121
 python -c "import torch; print('torch:', torch.__version__, torch)"
 python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
 
@@ -54,7 +54,7 @@ jobs:
 BRANCH="${{ github.event.inputs.dse_branch }}"
 fi
 echo "DeepSpeedExamples Branch: $BRANCH"
-git clone -b $BRANCH https://github.com/microsoft/DeepSpeedExamples.git
+git clone -b $BRANCH https://github.com/deepspeedai/DeepSpeedExamples.git
 cd DeepSpeedExamples/applications/DeepSpeed-Chat
 pip install -r requirements.txt
 pip install -e .
@@ -67,6 +67,7 @@ jobs:
 run: |
 cd DeepSpeedExamples/applications/DeepSpeed-Chat
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
+unset NCCL_DEBUG
 cd tests
 pytest $PYTEST_OPTS ./
 

.github/workflows/nv-flash-attn.yml

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.03-py3
+image: nvcr.io/nvidia/pytorch:24.09-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -53,7 +53,7 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.3" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.5" --cuda_ver="12"
 - name: Open GitHub issue if nightly CI fails
 if: ${{ failure() && (github.event_name == 'schedule') }}
 uses: JasonEtco/create-an-issue@v2

.github/workflows/nv-human-eval.yml

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.03-py3
+image: nvcr.io/nvidia/pytorch:24.09-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -50,4 +50,4 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'evaluation' -k "test_human_eval" unit/ --torch_ver="2.3" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'evaluation' -k "test_human_eval" unit/ --torch_ver="2.5" --cuda_ver="12"

.github/workflows/nv-mii.yml

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ jobs:
 BRANCH="${{ github.event.inputs.mii_branch }}"
 fi
 echo "Cloning DeepSpeed-MII branch: $BRANCH"
-git clone -b $BRANCH --depth=1 https://github.com/microsoft/DeepSpeed-MII.git
+git clone -b $BRANCH --depth=1 https://github.com/deepspeedai/DeepSpeed-MII.git
 cd DeepSpeed-MII
 pip install .[dev]
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch

CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -23,7 +23,7 @@ and then repeat the previous `git commit` command.
 ## Testing
 DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
 The model convergence tests train
-[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
+[DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/) and measure
 end-to-end convergence and related metrics. Unit tests are found in `tests/unit/` and
 the model convergence tests are found in `tests/model/`.
 
@@ -40,7 +40,7 @@ tests. Note that [pytest-forked](https://github.com/pytest-dev/pytest-forked) an
 
 ### Model Tests
 To execute model tests, first [install DeepSpeed](#installation). The
-[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) repository is cloned
+[DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/) repository is cloned
 as part of this process. Next, execute the model test driver:
 ```bash
 cd tests/model/
@@ -85,8 +85,8 @@ Based on the issue we shall discuss the merit of the new feature and decide whet
 ### Step 2: implementation and verification
 Contributor will go ahead and implement the feature, and the DeepSpeed team will provide guidance/helps as needed. The required deliverables include:
 
-* A PR to [microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed) including (1) the feature implementation (2) unit tests (3) documentation (4) tutorial
-* A PR to [microsoft/DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) or [microsoft/Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed) including the examples of how to use the feature (this is related to the planned testing experiments in proposal)
+* A PR to [deepspeedai/DeepSpeed](https://github.com/deepspeedai/DeepSpeed) including (1) the feature implementation (2) unit tests (3) documentation (4) tutorial
+* A PR to [deepspeedai/DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples) or [deepspeedai/Megatron-DeepSpeed](https://github.com/deepspeedai/Megatron-DeepSpeed) including the examples of how to use the feature (this is related to the planned testing experiments in proposal)
 * In the implementation (code, documentation, tutorial), we require the feature author to record their GitHub username as a contact method for future questions/maintenance.
 
 After receiving the PRs, we will review them and merge them after necessary tests/fixes.
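
Most hunks shown here swap `github.com/microsoft/<repo>` for `github.com/deepspeedai/<repo>`. A minimal sketch of that rewrite follows, for checking other files for stale URLs. It is not the tooling used for this commit: the `MOVED_REPOS` list is an assumption drawn only from the repositories visible in this diff, and `rewrite_org` is a hypothetical helper name. It touches URLs only, so link text such as `microsoft/DeepSpeed` would still need a manual edit.

```python
import re

# Assumption: only the repositories that appear in this diff have moved.
# Longer names come first so the alternation prefers the most specific match.
MOVED_REPOS = ("DeepSpeed-MII", "DeepSpeedExamples", "Megatron-DeepSpeed", "DeepSpeed")

# Match "github.com/microsoft/<repo>" where <repo> is one of MOVED_REPOS and is
# not followed by another name character (so "DeepSpeedFoo" is left alone).
_PATTERN = re.compile(
    r"(github\.com/)microsoft/("
    + "|".join(re.escape(name) for name in MOVED_REPOS)
    + r")(?![\w-])",
    re.IGNORECASE,
)


def rewrite_org(text: str) -> str:
    """Return text with the old organization replaced in matching GitHub URLs."""
    # \2 re-emits the repository name exactly as written, preserving its case.
    return _PATTERN.sub(r"\1deepspeedai/\2", text)


if __name__ == "__main__":
    sample = "git clone -b $BRANCH https://github.com/microsoft/DeepSpeedExamples.git"
    print(rewrite_org(sample))
```

Unrelated repositories under the old organization are left untouched because the pattern only matches the listed names.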
