add deepseek ep moe tune config #1431
base: main
Conversation
Force-pushed from 406539e to b0dec7d
Pull Request Overview
This PR adds 7 new tuned configuration entries for DeepSeek EP MoE (Mixture of Experts) to the performance tuning database.
- Adds configurations for varying token counts (16, 32, 64, 128, 256, 512, 1024) for a specific DeepSeek EP MoE architecture
- New configurations use inter_dim=2048, expert=33, and topk=10, distinguishing them from previous entries (a lookup sketch follows below)
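To make the new entries concrete, here is a minimal Python sketch of how a tuned-config CSV like this could be loaded and matched against a MoE shape. The column names are illustrative assumptions inferred from the diff rows shown below, not the project's actual schema, and `pick_config`/`load_tuned_configs` are hypothetical helpers.

```python
import csv

# Assumed column layout for the comma-separated fields visible in the diff;
# the real tuning database schema may name or order these differently.
COLUMNS = [
    "cu_num", "token", "model_dim", "inter_dim", "expert", "topk",
    "act_type", "dtype", "q_dtype_a", "q_dtype_w", "q_type",
    "use_g1u1", "doweight_stage1", "block_m", "ksplit",
    "us1", "kernel1", "err1", "us2", "kernel2", "err2", "total_us",
]

def load_tuned_configs(path):
    """Read every tuned entry into a list of dicts keyed by COLUMNS."""
    with open(path, newline="") as f:
        return [dict(zip(COLUMNS, row)) for row in csv.reader(f) if row]

def pick_config(configs, token, model_dim=7168, inter_dim=2048,
                expert=33, topk=10):
    """Return the tuned entry matching this DeepSeek EP MoE shape, if any."""
    for cfg in configs:
        if (int(cfg["token"]) == token
                and int(cfg["model_dim"]) == model_dim
                and int(cfg["inter_dim"]) == inter_dim
                and int(cfg["expert"]) == expert
                and int(cfg["topk"]) == topk):
            return cfg
    return None
```

Under these assumptions, each added row is keyed by its shape, so a lookup such as `pick_config(configs, token=512)` would return the first entry in the table below.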
80,512,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_Token,1,0,64,0,865.1256,_ZN5aiter44fmoe_stage1_bf16_pertokenFp8_g1u1_64x256_pf3E,0.0%,755.7812,moe_ck2stages_gemm2_256x64x128x256_1x4_MulABScaleExpertWeight_v3_Nswizzle0_Quant2_MulRoutedWeight1_F8_F8_B16,0.0%,1620.9068,0,278.22,903.41
80,1024,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_Token,1,0,64,0,1623.3044,_ZN5aiter44fmoe_stage1_bf16_pertokenFp8_g1u1_64x256_pf3E,0.0%,1428.8629,moe_ck2stages_gemm2_256x64x128x128_1x4_MulABScaleExpertWeight_v3_Nswizzle0_Quant2_MulRoutedWeight1_F8_F8_B16,0.1%,3052.1673,0,295.51,483.38
80,16,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,1,0,16,0,274.0603,_ZN5aiter59fmoe_stage1_bf16_pertokenFp8_blockscale_g1u1_16x256_2tg_pf3E,4.9%,150.3324,moe_ck2stages_gemm2_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight1_F8_F8_B16,0.3%,424.3927,0,33.21,3425.3
80,32,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,1,0,16,0,359.0112,moe_ck2stages_gemm1_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight0_silu_F8_F8_B16,0.0%,190.8827,moe_ck2stages_gemm2_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight1_F8_F8_B16,0.2%,549.8939,0,51.26,2644.17
No 2-stage solution should be here.
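The reviewer is flagging that the token=32 row selects a CK 2-stage gemm (`moe_ck2stages_gemm1_*`) as its stage-1 kernel instead of an asm `fmoe_stage1_*` kernel like the other rows. A hypothetical lint sketch for that check, reusing the assumed field names from the earlier sketch (not the project's real schema):

```python
# Assumption: cfg dicts come from load_tuned_configs() above, with the
# stage-1 kernel name stored under the illustrative key "kernel1".
def flag_two_stage_rows(configs):
    """Yield tuned entries whose stage-1 kernel is a CK 2-stage solution."""
    for cfg in configs:
        if cfg["kernel1"].startswith("moe_ck2stages"):
            yield cfg

# Example usage: report offending token counts so those shapes can be retuned.
# for cfg in flag_two_stage_rows(load_tuned_configs("tuned_fmoe.csv")):
#     print(f"token={cfg['token']}: stage-1 uses {cfg['kernel1']}")
```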
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist