
Conversation

@junhaha666 (Contributor)

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings November 18, 2025 13:06
@junhaha666 force-pushed the ds_ep_moe_tune_config branch from 406539e to b0dec7d on November 18, 2025 13:08
Copilot finished reviewing on behalf of junhaha666 November 18, 2025 13:08
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds 7 new tuned configuration entries for DeepSeek EP (expert-parallel) MOE (Mixture of Experts) to the performance tuning database.

  • Adds configurations for varying token sizes (16, 32, 64, 128, 256, 512, 1024) for a specific DeepSeek EP MOE architecture
  • New configurations use inter_dim=2048, expert=33, and topk=10, distinguishing them from previous entries


80,512,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_Token,1,0,64,0,865.1256,_ZN5aiter44fmoe_stage1_bf16_pertokenFp8_g1u1_64x256_pf3E,0.0%,755.7812,moe_ck2stages_gemm2_256x64x128x256_1x4_MulABScaleExpertWeight_v3_Nswizzle0_Quant2_MulRoutedWeight1_F8_F8_B16,0.0%,1620.9068,0,278.22,903.41
80,1024,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_Token,1,0,64,0,1623.3044,_ZN5aiter44fmoe_stage1_bf16_pertokenFp8_g1u1_64x256_pf3E,0.0%,1428.8629,moe_ck2stages_gemm2_256x64x128x128_1x4_MulABScaleExpertWeight_v3_Nswizzle0_Quant2_MulRoutedWeight1_F8_F8_B16,0.1%,3052.1673,0,295.51,483.38
80,16,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,1,0,16,0,274.0603,_ZN5aiter59fmoe_stage1_bf16_pertokenFp8_blockscale_g1u1_16x256_2tg_pf3E,4.9%,150.3324,moe_ck2stages_gemm2_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight1_F8_F8_B16,0.3%,424.3927,0,33.21,3425.3
80,32,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,1,0,16,0,359.0112,moe_ck2stages_gemm1_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight0_silu_F8_F8_B16,0.0%,190.8827,moe_ck2stages_gemm2_256x16x128x256_1x4_MulABScaleExpertWeightA8W8blkscale_v1_Nswizzle0_Quant4_MulRoutedWeight1_F8_F8_B16,0.2%,549.8939,0,51.26,2644.17
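
For orientation, here is a minimal sketch of how one of the rows above could be decoded in Python. The field names and their order are assumptions inferred from the value patterns and from the details the overview calls out (inter_dim=2048, expert=33, topk=10); they are not taken from the repository's actual CSV header.

```python
# Hypothetical decoder for one tuned-config row; column names are
# assumptions inferred from the values, not aiter's real CSV header.
from dataclasses import dataclass

@dataclass
class FmoeTunedRow:
    cu_num: int     # assumed: compute-unit count, 80 here
    token: int      # token count the entry was tuned for
    model_dim: int  # hidden size, 7168 here
    inter_dim: int  # FFN intermediate size, 2048 here
    expert: int     # number of experts, 33 here
    topk: int       # experts routed per token, 10 here
    act_type: str   # e.g. ActivationType.Silu
    dtype: str      # e.g. torch.bfloat16
    q_dtype_a: str  # activation quant dtype, torch.float8_e4m3fnuz
    q_dtype_w: str  # weight quant dtype, torch.float8_e4m3fnuz
    q_type: str     # QuantType.per_Token or QuantType.per_1x128
    rest: list      # remaining columns: flags, block size, stage-1/2
                    # kernel names, measured latencies (us), errors (%)

def parse_row(line: str) -> FmoeTunedRow:
    cols = line.strip().split(",")  # kernel names contain no commas
    return FmoeTunedRow(
        cu_num=int(cols[0]), token=int(cols[1]),
        model_dim=int(cols[2]), inter_dim=int(cols[3]),
        expert=int(cols[4]), topk=int(cols[5]),
        act_type=cols[6], dtype=cols[7],
        q_dtype_a=cols[8], q_dtype_w=cols[9], q_type=cols[10],
        rest=cols[11:],
    )

# Row from this PR, truncated after the first latency column for brevity.
row = parse_row(
    "80,16,7168,2048,33,10,ActivationType.Silu,torch.bfloat16,"
    "torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,"
    "1,0,16,0,274.0603"
)
assert (row.inter_dim, row.expert, row.topk) == (2048, 33, 10)
```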
Collaborator left a comment


No 2-stage solution should be here.

