Conversation

@xiaobochen123 (Contributor)

Motivation

Fix a MoE expert-parallelism (EP) accuracy bug when loading FP8 models.

Test model: neuralmagic/DeepSeek-Coder-V2-Instruct-FP8

Accuracy: 0.932
Invalid: 0.000
Latency: 243.824 s
Output throughput: 1027.530 token/s
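The class of bug this PR addresses can be illustrated with a minimal sketch. This is hypothetical code, not SGLang's actual implementation: it assumes an FP8-style scheme with one dequantization scale per expert, and shows how slicing expert weights across EP ranks while indexing the scales with local (rather than global) expert ids silently dequantizes each expert with the wrong scale, degrading accuracy without raising errors.

```python
# Hypothetical sketch (NOT SGLang's actual code): per-expert FP8-style
# quantization stores one scale per expert. Under expert parallelism (EP),
# each rank loads only a slice of experts; the scales must be indexed with
# the SAME global expert ids as the weights, or dequantization is wrong.

num_experts = 8
ep_size = 2
rank = 1

# Fake per-expert quantized weights and scales (scalars stand in for tensors).
weights_q = [e + 1 for e in range(num_experts)]       # quantized values
scales = [0.5 * (e + 1) for e in range(num_experts)]  # per-expert scales

experts_per_rank = num_experts // ep_size
start = rank * experts_per_rank
local_ids = list(range(start, start + experts_per_rank))  # experts 4..7 on rank 1

# Correct: dequantize each local expert with its own global-id scale.
correct = [weights_q[e] * scales[e] for e in local_ids]

# Buggy: index scales with LOCAL ids (0..3) after slicing the weights,
# so expert 4's weights are dequantized with expert 0's scale, and so on.
buggy = [weights_q[e] * scales[i] for i, e in enumerate(local_ids)]

print(correct)  # [12.5, 18.0, 24.5, 32.0]
print(buggy)    # [2.5, 6.0, 10.5, 16.0] -- silently wrong, no exception
```

Because both variants run without errors, a mismatch like this shows up only as an accuracy drop in end-to-end evaluation, which matches how the issue was caught and verified here.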

cc: @ispobock

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs changed the title fix moe-ep bug fix moe-ep accuracy issue for fp8 Dec 16, 2024
@zhyncs zhyncs merged commit b532a5f into sgl-project:main Dec 16, 2024
15 checks passed
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
