[Issue]: MORI-EP bug on MI300X+CX7 #168

@TianDi101

Problem Description

The stress test fails on MI300X+CX7 because of a hang.
When running the benchmark, we hit this error:
/home/ditian12/mori/src/ops/dispatch_combine/internode_v1.cpp:375: void mori::moe::v1::DispatchInterNodeRecv(EpDispatchCombineArgs<T> &) [T = hip_bfloat16]: Device-side assertion `(lanePe < config.worldSize) && (lanePe >= 0)' failed.
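For triage, the failing condition reads as a plain range check: the PE index seen by the lane must fall in `[0, config.worldSize)`. The sketch below is a host-side illustration of that check only, not MORI code. `Config`, `IsValidPe`, and `DescribePe` are hypothetical names; only `lanePe` and `config.worldSize` come from the assertion message.

```cpp
#include <string>

// Hypothetical stand-in for the config object; only worldSize appears in the report.
struct Config {
  int worldSize;
};

// Same condition as the device-side assertion in the error message.
inline bool IsValidPe(int lanePe, const Config& config) {
  return (lanePe < config.worldSize) && (lanePe >= 0);
}

// Builds a diagnostic string with the offending value, which is more useful
// for debugging than a bare assertion failure.
inline std::string DescribePe(int lanePe, const Config& config) {
  if (IsValidPe(lanePe, config)) {
    return "ok: lanePe=" + std::to_string(lanePe);
  }
  return "out of range: lanePe=" + std::to_string(lanePe) +
         " worldSize=" + std::to_string(config.worldSize);
}
```

Logging the actual `lanePe` value in this way (for example, a negative value versus one at or above `worldSize`) may help narrow down whether the receive path read uninitialized or stale data.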

Operating System

Ubuntu

CPU

AMD

GPU

MI300X

ROCm Version

ROCm-7.0.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
