@Rohan138 Rohan138 commented Nov 18, 2025

Motivation

While working on https://github.com/ROCm/vllm/pull/808/files, we found that building the aiter wheel and then linking it into the final Docker build stage leads to a broken build. This does not occur when the library is built in-place. cc @dllehr-amd @zhiding512

It looks like the CKtile 2stage MOE was not registered in pybind when it was ported over from 355_wip: https://github.com/ROCm/aiter/blob/355_wip/csrc/rocm_ops.cpp#L80; linking it fixes the issue.

Also updated the CK 2stage MOE to use FLATMM_HIP_CLANG_PATH.

Technical Details

Test Plan

Test Result

Submission Checklist

zhiding512 previously approved these changes Nov 18, 2025
@zhiding512 zhiding512 left a comment

LGTM

@Rohan138 Rohan138 force-pushed the ropotdar/fix_2stage_moe_jit branch from 1f88988 to c2ca6ac Compare November 18, 2025 07:44
@Rohan138 Rohan138 marked this pull request as ready for review November 18, 2025 07:47
Copilot AI review requested due to automatic review settings November 18, 2025 07:47
Copilot finished reviewing on behalf of Rohan138 November 18, 2025 07:52
Copilot AI left a comment

Pull Request Overview

This PR fixes a missing pybind registration for the CKtile 2-stage MOE kernel that was not properly registered when ported from the 355_wip branch. The fix enables the CKtile MOE functions to be accessible from Python.

  • Adds the missing header include for moe_cktile2stages.h
  • Registers the MOE_CKTILE_2STAGES_PYBIND macro to expose cktile_moe_gemm1 and cktile_moe_gemm2 functions
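A quick way to catch this class of regression from the Python side is to check that the expected ops are actually exposed by the built extension. The sketch below is only an illustration, not part of this PR: `missing_ops` and `check_extension` are hypothetical helpers, and the module that exposes these ops in aiter is not stated in this thread, so the caller has to supply its name. Only the two function names come from the summary above.

```python
import importlib
from types import SimpleNamespace

# Function names taken from the review summary above.
EXPECTED_OPS = ("cktile_moe_gemm1", "cktile_moe_gemm2")


def missing_ops(module, names=EXPECTED_OPS):
    """Return the names that are absent from `module` or are not callable."""
    return [n for n in names if not callable(getattr(module, n, None))]


def check_extension(module_name, names=EXPECTED_OPS):
    """Import `module_name` and raise ImportError if an expected op is missing.

    `module_name` is supplied by the caller; the real aiter module path is an
    assumption this sketch does not make.
    """
    module = importlib.import_module(module_name)
    missing = missing_ops(module, names)
    if missing:
        raise ImportError(f"{module_name} does not expose: {', '.join(missing)}")
    return module


# Stand-in objects that mimic an extension before and after registration.
broken = SimpleNamespace(cktile_moe_gemm1=lambda *args: None)
fixed = SimpleNamespace(
    cktile_moe_gemm1=lambda *args: None,
    cktile_moe_gemm2=lambda *args: None,
)

print(missing_ops(broken))  # ['cktile_moe_gemm2']
print(missing_ops(fixed))   # []
```

Running a check like this against the installed wheel in the final Docker stage, rather than against an in-place build, would exercise the path that was reported as broken.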

Contributor Author

Rohan138 commented Nov 19, 2025

@zhiding512 / @valarLip PTAL, can we merge to unblock vllm/355_wip? The CI failures are unrelated; they are due to a git SSH outage earlier today.
