feat(core): add pooling model initial support for V1 engine #152
base: dev
    num_scheduled_tokens: int,
    num_scheduled_tokens_np: np.ndarray,
    kv_connector_output: Optional[KVConnectorOutput],
) -> ModelRunnerOutput:
I’m curious whether anything was changed relative to the GPU runner's _pool().
If there are any changes, I’d appreciate it if you could leave a comment on the corresponding code snippets.
The current initial implementation runs the pooler on the CPU, in the same manner as the GPU code.
However, certain poolers require significant computation (such as the LM head). Moving these to RBLN will likely be necessary for optimization.
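To make the CPU-side pooling step in the comment above concrete, here is a minimal sketch of what a pooler does with a flattened batch of hidden states. It is not the PR's `_pool()` or any vLLM API: the function names (`split_by_request`, `l2_normalize`, `pool`) and the `mode` parameter are hypothetical, and plain Python lists stand in for tensors. It assumes every request contributes at least one token. Last-token pooling is the style commonly used by decoder-based embedding models, while first-token (CLS) pooling is common for BERT-style encoders.

```python
import math
from itertools import accumulate


def split_by_request(hidden_states, num_scheduled_tokens):
    """Split a flattened [total_tokens][hidden] list into per-request chunks."""
    ends = list(accumulate(num_scheduled_tokens))
    starts = [0] + ends[:-1]
    return [hidden_states[s:e] for s, e in zip(starts, ends)]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pool(hidden_states, num_scheduled_tokens, mode="last"):
    """Hypothetical CPU pooler: one normalized vector per request.

    mode: "last" (last token), "cls" (first token), or "mean" (token average).
    """
    if sum(num_scheduled_tokens) != len(hidden_states):
        raise ValueError("token counts do not match the number of hidden states")
    pooled = []
    for chunk in split_by_request(hidden_states, num_scheduled_tokens):
        if mode == "last":
            vec = chunk[-1]
        elif mode == "cls":
            vec = chunk[0]
        elif mode == "mean":
            vec = [sum(col) / len(chunk) for col in zip(*chunk)]
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(l2_normalize(vec))
    return pooled


# Two requests: the first has 2 tokens, the second has 1 (hidden size 2).
hidden = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
counts = [2, 1]
last_pooled = pool(hidden, counts, mode="last")
cls_pooled = pool(hidden, counts, mode="cls")
```

This kind of slicing and normalization is cheap on the CPU. A pooler that also applies a large projection such as an LM head performs a full matrix multiply per request, which is the case the comment flags as a candidate for offloading.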
Can this implementation also cover pooling models other than Qwen3 (such as BERT, ...)?
@huijjj It would be great if you could review the warmup-related code.
I tested with other models such as BERT ( Error log
#167 might have fixed this issue.
6884ee8 to 7e4ed8a
Applying #167 resolved the previous error, but I encountered a new one. It seems to be an issue related to the usage of Error Log
Added a potential fix. Does it fix the issue?
I am still encountering the same error. As shown in the debug output above, the
Hmm, then I guess we need to change the signature of
🚀 Summary of Changes
- _pool() method in RBLNModelRunner for pooling model inference, based on the GPUModelRunner implementation
📌 Related Issues / Tickets
✅ Type of Change
- feature
- model
- core
- bug-fix
- perf
- refactor
- docs
- other: please describe
🧪 How to Test
For Qwen3 Embedding
RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_embedding.py
For Qwen3 Reranker
RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_reranker.py
📋 Checklist
💬 Notes