[megatron] feat: qwen2.5vl #1286
…en25vl_tmp_update0527_v2
…en25vl_tmp_update0527_v3
…en25vl_tmp_update0527_v4
…en25vl_tmp_update0527_v5
We at NVIDIA have developed the
eric-haibin-lin
left a comment
Could you add a new dataset and a reference training record in https://verl.readthedocs.io/en/latest/algo/baseline.html? Thanks! (Doing it in the next PR is fine.)
I do not have further comments for now. I think it's crucial to refactor the worker classes according to so that we can have standalone tests for model fwd/bwd. #1560 #1913
Sure, I will add a record of qwen2.5vl + megatron in the next PR. As for the refactors, I will keep tracking them and contribute from the Megatron side.
@dataproblems do you want to take a final look?
@eric-haibin-lin - my bad! I missed this! I'll try to make sure I get the notifications from now on!!
@eric-haibin-lin I added a record and a recipe for training qwen2.5vl 7b, see #1969
@ISEEKYAN hi, thanks for your awesome work! It seems we must guarantee that there is at least one image in mcore_batch_sz?
Multiple images are stacked in https://github.com/volcengine/verl/blob/main/verl/workers/actor/megatron_actor.py#L412
Thanks for your reply! But when we use plain-text data (no images or videos), there is a KeyError in https://github.com/volcengine/verl/blob/main/verl/models/mcore/model_forward.py#L91-L92. Maybe we should pad empty images in the forward function?
@MaoChouHJM That is a very good question. qwen2.5vl itself supports pure language input, so this is a bug in the existing code. Would you please contribute a PR to fix it?
Of course, I will try to contribute my first PR. You mentioned "qwen2.5vl itself supports pure language input"; do you mean the Hugging Face implementation? Maybe I can refer to that code and fix it.
Both the HF implementation and this Megatron implementation support pure language input. The bug is that the existing code asserts that multimodal input exists.
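As an illustration of the failure mode discussed above, here is a minimal sketch of how a forward helper might tolerate text-only samples instead of indexing multimodal keys directly. It is not the code from this PR or from #1999: the function name `gather_multimodal_inputs` and the key names `multi_modal_inputs`, `pixel_values`, and `image_grid_thw` are assumptions chosen for the example, and plain Python lists stand in for tensors.

```python
from __future__ import annotations

from typing import Any, Optional


def gather_multimodal_inputs(samples: list[dict[str, Any]]) -> Optional[dict[str, list]]:
    """Collect image features across a micro-batch, tolerating text-only samples.

    NOTE: the function name and all key names are hypothetical, for illustration only.
    """
    pixel_values: list = []
    image_grid_thw: list = []
    for sample in samples:
        # .get() avoids the KeyError that direct indexing raises on text-only samples.
        mm = sample.get("multi_modal_inputs") or {}
        if "pixel_values" in mm:
            # Images from different samples are concatenated into one flat list.
            pixel_values.extend(mm["pixel_values"])
            image_grid_thw.extend(mm["image_grid_thw"])
    if not pixel_values:
        # Pure-text micro-batch: signal the caller to skip the vision inputs.
        return None
    return {"pixel_values": pixel_values, "image_grid_thw": image_grid_thw}


# Usage: a mixed micro-batch with one text-only and one image-text sample.
mixed = [
    {"input_ids": [1, 2, 3]},
    {
        "input_ids": [4, 5],
        "multi_modal_inputs": {"pixel_values": [[0.1, 0.2]], "image_grid_thw": [[1, 2, 2]]},
    },
]
vision_inputs = gather_multimodal_inputs(mixed)
```

Returning `None` for a pure-text micro-batch, rather than padding with empty images, leaves the decision of whether to run the vision path to the caller; padding is the alternative raised in the thread.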
@ISEEKYAN here is my PR: #1999. Would you please review it? :>
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
…-text and image-text (#1999)

### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
  - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
  - [ ] can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix qwen2_vl on plain-text data and on mixed plain-text and image-text data; refer to #1286.

### Test

Tested on the gsm8k dataset and on a mix of gsm8k and geo3k.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.