From 75a0582b72b072863f27fe4666d7852c6821ac5e Mon Sep 17 00:00:00 2001
From: sergiopaniego
Date: Tue, 7 Oct 2025 14:35:02 +0200
Subject: [PATCH 1/7] Update max_length explanation for VLMs

---
 docs/source/grpo_trainer.md | 10 ++++++++--
 docs/source/rloo_trainer.md | 10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/docs/source/grpo_trainer.md b/docs/source/grpo_trainer.md
index a8d058d4194..3dce352a363 100644
--- a/docs/source/grpo_trainer.md
+++ b/docs/source/grpo_trainer.md
@@ -567,8 +567,14 @@ accelerate launch \
 
 ### Configuration Tips
 
-> [!WARNING]
-> VLM training may fail if image tokens are truncated. We highly recommend disabling truncation by setting `max_prompt_length` to `None`.
+> [!TIP]
+> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_length=None` in the [`GRPOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
+>
+> ```python
+> GRPOConfig(max_length=None, ...)
+> ```
+>
+> Only use `max_length` when you've verified that truncation won't remove image tokens for the entire dataset.
 
 - Use LoRA on vision-language projection layers
 - Enable 4-bit quantization to reduce memory usage
diff --git a/docs/source/rloo_trainer.md b/docs/source/rloo_trainer.md
index 891a0bcb0f0..814c77620f5 100644
--- a/docs/source/rloo_trainer.md
+++ b/docs/source/rloo_trainer.md
@@ -549,8 +549,14 @@ accelerate launch \
 
 ### Configuration Tips
 
-> [!WARNING]
-> VLM training may fail if image tokens are truncated. We highly recommend disabling truncation by setting `max_prompt_length` to `None`.
+> [!TIP]
+> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_length=None` in the [`RLOOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
+>
+> ```python
+> RLOOConfig(max_length=None, ...)
+> ```
+>
+> Only use `max_length` when you've verified that truncation won't remove image tokens for the entire dataset.
 
 - Use LoRA on vision-language projection layers
 - Enable 4-bit quantization to reduce memory usage

From 5634d26933bcca3efe9e5ca3851e32630cc90c32 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:01:39 -0700
Subject: [PATCH 2/7] Apply suggestion from @qgallouedec

---
 docs/source/grpo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/grpo_trainer.md b/docs/source/grpo_trainer.md
index 3dce352a363..2b7c81e00ee 100644
--- a/docs/source/grpo_trainer.md
+++ b/docs/source/grpo_trainer.md
@@ -568,7 +568,7 @@ accelerate launch \
 ### Configuration Tips
 
 > [!TIP]
-> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_length=None` in the [`GRPOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
+> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_prompt_length=None` in the [`GRPOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
 >
 > ```python
 > GRPOConfig(max_length=None, ...)

From ba5752809bb811b9e7329b41ec0c3c51b92000ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:01:47 -0700
Subject: [PATCH 3/7] Apply suggestion from @qgallouedec

---
 docs/source/grpo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/grpo_trainer.md b/docs/source/grpo_trainer.md
index 2b7c81e00ee..e016958c966 100644
--- a/docs/source/grpo_trainer.md
+++ b/docs/source/grpo_trainer.md
@@ -574,7 +574,7 @@ accelerate launch \
 > GRPOConfig(max_length=None, ...)
 > ```
 >
-> Only use `max_length` when you've verified that truncation won't remove image tokens for the entire dataset.
+> Only use `max_prompt_length` when you've verified that truncation won't remove image tokens for the entire dataset.
 
 - Use LoRA on vision-language projection layers
 - Enable 4-bit quantization to reduce memory usage

From 5a776a8a4a14774e922b1816ba18e07a557e2cd6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:01:53 -0700
Subject: [PATCH 4/7] Apply suggestion from @qgallouedec

---
 docs/source/grpo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/grpo_trainer.md b/docs/source/grpo_trainer.md
index e016958c966..8d458e66e27 100644
--- a/docs/source/grpo_trainer.md
+++ b/docs/source/grpo_trainer.md
@@ -571,7 +571,7 @@ accelerate launch \
 > For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_prompt_length=None` in the [`GRPOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
 >
 > ```python
-> GRPOConfig(max_length=None, ...)
+> GRPOConfig(max_prompt_length=None, ...)
 > ```
 >
 > Only use `max_prompt_length` when you've verified that truncation won't remove image tokens for the entire dataset.

From 3beb1bf723cfdaa829be661335484e9f08709db6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:02:06 -0700
Subject: [PATCH 5/7] Apply suggestion from @qgallouedec

---
 docs/source/rloo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/rloo_trainer.md b/docs/source/rloo_trainer.md
index 814c77620f5..0e11b01dae3 100644
--- a/docs/source/rloo_trainer.md
+++ b/docs/source/rloo_trainer.md
@@ -550,7 +550,7 @@ accelerate launch \
 ### Configuration Tips
 
 > [!TIP]
-> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_length=None` in the [`RLOOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
+> For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_prompt_length=None` in the [`RLOOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
 >
 > ```python
 > RLOOConfig(max_length=None, ...)

From 556009885a0a8ac83e7b488259b85f12a4cc33e1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:02:12 -0700
Subject: [PATCH 6/7] Apply suggestion from @qgallouedec

---
 docs/source/rloo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/rloo_trainer.md b/docs/source/rloo_trainer.md
index 0e11b01dae3..29f8331f7b1 100644
--- a/docs/source/rloo_trainer.md
+++ b/docs/source/rloo_trainer.md
@@ -553,7 +553,7 @@ accelerate launch \
 > For VLMs, truncating may remove image tokens, leading to errors during training. To avoid this, set `max_prompt_length=None` in the [`RLOOConfig`]. This allows the model to process the full sequence length without truncating image tokens.
 >
 > ```python
-> RLOOConfig(max_length=None, ...)
+> RLOOConfig(max_prompt_length=None, ...)
 > ```
 >
 > Only use `max_length` when you've verified that truncation won't remove image tokens for the entire dataset.

From 24272073a61c39e5d3c7c0f2b59d83b1712716bb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20Gallou=C3=A9dec?= <45557362+qgallouedec@users.noreply.github.com>
Date: Tue, 4 Nov 2025 18:02:17 -0700
Subject: [PATCH 7/7] Apply suggestion from @qgallouedec

---
 docs/source/rloo_trainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/rloo_trainer.md b/docs/source/rloo_trainer.md
index 29f8331f7b1..9ab999b59cb 100644
--- a/docs/source/rloo_trainer.md
+++ b/docs/source/rloo_trainer.md
@@ -556,7 +556,7 @@ accelerate launch \
 > RLOOConfig(max_prompt_length=None, ...)
 > ```
 >
-> Only use `max_length` when you've verified that truncation won't remove image tokens for the entire dataset.
+> Only use `max_prompt_length` when you've verified that truncation won't remove image tokens for the entire dataset.
 
 - Use LoRA on vision-language projection layers
 - Enable 4-bit quantization to reduce memory usage
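
Taken together, the series converges on `max_prompt_length=None` in both the prose and the code snippets of the GRPO and RLOO docs. For context, here is a minimal sketch of a GRPO run with prompt truncation disabled, as the final documentation recommends for VLMs. The model checkpoint, dataset name, and reward function below are placeholders for illustration, not part of the patches; a real VLM dataset must carry images alongside the prompts.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward function (GRPO reward funcs receive the generated
# completions and return one score per completion).
def reward_brevity(completions, **kwargs):
    return [-float(len(c)) for c in completions]

# Placeholder dataset; a VLM run needs image inputs with each prompt.
dataset = load_dataset("your-org/your-vlm-dataset", split="train")

training_args = GRPOConfig(
    output_dir="vlm-grpo",
    max_prompt_length=None,  # disable prompt truncation so image tokens stay intact
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # placeholder VLM checkpoint
    reward_funcs=reward_brevity,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

The same shape applies to the RLOO side of the series, substituting `RLOOConfig` and `RLOOTrainer`.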