Conversation

Datta0 (Collaborator) commented on May 15, 2025

Depends on: unslothai/unsloth-zoo#140

Tested: GRPO with Qwen3-4B, plus ORPO and DPO (to make sure nothing breaks there)

TODO:

[ ] Compare performance

pluesclues and others added 9 commits May 18, 2025 11:30
For TRL 0.18.0 (the main branch of TRL at the time, since the latest release was 0.17.0), the SFT trainer deletes the labels column for some reason, and Unsloth's internal loss functions need that column for their calculations, so I add it back in like this (see the sketch after the commit list).
TRL update, particularly small fixes on the SFT trainer
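
A minimal sketch of the workaround described above, assuming labels are simply the input ids with padding masked out; the helper name and the -100 ignore index are illustrative, not the PR's exact code:

```python
import torch

def add_labels_back(batch):
    # TRL 0.18's SFTTrainer can drop the "labels" column; Unsloth's internal
    # loss functions expect it, so rebuild it from input_ids here.
    labels = batch["input_ids"].clone()
    if "attention_mask" in batch:
        labels[batch["attention_mask"] == 0] = -100  # ignore padding positions
    batch["labels"] = labels
    return batch
```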
Datta0 force-pushed the trl_upgrade_fix branch from 8b89b09 to 5096de8 on May 24, 2025 07:54
Datta0 force-pushed the trl_upgrade_fix branch from 5096de8 to dd5128d on May 24, 2025 07:54
```python
if trl_version >= "0.18":
    # Replace LLM init with already existing vLLM engine for colocate mode
    vllm_llm_init_pattern = r"self\.llm\s*=\s*LLM\([^)]*\)*\)"
    vllm_llm_repalcement = "self.llm = model.vllm_engine\n"
```
Contributor: Spelling error :)

```python
    vllm_llm_repalcement = "self.llm = model.vllm_engine\n"
    new_vllm_part = re.sub(
        vllm_llm_init_pattern,
        vllm_llm_repalcement,
```
Contributor: Same
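
For reference, a hedged sketch of what the source patching above does. The pattern and replacement are copied from the hunk (including the reviewer-flagged misspelling); the trainer_source string is invented for illustration, since the value re.sub is actually applied to is truncated in the diff:

```python
import re

vllm_llm_init_pattern = r"self\.llm\s*=\s*LLM\([^)]*\)*\)"
vllm_llm_repalcement = "self.llm = model.vllm_engine\n"

# Illustrative input only: a line resembling the trainer's original LLM init.
trainer_source = 'self.llm = LLM(model=model_id, gpu_memory_utilization=0.6)'
patched = re.sub(vllm_llm_init_pattern, vllm_llm_repalcement, trainer_source)
print(patched)  # -> self.llm = model.vllm_engine
```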

```python
if not hasattr(self, '_autocast_dtype'):
    self._autocast_dtype = torch.float16 if os.environ.get('ACCELERATE_MIXED_PRECISION', 'fp16') == 'fp16' else torch.bfloat16
    if os.environ.get('UNSLOTH_FORCE_FLOAT32', '0') == '1': self._autocast_dtype = torch.float16
os.environ["UNSLOTH_RETURN_HIDDEN_STATES"] = "0"
```
Contributor: Wait, this forcefully returns hidden states, right? Doesn't this make the GRPO loss use more memory?
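
A rough back-of-the-envelope illustration of the memory concern being raised; the batch size, sequence length, vocab and hidden sizes below are assumed (Qwen3-4B-ish), not taken from the PR:

```python
# Full logits are (B, L, V) while hidden states are (B, L, H), so returning
# hidden states and projecting them chunk-wise is far cheaper than
# materialising the whole logits tensor.
B, L, V, H = 1, 1024, 151_936, 2560   # assumed Qwen3-4B-like sizes
bytes_per_elem = 2                    # fp16 / bf16
logits_mib = B * L * V * bytes_per_elem / 2**20
hidden_mib = B * L * H * bytes_per_elem / 2**20
print(f"logits ~= {logits_mib:.0f} MiB vs hidden states ~= {hidden_mib:.0f} MiB")
```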

```python
logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
logits = logits[:, :-1, :]  # (B, L-1, V), exclude the last logit: it corresponds to the next token pred

logits = logits.to(torch.float32)
```
Contributor: I don't think this is correct since we auto upcast inside the torch.compile function, so this uses 2x more memory.

Datta0 (Author): Yeah, I think so. Will remove this.
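
A minimal sketch of the pattern the reviewer is describing, assuming the loss lives inside a torch.compile'd function (names are illustrative): keep the logits in half precision and upcast inside the compiled region, instead of materialising a second float32 copy up front.

```python
import torch
import torch.nn.functional as F

@torch.compile
def token_nll(logits_half, labels):
    # Upcast inside the compiled function so the cast fuses with the
    # log-softmax rather than allocating a separate fp32 logits tensor.
    logp = F.log_softmax(logits_half.float(), dim=-1)
    return F.nll_loss(logp.view(-1, logp.size(-1)), labels.view(-1), ignore_index=-100)
```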

```python
logits = logits[:, -logits_to_keep:]
return logits
# return selective_log_softmax(logits, input_ids)  # compute logprobs for the input tokens
# return logits
```
Contributor: Yes, this is correct for non-GRPO, but reminder: in Unsloth GRPO, we specifically calculate the logits on the fly to save VRAM.
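
For context, a sketch of what the commented-out selective_log_softmax call computes (per-token log-probabilities of the supplied ids), written out naively; TRL ships its own, more memory-careful implementation, so this is illustrative only:

```python
import torch

def selective_log_softmax_sketch(logits, index):
    # logits: (B, L, V); index: (B, L) token ids whose log-probs we want.
    logps = torch.log_softmax(logits.float(), dim=-1)
    return torch.gather(logps, dim=-1, index=index.unsqueeze(-1)).squeeze(-1)
```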

Datta0 force-pushed the trl_upgrade_fix branch from a7eec30 to 1aa7aa1 on May 25, 2025 13:11
Datta0 force-pushed the trl_upgrade_fix branch from 2961e2d to 1aa2e37 on May 25, 2025 18:31
Datta0 marked this pull request as ready for review on May 26, 2025 04:22