Conversation

@yeonjoon-jung01 commented Oct 20, 2025

Summary

We have opened the initial PR for the GraLoRA method (a granular low-rank adaptation that improves expressive power and outlier handling, selected as a NeurIPS 2025 Spotlight), based on #2636.
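
A minimal sketch of the core idea: the weight update is split into k × k sub-blocks, each with its own small low-rank adapter pair (illustrative only; names and shapes do not mirror the PEFT implementation).

```python
# Illustrative sketch of block-wise low-rank adaptation: the update for a Linear layer
# is assembled from k x k sub-blocks, each with its own low-rank pair (A, B) of rank r/k,
# so the total parameter count roughly matches LoRA of rank r. Toy code, not the PEFT module.
import torch
import torch.nn as nn


class GraLoRABlockSketch(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 32, k: int = 2):
        super().__init__()
        assert in_features % k == 0 and out_features % k == 0 and r % k == 0
        self.k, self.sub_r = k, r // k
        self.in_block, self.out_block = in_features // k, out_features // k
        # One (A, B) pair per sub-block; B starts at zero so the initial update is zero.
        self.A = nn.Parameter(torch.randn(k, k, self.sub_r, self.in_block) * 0.01)
        self.B = nn.Parameter(torch.zeros(k, k, self.out_block, self.sub_r))

    def delta_weight(self) -> torch.Tensor:
        # Stitch the k x k low-rank blocks back into a full (out_features, in_features) update.
        rows = [torch.cat([self.B[i, j] @ self.A[i, j] for j in range(self.k)], dim=1)
                for i in range(self.k)]
        return torch.cat(rows, dim=0)


layer = GraLoRABlockSketch(in_features=64, out_features=64, r=32, k=2)
print(layer.delta_weight().shape)  # torch.Size([64, 64])
```

With k = 1 this reduces to ordinary LoRA of rank r; larger k makes the adaptation more granular at roughly the same parameter count.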

@BenjaminBossan (Member) left a comment

Thanks for contributing GraLoRA to PEFT. The method looks interesting and the implementation generally looks good.

I have added a bunch of comments, but many of these are just due to your fork being a bit older. We have simplified PEFT now so that you can remove a bunch of code, I have marked the code that can be deleted.

Apart from the comments that I added, to complete this PR, let's work on:

  1. Extend tests: Add tests to test_custom_models.py, test_encoder_decoder_models.py, test_feature_extraction_models.py, and test_seq_classifier.py
  2. Also, let's add documentation and ideally also at least one example.
  3. Optional, but highly recommended: Add an experiment to our PEFT method comparison suite.

@BenjaminBossan (Member)

@yeonjoon-jung01 Please ping me when you're finished so that I know that I can give this another review. Also, if possible, please avoid force pushes or rebases, as those make reviews harder.

@yeonjoon-jung01 (Author)

> @yeonjoon-jung01 Please ping me when you're finished so that I know that I can give this another review. Also, if possible, please avoid force pushes or rebases, as those make reviews harder.

@BenjaminBossan I’ve finished updating the code 🙂. I saw your message a bit late; I had already rebased the branch to sync with upstream main, just in case there were any conflicts. I’ll make sure to avoid force pushes or rebases from now on. Sorry about that!

@yeonjoon-jung01 (Author)

@BenjaminBossan I have also addressed the previously missing items.

I’ve extended the test coverage to include test_custom_models.py, test_encoder_decoder_models.py, test_feature_extraction_models.py, and test_seq_classifier.py.
Additionally, I’ve added corresponding documentation and example code.
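
A minimal usage sketch along those lines (the config class name GraloraConfig, the checkpoint, and the target modules are assumptions; the field names follow the adapter_config.json shown further down in this thread):

```python
# Hedged usage sketch: apply GraLoRA to a causal LM via PEFT. The class name
# `GraloraConfig`, the model checkpoint, and `target_modules` are assumptions.
from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft import GraloraConfig  # assumed name of the GraLoRA config class

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
config = GraloraConfig(
    r=32,                # total rank, split across the k x k sub-blocks
    gralora_alpha=64,
    gralora_k=2,
    hybrid_r=0,          # 0 keeps the pure block-wise variant; Hybrid GraLoRA is discussed later in this thread
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```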

@BenjaminBossan (Member) left a comment

Thanks for the updates to the PR. I did another review round, please check.

Also, before committing your changes, please call make style. Ensure that you have the correct version of ruff installed (0.12.12).

@yeonjoon-jung01 (Author)

@BenjaminBossan I’ve resolved all of your comments and applied the suggested changes. The main update is that I removed tests/test_gralora.py and integrated the related test cases into the existing test_initialization and test_custom_models files, including additional scenarios for Hybrid GraLoRA.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member)

@yeonjoon-jung01 Could you please run make style?

@yeonjoon-jung01 (Author)

@BenjaminBossan I have run make style and updated the documentation.

@BenjaminBossan (Member) left a comment

Thanks for the updates. A test is failing. Check my comment on possible solutions.

@yeonjoon-jung01 (Author)

@BenjaminBossan Do you think there’s any additional code I should test or update?

@BenjaminBossan (Member) left a comment

Thanks for the updates, the change looks good.

I focused on the examples this time and found a few issues. Some are possibly due to some recent changes in transformers, not sure, but we should update them so that the examples run out of the box.

Moreover, I ran an experiment with GraLoRA on the PEFT MetaMath benchmark. I used the default settings for the config and, in general, the results are comparable to LoRA rank 32, with similar memory usage and training time. However, the final test accuracy fell slightly short, attaining 46.2% compared to 48.2% with LoRA. If you have any suggestions for better GraLoRA hyper-parameters for this experiment, feel free to check them in as an experiment. Otherwise, we can also work with the defaults.

@yeonjoon-jung01 (Author)

> Thanks for the updates, the change looks good.
>
> I focused on the examples this time and found a few issues. Some are possibly due to some recent changes in transformers, not sure, but we should update them so that the examples run out of the box.
>
> Moreover, I ran an experiment with GraLoRA on the PEFT MetaMath benchmark. I used the default settings for the config and, in general, the results are comparable to LoRA rank 32, with similar memory usage and training time. However, the final test accuracy fell slightly short, attaining 46.2% compared to 48.2% with LoRA. If you have any suggestions for better GraLoRA hyper-parameters for this experiment, feel free to check them in as an experiment. Otherwise, we can also work with the defaults.

@BenjaminBossan Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?

@yeonjoon-jung01 (Author)

> Thanks for the updates, the change looks good.
>
> I focused on the examples this time and found a few issues. Some are possibly due to some recent changes in transformers, not sure, but we should update them so that the examples run out of the box.
>
> Moreover, I ran an experiment with GraLoRA on the PEFT MetaMath benchmark. I used the default settings for the config and, in general, the results are comparable to LoRA rank 32, with similar memory usage and training time. However, the final test accuracy fell slightly short, attaining 46.2% compared to 48.2% with LoRA. If you have any suggestions for better GraLoRA hyper-parameters for this experiment, feel free to check them in as an experiment. Otherwise, we can also work with the defaults.
>
> @BenjaminBossan Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?

@BenjaminBossan I’ve tested the method on my side with both rank 32 and 64, and in both cases, it achieved higher performance than LoRA. If you’d like, I can also commit the configuration and result JSON files, or you’re welcome to try reproducing it on your end.
I used a learning rate of 2e-4 for both cases, as GraLoRA is more robust to outliers, making a higher learning rate manageable. For the other settings, I used rank 32 with alpha 64 and rank 64 with alpha 128. All other parameters were kept at their default values.
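
For concreteness, the two settings above could be written roughly as follows (again assuming a GraloraConfig class; everything not listed stays at its default):

```python
# Hedged sketch of the two GraLoRA settings discussed above; `GraloraConfig` is an
# assumed class name, and only the values mentioned in this comment are set.
from peft import GraloraConfig  # assumed class name

rank32 = GraloraConfig(r=32, gralora_alpha=64)    # trained with lr = 2e-4
rank64 = GraloraConfig(r=64, gralora_alpha=128)   # trained with lr = 2e-4
```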

@BenjaminBossan (Member) left a comment

> Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?

This does indeed help. With rank 32, I get higher accuracy now, 48.6% compared to LoRA rank 32 getting 48.2%. For rank 64 (alpha 128), I get 52.7% with GraLoRA and 53.0% with LoRA. In both cases, the memory usage is very close between the two methods.

We can check in those experiments, but if you have other suggestions that may work better, e.g. different hybrid_r or gralora_k, LMK.

@yeonjoon-jung01 (Author)

> Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?
>
> This does indeed help. With rank 32, I get higher accuracy now, 48.6% compared to LoRA rank 32 getting 48.2%. For rank 64 (alpha 128), I get 52.7% with GraLoRA and 53.0% with LoRA. In both cases, the memory usage is very close between the two methods.
>
> We can check in those experiments, but if you have other suggestions that may work better, e.g. different hybrid_r or gralora_k, LMK.

@BenjaminBossan I also tried using a learning rate of 2e-4 for rank 32 and alpha 64, which achieved a test accuracy of 50.7%. The experiment was run on a single NVIDIA RTX A6000. I believe the difference may have resulted from variations in hardware or library versions.

I’m just curious if there might be any other differing settings. I’ve attached the following as adapter_config.json.

{
  "r": 32,
  "hybrid_r": 0,
  "target_modules": null,
  "gralora_alpha": 64,
  "gralora_dropout": 0.0,
  "gralora_k": 2,
  "fan_in_fan_out": false,
  "bias": "none",
  "init_weights": true,
  "layers_to_transform": null,
  "layers_pattern": null,
  "modules_to_save": null,
  "peft_type": "GRALORA",
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "revision": null,
  "task_type": "CAUSAL_LM"
}

with the following for training_params.json:

{
  "optimizer_kwargs": {
    "lr": 2e-4
  }
}

The result and package info are as follows:

"step": 5000,
"test accuracy": 0.5072024260803639,
"train loss": 0.5921622431278228,
"train samples": 20000,
"train total tokens": 4198051
"transformers-version": "4.57.1",
"transformers-commit-hash": null,
"peft-version": "0.17.2.dev0",
"peft-commit-hash": "a1c944a4c3aceffb5d9dc3ee4e37aaad76d55f5c",
"datasets-version": "4.3.0",
"datasets-commit-hash": null,
"bitsandbytes-version": "0.48.1",
"bitsandbytes-commit-hash": null,
"torch-version": "2.7.1+cu126",
"torch-commit-hash": null

@BenjaminBossan (Member) commented Oct 29, 2025

My adapter_config.json for rank 32 is:

{
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "bias": "none",
  "fan_in_fan_out": false,
  "gralora_alpha": 64,
  "gralora_dropout": 0.0,
  "gralora_k": 2,
  "hybrid_r": 0,
  "inference_mode": false,
  "init_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "modules_to_save": null,
  "peft_type": "GRALORA",
  "peft_version": "0.17.2.dev0@UNKNOWN",
  "r": 32,
  "revision": null,
  "target_modules": null,
  "task_type": null
}

Training params are the same.

> I also tried using a learning rate of 2e-4 for rank 32 and alpha 64, which achieved a test accuracy of 50.7%

The same accuracy for both ranks?

> I believe the difference may have resulted from variations in hardware or library versions.

Possibly, just as an example, I use torch 2.8.0. The final run will be on an AWS instance we use for all models, so the score may still differ from what I reported, as I ran the experiment locally.

@yeonjoon-jung01 (Author) commented Oct 29, 2025

> Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?
>
> This does indeed help. With rank 32, I get higher accuracy now, 48.6% compared to LoRA rank 32 getting 48.2%. For rank 64 (alpha 128), I get 52.7% with GraLoRA and 53.0% with LoRA. In both cases, the memory usage is very close between the two methods.
>
> We can check in those experiments, but if you have other suggestions that may work better, e.g. different hybrid_r or gralora_k, LMK.
>
> My adapter_config.json for rank 32 is:
>
> Training params are the same.
>
> I also tried using a learning rate of 2e-4 for rank 32 and alpha 64, which achieved a test accuracy of 50.7%
>
> The same accuracy for both ranks?
>
> I believe the difference may have resulted from variations in hardware or library versions.
>
> Possibly, just as an example, I use torch 2.8.0. The final run will be on an AWS instance we use for all models, so the score may still differ from what I reported, as I ran the experiment locally.

@BenjaminBossan For rank 64, the accuracy ranged between 52.5 and 53%, depending on the hardware (A6000 and H100), which aligns with your reported results. I haven't tested yet, but I think we could consider setting gralora_k=4 for rank 64.

Additionally, I believe the accuracy for rank 64 LoRA is documented as 48.9% in the lora--llama-3.2-3B-rank64.json file. I wonder if any configuration has changed.

Finally, I think a batch size of 4 might be too small for stable training. How about adding an accumulation step option and increasing the effective batch size to a more common value, such as 128 or 192?

@yeonjoon-jung01 (Author)

@BenjaminBossan I guess you could add the GraLoRA rank-32 example with a learning rate of 2e-4 for now.

I believe the accuracy results vary significantly across different settings (hardware and library versions) due to the instability caused by the small batch size. If you’re planning to scale up the batch size, I might try other configurations then.

Otherwise, please let me know if there’s anything else I should take care of.

@BenjaminBossan (Member)

> I guess you could add the GraLoRA rank-32 example with a learning rate of 2e-4 for now.

Could you please push the experiments to this PR (only the configs, not the results)? Since the learning rate also needs a different value, please include the training_params.json (like here).

> Additionally, I believe the accuracy for rank 64 LoRA is documented as 48.9% in the lora--llama-3.2-3B-rank64.json file. I wonder if any configuration has changed.

I compared to LoRA with rank 64 and rslora enabled for better alpha values.
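
For reference, rsLoRA (rank-stabilized LoRA) is enabled in PEFT via the use_rslora flag on LoraConfig, which scales the adapter by alpha / sqrt(r) instead of alpha / r. A minimal sketch of such a baseline (values other than r=64 and use_rslora are assumptions, not the config used for the run):

```python
# Hedged sketch of a rank-64 LoRA baseline with rsLoRA scaling enabled;
# lora_alpha and target_modules are assumptions.
from peft import LoraConfig

lora_rank64 = LoraConfig(r=64, lora_alpha=64, use_rslora=True, target_modules=["q_proj", "v_proj"])
```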

> I think a batch size of 4 might be too small for stable training. How about adding an accumulation step option and increasing the effective batch size to a more common value, such as 128 or 192?

We could think about adding gradient accumulation to the script, but we wanted to keep it simple on purpose, and gradient accumulation can be tricky to get right. For this PR, let's keep the settings as they are. Since the other methods use the same batch size, I think the comparison is still fair.
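
For context, a gradient-accumulation loop would look roughly like the following (illustrative only, not the benchmark script; the effective batch size is the per-step batch size times the number of accumulation steps):

```python
# Illustrative gradient-accumulation loop (not the PEFT benchmark script): with a
# per-step batch size of 4 and 32 accumulation steps, the effective batch size is 128.
def train_epoch(model, optimizer, dataloader, accumulation_steps: int = 32):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        # Assumes a Hugging Face-style model whose forward pass returns an object with .loss
        loss = model(**batch).loss / accumulation_steps  # scale so accumulated gradients average out
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Part of what makes it tricky is handling the trailing micro-batches that do not fill a full accumulation window, which the sketch above skips for brevity.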

@yeonjoon-jung01 (Author)

@BenjaminBossan I was just wondering if increasing the batch size could help stabilize the training process and make the final results more consistent across different settings (hardware and library versions). However, I also agree that it’s fine to keep the current settings as they are in this PR.

I’ve added the experiment configs for GraLoRA :)

@BenjaminBossan (Member)

@yeonjoon-jung01 Could you please run make style :)

@yeonjoon-jung01 (Author)

> @yeonjoon-jung01 Could you please run make style :)

@BenjaminBossan I have updated the code 👍

@BenjaminBossan (Member) left a comment

Thanks a lot @yeonjoon-jung01 for the last update and for your great work on this PR overall. It's now in a finished state as everything LGTM.

We will not merge it right now as PEFT is currently in feature freeze. As soon as the next release is out, which shouldn't be too long in the future, this PR will be merged.

@yeonjoon-jung01 (Author)

> Thanks a lot @yeonjoon-jung01 for the last update and for your great work on this PR overall. It's now in a finished state as everything LGTM.
>
> We will not merge it right now as PEFT is currently in feature freeze. As soon as the next release is out, which shouldn't be too long in the future, this PR will be merged.

@BenjaminBossan Thanks a lot for your guidance during this PR! I really appreciate the helpful feedback. Looking forward to it being merged after the next release.
