
Conversation

Contributor

@younesbelkada younesbelkada commented Jul 25, 2023

What does this PR do?

This PR slightly refactors the internals of some PEFT models (making some methods static, etc.) and introduces a new method create_and_replace for users that want to pass a bare model and a peft config to inject adapters in-place into the model.
These changes are fully backward compatible with previous PEFT versions.

This PR makes things easier for the PEFT integration in transformers huggingface/transformers#25077

cc @pacman100 @BenjaminBossan

The main goal of the PR is to expose a new API for advanced users that want to integrate PEFT methods without needing the PeftModel wrapper. A simple use case is someone who wants to inject adapters into a model while keeping the original class of the model, without having to hand the model over to PEFT, which would wrap it in a PeftModel. I have faced this issue in huggingface/transformers#25077.
Among other things, this PR refactors some internals of the PEFT library while keeping it fully backward compatible.
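
For context, here is a minimal usage sketch of the new entry point, shown with the inject_adapter_in_model name the PR later settles on (see the title change below); the checkpoint and LoRA hyperparameters are illustrative only:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# Adapters are injected in place; the returned object keeps its original
# transformers class instead of being wrapped in a PeftModel.
model = inject_adapter_in_model(lora_config, model, adapter_name="default")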

To tackle the main motivation I propose to differentiate between two types of adapters:
1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)

As a first iteration this API would be supported only for scenario 1-, therefore I decided to create 2 abstract classes that make it easy to determine whether an adapter layer (e.g. LoraLayer) / adapter module (e.g. LoraModel) follows the minimal requirements (i.e. needed attributes, etc.).

Notable changes:

1- makes _get_peft_state_dict a static method so that anyone can simply retrieve the state dict of the adapters. I also plan to move that method into utils
2- PeftConfig now returns the correct class
3- Creates a new property method is_prompt_learning to avoid importing PromptLearningConfig all the way down
4- Introduces a new object TUNERS_MAPPING, which is a mapping of supported pluggable adapters
5- Creates two abstract classes (see the sketch after this list)
5.1- BaseTunerLayerMixin: a mixin to check for the minimal required attributes that a tuner layer should have (active_adapter / _is_plugable)
5.2- BaseTunerMixin: a higher-level module mixin that should be used for any injectable adapter in the future.
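
As a rough illustration of the contract these two classes are meant to enforce, here is a hedged sketch; the attribute and method names are taken from this PR's discussion, while the exact signatures are assumptions rather than the final implementation:

import torch.nn as nn

class BaseTunerLayerMixin:
    # Minimal attributes every tuner layer (e.g. LoraLayer) is expected to expose.
    active_adapter = None
    _is_plugable = True

class BaseTunerMixin(nn.Module):
    # Minimal contract every injectable tuner model (e.g. LoraModel) should fulfill.
    def _create_and_replace(self, peft_config, adapter_name, target, target_name, parent, **kwargs):
        raise NotImplementedError

    def _mark_only_adapters_as_trainable(self):
        raise NotImplementedError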

TODO:

  • test 8bit/4bit compatibility
  • docs


HuggingFaceDocBuilderDev commented Jul 25, 2023

The documentation is not available anymore as the PR was closed or merged.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks so much for working on this. IMO this is moving PEFT in the right direction (and not only because it brings it closer to my refactoring proposal ;-), even regardless of the transformers integration.

general comments

> I am happy to break down into multiple smaller PRs

Yeah, there is a bit of diff noise because somewhat unrelated things were changed, but splitting into separate PRs is extra work, so I'm fine with keeping it as is.

> To tackle the main motivation I propose to differentiate between two types of adapters

Yes, big +1 from my side. The two types are quite different and lots of code is basically a big if-else that does completely different things based on the type.

create_and_replace

I think the biggest change of the PR is the new function peft.mapping.create_and_replace. IIUC, the purpose of that function is to create an adapted module with the corresponding technique (lora, ia³, etc.) without any wrapper model around it. I agree that this is a good feature.

However, I think this somewhat duplicates existing functionality. At least with lora, we can create a LoraModel and then unload it, which achieves the same goal. I believe it would be better if we create an instance of the LoraModel and then return the unloaded model. That way, there is no duplication with the different _find_and_replace methods. I'm still in favor of factoring out that code, because it's mostly duplicated between the different adapter methods, but maybe just as a helper function.

So in sum, I think we can keep the create_and_replace function, but in its body, it would instantiate a LoraModel / IA3Model etc. based on the passed config, and then return model.unload(), so it would be a very short function that leverages existing code. WDYT?

Of course, we would have to support unloading on the non-LoRA methods too, but they should have it anyway (maybe we can add this method, and all the other missing ones, to the abstract base class?).

Backwards compatibility

If I'm not missing something, this is indeed fully backwards compatible. Just to be sure though, I would test that a model created with the new method can indeed load the weights from an existing model.

In the long run, we could even add this kind of test to the test suite (regression testing) but it's probably out of scope for this PR.

Taking this further

Also probably out of scope, but I'm just trying to imagine how we can take this direction even further. If we agree on this direction, that may inspire this PR as well.

IMHO, we should consider if we want to offer 2 APIs. One is a "high level" API and the other a "low level" API. Both are public, so we expect that advanced users may use the low level API.

The distinction would be that the high level API implements creation of PEFT models based on a PEFT config. To do that, the high level API would call the low level API. This would include an API to set individual adapter layers with specific parameters derived from the config.

That way, users can use the high level API the same way they do right now and create a PEFT model from config. However, more advanced users can create more fine-grained adapters, such as:

  • lora layers with different values for r (see #752)
  • mixing different types, like lora + ia³
  • providing custom adapter layers written by the user

Obviously, the low level API cannot be represented by a config, since it is too flexible. That means a config, via the high level API, can map to the low level API, but not vice versa. I think this is okay, though, because configs are always very restrictive in their expressiveness, and we shouldn't prevent advanced users from doing more powerful things.

Another advantage of this better separation of concerns would be that if someone wants to implement a new tuning method, they only have to implement the low level API. That means less code duplication, like we have right now with the find-and-replace logic that is duplicated for each method.

But yeah, as I mentioned, that's out of scope for this PR, just my flights of fancy :)

def peft_is_plugable(self):
    return self._is_plugable

def __post_init__(self):
Member

__post_init__ is only called by dataclasses, so it's never called here. You could move it to __init__ instead, but that would require the child classes to call super correctly. I wonder if the mixin is really needed. Where do we call peft_is_plugable?
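
For reference, a small standalone example of this point about __post_init__ (nothing PEFT-specific):

from dataclasses import dataclass

@dataclass
class WithDataclass:
    x: int = 0

    def __post_init__(self):
        print("dataclass __post_init__ called")  # run automatically after the generated __init__

class PlainClass:
    def __post_init__(self):
        print("this never runs")  # plain classes never invoke __post_init__ on their own

WithDataclass()  # prints the message
PlainClass()     # prints nothing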

Contributor Author

I used _peft_is_plugable in the transformers integration but I will remove it; I will propose something else for the post-init method.

self.trainable_adapter_name = adapter_name
self.rankallocator = RankAllocator(self.model, self.peft_config[adapter_name], self.trainable_adapter_name)

@staticmethod
Member

I think it is better to make this a classmethod. Then, in the method body, you shouldn't use AdaLoraModel._create_new_module and AdaLoraModel._replace_module, but cls._create_new_module and cls._replace_module instead. The reason why this is better is that if a user creates a subclass of AdaLoraModel with their own _create_new_module or _replace_module, as is, those methods would not be used.

The same argument applies to the other create_and_replace methods on lora and ia³.
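
To illustrate the argument with a generic, non-PEFT example: a hard-coded class reference bypasses subclass overrides, while cls resolves to the subclass.

class Base:
    @classmethod
    def build(cls):
        return cls._create_new_module()  # resolves to the subclass override if one exists

    @staticmethod
    def _create_new_module():
        return "base module"

class Child(Base):
    @staticmethod
    def _create_new_module():
        return "child module"

print(Child.build())  # "child module"; hard-coding Base._create_new_module() would give "base module"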

Contributor Author

Sounds great! Thanks for this suggestion.

Contributor

@pacman100 pacman100 left a comment

Hello @younesbelkada, thank you for all the great efforts you are putting into this! 🤗

My major concern is that create_and_replace leads to a lot of code duplication. One way to overcome this is to use create_and_replace inside _find_and_replace.

Member

@BenjaminBossan BenjaminBossan left a comment

I was going to do an in-depth review but I struggled a bit with understanding how the different pieces fit together. Therefore, I created a schema, which is shown in the figure below. Please let me know if the schema is incorrect.

While looking at it, I couldn't help but feel that the way things are connected is more complex than necessary. Therefore, I drafted an alternative suggestion at the bottom of the figure (for the case of LoRA, but the same should work with others). Please let me know if it makes sense.

Basically, I think the alternative is simpler while achieving the same goal. A main difference is that with the current suggestion, when calling from transformers, there is never a PeftModel or LoraModel instance being created (because only class and static methods are called). However, I'm not sure why that would be an issue. In my suggestion, we create an instance but in the end only return the base_model. Is there some important reason I'm missing why that cannot work?

[Figure: schema-refactor, a diagram of how the current pieces fit together, with the alternative suggestion at the bottom]

EDIT 1: In my suggestion, BaseTunerMixin is renamed to BaseTuner and it is a normal base class. The only real difference would be that BaseTuner inherits from nn.Module and LoraModel et al. only inherit from BaseTuner, not nn.Module + BaseTuner. FWIW, in the current suggestion, BaseTunerMixin is not a real mixin class because it contains abstract methods, which I think is a contradiction. It is already more or less a base class.

EDIT 2: I would argue it's actually an advantage if transformers calls get_peft_model, the same way that PEFT users would instantiate a PEFT model. This way, if we add some fixes to PEFT that are upstream of peft.mapping.create_and_replace (just an example, maybe PeftModel has to do some modifications to peft_config), the fix is also provided for transformers, whereas with the current suggestion, it would not be.


# TODO: test that
for module in model.modules():
    if not getattr(module, "_is_peft_tuner_layer", False):
Member

Just wondering: could we instead check isinstance(module, BaseTunerLayerMixin)? Or do we expect BaseTunerLayerMixin instances with _is_peft_tuner_layer=False, or do we want to allow layers with _is_peft_tuner_layer=True that are not BaseTunerLayerMixin? I'm not suggesting a change, rather I want to understand the intent.

Also, I wonder if this is redundant, because add_adapter calls _mark_only_adapters_as_trainable, which seems to do the same job, but it's more specialized, as it is implemented on each tuner class.
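
For illustration, a hedged sketch of the two checks being contrasted here; the freezing loop body is an assumption based on the surrounding discussion of _mark_only_adapters_as_trainable, not the actual diff:

import torch.nn as nn

def freeze_non_tuner_modules(model: nn.Module, tuner_layer_cls: type) -> None:
    for module in model.modules():
        # Flag-based check, as in the diff above:
        is_tuner_layer = getattr(module, "_is_peft_tuner_layer", False)
        # Type-based alternative raised in the question (uncomment to compare):
        # is_tuner_layer = isinstance(module, tuner_layer_cls)
        if not is_tuner_layer:
            for param in module.parameters(recurse=False):
                param.requires_grad = False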

Contributor Author

Agreed with all the points you have shared here

Contributor Author

@younesbelkada younesbelkada left a comment

Thanks @BenjaminBossan for the valuable feedback and the clear picture. I have adapted the design accordingly, with a minor tweak: I propose to define an instance method _create_and_replace for each subclass, since I prefer to keep the _replace_module method as it is so that it can be used by AdaLoraModel without duplicating code. That method was also used by the _unload_and_optionally_merge method.
I have also fixed the failing AdaLora tests. Would like to hear from you about this design!

Member

@BenjaminBossan BenjaminBossan left a comment

Regarding the failing tests, I haven't checked yet what the issue is, will likely do it later, but I wanted to give feedback as early as possible, so here it is :)

I like these recent changes. I think this is a good clean up of the code with some refactoring and better definition of interfaces. Overall, I also have more confidence now that this code is doing the correct thing.

Since quite a few changes were about moving code from A to B, I have a small request: Let the reviewers know which code was moved unchanged, and which code was moved + changed, so that we know what we need to check and what we can skip.

I don't have too many comments left. The few I made should, I believe, help to simplify the code even further. Please take a look and see if they make sense to you.

@younesbelkada younesbelkada marked this pull request as ready for review August 1, 2023 08:58
@younesbelkada
Contributor Author

Thanks very much @BenjaminBossan for your extensive review and all your patience!
@pacman100 I can also confirm all slow tests pass on this branch, validating the 4bit / 8bit compatibility of this PR

Contributor

@pacman100 pacman100 left a comment

Hello @younesbelkada, thank you for iterating and adding the BaseTuner and BaseTunerLayer classes to standardize the injectable peft methods 😄.

Currently, I am quite confused by 3 functions/methods named create_and_replace that do different things. There is an external function in utils called create_and_replace, and BaseTuner has a public method called create_and_replace as well as a private method _create_and_replace, all doing very different things. This makes the code difficult to understand and complicates something that shouldn't be complicated.

How about the following:

  1. I feel the util function of mapping.py named create_and_replace is not required at all. See the comment on transformers PR: huggingface/transformers#25077 (comment)
  2. Rename the public method of BaseTuner named create_and_replace to add_adapter (preferred) or inject_adapter or find_create_and_replace, WDYT?

PeftType,
_freeze_adapter,
_get_submodules,
_is_valid_match,
Contributor

Why move _is_valid_match, which is IA³-specific, to the utils shared with the others?

Contributor Author

Ah, good point. I thought other tuner methods could benefit from it in the future (since the method looks quite generic), so I moved it into the utils.

@younesbelkada younesbelkada requested a review from pacman100 August 2, 2023 09:48
@younesbelkada younesbelkada changed the title [core] PEFT refactor + introducing create_and_replace public method [core] PEFT refactor + introducing inject_adapter_in_model public method Aug 2, 2023
Contributor

@pacman100 pacman100 left a comment

Thank you for all the changes and for bearing with me 🤗. Left a comment.

@younesbelkada younesbelkada requested a review from pacman100 August 2, 2023 12:25

tuner_cls = PEFT_TYPE_TO_TUNER_MAPPING[peft_config.peft_type]

peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
Contributor

@patrickvonplaten patrickvonplaten Aug 3, 2023

If I followed the path correctly here, it looks like this line will set requires_grad=False for all weights of the model. For huggingface/transformers#25077 this would mean that if I load LoRA weights into a model, all weights of the model (except the LoRA weights) are frozen - I would not expect such a behavior here.

Also, it seems like the peft config decides further down the road here:

if self.peft_config[adapter_name].inference_mode:

whether we're in inference mode or not. For the transformers implementation I would try not to make any assumptions here based on the config, but leave it fully up to the user - so just don't do anything (don't freeze any grads).

Can we add a flag here that would allow for this?

Also I'd maybe add a comment to make it a bit easier for the user to understand what this line does.

Suggested change
- peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
+ # By instantiating a peft model we are injecting randomly initialized LoRA layers into the model's modules.
+ peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)

Contributor Author

@younesbelkada younesbelkada Aug 3, 2023

Freezing the base model should be optional when you load a pretrained adapter - I think you are right.

However, when you attach fresh new adapters it is usually (for 99% of the use cases) to train them, so maybe we should make a distinction:

1- load a pretrained adapter --> not necessarily freeze the base model
2- add a new random adapter --> I would expect users to train the adapters --> freeze the base model

WDYT? @pacman100 @BenjaminBossan

Contributor

That makes sense. I would maybe give the user full flexibility here then and add a new function argument:
freeze_weights=True/False/"non-lora-only" or something similar? I think when calling this from transformers we don't want the whole model to be frozen, but for other use cases it could very much be the case I'd say.
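
A minimal sketch of what such an explicit argument could look like; the freeze_weights name comes from the suggestion above, while the import path and the name-based matching of adapter parameters are assumptions for this sketch, not the actual implementation:

from peft.mapping import PEFT_TYPE_TO_TUNER_MAPPING  # import path assumed

def inject_adapter_in_model(peft_config, model, adapter_name="default", freeze_weights=True):
    tuner_cls = PEFT_TYPE_TO_TUNER_MAPPING[peft_config.peft_type]
    # Instantiating the tuner injects (randomly initialized) adapter layers into the
    # model's modules and freezes the base weights by default (see the diff above).
    peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
    if not freeze_weights:
        # Re-enable gradients on the base model parameters; adapter parameters are
        # already trainable. Matching on the adapter name is a simplification.
        for name, param in peft_model.model.named_parameters():
            if adapter_name not in name:
                param.requires_grad = True
    return peft_model.model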

Member

> 1- load a pretrained adapter --> not necessarily freeze the base model
> 2- add a new random adapter --> I would expect users to train the adapters --> freeze the base model

> That makes sense, I would maybe give the user full flexibility here then and add a new function argument

Personally, I would prefer to give the user an explicit argument, rather than (or in addition to) trying to guess based on the circumstances, so I would go with that option.

@patrickvonplaten: You bring up some good points. I will still merge the PR as is because Younes will be OoO for some time and we don't want this big refactor to become stale. We should be able to address your concerns in a follow up PR.

Contributor

Sounds good! Agree that giving the user an explicit argument is the right option here

Co-authored-by: Patrick von Platen <[email protected]>
@BenjaminBossan BenjaminBossan merged commit ed396a6 into huggingface:main Aug 7, 2023
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Aug 7, 2023
When a user tries to add a 2nd adapter, Lora and AdaLora make some checks to
ensure the new adapter is compatible with existing adapters. Currently, that
check is performed halfway through the method. This means that if the check
fails, the new adapter is partially applied, leaving the model in a bad state.
The main purpose of this PR is to ensure that the model state is correct after
such a failure is encountered.

Tests were added to catch this potential bug.

While working on this, I also did some related, but not strictly necessary
changes to the add_adapter methods:

- Previously, the peft_config from the PeftModel was passed to the base
  model. This meant that sometimes, the base model would hold a reference
  to PeftModel.peft_config, but not always, as some base models would
  create new dicts. This is problematic, because some code would rely on
  the objects being the same. Now, they are never the same, leading to
  more consistency.
- I think that the check if multiple adapters have biases (which is not
  supported) was accidentally removed by huggingface#749. It is added back in.
- Add some type annotations
- Extend docstrings to contain adapter_name
BenjaminBossan added a commit that referenced this pull request Aug 8, 2023
@younesbelkada younesbelkada deleted the integration-v2 branch August 17, 2023 10:06
@githubnemo githubnemo mentioned this pull request Jan 31, 2025
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
… method (huggingface#749)

Slightly refactors the internals of some PEFT models and introduces a new
method inject_adapter_in_model for users that want to pass a bare model
and a peft config to inject adapters in-place into the model. These
changes are fully backward compatible with previous PEFT versions.

This PR makes things easier for the PEFT integration in transformers
huggingface/transformers#25077

The main goal of the PR is to expose a new API for advanced users that
want to integrate PEFT methods without the need to use the PeftModel
wrapper. A simple use case could be someone that wants to inject adapters
into a model and keep the original class of the model without having to
hand the model over to peft, which would create a PeftModel. I have
faced this issue in huggingface/transformers#25077. Among other things,
this PR refactors some internals of the PEFT library, while keeping it
fully backward compatible.

To tackle the main motivation I propose to differentiate between two
types of adapters:

1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)

As a first iteration this API would be supported only for scenario 1-,
therefore I decided to create 2 abstract classes that make it easy to
determine whether the adapter layer (e.g. LoraLayer) / adapter module
(e.g. LoraModel) follows the minimal requirements (i.e. needed
attributes, etc.).

Other related changes:

1- Creates a new property method is_prompt_learning to avoid importing
   PromptLearningConfig all the way down
2- Introduces a new object TUNERS_MAPPING, which is a mapping of
   supported pluggable adapters
3- Creates two abstract classes
3.1- BaseTunerLayer: a mixin to check for the minimal required
     attributes that a tuner layer should have (active_adapter /
     _is_plugable)
3.2- BaseTuner: a higher-level module mixin that should be used for any
     injectable adapter in the future.

---------

Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
…rCompletionOnlyLM (huggingface#749)

* Update utils.py

* correctly assign instruction_template in DataCollatorForCompletionOnlyLM

* correctly use instruction_token_ids in DataCollatorForCompletionOnlyLM

* DataCollatorForCompletionOnlyLM: fix instruction_template / response_template type check: handle cases where instruction_template is None

* make precommit

* Test DataCollatorForCompletionOnlyLM with pre-tokenized instruction_template