[core] PEFT refactor + introducing inject_adapter_in_model public method
#749
Conversation
The documentation is not available anymore as the PR was closed or merged.
- use property instead of checking `isinstance`
- move config to a new place
BenjaminBossan
left a comment
Thanks so much for working on this. IMO this is moving PEFT in the right direction (not only saying this because it brings it closer to my refactoring proposal ;-), regardless of the transformers integration.
general comments
I am happy to break down into multiple smaller PRs
Yeah, there is a bit of diff noise because somewhat unrelated things were changed, but splitting into separate PRs is extra work, so I'm fine with keeping it as is.
To tackle the main motivation I propose to differentiate things between two types of adapters
Yes, big +1 from my side. The two types are quite different and lots of code is basically a big if-else that does completely different things based on the type.
create_and_replace
I think the biggest change of the PR is the new function peft.mapping.create_and_replace. IIUC, the purpose of that function is to create an adapted module with the corresponding technique (lora, ia³, etc.) without any wrapper model around it. I agree that this is a good feature.
However, I think this somewhat duplicates existing functionality. At least with lora, we can create a LoraModel and then unload it, which achieves the same goal. I believe it would be better if we create an instance of the LoraModel and then return the unloaded model. That way, there is no duplication with the different _find_and_replace methods. I'm still in favor of factoring out that code, because it's mostly duplicated between the different adapter methods, but maybe just as a helper function.
So in sum, I think we can keep the create_and_replace function, but in its body, it would instantiate a LoraModel / IA3Model etc. based on the passed config, and then return model.unload(), so it would be a very short function that leverages existing code. WDYT?
Of course, we would have to support unloading on the non-LoRA methods too, but they should have it anyway (maybe we can add this method, and all the other missing ones, to the abstract base class?).
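A minimal, self-contained sketch of what that could look like; the dummy classes stand in for the real tuner models, and the dispatch mapping plus the unload/unwrap step are assumptions for illustration, not the current API:

```python
import torch.nn as nn

class DummyLoraModel:
    """Stand-in for LoraModel: the real class injects adapters into `model` in __init__."""
    def __init__(self, model, peft_config, adapter_name="default"):
        self.model = model  # the base model, modified in place by the real tuner

    def unload(self):
        # stand-in for the unload/unwrap step from the suggestion above
        return self.model

TUNERS_MAPPING = {"LORA": DummyLoraModel}  # hypothetical peft_type -> tuner class map

def create_and_replace(peft_config, model, adapter_name="default"):
    tuner_cls = TUNERS_MAPPING[peft_config["peft_type"]]
    tuner_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
    return tuner_model.unload()  # hand back the model without the tuner wrapper

base = nn.Linear(4, 4)
assert create_and_replace({"peft_type": "LORA"}, base) is base
```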
Backwards compatibility
If I'm not missing something, this is indeed fully backwards compatible. Just to be sure though, I would test that a model created with the new method can indeed load the weights from an existing model.
In the long run, we could even add this kind of test to the test suite (regression testing) but it's probably out of scope for this PR.
Taking this further
Also probably out of scope, but I'm just trying to imagine how we can take this direction even further. If we agree on this direction, that may inspire this PR as well.
IMHO, we should consider if we want to offer 2 APIs. One is a "high level" API and the other a "low level" API. Both are public, so we expect that advanced users may use the low level API.
The distinction would be that the high level API implements creation of PEFT models based on a PEFT config. To do that, the high level API would call the low level API. This would include an API to set individual adapter layers with specific parameters derived from the config.
That way, users can use the high level API the same way they do right now and create a PEFT model from config. However, more advanced users can create more fine-grained adapters, such as:
- lora layers with different values for `r` (see #752)
- mixing different types, like lora + ia³
- providing custom adapter layers written by the user
Obviously, the low level API cannot be represented by a config, since it is too flexible. That means a config, via the high level API, can map to the low level API, but not vice versa. I think this is okay, though, because configs are always very restrictive in their expressiveness, and we shouldn't prevent advanced users from doing more powerful things.
Another advantage of this better separation of concerns would be that if someone wants to implement a new tuning method, they only have to implement the low level API. That means less code duplication, like we have right now with the find-and-replace logic that is duplicated for each method.
But yeah, as I mentioned, that's out of scope for this PR, just my flights of fancy :)
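A very rough sketch of how the two levels could relate; every name below is hypothetical and only meant to illustrate the split, not an actual PEFT API:

```python
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class DummyLoraConfig:                       # stand-in for a PEFT config
    r: int = 8
    target_modules: tuple = ("linear",)

def add_lora_layer(module: nn.Module, r: int) -> nn.Module:
    """Low level API: adapt a single layer with explicit parameters."""
    module.lora_r = r                        # placeholder for real adapter injection
    return module

def create_peft_model(model: nn.Module, config: DummyLoraConfig) -> nn.Module:
    """High level API: walk the model and call the low level API, driven by the config."""
    for name, module in model.named_modules():
        if name.split(".")[-1] in config.target_modules:
            add_lora_layer(module, config.r)
    return model

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

model = create_peft_model(TinyModel(), DummyLoraConfig())
# advanced users could instead call add_lora_layer directly, e.g. with a
# different r per layer or with a custom adapter layer of their own
```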
src/peft/tuners/tuners_utils.py
Outdated
def peft_is_plugable(self):
    return self._is_plugable

def __post_init__(self):
__post_init__ is only called by dataclasses, so it's never called here. You could move it to __init__ instead, but that would require the child classes to call super correctly. I wonder if the mixin is really needed. Where do we call peft_is_plugable?
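A quick illustration of that point, since it is easy to miss: `__post_init__` is only invoked by the `__init__` that `@dataclass` generates, so on a plain class it is never called:

```python
from dataclasses import dataclass

@dataclass
class WithDataclass:
    x: int = 0
    def __post_init__(self):
        print("post init called")   # runs on WithDataclass()

class PlainClass:
    def __post_init__(self):
        print("never called")       # nothing ever invokes this

WithDataclass()   # prints "post init called"
PlainClass()      # prints nothing
```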
I used `peft_is_plugable` in the transformers integration but I will remove it; I will propose something else for the post init method
src/peft/tuners/adalora.py
Outdated
self.trainable_adapter_name = adapter_name
self.rankallocator = RankAllocator(self.model, self.peft_config[adapter_name], self.trainable_adapter_name)

@staticmethod
I think it is better to make this a classmethod. Then, in the method body, you shouldn't use AdaLoraModel._create_new_module and AdaLoraModel._replace_module, but cls._create_new_module and cls._replace_module instead. The reason why this is better is that if a user creates a subclass of AdaLoraModel with their own _create_new_module or _replace_module, as is, those methods would not be used.
The same argument applies to the other create_and_replace methods on lora and ia³.
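A small self-contained illustration of the difference; the class and method names mirror the ones discussed, but the bodies are dummies:

```python
class AdaLoraModel:
    @staticmethod
    def _create_new_module(config, adapter_name, target):
        return f"adalora module for {target}"

    @classmethod
    def _create_and_replace(cls, config, adapter_name, target):
        # cls._create_new_module resolves to a subclass override, whereas
        # AdaLoraModel._create_new_module would always use this class's version
        return cls._create_new_module(config, adapter_name, target)

class MyAdaLoraModel(AdaLoraModel):
    @staticmethod
    def _create_new_module(config, adapter_name, target):
        return f"custom module for {target}"

print(MyAdaLoraModel._create_and_replace(None, "default", "q_proj"))
# -> "custom module for q_proj"; a hard-coded AdaLoraModel._create_new_module
#    call would have printed "adalora module for q_proj" instead
```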
Sounds great! Thanks for this suggestion
pacman100
left a comment
Hello @younesbelkada, thank you for all the great efforts you are putting into this! 🤗
My major concern is that create_and_replace leads to a lot of code duplication. One way to overcome this is to use create_and_replace in _find_and_replace.
I was going to do an in-depth review but I struggled a bit with understanding how the different pieces fit together. Therefore, I created a schema, which is shown in the figure below. Please let me know if the schema is incorrect.
While looking at it, I couldn't help but feel like the way things are connected is more complex than necessary. Therefore, I drafted an alternative suggestion at the bottom of the figure (for the case of LoRA, but the same should work with the others). Please let me know if it makes sense.
Basically, I think the alternative is simpler while achieving the same goal. A main difference is that with the current suggestion, when calling from transformers, there is never a PeftModel or LoraModel instance being created (because only class and static methods are called). However, I'm not sure why that would be an issue. In my suggestion, we create an instance but in the end only return the base_model. Is there some important reason I'm missing why that cannot work?
EDIT 1: In my suggestion, the BaseTunerMixin is renamed to BaseTuner and it is a normal base class. The only real difference would be that BaseTuner inherits from nn.Module and LoraModel et al. only inherit from BaseTuner, not nn.Module + BaseTuner. FWIW, in the current suggestion, BaseTunerMixin is not a real mixin class because it contains abstract methods, which I think is a contradiction. It is already more or less a base class.
EDIT 2: I would argue it's actually an advantage if transformers calls get_peft_model, the same way that PEFT users would instantiate a PEFT model. This way, if we add some fixes to PEFT that are upstream of peft.mapping.create_and_replace (just an example, maybe PeftModel has to do some modifications to peft_config), the fix is also provided for transformers, whereas with the current suggestion, it would not be.
src/peft/mapping.py
Outdated
|
|
# TODO: test that
for module in model.modules():
    if not getattr(module, "_is_peft_tuner_layer", False):
Just wondering: Could we check instead if isinstance(module, BaseTunerLayerMixin)? Or do we expect BaseTunerLayerMixin with _is_peft_tuner_layer=False, or do we want to allow layers with _is_peft_tuner_layer=True that are not BaseTunerLayerMixin? I'm not suggesting to change it, rather I want to understand the intent.
Also, I wonder if this is redundant, because add_adapter calls _mark_only_adapters_as_trainable, which seems to do the same job, but it's more specialized, as it is implemented on each tuner class.
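A minimal, self-contained sketch contrasting the two checks; the classes below are stand-ins for BaseTunerLayerMixin and a tuner layer, not the actual PEFT classes:

```python
import torch.nn as nn

class BaseTunerLayerMixin:
    _is_peft_tuner_layer = True

class DummyLoraLayer(nn.Linear, BaseTunerLayerMixin):
    pass

model = nn.Sequential(nn.Linear(4, 4), DummyLoraLayer(4, 4))

for module in model.modules():
    by_flag = getattr(module, "_is_peft_tuner_layer", False)
    by_isinstance = isinstance(module, BaseTunerLayerMixin)
    # the two checks agree here; they only diverge if a layer sets the flag
    # without inheriting from the mixin (or the other way around)
    print(type(module).__name__, by_flag, by_isinstance)
```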
Agreed with all the points you have shared here
Thanks @BenjaminBossan for the valuable feedback and the clear picture. I have adapted the design accordingly, with a minor tweak: I propose to define an instance method _create_and_replace for each subclass, since I prefer to keep the _replace_module method as it is so that it can be used correctly by AdaLoraModel without code duplication. That method is also used by the _unload_and_optionally_merge method.
I have also fixed the failing AdaLora tests. I would like to hear from you about this design!
BenjaminBossan
left a comment
Regarding the failing tests, I haven't checked yet what the issue is, will likely do it later, but I wanted to give feedback as early as possible, so here it is :)
I like these recent changes. I think this is a good clean up of the code with some refactoring and better definition of interfaces. Overall, I also have more confidence now that this code is doing the correct thing.
Since quite a few changes were about moving code from A to B, I have a small request: Let the reviewers know which code was moved unchanged, and which code was moved + changed, so that we know what we need to check and what we can skip.
I don't have too many comments left. A few of them, I believe, should help to simplify the code even further. Please take a look and see if they make sense to you.
Co-authored-by: Benjamin Bossan <[email protected]>
Thanks very much @BenjaminBossan for your extensive review and all your patience!
Hello @younesbelkada, Thank you for iterating and adding BaseTuner and BaseTunerLayer classes to standardize the injectable peft methods 😄.
Currently, I am quite confused by 3 functions/methods named create_and_replace that all do different things. There is an external function in utils called create_and_replace, and BaseTuner has a public method called create_and_replace as well as a private method _create_and_replace, all doing very different things. This makes it difficult to understand and complicates something that shouldn't be complicated.
How about the following:
- I feel the util function of `mapping.py` named `create_and_replace` is not required at all. See the comment on the transformers PR: huggingface/transformers#25077 (comment)
- Rename the public method of `BaseTuner` named `create_and_replace` to `add_adapter` (preferred) or `inject_adapter` or `find_create_and_replace`, WDYT?
PeftType,
_freeze_adapter,
_get_submodules,
_is_valid_match,
Why move `_is_valid_match`, which is just IA³-specific, to the shared utils for the others?
Ah, good point. I thought other tuner methods could benefit from it in the future (since the method looks quite generic), so I moved it into the utils.
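For reference, a rough paraphrase of what that helper checks (the exact source may differ slightly), which is why it reads as generic rather than IA³-specific:

```python
def _is_valid_match(key: str, target_key: str) -> bool:
    # a key matches a target either exactly or as a full submodule suffix
    if key.endswith(target_key):
        if len(key) > len(target_key):
            return key.endswith("." + target_key)  # avoid matching "myfc1" for "fc1"
        return True
    return False

assert _is_valid_match("model.decoder.layers.0.fc1", "fc1")
assert not _is_valid_match("model.decoder.layers.0.myfc1", "fc1")
```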
changed the title: [core] PEFT refactor + introducing create_and_replace public method → [core] PEFT refactor + introducing inject_adapter_in_model public method
pacman100
left a comment
Thank you for all the changes and bearing with me 🤗. Left a comment
tuner_cls = PEFT_TYPE_TO_TUNER_MAPPING[peft_config.peft_type]

peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
If I followed the path correctly here, it looks like this line will set requires_grad=False for all weights of the model. For huggingface/transformers#25077 this would mean that if I load LoRA weights of a model, all weights of the model (except the LoRA weights) are frozen - I would not expect such a behavior here.
Also it seems like the peft config decides further down the road here (Line 203 in ec267c6):
if self.peft_config[adapter_name].inference_mode:
Can we add a flag here that would allow for this?
Also I'd maybe add a comment to make it a bit easier for the user to understand what this line does.
Suggested change:
peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
# By instantiating a peft model we are injecting randomly initialized LoRA layers into the model's modules.
peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
Freezing the base model should be optional when you load a pretrained adapter, I think you are right.
However, when you attach fresh new adapters, usually (for 99% of the use cases) it is to train them, so maybe we should make a distinction:
1- load a pretrained adapter --> not necessarily freeze the base model
2- add a new random adapter --> I would expect users to train the adapters --> freeze the base model
WDYT? @pacman100 @BenjaminBossan
That makes sense, I would maybe give the user full flexibility here then and add a new function argument:
freeze_weights=True/False/"non-lora-only" or something similar? I think when calling this from transformers we don't want the whole model to be frozen, but for other use cases it could very much be the case, I'd say.
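A hypothetical sketch of what such a flag could look like; the function name and the boolean flag below are illustrative for this discussion, not the actual peft signature:

```python
import torch.nn as nn

def inject_adapter_in_model(model: nn.Module, freeze_weights: bool = True) -> nn.Module:
    # ... adapter injection itself is elided here ...
    if freeze_weights:
        for name, param in model.named_parameters():
            if "lora_" not in name:       # keep adapter parameters trainable
                param.requires_grad = False
    return model

# e.g. from transformers one could opt out of freezing:
model = inject_adapter_in_model(nn.Linear(4, 4), freeze_weights=False)
```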
1- load a pretrained adapter --> not necessarily freeze the base model
2- add a new random adapter --> I would expect users to train the adapters --> freeze the base model
That makes sense, I would maybe give the user full flexibility here then and add a new function argument
Personally, I would prefer to give the user an explicit argument, rather than (or in addition to) trying to guess based on the circumstances, so I would go with that option.
@patrickvonplaten: You bring up some good points. I will still merge the PR as is because Younes will be OoO for some time and we don't want this big refactor to become stale. We should be able to address your concerns in a follow up PR.
Sounds good! Agree that giving the user an explicit argument is the right option here
Co-authored-by: Patrick von Platen <[email protected]>
When a user tries to add a 2nd adapter, Lora and AdaLora make some checks to ensure the new adapter is compatible with existing adapters. Currently, that check is performed halfway through the method. This means that if the check fails, the new adapter is partially applied, leaving the model in a bad state. The main purpose of this PR is to ensure that the model state is correct after such a failure is encountered. Tests were added to catch this potential bug.
While working on this, I also did some related, but not strictly necessary changes to the add_adapter methods:
- Previously, the peft_config from the PeftModel was passed to the base model. This meant that sometimes, the base model would hold a reference to PeftModel.peft_config, but not always, as some base models would create new dicts. This is problematic, because some code would rely on the objects being the same. Now, they are never the same, leading to more consistency.
- I think that the check if multiple adapters have biases (which is not supported) was accidentally removed by #749. It is added back in.
- Add some type annotations
- Extend docstrings to contain adapter_name
… method (huggingface#749)
Refactors a bit the internals of some PEFT models and introduces a new method inject_adapter_in_model for users that want to pass a bare model and a peft config to inject adapters in-place into the model. These changes are totally BC with the previous PEFT versions. This PR makes things easier for the PEFT integration in transformers huggingface/transformers#25077
The main goal of the PR is to expose a new API for advanced users that want to integrate PEFT methods without the need to use the PeftModel wrapper. A simple use case could be someone that wants to inject adapters into a model and wants to keep the original class of the model without having to offload that to peft, which will create a PeftModel. I have faced this issue in huggingface/transformers#25077
Among other things, this PR refactors some internals of the PEFT library, while keeping it fully backward compatible.
To tackle the main motivation I propose to differentiate things between two types of adapters:
1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)
As a first iteration this API would be supported only for scenario 1-, therefore I decided to create 2 abstract classes to make it easy to determine if the adapter layer (e.g. LoraLayer) / adapter module (e.g. LoraModel) follows the minimal requirements (i.e. needed attributes, etc.)
Other related changes:
1- Creates a new property method is_prompt_learning to avoid importing PromptLearningConfig all the way down
2- Introduces a new object TUNERS_MAPPING, which is a mapping of supported pluggable adapters
3- Creates two abstract classes:
3.1- BaseTunerLayer: a mixin to check for minimal required attributes that a tuner layer should have: active_adapter / _is_plugable
3.2- BaseTuner: a higher level module mixin that should be used for any injectable adapters in the future.
---------
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
What does this PR do?
Refactors a bit the internals of some PEFT models (making some methods static, etc.) and introduces a new method `create_and_replace` for users that want to pass a bare model and a peft config to inject adapters in-place into the model. These changes are totally BC with the previous PEFT versions.
This PR makes things easier for the PEFT integration in transformers huggingface/transformers#25077
cc @pacman100 @BenjaminBossan
The main goal of the PR is to expose a new API for advanced users that want to integrate PEFT methods without the need to use `PeftModel` wrappers. A simple use case could be someone that wants to inject adapters into a model and wants to keep the original class of the model without having to offload that to `peft`, which will create a `PeftModel`. I have faced this issue in huggingface/transformers#25077.
Among other things, this PR refactors some internals of the PEFT library, while keeping it fully backward compatible.
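For context, the low level entry point described here ended up being exposed as `inject_adapter_in_model` in released peft versions; a usage sketch along those lines, where the dummy model and the config values are purely illustrative:

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        return self.lm_head(self.linear(self.embedding(input_ids)))

lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules=["linear"])

model = DummyModel()
# adapters are injected in place; the object keeps its original DummyModel class
model = inject_adapter_in_model(lora_config, model, adapter_name="default")
```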
To tackle the main motivation I propose to differentiate things between two types of adapters:
1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)
As a first iteration this API would be supported only for scenario 1-, therefore I decided to create 2 abstract classes to make it easy to determine if the adapter layer (e.g. `LoraLayer`) / adapter module (e.g. `LoraModel`) follows the minimal requirements (i.e. needed attributes, etc.).
Notable changes:
1- make a static method `_get_peft_state_dict` to be able to simply retrieve the state dict of the adapters for anyone that wants to use that method. I also plan to move that method to utils
2- `PeftConfig` now returns the correct class
3- Creates a new property method `is_prompt_learning` to avoid importing `PromptLearningConfig` all the way down
4- Introduces a new object `TUNERS_MAPPING`, which is a mapping of supported pluggable adapters
5- Creates two abstract classes:
5.1- `BaseTunerLayerMixin`: a mixin to check for minimal required attributes that a tuner layer should have: `active_adapter` / `_is_plugable`
5.2- `BaseTunerMixin`: a higher level module mixin that should be used for any injectable adapters in the future.
TODO: