Commit 190f987

CHORE DOC Migrate tips syntax (#2801)
Discussed internally
1 parent 6030f91 commit 190f987

11 files changed: +102 additions, -177 deletions

docs/source/accelerate/deepspeed.md

Lines changed: 18 additions & 27 deletions
@@ -276,11 +276,8 @@ In the above example, the memory consumed per GPU is **36.6 GB**. Therefore, wha
 # Use PEFT and DeepSpeed with ZeRO3 and CPU Offloading for finetuning large models on a single GPU
 This section of guide will help you learn how to use our DeepSpeed [training script](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You'll configure the script to train a large model for conditional generation with ZeRO-3 and CPU Offload.

-<Tip>
-
-💡 To help you get started, check out our example training scripts for [causal language modeling](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py) and [conditional generation](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You can adapt these scripts for your own applications or even use them out of the box if your task is similar to the one in the scripts.
-
-</Tip>
+> [!TIP]
+> 💡 To help you get started, check out our example training scripts for [causal language modeling](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py) and [conditional generation](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You can adapt these scripts for your own applications or even use them out of the box if your task is similar to the one in the scripts.

 ## Configuration

@@ -338,11 +335,8 @@ Let's dive a little deeper into the script so you can see what's going on, and u

 Within the [`main`](https://github.com/huggingface/peft/blob/2822398fbe896f25d4dac5e468624dc5fd65a51b/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py#L103) function, the script creates an [`~accelerate.Accelerator`] class to initialize all the necessary requirements for distributed training.

-<Tip>
-
-💡 Feel free to change the model and dataset inside the `main` function. If your dataset format is different from the one in the script, you may also need to write your own preprocessing function.
-
-</Tip>
+> [!TIP]
+> 💡 Feel free to change the model and dataset inside the `main` function. If your dataset format is different from the one in the script, you may also need to write your own preprocessing function.

 The script also creates a configuration for the 🤗 PEFT method you're using, which in this case, is LoRA. The [`LoraConfig`] specifies the task type and important parameters such as the dimension of the low-rank matrices, the matrices scaling factor, and the dropout probability of the LoRA layers. If you want to use a different 🤗 PEFT method, make sure you replace `LoraConfig` with the appropriate [class](../package_reference/tuners).

@@ -439,20 +433,17 @@ dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint'
 2. When using CPU offloading, the major gains from using PEFT to shrink the optimizer states and gradients to that of the adapter weights would be realized on CPU RAM and there won't be savings with respect to GPU memory.
 3. DeepSpeed Stage 3 and qlora when used with CPU offloading leads to more GPU memory usage when compared to disabling CPU offloading.

-<Tip>
-
-💡 When you have code that requires merging (and unmerging) of weights, try to manually collect the parameters with DeepSpeed Zero-3 beforehand:
-
-```python
-import deepspeed
-
-is_ds_zero_3 = ... # check if Zero-3
-
-with deepspeed.zero.GatheredParameters(list(model.parameters()), enabled= is_ds_zero_3):
-    model.merge_adapter()
-    # do whatever is needed, then unmerge in the same context if unmerging is required
-    ...
-    model.unmerge_adapter()
-```
-
-</Tip>
+> [!TIP]
+> 💡 When you have code that requires merging (and unmerging) of weights, try to manually collect the parameters with DeepSpeed Zero-3 beforehand:
+>
+> ```python
+> import deepspeed
+>
+> is_ds_zero_3 = ... # check if Zero-3
+>
+> with deepspeed.zero.GatheredParameters(list(model.parameters()), enabled= is_ds_zero_3):
+>     model.merge_adapter()
+>     # do whatever is needed, then unmerge in the same context if unmerging is required
+>     ...
+>     model.unmerge_adapter()
+> ```

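The last hunk above leaves the ZeRO-3 check as a placeholder (`is_ds_zero_3 = ...`). A minimal sketch of one way to fill it in, assuming training is launched through 🤗 Accelerate/Transformers so the active DeepSpeed plugin is visible to `is_deepspeed_zero3_enabled`, and assuming `model` is a hypothetical LoRA `PeftModel` already prepared for training:

```python
import deepspeed
from transformers.integrations import is_deepspeed_zero3_enabled

# assumption: `model` is a PeftModel with a LoRA adapter, prepared by Accelerate with DeepSpeed
is_ds_zero_3 = is_deepspeed_zero3_enabled()  # True when the active DeepSpeed config uses ZeRO stage 3

# gather the sharded parameters so merging sees the full weights, then unmerge in the same context
with deepspeed.zero.GatheredParameters(list(model.parameters()), enabled=is_ds_zero_3):
    model.merge_adapter()
    # ... run merged-weight inference or export here ...
    model.unmerge_adapter()
```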
docs/source/conceptual_guides/adapter.md

Lines changed: 2 additions & 5 deletions
@@ -22,11 +22,8 @@ This guide will give you a brief overview of the adapter methods supported by PE

 ## Low-Rank Adaptation (LoRA)

-<Tip>
-
-LoRA is one of the most popular PEFT methods and a good starting point if you're just getting started with PEFT. It was originally developed for large language models but it is a tremendously popular training method for diffusion models because of its efficiency and effectiveness.
-
-</Tip>
+> [!TIP]
+> LoRA is one of the most popular PEFT methods and a good starting point if you're just getting started with PEFT. It was originally developed for large language models but it is a tremendously popular training method for diffusion models because of its efficiency and effectiveness.

 As mentioned briefly earlier, [LoRA](https://hf.co/papers/2106.09685) is a technique that accelerates finetuning large models while consuming less memory.

docs/source/developer_guides/checkpoint.md

Lines changed: 4 additions & 10 deletions
@@ -129,21 +129,15 @@ Let's break this down:
 - By default, LoRA isn't applied to BERT's embedding layer, so there are _no entries_ for `lora_A_embedding` and `lora_B_embedding`.
 - The keys of the `state_dict` always start with `"base_model.model."`. The reason is that, in PEFT, we wrap the base model inside a tuner-specific model (`LoraModel` in this case), which itself is wrapped in a general PEFT model (`PeftModel`). For this reason, these two prefixes are added to the keys. When converting to the PEFT format, it is required to add these prefixes.

-<Tip>
-
-This last point is not true for prefix tuning techniques like prompt tuning. There, the extra embeddings are directly stored in the `state_dict` without any prefixes added to the keys.
-
-</Tip>
+> [!TIP]
+> This last point is not true for prefix tuning techniques like prompt tuning. There, the extra embeddings are directly stored in the `state_dict` without any prefixes added to the keys.

 When inspecting the parameter names in the loaded model, you might be surprised to find that they look a bit different, e.g. `base_model.model.encoder.layer.0.attention.self.query.lora_A.default.weight`. The difference is the *`.default`* part in the second to last segment. This part exists because PEFT generally allows the addition of multiple adapters at once (using an `nn.ModuleDict` or `nn.ParameterDict` to store them). For example, if you add another adapter called "other", the key for that adapter would be `base_model.model.encoder.layer.0.attention.self.query.lora_A.other.weight`.

 When you call [`~PeftModel.save_pretrained`], the adapter name is stripped from the keys. The reason is that the adapter name is not an important part of the model architecture; it is just an arbitrary name. When loading the adapter, you could choose a totally different name, and the model would still work the same way. This is why the adapter name is not stored in the checkpoint file.

-<Tip>
-
-If you call `save_pretrained("some/path")` and the adapter name is not `"default"`, the adapter is stored in a sub-directory with the same name as the adapter. So if the name is "other", it would be stored inside of `some/path/other`.
-
-</Tip>
+> [!TIP]
+> If you call `save_pretrained("some/path")` and the adapter name is not `"default"`, the adapter is stored in a sub-directory with the same name as the adapter. So if the name is "other", it would be stored inside of `some/path/other`.

 In some circumstances, deciding which values to add to the checkpoint file can become a bit more complicated. For example, in PEFT, DoRA is implemented as a special case of LoRA. If you want to convert a DoRA model to PEFT, you should create a LoRA checkpoint with extra entries for DoRA. You can see this in the `__init__` of the previous `LoraLayer` code:

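To make the key layout and the save-path behavior from this file's hunks concrete, here is a minimal sketch; the BERT checkpoint and the adapter name "other" are illustrative choices rather than something taken from the commit:

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# register the LoRA adapter under a non-default name
peft_model = get_peft_model(base, LoraConfig(task_type="SEQ_CLS"), adapter_name="other")

# keys carry the wrapper prefixes plus the adapter name, e.g.
# base_model.model.bert.encoder.layer.0.attention.self.query.lora_A.other.weight
print([key for key in peft_model.state_dict() if "lora_A" in key][:2])

# because the adapter name is not "default", the files are written to some/path/other/
peft_model.save_pretrained("some/path")
```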
docs/source/developer_guides/custom_models.md

Lines changed: 5 additions & 11 deletions
@@ -48,12 +48,9 @@ class MLP(nn.Module):

 This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.

-<Tip>
-
-For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains
-from PEFT, but those gains are in line with more realistic examples.
-
-</Tip>
+> [!TIP]
+> For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains
+> from PEFT, but those gains are in line with more realistic examples.

 There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers
 models, PEFT will know which layers to apply LoRA to, but in this case, it is up to us as a user to choose the layers.
@@ -272,11 +269,8 @@ peft_model = get_peft_model(base_model, config)
 # do training
 ```

-<Tip>
-
-When you call [`get_peft_model`], you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.
-
-</Tip>
+> [!TIP]
+> When you call [`get_peft_model`], you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.

 By supplying a custom mapping, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.

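As a rough, self-contained sketch of the custom-model workflow this file documents (the layer sizes and module names below are illustrative, not taken from the commit), LoRA is pointed at specific `nn.Linear` layers by name:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, x):
        return self.seq(x)

# no task_type here: for a custom model we pick the layers to adapt ourselves
config = LoraConfig(
    target_modules=["seq.0", "seq.2"],  # apply LoRA to the first two Linear layers
    modules_to_save=["seq.4"],          # train the output layer fully and keep it in the checkpoint
)
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```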
docs/source/developer_guides/lora.md

Lines changed: 43 additions & 55 deletions
@@ -119,11 +119,8 @@ initialize_lora_eva_weights(peft_model, dataloader)
 ```
 EVA works out of the box with bitsandbytes. Simply initialize the model with `quantization_config` and call [`initialize_lora_eva_weights`] as usual.

-<Tip>
-
-For further instructions on using EVA, please refer to our [documentation](https://github.com/huggingface/peft/tree/main/examples/eva_finetuning).
-
-</Tip>
+> [!TIP]
+> For further instructions on using EVA, please refer to our [documentation](https://github.com/huggingface/peft/tree/main/examples/eva_finetuning).

 ### LoftQ

@@ -158,11 +155,8 @@ At the moment, `replace_lora_weights_loftq` has these additional limitations:
 - Model files must be stored as a `safetensors` file.
 - Only bitsandbytes 4bit quantization is supported.

-<Tip>
-
-Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.
-
-</Tip>
+> [!TIP]
+> Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.

 ### Rank-stabilized LoRA

@@ -570,11 +564,8 @@ model.add_weighted_adapter(
 model.set_adapter(weighted_adapter_name)
 ```

-<Tip>
-
-There are several supported methods for `combination_type`. Refer to the [documentation](../package_reference/lora#peft.LoraModel.add_weighted_adapter) for more details. Note that "svd" as the `combination_type` is not supported when using `torch.float16` or `torch.bfloat16` as the datatype.
-
-</Tip>
+> [!TIP]
+> There are several supported methods for `combination_type`. Refer to the [documentation](../package_reference/lora#peft.LoraModel.add_weighted_adapter) for more details. Note that "svd" as the `combination_type` is not supported when using `torch.float16` or `torch.bfloat16` as the datatype.

 Now, perform inference:

@@ -792,43 +783,40 @@ model = create_arrow_model(
 ```
 To encode general knowledge, GenKnowSub subtracts the average of the provided general adapters from each task-specific adapter once, before routing begins. Furthermore, the ability to add or remove adapters after calling ```create_arrow_model``` (as described in the Arrow section) is still supported in this case.

-<Tip>
-
-**Things to keep in mind when using Arrow + GenKnowSub:**
-
-- All LoRA adapters (task-specific and general) must share the same ```rank``` and ```target_modules```.
-
-- Any inconsistency in these settings will raise an error in ```create_arrow_model```.
-
-- Having different scaling factors (```lora_alpha```) across task adapters is supported — Arrow handles them automatically.
-
-- Merging the ```"arrow_router"``` is not supported, due to its dynamic routing behavior.
-
-- In create_arrow_model, task adapters are loaded as ```task_i``` and general adapters as ```gks_j``` (where ```i``` and ```j``` are indices). The function ensures consistency of ```target_modules```, ```rank```, and whether adapters are applied to ```Linear``` or ```Linear4bit``` layers. It then adds the ```"arrow_router"``` module and activates it. Any customization of this process requires overriding ```create_arrow_model```.
-
-- This implementation is compatible with 4-bit quantization (via bitsandbytes):
-
-```py
-from transformers import AutoModelForCausalLM, BitsAndBytesConfig
-import torch
-
-# Quantisation config
-bnb_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    bnb_4bit_quant_type="nf4",
-    bnb_4bit_compute_dtype=torch.bfloat16,
-    bnb_4bit_use_double_quant=False,
-)
-
-# Loading the model
-base_model = AutoModelForCausalLM.from_pretrained(
-    "microsoft/Phi-3-mini-4k-instruct",
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-    quantization_config=bnb_config,
-)
-
-# Now call create_arrow_model() as we explained before.
-```
-
-</Tip>
+> [!TIP]
+> **Things to keep in mind when using Arrow + GenKnowSub:**
+>
+> - All LoRA adapters (task-specific and general) must share the same ```rank``` and ```target_modules```.
+>
+> - Any inconsistency in these settings will raise an error in ```create_arrow_model```.
+>
+> - Having different scaling factors (```lora_alpha```) across task adapters is supported — Arrow handles them automatically.
+>
+> - Merging the ```"arrow_router"``` is not supported, due to its dynamic routing behavior.
+>
+> - In create_arrow_model, task adapters are loaded as ```task_i``` and general adapters as ```gks_j``` (where ```i``` and ```j``` are indices). The function ensures consistency of ```target_modules```, ```rank```, and whether adapters are applied to ```Linear``` or ```Linear4bit``` layers. It then adds the ```"arrow_router"``` module and activates it. Any customization of this process requires overriding ```create_arrow_model```.
+>
+> - This implementation is compatible with 4-bit quantization (via bitsandbytes):
+>
+> ```py
+> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+> import torch
+>
+> # Quantisation config
+> bnb_config = BitsAndBytesConfig(
+>     load_in_4bit=True,
+>     bnb_4bit_quant_type="nf4",
+>     bnb_4bit_compute_dtype=torch.bfloat16,
+>     bnb_4bit_use_double_quant=False,
+> )
+>
+> # Loading the model
+> base_model = AutoModelForCausalLM.from_pretrained(
+>     "microsoft/Phi-3-mini-4k-instruct",
+>     torch_dtype=torch.bfloat16,
+>     device_map="auto",
+>     quantization_config=bnb_config,
+> )
+>
+> # Now call create_arrow_model() as we explained before.
+> ```

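For the `add_weighted_adapter` hunk above, a minimal end-to-end sketch of combining two adapters; the base model id and adapter paths are placeholders, and it assumes both adapters are plain LoRA adapters with the same rank:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# merge the two LoRA adapters into a new, weighted adapter and activate it
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="adapter_mix",
    combination_type="linear",  # per the tip above, avoid "svd" with float16/bfloat16 weights
)
model.set_adapter("adapter_mix")
```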
docs/source/developer_guides/troubleshooting.md

Lines changed: 6 additions & 15 deletions
@@ -71,11 +71,8 @@ trainer = Trainer(model=peft_model, fp16=True, ...)
 trainer.train()
 ```

-<Tip>
-
-Starting from PEFT version v0.12.0, PEFT automatically promotes the dtype of adapter weights from `torch.float16` and `torch.bfloat16` to `torch.float32` where appropriate. To _prevent_ this behavior, you can pass `autocast_adapter_dtype=False` to [`~get_peft_model`], to [`~PeftModel.from_pretrained`], and to [`~PeftModel.load_adapter`].
-
-</Tip>
+> [!TIP]
+> Starting from PEFT version v0.12.0, PEFT automatically promotes the dtype of adapter weights from `torch.float16` and `torch.bfloat16` to `torch.float32` where appropriate. To _prevent_ this behavior, you can pass `autocast_adapter_dtype=False` to [`~get_peft_model`], to [`~PeftModel.from_pretrained`], and to [`~PeftModel.load_adapter`].

 ### Selecting the dtype of the adapter

@@ -137,11 +134,8 @@ You should probably TRAIN this model on a down-stream task to be able to use it

 The mentioned layers should be added to `modules_to_save` in the config to avoid the described problem.

-<Tip>
-
-As an example, when loading a model that is using the DeBERTa architecture for sequence classification, you'll see a warning that the following weights are newly initialized: `['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']`. From this, it follows that the `classifier` and `pooler` layers should be added to: `modules_to_save=["classifier", "pooler"]`.
-
-</Tip>
+> [!TIP]
+> As an example, when loading a model that is using the DeBERTa architecture for sequence classification, you'll see a warning that the following weights are newly initialized: `['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']`. From this, it follows that the `classifier` and `pooler` layers should be added to: `modules_to_save=["classifier", "pooler"]`.

 ### Extending the vocabulary

@@ -345,11 +339,8 @@ TunerModelStatus(

 Loading adapters like LoRA weights should generally be fast compared to loading the base model. However, there can be use cases where the adapter weights are quite large or where users need to load a large number of adapters -- the loading time can add up in this case. The reason for this is that the adapter weights are first initialized and then overridden by the loaded weights, which is wasteful. To speed up the loading time, you can pass the `low_cpu_mem_usage=True` argument to [`~PeftModel.from_pretrained`] and [`~PeftModel.load_adapter`].

-<Tip>
-
-If this option works well across different use cases, it may become the default for adapter loading in the future.
-
-</Tip>
+> [!TIP]
+> If this option works well across different use cases, it may become the default for adapter loading in the future.


 ## Reproducibility

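Tying together the two loading-related tips in this file, a minimal sketch of passing both flags when loading an adapter; the model id and adapter path are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.bfloat16)

model = PeftModel.from_pretrained(
    base,
    "path/to/lora-adapter",        # placeholder adapter location
    autocast_adapter_dtype=False,  # keep adapter weights in bf16 instead of promoting to fp32 (PEFT >= v0.12.0)
    low_cpu_mem_usage=True,        # skip the throwaway initialization of adapter weights before loading
)
```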
docs/source/quicktour.md

Lines changed: 4 additions & 10 deletions
@@ -36,11 +36,8 @@ from peft import LoraConfig, TaskType
 peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
 ```

-<Tip>
-
-See the [`LoraConfig`] reference for more details about other parameters you can adjust, such as the modules to target or the bias type.
-
-</Tip>
+> [!TIP]
+> See the [`LoraConfig`] reference for more details about other parameters you can adjust, such as the modules to target or the bias type.

 Once the [`LoraConfig`] is setup, create a [`PeftModel`] with the [`get_peft_model`] function. It takes a base model - which you can load from the Transformers library - and the [`LoraConfig`] containing the parameters for how to configure a model for training with LoRA.

@@ -124,11 +121,8 @@ Both methods only save the extra PEFT weights that were trained, meaning it is s

 ## Inference

-<Tip>
-
-Take a look at the [AutoPeftModel](package_reference/auto_class) API reference for a complete list of available `AutoPeftModel` classes.
-
-</Tip>
+> [!TIP]
+> Take a look at the [AutoPeftModel](package_reference/auto_class) API reference for a complete list of available `AutoPeftModel` classes.

 Easily load any PEFT-trained model for inference with the [`AutoPeftModel`] class and the [`~transformers.PreTrainedModel.from_pretrained`] method:

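For the quicktour's inference hunk, a minimal sketch of loading a trained adapter through `AutoPeftModel`; the adapter repository id is a placeholder and is assumed to be a LoRA adapter trained on top of `facebook/opt-350m`:

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# AutoPeftModel reads the base model name from the adapter's config and loads both
model = AutoPeftModelForCausalLM.from_pretrained("your-username/opt-350m-lora")  # placeholder adapter id
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Preheat the oven to 350 degrees and", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```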