Merged

Changes from 7 commits

Commits (39)
b04903b
move deepspeed to `lib_integrations.deepspeed`
younesbelkada Aug 18, 2023
9453fa2
more refactor
younesbelkada Aug 18, 2023
7c0b4bb
oops
younesbelkada Aug 18, 2023
582fbde
fix slow tests
younesbelkada Aug 18, 2023
190b83f
Fix docs
younesbelkada Aug 18, 2023
026f53c
fix docs
younesbelkada Aug 18, 2023
b30adec
addess feedback
younesbelkada Aug 21, 2023
f8afb0a
address feedback
younesbelkada Aug 21, 2023
de497d4
final modifs for PEFT
younesbelkada Aug 21, 2023
b2e1672
Merge remote-tracking branch 'upstream/main' into move-integrations
younesbelkada Aug 21, 2023
e4e245b
fixup
younesbelkada Aug 21, 2023
5668474
Merge branch 'main' into move-integrations
younesbelkada Aug 21, 2023
f4d8c83
ok now
younesbelkada Aug 21, 2023
bc7a6ae
Merge branch 'move-integrations' of https://github.com/younesbelkada/…
younesbelkada Aug 21, 2023
c80cfd1
trigger CI
younesbelkada Aug 21, 2023
656c411
trigger CI again
younesbelkada Aug 21, 2023
80d2775
Update docs/source/en/main_classes/deepspeed.md
younesbelkada Aug 22, 2023
7cc7cbb
import from `integrations`
younesbelkada Aug 22, 2023
b4c4cf7
address feedback
younesbelkada Aug 22, 2023
bd95ee2
revert removal of `deepspeed` module
younesbelkada Aug 22, 2023
615ac14
revert removal of `deepspeed` module
younesbelkada Aug 22, 2023
b8fcf61
fix conflicts
younesbelkada Aug 22, 2023
be38218
ooops
younesbelkada Aug 22, 2023
310ceb1
oops
younesbelkada Aug 22, 2023
bb0a025
Merge remote-tracking branch 'upstream/main' into move-integrations
younesbelkada Aug 22, 2023
b756ace
add deprecation warning
younesbelkada Aug 22, 2023
080fc2f
place it on the top
younesbelkada Aug 22, 2023
d50051a
put `FutureWarning`
younesbelkada Aug 23, 2023
72fd103
fix conflicts with not_doctested.txt
younesbelkada Aug 23, 2023
5773b33
add back `bitsandbytes` module with a depr warning
younesbelkada Aug 23, 2023
8ace6bd
fix
younesbelkada Aug 23, 2023
10d3b77
Merge remote-tracking branch 'upstream/main' into move-integrations
younesbelkada Aug 23, 2023
7b6098c
fix
younesbelkada Aug 23, 2023
89f4ebd
fixup
younesbelkada Aug 23, 2023
3107a96
oops
younesbelkada Aug 23, 2023
fa451d4
fix doctests
younesbelkada Aug 23, 2023
33412d3
Merge branch 'main' into move-integrations
younesbelkada Aug 23, 2023
4b4c681
Merge remote-tracking branch 'upstream/main' into move-integrations
younesbelkada Aug 24, 2023
10d6e18
Merge remote-tracking branch 'upstream/main' into move-integrations
younesbelkada Aug 25, 2023
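Several commits above ("add deprecation warning", "put `FutureWarning`", "revert removal of `deepspeed` module", "add back `bitsandbytes` module with a depr warning") keep the old module paths importable as shims. A minimal sketch of what such a shim module could look like: the re-export list is taken from the new `integrations/deepspeed/__init__.py` shown further down, while the warning text is an assumption rather than the PR's actual code.

```python
# Hypothetical sketch of a kept-alive src/transformers/deepspeed.py shim:
# re-export the public names from their new home and warn on import.
import warnings

from .integrations.deepspeed import (  # noqa: F401
    HfDeepSpeedConfig,
    HfTrainerDeepSpeedConfig,
    deepspeed_config,
    deepspeed_init,
    is_deepspeed_available,
    is_deepspeed_zero3_enabled,
)

warnings.warn(
    "transformers.deepspeed is deprecated; import from "
    "transformers.integrations.deepspeed instead.",  # assumed wording
    FutureWarning,
)
```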
12 changes: 6 additions & 6 deletions docs/source/en/main_classes/deepspeed.md
@@ -2065,20 +2065,20 @@ In this case you usually need to raise the value of `initial_scale_power`. Setti

## Non-Trainer Deepspeed Integration

- The [`~deepspeed.HfDeepSpeedConfig`] is used to integrate Deepspeed into the 🤗 Transformers core
+ The [`~integrations.deepspeed.HfDeepSpeedConfig`] is used to integrate Deepspeed into the 🤗 Transformers core
functionality, when [`Trainer`] is not used. The only thing that it does is handling Deepspeed ZeRO-3 param gathering and automatically splitting the model onto multiple gpus during `from_pretrained` call. Everything else you have to do by yourself.

When using [`Trainer`] everything is automatically taken care of.

When not using [`Trainer`], to efficiently deploy DeepSpeed ZeRO-3, you must instantiate the
- [`~deepspeed.HfDeepSpeedConfig`] object before instantiating the model and keep that object alive.
+ [`~integrations.deepspeed.HfDeepSpeedConfig`] object before instantiating the model and keep that object alive.

If you're using Deepspeed ZeRO-1 or ZeRO-2 you don't need to use `HfDeepSpeedConfig` at all.

For example for a pretrained model:

```python
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations.deepspeed import HfDeepSpeedConfig
from transformers import AutoModel
import deepspeed

@@ -2092,7 +2092,7 @@ engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
or for non-pretrained model:

```python
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations.deepspeed import HfDeepSpeedConfig
from transformers import AutoModel, AutoConfig
import deepspeed

@@ -2108,7 +2108,7 @@ Please note that if you're not using the [`Trainer`] integration, you're complet

## HfDeepSpeedConfig

- [[autodoc]] deepspeed.HfDeepSpeedConfig
+ [[autodoc]] integrations.deepspeed.HfDeepSpeedConfig
- all

### Custom DeepSpeed ZeRO Inference
@@ -2161,7 +2161,7 @@ Make sure to:


from transformers import AutoTokenizer, AutoConfig, AutoModelForSeq2SeqLM
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations.deepspeed import HfDeepSpeedConfig
import deepspeed
import os
import torch
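Both docs snippets above are cut short by the collapsed diff context. As a reading aid, here is a minimal end-to-end sketch of the non-Trainer ZeRO-3 pattern they describe; the empty config dict and the `gpt2` checkpoint are placeholder assumptions, and the rendered docs page remains authoritative.

```python
import deepspeed

from transformers import AutoModel
from transformers.integrations.deepspeed import HfDeepSpeedConfig

ds_config = {}  # placeholder: a complete DeepSpeed ZeRO-3 config dict, or a path to one

# Instantiate *before* the model and keep the object alive, so that
# `from_pretrained` detects ZeRO-3 and splits the model across GPUs.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModel.from_pretrained("gpt2")  # placeholder checkpoint
engine, *_ = deepspeed.initialize(model=model, config_params=ds_config)
```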
@@ -32,7 +32,7 @@

from parameterized import parameterized # noqa
from transformers import TrainingArguments, is_torch_available # noqa
- from transformers.deepspeed import is_deepspeed_available # noqa
+ from transformers.integrations.deepspeed import is_deepspeed_available # noqa
from transformers.file_utils import WEIGHTS_NAME # noqa
from transformers.testing_utils import ( # noqa
CaptureLogger,
8 changes: 4 additions & 4 deletions src/transformers/__init__.py
@@ -111,8 +111,10 @@
"is_tensorboard_available",
"is_wandb_available",
],
"lib_integrations": [],
Collaborator: Missing an integrations key here.

"lib_integrations.peft": [],
"integrations.bitsandbytes": [],
"integrations.deepspeed": [],
"integrations.integration_utils": [],
"integrations.peft": [],
Collaborator: Can all be removed, you just need to put integrations.

"modelcard": ["ModelCard"],
"modeling_tf_pytorch_utils": [
"convert_tf_weight_name_to_pt_weight_name",
@@ -733,7 +735,6 @@
"is_vision_available",
"logging",
],
"utils.bitsandbytes": [],
"utils.quantization_config": ["BitsAndBytesConfig", "GPTQConfig"],
}

@@ -989,7 +990,6 @@
"TextDataset",
"TextDatasetForNextSentencePrediction",
]
_import_structure["deepspeed"] = []
_import_structure["generation"].extend(
[
"BeamScorer",
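For context on the `_import_structure` edits above: the top-level `transformers/__init__.py` maps public names to submodules and resolves them lazily on first attribute access, which is why the moved integrations only need new (empty) dictionary keys here. A condensed, illustrative sketch of that lazy mechanism (simplified, not the library's actual implementation):

```python
import importlib
from types import ModuleType


class _LazyModule(ModuleType):
    """Simplified sketch: resolve public names to their submodules on first access."""

    def __init__(self, name: str, import_structure: dict):
        super().__init__(name)
        # Invert {submodule: [public names]} into {public name: submodule}.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, item: str):
        # Import the defining submodule only when the attribute is first used.
        module = importlib.import_module(f".{self._name_to_module[item]}", self.__name__)
        return getattr(module, item)
```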
2 changes: 1 addition & 1 deletion src/transformers/generation/utils.py
@@ -24,7 +24,7 @@
import torch.distributed as dist
from torch import nn

- from ..deepspeed import is_deepspeed_zero3_enabled
+ from ..integrations.deepspeed import is_deepspeed_zero3_enabled
from ..modeling_outputs import CausalLMOutputWithPast, Seq2SeqLMOutput
from ..models.auto import (
MODEL_FOR_CAUSAL_IMAGE_MODELING_MAPPING,
54 changes: 54 additions & 0 deletions src/transformers/integrations/__init__.py
@@ -0,0 +1,54 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .bitsandbytes import bitsandbytes
from .deepspeed import deepspeed
from .integration_utils import (
Collaborator: For an additional followup, would be great to split all of those in their respective modules (have one for wandb, one for cometml etc)

Contributor Author: Agreed!

INTEGRATION_TO_CALLBACK,
AzureMLCallback,
ClearMLCallback,
CodeCarbonCallback,
CometCallback,
DagsHubCallback,
FlyteCallback,
MLflowCallback,
NeptuneCallback,
NeptuneMissingConfiguration,
TensorBoardCallback,
WandbCallback,
get_available_reporting_integrations,
get_reporting_integration_callbacks,
hp_params,
is_azureml_available,
is_clearml_available,
is_codecarbon_available,
is_comet_available,
is_dagshub_available,
is_fairscale_available,
is_flyte_deck_standard_available,
is_flytekit_available,
is_mlflow_available,
is_neptune_available,
is_optuna_available,
is_ray_available,
is_ray_tune_available,
is_sigopt_available,
is_tensorboard_available,
is_wandb_available,
rewrite_logs,
run_hp_search_optuna,
run_hp_search_ray,
run_hp_search_sigopt,
run_hp_search_wandb,
)
from .peft import PeftAdapterMixin
@@ -11,4 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
- from .peft import PeftAdapterMixin
+ from .bitsandbytes import (
+     get_keys_to_not_convert,
+     replace_8bit_linear,
+     replace_with_bnb_linear,
+     set_module_8bit_tensor_to_device,
+     set_module_quantized_tensor_to_device,
+ )
@@ -4,16 +4,16 @@

from packaging import version

- from ..utils import logging
- from .import_utils import is_accelerate_available, is_bitsandbytes_available
+ from ...utils import logging
+ from ...utils.import_utils import is_accelerate_available, is_bitsandbytes_available


if is_bitsandbytes_available():
import bitsandbytes as bnb
import torch
import torch.nn as nn

- from ..pytorch_utils import Conv1D
+ from ...pytorch_utils import Conv1D

if is_accelerate_available():
from accelerate import init_empty_weights
25 changes: 25 additions & 0 deletions src/transformers/integrations/deepspeed/__init__.py
@@ -0,0 +1,25 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .deepspeed import (
HfDeepSpeedConfig,
HfTrainerDeepSpeedConfig,
deepspeed_config,
deepspeed_init,
deepspeed_load_checkpoint,
deepspeed_optim_sched,
is_deepspeed_available,
is_deepspeed_zero3_enabled,
set_hf_deepspeed_config,
unset_hf_deepspeed_config,
)
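With these re-exports in place, downstream code can move to the new path while the old one keeps working during the deprecation window. A quick usage sketch; the `FutureWarning` on the old path is described by the commit messages, not shown in this diff:

```python
# New canonical location introduced by this PR:
from transformers.integrations.deepspeed import (
    HfDeepSpeedConfig,
    is_deepspeed_zero3_enabled,
)

# Old location: still importable for now, but expected to emit a
# FutureWarning per the "add deprecation warning" commits.
from transformers.deepspeed import HfDeepSpeedConfig as LegacyHfDeepSpeedConfig
```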
@@ -19,8 +19,8 @@
import weakref
from functools import partialmethod

- from .dependency_versions_check import dep_version_check
- from .utils import is_accelerate_available, is_torch_available, logging
+ from ...dependency_versions_check import dep_version_check
+ from ...utils import is_accelerate_available, is_torch_available, logging


if is_torch_available():
@@ -30,8 +30,8 @@

import numpy as np

- from . import __version__ as version
- from .utils import flatten_dict, is_datasets_available, is_pandas_available, is_torch_available, logging
+ from .. import __version__ as version
+ from ..utils import flatten_dict, is_datasets_available, is_pandas_available, is_torch_available, logging


logger = logging.get_logger(__name__)
@@ -68,10 +68,10 @@
except importlib.metadata.PackageNotFoundError:
_has_neptune = False

- from .trainer_callback import ProgressCallback, TrainerCallback # noqa: E402
- from .trainer_utils import PREFIX_CHECKPOINT_DIR, BestRun, IntervalStrategy # noqa: E402
- from .training_args import ParallelMode # noqa: E402
- from .utils import ENV_VARS_TRUE_VALUES, is_torch_tpu_available # noqa: E402
+ from ..trainer_callback import ProgressCallback, TrainerCallback # noqa: E402
+ from ..trainer_utils import PREFIX_CHECKPOINT_DIR, BestRun, IntervalStrategy # noqa: E402
+ from ..training_args import ParallelMode # noqa: E402
+ from ..utils import ENV_VARS_TRUE_VALUES, is_torch_tpu_available # noqa: E402


# Integration functions:
10 changes: 5 additions & 5 deletions src/transformers/modeling_utils.py
@@ -35,10 +35,10 @@

from .activations import get_activation
from .configuration_utils import PretrainedConfig
- from .deepspeed import deepspeed_config, is_deepspeed_zero3_enabled
from .dynamic_module_utils import custom_object_save
from .generation import GenerationConfig, GenerationMixin
- from .lib_integrations import PeftAdapterMixin
+ from .integrations import PeftAdapterMixin
+ from .integrations.deepspeed import deepspeed_config, is_deepspeed_zero3_enabled
Collaborator: Import from .integrations

from .pytorch_utils import ( # noqa: F401
Conv1D,
apply_chunking_to_forward,
@@ -660,7 +660,7 @@ def _load_state_dict_into_meta_model(
# they won't get loaded.

if is_quantized:
- from .utils.bitsandbytes import set_module_quantized_tensor_to_device
+ from .integrations.bitsandbytes import set_module_quantized_tensor_to_device
Collaborator: Same here and below.


error_msgs = []

@@ -2937,7 +2937,7 @@ def from_pretrained(
keep_in_fp32_modules = []

if load_in_8bit or load_in_4bit:
- from .utils.bitsandbytes import get_keys_to_not_convert, replace_with_bnb_linear
+ from .integrations.bitsandbytes import get_keys_to_not_convert, replace_with_bnb_linear

llm_int8_skip_modules = quantization_config.llm_int8_skip_modules
load_in_8bit_fp32_cpu_offload = quantization_config.llm_int8_enable_fp32_cpu_offload
@@ -3255,7 +3255,7 @@ def _load_pretrained_model(
):
is_safetensors = False
if is_quantized:
- from .utils.bitsandbytes import set_module_quantized_tensor_to_device
+ from .integrations.bitsandbytes import set_module_quantized_tensor_to_device

if device_map is not None and "disk" in device_map.values():
archive_file = (
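A note on the pattern these `modeling_utils.py` hunks preserve: the `bitsandbytes` helpers are imported inside the function bodies, so the optional dependency is only loaded when quantized loading is actually requested. An illustrative sketch of that deferred-import idiom, with a hypothetical helper name and a call signature assumed from its usage in the diff:

```python
def _set_quantized_weight(model, tensor_name: str, value, is_quantized: bool) -> None:
    # Hypothetical helper: importing transformers never hard-requires
    # bitsandbytes; only this quantized code path touches it.
    if not is_quantized:
        raise ValueError("This sketch only covers the quantized path.")
    from transformers.integrations.bitsandbytes import (
        set_module_quantized_tensor_to_device,
    )

    # Assumed signature: (module, tensor_name, device, value=...)
    set_module_quantized_tensor_to_device(model, tensor_name, "cuda", value=value)
```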
@@ -25,7 +25,7 @@
from torch.nn import CrossEntropyLoss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
BaseModelOutput,
CausalLMOutput,
2 changes: 1 addition & 1 deletion src/transformers/models/deprecated/mctct/modeling_mctct.py
@@ -23,8 +23,8 @@
from torch import nn

from ....activations import ACT2FN
- from ....deepspeed import is_deepspeed_zero3_enabled
from ....file_utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward
+ from ....integrations.deepspeed import is_deepspeed_zero3_enabled
from ....modeling_outputs import BaseModelOutput, CausalLMOutput
from ....modeling_utils import (
PreTrainedModel,
2 changes: 1 addition & 1 deletion src/transformers/models/distilbert/modeling_distilbert.py
@@ -29,7 +29,7 @@

from ...activations import get_activation
from ...configuration_utils import PretrainedConfig
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
BaseModelOutput,
MaskedLMOutput,
2 changes: 1 addition & 1 deletion src/transformers/models/esm/modeling_esmfold.py
@@ -23,7 +23,7 @@
import torch.nn as nn
from torch.nn import LayerNorm

- from ...deepspeed import is_deepspeed_available
+ from ...integrations.deepspeed import is_deepspeed_available
from ...modeling_outputs import ModelOutput
from ...utils import (
ContextManagers,
2 changes: 1 addition & 1 deletion src/transformers/models/fsmt/modeling_fsmt.py
@@ -35,7 +35,7 @@
from torch.nn import CrossEntropyLoss, LayerNorm

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
BaseModelOutput,
BaseModelOutputWithPastAndCrossAttentions,
2 changes: 1 addition & 1 deletion src/transformers/models/hubert/modeling_hubert.py
@@ -24,7 +24,7 @@
from torch.nn import CrossEntropyLoss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import BaseModelOutput, CausalLMOutput, SequenceClassifierOutput
from ...modeling_utils import PreTrainedModel
from ...utils import (
2 changes: 1 addition & 1 deletion src/transformers/models/m2m_100/modeling_m2m_100.py
@@ -23,7 +23,7 @@
from torch.nn import CrossEntropyLoss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
BaseModelOutput,
BaseModelOutputWithPastAndCrossAttentions,
2 changes: 1 addition & 1 deletion src/transformers/models/nllb_moe/modeling_nllb_moe.py
@@ -24,7 +24,7 @@
from torch.utils.checkpoint import checkpoint

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
MoEModelOutput,
MoEModelOutputWithPastAndCrossAttentions,
2 changes: 1 addition & 1 deletion src/transformers/models/sew/modeling_sew.py
@@ -25,7 +25,7 @@
from torch.nn import CrossEntropyLoss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import BaseModelOutput, CausalLMOutput, SequenceClassifierOutput
from ...modeling_utils import PreTrainedModel
from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
2 changes: 1 addition & 1 deletion src/transformers/models/sew_d/modeling_sew_d.py
@@ -26,7 +26,7 @@
from torch.nn import CrossEntropyLoss, LayerNorm

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import BaseModelOutput, CausalLMOutput, SequenceClassifierOutput
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import softmax_backward_data
2 changes: 1 addition & 1 deletion src/transformers/models/speecht5/modeling_speecht5.py
@@ -25,7 +25,7 @@
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, L1Loss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import (
BaseModelOutput,
BaseModelOutputWithPastAndCrossAttentions,
2 changes: 1 addition & 1 deletion src/transformers/models/unispeech/modeling_unispeech.py
@@ -26,7 +26,7 @@
from torch.nn import CrossEntropyLoss

from ...activations import ACT2FN
- from ...deepspeed import is_deepspeed_zero3_enabled
+ from ...integrations.deepspeed import is_deepspeed_zero3_enabled
from ...modeling_outputs import BaseModelOutput, CausalLMOutput, SequenceClassifierOutput, Wav2Vec2BaseModelOutput
from ...modeling_utils import PreTrainedModel
from ...utils import (