[Unified Checkpoint] Checkpoint compression#9183
Conversation
self._shared_save_optimizer_flag = multiprocessing.Array("i", 1)

def _file_save_async_or_sync(self, state_dict, path, is_sync=True, state_dict_type="model_weight"):
def quant_unified_optimizer(self, state_dict, state_dict_type, ckpt_quant_stage):
Suggest pulling this block out into its own file. I'm currently refactoring unified_checkpoint.py and will be splitting quite a bit of logic out of it.
self.state.epoch = epoch + (step + 1) / steps_in_epoch
self.control = self.callback_handler.on_step_end(args, self.state, self.control)
self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
if self.state.global_step != 0 and (self.state.global_step) % self.args.save_steps == 0:
shard_file,
tp_actions if pre_tensor_parallel_split else None,
expected_keys,
ckpt_quant_stage=model.config.ckpt_quant_stage,
Why does ckpt_quant_stage need to be passed in here? With the default of O0 it shouldn't need to be passed at all.
self._lock,
state_dict_type,
self.global_rank,
ckpt_quant_stage,
If only optimizer_weight needs compression, and the others such as model_weight and master_weight don't, then this variable doesn't need to be passed in at all.
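A minimal sketch of that guard, reusing the names quoted in this thread (the exact call site is an assumption, not the PR's code):

```python
# Hedged sketch: gate quantization on the state-dict type so the stage
# only ever applies to optimizer state; model_weight and master_weight
# save paths never see ckpt_quant_stage.
if state_dict_type == "optimizer_weight" and ckpt_quant_stage != "O0":
    state_dict = self.quant_unified_optimizer(state_dict, state_dict_type, ckpt_quant_stage)
```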
| if "skip_save_model_weight" in self.args.unified_checkpoint_config | ||
| else state_dict_type, | ||
| self.global_rank, | ||
| ckpt_quant_stage, |
lock,
state_dict_type,
global_rank,
ckpt_quant_stage,
Just make it an optional parameter, e.g. ckpt_quant_stage="O0".
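The suggested signature could look like this sketch (the body is an assumption for illustration, not the PR's implementation):

```python
def quant_unified_optimizer(self, state_dict, state_dict_type, ckpt_quant_stage="O0"):
    # "O0" is the no-quantization default, so existing callers can omit it
    if ckpt_quant_stage == "O0":
        return state_dict
    # non-"O0" stages would quantize moment1/moment2 etc. here (omitted)
    return state_dict
```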
path=os.path.join(save_directory, shard_file),
is_sync=is_sync_save,
state_dict_type="model_weight",
ckpt_quant_stage=model_to_save.config.ckpt_quant_stage,
path=os.path.join(output_dir, master_weights_name),
is_sync=is_sync_save,
state_dict_type="master_weight",
ckpt_quant_stage=model.config.ckpt_quant_stage,
path=os.path.join(save_directory, shard_master_weight_file),
is_sync=is_sync_save,
state_dict_type="master_weight",
ckpt_quant_stage=model.config.ckpt_quant_stage,
…nto ckpt-compress Conflicts: paddlenlp/trainer/unified_checkpoint/unified_checkpoint.py
return returned_state_dict

state_dict_optim = load_resolved_archive_file(resolved_archive_file, sharded_metadata, expected_keys)
index = {}
new_name = static2struct_name_mappings[static_name] + "/" + type_name
optim_state_dict[new_name] = optim_state_dict.pop(key)

if UnifiedCheckpointOption.REMOVE_MASTER_WEIGHT.value in self.args.unified_checkpoint_config:
The REMOVE_MASTER_WEIGHT check shouldn't be written inside this function; instead, ensure the master_weights passed into save_non_merge_optimizer is already None.
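Sketched at the call site, the suggestion might read as follows (the signature of save_non_merge_optimizer is an assumption; only the option check is quoted from the diff):

```python
# Hedged call-site sketch: the caller decides whether master weights
# survive, and the save function just serializes what it receives.
if UnifiedCheckpointOption.REMOVE_MASTER_WEIGHT.value in self.args.unified_checkpoint_config:
    master_weights = None  # dropped by the caller; the callee stays agnostic
self.save_non_merge_optimizer(model, optim_state_dict, master_weights, output_dir)
```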
return last_dtype

def dequant_unified_optimizer(self, state_dict, ckpt_quant_stage, scale_dict):
…nto ckpt-compress Conflicts: paddlenlp/trainer/unified_checkpoint/unified_checkpoint.py
| """ | ||
| quant = False | ||
| if ckpt_quant_stage != "O0": | ||
| quant = "optimizer" in checkpoint_file |
return last_dtype

def dequant_unified_optimizer(state_dict, ckpt_quant_stage, scale_dict):
for key in keys:
-    if fliter_dict_keys is not None and key not in fliter_dict_keys:
+    # non merge ckpt loading dont have filter key.
+    if key.endswith(SYMMETRY_QUANT_SCALE) or (fliter_dict_keys is not None and key not in fliter_dict_keys):
Suggested change:
-    if key.endswith(SYMMETRY_QUANT_SCALE) or (fliter_dict_keys is not None and key not in fliter_dict_keys):
+    if key.endswith(SYMMETRY_QUANT_SCALE):
+        continue
+    if fliter_dict_keys is not None and key not in fliter_dict_keys:
+        continue
| MOMENT2_KEYNAME = "moment2_0" | ||
| BETA1_KEYNAME = "beta1_pow_acc_0" | ||
| BETA2_KEYNAME = "beta2_pow_acc_0" | ||
| SYMMETRY_QUANT_SCALE = "_codebook" |
)
},
)
ckpt_quant_stage: str = field(
Consider whether this should be configured inside unified_checkpoint_config, since it is only used together with UC.
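A hedged sketch of the field as quoted above (the default value and help text are assumptions; the thread suggests folding it into unified_checkpoint_config instead):

```python
from dataclasses import dataclass, field

@dataclass
class TrainingArguments:
    ckpt_quant_stage: str = field(
        default="O0",  # "O0" leaves checkpoints unquantized
        metadata={"help": "Checkpoint quantization stage, e.g. O0/O1/O2."},
    )
```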
@@ -0,0 +1,303 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
abs_max_values = np.where(
    abs_max_values == np.array(0, dtype=inputs.dtype), np.array(1e-8, dtype=inputs.dtype), abs_max_values
)
return abs_max_values
Doesn't using a bare 1e-8 here ignore the training dtype? bf16, float16, and float32 have quite different representable ranges.
In group-wise quantization a group can be all zeros, which would cause a divide-by-zero during quantization; the 1e-8 here is a small bias to guard against that.
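One dtype-aware variant of that epsilon, sketched under the assumption that NumPy dtype info is available (note vanilla NumPy has no bfloat16 dtype):

```python
import numpy as np

def safe_abs_max(abs_max_values, dtype):
    # Use the smallest positive normal number of the training dtype instead
    # of a hard-coded 1e-8, which underflows to 0 in float16 (whose smallest
    # normal value is about 6.1e-5).
    eps = np.finfo(dtype).tiny
    return np.where(abs_max_values == 0, np.array(eps, dtype=dtype), abs_max_values)
```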
import numpy as np
import paddle
Every important function needs a comment, and its args need documenting as well.
For the quantization algorithms referenced, add the arXiv links.
# channel-wise abs max calculation
def cal_abs_max_channel(inputs, quant_axis=1):
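A minimal self-contained version of that helper might look like this sketch (the reduction axes are an assumption for 2-D weights; it keeps the diff's 1e-8 guard, with the dtype caveat discussed above):

```python
import numpy as np

# channel-wise abs max: reduce over every axis except the quantization axis
def cal_abs_max_channel(inputs, quant_axis=1):
    reduce_axis = tuple(i for i in range(len(inputs.shape)) if i != quant_axis)
    abs_max_values = np.max(np.abs(inputs), axis=reduce_axis)
    # guard all-zero channels so the later division by the scale is safe
    return np.where(abs_max_values == 0, np.array(1e-8, dtype=inputs.dtype), abs_max_values)
```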
qdq_x = (
    quant_x
    / bnt
    * scales[rank * scales.shape[0] // world_size : (rank + 1) * scales.shape[0] // world_size]
This variable name is confusing: world_size normally means the total number of training cards, but here it denotes the size of the tensor-parallel communication group. Watch the naming.
A related question: it looks like quantization is applied to all parameters, but Norm parameters are not sharded across ranks; can they still be quantized this way?
if len(scales.shape) == 0 or quant_x.shape[-1] == scales.shape[-1]:
    qdq_x = (quant_x / bnt * scales) + mins
else:
    qdq_x = (
int4_high = np.where(int4_high > 8, int4_high - 16, int4_high)

high_tensor = paddle.Tensor(int4_high, zero_copy=True)
low_tensor = paddle.Tensor(int4_low, zero_copy=True)
cpu->gpu,已去除 zero_copy
m1_quant, codebook = qdq_weight(state_dict[m1_key], quant_bit=8)
quant_weight, mins, maxs = asymmetry_qdq_weight(ratio, quant_bit=8)
state_dict[m1_key] = m1_quant
codebook_dict[m1_key + SYMMETRY_QUANT_SCALE] = codebook
dist.all_reduce(quant_bits)

model_numel = all_bits / 4
all_bits = model_numel * 7.0
* checkpoint compression init * add ckpt quant argument * add ckpt quant ci * fix ci * fix lint * remove stage O2, change O3 --> O2 * support async save * file adjustment * magic string remove * ci fix * ci fix, code refinement * function extraction * fix ci * code refinement * fix ci * fix ci * support non merge tp ckpt quantization * fix ci * update * fix bug * code refactor * fix lint * fix ci * del old uc.py * fix lint * add mgpu ci * fix ci * multi thread loading * fix lint * fix bug * refactor code * add comment * fix lint * add comment * add comment * fix bug * fix bugs when ckpt no quant and no master weight * remove uni-test Conflicts: paddlenlp/transformers/model_utils.py
PR types
PR changes
Description
Implements checkpoint compression.
New arguments