kunlunxin update glm config #236
Changes from 29 commits
````diff
@@ -14,45 +14,35 @@
 - OS version: Ubuntu 20.04
 - OS kernel version: 5.4.0-26-generic
 - Accelerator driver version: 4.0.25
-- Docker image and version: pytorch1.12.1-cpu-ubuntu18.04:v0.04
-- Training framework version: xmlir+e70db8f6
+- Docker image and version: pytorch1.12.1-cpu-ubuntu20.04:v0.01
+- Training framework version: xmlir+111e7d45 ([xmlir download](https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/archives/111e7d45/xmlir-0.0.1-cp38-cp38-linux_x86_64.whl))
+- Training compiler version: xacc+111e7d45 ([xacc download](https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/archives/111e7d45/xacc-0.1.0-cp38-cp38-linux_x86_64.whl))
 - Dependency versions: pytorch-1.12.1+cpu
 
-### How to run the test
-
-Modify the following settings in `FlagPerf/training/run_benchmarks/config/test_conf.py`:
-
-```python
-VENDOR = "kunlunxin"
-
-ACCE_CONTAINER_OPT = " --device=/dev/xpu0 --device=/dev/xpu1 --device=/dev/xpu2" + \
-" --device=/dev/xpu3 --device=/dev/xpu4 --device=/dev/xpu5" + \
-" --device=/dev/xpu6 --device=/dev/xpu7 --device=/dev/xpuctrl"
-
-ACCE_VISIBLE_DEVICE_ENV_NAME = "XPU_VISIBLE_DEVICES"
-
-CASES = [
-    "GLM_TORCH_DEMO_R300_1X1",
-    "GLM_TORCH_DEMO_R300_1X2",
-    "GLM_TORCH_DEMO_R300_1X4",
-    "GLM_TORCH_DEMO_R300_1X8",
-    "GLM_TORCH_DEMO_R300_2X8"
-]
-```
-
-For the remaining steps, follow the ["Quick Start"](../../../README.md#快速启动) section of the README in the project root.
-
-
-### Reference run results
-
-| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Throughput (samples/s) |
-|---------| --------------- | ----------- | -------- | -------- | ------- | ---------------- |
-| 1 node, 1 card | config_R300x1x1 | 121371.25 | 0.8 | 0.8021 | 14400 (fp32) | 0.50 |
-| 1 node, 2 cards | config_R300x1x2 | 106709.60 | 0.8 | 0.8085 | 12000 (fp32) | 0.92 |
-| 1 node, 4 cards | config_R300x1x4 | 44162.12 | 0.8 | 0.8027 | 4800 (fp32) | 1.79 |
-| 1 node, 8 cards | config_R300x1x8 | 22902.82 | 0.8 | 0.8003 | 2400 (fp32) | 3.47 |
-| 2 nodes, 8 cards each | config_R300x2x8 | 16217.80 | 0.8 | 0.8012 | 1500 (fp32) | 6.08 |
-
-### License
-
-Apache 2.0 license.
+#### Run results
+
+* General metrics
+
+| Metric | Value | Notes |
+| -------------- | ------------------------------ | ------------------------------------------- |
+| Task category | General language model | |
+| Model | glm | |
+| Dataset | ReCoRD | |
+| Data precision | precision, see "Performance metrics" | One of fp32/amp/fp16 |
+| Hyperparameter changes | fix_hp, see "Performance metrics" | Special hyperparameters needed to saturate the hardware when measuring throughput |
+| Hardware short name | R300 | |
+| Device memory usage | mem(actual/total), see "Performance metrics" | Usually called VRAM; unit: GiB |
+| End-to-end time | e2e_time, see "Performance metrics" | Total time plus Perf initialization and similar overhead |
+| Overall throughput | p_whole, see "Performance metrics" | Number of samples actually trained divided by total time (performance_whole) |
+| Training throughput | p_train, see "Performance metrics" | Excludes the evaluation time at the end of each epoch |
+| **Compute throughput** | **p_core, see "Performance metrics"** | Excludes data I/O time (p3>p2>p1) |
+| Training result | acc, see "Performance metrics" | Classification accuracy (mlm_accuracy) |
+| Additional modifications | None | |
+
+* Performance metrics
+
+| Configuration | precision | fix_hp | e2e_time | p_whole | p_train | p_core | acc | mem |
+| ------------------- | --------- | ---------------- | -------- | ------- | ------- | ------ | ----- | --------- |
+| R300 1 node, 1 card (1x1) | fp32 | bs=4,lr=1e-05 | | | | | 80.52% | 31.8/32.0 |
+| R300 1 node, 8 cards (1x8) | fp32 | bs=5,lr=1e-05 | | | | | 80.52% | 31.8/32.0 |
````
Collaborator: Please confirm: is this value 5? Do the other two have to be 4?

Contributor (Author): The single-node 8-card configuration (1x8) uses 5.

Contributor (Author): The two-node 8-card configuration (2x8) uses 2; there was a mistake in the earlier upload.
```diff
+| R300 2 nodes, 8 cards each (2x8) | fp32 | bs=4,lr=1e-05 | | | | | 80.31% | 31.7/32.0 |
```
yuzhou03 marked this conversation as resolved.
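The `ACCE_CONTAINER_OPT` string and the `CASES` list in the `test_conf.py` snippet of the README diff follow a regular pattern. As a sketch, assuming an 8-XPU host that exposes `/dev/xpu0` through `/dev/xpu7` plus `/dev/xpuctrl` exactly as in the snippet, the same values can be generated rather than typed by hand. `xpu_container_opts` and `glm_cases` are hypothetical helpers, not part of FlagPerf, which simply uses the literal values.

```python
# Hypothetical helpers that rebuild the test_conf.py values from the README.
# FlagPerf itself expects plain module-level constants; these helpers are
# only an illustration of the naming pattern.


def xpu_container_opts(num_xpus: int = 8) -> str:
    """Build docker --device flags: one per XPU plus the xpuctrl device."""
    devices = [f"/dev/xpu{i}" for i in range(num_xpus)] + ["/dev/xpuctrl"]
    # Each flag is prefixed with a space, matching the original string.
    return "".join(f" --device={dev}" for dev in devices)


def glm_cases(layouts):
    """Build case names GLM_TORCH_DEMO_R300_<nodes>X<cards_per_node>."""
    return [f"GLM_TORCH_DEMO_R300_{nodes}X{cards}" for nodes, cards in layouts]


VENDOR = "kunlunxin"
ACCE_CONTAINER_OPT = xpu_container_opts(8)
ACCE_VISIBLE_DEVICE_ENV_NAME = "XPU_VISIBLE_DEVICES"
CASES = glm_cases([(1, 1), (1, 2), (1, 4), (1, 8), (2, 8)])
```

Generating the strings keeps the device list and the case list in sync if the number of cards changes; for a fixed 8-card setup the literal values in the README are equally valid.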
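The "General metrics" table defines three throughputs that share a numerator (samples trained) but exclude different amounts of time. The sketch below assumes, as a simplification, that they differ only in which time components are removed from the denominator: nothing for `p_whole`, end-of-epoch evaluation for `p_train`, and additionally data I/O for `p_core`. The `RunTiming` fields and the `throughputs` function are hypothetical names, not FlagPerf's implementation.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    samples: int         # samples actually trained
    total_time: float    # seconds for the whole training run
    eval_time: float     # seconds spent in end-of-epoch evaluation
    data_io_time: float  # seconds spent on data I/O during training


def throughputs(t: RunTiming) -> dict:
    """Return p_whole, p_train and p_core in samples/s (simplified model)."""
    train_time = t.total_time - t.eval_time  # drop evaluation time
    core_time = train_time - t.data_io_time  # additionally drop data I/O
    if core_time <= 0:
        raise ValueError("eval_time + data_io_time must be below total_time")
    return {
        "p_whole": t.samples / t.total_time,
        "p_train": t.samples / train_time,
        "p_core": t.samples / core_time,
    }


result = throughputs(RunTiming(samples=1000, total_time=500.0,
                               eval_time=100.0, data_io_time=150.0))
# result == {"p_whole": 2.0, "p_train": 2.5, "p_core": 4.0}
```

Because each metric divides the same sample count by a smaller time, any run with nonzero evaluation and I/O time gives `p_core > p_train > p_whole`, which is consistent with the table's "p3>p2>p1" note if p1, p2 and p3 denote `p_whole`, `p_train` and `p_core`.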
```diff
@@ -1,21 +1,2 @@
-vendor = 'kunlunxin'
-fp16 = False
-
 train_batch_size = 4
```

GGuanl marked this conversation as resolved.

```diff
-eval_batch_size = 6
-
-dist_backend = "xccl"
-
-lr = 1e-5
-weight_decay = 0.1
-adam_beta1 = 0.9
-adam_beta2 = 0.999
-adam_eps = 1e-08
-gradient_accumulation_steps = 1
-warmup = 0.1
-lr_decay_ratio = 0.1
-lr_decay_iters = 4338
-log_freq = 1
-seed = 4096
-max_samples_termination = 5553080
-training_event = None
+eval_batch_size = 4
```
```diff
@@ -1,21 +1,2 @@
-vendor = 'kunlunxin'
-fp16 = False
-
-train_batch_size = 4
-eval_batch_size = 6
-
-dist_backend = "xccl"
-
-lr = 1e-5
-weight_decay = 0.1
-adam_beta1 = 0.9
-adam_beta2 = 0.999
-adam_eps = 1e-08
-gradient_accumulation_steps = 1
-warmup = 0.1
-lr_decay_ratio = 0.1
-lr_decay_iters = 4338
-log_freq = 1
-seed = 4096
-max_samples_termination = 5553080
-training_event = None
+train_batch_size = 5
```

GGuanl marked this conversation as resolved.

```diff
+eval_batch_size = 5
```
```diff
@@ -1,21 +1,2 @@
-vendor = 'kunlunxin'
-fp16 = False
-
 train_batch_size = 4
```

GGuanl marked this conversation as resolved.

```diff
-eval_batch_size = 6
-
-dist_backend = "xccl"
-
-lr = 1e-5
-weight_decay = 0.1
-adam_beta1 = 0.9
-adam_beta2 = 0.999
-adam_eps = 1e-08
-gradient_accumulation_steps = 1
-warmup = 0.1
-lr_decay_ratio = 0.1
-lr_decay_iters = 4338
-log_freq = 1
-seed = 4096
-max_samples_termination = 5553080
-training_event = None
+eval_batch_size = 4
```
```diff
@@ -0,0 +1,18 @@
+vendor = 'kunlunxin'
+fp16 = False
+
+dist_backend = "xccl"
+
+lr = 1e-5
+weight_decay = 0.1
+adam_beta1 = 0.9
+adam_beta2 = 0.999
+adam_eps = 1e-08
+gradient_accumulation_steps = 1
+warmup = 0.1
+lr_decay_ratio = 0.1
+lr_decay_iters = 4338
+log_freq = 1
+seed = 4096
+max_samples_termination = 5553080
+training_event = None
```
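These changes move the shared hyperparameters into a single new 18-line file and leave each per-configuration file with only its batch sizes. The sketch below illustrates one way such layering can work, assuming the per-configuration values simply override the shared ones; the actual FlagPerf loading mechanism is not part of this diff, and `effective_config`, `global_batch_size` and `step_budget` are hypothetical names. It also assumes plain data parallelism and that `max_samples_termination` is a sample count at which training stops.

```python
# Shared settings, copied from the new common config file in this PR.
SHARED = {
    "vendor": "kunlunxin",
    "fp16": False,
    "dist_backend": "xccl",
    "lr": 1e-5,
    "weight_decay": 0.1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_eps": 1e-08,
    "gradient_accumulation_steps": 1,
    "warmup": 0.1,
    "lr_decay_ratio": 0.1,
    "lr_decay_iters": 4338,
    "log_freq": 1,
    "seed": 4096,
    "max_samples_termination": 5553080,
    "training_event": None,
}


def effective_config(overrides: dict) -> dict:
    """Shared settings with per-configuration values taking precedence."""
    merged = dict(SHARED)  # copy so SHARED itself is never mutated
    merged.update(overrides)
    return merged


def global_batch_size(cfg: dict, num_devices: int) -> int:
    """Samples per optimizer step under plain data parallelism (assumed)."""
    return cfg["train_batch_size"] * num_devices * cfg["gradient_accumulation_steps"]


def step_budget(cfg: dict, num_devices: int) -> int:
    """Steps until max_samples_termination is reached (rounded up)."""
    gbs = global_batch_size(cfg, num_devices)
    return -(-cfg["max_samples_termination"] // gbs)


# Batch size 5 on 8 cards, which the review thread attributes to the 1x8 case.
cfg_1x8 = effective_config({"train_batch_size": 5, "eval_batch_size": 5})
```

With these assumptions the 1x8 override yields a global batch of 40 samples per step, so `max_samples_termination` caps the run at 138827 steps.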
```
@@ -1,3 +1,4 @@
h5sparse
boto3
h5py
numpy>=1.15.4
```
Are the two download links working?
