@fxyfxy777 fxyfxy777 commented Aug 3, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Summary of the accuracy and configuration errors reported for interpolate in error_log.log (2079 cases in total)

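1. cuda error 700 (1356 cases)

Cause: int overflow; the value exceeds the maximum range an int can represent.
Solution: change the affected variables to size_t so they no longer overflow.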
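
The overflow can be illustrated with a minimal sketch. The helper names (`numel`, `overflows_int`, `flat_offset`) and the NCHW layout are assumptions made for illustration and are not taken from the Paddle kernels; the sketch also assumes a 64-bit `size_t`.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: element count of an N x C x H x W tensor,
// accumulated in size_t (assumed to be 64 bits wide).
inline std::size_t numel(std::size_t n, std::size_t c,
                         std::size_t h, std::size_t w) {
  return n * c * h * w;
}

// Hypothetical helper: true when a count or offset does not fit in int,
// i.e. when an int-typed index variable would overflow.
inline bool overflows_int(std::size_t value) {
  return value >
         static_cast<std::size_t>(std::numeric_limits<int>::max());
}

// Hypothetical helper: flattened offset of element (n, c, h, w) in a
// contiguous tensor with C channels, height H and width W.
inline std::size_t flat_offset(std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w,
                               std::size_t C, std::size_t H,
                               std::size_t W) {
  return ((n * C + c) * H + h) * W + w;
}
```

For an illustrative 1 x 3 x 32768 x 32768 output, `numel` returns 3221225472, which is larger than the int maximum of 2147483647, so any offset near the end of the tensor needs a wider type than int.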


2. CUDA error: invalid configuration argument (675 cases)

Cause: the error is raised by Torch.
Solution: add the affected configurations to tester/api_config/torch_error_skip.txt; see PR #484.


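3. The values for attribute 'shape' do not match (40 cases)

Cause: the inputs scale_h / scale_w are stored in float precision, whereas Torch uses double. The rounding error is amplified by the multiplication and changes the result of the subsequent cast to int. Note that this refers to function initialization, where Paddle uses float; casting to double later, during the computation, only amplifies the error!

Example verification code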

#include <cstdint>
#include <iomanip>
#include <iostream>

int main() {
  int64_t in_h = 10;
  // The same decimal literal, stored once as float and once as double.
  float scale_h_f = 0.7999999999999999f;
  double scale_h_d = 0.7999999999999999;

  // The float product rounds to exactly 8; the double product stays just below 8.
  float result_h_f = in_h * scale_h_f;
  double result_h_d = in_h * scale_h_d;

  // Truncation toward zero exposes the difference.
  int out_h_f = static_cast<int>(result_h_f);
  int out_h_d = static_cast<int>(result_h_d);

  std::cout << std::fixed << std::setprecision(10);
  std::cout << "scale_h (float) = " << scale_h_f << ", scale_h (double) = " << scale_h_d << std::endl;
  std::cout << "result_h_f (float)  = " << result_h_f << std::endl;
  std::cout << "result_h_d (double) = " << result_h_d << std::endl;
  std::cout << "out_h_f = " << out_h_f << ", out_h_d = " << out_h_d << std::endl;
  return 0;
}

Output

scale_h (float) = 0.8000000119, scale_h (double) = 0.8000000000
result_h_f (float)  = 8.0000000000
result_h_d (double) = 8.0000000000
out_h_f = 8, out_h_d = 7

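Conclusion: storing the scale in float introduces a precision error; initializing it as double avoids the problem. If that change turns out to be too broad, the interim workaround is: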


4. [accuracy error] paddle.nn.functional.interpolate (6 cases)

5. [accuracy error] backward paddle.nn.functional.interpolate (2 cases)

Both have the same root cause: insufficient precision makes the error too large.

The interpolation formula is:

$$ \begin{aligned} V &= d_2 \cdot \left[ h_2 \cdot (w_2 \cdot V_{000} + w_1 \cdot V_{001}) + h_1 \cdot (w_2 \cdot V_{010} + w_1 \cdot V_{011}) \right] \\ &\quad + d_1 \cdot \left[ h_2 \cdot (w_2 \cdot V_{100} + w_1 \cdot V_{101}) + h_1 \cdot (w_2 \cdot V_{110} + w_1 \cdot V_{111}) \right] \end{aligned} $$
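
The formula can be written out as a small function. This is a sketch only: the name `trilinear`, the corner layout `v[d][h][w]`, and the relations d2 = 1 - d1, h2 = 1 - h1, w2 = 1 - w1 are assumptions for illustration, since the PR description does not define the weights. The weight type is a template parameter so that float and double can be compared side by side.

```cpp
// Hypothetical sketch of the trilinear formula above.
// v[d][h][w] holds the eight corner values V_{dhw}.
// d1, h1, w1 are the weights of the "far" corner along each axis;
// the complementary weights are assumed to be 1 - d1, 1 - h1, 1 - w1.
template <typename T>
T trilinear(const T v[2][2][2], T d1, T h1, T w1) {
  const T d2 = T(1) - d1;
  const T h2 = T(1) - h1;
  const T w2 = T(1) - w1;
  return d2 * (h2 * (w2 * v[0][0][0] + w1 * v[0][0][1]) +
               h1 * (w2 * v[0][1][0] + w1 * v[0][1][1])) +
         d1 * (h2 * (w2 * v[1][0][0] + w1 * v[1][0][1]) +
               h1 * (w2 * v[1][1][0] + w1 * v[1][1][1]));
}
```

With corner values v[d][h][w] = 4d + 2h + w, the call `trilinear<double>(v, 0.25, 0.5, 0.75)` evaluates to 2.75, which matches the linear function the corners were sampled from.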

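Cause

  • With large tensors, the weights d1/d2, w1/w2, h1/h2 do not have enough precision.

Solution

  • In the forward pass, float precision keeps the error under control.
  • In the backward pass the maximum absolute error is still about 5; promoting to double reduces it to about 1.5, and the tolerance upper bound is raised to accommodate the remaining error.
  • See fix_interpolate PFCCLab/PaddleAPITest#490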
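
One plausible way the weights lose precision on large tensors is sketched below. It assumes, as an illustration rather than as a description of the Paddle kernel, that a weight is the fractional part of a source coordinate; `frac_weight` is a hypothetical helper.

```cpp
#include <cmath>

// Hypothetical helper: weight of the "far" neighbour, taken as the
// fractional part of the source coordinate src.
// Assumption: the real kernel derives its weights in a comparable way.
template <typename T>
T frac_weight(T src) {
  return src - std::floor(src);
}
```

A float has a 24-bit significand, so near 16777216 (2^24) adjacent floats are 2 apart and a coordinate such as 16777216.5 is stored as 16777216: `frac_weight<float>(16777216.5f)` yields 0, while `frac_weight<double>(16777216.5)` yields 0.5. At small coordinates both types agree.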

Pcard-92269

paddle-bot bot commented Aug 3, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@fxyfxy777
Contributor Author

/re-run all-failed

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@lshpku lshpku merged commit a145db3 into PaddlePaddle:develop Aug 5, 2025
71 of 72 checks passed
@fxyfxy777 fxyfxy777 deleted the fix_interpolate branch September 9, 2025 06:24