Commit ea15020
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (VXE)
- Fix BF16 byte ordering for big-endian architecture
- Vectorize exp(), tanh(), erf() functions with polynomial approximations
- Add FMA intrinsics (fma, fms, nfma, nfms) using vec_madd/vec_msub
- Improve BF16 rounding with round-to-nearest-even
- Fix prefetch implementation
- Add sigmoid, gelu_tanh, gelu_erf, rcp, rsqrt operations
- Implement softmax_fp32vec8 and rmsnorm_fp32vec8 kernels
- Fix FP16 support by aliasing to FP32Vec16
- Exclude s390x from FP16 vector trait in cpu_attn_impl.hpp
Signed-off-by: Rehan Khan <[email protected]>
1 parent 4393684 commit ea15020
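
The BF16 rounding and byte-ordering items in the message above can be illustrated with a minimal scalar sketch. This is not the code from the commit, which operates on s390x vector registers; the helper names `fp32_to_bf16_rne` and `bf16_to_fp32` are hypothetical, and the NaN handling is an assumption. The sketch works on the integer value of the float's bit pattern using shifts, so it selects the upper 16 bits by value rather than by memory address and should behave the same on big-endian and little-endian hosts.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): convert FP32 to BF16 with
// round-to-nearest-even. BF16 keeps the top 16 bits of the FP32 pattern.
inline uint16_t fp32_to_bf16_rne(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // well-defined type punning

  if (std::isnan(f)) {
    // Assumption: force a quiet NaN so rounding cannot turn NaN into Inf.
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  }

  // Add 0x7FFF plus the lowest kept bit: exact ties round toward the
  // value whose lowest kept bit is zero (the "even" neighbour).
  const uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper (not from the commit): widen BF16 back to FP32 by
// placing the 16 bits in the upper half of a 32-bit word.
inline float bf16_to_fp32(uint16_t h) {
  const uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

The endianness hazard appears when the two 16-bit halves of a float are addressed through memory or vector lanes, because the half that holds the sign and exponent sits at the lower address on big-endian machines and at the higher address on little-endian ones. Shifting the 32-bit value, as above, avoids that dependency.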
2 files changed: +543 -62 lines changed