Commit ea15020

[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (VXE)
- Fix BF16 byte ordering for big-endian architecture
- Vectorize exp(), tanh(), erf() functions with polynomial approximations
- Add FMA intrinsics (fma, fms, nfma, nfms) using vec_madd/vec_msub
- Improve BF16 rounding with round-to-nearest-even
- Fix prefetch implementation
- Add sigmoid, gelu_tanh, gelu_erf, rcp, rsqrt operations
- Implement softmax_fp32vec8 and rmsnorm_fp32vec8 kernels
- Fix FP16 support by aliasing to FP32Vec16
- Exclude s390x from FP16 vector trait in cpu_attn_impl.hpp

Signed-off-by: Rehan Khan <[email protected]>
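To illustrate the round-to-nearest-even item above, here is a minimal scalar sketch of FP32 to BF16 conversion. It is not the commit's vectorized VXE code: the helper names `fp32_to_bf16_rne` and `bf16_to_fp32` are hypothetical, and the NaN handling is one common choice rather than something the commit message confirms. The sketch works on integer values with shifts, so it behaves the same on big-endian and little-endian hosts; byte order only matters when a float's storage is reinterpreted as two 16-bit halves, where the upper half sits first in memory on big-endian s390x.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): scalar FP32 -> BF16
// conversion using round-to-nearest-even.
inline uint16_t fp32_to_bf16_rne(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // well-defined type pun

  // NaN: truncate and force a mantissa bit so the result stays a NaN.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  }

  // Add 0x7FFF plus the lowest kept bit. Values above the halfway point
  // round up, values below round down, and exact ties round to the
  // neighbour whose lowest kept bit is 0 (the even one).
  uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: BF16 -> FP32 is exact, the low 16 bits are zero.
inline float bf16_to_fp32(uint16_t half_bits) {
  uint32_t bits = static_cast<uint32_t>(half_bits) << 16;
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}
```

Plain truncation (`bits >> 16`) would always round toward zero, which biases accumulated results; the tie-to-even rule avoids that bias.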
1 parent 4393684 commit ea15020

2 files changed: +543 -62 lines changed


csrc/cpu/cpu_attn_impl.hpp

Lines changed: 1 addition & 1 deletion
@@ -841,7 +841,7 @@ struct VecTypeTrait<c10::BFloat16> {
 };
 #endif
 
-#if !defined(__powerpc__)
+#if !defined(__powerpc__) && !defined(__s390x__)
 template <>
 struct VecTypeTrait<c10::Half> {
   using vec_t = vec_op::FP16Vec16;
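
The guard change above can be pictured with a small self-contained sketch. Everything in the `sketch` namespace is hypothetical: the stand-in types, the `void` default in the primary template, and the `has_half_trait` flag are assumptions for illustration and do not mirror the real definitions in `cpu_attn_impl.hpp`. It only shows the mechanism: the `Half` specialization exists on most targets and is compiled out on PowerPC and s390x.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for the element and vector types.
struct Half {};
struct FP16Vec16 {};

// Assumed primary template: no vector type unless a specialization exists.
template <typename T>
struct VecTypeTrait {
  using vec_t = void;
};

// Same guard as in the diff: the specialization is skipped on
// PowerPC and s390x builds.
#if !defined(__powerpc__) && !defined(__s390x__)
template <>
struct VecTypeTrait<Half> {
  using vec_t = FP16Vec16;
};
#endif

// True only when the Half specialization was compiled in.
constexpr bool has_half_trait =
    !std::is_void<VecTypeTrait<Half>::vec_t>::value;

}  // namespace sketch
```

On an x86-64 or AArch64 build `sketch::has_half_trait` is `true`; when compiled with `__s390x__` or `__powerpc__` defined it is `false`.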
