Releases: ARM-software/optimized-routines
v26.01
This release introduces AdvSIMD/SVE log, rsqrt, and pow variants; expands fp/ with complete double-precision arithmetic and conversions; and broadens CI/platform coverage while tightening pow/log/exp special-case handling, simplifying math configuration, and improving tests and benchmarks.
Main AOR directory
Add
- Add cygwin/newlib support for the string routines and tests.
- Update GitHub runners to refresh apt packages before installing dependencies.
Change
- aarch64 assembly now emits .cfi_negate_ra_state for correct unwind info.
- bench: Fix the memset benchmark typo that overflowed the size array.
fp/
Add
- Implement double-precision addition/subtraction, division, and comparisons.
- Add conversions between integers and double precision as well as between single and double precision.
- Replace the stub ui2f test with a real one and add GNU-style comparison helpers.
Change
- Rename the aux subdirectory and register aliases to avoid Windows toolchain issues.
- Apply multiple GNU assembler build fixes across the new fp sources.
math/
Add
- aarch64: Implement AdvSIMD log2p1 and SVE log2p1(f).
- aarch64: Implement AdvSIMD and SVE log10p1(f).
- aarch64: Implement AdvSIMD and SVE rsqrt(f).
- aarch64: Implement AdvSIMD and SVE powr(f) plus shared helpers.
- aarch64: Provide return-by-value AdvSIMD and SVE modf(f) and sincospi(f).
Change
- Remove float_t/double_t usage and drop WANT_SIMD_EXCEPT to simplify configuration.
- aarch64: Rework pow(f) and powr helpers, vectorise special cases, and fix negative or special argument handling.
- aarch64/advsimd: Optimise log/log1p/log2/log10 (float and double).
- aarch64/advsimd: Optimise atanhf, asinhf, acoshf, tan, and sinh.
- aarch64/advsimd: Optimise vectorised special cases for expf, exp2f, exp10f, exp2m1f, exp10m1f via new helpers.
- aarch64/advsimd: Vectorise expm1/expm1f special cases.
- aarch64/experimental: Remove ldexp from expm1(f).
- aarch64/sve: Improve scalar callbacks and special-case buffers.
- aarch64/sve: Vectorise exp/exp2/exp10/expm1 fallbacks.
Full Changelog: v25.07...v26.01
v25.07
This release introduces a new subproject, fp, for basic software-emulated floating-point arithmetic, and public-facing CI through GitHub Actions. It reworks the READMEs entirely and provides numerous new routines in math/ as well as performance and codegen improvements.
Windows is now supported across AoR, and the default config.mk.dist now works for all platforms out of the box.
This release also provides several bug fixes and documentation updates.
AOR
Additions
- Add GitHub CODEOWNERS
- Add GitHub labellers for pull requests
- Add GitHub Actions for build and run
- Add new contributor greetings
Changes
- config: Enable -Werror
- Make READMEs more engaging and user-friendly.
- Make CC and HOST_CC overridable
fp
New subdirectory for basic software-emulated floating-point arithmetic.
Additions
- double-precision multiplication.
- conversions from integers to single precision.
- conversions from single precision to integers.
- single-precision comparisons.
- single-precision division.
- single-precision addition and subtraction.
- add a second version of single-precision multiply.
- add a MAINTAINERS entry for the new subdir.
- Initial commit of some optimized basic FP arithmetic.
Changes
- f2lz, l2f: improved replacement for RSC in Thumb.
- improve checking in test-fcmp.
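To illustrate what "software-emulated" means for the conversions listed above, here is a minimal sketch of an unsigned-integer-to-single-precision conversion using only integer operations. It is not the fp/ implementation (which is written for specific Arm targets); `soft_ui2f` is a hypothetical name, and the sketch assumes IEEE 754 binary32 with round-to-nearest, ties-to-even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the fp/ routine: convert an unsigned 32-bit
   integer to IEEE 754 single precision with integer operations only,
   rounding to nearest with ties to even. */
static float soft_ui2f(uint32_t x)
{
    uint32_t bits = 0;               /* +0.0f when x == 0 */

    if (x != 0) {
        /* Normalise: shift left until the leading 1 sits in bit 31. */
        int shift = 0;
        while ((x & 0x80000000u) == 0) {
            x <<= 1;
            shift++;
        }

        uint32_t mant = x >> 8;      /* top 24 bits, implicit 1 in bit 23 */
        uint32_t rest = x & 0xffu;   /* the 8 bits that do not fit */

        /* Round to nearest, ties to even. */
        if (rest > 0x80u || (rest == 0x80u && (mant & 1u)))
            mant++;

        /* The unbiased exponent is 31 - shift. Biasing with 126 instead
           of 127 lets the implicit bit in mant supply the final +1, and
           a rounding carry out of the mantissa bumps the exponent. */
        bits = ((uint32_t)(31 - shift + 126) << 23) + mant;
    }

    float f;
    memcpy(&f, &bits, sizeof f);     /* reinterpret the bit pattern */
    return f;
}
```

On a host with hardware floating point, the result can be compared against the compiler's own conversion, for example `soft_ui2f(16777217u) == (float)16777217u`.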
math
Additions
- aarch64: Implement AdvSIMD log2p1f.
- aarch64: Implement AdvSIMD and SVE exp10m1(f)
- aarch64: Implement AdvSIMD and SVE exp2m1(f)
- aarch64: Implement AdvSIMD and SVE acospi(f)
- aarch64: Implement AdvSIMD and SVE asinpi(f)
- aarch64: Implement AdvSIMD and SVE atanpi(f)
- aarch64: Implement AdvSIMD and SVE atan2pi(f)
- aarch64: Optimise AdvSIMD and SVE atan(f)
- aarch64/experimental: Fast inaccurate vector expf, powf, sinf, and cosf.
Changes
- Add explicit alignment for math data structures
- aarch64: Rename WANT_TRIGPI to WANT_C23
- aarch64/sve: Optimise fp64 hyperbolics
- aarch64/sve: Optimise coshf, expm1, expf, exp2(f), exp10f, log1p
- aarch64/sve: Improve codegen in log1p helper.
- aarch64/sve: Fixed svld1rq using incorrect predicates
- test: Improve libm and MPFR wrappers.
- test: Fix MPFR wrapper for trigpis.
- test: Fix checks with MPFR as reference
networking
Changes
- test/chksum.c: fix undefined-function error.
string
Additions
- Add support for macOS assembler
Changes
- bench: Avoid overflow in size array
Full Changelog: v25.01...v25.07
v25.01
This release provides numerous performance and codegen improvements in math/ and string/ routines. The directory structure is reorganised and simplified, and the math test framework is reworked to improve testing of math routines. This release also provides several bug fixes and documentation updates.
- Update MAINTAINERS
- Improve subdirectory structure
  - Merge pl/ into math/. All math routines can now be built into a single library, and the new structure reduces code duplication.
  - Upgrade the test infrastructure. Intervals, thresholds and other test parameters are now embedded in source files, processed into files generated at compile-time, and finally passed to the tester script.
  - Move "lower-quality" routines into dedicated experimental/ directories.
- Changes in config and build system
  - Update minimum required versions of GCC (>= 10) and Clang (>= 5), as a consequence of always building SVE on AArch64.
  - Updates in config options, e.g. removing WANT_SIMD_TESTS and WANT_SVE_MATH.
- Changes in math/ subdirectory
  - Provide vector annotations to allow auto-vectorisation with our mathlib, provided that -ffast-math is enabled and mathlib.h is included.
  - Many codegen and performance improvements in vector routines, fixing most regressions that occurred between GCC 13 and 14. These include improved memory access, fewer spills, and better use of the instruction set.
  - Add vector variants for standard and non-standard routines:
    - C99: modf.
    - C23: tanpi.
    - other: sincospi.
  - Fix signature of vector sincos.
  - Fix tests with MPFR.
  - Allow building, testing and benchmarking scalar math routines on macOS and Windows.
- Changes in string/ subdirectory
  - Fix 32-bit Arm build.
  - Improve string benchmarks.
  - Add support for MOPS memcpy/memmove/memset.
  - Improved memset performance.
  - Add new SVE memset implementation.
  - Remove ILP32 support.
- Changes in networking/ subdirectory
  - Fix make install. Library and header might need renaming in the future to be consistent with other components.
Full Changelog: v24.05...v25.01
v24.05 release
- Math routine changes
- Fixed AdvSIMD vector powf and log for the big-endian target.
- Fixed an undefined signed shift in the exp10 code, unlikely to cause problems in practice.
- AdvSIMD pow got minor optimizations.
- Now there is a build option to disable SIMD and exp10 tests to allow testing libcs without those symbols.
- pl/ directory
- Several big-endian fixes and code cleanups.
- This continues to host many math routines with mixed quality.
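As background for the undefined signed shift noted in the math changes above: in ISO C, left-shifting a negative signed integer is undefined behaviour, even though most compilers produce the expected two's-complement result. One common remedy, sketched here, is to shift the unsigned representation instead. This is not the actual exp10 patch; `shift_left_defined` is a hypothetical helper, and the sketch assumes a two's-complement target and a shift count below 64.

```c
#include <stdint.h>

/* Hypothetical helper, not the actual exp10 fix. */
static int64_t shift_left_defined(int64_t k, unsigned n)
{
    /* k << n is undefined when k is negative. Shifting the unsigned
       representation is well defined (modulo 2^64); converting back
       gives the two's-complement value on the assumed targets. */
    return (int64_t)((uint64_t)k << n);
}
```

For instance, `shift_left_defined(-1, 52)` yields `-4503599627370496`, that is, -2^52.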
v24.01 release
- String routine changes
- Added memcpy, memmove, memset for MOPS extension.
- Optimized memcpy by improving code alignment.
- Fixed GNU property note on ILP32.
- Math routine changes
- Vector math code now uses ACLE intrinsics and aarch64 only.
- Vector math code no longer builds scalar and base PCS variants.
- Optimized vector sin and cos.
- Added tgamma128, a binary128 tgammal implementation.
- pl/ directory
- This continues to host many math routines with mixed quality.
v23.01 release
- Project changes
- All files are under a new dual license now (MIT OR Apache-2.0 WITH LLVM-exception at the election of the user).
- Added MAINTAINERS file describing who maintains the subdirectories.
- Added README.contributors files documenting contribution requirements.
- Added new pl/ subdirectory for Arm's Performance Library related routines.
- String routine changes
- Added memset benchmark.
- Improved strlen and memcpy benchmarks.
- Added SVE memcpy.
- Updated arm string functions to support M-profile PACBTI.
- Merged the MTE and generic versions of strcmp, strncmp, strcpy and stpcpy into one implementation.
- Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte, strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
- Math routine changes
- Fixed constants in sinf, cosf and sincosf to be compile time computed even with gcc-12 -frounding-math.
- Fixed an invalid shift in logf.
- Support floating-point exceptions in vector math routines when WANT_SIMD_EXCEPT is set.
v21.02 release
- String routine changes
- Added AArch64 ILP32 ABI support.
- Fixed SVE strnlen return value.
- Added MTE related __mtag_tag_region.
- Added MTE related __mtag_tag_zero_region.
- Minor code cleanups.
v20.11 release
- New math routines
- Scalar erff and erf using fma.
v20.08 release
- Bug fixes
- strcmp-mte nul check
- strncmp-mte with large size
- arm memcpy with large size (CVE-2020-6096)
- String routines performance improvements
- strlen
- memmove with backward copy
- Benchmarking code for strings and memory routines
- strlen
v20.05 release
- New functionality (64-bit Arm)
- string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
- string: Changes to support BTI
- string: New optimized memrchr, strnlen
- Performance improvements (Neoverse N1)
- strchr/strchrnul: 21% improvement on long strings
- strrchr: 11% improvement
- strnlen: 130% improvement on long strings, 50% on short strings
- Benchmark and tests
- string: New memcpy benchmark
- string: Cleanup testsuite and improve test coverage