12 Jan 15:28

blapie

649ccc8

v26.01 Latest


This release introduces AdvSIMD/SVE log, rsqrt, and pow variants; expands fp/ with complete double-precision arithmetic and conversions; and broadens CI/platform coverage while tightening pow/log/exp special-case handling, simplifying math configuration, and improving tests and benchmarks.

Main AOR directory

Add

Add cygwin/newlib support for the string routines and tests.
Update GitHub runners to refresh apt packages before installing dependencies.

Change

aarch64 assembly now emits .cfi_negate_ra_state for correct unwind info.
bench: Fix the memset benchmark typo that overflowed the size array.
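
The benchmark fix above concerns reading past the end of a table of sizes. The exact typo is not reproduced in these notes. As a minimal sketch, assuming a hypothetical `sizes` table and hypothetical helpers `size_at` and `total_bytes` (none of them taken from the AOR sources), one common way to keep such a loop in bounds is to derive the element count from the array itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size table for illustration only; not the table used by
   the AOR memset benchmark.  */
static const size_t sizes[] = { 16, 64, 256, 1024, 4096 };

/* Element count derived from the array, so it cannot drift out of sync
   when entries are added or removed.  */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* Bounds-checked accessor: an out-of-range index trips the assertion
   instead of silently reading past the table.  */
static size_t
size_at (size_t i)
{
  assert (i < ARRAY_COUNT (sizes));
  return sizes[i];
}

/* Sum every entry, stopping exactly at the end of the table.  */
static size_t
total_bytes (void)
{
  size_t total = 0;
  for (size_t i = 0; i < ARRAY_COUNT (sizes); i++)
    total += size_at (i);
  return total;
}
```
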

fp/

Add

Implement double-precision addition/subtraction, division, and comparisons.
Add conversions between integers and double precision as well as between single and double precision.
Replace the stub ui2f test with a real one and add GNU-style comparison helpers.
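
To illustrate what an unsigned-integer-to-single-precision conversion such as ui2f has to do, here is a minimal portable C sketch. It is not the fp/ implementation (the assembler fixes listed in these notes suggest those sources are assembly). The name `ref_ui2f_bits` is hypothetical, and the sketch assumes IEEE 754 binary32 with round-to-nearest-even.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference routine, not part of AOR: return the binary32
   bit pattern of the unsigned 32-bit integer x, rounding to nearest
   with ties to even.  */
static uint32_t
ref_ui2f_bits (uint32_t x)
{
  if (x == 0)
    return 0; /* +0.0f has an all-zero encoding.  */

  /* Locate the most significant set bit; it becomes the implicit
     leading 1 of the significand.  */
  int msb = 31;
  while (!(x >> msb))
    msb--;

  uint32_t exp = 127u + (uint32_t) msb; /* Biased exponent.  */
  uint32_t mant;                        /* 24-bit significand.  */

  if (msb <= 23)
    {
      /* Fits in 24 bits: shift left, conversion is exact.  */
      mant = x << (23 - msb);
    }
  else
    {
      /* Too wide: drop the low bits and round on what was dropped.  */
      int shift = msb - 23;
      uint32_t rem = x & ((1u << shift) - 1);
      uint32_t half = 1u << (shift - 1);
      mant = x >> shift;
      if (rem > half || (rem == half && (mant & 1)))
        mant++; /* May carry out to 2^24; handled below.  */
    }

  assert (mant >= (1u << 23) && mant <= (1u << 24));

  /* Adding (rather than OR-ing) the fraction lets a rounding carry
     increment the exponent field.  */
  return (exp << 23) + (mant - (1u << 23));
}
```

For example, 16777217 (2^24 + 1) lies exactly halfway between two representable values and rounds to the even one, 2^24, while 0xFFFFFFFF rounds up to 2^32.
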

Change

Rename the aux subdirectory and register aliases to avoid Windows toolchain issues.
Apply multiple GNU assembler build fixes across the new fp sources.

math/

Add

aarch64: Implement AdvSIMD log2p1 and SVE log2p1(f).
aarch64: Implement AdvSIMD and SVE log10p1(f).
aarch64: Implement AdvSIMD and SVE rsqrt(f).
aarch64: Implement AdvSIMD and SVE powr(f) plus shared helpers.
aarch64: Provide return-by-value AdvSIMD and SVE modf(f) and sincospi(f).
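
The return-by-value modf variants above contrast with C99 `modf`, which hands the integral part back through a pointer. The vector signatures are not spelled out in these notes, so the following is only a scalar sketch of the idea. `modf_result` and `modf_by_value` are hypothetical names, not AOR symbols.

```c
#include <math.h>

/* Hypothetical result type: both parts travel together by value.  */
struct modf_result
{
  double frac;     /* Fractional part, same sign as the input.  */
  double integral; /* Integral part, same sign as the input.  */
};

/* Wrap the standard pointer-based modf in a by-value interface.  */
static struct modf_result
modf_by_value (double x)
{
  struct modf_result r;
  r.frac = modf (x, &r.integral);
  return r;
}
```

A by-value interface avoids the store and reload through memory that an output pointer implies, which is one plausible motivation for offering it alongside the pointer-based form.
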

Change

Remove float_t/double_t usage and drop WANT_SIMD_EXCEPT to simplify configuration.
aarch64: Rework pow(f) and powr helpers, vectorise special cases, and fix negative or special argument handling.
aarch64/advsimd: Optimise log/log1p/log2/log10 (float and double).
aarch64/advsimd: Optimise atanhf, asinhf, acoshf, tan, and sinh.
aarch64/advsimd: Optimise vectorised special cases for expf, exp2f, exp10f, exp2m1f, exp10m1f via new helpers.
aarch64/advsimd: Vectorise expm1/expm1f special cases.
aarch64/experimental: Remove ldexp from expm1(f).
aarch64/sve: Improve scalar callbacks and special-case buffers.
aarch64/sve: Vectorise exp/exp2/exp10/expm1 fallbacks.

Full Changelog: v25.07...v26.01


14 Jul 15:10

blapie

v25.07

3bc8091

v25.07

This release introduces fp, a new subproject for basic software-emulated
floating-point arithmetic, and public-facing CI through GitHub Actions.
It reworks the READMEs entirely and adds numerous new routines in math/,
along with performance and codegen improvements.
Windows is now supported across AOR, and the default config.mk.dist now works
on all platforms out of the box.
The release also includes several bug fixes and documentation updates.

AOR

Additions

Add GitHub CODEOWNERS
Add GitHub labellers for pull requests
Add GitHub Actions for build and run
Add new contributor greetings

Changes

config: Enable -Werror
Make READMEs more engaging and user-friendly.
Make CC and HOST_CC overridable

fp

New subdirectory for basic software-emulated floating-point arithmetic.

Additions

Double-precision multiplication.
Conversions from integers to single precision.
Conversions from single precision to integers.
Single-precision comparisons.
Single-precision division.
Single-precision addition and subtraction.
Add a second version of single-precision multiply.
Add a MAINTAINERS entry for the new subdir.
Initial commit of some optimized basic FP arithmetic.

Changes

f2lz, l2f: improved replacement for RSC in Thumb.
improve checking in test-fcmp.

math

Additions

aarch64: Implement AdvSIMD log2p1f.
aarch64: Implement AdvSIMD and SVE exp10m1(f)
aarch64: Implement AdvSIMD and SVE exp2m1(f)
aarch64: Implement AdvSIMD and SVE acospi(f)
aarch64: Implement AdvSIMD and SVE asinpi(f)
aarch64: Implement AdvSIMD and SVE atanpi(f)
aarch64: Implement AdvSIMD and SVE atan2pi(f)
aarch64: Optimise AdvSIMD and SVE atan(f)
aarch64/experimental: Fast inaccurate vector expf, powf, sinf, and cosf.

Changes

Add explicit alignment for math data structures
aarch64: Rename WANT_TRIGPI to WANT_C23
aarch64/sve: Optimise fp64 hyperbolics
aarch64/sve: Optimise coshf, expm1, expf, exp2(f), exp10f, log1p
aarch64/sve: Improve codegen in log1p helper.
aarch64/sve: Fix svld1rq using incorrect predicates
test: Improve libm and MPFR wrappers.
test: Fix MPFR wrapper for trigpis.
test: Fix checks with MPFR as reference

networking

Changes

test/chksum.c: fix undefined-function error.

string

Additions

Add support for the macOS assembler

Changes

bench: Avoid overflow in size array

Full Changelog: v25.01...v25.07


09 Jan 15:57

blapie

v25.01

3752b98

v25.01

This release provides numerous performance and codegen improvements in the math/ and string/ routines. The directory structure is reorganised and simplified, and the math test framework is reworked to improve testing of the math routines. The release also includes several bug fixes and documentation updates.

Update MAINTAINERS
Improve subdirectory structure
- Merge pl/ into math/. All math routines can now be built into a
  single library, and the new structure reduces code duplication.
- Upgrade the test infrastructure. Intervals, thresholds and other
  test parameters are now embedded in source files, processed into
  files generated at compile-time, and finally passed to the tester
  script.
- Move "lower-quality" routines into dedicated experimental/
  directories.
Changes in config and build system
- Update minimum required versions of GCC (>= 10) and Clang (>= 5), as
  a consequence of always building SVE on AArch64.
- Update config options, e.g. removing WANT_SIMD_TESTS and
  WANT_SVE_MATH.
Changes in math/ subdirectory
- Provide vector annotations so that compilers can auto-vectorise calls
  into our mathlib, provided that -ffast-math is enabled and mathlib.h is
  included.
- Many codegen and performance improvements in vector routines, fixing
  most regressions that occurred between GCC 13 and 14. These include
  better memory access, fewer spills, and better use of the
  instruction set.
- Add vector variants for standard and non-standard routines:
  - C99: modf.
  - C23: tanpi.
  - other: sincospi.
- Fix signature of vector sincos.
- Fix tests with MPFR.
- Allow building, testing and benchmarking scalar math routines on
  macOS and Windows.
Changes in string/ subdirectory
- Fix 32-bit Arm build.
- Improve string benchmarks.
- Add support for MOPS memcpy/memmove/memset.
- Improve memset performance.
- Add new SVE memset implementation.
- Remove ILP32 support.
Changes in networking/ subdirectory
- Fix make install. Library and header might need renaming in the
  future to be consistent with other components.

Full Changelog: v24.05...v25.01


23 May 11:25

nsz-arm

v24.05

90f7e62

v24.05 release

Math routine changes
- Fixed AdvSIMD vector powf and log for the big-endian target.
- Fixed an undefined signed shift in the exp10 code, unlikely
  to cause problems in practice.
- AdvSIMD pow got minor optimizations.
- Now there is a build option to disable SIMD and exp10 tests
  to allow testing libcs without those symbols.
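
For context on the signed-shift fix listed above: in C, left-shifting a negative signed value, or shifting so that the result does not fit the signed type, is undefined behaviour, even though most compilers produce the expected bit pattern. The exp10 source is not shown in these notes, so the sketch below only illustrates the usual remedy of performing the shift on an unsigned type. `exponent_bits` is a hypothetical helper, not an AOR function.

```c
#include <stdint.h>

/* Move a (possibly negative) scale k into bits 52 and above, as one
   might when assembling the exponent field of an IEEE double.  */
static uint64_t
exponent_bits (int64_t k)
{
  /* Writing (k << 52) would be undefined for negative k.  Converting to
     uint64_t first is well defined (the value wraps modulo 2^64), and
     the unsigned shift then simply discards the high bits.  */
  return (uint64_t) k << 52;
}
```
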
pl/ directory
- Several big-endian fixes and code cleanups.
- This continues to host many math routines with mixed quality.


12 Jan 13:18

nsz-arm

v24.01

864fb5e

v24.01 release

String routine changes
- Added memcpy, memmove, memset for MOPS extension.
- Optimized memcpy by improving code alignment.
- Fixed GNU property note on ILP32.
Math routine changes
- Vector math code now uses ACLE intrinsics and supports AArch64 only.
- Vector math code no longer builds scalar and base PCS variants.
- Optimized vector sin and cos.
- Added tgamma128, a binary128 tgammal implementation.
pl/ directory
- This continues to host many math routines with mixed quality.


25 Jan 12:34

nsz-arm

v23.01

56e3bf0

v23.01 release

Project changes
- All files are under a new dual license now (MIT OR Apache-2.0 WITH LLVM-exception at the election of the user).
- Added MAINTAINERS file describing who maintains the subdirectories.
- Added README.contributors files documenting contribution requirements.
- Added new pl/ subdirectory for Arm's Performance Library related routines.
String routine changes
- Added memset benchmark.
- Improved strlen and memcpy benchmarks.
- Added SVE memcpy.
- Updated arm string functions to support M-profile PACBTI.
- Merged the MTE and generic versions of strcmp, strncmp, strcpy and stpcpy into one implementation.
- Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte, strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
Math routine changes
- Fixed constants in sinf, cosf and sincosf to be compile time computed even with gcc-12 -frounding-math.
- Fixed an invalid shift in logf.
- Support floating-point exceptions in vector math routines when WANT_SIMD_EXCEPT is set.


18 Feb 14:31

nsz-arm

v21.02

6798b50

v21.02 release

String routine changes
- Added AArch64 ILP32 ABI support.
- Fixed SVE strnlen return value.
- Added MTE related __mtag_tag_region.
- Added MTE related __mtag_tag_zero_region.
- Minor code cleanups.


16 Nov 13:20

nsz-arm

v20.11

58af293

v20.11 release

New math routines
- Scalar erff and erf using fma.


14 Aug 12:49

nsz-arm

v20.08

0f4ae0c

v20.08 release

Bug fixes
- strcmp-mte nul check
- strncmp-mte with large size
- arm memcpy with large size (CVE-2020-6096)
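
CVE-2020-6096 was reported as a signed-comparison problem in a 32-bit Arm memcpy, where a length with the top bit set was treated as negative. The routine itself is assembly and is not reproduced here. The C sketch below, with hypothetical helper names and a hypothetical 64-byte threshold, only shows why the signedness of a length comparison matters for large sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Problematic pattern: the length is compared as a signed value.
   On the usual two's complement targets, 0x80000000 converts to a
   negative number and is wrongly classified as a short copy.  */
static bool
is_short_copy_signed (uint32_t n)
{
  return (int32_t) n < 64;
}

/* Safer pattern: compare the length as unsigned, so that very large
   sizes are never mistaken for small ones.  */
static bool
is_short_copy_unsigned (uint32_t n)
{
  return n < 64u;
}
```

Both helpers agree for ordinary lengths and diverge only once the top bit of the length is set.
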
String routines performance improvements
- strlen
- memmove with backward copy
Benchmarking code for strings and memory routines
- strlen


29 May 13:28

nsz-arm

v20.05

ef907c7

v20.05 release

New functionality (64-bit Arm)
- string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
- string: Changes to support BTI
- string: New optimized memrchr, strnlen
Performance improvements (Neoverse N1)
- strchr/strchrnul: 21% improvement on long strings
- strrchr: 11% improvement
- strnlen: 130% improvement on long strings, 50% on short strings
Benchmark and tests
- string: New memcpy benchmark
- string: Cleanup testsuite and improve test coverage


Releases: ARM-software/optimized-routines
