Skip to content

Optimize Fw::StringUtils::string_length with SWAR algorithm#4789

Open
vsoulgard wants to merge 3 commits intonasa:develfrom
vsoulgard:fw-types-string-length-swar
Open

Optimize Fw::StringUtils::string_length with SWAR algorithm#4789
vsoulgard wants to merge 3 commits intonasa:develfrom
vsoulgard:fw-types-string-length-swar

Conversation

@vsoulgard
Copy link
Contributor

Related Issue(s) #4788
Has Unit Tests (y/n) y
Documentation Included (y/n) n
Generative AI was used in this contribution (y/n) y

Change Description

This PR implements the performance optimization proposed in #4788. It replaces the naive byte-by-byte string length calculation with a SWAR (SIMD Within A Register) approach using word-sized and bitwise NUL char detection.

Details:

  • Processes the initial unaligned head byte-by-byte to ensure word-alignment before the main loop.
  • Uses standard bitwise operations to detect NUL char in parallel within a register.
  • For short strings (word-sized or smaller) a simple byte-by-byte loop is used to avoid overhead.
  • Added macro NO_ASAN to suppress false positives from AddressSanitizer for trusted functions.
  • Added new edge test cases and alignment assert.

Rationale

SWAR reduces the number of loop iterations and memory load instructions, thus improving performance. This technique is widely used in high-performance standard libraries.

Benchmarks:

(Tested on Linux x86_64 with GCC 14.2.0)

Benchmark Results (aligned) benchmark32 benchmark1024

New version is slightly slower for very short strings, but a lot faster for long ones:
N=2 - 1.28 vs. 2.01 (ns)
N=4 - 1.76 vs. 2.27 (ns)
N=8 - 3.06 vs. 3.01 (ns)
N=16 - 6.05 vs. 3.43 (ns)
N=32 - 19.30 vs. 4.31 (ns)
N=64 - 27.97 vs. 6.37 (ns)
N=128 - 53.45 vs. 10.36 (ns) (x5 faster)
N=256 - 100.27 vs. 22.29 (ns)
N=512 - 191.22 vs. 42.47 (ns)

Benchmark Results (unaligned) benchmark32_unaligned benchmark1024_unaligned

Unaligned calculations are slower than aligned ones, because of head byte-by-byte cycle, but still faster than old version:
N=2 - 1.33 vs. 1.91 (ns)
N=4 - 2.03 vs. 2.26 (ns)
N=8 - 3.55 vs. 3.02 (ns)
N=16 - 6.65 vs. 5.18 (ns)
N=32 - 17.12 vs. 6.10 (ns)
N=64 - 30.34 vs. 8.44 (ns)
N=128 - 56.18 vs. 12.60 (ns)
N=256 - 103.61 vs. 24.40 (ns)
N=512 - 199.45 vs. 44.11 (ns)

Testing/Review Recommendations

Benchmark Source
static void BM_string_length(benchmark::State& state)
{
    const size_t LEN = state.range(0);
    std::string src_str = std::string(LEN, 'A');
    const char* src = src_str.c_str();

    for (auto _ : state)
    {
        auto result = Fw::StringUtils::string_length(src, LEN);
        benchmark::DoNotOptimize(result);
    }
}

static void BM_string_lengthV2(benchmark::State& state)
{
    const size_t LEN = state.range(0);
    std::string src_str = std::string(LEN, 'A');
    const char* src = src_str.c_str();

    for (auto _ : state)
    {
        auto result = Fw::StringUtils::string_lengthV2(src, LEN);
        benchmark::DoNotOptimize(result);
    }
}

BENCHMARK(BM_string_length)->RangeMultiplier(2)->Range(0, 1024);
BENCHMARK(BM_string_lengthV2)->RangeMultiplier(2)->Range(0, 1024);

BENCHMARK_MAIN();

AI Usage

AI was used for research and testing of ideas. All code implementation, algorithm design, benchmarking, and technical decisions were done by me.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Fw::StringUtils::string_length to use a SWAR (word-at-a-time) NUL detection approach to reduce iterations and improve performance for longer strings, and adds unit tests intended to cover additional edge cases.

Changes:

  • Replaced byte-by-byte string_length with an aligned, word-at-a-time SWAR implementation (with a short-string fast path).
  • Introduced a NO_ASAN macro and applied it to string_length.
  • Added new unit tests for empty strings, various lengths, large strings, and unaligned pointers.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
Fw/Types/StringUtils.cpp Implements the SWAR-based string_length and applies NO_ASAN.
Fw/Types/Assert.hpp Adds the NO_ASAN macro definition (compiler-attribute based).
Fw/Types/test/ut/TypesTest.cpp Adds new unit tests for string_length edge cases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +26 to +33
NO_ASAN
FwSizeType Fw::StringUtils::string_length(const CHAR* source, FwSizeType buffer_size) {
FwSizeType length = 0;
FW_ASSERT(source != nullptr);
for (length = 0; length < buffer_size; length++) {

// Fast path for short strings - avoids alignment overhead
if (buffer_size <= sizeof(std::size_t))
{
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string_length is annotated with NO_ASAN, which will suppress AddressSanitizer reports for this function. Since this implementation bounds all reads to buffer_size, ASan findings here are likely to indicate a real caller bug (e.g., passing an incorrect buffer_size) rather than a false positive; disabling ASan could hide those issues. Consider removing NO_ASAN or limiting it to a narrowly-scoped build/config where intentional over-reads actually occur (and document why).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct statement. But as far as i know even if the passed buffer_size is incorrect, the over-read will be safe, since aligned reading do not cross memory page boundaries.

The reason I added NO_ASAN macro is because an incorrect buffer_size passed in Fw::StringUtils::string_copy, which causes ASan finding when running unit tests:

char* Fw::StringUtils::string_copy(char* destination, const char* source, FwSizeType num) {
    // Handle self-copy and 0 bytes copy
    if (destination == source || num == 0) {
        return destination;
    }
    FW_ASSERT(source != nullptr);
    FW_ASSERT(destination != nullptr);

    // Copying an overlapping range is undefined
    FwSizeType source_len = string_length(source, num) + 1; // <-- HERE
    FW_ASSERT(source + source_len <= destination || destination + num <= source);

    char* returned = strncpy(destination, source, static_cast<size_t>(num));
    destination[num - 1] = '\0';
    return returned;
}

destination length (num) passed as source length into Fw::StringUtils::string_length, which causes ASan finding in cases where destination length (num) is greater than the length of source buffer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the sanitizer is finding a bug in the unit test code here, and we should fix the unit test, rather than disabling the sanitizer. If I understand what is going on, num should never be greater than the minimum of the sizes of the buffers pointed to by destination and source. The unit test code seems to violate this assumption. In general, if we allow this assumption to be violated by disabling the sanitizer, then we are opening the door to a buffer overrun.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, disabling the sanitizer for an entire function is something we should avoid -- especially for a function like this that does low-level memory manipulation.

Copy link
Collaborator

@bocchino bocchino Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, what if someone calls this function with an actual buffer size of 100 for destination and 1000 for source, and provides 1000 as num? We definitely don't want to disable sanitation in that case. I agree with Copilot here:

Since this implementation bounds all reads to buffer_size, ASan findings here are likely to indicate a real caller bug (e.g., passing an incorrect buffer_size) rather than a false positive; disabling ASan could hide those issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! You are right, ASan only triggers if we pass buffer_length greater than the actual buffer size. But that happens not only in Fw::StringUtils::string_copy, but also in StringBase::operator+=:

const CHAR* StringBase::operator+=(const CHAR* src) {
this->appendBuff(src, static_cast<SizeType>(StringUtils::string_length(src, this->getCapacity())));
return this->toChar();
}

It worked well with the original byte-by-byte comparison, but SWAR implementation causes an over-read here (optimized strnlen implementations behave similarly). To avoid breaking things, I suggest sticking with the original version, but adding SWAR implementation as second high-performance (for long strings) option for cases where buffer length is known in advance, if you'd be interested in that approach.
Like:

FwSizeType Fw::StringUtils::string_length_swar(const CHAR* source, FwSizeType max_buffer_size) 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. It seems that the SWAR approach is pushing on an issue that has always been in the framework. In the case of the operator+= code cited above, the operation is not safe against a buffer overrun in all cases (even without SWAR), because (1) no buffer size is provided for src, and (2) there is no guarantee that the string pointed to by src is null terminated.

The F Prime team will need to decide how to proceed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants