
cut: fix -s flag for newline delimiter and optimize memory allocation #11143

Merged
cakebaker merged 1 commit into uutils:main from akervald:fix-cut-newline-s-flag
Mar 4, 2026

Conversation

@akervald
Contributor

@akervald akervald commented Feb 27, 2026

Fixes & Improvements

  • -s logic fix: Add the missing only_delimited check to properly suppress non-delimited lines.
  • Field-Level Streaming: Replace whole-file split().collect() with a memchr-powered loop. This shifts memory complexity from O(Total File Size) to O(Max Field Size) - as "OOM-safe" as the specification allows.
  • Zero-Allocation Skipping: Bypass unselected fields using BufReader::consume() to avoid heap copies.
  • Sequential Pointer Tracking: Replace nested loops and segments.get() lookups with a single-pass range_idx pointer that synchronizes "Skip" and "Keep" paths in one linear sweep.
  • Early Exit: Terminate I/O immediately once the highest requested field is processed.
  • Edge Case Support: Correctly handle single lines lacking a trailing newline.
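The points above can be illustrated with a minimal sketch. It is not the code merged in this PR: the function name `cut_newline_fields` and the `wanted` parameter are hypothetical, the selected fields are assumed to be 1-based, sorted, and free of duplicates, and the delimiter search uses the standard library's `Iterator::position` instead of the `memchr` crate so the example has no dependencies. The treatment of a trailing newline as a terminator, and of `-s` as "print nothing when the input contains no newline at all", are assumptions of the sketch rather than a statement of GNU behavior.

```rust
use std::io::{self, BufRead, Write};

/// Streams newline-delimited fields from `input`, writing the selected ones to `out`.
/// `wanted` (hypothetical parameter): 1-based field numbers, sorted and deduplicated.
pub fn cut_newline_fields<R: BufRead, W: Write>(
    mut input: R,
    mut out: W,
    wanted: &[usize],
    only_delimited: bool,
) -> io::Result<()> {
    let mut field_no = 1usize; // 1-based number of the field being read
    let mut idx = 0usize; // single forward-moving pointer into `wanted`
    let mut field = Vec::new(); // reused buffer, holds at most one field

    // Early exit: the loop ends as soon as the last wanted field is written.
    while idx < wanted.len() {
        let keep = wanted[idx] == field_no;
        // Field 1 is always buffered, because until a delimiter shows up
        // we cannot tell whether the input is non-delimited.
        let buffer_it = keep || field_no == 1;
        field.clear();
        let mut saw_any = false;
        let mut hit_delim = false;

        // Read one field; it may span several refills of the reader's buffer.
        loop {
            let buf = input.fill_buf()?;
            if buf.is_empty() {
                break; // end of input
            }
            saw_any = true;
            let (payload, used) = match buf.iter().position(|&b| b == b'\n') {
                Some(pos) => {
                    hit_delim = true;
                    (pos, pos + 1)
                }
                None => (buf.len(), buf.len()),
            };
            if buffer_it {
                field.extend_from_slice(&buf[..payload]);
            }
            // Skipped fields are dropped here without being copied.
            input.consume(used);
            if hit_delim {
                break;
            }
        }

        if !saw_any {
            break; // no further fields
        }
        if field_no == 1 && !hit_delim {
            // The whole input has no delimiter: print it unless -s was given.
            if !only_delimited {
                out.write_all(&field)?;
                out.write_all(b"\n")?;
            }
            break;
        }
        if keep {
            out.write_all(&field)?;
            out.write_all(b"\n")?;
            idx += 1;
        }
        field_no += 1;
    }
    Ok(())
}
```

Calling `cut_newline_fields(&b"a\nb\nc\nd\n"[..], &mut out, &[2, 4], true)` with `out` as a `Vec<u8>` writes `b` and `d`, each followed by a newline. Peak memory is bounded by the largest buffered field, and any input after the last wanted field is never read.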

Benchmarks

10,000,000 records (seq 1 10000000 > bench_input.txt), base M1 Pro chip.

Case 1: Filtered Selection with Early Exit (-s -d $'\n' -f 2,1024,4096)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| gcut | 330.8 ± 5.6 | 326.5 | 341.5 | 169.60 ± 30.28 |
| ./cut_old | 437.1 ± 9.1 | 430.7 | 456.6 | 224.09 ± 40.10 |
| ./cut_new | 2.0 ± 0.3 | 1.3 | 3.3 | 1.00 |

Result: ~224x faster than cut_old, ~170x faster than GNU cut.

Case 2: Full File Read / Base Throughput (-s -d $'\n' -f 1-10000000)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| gcut | 676.7 ± 5.1 | 673.7 | 689.3 | 3.96 ± 0.13 |
| ./cut_old | 527.0 ± 16.8 | 517.9 | 573.0 | 3.08 ± 0.14 |
| ./cut_new | 171.1 ± 5.5 | 168.9 | 192.1 | 1.00 |

Result: ~3x faster than cut_old, ~4x faster than GNU cut.

References

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from f82da8d to cb8cddd Compare February 27, 2026 15:56
@codspeed-hq

codspeed-hq bot commented Feb 27, 2026

Merging this PR will improve performance by 45.19%

⚡ 1 improved benchmark
✅ 301 untouched benchmarks
🆕 2 new benchmarks
⏩ 42 skipped benchmarks [1]

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | cut_fields_newline_delim | N/A | 189.8 µs | N/A |
| 🆕 | Memory | cut_fields_newline_delim | N/A | 67.8 KB | N/A |
| | Memory | cut_fields_custom_delim | 67.8 KB | 46.7 KB | +45.19% |
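
As a reading aid, the reported efficiency figure appears consistent with the ratio of BASE to HEAD minus one, not with the fraction of memory saved. This is an inference from the rounded values in the row above, not a documented CodSpeed formula:

```latex
\frac{\text{BASE}}{\text{HEAD}} - 1 = \frac{67.8}{46.7} - 1 \approx 0.452
\qquad\text{whereas}\qquad
1 - \frac{\text{HEAD}}{\text{BASE}} = 1 - \frac{46.7}{67.8} \approx 0.311
```

Under that reading, peak memory for cut_fields_custom_delim drops by roughly 31%, which the report expresses as a gain of about 45%.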

Comparing akervald:fix-cut-newline-s-flag (fe4e36b) with main (f335d14)


Footnotes

  1. 42 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@akervald akervald closed this Feb 27, 2026
@akervald akervald reopened this Feb 27, 2026
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/unexpand/bounded-memory is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch from 10358f4 to 34eee51 Compare February 27, 2026 17:14
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch from 34eee51 to 1a2b3d4 Compare February 28, 2026 09:35
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/follow-name (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/dd/no-allocate is now being skipped but was previously passing.
Note: The gnu test tests/pr/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

@akervald akervald marked this pull request as draft February 28, 2026 10:36
@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from 95bad10 to 76b271e Compare February 28, 2026 11:57
@akervald akervald marked this pull request as ready for review February 28, 2026 11:58
@akervald akervald requested a review from cakebaker February 28, 2026 12:03
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/follow-name (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cut/cut-huge-range is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

@akervald akervald force-pushed the fix-cut-newline-s-flag branch from 76b271e to a5a3666 Compare February 28, 2026 14:11
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/timeout/timeout-group is no longer failing!
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from 4821938 to 69c5cfe Compare February 28, 2026 16:45
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Note: The gnu test tests/tail/pipe-f is now being skipped but was previously passing.
Congrats! The gnu test tests/cp/link-heap is now passing!
Congrats! The gnu test tests/seq/seq-epipe is now passing!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 3 times, most recently from 0f4f795 to fc991a1 Compare February 28, 2026 17:42
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/tail/retry is no longer failing!

@akervald
Contributor Author

Hi @cakebaker, the tests passed, but the benchmark failed due to an infrastructure issue. Could you please re-run that job? Thanks!

@akervald
Contributor Author

akervald commented Feb 28, 2026

@sylvestre I noticed Attempt №3 was cancelled. Since I don't have permissions to trigger the CI/CD jobs myself, could you let me know if there’s a specific fix I need to make, or if you could re-run the checks when the environment is ready? Thanks!

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from 2c55e4a to 261a1fd Compare March 3, 2026 10:36
@github-actions

github-actions bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

@akervald akervald force-pushed the fix-cut-newline-s-flag branch from d9727d9 to f70f671 Compare March 3, 2026 12:24
@github-actions

github-actions bot commented Mar 3, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)

@akervald
Contributor Author

akervald commented Mar 3, 2026

@cakebaker should be ready for a review/merge

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from 1719399 to f7a0636 Compare March 3, 2026 15:18
@github-actions

github-actions bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 2 times, most recently from a21f3c1 to 9213a87 Compare March 3, 2026 15:31
@github-actions

github-actions bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

@akervald akervald force-pushed the fix-cut-newline-s-flag branch 4 times, most recently from deea6e9 to 2cabf8e Compare March 3, 2026 17:19
@github-actions

github-actions bot commented Mar 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tail/retry is no longer failing!
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

- Fixed the -s flag incorrectly suppressing output when the delimiter is a newline.
- Improved performance in cut_fields_newline_char_delim.
- Updated tests to match GNU cut behavior for newline delimiters.
@akervald akervald force-pushed the fix-cut-newline-s-flag branch from 2cabf8e to fe4e36b Compare March 4, 2026 07:22
@github-actions

github-actions bot commented Mar 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.

@akervald akervald requested a review from cakebaker March 4, 2026 08:30
@cakebaker cakebaker merged commit 9bbb58b into uutils:main Mar 4, 2026
163 checks passed
@cakebaker
Contributor

Thanks for your PR!


Development

Successfully merging this pull request may close these issues.

cut: incorrect delimiter handling when delimiter is newline

2 participants