split: preserve non-UTF-8 bytes in output filename generation #11397

Open
can1357 wants to merge 1 commit into uutils:main from can1357:split-preserve-non-utf8-bytes-in-output-filename-generation

can1357 commented Mar 18, 2026

uutils split accepts non-UTF-8 prefix and suffix inputs but converts them with to_string_lossy() when building chunk filenames. GNU keeps pathname bytes intact, while uutils rewrites invalid bytes to UTF-8 replacement characters.

Reproduction Steps

d=$(mktemp -d); p=$(printf "p\377"); printf "AB" | split -b1 - "$d/$p"; ls "$d" | od -An -tx1
# Expected (GNU): 70 ff 61 61 0a 70 ff 61 62 0a
# Actual (uutils): 70 ef bf bd 61 61 0a 70 ef bf bd 61 62 0a

Impact

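Chunk files are created under rewritten names instead of the requested byte paths. This breaks GNU compatibility in non-UTF-8 environments. Because every invalid byte is rewritten to the same replacement character, distinct prefixes can map to the same name, which can cause filename collisions or misdirected output files.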
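
For illustration, the difference between the two behaviours can be sketched in a few lines of Rust. This is a minimal, Unix-only sketch and not the code in this PR: `chunk_name_lossy` and `chunk_name_bytes` are hypothetical helper names, and the suffix is assumed to be plain ASCII such as `aa`.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper showing the lossy approach: the prefix goes through
/// `to_string_lossy()`, so every invalid UTF-8 sequence becomes U+FFFD
/// (encoded as the bytes ef bf bd).
fn chunk_name_lossy(prefix: &OsStr, suffix: &str) -> OsString {
    let mut name = prefix.to_string_lossy().into_owned();
    name.push_str(suffix);
    OsString::from(name)
}

/// Hypothetical helper showing the byte-preserving approach: the prefix bytes
/// are copied unchanged and the suffix is appended to them.
fn chunk_name_bytes(prefix: &OsStr, suffix: &str) -> OsString {
    let mut bytes = prefix.as_bytes().to_vec();
    bytes.extend_from_slice(suffix.as_bytes());
    OsString::from_vec(bytes)
}
```

With the prefix bytes `70 ff` from the reproduction and the suffix `aa`, `chunk_name_bytes` yields `70 ff 61 61`, while `chunk_name_lossy` yields `70 ef bf bd 61 61`. The lossy helper also maps the prefixes `70 fe` and `70 ff` to the same name, which is the collision risk described above.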

codspeed-hq bot commented Mar 19, 2026

Merging this PR will improve performance by 6.79%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 1 improved benchmark
✅ 297 untouched benchmarks
⏩ 48 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | split_bytes | 429.8 µs | 402.5 µs | +6.79% |

Comparing can1357:split-preserve-non-utf8-bytes-in-output-filename-generation (d5421ab) with main (ef8e45c)

Footnotes

  1. 48 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.