[UNDERTOW-2655] Fix text corruption in FileUtils.readFile when reading multi-byte characters #1834
fl4via left a comment:
Hi @finalchild! Thanks for your PR. Can you please create a test for the fix?
3e2adaf to 28f3eeb
@fl4via
28f3eeb to b12583b
4120ee7 to 8c5b88b
Hello @finalchild! We are in the middle of a big task: backporting PRs across all the branches (and between undertow and undertow-ee), so that fixes that were added to only some of them (mainly 2.2.x and 2.3.x) land everywhere. There are also PRs in 2.4.x that were not present in main, so I'm creating PRs for those. Once that task is done, we will release 2.4.0.Beta1 with the same fixes that went into 2.3.23.Final and 2.2.39.Final. Doing it this way keeps the releases consistent in terms of the fixes each of them contains. As soon as this is done, we will review and merge all the withheld PRs, including yours. I know we have a large number of PRs in the queue right now; I kindly ask you to bear with me a little longer, and then the fixes will be fully processed. After that, 2.3.24.Final, 2.2.40.Final and 2.4.0.Beta2 will follow, containing all the great work from community contributors and project maintainers. That way, your fix will be available to all Undertow users. The time frame depends on how fast I can move through the PRs, but I hope it won't take long. Thank you for your great contribution!
Hi @finalchild! Again, thank you for your PR. We are finally at the point where we will merge and backport it. I appreciate your waiting this long, and I hope it proves worthwhile once Undertow 2.4.0.Final is released with all the pending fixes in the PR queue.
…g multi-byte characters

The readFile method was reading the InputStream into a fixed-size byte buffer and decoding each chunk independently. This caused multi-byte UTF-8 character sequences to be split across buffer boundaries, resulting in text corruption with replacement characters.

Replaced BufferedInputStream with InputStreamReader to handle buffering and character decoding together in a streaming fashion, ensuring multi-byte character sequences are never split.

This issue became more significant after UNDERTOW-2337, as large form-data field values are now processed by this function.

Originally reported in Spring Framework issue #35292.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
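To make the failure mode concrete, here is a minimal, self-contained sketch (assuming Java 11+ for `String.repeat`) that mimics the chunk-by-chunk decoding described above. `ChunkedDecodeDemo` and `readChunked` are hypothetical names, and the loop approximates the old behavior rather than reproducing the original Undertow source. The character U+D55C ("한") encodes to 3 bytes in UTF-8, and 1024 = 3 * 341 + 1, so the first 1024-byte chunk ends one byte into a character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeDemo {

    // Hypothetical helper approximating the old approach: read raw bytes into a
    // fixed-size buffer and decode every chunk on its own.
    static String readChunked(InputStream in, int bufferSize) throws IOException {
        byte[] buff = new byte[bufferSize];
        StringBuilder builder = new StringBuilder();
        int read;
        while ((read = in.read(buff)) != -1) {
            // Each chunk is decoded independently, so a multi-byte sequence cut
            // at the end of the buffer is replaced with U+FFFD instead of being
            // completed by the bytes at the start of the next chunk.
            builder.append(new String(buff, 0, read, StandardCharsets.UTF_8));
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // U+D55C is 3 bytes in UTF-8; 400 of them make 1200 bytes, so one
        // character must straddle the first 1024-byte boundary.
        String original = "\uD55C".repeat(400);
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readChunked(new ByteArrayInputStream(bytes), 1024);
        System.out.println(original.equals(decoded));       // false
        System.out.println(decoded.indexOf('\uFFFD') >= 0); // true
    }
}
```

Pure ASCII input never triggers the problem because every character is one byte, which is why the corruption only shows up with multi-byte text larger than the buffer.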
Summary
Fixes text corruption in `FileUtils.readFile` when reading multi-byte UTF-8 characters.

Problem: The original implementation read the `InputStream` into a fixed-size byte buffer (1024 bytes) and decoded each chunk independently. When a multi-byte character sequence was split across a buffer boundary, the decoder received incomplete character data, resulting in replacement characters (�) in the final string.

Solution: Replaced `BufferedInputStream` with `InputStreamReader`, which handles buffering and character decoding together in a streaming fashion. The reader carries an incomplete byte sequence over to the next read instead of decoding it on its own, so multi-byte characters are never decoded in halves.

Note: The implementation is copied from Java 25's `InputStreamReader#readAllAsString`.

This issue became more significant after the fix for UNDERTOW-2337, as large form-data field values are now processed by this affected function. Originally reported in Spring Framework issue #35292.
Issue: UNDERTOW-2655
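For contrast, here is a minimal sketch of the reader-based approach the PR describes: wrap the stream in an `InputStreamReader`, read into a `char` buffer, and append. The decoder inside the reader keeps any incomplete byte sequence from one read and completes it with the next, so no character is decoded in halves. This is an illustration under those assumptions, not the merged `FileUtils.readFile` code or the JDK source; `StreamingDecodeDemo` and `readAll` are hypothetical names, and wrapping `IOException` in `UncheckedIOException` is an arbitrary choice for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamingDecodeDemo {

    // Hypothetical reader-based helper: InputStreamReader keeps decoder state
    // between reads, so a multi-byte sequence cut by an underlying byte read is
    // completed on the next read instead of being replaced with U+FFFD.
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buff = new char[1024];
            StringBuilder builder = new StringBuilder();
            int read;
            while ((read = reader.read(buff)) != -1) {
                // Only whole, already-decoded chars are appended here.
                builder.append(buff, 0, read);
            }
            return builder.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same 1200-byte input that breaks chunk-by-chunk decoding.
        String original = "\uD55C".repeat(400);
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(bytes));
        System.out.println(original.equals(decoded)); // true
    }
}
```

On Java 25 and later, the reader's own `readAllAsString()` method, which the PR cites as the source of its implementation, does this in one call; copying the logic keeps the code usable on earlier Java versions.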