Fix snippet generation failure in some cases with Chinese characters #447

edgeinfinity1 · 2025-09-10T17:59:38Z

Fixes #0000

Changes proposed in this pull request:
In some cases, uploaded text files containing Chinese (and possibly other multibyte) characters fail to generate snippets. Tracing the code, I found that splitting with preg_split and \R leaves corrupted characters in $lines.
The reason is not clear to me, but changing the pattern to /\r\n|\n|\r/ makes it work as intended.

Reviewers should focus on:
I'm not sure why this fix works, since the new expression should be equivalent to \R.

Screenshot
Before fix:

After fix:

Confirmed

Frontend changes: tested on a local Flarum installation.
Backend changes: tests are green (run composer test).

Copilot

Pull Request Overview

This PR fixes snippet generation failures for text files containing Chinese characters and other special characters. The issue was caused by the \R regex pattern in preg_split corrupting characters during line splitting, which prevented proper snippet generation for affected files.

- Replaces the \R regex pattern with explicit line break patterns (/\r\n|\n|\r/) in the text preview formatter
- Resolves character corruption issues that prevented snippet generation for files with Chinese characters

clarkwinkelmann · 2025-09-12T22:10:48Z

Could this be a case where mb_split should be used?

I don't see why a regular split on newline characters could break a multibyte UTF-8 string, unless some Unicode characters use one of the newline codes as their second byte?

I'm not sure where the full pattern definition can be found, but if my duck.ai answer is correct, \R also matches \x0B, \f, \u{2028} and \u{2029}. Could one of those codes specifically be causing the issue?
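
A possible explanation, offered as a sketch rather than a confirmed diagnosis of this bug: the PCRE documentation states that without the `u` modifier, `\R` also matches the single byte `\x85` (NEL), and that byte occurs as a continuation byte in the UTF-8 encoding of some Chinese characters. The snippet below uses 充 (U+5145, bytes `E5 85 85`) as an illustrative input. The sample string, the helper `isValidUtf8` and the variable names are assumptions for demonstration and are not taken from the reporter's file.

```php
<?php
// Illustrative input (an assumption, not the reporter's data):
// "充" (U+5145) is E5 85 85 in UTF-8, followed by CRLF and "abc".
$text = "\xE5\x85\x85\r\nabc";

// Hypothetical helper: preg_match('//u', $s) returns 1 only for valid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Byte mode (no /u): with PCRE's default \R definition, the lone byte 0x85
// can be matched as a line break, cutting the multibyte character apart.
$byteMode = preg_split('/\R/', $text);

// UTF-8 mode (/u): \R is matched against code points, so a byte inside a
// multibyte sequence is never treated as a line break.
$utf8Mode = preg_split('/\R/u', $text);

// Explicit alternation, as in the first revision of this PR.
$explicit = preg_split('/\r\n|\n|\r/', $text);

$results = ['byte mode' => $byteMode, 'utf8 mode' => $utf8Mode, 'explicit' => $explicit];
foreach ($results as $label => $lines) {
    $valid = array_filter($lines, 'isValidUtf8');
    printf("%s: %d piece(s), %d valid UTF-8\n", $label, count($lines), count($valid));
}
```

On a PCRE build where `\R` keeps its default meaning, the byte-mode split is expected to report pieces that are not valid UTF-8, while the other two variants return two intact lines. If that matches what happens with the failing uploads, adding the `u` modifier would be another way to avoid the corruption.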

clarkwinkelmann · 2025-09-12T22:12:48Z

Also, while I think about it, can we get some example text that causes issues in case someone ever writes tests for this?

edgeinfinity1 · 2025-09-13T05:29:37Z

> Could this be a case where mb_split should be used?

I tested it just now, and mb_split('\R', ...) works fine. Should I apply this to the PR?

> Also, while I think about it, can we get some example text that causes issues in case someone ever writes tests for this?

Here's an example to reproduce the issue:
ftbe.txt

imorland · 2025-09-16T16:19:54Z

> > Could this be a case where mb_split should be used?
>
> I tested it just now, and mb_split('\R', ...) works fine. Should I apply this to the PR?
>
> > Also, while I think about it, can we get some example text that causes issues in case someone ever writes tests for this?
>
> Here's an example to reproduce the issue: ftbe.txt

Yes, please use mb_split - once that's done this is good to merge. Thanks for your contribution!
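
For readers following along, here is a minimal sketch of what an mb_split-based line split might look like. The function name `splitLines` and the fallback branch are hypothetical; the actual change in FormatTextPreview.php is not shown in this thread and may differ.

```php
<?php
// Hypothetical helper, not the actual code from FormatTextPreview.php.
function splitLines(string $text): array
{
    if (function_exists('mb_split')) {
        // mb_split takes a bare pattern (no delimiters) and reads the subject
        // using mb_regex_encoding(), expected to be UTF-8 on a default setup.
        $lines = mb_split('\R', $text);
        if ($lines !== false) {
            return $lines;
        }
    }

    // Fallback when mbstring is unavailable: PCRE in UTF-8 mode.
    $lines = preg_split('/\R/u', $text);

    // If splitting fails (for example on invalid UTF-8), return the text whole.
    return $lines === false ? [$text] : $lines;
}

// "充" (E5 85 85) on the first line, "abc" on the second.
$lines = splitLines("\xE5\x85\x85\nabc");
var_dump(count($lines));
```

Both branches treat the input as UTF-8 text rather than raw bytes, which is the property that matters for the corruption reported here.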

edgeinfinity1 · 2025-09-17T04:13:56Z

Changed as requested.

Update FormatTextPreview.php

acd2449

edgeinfinity1 requested a review from a team as a code owner September 10, 2025 17:59

DavideIadeluca requested a review from Copilot September 10, 2025 18:23

Copilot AI reviewed Sep 10, 2025

Update FormatTextPreview.php

9aba1aa
