-
Notifications
You must be signed in to change notification settings - Fork 88
chore(core): Add utfcpp library as a submodule. #701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces the Changes
Sequence DiagramsequenceDiagram
participant Build System
participant Dependency Manager
participant Checksum Validator
participant Downloader
Build System->>Dependency Manager: Request utfcpp dependency
Dependency Manager->>Checksum Validator: Validate existing checksum
Checksum Validator-->>Dependency Manager: Checksum validation result
Dependency Manager->>Downloader: Download utfcpp library
Downloader-->>Dependency Manager: Download complete
Dependency Manager->>Checksum Validator: Compute new checksum
Checksum Validator-->>Dependency Manager: Checksum generated
Dependency Manager-->>Build System: Dependency ready
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.gitmodules (1)
32-34: Consider adding shallow clone flag for the utfcpp submodule.The submodule entry looks good, but consider adding
shallow = trueto optimize clone size and performance, similar to how it's configured for the json submodule.[submodule "components/core/submodules/utfcpp"] path = components/core/submodules/utfcpp url = https://github.com/nemtrif/utfcpp.git + shallow = true
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.gitmodules(1 hunks)components/core/submodules/utfcpp(1 hunks)deps-tasks.yml(5 hunks)
✅ Files skipped from review due to trivial changes (1)
- components/core/submodules/utfcpp
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (4)
deps-tasks.yml (4)
20-20: LGTM!The checksum file variable follows the established naming convention.
42-42: LGTM!The utfcpp checksum file is correctly integrated into the core task's sources and commands.
Also applies to: 58-58
99-99: LGTM!The utfcpp task is correctly added to the sequential execution list in all-internal-deps.
376-404: LGTM!The utfcpp task is well-structured and follows the established pattern:
- Uses the latest stable version (4.0.6)
- Downloads from the official GitHub releases
- Implements all required task components (checksum validation, dependency download, etc.)
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me. A few things to comment:
- I understand that we're having discussions offline about whether we should enforce the UTF8 encoding, but I think it is still good to have utils that can efficiently validate UTF8 byte sequence with error handling options (such as replacing invalid chars by the dedicated placeholder). Therefore, I think this PR should be merged and we should add the planned UTF8 validation features into CLP core.
- That said, can we create a feature request with a description of our plan? That would make it easier to track the implementation status.
I will approve the PR once the feature request is ready
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR title lgtm.
Description
This PR adds the utfcpp library to this project as a submodule. This library provides the capability to replace invalid sequences in utf-8 strings with a replacement character; this functionality will be used in a follow-up PR to create a validating UTF-8 parser class that aims to efficiently validate UTF-8 and allow users to control how illegal UTF-8 is handled (e.g. by returning an error, or replacing invalid code sequences with a replacement character).
This library was chosen because it seems mature, and the main alternative (the official icu library) seems a bit heavy-weight to pull in for this use-case. simdjson is not viable here because its invalid sequence replacment logic is tied to JSON unescaping, and simdutf can not be used because (surprisingly) it only provides UTF validation and conversion not invalid UTF-8 replacement.
Validation performed
deps:coretask passesSummary by CodeRabbit
New Features
Chores