feat(dom-rules): Add 300+ dom rules#931
Conversation
🦋 Changeset detectedLatest commit: 635f45a The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
1 issue found across 5 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/utils/constants/dom-rules.ts">
<violation number="1" location="src/utils/constants/dom-rules.ts:152">
P2: findMatchingSelectors returns empty results while domRules is still loading asynchronously, and call sites use it synchronously. Early DOM scans can skip domain-specific blocking rules until the JSON import finishes, changing behavior from the previous synchronous constants.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
PR Review: feat(dom-rules): Add 300+ dom rulesOverall Assessment🟢 Good addition - Bringing in 360+ website-specific DOM rules from a fork is valuable for improving translation accuracy across popular sites. The refactor to JSON-based configuration makes rules easier to maintain. Key InsightsData structure: JSON config with two rule types ( Complexity: The glob-to-regex pattern matching is the main logic addition. Implementation is straightforward but has edge cases. Risk points:
Issues to Address1. Potential Breaking Change: YouTube rules removedThe original `.${SUBTITLES_VIEW_CLASS}`,
`.${STATE_MESSAGE_CLASS}`,
`.${TRANSLATE_BUTTON_CLASS}`,These dynamic selectors based on JS constants are NOT present in the JSON file. The YouTube entry in "www.youtube.com": [".ytp-caption-segment"]This is a regression - the extension's own subtitle-related elements won't be excluded from translation. Suggestion: Either add the actual class values to the JSON, or create a mechanism to merge runtime constants with JSON rules for YouTube. 2. Performance: Pattern matching on every element
Suggestion: Cache the matched selectors per URL at the start of page translation rather than re-computing per element. 3. Massive duplication in JSONTwitter/X rules are duplicated 10+ times for different subdomains. Same for Stack Exchange sites. Suggestion: Consider using wildcard patterns more aggressively: "*.twitter.com": [...],
"*.x.com": [...],Or introduce rule inheritance/aliases to reduce the 2200-line JSON file. 4. Missing tests for new pattern matching logic
The existing tests in 5. Edge case bug in matchUrlPatternif (!pattern.startsWith('http://') && !pattern.startsWith('https://')) {
return regex.test(url.replace(/^https?:\/\//, ''))
}If pattern doesn't start with protocol but URL does, it strips protocol from URL. But if pattern contains path like Minor Issues
if (!domRules)
// Not initialized yet
return []Comment on wrong line - should be above or inline.
"https://www.reddit.com/r/*/comments/*/*": [...]While others use just hostnames. This inconsistency works due to the pattern matching but reduces clarity. What's Good
RecommendationWorth merging with fixes:
The core value (300+ rules) is solid. The implementation needs minor hardening before merge. |
|
can I ask where are these rules originally from? |
Sorry for the late reply @TianmuTNT reversed some rules from the "Immersive Translate" plugin and wrote a script to convert to the read-frog format. |
Code Review: PR #931 - feat(dom-rules): Add 300+ dom rulesSummaryThis PR adds 360+ website-specific DOM exclusion rules from @TianmuTNT's fork and refactors the rule loading system to use a JSON configuration file with wildcard URL pattern matching. 🟢 Good taste overallThe approach of externalizing rules to JSON is sound - it makes rules more maintainable and easier to contribute to without modifying TypeScript code. ✅ What Works Well
🔴 Issues to Address1. Critical: Massive selector array duplicationThe Twitter/X rules are duplicated across 10+ domains with identical selectors:
Recommendation: Use the wildcard support to consolidate: "*.twitter.com": [...],
"*.x.com": [...],
"twitter.com": [...],
"x.com": [...]Same issue with Stack Exchange sites (stackoverflow, superuser, askubuntu, serverfault, *.stackexchange.com), Quora domains, and Telegram domains. DRY violation - ~500+ lines could be reduced to ~100. 2. Performance concern: Linear pattern matching
for (const [pattern, selectors] of Object.entries(ruleset || {})) {
if (matchUrlPattern(url, pattern) || matchUrlPattern(hostname, pattern)) {
return selectors
}
}With 300+ patterns, this could be slow. The function correctly checks exact hostname match first (fast path), but consider:
3. Fragile CSS class selectorsMany selectors use generated class names that may change: "[class='css-901oao r-14j79pv r-37j5jr r-n6v787 r-16dba41 r-1cwl3u0 r-bcqeeo r-qvutc0']"
"[class='css-175oi2r r-1habvwh r-vqp9x9 r-1q9bdsx r-1loqt21 r-9njtsq r-1wtj0ep r-nsbfu8 r-xbdcod r-13c7hvr']"These are brittle and will break when sites update their CSS bundler config. Consider using 4. Missing tests for new functionalityThe Important: Add tests for:
🟡 Suggestions for Improvement
`${YOUTUBE_NATIVE_SUBTITLES_CLASS}`,
`.${SUBTITLES_VIEW_CLASS}`,
`.${STATE_MESSAGE_CLASS}`,
`.${TRANSLATE_BUTTON_CLASS}`,These dynamic class references seem lost in the migration. Verify these are handled elsewhere. Test Coverage
SecurityNo security concerns identified. The rules are static JSON data with CSS selectors. Final VerdictApprove with changes requested. The core refactoring is good, but:
The bulk of the work (collecting 300+ rules) is valuable for the community. Just needs some polish before merging. |
| - News and media sites (Reuters, CNBC, New York Times) | ||
| - And many more | ||
|
|
||
| Also implemented wildcard pattern matching (`*`, `**`) and JSON-based configuration to support flexible URL patterns. |
|
|
||
| export function isCustomDontWalkIntoElement(element: HTMLElement): boolean { | ||
| const dontWalkIntoElementSelectorList = CUSTOM_DONT_WALK_INTO_ELEMENT_SELECTOR_MAP[window.location.hostname] ?? [] | ||
| const dontWalkIntoElementSelectorList = findMatchingSelectors('dontWalkIntoSelectors', window.location.href) |
There was a problem hiding this comment.
findMatchingSelectors 并非 O(1) 会有严重性能问题
|
|
||
| export function isCustomForceBlockTranslation(element: HTMLElement): boolean { | ||
| const forceBlockSelectorList = CUSTOM_FORCE_BLOCK_TRANSLATION_SELECTOR_MAP[window.location.hostname] ?? [] | ||
| const forceBlockSelectorList = findMatchingSelectors('forceBlockTranslationSelectors', window.location.href) |
There was a problem hiding this comment.
findMatchingSelectors 并非 O(1) 会有严重性能问题,应该作为一个参数传递
| - News and media sites (Reuters, CNBC, New York Times) | ||
| - And many more | ||
|
|
||
| Also implemented wildcard pattern matching (`*`, `**`) and JSON-based configuration to support flexible URL patterns. |
In fact this is not generated by a script from @TianmuTNT
|
|
能不能给原仓库开发者开放一下权限:
关于你的 |
🤔 |
或许你可以直接复制粘贴一下上面 @ananaBMaster 的代码hhh |
|
奇怪,看到 github 的确实是 不过我也看到他们有 是不是这两者结合有什么机制 目前看效果,应该 Github 的这个例子不太对的感觉。 |
我认为 规则是这样运行的 选择 selectors 里的所有元素, 然后排除能被 additionalExcludeSelectors 选中的元素 现在这个 pr 只排除了 additionalExcludeSelectors, 导致有的东西被错误翻译了 我没看过沉浸式翻译的代码, 仅猜测 |
我也是这样想的,就是他们有某种集合相互影响的机制,可能某个会覆盖另外一个规则。 所以这个 PR 合并的话,还需要实现一下这个机制才行 |
Co-authored-by: MengXi <contact@mengxi.work>
|
BTW, 看上去 eslint 不太喜欢用 .gitignorre import { includeIgnoreFile } from '@eslint/compat'
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)
const gitignorePath = path.resolve(__dirname, '.gitignore')
export default antfu({
/*...*/
}).append(includeIgnoreFile(gitignorePath, 'Imported .gitignore patterns')).append(includeIgnoreFile(gitExcludePath, 'Imported .git/info/exclude patterns')) |
现在这个仓库里的代码是 @TianmuTNT Vibe 的 |
让 eslint 忽略对应文件应该就可以 |
一个是可以用 trie,另一个是可以只跑一次,把结果作为参数往下传递。 |
Code Review: feat(dom-rules): Add 300+ dom rulesOverall AssessmentThe feature concept is valuable -- moving from a small hardcoded map to a comprehensive JSON-based rule system with glob pattern matching is the right direction. However, there are several significant issues that should be addressed before merging. Critical Issues
Performance Concerns
Code Quality
Test Coverage
Security
Suggestions
|






Type of Changes
Description
dom-rules.tsfor loading rules from json (I have merged the exist rules into the json file)Related Issue
None
How Has This Been Tested?
Verified via a manually smoke test
Screenshots
None
Checklist
I think I cannot access the source code of page https://www.readfrog.app/zh/tutorial/code-contribution/custom-dom-rules
So I cannot modify the docs.
Additional Information
Summary by cubic
Added 360+ website-specific DOM exclusion and block-translation rules, loaded from a JSON config with wildcard URL matching. This improves translation accuracy across popular sites and makes rules easier to maintain.
New Features
Refactors
Written for commit 635f45a. Summary will update on new commits.