Skip to content

fix(extension): fix hidden elements being translated#958

Open
Yukiniro wants to merge 4 commits intomainfrom
gqfghxwp
Open

fix(extension): fix hidden elements being translated#958
Yukiniro wants to merge 4 commits intomainfrom
gqfghxwp

Conversation

@Yukiniro
Copy link
Collaborator

@Yukiniro Yukiniro commented Feb 10, 2026

Type of Changes

  • ✨ New feature (feat)
  • 🐛 Bug fix (fix)
  • 📝 Documentation change (docs)
  • 💄 UI/style change (style)
  • ♻️ Code refactoring (refactor)
  • ⚡ Performance improvement (perf)
  • ✅ Test related (test)
  • 🔧 Build or dependencies update (build)
  • 🔄 CI/CD related (ci)
  • 🌐 Internationalization (i18n)
  • 🧠 AI model related (ai)
  • 🔄 Revert a previous commit (revert)
  • 📦 Other changes that do not modify src or test files (chore)

Description

Related Issue

Closes #927

How Has This Been Tested?

  • Added unit tests
  • Verified through manual testing

Screenshots

Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly if necessary
  • My code follows the code style of this project
  • My changes do not break existing functionality
  • If my code was generated by AI, I have proofread and improved it as necessary.

Additional Information


Summary by cubic

Prevents the extension from translating hidden content. Screen-reader-only and visually hidden text is now excluded from traversal, translation, and text extraction.

  • Bug Fixes
    • Exclude elements with sr-only, visually-hidden, aria-hidden, display:none, visibility:hidden, and non-content tags (e.g., SCRIPT).
    • Keep notranslate and CODE in the “don’t walk into but translate as child” path; remove sr-only from that path.
    • Added unit tests for filter and traversal, covering hidden element exclusion.

Written for commit 126b1ff. Summary will update on new commits.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 10, 2026
@changeset-bot
Copy link

changeset-bot bot commented Feb 10, 2026

🦋 Changeset detected

Latest commit: 126b1ff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@read-frog/extension Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions bot added the fix label Feb 10, 2026
@dosubot
Copy link

dosubot bot commented Feb 10, 2026

Related Documentation

2 document(s) may need updating based on files changed in this PR:

Read Frog - Open Source Immersive Translate

Auto Translation Features
View Suggested Changes
@@ -202,7 +202,6 @@
 This approach ensures that translation works for all iframe content, including complex sites like edX, embedded widgets, and dynamically inserted frames.
 
 ### How Shadow DOM translation and style injection works
-
 Read Frog now fully supports translating and styling content inside Shadow DOMs, such as those used by custom web components. Previously, translation styles were only injected at the document level, which meant translated content inside Shadow Roots was not styled correctly due to style encapsulation.
 
 **Key features of Shadow DOM support:**
@@ -229,7 +228,7 @@
 - Exclusion of code listings and custom elements from translation traversal.
 - Exclusion of Reddit-specific UI/accessibility elements.
 - **Exclusion of decorative and non-content elements (SVG, SCRIPT, STYLE, etc.):** Elements that are not meant to be translated, such as SVG graphics, script tags, and style tags, are now filtered out during translation traversal. This prevents visual artifacts (such as stray dots) and ensures only meaningful content is translated.
-- **Exclusion of elements with `aria-hidden="true"`:** Elements marked as aria-hidden are now skipped during translation traversal. This prevents hidden or accessibility-only elements from affecting block/inline determination and ensures only visible content is translated. This fix resolves layout issues on sites like Twitter where aria-hidden elements previously caused incorrect translation grouping.
+- **Exclusion of elements with `aria-hidden="true"`, `sr-only`, or `visually-hidden` classes:** Elements marked as aria-hidden or with classes commonly used to visually hide content (such as `sr-only` and `visually-hidden`) are now skipped during translation traversal. This prevents hidden or accessibility-only elements from affecting block/inline determination and ensures only visible content is translated. This fix resolves layout issues on sites like Twitter where aria-hidden elements previously caused incorrect translation grouping, and also prevents translation of screen-reader-only or visually hidden text.
 - **Exclusion of YouTube-specific UI elements and metadata:** Navigation bars, masthead, guide, metadata panels, channel name, comments header, reply/more buttons, badges, and other non-content selectors on www.youtube.com are now excluded from translation. This ensures only real content is translated and prevents broken labels. The following selectors are used for exclusion:
   - `#masthead-container *`, `#guide-inner-content *`, `#metadata *`, `#channel-name`, `.translate-button`, `.yt-lockup-metadata-view-model__metadata`, `.yt-spec-avatar-shape__badge-text`, `.shortsLockupViewModelHostOutsideMetadataSubhead`, `ytd-comments-header-renderer`, `#top-row`, `#header-author`, `#reply-button-end`, `#more-replies`, `#info`, `#badges *`
 - Only relevant content is translated, not entire ancestor nodes.

[Accept] [Decline]

Translation Toggle Logic and Content Detection
View Suggested Changes
@@ -12,6 +12,9 @@
 To find the wrapper for a translated node, `findPreviousTranslatedWrapper(node: Element | Text, walkId: string)` checks if the node itself is a wrapper (with a different walkId) or looks for a wrapper as a child that doesn't match the current walkId. The wrapper element always includes a `data-read-frog-translation-mode` attribute indicating the mode (`bilingual` or `translationOnly`) and a `data-read-frog-walked` attribute for walk tracking.
 
 The system uses additional DOM attributes and classes to label nodes during traversal, such as `WALKED_ATTRIBUTE`, `BLOCK_ATTRIBUTE`, `INLINE_ATTRIBUTE`, and `PARAGRAPH_ATTRIBUTE`, which help distinguish between block and inline nodes and manage translation state.
+
+**Exclusion of Hidden Elements:**
+Elements that are visually hidden or marked as not intended for user visibility are now explicitly excluded from translation and text extraction. This includes elements with the `sr-only` or `visually-hidden` classes, elements with `aria-hidden="true"`, elements with `display: none` or `visibility: hidden` styles, and certain tags like `<script>`. These elements are detected using the `isDontWalkIntoAndDontTranslateAsChildElement` function and are skipped during both translation and text extraction. This ensures that only visible, user-facing content is translated.
 
 ### Differentiation Between Original and Translated Text
 - **Bilingual mode**: Original text remains in the DOM. Translated text is wrapped in a `<span>` element with the `NOTRANSLATE_CLASS` and `CONTENT_WRAPPER_CLASS` classes, and the `data-read-frog-translation-mode="bilingual"` attribute. Inside this wrapper, the translated content itself is further classified as inline or block using `INLINE_CONTENT_CLASS` or `BLOCK_CONTENT_CLASS`.
@@ -43,6 +46,9 @@
 ### Test Coverage
 Unit and integration tests cover the detection logic for translated content nodes and wrappers, as well as the new translation-only mode and complex DOM structures. The `isTranslatedContentNode` function is tested to ensure it returns true for elements with the correct classes and false otherwise, including for text nodes. The `findPreviousTranslatedWrapper` function is tested to verify it finds the correct wrapper for translated nodes, returns null for non-translated content, and correctly traverses multiple parent levels. The translation-only mode is tested to ensure that original content is restored correctly when toggled off, and that the wrapper and attributes are set as expected. Integration tests now cover mixed inline/block/nested structures and switching between translation modes. Numeric content exclusion and `<pre>` tag exclusion are also covered to ensure numbers and preformatted/code blocks are not translated or wrapped.
 
+**Hidden Element Exclusion Tests:**
+Additional unit tests verify that elements with the `sr-only` or `visually-hidden` classes, elements with `aria-hidden="true"`, and similar hidden elements are excluded from translation and text extraction. This ensures that only visible content is processed, preventing hidden or accessibility-only content from being translated or included in extracted text.
+
 ### User Experience Improvements
 A translation mode selector is now available in the popup UI, allowing users to easily switch between bilingual and translation-only modes. The selector uses localized labels and provides tooltips for additional guidance. If a user selects a mode that is not supported by the current provider, the system automatically switches to a compatible provider, ensuring seamless operation and reducing the likelihood of errors. This smart fallback mechanism improves reliability and user confidence in the translation feature.
 

[Accept] [Decline]

Note: You must be authenticated to accept/decline updates.

How did I do? Any feedback?  Join Discord

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: fix(extension): fix hidden elements being translated

Summary

This PR correctly moves sr-only detection from isDontWalkIntoButTranslateAsChildElement to isDontWalkIntoAndDontTranslateAsChildElement, and adds visually-hidden detection. This fixes #927 where screen-reader-only text on sites like openai.com was being picked up by the translator.

The logic change is sound: sr-only / visually-hidden elements should be fully excluded (don't walk, don't translate), not just "don't walk but still translate as child".


Code Quality

Good:

  • Clean, minimal diff — only changes what's needed.
  • Tests cover both filter functions and the integration through extractTextContent.
  • Changeset included.

Suggestions:

  1. isDontWalkIntoAndDontTranslateAsChildElement — redundant getComputedStyle calls (filter.ts:127-128):

    const dontWalkCSS
      = window.getComputedStyle(element).display === 'none'
        || window.getComputedStyle(element).visibility === 'hidden'

    This calls getComputedStyle twice per element. Not introduced by this PR, but since you're touching this function, consider caching it:

    const style = window.getComputedStyle(element)
    const dontWalkCSS = style.display === 'none' || style.visibility === 'hidden'

    getComputedStyle can trigger layout/style recalculation — calling it once per element instead of twice is a meaningful win given this runs on every element during DOM traversal.

  2. Class name detection is brittle for sr-only / visually-hidden:

    These are CSS convention class names (Tailwind's sr-only, Bootstrap's visually-hidden). Some sites may use variations like screen-reader-only, screenreader-text, clip-hide, or WordPress's screen-reader-text. The current approach handles the two most common conventions, which is pragmatic, but worth noting as a known limitation. May be worth a code comment documenting this tradeoff.

    Additionally, some frameworks apply the visually-hidden pattern via inline styles (e.g., clip: rect(0,0,0,0); position: absolute; width: 1px; height: 1px) without using class names. The existing getComputedStyle checks for display:none / visibility:hidden won't catch these. This is an edge case for a potential follow-up, not a blocker for this PR.


Potential Bugs / Issues

  1. display:none / visibility:hidden tests may not work in jsdom:

    The test for isDontWalkIntoAndDontTranslateAsChildElement doesn't test the CSS-based checks (display:none, visibility:hidden). This is understandable since jsdom's getComputedStyle doesn't resolve CSS properly — but it means those code paths aren't covered by tests. Worth adding a comment noting this limitation, or exploring happy-dom which has better CSS support.

  2. Eager evaluation negates short-circuit benefit (filter.ts:122-133):

    return dontWalkCustomElement || dontWalkContent || dontWalkInvalidTag || dontWalkCSS || dontWalkAriaHidden || dontWalkVisuallyHidden

    All checks (including the expensive getComputedStyle) are eagerly evaluated before the || chain, so the short-circuit doesn't save any work. For a real perf win, the expensive getComputedStyle call should be moved after the cheap checks using early returns or lazy evaluation. Not a blocker, but worth considering as a follow-up since this function runs on every element during traversal.


Security

No security concerns. The changes are purely about DOM filtering logic.


Test Coverage

  • Filter function tests: Good coverage for the new behavior. Tests confirm sr-only and visually-hidden are excluded from isDontWalkIntoButTranslateAsChildElement and included in isDontWalkIntoAndDontTranslateAsChildElement.
  • Traversal integration tests: Good coverage verifying extractTextContent skips hidden element text.
  • Missing: No test for display:none or visibility:hidden elements (jsdom limitation). No test for aria-hidden="false" to confirm it doesn't trigger exclusion.

Verdict

Clean, well-scoped fix. The core logic change (moving sr-only from "translate as child" to "don't translate") is correct. The performance suggestions (caching getComputedStyle, lazy evaluation of expensive checks) are optional improvements, not blockers.

LGTM with minor suggestions.

@dosubot dosubot bot added the app: browser extension Related to browser extension label Feb 10, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 4 files

@Yukiniro Yukiniro assigned Yukiniro and unassigned Yukiniro Feb 10, 2026
@Yukiniro Yukiniro requested a review from mengxi-ream February 10, 2026 15:29
@claude
Copy link

claude bot commented Feb 15, 2026

PR Review: fix(extension): fix hidden elements being translated

Overview

Moves sr-only from isDontWalkIntoButTranslateAsChildElement to isDontWalkIntoAndDontTranslateAsChildElement and adds visually-hidden detection. This correctly reclassifies screen-reader-only elements: they should be fully excluded from traversal and translation, not treated as "don't walk but still translate as child."

Fixes #927 (OpenAI blog hidden text being translated).


Code Quality

Strengths:

  • Minimal, focused diff — 4 files, +93/-5 lines. Only touches what's needed.
  • Correct semantic change: sr-only elements belong in the "don't translate" category, not the "translate as child" category.
  • Good test additions covering both the filter functions and the extractTextContent integration path.
  • Changeset included.

Issues & Suggestions

1. Redundant getComputedStyle calls (filter.ts:127-128)

const dontWalkCSS
  = window.getComputedStyle(element).display === 'none'
    || window.getComputedStyle(element).visibility === 'hidden'

getComputedStyle is called twice per element. This pre-dates the PR, but since the function is being modified:

const style = window.getComputedStyle(element)
const dontWalkCSS = style.display === 'none' || style.visibility === 'hidden'

This matters because isDontWalkIntoAndDontTranslateAsChildElement runs on every element during DOM traversal (walkAndLabelElement) and text extraction (extractTextContent).

2. Eager evaluation defeats short-circuit optimization (filter.ts:122-133)

All six checks are evaluated upfront before the || chain. The expensive operations (getComputedStyle, isCustomDontWalkIntoElement with element.matches()) run even when a cheap check like DONT_WALK_AND_TRANSLATE_TAGS.has(element.tagName) would have returned true. Consider reordering to check cheap conditions first with early returns:

if (DONT_WALK_AND_TRANSLATE_TAGS.has(element.tagName)) return true
if (element.getAttribute('aria-hidden') === 'true') return true
if (['sr-only', 'visually-hidden'].some(cls => element.classList.contains(cls))) return true
// ... then the expensive checks

Not a blocker, but a meaningful perf improvement for a hot path.

3. Class-name-based detection is inherently fragile

sr-only (Tailwind) and visually-hidden (Bootstrap) are the two most common conventions, so this is pragmatic. But sites may also use screen-reader-text (WordPress), screenreader-only, clip-hide, etc. Some apply the visually-hidden pattern via inline styles (clip: rect(0,0,0,0); position: absolute; width: 1px; height: 1px) without any class name — the existing getComputedStyle checks won't catch these.

This is a known limitation, not a blocker. A brief code comment noting the tradeoff would be helpful for future contributors.

4. Missing test edge cases

  • No test for aria-hidden="false" confirming it does not trigger exclusion.
  • No test for display:none / visibility:hidden (understandable — jsdom's getComputedStyle doesn't resolve CSS, so those code paths can't be tested here).
  • Tests for isDontWalkIntoAndDontTranslateAsChildElement don't verify the config.translate.page.range logic path.

Security

No concerns. Changes are purely DOM filtering logic with no external input handling.


Verdict

The core logic change is correct and well-scoped. The suggestions above (caching getComputedStyle, reordering for short-circuit, edge case tests) are optional improvements — none are blockers.

LGTM with minor suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app: browser extension Related to browser extension fix size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] OpenAI 博客里隐藏的字也会被插件识别出来

1 participant

Comments