-
Notifications
You must be signed in to change notification settings - Fork 848
feat(DqElement): Truncate large html strings when the element has a large outerHTML string
#4796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(DqElement): Truncate large html strings when the element has a large outerHTML string
#4796
Conversation
WalkthroughThe truncation logic for serializing DOM elements was refactored. The old simple string truncation was replaced with new functions that intelligently truncate an element's outerHTML, prioritizing attribute shortening and removal. The Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant DqElement
participant Utils
Caller->>DqElement: get source property
DqElement->>Utils: truncateElement(element, maxLength, maxAttributeLength)
Utils->>Utils: getOuterHtml(element)
Utils->>Utils: Truncate attributes if needed
Utils->>Utils: Truncate tag if needed
Utils-->>DqElement: truncated outerHTML string
DqElement-->>Caller: truncated source string
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm warn skipping integrity check for git dependency ssh://[email protected]/w3c/aria-practices.git 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
lib/core/utils/dq-element.js (2)
48-54: Two consecutive substrings can leave the result >maxLength
substring(0, maxLength) + '...>'may exceedmaxLengthby four characters, and the subsequent block only trims after the first>which might still leave the string longer than requested. Consolidate the logic so the final string length (including the ellipsis) never exceedsmaxLength.
64-65: ExposemaxLength/maxAttributeLengthas parameters
getSourcehard-codes the defaulttruncateElementlimits (100 / 20). Call-sites cannot request a looser or stricter limit, which reduces reusability. Consider forwarding run-options or accepting optional arguments so consumers can tune truncation for their context.test/core/utils/dq-element.js (2)
84-96: Test asserts are brittle – rely on “contains” instead of full equalityThese expectations hard-code the exact string produced by the truncation algorithm (including spacing and the order of attributes). A minor implementation change (e.g. different attribute order) will break the test while the functionality is still correct.
Use
assert.match(result.source, /<div id="target".*attribute1="value1".*...>/)orassert.includeto test the behaviour (truncation & ellipsis) without coupling to the precise formatting.
98-104: Same brittleness for long-tag testAs above, asserting the full 103-character string makes the test fragile. Instead verify that:
- The result starts with
<bbbb- Contains an ellipsis before the closing
>- Is shorter than the original tag name.
This keeps the test intention clear while accommodating harmless internal changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (2)
lib/core/utils/dq-element.js(1 hunks)test/core/utils/dq-element.js(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Security Review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (1)
lib/core/utils/dq-element.js (1)
49-68: Hot loop serialises on every iterationWithin the
forloop you callcreateTruncatedAttributeOuterHtmleach time (line 64), causing full serialisation + string ops per attribute.
Complex elements with dozens of attributes can degrade to quadratic time.A quick win:
- if ( - createTruncatedAttributeOuterHtml(copyNode, attributesToTruncate).indexOf( - '>' - ) < maxLength - ) + const currentLength = + createTruncatedAttributeOuterHtml(copyNode, attributesToTruncate) + .indexOf('>'); + if (currentLength < maxLength) {break;}…but cache the serialised string once per loop iteration to avoid recomputing it twice.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (2)
lib/core/utils/dq-element.js(1 hunks)test/core/utils/dq-element.js(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/core/utils/dq-element.js
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Security Review
🔇 Additional comments (1)
lib/core/utils/dq-element.js (1)
70-74: Safe DOM mutation – good job!👍 Attributes are now removed from the cloned node (
copyNode) rather than the live DOM, eliminating the side-effects highlighted in the earlier review.
lib/core/utils/dq-element.js
Outdated
| if (modifiedCurrentStr.indexOf('>') > maxLength) { | ||
| modifiedCurrentStr = modifiedCurrentStr.substring(0, maxLength) + '...>'; | ||
| } | ||
|
|
||
| if (modifiedCurrentStr.length > maxLength) { | ||
| modifiedCurrentStr = modifiedCurrentStr.substring( | ||
| 0, | ||
| modifiedCurrentStr.indexOf('>') + 1 | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second truncation pass still assumes “>`” is present
modifiedCurrentStr.indexOf('>') may return –1.
The subsequent substring(0, idx + 1) will then slice from position 0 to 0, yielding an empty string.
Add a guard similar to the one suggested above or default to maxLength when the delimiter is absent.
🤖 Prompt for AI Agents
In lib/core/utils/dq-element.js around lines 81 to 90, the second truncation
logic assumes the '>' character is always present, but indexOf('>') can return
-1, causing substring to produce an empty string. Fix this by adding a guard to
check if indexOf('>') is -1; if so, use maxLength as the substring end index
instead of indexOf('>') + 1.
lib/core/utils/dq-element.js
Outdated
| let intermediateAttrString = ''; | ||
| const tagNameEnd = str.indexOf('>'); | ||
|
|
||
| for (const truncatedAttr of truncatedAttributeSet) { | ||
| intermediateAttrString += `${truncatedAttr.substring(0, 20)}... `; | ||
| } | ||
|
|
||
| str = | ||
| str.substring(0, tagNameEnd) + | ||
| ' ' + | ||
| intermediateAttrString + | ||
| ' ' + | ||
| str.substring(tagNameEnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tagNameEnd === -1 breaks substring maths
If outerHTML lacks a ">" (e.g. serialisation error, svg self-closing tags when pretty-printed without a >), indexOf returns -1, turning subsequent substring calls into unexpected results or empty strings.
Guard before use:
- const tagNameEnd = str.indexOf('>');
+ const tagNameEnd = str.indexOf('>');
+ if (tagNameEnd === -1) {
+ return str; // nothing to splice into
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let intermediateAttrString = ''; | |
| const tagNameEnd = str.indexOf('>'); | |
| for (const truncatedAttr of truncatedAttributeSet) { | |
| intermediateAttrString += `${truncatedAttr.substring(0, 20)}... `; | |
| } | |
| str = | |
| str.substring(0, tagNameEnd) + | |
| ' ' + | |
| intermediateAttrString + | |
| ' ' + | |
| str.substring(tagNameEnd); | |
| let intermediateAttrString = ''; | |
| const tagNameEnd = str.indexOf('>'); | |
| if (tagNameEnd === -1) { | |
| return str; // nothing to splice into | |
| } | |
| for (const truncatedAttr of truncatedAttributeSet) { | |
| intermediateAttrString += `${truncatedAttr.substring(0, 20)}... `; | |
| } | |
| str = | |
| str.substring(0, tagNameEnd) + | |
| ' ' + | |
| intermediateAttrString + | |
| ' ' + | |
| str.substring(tagNameEnd); |
🤖 Prompt for AI Agents
In lib/core/utils/dq-element.js around lines 33 to 45, the code uses
str.indexOf('>') without checking if the character exists, which can cause
substring operations to behave incorrectly if tagNameEnd is -1. Add a guard
condition to check if tagNameEnd is -1 before using it in substring calls, and
handle that case appropriately to avoid breaking the string manipulation logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
lib/core/utils/dq-element.js (4)
89-94: Redundant outerHTML gate-keeper ingetSourcePrior feedback pointed out that this duplicates logic handled inside
truncateElement; the early exit again prevents fallback for nodes withoutouterHTML.
Same fix: drop the guard and delegate directly.
77-84:⚠️ Potential issueSecond pass can still slice with
endIndex === -1
endIndex = currElementStr.indexOf('>')may be -1.substring(0, endIndex + 1)would then return an empty string, discarding the entire element.Add the same sentinel check used above:
- const endIndex = currElementStr.indexOf('>'); - currElementStr = currElementStr.substring(0, endIndex + 1); + const endIndex = currElementStr.indexOf('>'); + if (endIndex !== -1) { + currElementStr = currElementStr.substring(0, endIndex + 1); + }
38-50:⚠️ Potential issue
openingTagEndIndex === -1corrupts the resulting stringWhen
>is absent (e.g. malformed outerHTML or self-closing pretty-printed SVG),indexOf('>')returns -1. Substringing with -1 silently converts to 0, yielding an opening tag that begins with a space and no closing bracket.Guard before slicing:
- const openingTagEndIndex = str.indexOf('>'); + const openingTagEndIndex = str.indexOf('>'); + if (openingTagEndIndex === -1) { + return str; // nothing to inject into + }
11-22:⚠️ Potential issueEarly return blocks the XMLSerializer fallback – outerHTML-less nodes still stringify to empty
The
if (!element?.outerHTML) { return ''; }check exits before the XMLSerializer branch can execute, re-introducing the limitation called out in an earlier review.
DocumentFragments, SVG nodes in some browsers, or test doubles withoutouterHTMLwill again yield an empty string even though they are perfectly serialisable.-function getOuterHtml(element) { - if (!element?.outerHTML) { - return ''; - } - - let source = element.outerHTML; - - if (!source && typeof window.XMLSerializer === 'function') { - source = new window.XMLSerializer().serializeToString(element); - } - - return source || ''; +function getOuterHtml(element) { + if (!element) { + return ''; + } + + // Prefer native outerHTML when present, otherwise fall back + let source = element.outerHTML ?? ''; + + if (!source && typeof window !== 'undefined' && typeof window.XMLSerializer === 'function') { + source = new window.XMLSerializer().serializeToString(element); + } + + return source; }
🧹 Nitpick comments (1)
lib/core/utils/dq-element.js (1)
30-33: Unnecessary second clone doubles CPU & memory usage
truncateElementalready passes a deep clone; cloning again here (element.cloneNode(true)) adds O(N) overhead inside a hot path. Use the supplied node directly or allow the caller to signal when cloning is safe to skip.- const copyNode = element.cloneNode(true); - - let str = getOuterHtml(copyNode); + // Caller supplies an isolated copy – avoid re-cloning. + const nodeForSerialization = element; + let str = getOuterHtml(nodeForSerialization);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (1)
lib/core/utils/dq-element.js(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Security Review
fb41eb9 to
9629adb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test for excessively long attribute names? Something like:
const el = document.createElement('div')
let attributeName = 'data-'
for (let i = 0; i < 10000000; i++) {
attributeName += 'foo'
}
el.setAttribute(attributeName, 'bar')I don't know what the expected behavior will be here, but I'm sure this is a case we'll want to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. For excessively long attribute names, it will truncate it to the max attribute length, which here would be 20 characters. So the result for this would be
<div data-foofoofoofoofoo... ></div>
Is this acceptable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me 👍
I'll bring this up internally and see what the rest of the team thinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @straker can you check to make sure a change like this doesn't break anyone's deduplication code? Otherwise this may need to go behind an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
straker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pr! I'll go confirm with other teams about their duplication strategy and make sure source isn't one of the properties they use. Overall the code and is good. Below I've left some comments and suggestions, mostly focused around using element.attributes clobbering and the use of > to detect the end of the opening tag.
|
Thanks for the changes. I'm unable to look at this this week but will try to look at it next week. |
straker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. Seeing the changes had me think about a few edge cases that we should make sure we all agree on so pining my colleagues about it.
lib/core/utils/dq-element.js
Outdated
| } | ||
|
|
||
| if (attributeStrList.length < elementNodeMap.length) { | ||
| str = `<${shallowNode.nodeName.toLowerCase()} ${attributeStrList.join(' ')} ...>`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the document is in XHTML we shouldn't lowercase the node name (see virtual node class).
lib/core/utils/dq-element.js
Outdated
| } | ||
|
|
||
| if (str.length > maxLen) { | ||
| str = str.substring(0, maxLen) + '... >'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leaves an odd edge case where if there are less attributeStr attributes than elementNodeMap attributes, you end the string with .... However if that final string is then longer than the maxLen you also add 3 more dots. So if the first string with the 3 dots is say 2 characters over the maxLen, you add more dots and end up with more than 3 dots at the end.
E.g.
<button aria-label="Wednesday the 14th" data-test="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test1="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test2="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test3="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test4="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test5="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test678910="Lorem ipsum dolor sit amet, consectetur adipiscing elit." data-test7="Lorem ipsum dolor sit amet, consectetur adipiscing elit.">Wed the 14th</button><button aria-label=\"Wednesday the 14th\" data-test=\"Lorem ipsum dolor si...\" data-test1=\"Lorem ipsum dolor si...\" data-test2=\"Lorem ipsum dolor si...\" data-test3=\"Lorem ipsum dolor si...\" data-test4=\"Lorem ipsum dolor si...\" data-test5=\"Lorem ipsum dolor si...\" data-test678910=\"Lorem ipsum dolor si..... >…DqElement Fixed truncation logic for large outerHTML for the the element itself Closes issue dequelabs#4544
- Added space consistency for the first attribute - Increased maxlength for the entire outerHTML to 3 times that of the current element
Earlier, was having quadratic time for truncating outerHTML, reduced it to linear. Changed attribute truncating logic to truncate the attribute name and value separately instead of completely ignoring the value.
… test more readable
- Uses Vnode to have support for Xhtml and Html - Attributes will not be truncated if the element is within limits buts the children cause longer element - Truncating ellipsis now handles edge case of uneven number of dots
Handles an edge case where element itself is small and a void element but children are longer
1adb7a5 to
2d92af4
Compare
|
Thanks for the changes. Unfortunately something in those changes broke a bunch of the tests. Double unfortunately since you forced pushed I'm unable to see the newest changes compared to prior changes, so I don't know exactly what you changed that caused the problem, nor can I revert to the point where the code was passing. |
|
I tried running the tests from the develop branch locally, and they were failing over there too (the same areas). Can there be an issue in that ? |
|
Develop is passing on my end. The way our tests are set up you have to build first before you test, otherwise the changes won't be tested. |
|
My mistake, I was looking at the wrong place 😅. I have debugged the issue now. |
|
This all looks good, thanks for the updates. I'm trying to finalize feedback from other teams about changing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this took so long, took a while to get approval from all the other teams involved. They confirmed this change shouldn't break deduplication so can be merged.
Reviewed for security
html strings when the element has a large outerHTML string
html strings when the element has a large outerHTML stringhtml strings when the element has a large outerHTML string
sierra88-en
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List
sierra88-en
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
## [4.11.0](v4.10.3...v4.11.0) (2025-10-07) ### Features - add RGAA tags to rules ([#4862](#4862)) ([53a925a](53a925a)) - **aria-prohibited-attr:** add support for fallback roles ([#4325](#4325)) ([62a19a9](62a19a9)) - **axe.d.ts:** add nodeSerializer typings ([#4551](#4551)) ([a2f3a48](a2f3a48)), closes [#4093](#4093) - **DqElement:** deprecate fromFrame function ([#4881](#4881)) ([374c376](374c376)), closes [#4093](#4093) - **DqElement:** Truncate large `html` strings when the element has a large outerHTML string ([#4796](#4796)) ([404a4fb](404a4fb)), closes [#4544](#4544) - **get-xpath:** return proper relative selector for id ([#4846](#4846)) ([1035f9e](1035f9e)), closes [#4845](#4845) - **i18n:** Add Portugal Portuguese translation ([#4725](#4725)) ([5b6a65a](5b6a65a)) - incomplete with node on which an error occurred ([#4863](#4863)) ([32ed8da](32ed8da)) - **locale:** Added ru locale ([#4565](#4565)) ([067b01d](067b01d)) - **tap:** some best practice rules map to RGAA ([#4895](#4895)) ([bc33f4c](bc33f4c)) - **td-headers-attr:** report headers attribute referencing other <td> elements as unsupported ([#4589](#4589)) ([ec7c6c8](ec7c6c8)), closes [#3987](#3987) ### Bug Fixes - **aria-allowed-role:** add form to allowed roles of form element ([#4588](#4588)) ([8aa47ac](8aa47ac)), closes [/github.com/dequelabs/axe-core/blob/develop/lib/standards/html-elms.js#L264](https://github.com/dequelabs//github.com/dequelabs/axe-core/blob/develop/lib/standards/html-elms.js/issues/L264) - **aria-allowed-role:** Add math to allowed roles for img element ([#4658](#4658)) ([95b6c18](95b6c18)), closes [#4657](#4657) - **autocomplete-valid :** Ignore readonly autocomplete field ([#4721](#4721)) ([491f4ec](491f4ec)), closes [#4708](#4708) - **autocomplete-valid:** treat values "xon" and "xoff" as non-WCAG-violations ([#4878](#4878)) ([52bc611](52bc611)), closes [#4877](#4877) - **axe.d.ts:** add typings for preload options object ([#4543](#4543)) ([cfd2974](cfd2974)) - **button-name,input-button-name,input-img-alt:** allow label to give accessible name ([#4607](#4607)) ([a9710d7](a9710d7)), closes [#4472](#4472) [#3696](#3696) [#3696](#3696) - **captions:** fix grammar in captions check incomplete message ([#4661](#4661)) ([11de515](11de515)) - **color-contrast:** do not run on elements with font-size: 0 ([#4822](#4822)) ([d77c885](d77c885)), closes [#4820](#4820) - consistently parse tabindex, following HTML 5 spec ([#4637](#4637)) ([645a850](645a850)), closes [#4632](#4632) - **core:** measure perf for async checks ([#4609](#4609)) ([7e9bacf](7e9bacf)) - fix grammar when using "alternative text" in a sentence ([#4811](#4811)) ([237a586](237a586)), closes [#4394](#4394) - **get-ancestry:** add nth-child selector for multiple siblings of shadow root ([#4606](#4606)) ([1cdd6c3](1cdd6c3)), closes [#4563](#4563) - **get-ancestry:** don't error when there is no parent ([#4617](#4617)) ([a005703](a005703)) - **locale:** fix typos in japanese (ja) locale ([#4856](#4856)) ([3462cc5](3462cc5)) - **locale:** fixed typos in german (DE) locale ([#4631](#4631)) ([b7736de](b7736de)) - **locale:** proofread and updated de.json ([#4643](#4643)) ([8060ada](8060ada)) - **meta-viewport:** lower impact to moderate ([#4887](#4887)) ([2f32aa5](2f32aa5)), closes [#4714](#4714) - **no-autoplay-audio:** don't timeout for preload=none media elements ([#4684](#4684)) ([cdc871e](cdc871e)) - **performanceTimer:** throwing in axe catch clause ([#4852](#4852)) ([a4ade04](a4ade04)), closes [/github.com/dequelabs/axe-core/blob/e7dae4ec48cbfef74de9f833fdcfb178c1002985/lib/core/base/rule.js#L297-L300](https://github.com/dequelabs//github.com/dequelabs/axe-core/blob/e7dae4ec48cbfef74de9f833fdcfb178c1002985/lib/core/base/rule.js/issues/L297-L300) - **performanceTimer:** work in frames ([#4834](#4834)) ([d7dfebc](d7dfebc)) - **rules:** Change "alternate text" to "alternative text" ([#4582](#4582)) ([b03ada3](b03ada3)) - **target-size:** do not treat focusable tabpanels as targets ([#4702](#4702)) ([60d11f2](60d11f2)), closes [#4421](#4421) [#4701](#4701) - **type:** correct RuleError type ([#4893](#4893)) ([d1aa8e2](d1aa8e2)) - **types:** correct raw types ([#4903](#4903)) ([3eade11](3eade11)) This PR was opened by a robot 🤖 🎉
feat(DqElement): Fixed truncation logic for large outerHTML for the the element itself
Changed truncation logic to also truncate the element of DqElement's source, while preserving the earlier truncation logic.
Closes: Issue #4544