StackOverflowError: Element.data() is executed recursively without checks #1864

@biecho

Hi there,

The implementation of Element.data(), listed below, recurses into child elements in an unsafe manner.

    public String data() {
        StringBuilder sb = StringUtil.borrowBuilder();

        for (Node childNode : childNodes) {
            if (childNode instanceof DataNode) {
                DataNode data = (DataNode) childNode;
                sb.append(data.getWholeData());
            } else if (childNode instanceof Comment) {
                Comment comment = (Comment) childNode;
                sb.append(comment.getData());
            } else if (childNode instanceof Element) {
                Element element = (Element) childNode;
                String elementData = element.data();
                sb.append(elementData);
            } else if (childNode instanceof CDataNode) {
                // this shouldn't really happen because the html parser won't see the cdata as anything special when parsing script.
                // but incase another type gets through.
                CDataNode cDataNode = (CDataNode) childNode;
                sb.append(cDataNode.getWholeText());
            }
        }
        return StringUtil.releaseBuilder(sb);
    }

The recursion is not checked for depth and can lead to a StackOverflowError if the child elements are nested deeply enough.

I suggest handling this case the same way as Element.text(), where a NodeTraversor is used to compute the result iteratively.
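
To illustrate what an iterative walk of this kind might look like, here is a minimal, self-contained sketch. It does not use jsoup: `SketchNode` and `IterativeDataSketch` are hypothetical stand-ins I made up for illustration, and the loop only approximates the parent/sibling-link style of traversal, so please treat it as a sketch under those assumptions rather than the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DOM node; this is NOT jsoup's Node class.
class SketchNode {
    final String data;                       // payload for leaf nodes, null for containers
    final List<SketchNode> children = new ArrayList<>();
    SketchNode parent;
    int indexInParent;

    SketchNode(String data) { this.data = data; }

    SketchNode add(SketchNode child) {
        child.parent = this;
        child.indexInParent = children.size();
        children.add(child);
        return this;
    }

    SketchNode nextSibling() {
        if (parent == null) return null;
        int next = indexInParent + 1;
        return next < parent.children.size() ? parent.children.get(next) : null;
    }
}

class IterativeDataSketch {
    // Depth-first walk in document order using parent and sibling links only,
    // so the call stack stays flat regardless of nesting depth.
    static String data(SketchNode root) {
        StringBuilder sb = new StringBuilder();
        SketchNode node = root;
        while (node != null) {
            if (node.data != null) sb.append(node.data);
            if (!node.children.isEmpty()) {
                node = node.children.get(0);     // descend to the first child
                continue;
            }
            // No children: climb until some ancestor has a next sibling.
            while (node != root && node.nextSibling() == null) {
                node = node.parent;
            }
            if (node == root) break;             // back at the start, done
            node = node.nextSibling();
        }
        return sb.toString();
    }
}
```

With a chain of a couple hundred thousand nested containers and a single data leaf at the bottom, this loop should return the leaf's data without growing the call stack, which is the property the recursive version lacks.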

Another approach could be something along these lines, using an explicit deque instead of recursion:

    public String data() {
		StringBuilder sb = StringUtil.borrowBuilder();

		var descendants = new ArrayDeque<>(childNodes);
		while (!descendants.isEmpty()) {
			var descendantNode = descendants.pollFirst();
			if (descendantNode instanceof DataNode) {
				DataNode data = (DataNode) descendantNode;
				sb.append(data.getWholeData());
			}
			else if (descendantNode instanceof Comment) {
				Comment comment = (Comment) descendantNode;
				sb.append(comment.getData());
			}
			else if (descendantNode instanceof Element) {
				Element element = (Element) descendantNode;
				// The first child must be visited next; if it has children of its own,
				// they are visited before its siblings, and so on (document order).
				// So push this element's children onto the front of the deque in reverse,
				// down to and including index 0.
				var childNodes = element.childNodes();
				for (int i = childNodes.size() - 1; i >= 0; i--) {
					descendants.addFirst(childNodes.get(i));
				}
			}
			else if (descendantNode instanceof CDataNode) {
				// This shouldn't really happen, because the HTML parser won't see the CDATA as anything special when parsing script,
				// but in case another type gets through.
				CDataNode cDataNode = (CDataNode) descendantNode;
				sb.append(cDataNode.getWholeText());
			}
		}
		return StringUtil.releaseBuilder(sb);
    }

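To sanity-check the ordering of the deque approach, here is a small self-contained comparison against the recursive version. `TreeNode` and `DequeDataSketch` are hypothetical stand-ins (not jsoup classes), and nodes are simplified to either a data leaf or a container. Note that the backward loop has to run down to index 0 inclusive; stopping at `i > 0` would drop the first child of every nested element.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in tree node: a leaf carries data, a container carries children.
class TreeNode {
    final String data;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String data, TreeNode... kids) {
        this.data = data;
        for (TreeNode kid : kids) children.add(kid);
    }
}

class DequeDataSketch {
    // Reference behaviour: plain recursion over the children.
    static String recursive(TreeNode node) {
        StringBuilder sb = new StringBuilder();
        for (TreeNode child : node.children) {
            if (child.data != null) sb.append(child.data);
            else sb.append(recursive(child));
        }
        return sb.toString();
    }

    // Same order, but with an explicit deque instead of the call stack.
    static String iterative(TreeNode node) {
        StringBuilder sb = new StringBuilder();
        Deque<TreeNode> pending = new ArrayDeque<>(node.children);
        while (!pending.isEmpty()) {
            TreeNode next = pending.pollFirst();
            if (next.data != null) {
                sb.append(next.data);
            } else {
                // Push children in reverse so that child 0 ends up at the front.
                for (int i = next.children.size() - 1; i >= 0; i--) {
                    pending.addFirst(next.children.get(i));
                }
            }
        }
        return sb.toString();
    }
}
```

For a tree such as `root -> [a, [b, c], d]`, both methods should produce the same string, with the nested children appearing between `a` and `d`.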

Labels: fixed