This repository was archived by the owner on Mar 24, 2025. It is now read-only.
Commit b79b1a9
Add missed other default case when parsing/inferring XML documents
This PR adds support for skipping multiple whitespace events around a comment.
This should have been added earlier but was missed. `XMLStreamConstants.COMMENT` is always skipped [here](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/parsers/StaxXmlParser.scala#L51-L52) and [here](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/InferSchema.scala#L85-L86), but a `COMMENT` can appear between two runs of whitespace.
In this case, `factory.setProperty(XMLInputFactory.IS_COALESCING, true)` does not coalesce the two whitespace runs.
In more detail, consider:
```xml
<a>
<!-- comment -->
<b>...</b>
</a>
```
Here, `<!-- comment -->` is surrounded by whitespace.
This produces the events below:
```bash
XMLStreamConstants.CHARACTERS # whitespace
XMLStreamConstants.COMMENT # comment
XMLStreamConstants.CHARACTERS # whitespace
XMLStreamConstants.START_ELEMENT # <b>
```
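The event sequence above can be reproduced with a small StAX sketch. This is an illustration only, not code from this commit: `EventDump`, `name`, and `eventNames` are hypothetical names, and the output depends on the StAX implementation in use (the JDK default is assumed here).

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical helper object, for illustration only.
object EventDump {
  // Maps a StAX event type code to a readable name.
  def name(eventType: Int): String = eventType match {
    case XMLStreamConstants.START_DOCUMENT => "START_DOCUMENT"
    case XMLStreamConstants.END_DOCUMENT   => "END_DOCUMENT"
    case XMLStreamConstants.START_ELEMENT  => "START_ELEMENT"
    case XMLStreamConstants.END_ELEMENT    => "END_ELEMENT"
    case XMLStreamConstants.CHARACTERS     => "CHARACTERS"
    case XMLStreamConstants.COMMENT        => "COMMENT"
    case other                             => s"OTHER($other)"
  }

  // Lists the events reported for `xml`, with coalescing enabled
  // as in the description above.
  def eventNames(xml: String): List[String] = {
    val factory = XMLInputFactory.newInstance()
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLStreamReader(new StringReader(xml))
    // A stream reader starts positioned on its first event.
    val names = ListBuffer(name(reader.getEventType))
    while (reader.hasNext) names += name(reader.next())
    reader.close()
    names.toList
  }

  def main(args: Array[String]): Unit =
    eventNames("<a>\n<!-- comment -->\n<b>...</b>\n</a>").foreach(println)
}
```

Even with `IS_COALESCING` enabled, the comment sits between two separate `CHARACTERS` events, because coalescing only merges adjacent character data.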
The current code always filters out `XMLStreamConstants.COMMENT`, so it ends up with
```bash
XMLStreamConstants.CHARACTERS # whitespace
XMLStreamConstants.CHARACTERS # whitespace
XMLStreamConstants.START_ELEMENT # <b>
```
This does not happen in the normal case, where multiple consecutive `XMLStreamConstants.CHARACTERS` events are coalesced into a single one, as below:
```bash
XMLStreamConstants.CHARACTERS # whitespace
XMLStreamConstants.START_ELEMENT # <b>
```
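One way to handle the uncoalesced case is to keep skipping whitespace events until a non-whitespace event appears. The sketch below illustrates that idea under stated assumptions. It is not the actual change made in this commit, and `SkipWhitespace`, `commentFreeReader`, and `skipWhitespace` are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.stream.{EventFilter, XMLEventReader, XMLInputFactory, XMLStreamConstants}
import javax.xml.stream.events.XMLEvent

// Hypothetical helper object, for illustration only.
object SkipWhitespace {
  // Builds an event reader that drops COMMENT events, similar in spirit
  // to the comment filtering described above.
  def commentFreeReader(xml: String): XMLEventReader = {
    val factory = XMLInputFactory.newInstance()
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val filter = new EventFilter {
      override def accept(event: XMLEvent): Boolean =
        event.getEventType != XMLStreamConstants.COMMENT
    }
    val base = factory.createXMLEventReader(new StringReader(xml))
    factory.createFilteredReader(base, filter)
  }

  private def isWhitespace(event: XMLEvent): Boolean =
    event.isCharacters && event.asCharacters.isWhiteSpace

  // Consumes every consecutive whitespace-only event, not just the first,
  // so two whitespace runs left behind by a filtered comment are both skipped.
  def skipWhitespace(reader: XMLEventReader): Unit =
    while (reader.hasNext && isWhitespace(reader.peek())) reader.nextEvent()
}
```

After consuming the opening `<a>` event, calling `SkipWhitespace.skipWhitespace(reader)` should leave the reader positioned just before `<b>`, regardless of how many whitespace events the filtered comment left behind.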
Author: hyukjinkwon <[email protected]>
Closes #166 from HyukjinKwon/missed-other-cases.
1 parent 617a31e commit b79b1a9
File tree
3 files changed, +6 -0 lines changed
- src
  - main/scala/com/databricks/spark/xml
    - parsers
    - util
  - test/resources