gosax is a Go library for XML SAX (Simple API for XML) parsing, supporting read-only functionality. This library is
designed for efficient and memory-conscious XML parsing, drawing inspiration from various sources to provide a
performant parser.
- Read-only SAX parsing: Stream and process XML documents without loading the entire document into memory.
- Efficient parsing: Utilizes techniques inspired by quick-xml and pkg/json for high performance.
- SWAR (SIMD Within A Register): Optimizations for fast text processing, inspired by memchr.
- Compatibility with encoding/xml: Includes utility functions to bridge gosax types with encoding/xml types, facilitating easy integration with existing code that uses the standard library.
Benchmark:

```
goos: darwin
goarch: arm64
pkg: github.com/orisano/gosax
BenchmarkReader_Event-12 5 211845800 ns/op 1103.30 MB/s 2097606 B/op 6 allocs/op
```
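Go's benchmark tool computes MB/s from the bytes processed per operation (set via b.SetBytes) and the time per operation, with 1 MB = 10^6 bytes. Assuming this benchmark sets that byte count to the size of its input document, the figures above imply roughly 234 MB of XML per operation, against which the 2,097,606 B allocated per operation is under 1%:

$$
\text{bytes/op} \approx 1103.30 \times 10^{6}\ \text{B/s} \times 211845800 \times 10^{-9}\ \text{s} \approx 2.337 \times 10^{8}\ \text{B}
$$

$$
\frac{2097606\ \text{B}}{2.337 \times 10^{8}\ \text{B}} \approx 0.9\%
$$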
To install gosax, use go get:

```shell
go get github.com/orisano/gosax
```

Here is a basic example of how to use gosax to parse an XML document:
```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/orisano/gosax"
)

func main() {
	xmlData := `<root><element>Value</element></root>`
	reader := strings.NewReader(xmlData)

	r := gosax.NewReader(reader)
	for {
		// Event returns the next event from the stream.
		e, err := r.Event()
		if err != nil {
			log.Fatal(err)
		}
		// EventEOF marks the end of the document.
		if e.Type() == gosax.EventEOF {
			break
		}
		// e.Bytes holds the raw bytes of the event, such as "<root>" or "Value".
		fmt.Println(string(e.Bytes))
	}
	// Output:
	// <root>
	// <element>
	// Value
	// </element>
	// </root>
}
```

Important Note for encoding/xml Users:
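When migrating from encoding/xml to gosax, note that self-closing tags are handled differently: encoding/xml reports a self-closing tag such as <item/> as a StartElement immediately followed by an EndElement. To mimic encoding/xml behavior, set gosax.Reader.EmitSelfClosingTag to true, so that self-closing tags are reported the way code written for encoding/xml expects.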
If you are used to encoding/xml's Token method, start with gosax.TokenE.
Note: gosax.TokenE and gosax.Token allocate memory, because they return tokens as interface values.
Before:

```go
var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	// ...
}
```

After:

```go
var dec *gosax.Reader
for {
	tok, err := gosax.TokenE(dec.Event())
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	// ...
}
```

xmlb is an extension of gosax that simplifies porting code written for encoding/xml. It offers a Token-based API similar to encoding/xml's, with higher performance for XML parsing and processing.
Before:

```go
var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	switch t := tok.(type) {
	case xml.StartElement:
		// ...
	case xml.CharData:
		// ...
	case xml.EndElement:
		// ...
	}
}
```

After:

```go
var dec *xmlb.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	switch tok.Type() {
	case xmlb.StartElement:
		t, _ := tok.StartElement()
		// ...
	case xmlb.CharData:
		t, _ := tok.CharData()
		// ...
	case xmlb.EndElement:
		t := tok.EndElement()
		// ...
	}
}
```

This library is licensed under the terms specified in the LICENSE file.
gosax is inspired by the following projects and resources:
- Dave Cheney's GopherCon SG 2023 Talk
- quick-xml
- memchr (SWAR part)
Contributions are welcome! Please fork the repository and submit pull requests.
For any questions or feedback, feel free to open an issue on the GitHub repository.