Async JSON stream reader for selective parsing of large payloads. This is the
standalone home of Extract's AsyncJsonStreamReader implementation.

- Stream through large JSON without deserializing the full payload.
- Selectively read keys, values, and arrays using a tokenized reader.
- Handles chunk boundaries and escaped strings correctly.
- Built on Tokio's AsyncRead.
- No unsafe code.
## Installation

```sh
cargo add asyncjsonstream
```

## Usage

```rust
use asyncjsonstream::AsyncJsonStreamReader;
use std::io::Cursor;

#[tokio::main]
async fn main() -> Result<(), asyncjsonstream::AsyncJsonStreamReaderError> {
    let data = r#"{"status":"success","results":[{"id":1},{"id":2}]}"#;
    let mut reader = AsyncJsonStreamReader::new(Cursor::new(data.as_bytes().to_vec()));

    while let Some(key) = reader.next_object_entry().await? {
        match key.as_str() {
            "status" => {
                let status = reader.read_string().await?;
                println!("status={status}");
            }
            "results" => {
                while reader.start_array_item().await? {
                    let obj = reader.deserialize_object().await?;
                    println!("id={}", obj["id"]);
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

## API overview

- Read object entries with `next_object_entry`.
- Skip values by calling `next_object_entry` again without consuming the value.
- Stream arrays with `start_array_item`.
- Parse string/number/bool values with `read_string`, `read_number`, `read_boolean`.
- Deserialize a sub-object with `deserialize_object`.
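Skipping an unread value means scanning past it at the token level rather than building any in-memory representation. As a rough illustration of the idea (a minimal, synchronous sketch, not the crate's actual internals), a skipper only needs to track bracket depth and string/escape state:

```rust
/// Advance past one JSON value (string, object, or array) starting at
/// `start`, returning the index just past its end. Tracks nesting depth
/// and string/escape state so quotes and braces inside strings don't
/// terminate the scan early. Illustration only; not the crate's code.
fn skip_value(input: &[u8], start: usize) -> usize {
    let mut pos = start;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    loop {
        let b = input[pos];
        pos += 1;
        if in_string {
            if escaped {
                escaped = false; // byte after a backslash is consumed verbatim
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
                if depth == 0 {
                    return pos; // a bare string value ends at its closing quote
                }
            }
        } else {
            match b {
                b'"' => in_string = true,
                b'{' | b'[' => depth += 1,
                b'}' | b']' => {
                    depth -= 1;
                    if depth == 0 {
                        return pos; // matched the opening bracket of the value
                    }
                }
                _ => {}
            }
        }
    }
}

fn main() {
    // The "meta" value contains an escaped quote and a brace inside a
    // string; a naive scan for '}' or '"' would stop too early.
    let data = br#"{"meta":{"note":"a \"quoted\" } brace"},"id":7}"#;
    let value_start = data.iter().position(|&b| b == b':').unwrap() + 1;
    let end = skip_value(data, value_start);
    println!("skipped: {}", std::str::from_utf8(&data[value_start..end]).unwrap());
    println!("rest: {}", std::str::from_utf8(&data[end..]).unwrap());
}
```

Because the scan never allocates per value, skipping large unread fields costs only the bytes read, which is what makes the token-based `async-light` benchmark mode below so memory-light.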
## Errors

All fallible operations return `AsyncJsonStreamReaderError`:

- `Io` for reader failures
- `JsonError` for malformed JSON
- `UnexpectedToken` when the stream doesn't match the expected structure
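Callers typically match on the variant to distinguish transport problems from bad input. The sketch below shows that pattern against a hypothetical stand-in enum: only the three variant names come from the list above, and the payload shapes (`io::Error`, message strings, `expected`/`found` fields) are assumptions for illustration.

```rust
use std::fmt;
use std::io;

// Hypothetical stand-in for the crate's error type; variant names match
// the documented ones, but the payloads here are assumed.
#[derive(Debug)]
enum AsyncJsonStreamReaderError {
    Io(io::Error),
    JsonError(String),
    UnexpectedToken { expected: &'static str, found: String },
}

impl fmt::Display for AsyncJsonStreamReaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Io(e) => write!(f, "reader failure: {e}"),
            Self::JsonError(msg) => write!(f, "malformed JSON: {msg}"),
            Self::UnexpectedToken { expected, found } => {
                write!(f, "expected {expected}, found {found}")
            }
        }
    }
}

impl From<io::Error> for AsyncJsonStreamReaderError {
    fn from(e: io::Error) -> Self {
        Self::Io(e)
    }
}

fn main() {
    let err: AsyncJsonStreamReaderError =
        io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended").into();
    // I/O failures may be worth retrying; malformed JSON or a structural
    // mismatch means the document itself should be rejected.
    match &err {
        AsyncJsonStreamReaderError::Io(_) => println!("transient: {err}"),
        _ => println!("fatal: {err}"),
    }
}
```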
## MSRV

The minimum supported Rust version is 1.74.
## Benchmarks

The examples folder includes a generator and benchmark for a single large JSON object with a
`rows` array. The comparison highlights the memory savings when you stream and skip large
fields instead of deserializing full objects.

```sh
cargo run --release --example generate_big_object -- \
  --path /tmp/big_object.json \
  --target-bytes 5368709120 \
  --payload-bytes 1024

/usr/bin/time -l cargo run --release --example bench_big_object -- \
  --path /tmp/big_object.json --mode async

/usr/bin/time -l cargo run --release --example bench_big_object -- \
  --path /tmp/big_object.json --mode async-light

/usr/bin/time -l cargo run --release --example bench_big_object -- \
  --path /tmp/big_object.json --mode serde
```

`async` deserializes each row into a `serde_json::Value` (higher memory); `async-light` only
reads `id` and skips other fields using tokens (low memory).
| Mode | Rows | Elapsed (ms) | Max RSS (bytes) | Peak footprint (bytes) |
|---|---|---|---|---|
| async | 4,979,433 | 7,432 | 3,320,676,352 | 5,382,197,400 |
| async-light | 4,979,433 | 10,340 | 2,916,352 | 2,146,616 |
| serde | 4,979,433 | 6,662 | 10,902,372,352 | 14,253,713,704 |
Checksums matched across modes, confirming identical id aggregation.
## License

Licensed under either of:

- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)

at your option.