
@harshmotw-db (Contributor) commented Jun 25, 2025

What changes are proposed in this pull request?

This PR introduces a low-level Variant library exposing a json_to_variant function, which is similar to Spark's parse_json. The function is designed so that the caller owns the memory the output is written to: the caller implements VariantMemoryManager, providing the methods borrow_value_buffer, borrow_metadata_buffer, ensure_value_buffer_size, and ensure_metadata_buffer_size.
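To illustrate the caller-owned-memory design, here is a minimal sketch of what a caller-side implementation might look like. The method signatures (slice-returning borrow methods, Result<(), String> from the ensure methods), the VecMemoryManager type with its max_size cap, and the write_null helper are all hypothetical assumptions for illustration; the PR's actual definitions may differ.

```rust
/// Hypothetical shape of the caller-implemented memory manager.
/// The PR's real method names match these, but the signatures are assumed.
pub trait VariantMemoryManager {
    fn borrow_value_buffer(&mut self) -> &mut [u8];
    fn borrow_metadata_buffer(&mut self) -> &mut [u8];
    fn ensure_value_buffer_size(&mut self, size: usize) -> Result<(), String>;
    fn ensure_metadata_buffer_size(&mut self, size: usize) -> Result<(), String>;
}

/// Hypothetical Vec-backed manager that refuses to grow past `max_size` bytes.
pub struct VecMemoryManager {
    value: Vec<u8>,
    metadata: Vec<u8>,
    max_size: usize,
}

impl VecMemoryManager {
    pub fn new(max_size: usize) -> Self {
        Self { value: Vec::new(), metadata: Vec::new(), max_size }
    }
}

/// Grow `buf` to at least `size` bytes, or fail if that exceeds the cap.
fn grow_to(buf: &mut Vec<u8>, size: usize, max_size: usize) -> Result<(), String> {
    if size > max_size {
        return Err(format!("requested {size} bytes exceeds limit of {max_size}"));
    }
    if buf.len() < size {
        buf.resize(size, 0);
    }
    Ok(())
}

impl VariantMemoryManager for VecMemoryManager {
    fn borrow_value_buffer(&mut self) -> &mut [u8] {
        &mut self.value
    }
    fn borrow_metadata_buffer(&mut self) -> &mut [u8] {
        &mut self.metadata
    }
    fn ensure_value_buffer_size(&mut self, size: usize) -> Result<(), String> {
        grow_to(&mut self.value, size, self.max_size)
    }
    fn ensure_metadata_buffer_size(&mut self, size: usize) -> Result<(), String> {
        grow_to(&mut self.metadata, size, self.max_size)
    }
}

/// Hypothetical encoder step: write the variant `null` through the manager.
/// Metadata is an empty dictionary (version 1, 1-byte offsets, size 0,
/// offset[0] = 0); the value is a single primitive header for null (0x00).
/// Returns (metadata_len, value_len).
fn write_null<M: VariantMemoryManager>(mm: &mut M) -> Result<(usize, usize), String> {
    mm.ensure_metadata_buffer_size(3)?;
    mm.borrow_metadata_buffer()[..3].copy_from_slice(&[0x01, 0x00, 0x00]);
    mm.ensure_value_buffer_size(1)?;
    mm.borrow_value_buffer()[0] = 0x00;
    Ok((3, 1))
}
```

The encoder only ever asks for "at least N bytes" before writing, so the caller decides how to allocate and where to enforce size limits; in this sketch an undersized cap (for example VecMemoryManager::new(2)) makes write_null return an error instead of allocating.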

How was this change tested?

Several unit tests compare the constructed variants against manually written raw bytes. Implementing variant_to_json would increase coverage and make further tests easier to write. The PR already contains many tests, and more will be added.
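For reference, the raw bytes such tests compare against follow the Variant binary encoding used by Spark and the Parquet Variant spec: each value starts with a header byte whose low 2 bits hold the basic type (0 = primitive) and whose upper 6 bits hold the primitive type id. The sketch below computes expected value bytes for a few JSON scalars. The expected_value helper is hypothetical, and it assumes integers are stored in the narrowest width that fits, mirroring what Spark's builder does; the PR's encoder may choose differently.

```rust
use std::convert::TryFrom;

/// Basic type stored in the low 2 bits of every value header.
const BASIC_TYPE_PRIMITIVE: u8 = 0;

/// Primitive type ids from the Variant encoding spec.
const TYPE_NULL: u8 = 0;
const TYPE_TRUE: u8 = 1;
const TYPE_FALSE: u8 = 2;
const TYPE_INT8: u8 = 3;
const TYPE_INT16: u8 = 4;
const TYPE_INT32: u8 = 5;
const TYPE_INT64: u8 = 6;

/// Header byte for a primitive value: type id in the upper 6 bits.
const fn primitive_header(type_id: u8) -> u8 {
    (type_id << 2) | BASIC_TYPE_PRIMITIVE
}

/// Hypothetical test helper: expected value bytes for a JSON scalar literal.
/// Returns None for inputs this sketch does not handle (floats, strings, ...).
fn expected_value(json: &str) -> Option<Vec<u8>> {
    let (type_id, payload): (u8, Vec<u8>) = match json {
        "null" => (TYPE_NULL, vec![]),
        "true" => (TYPE_TRUE, vec![]),
        "false" => (TYPE_FALSE, vec![]),
        _ => {
            let n: i64 = json.parse().ok()?;
            // Assumption: use the narrowest integer width that holds n,
            // with the payload stored little-endian after the header.
            if let Ok(v) = i8::try_from(n) {
                (TYPE_INT8, v.to_le_bytes().to_vec())
            } else if let Ok(v) = i16::try_from(n) {
                (TYPE_INT16, v.to_le_bytes().to_vec())
            } else if let Ok(v) = i32::try_from(n) {
                (TYPE_INT32, v.to_le_bytes().to_vec())
            } else {
                (TYPE_INT64, n.to_le_bytes().to_vec())
            }
        }
    };
    let mut out = vec![primitive_header(type_id)];
    out.extend_from_slice(&payload);
    Some(out)
}
```

Under these assumptions, true encodes as the single byte 0x04, and 300 encodes as an INT16 header 0x10 followed by the little-endian bytes 0x2C 0x01.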

TODO:

  1. Test UTF-8 strings with varying character widths.
  2. Test the errors produced when size limits are exceeded.
  3. Test variant objects more thoroughly: nesting, different offset sizes, is_large, keys in different languages, etc.
  4. Formalize errors; the errors currently produced by this library are somewhat rough.
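TODO item 1 matters because the short-string header stores the UTF-8 byte length, not the character count, so multi-byte characters change both the header value and where the 63-byte limit for the short form is reached. The encode_string helper below is a hypothetical sketch written from the Variant spec (short string for up to 63 bytes, otherwise primitive string type id 16 with a 4-byte little-endian length), not code from the PR; it shows the expected bytes a test could assert on.

```rust
/// Basic type for short strings, stored in the low 2 bits of the header.
const BASIC_TYPE_SHORT_STRING: u8 = 1;
/// Primitive type id for long strings (basic type 0).
const TYPE_STRING: u8 = 16;
/// Largest byte length that fits in the 6-bit short-string header.
const MAX_SHORT_STRING_BYTES: usize = 63;

/// Hypothetical helper: expected value bytes for a string.
/// Lengths above u32::MAX are out of scope for this sketch.
fn encode_string(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes(); // UTF-8 bytes, not chars
    let mut out = Vec::with_capacity(bytes.len() + 5);
    if bytes.len() <= MAX_SHORT_STRING_BYTES {
        // Short string: byte length in the upper 6 bits of the header.
        out.push(((bytes.len() as u8) << 2) | BASIC_TYPE_SHORT_STRING);
    } else {
        // Long string: primitive header, then a 4-byte little-endian length.
        out.push(TYPE_STRING << 2);
        out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    }
    out.extend_from_slice(bytes);
    out
}
```

For example, "é" is one character but two UTF-8 bytes, so its expected encoding is 0x09 0xC3 0xA9 rather than a header claiming length 1; likewise 21 copies of "€" (3 bytes each) still fit the short form at 63 bytes, while 22 copies switch to the long form.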

@github-actions bot added the breaking-change label (Change that require a major version bump) Jun 25, 2025
@scovich (Collaborator) commented Jun 25, 2025

qq: How does this relate to the ongoing work to support variant in arrow-rs?
https://github.com/apache/arrow-rs/tree/main/parquet-variant
