@alamb alamb commented May 21, 2025

As @mapleFU pointed out, the binary value of primitive_int64 actually contains an int32, apparently because Spark truncates variant values to the smallest type that fits them.
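To illustrate the suspected behavior, here is a minimal sketch of "smallest type that fits" selection. The type IDs 3 to 6 are the Variant spec's integer type IDs; the helper name `smallest_int_type` and the selection rule are assumptions for illustration, not Spark's actual code.

```python
# Hypothetical illustration: choose the narrowest signed integer type
# that can hold a value. Type IDs follow the Variant spec (int8=3 .. int64=6).
# This is an assumption about the writer's behavior, not Spark's implementation.
INT_TYPES = [
    ("int8", 3, 8),
    ("int16", 4, 16),
    ("int32", 5, 32),
    ("int64", 6, 64),
]


def smallest_int_type(value: int) -> tuple[str, int]:
    """Return (type name, type ID) of the narrowest type that fits value."""
    for name, type_id, bits in INT_TYPES:
        lo = -(1 << (bits - 1))
        hi = (1 << (bits - 1)) - 1
        if lo <= value <= hi:
            return name, type_id
    raise ValueError(f"{value} does not fit in a signed 64-bit integer")


# A value meant to exercise int64 must exceed the int32 range,
# otherwise a writer that narrows like this would emit int32.
print(smallest_int_type(1234567890))           # fits in int32
print(smallest_int_type(1234567890123456789))  # needs int64
```

Under this assumption, an example value only ends up encoded as int64 if it lies outside the signed 32-bit range.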

Changes

  1. Update the regeneration script
  2. Rerun the script
  3. Check in the results

I also manually verified the output binary is correct:

$ xxd primitive_int64.value
00000000: 1815 81e9 7df4 1022 11                   ....}..".

The first byte is 0x18, which is 0b00011000:

  • low 2 bits are 0b00 => basic type 0 (Primitive)
  • high 6 bits are 0b000110 => primitive type 6

Per the Variant basic types table in the encoding grammar, primitive_type = 6 corresponds to int64:

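| Equivalence Class | Variant Physical Type | Type ID | Equivalent Parquet Type | Binary format |
|---|---|---|---|---|
| Exact Numeric | int8 | 3 | INT(8, signed) | 1 byte |
| Exact Numeric | int16 | 4 | INT(16, signed) | 2 byte little-endian |
| Exact Numeric | int32 | 5 | INT(32, signed) | 4 byte little-endian |
| Exact Numeric | int64 | 6 | INT(64, signed) | 8 byte little-endian |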
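The same check can be scripted. Below is a minimal sketch that decodes the nine bytes from the xxd output above: it splits the header byte into its basic type and primitive type, then reads the remaining 8 bytes as a little-endian signed 64-bit integer. The variable names are illustrative only.

```python
import struct

# The 9 bytes shown by `xxd primitive_int64.value`
data = bytes.fromhex("181581e97df4102211")

header = data[0]
basic_type = header & 0b11    # low 2 bits: 0 means Primitive
primitive_type = header >> 2  # high 6 bits: 6 means int64

# int64 payload: 8 bytes, little-endian, signed
(value,) = struct.unpack("<q", data[1:9])

print(basic_type, primitive_type, value)
# prints: 0 6 1234567890123456789
```

The decoded value, 1234567890123456789, is outside the signed 32-bit range, so it cannot be narrowed to int32.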

@alamb alamb changed the title from "Fix Variant int64 example" to "Fix Variant int64 example binary data" May 21, 2025

@Fokko Fokko left a comment

Good catch @mapleFU, and thanks for the fix @alamb 🙌

@mapleFU mapleFU merged commit 107b366 into apache:master May 22, 2025
@alamb alamb deleted the alamb/fix_primitive_int64 branch May 22, 2025 18:25

Development

Successfully merging this pull request may close these issues.

primitive_int64.value maybe an int32 type
