Commit 694d585

sboshra authored and kundadebdatta committed
[Internal] JSON Binary Encoding: Adds support for encoding uniform arrays (#4866)
## Description

Adds full end-to-end support for writing and reading binary-encoded uniform number arrays, as well as nested arrays of uniform number arrays.

**Uniform Number Arrays**

A uniform number array is a JSON array in which all items share the same numeric type. The encoding supports the following numeric types:

- **Int8**: Signed 1-byte integer (-128 to 127)
- **UInt8**: Unsigned 1-byte integer (0 to 255)
- **Int16**: Signed 2-byte integer
- **Int32**: Signed 4-byte integer
- **Int64**: Signed 8-byte integer
- **Float16**: 2-byte floating-point value (currently unsupported)
- **Float32**: 4-byte floating-point value
- **Float64**: 8-byte floating-point value

Uniform number arrays are represented by these new type markers:

- **ArrNumC1**: Uniform number array with a 1-byte item count
- **ArrNumC2**: Uniform number array with a 2-byte item count

Both type markers are encoded as follows:

`| Type marker | Item type marker | Item count |`

To maintain backward compatibility, writing uniform number arrays is controlled by the `EnableNumberArrays` write option. When it is enabled, the writer checks at the end of writing an array whether all values are numeric. If they are, it identifies the smallest numeric type that fits all values and compares the encoded length of the uniform number array with the encoded length of the regular array. If the new length is less than or equal to the old one, the array is converted to a uniform number array.

**Arrays of Uniform Number Arrays**

This enhancement allows multiple uniform number arrays that share the same underlying numeric type and item count to be encoded as a single contiguous array of numbers. The items of all arrays are preceded by a prefix indicating the common array encoding and the number of encoded arrays.

Arrays of uniform number arrays are represented by these two new type markers:

- **ArrArrNumC1C1**: Array with a 1-byte item count whose items are common uniform number arrays with a 1-byte item count.
- **ArrArrNumC2C2**: Array with a 2-byte item count whose items are common uniform number arrays with a 2-byte item count.

Both type markers are encoded as follows:

`| Type marker | Array type marker | Number type marker | Number item count | Array item count |`

As with uniform number arrays, arrays of uniform number arrays are written only when the `EnableNumberArrays` write option is specified. This ensures backward compatibility with readers and navigators that do not yet support this encoding.

**JSON Serialization Testing**

- Introduced a new set of tests for both uniform number arrays and nested arrays of uniform number arrays.
- Enhanced the `JsonToken` class to support representation of uniform number array tokens.
- Updated `JsonWriterTest` to include additional validation. It now checks the expected output and also verifies round-trip consistency across different formats and write options for all three rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON Reader - Write Current Token.

## Type of change

- [ ] New feature (non-breaking change which adds functionality)
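The type-selection and length-comparison step described under **Uniform Number Arrays** can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `UniformItemType`, `UniformArraySketch`, `SmallestFit`, and `EncodedLength` are hypothetical names, the preference order among types is an assumption, and the prefix size assumes that each marker in `| Type marker | Item type marker | Item count |` occupies one byte.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum for illustration; Float16 is omitted because the
// description marks it as currently unsupported.
public enum UniformItemType { Int8, UInt8, Int16, Int32, Int64, Float32, Float64 }

public static class UniformArraySketch
{
    // Returns the narrowest numeric type that represents every value exactly.
    // Assumes finite JSON numbers (no NaN or infinity).
    public static UniformItemType SmallestFit(IReadOnlyList<double> values)
    {
        bool allIntegral = true;
        bool allFloat32 = true;
        double min = 0, max = 0; // 0 fits every type, so it is a safe starting range

        foreach (double v in values)
        {
            allIntegral &= v == Math.Truncate(v);
            allFloat32 &= (double)(float)v == v; // survives a round trip through float
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }

        if (allIntegral)
        {
            if (min >= sbyte.MinValue && max <= sbyte.MaxValue) return UniformItemType.Int8;
            if (min >= byte.MinValue && max <= byte.MaxValue) return UniformItemType.UInt8;
            if (min >= short.MinValue && max <= short.MaxValue) return UniformItemType.Int16;
            if (min >= int.MinValue && max <= int.MaxValue) return UniformItemType.Int32;
            if (min >= -9223372036854775808.0 && max < 9223372036854775808.0) return UniformItemType.Int64;
        }

        return allFloat32 ? UniformItemType.Float32 : UniformItemType.Float64;
    }

    // Item sizes in bytes, as listed in the description.
    public static int ItemSize(UniformItemType type)
    {
        switch (type)
        {
            case UniformItemType.Int8:
            case UniformItemType.UInt8:
                return 1;
            case UniformItemType.Int16:
                return 2;
            case UniformItemType.Int32:
            case UniformItemType.Float32:
                return 4;
            default:
                return 8; // Int64 and Float64
        }
    }

    // Encoded length of a uniform number array under the one-byte-marker assumption:
    // type marker + item type marker + 1- or 2-byte item count + packed items.
    public static int EncodedLength(UniformItemType type, int itemCount)
    {
        if (itemCount < 0 || itemCount > ushort.MaxValue)
        {
            throw new ArgumentOutOfRangeException(nameof(itemCount));
        }

        int countBytes = itemCount <= byte.MaxValue ? 1 : 2; // ArrNumC1 vs. ArrNumC2
        return 2 + countBytes + (itemCount * ItemSize(type));
    }
}
```

A writer following the description would compare `EncodedLength(SmallestFit(values), values.Count)` with the length of the regular array encoding and convert only when the uniform form is no longer than the regular one.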
1 parent d41592f commit 694d585

22 files changed

Lines changed: 9550 additions & 1776 deletions

Microsoft.Azure.Cosmos/src/Json/IJsonBinaryWriterExtensions.cs

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ internal interface IJsonBinaryWriterExtensions : IJsonWriter
     {
         void WriteRawJsonValue(
             ReadOnlyMemory<byte> rootBuffer,
-            ReadOnlyMemory<byte> rawJsonValue,
-            bool isRootNode,
+            int valueOffset,
+            JsonBinaryEncoding.UniformArrayInfo externalArrayInfo,
             bool isFieldName);
     }
 }

Microsoft.Azure.Cosmos/src/Json/IJsonWriter.cs

Lines changed: 52 additions & 1 deletion
@@ -4,10 +4,11 @@
 namespace Microsoft.Azure.Cosmos.Json
 {
     using System;
+    using System.Collections.Generic;
     using Microsoft.Azure.Cosmos.Core.Utf8;

     /// <summary>
-    /// Interface for all JsonWriters that know how to write jsons of a specific serialization format.
+    /// Common interface for all JSON writers that can write JSON in a specific serialization format.
     /// </summary>
 #if INTERNAL
     public
@@ -87,6 +88,54 @@ interface IJsonWriter
         /// </summary>
         void WriteNullValue();

+        #region Number Arrays
+
+        /// <summary>
+        /// Writes an array of 8-bit unsigned integer values.
+        /// </summary>
+        /// <param name="values">The array of 8-bit unsigned integer values to write.</param>
+        void WriteNumberArray(IReadOnlyList<byte> values);
+
+        /// <summary>
+        /// Writes an array of 8-bit signed integer values.
+        /// </summary>
+        /// <param name="values">The array of 8-bit signed integer values to write.</param>
+        void WriteNumberArray(IReadOnlyList<sbyte> values);
+
+        /// <summary>
+        /// Writes an array of 16-bit signed integer values.
+        /// </summary>
+        /// <param name="values">The array of 16-bit signed integer values to write.</param>
+        void WriteNumberArray(IReadOnlyList<short> values);
+
+        /// <summary>
+        /// Writes an array of 32-bit signed integer values.
+        /// </summary>
+        /// <param name="values">The array of 32-bit signed integer values to write.</param>
+        void WriteNumberArray(IReadOnlyList<int> values);
+
+        /// <summary>
+        /// Writes an array of 64-bit signed integer values.
+        /// </summary>
+        /// <param name="values">The array of 64-bit signed integer values to write.</param>
+        void WriteNumberArray(IReadOnlyList<long> values);
+
+        /// <summary>
+        /// Writes an array of single-precision floating-point numbers.
+        /// </summary>
+        /// <param name="values">The array of single-precision floating-point numbers to write.</param>
+        void WriteNumberArray(IReadOnlyList<float> values);
+
+        /// <summary>
+        /// Writes an array of double-precision floating-point numbers.
+        /// </summary>
+        /// <param name="values">The array of double-precision floating-point numbers to write.</param>
+        void WriteNumberArray(IReadOnlyList<double> values);
+
+        #endregion
+
+        #region Extended Types
+
         /// <summary>
         /// Writes a single signed byte integer to the internal buffer.
         /// </summary>
@@ -141,6 +190,8 @@ interface IJsonWriter
         /// <param name="value">The value of the bytes to write.</param>
         void WriteBinaryValue(ReadOnlySpan<byte> value);

+        #endregion
+
         /// <summary>
         /// Gets the result of the JsonWriter.
         /// </summary>

Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.Enumerator.cs

Lines changed: 66 additions & 70 deletions
@@ -13,57 +13,73 @@ internal static partial class JsonBinaryEncoding
     {
         public static class Enumerator
         {
-            public static IEnumerable<ReadOnlyMemory<byte>> GetArrayItems(ReadOnlyMemory<byte> buffer)
+            public static IEnumerable<ArrayItem> GetArrayItems(
+                ReadOnlyMemory<byte> rootBuffer,
+                int arrayOffset,
+                UniformArrayInfo externalArrayInfo)
             {
+                ReadOnlyMemory<byte> buffer = rootBuffer.Slice(arrayOffset);
                 byte typeMarker = buffer.Span[0];
-                if (!JsonBinaryEncoding.TypeMarker.IsArray(typeMarker))
+
+                UniformArrayInfo uniformArrayInfo;
+                if (externalArrayInfo != null)
                 {
-                    throw new JsonInvalidTokenException();
+                    uniformArrayInfo = externalArrayInfo.NestedArrayInfo;
+                }
+                else
+                {
+                    uniformArrayInfo = IsUniformArrayTypeMarker(typeMarker) ? GetUniformArrayInfo(buffer.Span) : null;
                 }

-                int firstArrayItemOffset = JsonBinaryEncoding.GetFirstValueOffset(typeMarker);
-                int arrayLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
-
-                // Scope to just the array
-                buffer = buffer.Slice(0, arrayLength);
-
-                // Seek to the first array item
-                buffer = buffer.Slice(firstArrayItemOffset);
-
-                while (buffer.Length != 0)
+                if (uniformArrayInfo != null)
                 {
-                    int arrayItemLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
-                    if (arrayItemLength > buffer.Length)
+                    int itemStartOffset = arrayOffset + uniformArrayInfo.PrefixSize;
+                    int itemEndOffset = itemStartOffset + (uniformArrayInfo.ItemSize * uniformArrayInfo.ItemCount);
+                    for (int offset = itemStartOffset; offset < itemEndOffset; offset += uniformArrayInfo.ItemSize)
+                    {
+                        yield return new ArrayItem(offset, uniformArrayInfo);
+                    }
+                }
+                else
+                {
+                    if (!TypeMarker.IsArray(typeMarker))
                     {
-                        // Array Item got cut off.
                         throw new JsonInvalidTokenException();
                     }

-                    // Create a buffer for that array item
-                    ReadOnlyMemory<byte> arrayItem = buffer.Slice(0, arrayItemLength);
-                    yield return arrayItem;
+                    int firstArrayItemOffset = JsonBinaryEncoding.GetFirstValueOffset(typeMarker);
+                    int arrayLength = JsonBinaryEncoding.GetValueLength(buffer.Span);

-                    // Slice off the array item
-                    buffer = buffer.Slice(arrayItemLength);
-                }
-            }
+                    // Scope to just the array
+                    buffer = buffer.Slice(0, arrayLength);

-            public static IEnumerable<Memory<byte>> GetMutableArrayItems(Memory<byte> buffer)
-            {
-                foreach (ReadOnlyMemory<byte> readOnlyArrayItem in Enumerator.GetArrayItems(buffer))
-                {
-                    if (!MemoryMarshal.TryGetArray(readOnlyArrayItem, out ArraySegment<byte> segment))
+                    // Seek to the first array item
+                    buffer = buffer.Slice(firstArrayItemOffset);
+
+                    while (buffer.Length != 0)
                     {
-                        throw new InvalidOperationException("failed to get array segment.");
-                    }
+                        int arrayItemLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
+                        if (arrayItemLength > buffer.Length)
+                        {
+                            // Array Item got cut off.
+                            throw new JsonInvalidTokenException();
+                        }

-                    yield return segment;
+                        yield return new ArrayItem(arrayOffset + (arrayLength - buffer.Length), null);
+
+                        // Slice off the array item
+                        buffer = buffer.Slice(arrayItemLength);
+                    }
                 }
             }

-            public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byte> buffer)
+            public static IEnumerable<ObjectProperty> GetObjectProperties(
+                ReadOnlyMemory<byte> rootBuffer,
+                int objectOffset)
             {
+                ReadOnlyMemory<byte> buffer = rootBuffer.Slice(objectOffset);
                 byte typeMarker = buffer.Span[0];
+
                 if (!JsonBinaryEncoding.TypeMarker.IsObject(typeMarker))
                 {
                     throw new JsonInvalidTokenException();
@@ -73,7 +89,7 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
                 int objectLength = JsonBinaryEncoding.GetValueLength(buffer.Span);

                 // Scope to just the array
-                buffer = buffer.Slice(0, (int)objectLength);
+                buffer = buffer.Slice(0, objectLength);

                 // Seek to the first object property
                 buffer = buffer.Slice(firstValueOffset);
@@ -85,7 +101,8 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
                         throw new JsonInvalidTokenException();
                     }

-                    ReadOnlyMemory<byte> name = buffer.Slice(0, nameNodeLength);
+                    int nameOffset = objectOffset + (objectLength - buffer.Length);
+
                     buffer = buffer.Slice(nameNodeLength);

                     int valueNodeLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
@@ -94,57 +111,36 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
                         throw new JsonInvalidTokenException();
                     }

-                    ReadOnlyMemory<byte> value = buffer.Slice(0, valueNodeLength);
-                    buffer = buffer.Slice(valueNodeLength);
-
-                    yield return new ObjectProperty(name, value);
-                }
-            }
-
-            public static IEnumerable<MutableObjectProperty> GetMutableObjectProperties(Memory<byte> buffer)
-            {
-                foreach (ObjectProperty objectProperty in GetObjectProperties(buffer))
-                {
-                    if (!MemoryMarshal.TryGetArray(objectProperty.Name, out ArraySegment<byte> nameSegment))
-                    {
-                        throw new InvalidOperationException("failed to get array segment.");
-                    }
+                    int valueOffset = objectOffset + (objectLength - buffer.Length);

-                    if (!MemoryMarshal.TryGetArray(objectProperty.Value, out ArraySegment<byte> valueSegment))
-                    {
-                        throw new InvalidOperationException("failed to get array segment.");
-                    }
+                    buffer = buffer.Slice(valueNodeLength);

-                    yield return new MutableObjectProperty(nameSegment, valueSegment);
+                    yield return new ObjectProperty(nameOffset, valueOffset);
                 }
             }

-            public readonly struct ObjectProperty
+            public readonly struct ArrayItem
             {
-                public ObjectProperty(
-                    ReadOnlyMemory<byte> name,
-                    ReadOnlyMemory<byte> value)
+                public ArrayItem(int offset, UniformArrayInfo externalArrayInfo)
                 {
-                    this.Name = name;
-                    this.Value = value;
+                    this.Offset = offset;
+                    this.ExternalArrayInfo = externalArrayInfo;
                 }

-                public ReadOnlyMemory<byte> Name { get; }
-                public ReadOnlyMemory<byte> Value { get; }
+                public int Offset { get; }
+                public UniformArrayInfo ExternalArrayInfo { get; }
             }

-            public readonly struct MutableObjectProperty
+            public readonly struct ObjectProperty
             {
-                public MutableObjectProperty(
-                    Memory<byte> name,
-                    Memory<byte> value)
+                public ObjectProperty(int nameOffset, int valueOffset)
                 {
-                    this.Name = name;
-                    this.Value = value;
+                    this.NameOffset = nameOffset;
+                    this.ValueOffset = valueOffset;
                 }

-                public Memory<byte> Name { get; }
-                public Memory<byte> Value { get; }
+                public int NameOffset { get; }
+                public int ValueOffset { get; }
             }
         }
     }
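The uniform-array branch of `GetArrayItems` in this file yields item offsets into the root buffer instead of per-item slices, since packed items carry no type marker of their own. The following is a minimal sketch of that idea under stated assumptions: `OffsetEnumerationSketch`, `ItemOffsets`, and `ReadInt16Item` are hypothetical names, the prefix size is passed in rather than derived from real type markers, and little-endian item storage is assumed rather than confirmed by the diff.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the SDK.
public static class OffsetEnumerationSketch
{
    // Yields the absolute offset of each fixed-size item, mirroring the loop
    // shape in the uniform-array branch: start after the prefix, step by item size.
    public static IEnumerable<int> ItemOffsets(int arrayOffset, int prefixSize, int itemSize, int itemCount)
    {
        int itemStartOffset = arrayOffset + prefixSize;
        int itemEndOffset = itemStartOffset + (itemSize * itemCount);
        for (int offset = itemStartOffset; offset < itemEndOffset; offset += itemSize)
        {
            yield return offset;
        }
    }

    // Reads one 2-byte item at the given offset. Little-endian order is an assumption.
    public static short ReadInt16Item(ReadOnlyMemory<byte> rootBuffer, int offset)
    {
        return BinaryPrimitives.ReadInt16LittleEndian(rootBuffer.Span.Slice(offset, sizeof(short)));
    }
}
```

Because every consumer resolves offsets against the same root buffer, the item type and size can be passed alongside the offset (as `ArrayItem.ExternalArrayInfo` does in the diff) rather than being re-read from the bytes.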

Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.NodeTypes.cs

Lines changed: 8 additions & 8 deletions
@@ -99,7 +99,7 @@ public static class NodeTypes
                 String, // StrR2 (Reference string of 2-byte offset)
                 String, // StrR3 (Reference string of 3-byte offset)
                 String, // StrR4 (Reference string of 4-byte offset)
-                Unknown, // <empty> 0xC7
+                Number, // NumUI64

                 // Number Values
                 Number, // NumUI8
@@ -109,7 +109,7 @@ public static class NodeTypes
                 Number, // NumDbl,
                 Float32, // Float32
                 Float64, // Float64
-                Unknown, // <empty> 0xCF
+                Unknown, // Float16 (No corresponding JsonNodeType at the moment)

                 // Other Value Types
                 Null, // Null
@@ -119,7 +119,7 @@ public static class NodeTypes
                 Unknown, // <empty> 0xD4
                 Unknown, // <empty> 0xD5
                 Unknown, // <empty> 0xD6
-                Unknown, // <empty> 0xD7
+                Unknown, // UInt8 (No corresponding JsonNodeType at the moment)

                 Int8, // Int8
                 Int16, // Int16
@@ -150,11 +150,11 @@ public static class NodeTypes
                 Object, // ObjLC2 (2-byte length and count)
                 Object, // ObjLC4 (4-byte length and count)

-                // Empty Range
-                Unknown, // <empty> 0xF0
-                Unknown, // <empty> 0xF1
-                Unknown, // <empty> 0xF2
-                Unknown, // <empty> 0xF3
+                // Array and Object Special Type Markers
+                Array, // ArrNumC1 Uniform number array of 1-byte item count
+                Array, // ArrNumC2 Uniform number array of 2-byte item count
+                Array, // Array of 1-byte item count of Uniform number array of 1-byte item count
+                Array, // Array of 2-byte item count of Uniform number array of 2-byte item count
                 Unknown, // <empty> 0xF4
                 Unknown, // <empty> 0xF5
                 Unknown, // <empty> 0xF6
