AVRO-4197 Fix schema bytes defaults by martin-augment · Pull Request #18 · martin-augment/avro

martin-augment · 2025-11-06T20:58:27Z

3543: To review by AI

Note

Ensures bytes default values are encoded as strings when building schemas and adds a unit test validating default byte[] handling.

Schema defaults handling:
- In Schema.Field constructors, route default values through JacksonUtils.toJsonNode(..., true) so bytes defaults are encoded for schema context.
Jackson utils:
- Add toJsonNode(object, boolean inSchemaContext) and propagate inSchemaContext through toJson(...).
- When inSchemaContext is true, serialize byte[] as ISO-8859-1 strings; otherwise keep binary.
Tests:
- Add TestSchema.byteArrayDefaultField to assert default bytes fields materialize correctly as ByteBuffer contents.

^{Written by Cursor Bugbot for commit bd96b04. This will update automatically on new commits. Configure here.}

coderabbitai · 2025-11-06T20:58:41Z

Walkthrough

Field constructors in Schema.java updated to pass a boolean parameter to JacksonUtils.toJsonNode. Two new toJsonNode method overloads added to JacksonUtils with schema context flag. Bytes serialization behavior modified based on schema context flag. Test coverage added for byte array default fields.

Changes

Cohort / File(s)	Summary
Field Default Value Handling `lang/java/avro/src/main/java/org/apache/avro/Schema.java`	Updated Field constructors to pass boolean parameter to JacksonUtils.toJsonNode for default value serialization
JSON Serialization Utilities `lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java`	Added two public toJsonNode method overloads; enhanced toJson with inSchemaContext flag; bytes serialized as ISO-8859-1 string when inSchemaContext is true, otherwise as binary data
Test Coverage `lang/java/avro/src/test/java/org/apache/avro/TestSchema.java`	Added byteArrayDefaultField test method; updated imports to include ByteBuffer and wildcard assertions import

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-3543-2025-11-06-20-58-24

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6398a1f and bd96b04.

📒 Files selected for processing (3)

lang/java/avro/src/main/java/org/apache/avro/Schema.java (2 hunks)
lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java (1 hunks)
lang/java/avro/src/test/java/org/apache/avro/TestSchema.java (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java (1)

lang/java/avro/src/main/java/org/apache/avro/JsonProperties.java (1)

JsonProperties (123-345)

lang/java/avro/src/main/java/org/apache/avro/Schema.java (1)

lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java (1)

JacksonUtils (43-220)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: claude-review
GitHub Check: codex

🔇 Additional comments (8)

lang/java/avro/src/test/java/org/apache/avro/TestSchema.java (3)
29-29: LGTM!

The ByteBuffer import is necessary for the new test method.

63-63: LGTM!

The wildcard import simplifies assertions usage in the test file.

236-243: Let me search for how Avro specifically handles byte array defaults in the build process:

Based on my verification, the concern about ByteBuffer.array() usage is theoretically valid but the test will function correctly as written. The pattern shown in Avro code handles ByteBuffer position/limit explicitly, which demonstrates this is a real consideration. However, GenericRecordBuilder fills in default values for fields if they are not specified, and when a byte array default is wrapped, it uses ByteBuffer.wrap() which starts at position 0 with full capacity.

The test will pass as-is because the ByteBuffer returned for a BYTES field with a simple byte[] default will have position=0 and limit=array.length. However, this remains a fragile assumption and best practice suggests explicit verification of buffer state.

Use ByteBuffer.array() with explicit position/limit handling to ensure robustness.

While the test currently works, make the assertion more defensive:
-    assertArrayEquals(((ByteBuffer) rec.get("bytesField")).array(), defaultBytes);
+    ByteBuffer buffer = (ByteBuffer) rec.get("bytesField");
+    byte[] actualBytes = new byte[buffer.remaining()];
+    buffer.duplicate().get(actualBytes);
+    assertArrayEquals(actualBytes, defaultBytes);
This ensures the test correctly extracts only the valid buffer content regardless of internal position/limit state.
lang/java/avro/src/main/java/org/apache/avro/Schema.java (2)

614-618: LGTM! Correct fix for byte array defaults in schemas.

The constructor now passes true for inSchemaContext to ensure byte arrays in default values are properly serialized as ISO-8859-1 encoded strings in the schema JSON representation.

624-628: LGTM! Consistent with the other Field constructor.

This constructor also correctly passes true for inSchemaContext when converting default values to JsonNode.

lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java (3)

60-62: LGTM! Backwards-compatible overload.

The new overload maintains existing behavior by defaulting to inSchemaContext=false, ensuring backwards compatibility for existing callers.

64-75: LGTM! Properly propagates the schema context flag.

The method signature now includes the inSchemaContext parameter and correctly passes it to the internal toJson method.

78-100: LGTM! Correct fix for bytes serialization in schema context.

The changes properly handle bytes based on context:

In schema context (inSchemaContext=true): bytes are encoded as ISO-8859-1 strings for JSON compatibility

Outside schema context: bytes are written as binary data

The flag is correctly propagated through recursive calls (lines 85, 91), ensuring consistent behavior throughout the object graph.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2025-11-06T21:00:10Z

Code Review - AVRO-4197 Fix schema bytes defaults

Thank you for this contribution! This PR addresses an important issue with how byte array default values are handled in schema field definitions. Here's my detailed review:

✅ Positive Aspects

Clear Problem Statement: The fix correctly identifies that byte arrays used as default values in schema fields need to be encoded as ISO-8859-1 strings in the schema context, matching the Avro specification.
Minimal API Surface Change: The addition of the inSchemaContext parameter to toJsonNode maintains backward compatibility by defaulting to false, ensuring existing callers are not affected.
Test Coverage: The new test byteArrayDefaultField() validates the core functionality by ensuring that byte array defaults materialize correctly as ByteBuffer contents.

🔍 Code Quality Observations

Schema.java (lines 616, 626)

Good: The changes correctly pass true for inSchemaContext when creating Field objects with default values.
Consistency: Both constructors are updated consistently.

JacksonUtils.java (lines 60-120)

Good: The overloaded method pattern is clean and maintains backward compatibility.
Implementation: The conditional logic at lines 95-100 correctly distinguishes between schema context (string encoding) and non-schema context (binary encoding).
Comment Quality: The inline comment at line 96 is helpful but could be more specific about why this is required per the Avro spec.

TestSchema.java (lines 236-243)

Test Quality: The test is straightforward and validates the expected behavior.
Potential Issue: ⚠️ Line 242 uses ((ByteBuffer) rec.get("bytesField")).array() which could fail if the ByteBuffer is read-only or doesn't have a backing array. Consider using:
```
ByteBuffer buffer = (ByteBuffer) rec.get("bytesField");
byte[] actual = new byte[buffer.remaining()];
buffer.get(actual);
assertArrayEquals(actual, defaultBytes);
```

🤔 Potential Issues & Concerns

SchemaBuilder Inconsistency: The SchemaBuilder class has its own private toJsonNode(Object o) method (line 2718) that already handles byte arrays correctly by converting them to ISO-8859-1 strings. This creates inconsistency:
- SchemaBuilder.toJsonNode() always encodes bytes as strings
- JacksonUtils.toJsonNode() now requires an inSchemaContext flag
Question: Should SchemaBuilder.toJsonNode() be refactored to use JacksonUtils.toJsonNode(object, true) to avoid duplication?
Nested Byte Arrays: The SchemaBuilder comment at line 2723 mentions "this does not handle the case of a default value with nested bytes". Does this PR handle nested byte arrays in maps/collections correctly? The recursive calls in toJson() properly propagate inSchemaContext, which suggests it should work, but this deserves explicit testing.
Missing Test Coverage:
- No test for nested byte arrays (e.g., byte[] inside a map or array default value)
- No test for ByteBuffer as a default value (only tests byte[])
- No test verifying that non-schema-context calls still produce binary nodes
TestJacksonUtils Impact: The existing test at line 71 expects BinaryNode.valueOf(new byte[] { 1, 2 }) from toJsonNode(new byte[] { 1, 2 }). Since the default is now inSchemaContext=false, this should still pass, but it would be good to add a complementary test showing that toJsonNode(new byte[] { 1, 2 }, true) produces a TextNode.

🔒 Security Considerations

No security concerns identified. The use of ISO-8859-1 encoding is appropriate for Avro byte string representation.

⚡ Performance Considerations

The changes add minimal overhead (one boolean parameter and conditional check). Performance impact is negligible.

📝 Recommendations

Add test for schema serialization: Verify that the schema itself serializes correctly with byte defaults:

String schemaJson = rSchema.toString();
Schema reparsed = new Schema.Parser().parse(schemaJson);
// verify the reparsed schema has the same default

Add test for nested bytes: Test default values like Map.of("key", new byte[]{1,2,3})
Consider refactoring SchemaBuilder: Investigate whether SchemaBuilder.toJsonNode() can be consolidated with JacksonUtils.toJsonNode()
Enhance inline documentation: Add a more detailed comment explaining the Avro specification requirement for byte encoding in schemas.
Fix the test robustness issue: Use the ByteBuffer reading pattern suggested above.

Summary

This is a solid fix that addresses a real issue with byte array defaults in schema fields. The implementation is clean and maintains backward compatibility. With the addition of a few more tests and minor improvements, this would be ready to merge.

Recommendation: Approve with minor changes suggested above.

Review generated with assistance from Claude Code

augmentcode

Review completed. 1 suggestions posted.
Comment augment review to trigger a new review.

augmentcode · 2025-11-06T21:01:38Z

lang/java/avro/src/test/java/org/apache/avro/TestSchema.java

+    Schema.Field field = new Schema.Field("bytesField", Schema.create(Schema.Type.BYTES), "my bytes field", defaultBytes);
+    Schema rSchema = Schema.createRecord("myRecord", "myRecord docs", "me", false, List.of(field));
+    GenericData.Record rec = new GenericRecordBuilder(rSchema).build();
+    assertArrayEquals(((ByteBuffer) rec.get("bytesField")).array(), defaultBytes);


Consider swapping the parameters to assertArrayEquals so it’s expected, actual; JUnit uses this order and it improves failure messages.

🤖 Was this useful? React with 👍 or 👎

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct that the arguments are swapped. In case of non equal values they will be reported wrongly by JUnit library.

github-actions · 2025-11-06T21:04:18Z

Findings

Major – .github/workflows/claude-code-review.yml#L1 (and the other two workflow files) are removed, which would disable our existing PR/issue automation. This isn’t mentioned in the PR and feels unrelated to the bytes-default bug fix, so it should be restored unless there’s an explicit decision to drop those integrations.

Everything else in the diff looks consistent with the goal of emitting ISO-8859-1 strings for schema-time byte defaults, and the new unit test covers the regression path.

Fix schema bytes defaults

bd96b04

github-actions bot added the Java label Nov 6, 2025

augmentcode bot reviewed Nov 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AVRO-4197 Fix schema bytes defaults#18

AVRO-4197 Fix schema bytes defaults#18
martin-augment wants to merge 1 commit intomainfrom
pr-3543-2025-11-06-20-58-24

martin-augment commented Nov 6, 2025 •

edited by cursor bot

Loading

Uh oh!

coderabbitai bot commented Nov 6, 2025 •

edited

Loading

Uh oh!

claude bot commented Nov 6, 2025

Uh oh!

augmentcode bot left a comment

Uh oh!

augmentcode bot Nov 6, 2025

Uh oh!

martin-augment Nov 6, 2025

Uh oh!

github-actions bot commented Nov 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

martin-augment commented Nov 6, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

claude bot commented Nov 6, 2025

Code Review - AVRO-4197 Fix schema bytes defaults

✅ Positive Aspects

🔍 Code Quality Observations

Schema.java (lines 616, 626)

JacksonUtils.java (lines 60-120)

TestSchema.java (lines 236-243)

🤔 Potential Issues & Concerns

🔒 Security Considerations

⚡ Performance Considerations

📝 Recommendations

Summary

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

martin-augment Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

martin-augment commented Nov 6, 2025 •

edited by cursor bot

Loading

coderabbitai bot commented Nov 6, 2025 •

edited

Loading