-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Adding try_append_value implementation to ByteViewBuilder
#8594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
8859ff7
Adding try_append_value implementation.
samueleresca bfe53a8
Merge branch 'main' into safer-appendvalue-bytes-view
samueleresca 3fa929c
fix: incorrect assertion in `BitChunks::new` (#8620)
rluvaton a4eb148
[Variant] impl `PartialEq` and `FromIterator<Option<..>>` for `Varian…
friendlymatthew 2f3f34a
Introduce a ThriftProtocolError to avoid allocating and formattings s…
jhorstmann a0e79a4
Updating.
samueleresca 89e40d5
Merge branch 'apache:main' into safer-appendvalue-bytes-view
samueleresca File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -306,15 +306,30 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> { | |
| /// - String length exceeds `u32::MAX` | ||
| #[inline] | ||
| pub fn append_value(&mut self, value: impl AsRef<T::Native>) { | ||
| self.try_append_value(value).unwrap() | ||
| } | ||
|
|
||
| /// Appends a value into the builder | ||
| /// | ||
| /// # Errors | ||
| /// | ||
| /// Returns an error if: | ||
| /// - String buffer count exceeds `u32::MAX` | ||
| /// - String length exceeds `u32::MAX` | ||
| #[inline] | ||
| pub fn try_append_value(&mut self, value: impl AsRef<T::Native>) -> Result<(), ArrowError> { | ||
| let v: &[u8] = value.as_ref().as_ref(); | ||
| let length: u32 = v.len().try_into().unwrap(); | ||
| let length: u32 = v.len().try_into().map_err(|_| { | ||
| ArrowError::InvalidArgumentError(format!("String length {} exceeds u32::MAX", v.len())) | ||
| })?; | ||
|
|
||
| if length <= MAX_INLINE_VIEW_LEN { | ||
| let mut view_buffer = [0; 16]; | ||
| view_buffer[0..4].copy_from_slice(&length.to_le_bytes()); | ||
| view_buffer[4..4 + v.len()].copy_from_slice(v); | ||
| self.views_buffer.push(u128::from_le_bytes(view_buffer)); | ||
| self.null_buffer_builder.append_non_null(); | ||
| return; | ||
| return Ok(()); | ||
| } | ||
|
|
||
| // Deduplication if: | ||
|
|
@@ -339,7 +354,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> { | |
| self.views_buffer.push(self.views_buffer[*idx]); | ||
| self.null_buffer_builder.append_non_null(); | ||
| self.string_tracker = Some((ht, hasher)); | ||
| return; | ||
| return Ok(()); | ||
| } | ||
| Entry::Vacant(vacant) => { | ||
| // o.w. we insert the (string hash -> view index) | ||
|
|
@@ -356,17 +371,41 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> { | |
| let to_reserve = v.len().max(self.block_size.next_size() as usize); | ||
| self.in_progress.reserve(to_reserve); | ||
| }; | ||
| let offset = self.in_progress.len() as u32; | ||
| let offset: u32 = self.in_progress.len().try_into().map_err(|_| { | ||
| ArrowError::InvalidArgumentError(format!( | ||
| "In-progress buffer length {} exceeds u32::MAX", | ||
| self.in_progress.len() | ||
| )) | ||
| })?; | ||
| self.in_progress.extend_from_slice(v); | ||
|
|
||
| let prefix = v | ||
| .get(0..4) | ||
| .and_then(|slice| slice.try_into().ok()) | ||
| .map(u32::from_le_bytes) | ||
| .ok_or_else(|| { | ||
| ArrowError::InvalidArgumentError( | ||
| "String must be at least 4 bytes for non-inline view".to_string(), | ||
|
||
| ) | ||
| })?; | ||
|
|
||
| let buffer_index: u32 = self.completed.len().try_into().map_err(|_| { | ||
| ArrowError::InvalidArgumentError(format!( | ||
| "Buffer count {} exceeds u32::MAX", | ||
| self.completed.len() | ||
| )) | ||
| })?; | ||
|
|
||
| let view = ByteView { | ||
| length, | ||
| prefix: u32::from_le_bytes(v[0..4].try_into().unwrap()), | ||
| buffer_index: self.completed.len() as u32, | ||
| prefix, | ||
| buffer_index, | ||
| offset, | ||
| }; | ||
| self.views_buffer.push(view.into()); | ||
| self.null_buffer_builder.append_non_null(); | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| /// Append an `Option` value into the builder | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the method can recover by starting a new in-progress buffer instead of returning an error here.
I am unsure if this error is even reachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a new buffer would be allocated in the line immediately above this. Maybe we should do a checked add in
let required_cap = self.in_progress.len() + v.len();🤔To error here, we would need a
usizethat doesn't fit into a u32.. I think all platforms we care about haveusizethat is at leastu32(aka 32-bit architectures)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be the opposite, a
usizein a 64-bit arch wouldn't fit a u32? Anyway, I will review and update these changes over the weekendThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right -- thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right. I wasn't able to trigger this. For the panic to verify, it would need to skip the
flush_in_progress()operation that happens when the buffer reaches the required capacity.On top of that, the
push_completedused by theflush_in_progressasserts on theblock.len()(see below). Therefore,let offset: u32 = self.in_progress.len()wouldn't be reached:arrow-rs/arrow-array/src/builder/generic_bytes_view_builder.rs
Lines 273 to 276 in d49f017
I'm proceeding by reverting this the
map_errin the offset initialization.