Merged
18 changes: 5 additions & 13 deletions parquet/src/file/serialized_reader.rs
@@ -397,9 +397,9 @@ pub(crate) fn decode_page(
}
let decompressed_size = uncompressed_page_size - offset;
let mut decompressed = Vec::with_capacity(uncompressed_page_size);
-     decompressed.extend_from_slice(&buffer.as_ref()[..offset]);
+     decompressed.extend_from_slice(&buffer[..offset]);
Contributor:

This seems an unrelated (but nice) cleanup.

While looking at this code, it seems like it always copies the compressed bytes, even when it then decompresses them immediately. I'll make a small PR to see if I can remove that unnecessary copy.
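The copy the comment describes can be illustrated with a toy model. Everything here is hypothetical (`decode_page_sketch` and `identity_decompress` are stand-ins, not the parquet crate's API, which works on `Bytes` and a real `Codec`): the uncompressed prefix of the page must be copied into the output buffer, but the compressed tail can be handed to the decompressor as a borrowed slice, so an intermediate copy of the compressed bytes is avoidable.

```rust
// Toy model of a page: an uncompressed prefix of `offset` bytes (e.g. level
// data) followed by a compressed tail.
fn decode_page_sketch(buffer: &[u8], offset: usize, uncompressed_size: usize) -> Vec<u8> {
    let mut decompressed = Vec::with_capacity(uncompressed_size);
    // Only the uncompressed prefix needs to be copied into the output.
    decompressed.extend_from_slice(&buffer[..offset]);
    // The compressed tail is decompressed straight from the borrowed input
    // slice -- no intermediate copy of the compressed bytes.
    identity_decompress(&buffer[offset..], &mut decompressed);
    decompressed
}

// Stand-in "codec" (identity) so the sketch runs without the parquet crate.
fn identity_decompress(compressed: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(compressed);
}

fn main() {
    // Prefix [1, 2] is copied; tail [3, 4, 5] flows through the stand-in codec.
    let page = decode_page_sketch(&[1, 2, 3, 4, 5], 2, 5);
    assert_eq!(page, [1, 2, 3, 4, 5]);
    println!("ok");
}
```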

if decompressed_size > 0 {
-     let compressed = &buffer.as_ref()[offset..];
+     let compressed = &buffer[offset..];
decompressor.decompress(compressed, &mut decompressed, Some(decompressed_size))?;
}

@@ -897,31 +897,23 @@ impl<R: ChunkReader> PageReader for SerializedPageReader<R> {
*remaining,
)?;
let data_len = header.compressed_page_size as usize;
+ let data_start = *offset;
*offset += data_len as u64;
*remaining -= data_len as u64;

if header.r#type == PageType::INDEX_PAGE {
continue;
}

- let mut buffer = Vec::with_capacity(data_len);
- let read = read.take(data_len as u64).read_to_end(&mut buffer)?;
-
- if read != data_len {
-     return Err(eof_err!(
-         "Expected to read {} bytes of page, read only {}",
-         data_len,
-         read
-     ));
- }
+ let buffer = self.reader.get_bytes(data_start, data_len)?;

Contributor:

I confirm on review this can potentially avoid a copy if the underlying reader is already Bytes.
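Why a `Bytes`-backed reader can serve `get_bytes` without copying can be sketched with a hypothetical stand-in type (the real code uses the `bytes` crate's `Bytes` behind the `ChunkReader` trait; `SharedBytes` below only models the idea): the returned handle shares the underlying allocation, so "reading" a page is pointer arithmetic plus a refcount bump rather than a byte copy.

```rust
use std::sync::Arc;

// Hypothetical in-memory reader modeling the zero-copy path.
struct SharedBytes {
    data: Arc<Vec<u8>>,
    start: usize,
    len: usize,
}

impl SharedBytes {
    fn from_vec(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedBytes { data: Arc::new(data), start: 0, len }
    }

    // Analogue of ChunkReader::get_bytes(start, length): O(1), bumps a
    // refcount, copies no page bytes.
    fn get_bytes(&self, start: usize, length: usize) -> SharedBytes {
        assert!(start + length <= self.len, "range out of bounds");
        SharedBytes {
            data: Arc::clone(&self.data),
            start: self.start + start,
            len: length,
        }
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.start..self.start + self.len]
    }
}

fn main() {
    let file = SharedBytes::from_vec(vec![10, 20, 30, 40, 50]);
    let page = file.get_bytes(1, 3);
    // Two handles, one allocation -- no page bytes were copied.
    assert_eq!(Arc::strong_count(&file.data), 2);
    assert_eq!(page.as_slice(), &[20, 30, 40][..]);
    println!("ok");
}
```

By contrast, a reader backed by a `File` still has to materialize the range in memory, which is why the comment hedges with "if the underlying reader is already Bytes".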


let buffer =
self.context
.decrypt_page_data(buffer, *page_index, *require_dictionary)?;

let page = decode_page(
header,
-     Bytes::from(buffer),
+     buffer,
self.physical_type,
self.decompressor.as_mut(),
)?;