@@ -89,38 +89,38 @@ more pages.
 This file and the [Thrift definition](src/main/thrift/parquet.thrift) should be read together to understand the format.
 
     4-byte magic number "PAR1"
-    <Column 1 Chunk 1 + Column Metadata>
-    <Column 2 Chunk 1 + Column Metadata>
+    <Column 1 Chunk 1>
+    <Column 2 Chunk 1>
     ...
-    <Column N Chunk 1 + Column Metadata>
-    <Column 1 Chunk 2 + Column Metadata>
-    <Column 2 Chunk 2 + Column Metadata>
+    <Column N Chunk 1>
+    <Column 1 Chunk 2>
+    <Column 2 Chunk 2>
     ...
-    <Column N Chunk 2 + Column Metadata>
+    <Column N Chunk 2>
     ...
-    <Column 1 Chunk M + Column Metadata>
-    <Column 2 Chunk M + Column Metadata>
+    <Column 1 Chunk M>
+    <Column 2 Chunk M>
     ...
-    <Column N Chunk M + Column Metadata>
+    <Column N Chunk M>
     File Metadata
     4-byte length in bytes of file metadata (little endian)
     4-byte magic number "PAR1"
 
 In the above example, there are N columns in this table, split into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
 start locations. More details on what is contained in the metadata can be found
 in the Thrift definition.
 
-Metadata is written after the data to allow for single pass writing.
+File Metadata is written after the data to allow for single pass writing.
 
 Readers are expected to first read the file metadata to find all the column
 chunks they are interested in. The columns chunks should then be read sequentially.
 
 ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
 
 ## Metadata
-There are three types of metadata: file metadata, column (chunk) metadata and page
-header metadata. All thrift structures are serialized using the TCompactProtocol.
+There are two types of metadata: file metadata and page header metadata. All thrift structures
+are serialized using the TCompactProtocol.
 
 ![Metadata diagram](https://github.com/apache/parquet-format/raw/master/doc/images/FileFormat.gif)
 
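
As a rough illustration of the footer layout this hunk documents (a leading and a trailing `PAR1` magic, with the 4-byte little-endian length of the file metadata sitting just before the trailing magic), a reader could locate the serialized file metadata along the lines below. This is a minimal Python sketch, not code from the repository: `locate_file_metadata` is a hypothetical helper, it works on an in-memory byte string for simplicity, and decoding the Thrift structure itself (TCompactProtocol) is deliberately left out.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # 4-byte magic number at both ends of the file


def locate_file_metadata(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the serialized file metadata.

    Hypothetical helper: `data` is assumed to hold the whole file image.
    """
    # Smallest possible file: magic + 4-byte length + magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic number")

    # The 4 bytes before the trailing magic hold the metadata length,
    # stored as a little-endian unsigned integer.
    (meta_len,) = struct.unpack("<I", data[-8:-4])

    # The metadata ends right before the length field.
    meta_start = len(data) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("metadata length exceeds file size")
    return meta_start, meta_len
```

Because the file metadata is written last, a reader following this sketch only needs the final 8 bytes to find it; the column chunk locations it contains can then drive sequential reads of the chunks of interest.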