@wangkuiyi
Collaborator

No description provided.

helinwang previously approved these changes Apr 23, 2018
Contributor

@helinwang helinwang left a comment

LGTM!

cs2be previously approved these changes Apr 23, 2018
Contributor

@cs2be cs2be left a comment

Otherwise LGTM


## Fault-tolerant Writing

For the initial design purpose of ReocrdIO within Google, which was logging, RecordIO groups record into *chunks*, whose header contains an MD5 hash of the chunk. A process that writes logs is supposed to call the Writer interface to add records. Once the writer accumulates a handful of them, it groups a chunk, put the MD5 into the chunk header, and appends the chunk to the file. In the case that the process crashes unexpectedly, the leftover could be that the last chunk in the file was half-written. This doesn't prevent the process, after restarted, continue writing to the same RecordIO file, because the reader will be able to identify incomplete chunks and skip them.
Contributor

"ReocrdIO" is misspelled; it should be "RecordIO".

Collaborator Author

Done

@@ -0,0 +1,13 @@
## Background

RecordIO is a file format as a container of records. This package is a C++ implementation of https://github.com/paddlepaddle/recordio, which originates from https://github.com/wangkuiyi/recordio.
Contributor

Maybe reword to "The RecordIO file format is a container for records."

Collaborator Author

Done


## Fault-tolerant Writing

For the initial design purpose of ReocrdIO within Google, which was logging, RecordIO groups record into *chunks*, whose header contains an MD5 hash of the chunk. A process that writes logs is supposed to call the Writer interface to add records. Once the writer accumulates a handful of them, it groups a chunk, put the MD5 into the chunk header, and appends the chunk to the file. In the case that the process crashes unexpectedly, the leftover could be that the last chunk in the file was half-written. This doesn't prevent the process, after restarted, continue writing to the same RecordIO file, because the reader will be able to identify incomplete chunks and skip them.
Contributor

"In the case that the process crashes unexpectedly, the leftover could be that the last chunk in the file was half-written. This doesn't prevent the process, after restarted, continue writing to the same RecordIO file, because the reader will be able to identify incomplete chunks and skip them."

Could be reworded to something like:

"In the event the process crashes unexpectedly, the last chunk in the RecordIO file could be incomplete or corrupt. The RecordIO reader is able to recover from these errors when the process restarts by identifying incomplete chunks and skipping over them."

Collaborator Author

Done
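
For readers following this thread, the behavior described in the "Fault-tolerant Writing" paragraph can be sketched roughly as below. This is a minimal Python illustration, not the package's C++ code. Only the idea of storing an MD5 of the chunk in its header comes from the text; the byte layout, the `MAGIC` marker used to find the next chunk after a damaged one, and the names `encode_chunk`, `decode_records`, and `read_records` are all assumptions made for the example.

```python
import hashlib
import struct

# Hypothetical chunk layout, for illustration only (not the real RecordIO format):
#   4-byte magic | 16-byte MD5 of the payload | 4-byte payload length | payload
# The payload is a sequence of (4-byte length, bytes) records.
MAGIC = b"RIO1"
HEADER = struct.Struct("<4s16sI")


def encode_chunk(records):
    """Group records into one chunk and put the payload's MD5 in the header."""
    payload = b"".join(struct.pack("<I", len(r)) + r for r in records)
    digest = hashlib.md5(payload).digest()
    return HEADER.pack(MAGIC, digest, len(payload)) + payload


def decode_records(payload):
    """Split a verified payload back into its records."""
    offset = 0
    while offset < len(payload):
        (n,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        yield payload[offset:offset + n]
        offset += n


def read_records(data):
    """Yield records from intact chunks, skipping half-written or corrupt ones."""
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(data):
            return  # no further complete header in the file
        _, digest, size = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + size]
        if len(payload) == size and hashlib.md5(payload).digest() == digest:
            yield from decode_records(payload)
            pos = start + size
        else:
            pos += 1  # damaged chunk: look for the next magic marker


# A writer crashes midway through its second chunk, restarts, and appends a third.
data = (
    encode_chunk([b"a", b"b"])
    + encode_chunk([b"c", b"d"])[:-3]  # simulated half-written chunk
    + encode_chunk([b"e"])
)
print(list(read_records(data)))  # [b'a', b'b', b'e']
```

In this sketch the records of the half-written chunk are lost, but everything written before and after the crash remains readable, which is the property the paragraph relies on.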

@wangkuiyi wangkuiyi dismissed stale reviews from cs2be and helinwang via e14c7f1 April 23, 2018 23:04
@wangkuiyi
Collaborator Author

Thanks to @abhinavarora and @cs2be! I have addressed your comments.

@wangkuiyi wangkuiyi merged commit 2486d56 into develop Apr 24, 2018
@luotao1 luotao1 deleted the wangkuiyi-patch-1 branch April 24, 2018 05:00