HBASE-28996: Implement Custom ReplicationEndpoint to Enable WAL Backup to External Storage #6518
Conversation
@vinayakphegde You need to add an ASF license header to all new files.
🎊 +1 overall
💔 -1 overall
(These messages were automatically generated.)
anmolnar left a comment:

A few comments.
if (backupFs.exists(dirPath)) {
  LOG.info("{} Directory already exists: {}", Utils.logPeerId(peerId), dirPath);
} else {
  backupFs.mkdirs(dirPath);
  LOG.info("{} Successfully created directory: {}", Utils.logPeerId(peerId), dirPath);
}
How many HBase instances are running this code? Isn't there a race condition here?
Is this run by the master only?
This will run on the region servers. I believe we can simplify the logic by directly calling backupFs.mkdirs(dirPath): it will create the directory if it doesn't exist, and simply return true if the directory already exists.
That would be better.
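The simplification discussed above can be sketched as follows. This is a local-filesystem stand-in, not the PR's code: java.nio's Files.createDirectories plays the role of Hadoop's FileSystem.mkdirs (both are idempotent and safe to call when the directory already exists), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirsSketch {
    // Hypothetical helper mirroring the simplified logic: call mkdirs
    // unconditionally instead of exists() followed by mkdirs(), which
    // removes the check-then-act race between region servers.
    static Path ensureBackupDir(Path dirPath) throws IOException {
        // Succeeds whether or not the directory already exists.
        return Files.createDirectories(dirPath);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("wal-backup-");
        Path dir = base.resolve("peer1").resolve("WALs");
        ensureBackupDir(dir); // first call creates the directory
        ensureBackupDir(dir); // second call is a harmless no-op
        System.out.println(Files.isDirectory(dir));
    }
}
```

Because the call is idempotent, two region servers racing to create the same directory both succeed, with no exists() window to lose.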
continuousBackupWalWriter.write(walEntries, bulkLoadFiles);

// prevent the HFileCleaner thread from deleting bulk-loaded files
stagedBulkloadFileRegistry.addStagedFiles(bulkLoadFiles);
We might have a race here:

1. bulkLoadFiles are written by WalWriter
2. bulkLoadFiles get added to the exception list for HFileCleaner

If HFileCleaner runs between these two events, it will delete files which have recently been staged. I'm not sure yet how to do this properly.
No, that scenario doesn't occur. The replication framework prevents the deletion of these files via ReplicationHFileCleaner: files are only deleted once the WALs and bulkload files have been successfully replicated, as indicated by a success return from the replicate() method. So, until the replicate() method completes, the files won't be deleted.
However, in our case the bulkloaded files are still in the staging area and haven't been copied yet. Therefore, we had to implement an additional cleaner for this purpose.
@Override
public boolean isFileDeletable(FileStatus fStat) {
  // The actual deletion decision is made in getDeletableFiles, so returning true
Why?
Check whether the file is available for deletion here. The connection to HBase can be kept open.
A single GET op to HBase checking whether the filename is present in the table should be quick.
This approach would be slightly faster, right? In getDeletableFiles, we could run the query once and process all the results together, instead of executing multiple queries.
I'm not sure about that. Currently you run a scanner, while with my approach you'd run multiple GETs. Generally speaking, PUTs and GETs are the most performant operations in key-value stores, so they're preferred over scanners. But the table is small, cached, etc.
Personally I don't like scanning the same table over and over again. It's like a Hashtable.
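To make the trade-off concrete, here is a minimal sketch of the two shapes the check can take. A plain in-memory Set stands in for the HBase registry table (a contains() per candidate models one GET per file; a single up-front snapshot models one scan), and all names here are hypothetical, not the PR's actual classes.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CleanerSketch {
    // Stand-in for the staged-files registry table kept in HBase.
    private final Set<String> stagedFiles;

    public CleanerSketch(Set<String> stagedFiles) {
        this.stagedFiles = stagedFiles;
    }

    // Per-file shape (reviewer's suggestion): one lookup per candidate,
    // analogous to one GET against the registry table per file.
    public boolean isFileDeletable(String fileName) {
        return !stagedFiles.contains(fileName);
    }

    // Batch shape (as implemented): fetch the registry once, analogous
    // to a single Scan, then filter every candidate against it.
    public List<String> getDeletableFiles(List<String> candidates) {
        return candidates.stream()
            .filter(f -> !stagedFiles.contains(f))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CleanerSketch cleaner = new CleanerSketch(Set.of("hfile-2", "hfile-3"));
        System.out.println(cleaner.isFileDeletable("hfile-1"));
        System.out.println(cleaner.getDeletableFiles(
            List.of("hfile-1", "hfile-2", "hfile-3")));
    }
}
```

Both shapes answer the same question; the difference is only whether the registry is consulted once per batch or once per file.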
// Fetch staged files from HBase
Set<String> stagedFiles = StagedBulkloadFileRegistry.listAllBulkloadFiles(connection);
LOG.debug("Fetched {} staged files from HBase.", stagedFiles.size());
Do we really need to maintain the list of staged files in an HBase table?
Why not just return false (do not delete) if the file is in the staging area?
Use a special file name prefix while staging:

bulkload1.hfile
dontdelete_bulkload2.hfile
dontdelete_bulkload3.hfile

bulkload1.hfile can be deleted from the staging area; the others should be kept. The rename operation is atomic in HDFS afaik.
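The prefix-rename idea above could be sketched like this. java.nio's ATOMIC_MOVE on the local filesystem stands in for the HDFS rename (which the comment above notes is atomic), and the prefix constant and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PrefixSketch {
    // Prefix from the suggestion above: files carrying it must be kept.
    static final String KEEP_PREFIX = "dontdelete_";

    // Mark a staged file as not-deletable by renaming it in place.
    // The rename is a single atomic operation, so the cleaner never
    // observes a half-marked file.
    static Path markStaged(Path file) throws IOException {
        Path target = file.resolveSibling(KEEP_PREFIX + file.getFileName());
        return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // The cleaner's check reduces to a name test, with no registry lookup.
    static boolean isDeletable(Path file) {
        return !file.getFileName().toString().startsWith(KEEP_PREFIX);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging-");
        Path staged = Files.createFile(dir.resolve("bulkload2.hfile"));
        Path marked = markStaged(staged);
        System.out.println(isDeletable(marked));
        System.out.println(isDeletable(dir.resolve("bulkload1.hfile")));
    }
}
```

The appeal of this design is that the "registry" lives in the filename itself, so no extra HBase table has to stay consistent with the staging area.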
For the bulkload files, we are not moving them anywhere. They remain in the data directory or archive directory. We are simply maintaining a reference to them in the HBase table.
I suggested another way (file renaming) to avoid accidentally deleting them. This way we could avoid maintaining an HBase table.
This PR implements a custom ReplicationEndpoint for HBase to enable WAL backup to external storage. It introduces several components, including ContinuousBackupReplicationEndpoint, ContinuousBackupManager, and StagedBulkloadFileRegistry, among others, to handle the backup process efficiently.