-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-23584 cache filestatus in storefileinfo #3596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-2.3
Are you sure you want to change the base?
Conversation
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
286eafb to
34e1c8f
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
34e1c8f to
c13b3f0
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
c13b3f0 to
9bc85a4
Compare
|
🎊 +1 overall
This message was automatically generated. |
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit. the rest looks good
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileInfo.java
Outdated
Show resolved
Hide resolved
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
9bc85a4 to
342b196
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
@jojochuang good by you? |
|
@saintstack why is this PR opened against branch-2.3, not master branch? |
|
|
||
| /** | ||
| * Sets the region coprocessor env. | ||
| * @param coprocessorHost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change, and this parameter isn't removed from the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @param had no descriptive text so the javadoc is useless... a style warning we usually clean as we go. I can put it back if you want (Would rather not).
| status = fs.getFileStatus(initialPath); | ||
| } | ||
| long length = status.getLen(); | ||
| getFileStatus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line since it looks redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call loads the file status... caches it. Let me at least add comment here since two of you didn't get what is happening here.... (There is a comment on the cachedFileStatus member explaining the mechanism but let me make it better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it is redundant.... now I look at it. Let me fix.
| private static final Logger LOG = LoggerFactory.getLogger(StoreFileInfo.class); | ||
| /** | ||
| * Cached fileStatus for file or if a link or a reference, then the fileStatus of the referred-to | ||
| * file. We cache filestatus rather than ask NN each time we need it. We set it after construction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Rather than ask FS each time " ? Its not NN always :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor : Also can we move this member variable below the static members so that all instance members together? That is something we normally follow as coding style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Let me fix.
| /** | ||
| * @return FileStatus for the linked or referenced file or if not a reference, then the hfile | ||
| */ | ||
| private FileStatus loadFileStatus(final FileSystem fs) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can avoid passing 'fs' as we can refer that here from this.fs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me fix.
| status = fs.getFileStatus(initialPath); | ||
| } | ||
| long length = status.getLen(); | ||
| getFileStatus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| this.size = fStatus.getLen(); | ||
| } | ||
| // If a fileStatus, use it. It is pointed at the hfile, not at a link or reference. | ||
| this.cachedFileStatus = fileStatus != null? fileStatus: fs.getFileStatus(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So only in case of this is an HFile (not a split ref file and link ), we will keep the cachedFileStatus.
Not sure in such cases what status it will be passing. Should be status of the ref file or link file not the referred one or linked one right.. Checking deeply that we wont break anything in that regard with this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, only safe to cache if filestatus is a direct reference at this point. For link or ref, need to wait till we are asked get the filestatus for the endpoint of link/ref before we go ahead and cache... will look again to be sure. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked again and whats here is fine... we are only here if its a direct file path. There are cases where we might be passed a filestatus for a link or reference but we won't get to this point so won't cache them... which is a missed opportunity I think. Let me see if opportunity for early caching passed filestatus even if link or reference. Will be back (This is not my patch originally... trying to help land someone else's good idea..).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.. This is really useful stuff specially when storage is remote cloud storage. Getting the FileStatus is not that cheap. And that too 2 or 3 times in the path of open an input stream over an HFile. This is more clear when we have many scans where switch from pread to stream read happens. There we will have to open the IS over HFile again. Some times the data that we are looking for, might be already there in Cache. The switch will happen and so open new FIS.. There these calls becoming the overhead. I did some profiling last weeks and it was evident .. BTW my version is bit old 2.1.x where even some recent optimization in the path of HFile IS open was missing. This is really useful stuff even though a small patch... :-) Good on you Stack.. I will also closely see and how max we can avoid the calls for getFileStatus. BTW on the Jira I put another idea also. That involves more work and useful.. The required API from hadoop common is available in some later versions only. So we might have to play with it a bit.. Once this patch is in, lets discuss that also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this is not my idea/patch. Original was by @HuiHang-Yu / yuhuiyang [email protected] from a while back (They are the author on this patch)
Habit. Let me put up next PR against master. Thanks for reviews. |
342b196 to
2b781d2
Compare
2b781d2 to
842e0d7
Compare
|
💔 -1 overall
This message was automatically generated. |
|
Update addresses review comments. Also is more conservative than previous patches (including original posters') regards when to cache filestatus. Posted this patch against master branch too over at #3655 |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
842e0d7 to
c03b4a0
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
anoopsjohn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looking good but have few more doubts Stack.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileInfo.java
Show resolved
Hide resolved
| } | ||
| } | ||
| throw exToThrow; | ||
| status = this.link.getFileStatus(this.fs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file status to be fetched from this.fs? Should be from passed param 'fs' ?
And all subsequent calls? When it is calls within this class, anyways we pass this.fs as param only. So that part will be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have not checked fully when it calls from Snapshot area. Just asking Stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be safe, it should use the passed param. This method is used outside of this class. I looked at shutting it down but that would be bigger change. Let me update. Thanks @anoopsjohn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I closed down the access on this method. Changed snapshot so it creates a StoreFileInfo of its own if different filesystem. Cleaner. Thanks @anoopsjohn
c03b4a0 to
908f8aa
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Refresh of pr1380. Trying to land a nice idea that went stale in our PR queue. Kept author. Fixed styling.