-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29644: Refresh_meta triggering compaction on user table #7385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
10afd02 to
3a989c8
Compare
This comment has been minimized.
This comment has been minimized.
wchevreuil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to have such safeguard in case we face any unexpected attempt of performing a write operation in a read-only cluster, but it's not an acceptable solution for the use case here. We know something triggers compaction when the refresh_meta command is executed on a read replica cluster, so we should find out where that's been triggered and put a check there to avoid waste of resources, rather than relying on exception being thrown. That would cause log pollution and could create confusion for operators.
|
@wchevreuil Thanks for the suggestion, we will check what is the root cause of this. |
This comment has been minimized.
This comment has been minimized.
|
@sharmaar12 Try the following: create a unit test which triggers the problem, attach debugger and set a breakpoint in your event handler |
e7cb788 to
a279f76
Compare
|
@wchevreuil @anmolnar |
This comment has been minimized.
This comment has been minimized.
a279f76 to
465bb2c
Compare
This comment has been minimized.
This comment has been minimized.
| if (isReadOnlyEnabled()) { | ||
| LOG.info("Ignoring compaction request for " + region + ",because read-only mode is on."); | ||
| return; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't simply disable compaction altogether in the read replica cluster? See line #343 in CompactionSplit, there's already a check for compaction enabled flag. I would rather refrain from polluting CompactiSplit code with logic for read replica.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use that approach but then one issue I can think of is that hbase.global.readonly.enabled property is dynamically configurable using update_all_config but is it true for hbase.hstore.compaction.enabled also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @wchevreuil 's idea.
How about adding the read-only check to the getter?
public boolean isCompactionsEnabled() {
return compactionsEnabled && !isReadOnlyEnabled();
}You don't need to dynamically change the compaction flag.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we may need to at least modify the log messages to mention that either compaction is disabled or readonly mode is on. Otherwise compaction may be enabled but we are logging it as disabled because of read-only mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG.info("Ignoring compaction request for " + region +
(!isReadOnlyEnabled ? ", because compaction is disabled." : " in read-only mode"));There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just leave it as is, not a biggy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hbase.hstore.compaction.enabled
The actual property name is hbase.regionserver.compaction.enabled. Compaction is actual "switchable" via the Admin.compactionSwitch() method (we also expose an hbase shell command for that). The CompactSplit thread itself exposes a switchCompaction method which could be called on both RS startup and the dynamic config handler for the hbase.global.readonly.enabled property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful with switching the property directly. User might have intentionally disabled it and you should not enable it when go from R/O -> R/W mode. My approach seems safer to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful with switching the property directly. User might have intentionally disabled it and you should not enable it when go from R/O -> R/W mode. My approach seems safer to me.
Good point. Let's just do all checks inside isCompactionsEnabled, as @anmolnar suggested.
465bb2c to
abc30e7
Compare
This comment has been minimized.
This comment has been minimized.
abc30e7 to
6e21ae1
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@sharmaar12 Do you think that the unit test failure is related to the patch? |
@anmolnar It may not be because, it passes on my local setup. Also our code only execute when read-only mode is on. In previous run https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7385/3/ also there were 8 failures which were different from these and they also passed in my local setup. Could you please help me rerun the job? |
This comment has been minimized.
This comment has been minimized.
|
Looks like the build has been restarted. @wchevreuil would you please review again? |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
6e21ae1 to
c5e2ea5
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29644 Description: Consider the two cluster setup with one being active and one read replica. If active cluster create a table with FILE based SFT. If you add few rows through active and do flushes to create few Hfiles and then do refresh_meta from read replica its triggering minor compaction. Which should not happen via read replica, it may create inconsitencies because active is not aware of that event. Cause: This is happening because we should block the compaction event in ReadOnlyController but we missed adding read only guard to preCompactSelection() function. Fix: Add internalReadOnlyGuard to preCompactSelection() in ReadOnlyController
c5e2ea5 to
e6d2559
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29644
Description:
Consider the two cluster setup with one being active and one read replica. If active cluster create a table with FILE based SFT. If you add few rows through active and do flushes to create few Hfiles and then do refresh_meta from read replica its triggering minor compaction. Which should not happen via read replica, it may create inconsitencies because active is not aware of that event.
Cause:
This is happening because we should block the compaction event in ReadOnlyController but we missed adding read only guard to preCompactSelection() function.
Fix:
Add internalReadOnlyGuard to preCompactSelection() in ReadOnlyController