-
Notifications
You must be signed in to change notification settings - Fork 14
chore: BlockAccessors Hardening & Improved Error Handling
#1833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: BlockAccessors Hardening & Improved Error Handling
#1833
Conversation
5e07b85 to
3617276
Compare
3617276 to
e75166d
Compare
| } else { | ||
| return new BlockResponse(Code.NOT_FOUND, null); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this else clause here as I believe this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. NOT_FOUND is we should have it but something is up.
NOT_AVAILABLE is we don't have it but it might exist on another BN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nana-EC there is a very real race condition where we pass the check above, because the block exists, but by the time we try to get it, retention could wipe it. In that case the block is not found (we do not have it). Another possibility is that an exception might have occurred, which also results in us not having it at the moment. That was my thought process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR creates a hardlink when we issue the accessor. But here we check for availability (checking the ranged set) and then getting the accessor. I was refering to that. If we get the accessor, we are fine, we will have access because of the hardlink. But if the ranged set contains the block, then retention deletes it and then we try to actually get the accessor, we will not be able to. That should be extremely rare though. That is why I added the else clause, null check for the accessor.
| /** | ||
| * Constructor. | ||
| */ | ||
| public FilesHistoricConfig { | ||
| Objects.requireNonNull(rootPath); | ||
| Objects.requireNonNull(compression); | ||
| Preconditions.requireInRange(powersOfTenPerZipFileContents, 1, 6); | ||
| Preconditions.requireGreaterOrEqual(blockRetentionThreshold, 0L); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constructors here and for the other config have been removed as validation must be done by the config dependency. I've also deleted the unit tests for the configs. I think they are redundant and maintaining them is a hassle.
e75166d to
db09b26
Compare
|
We could also add tests for:
|
db09b26 to
1e574a7
Compare
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (66.99%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. @@ Coverage Diff @@
## main #1833 +/- ##
============================================
- Coverage 81.37% 81.19% -0.18%
- Complexity 1164 1173 +9
============================================
Files 126 126
Lines 5428 5472 +44
Branches 578 587 +9
============================================
+ Hits 4417 4443 +26
- Misses 751 766 +15
- Partials 260 263 +3 ... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
mustafauzunn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add the new paths to block-node/app/build.gradle.kts to not break the local runs
mustafauzunn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update docs/block-node/configuration.md with the new configs
af5273d to
857c37b
Compare
|
@mustafauzunn thanks for the notes, I've updated configs md + added env vars in build gradle for local builds. |
These have been added. |
Nana-EC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1st pass good improvements, will circle back
| } else { | ||
| return new BlockResponse(Code.NOT_FOUND, null); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. NOT_FOUND is we should have it but something is up.
NOT_AVAILABLE is we don't have it but it might exist on another BN
| Objects.requireNonNull(rootPath); | ||
| Objects.requireNonNull(compression); | ||
| Preconditions.requireInRange(powersOfTenPerZipFileContents, 1, 6); | ||
| Preconditions.requireGreaterOrEqual(blockRetentionThreshold, 0L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to set @Min on blockRetentionThreshold field then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, let me revisit these.
* BlockAccessors are now Autocloseable
* BlockAccessors now create hard links for the duration of their
lifespan
* This allows us to ensure data is accessible for the duration of the
accessor's life
* Tests are updated and migrated
* Deleted redundant configuration tests
* Added tests:
* Tests for subsequent reads to ensure that closing an accessor does
not affect the data or the ability for a new accessor to access it
Signed-off-by: Atanas Atanasov <[email protected]>
857c37b to
1a31bf8
Compare
| } catch (final RuntimeException e) { | ||
| final String message = "Failed to retrieve block number %d.".formatted(request.blockNumber()); | ||
| LOGGER.log(ERROR, message, e); | ||
| responseCounterNotFound.increment(); // Should this be a failure counter? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with comment maybe we should have a failure counter. However, this could be done as part of another issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment may be redundant and we simply need to delete it.
@AlfredoG87 what do we mean here by failure counter? Is that something that is simply grafana UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metric should probably be more generic; a failure counter instead of a "not found" counter. That probably works better as a combined "result counter" that counts all results and uses labels to differentiate between success, not found, not available, etc...
That should perhaps wait until we have labels for metrics, however.
| @ConfigData("files.historic") | ||
| public record FilesHistoricConfig( | ||
| @Loggable @ConfigProperty(defaultValue = "/opt/hiero/block-node/data/historic") Path rootPath, | ||
| @Loggable @ConfigProperty(defaultValue = "/opt/hiero/block-node/data/historic-links") Path linksRootPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of totally new paths, why not use a subfolder of the existing path?
e.g. rootPath + "/accessor-links" would separate the folders but ensure it's on the same physical device (which is a requirement for hard links).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I did it so, the links was subfoldered, but thought that it could be misused. I think what you are suggesting is that I remove the config property and simply create a subdir? I could do that. Do you believe there is a risk or room for error/misuse as the data path is quite important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it (misuse) is something to worry about.
The subfolder and the main blocks folders have to be on the same device, and anything with access to one will likely have access to both.
The benefit of knowing the temp and main storage locations are definitively on the same device is worth more than any misuse concerns, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed these tests (and the ones for recent) for configs since I've removed the constructors as well. Nothing could be asserted. Supporting these is a hassle and validations and provision of non-null default values must be done via the config dependency.
| import java.nio.file.Files; | ||
| import java.nio.file.Path; | ||
| import java.util.Objects; | ||
| import java.util.UUID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UUID are costly (and a bit dicey); why not just use a tempfile name (which is guaranteed unique)?
The Files object has methods for creating a tempfile in a specified directory, and that can then be deleted and a hardlink created with the same name (not perfect, but likely better than adding a 128-bit type 4 UUID).
| return getBytesFromPath(format, entry, compressionType); | ||
| } catch (final UncheckedIOException | IOException e) { | ||
| LOGGER.log(WARNING, FAILED_TO_READ_MESSAGE.formatted(blockPath), e); | ||
| LOGGER.log(WARNING, FAILED_TO_READ_MESSAGE.formatted(zipFileLink), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the link should be a hidden internal; pointing to the block path in logs seems much more useful.
Same comment in the log below (line 135).
| environment("FILES_HISTORIC_LINKS_ROOT_PATH", "${serverDataDir}/files-historic-links") | ||
| environment("FILES_RECENT_LIVE_ROOT_PATH", "${serverDataDir}/files-live") | ||
| environment("FILES_RECENT_LINKS_ROOT_PATH", "${serverDataDir}/files-live-links") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these should be subfolders of the same folder used for the actual blocks. It's safer that way and matches what we're doing in other (similar) cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IS this class used?
| // attempt to clear any existing links root directory | ||
| if (Files.exists(linksRoot)) { | ||
| try (final Stream<Path> paths = Files.walk(linksRoot)) { | ||
| for (final Path path : paths.sorted(Comparator.reverseOrder()).toList()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why sort the hardlinks?
| final Bytes bytes = blockAccessor.blockBytes(format); | ||
| // close the accessor as we are done with it and we need to free | ||
| // resources | ||
| blockAccessor.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth defining the batch as an AutoCloseable and Iterable object (instead of just a list) so that close can be handled automatically in try-with-resources by the caller of this method instead of handling the individual close calls here.
In particular I'm concerned what happens to all the remaining not-yet-complete accessors if this method encounters an exception.
| // attempt to clear any existing links root directory | ||
| if (Files.exists(linksRoot)) { | ||
| try (final Stream<Path> paths = Files.walk(linksRoot)) { | ||
| for (final Path path : paths.sorted(Comparator.reverseOrder()).toList()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why sort the list?
I would think that paths.forEach(path -> Files.delete(path)) would be sufficient...
| final BlockFileBlockAccessor accessor = | ||
| new BlockFileBlockAccessor(verifiedBlockPath, config, blockNumber); | ||
| blocksReadCounter.increment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought for a future PR:
What would happen if we added the metric to the accessor constructor here, and let the accessor increment the counter when the block is read? If an accessor might be read repeatedly, we could get more accurate metrics that way.
| @ConfigData("files.recent") | ||
| public record FilesRecentConfig( | ||
| @Loggable @ConfigProperty(defaultValue = "/opt/hiero/block-node/data/live") Path liveRootPath, | ||
| @Loggable @ConfigProperty(defaultValue = "/opt/hiero/block-node/data/live-links") Path linksRootPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, this should probably be a hard-coded subfolder of the liveRootPath rather than a separate configured item.
lifespan
accessor's life
not affect the data or the ability for a new accessor to access it
Reviewer Notes
Related Issue(s)
Resolves #1273