Add automatic retry for multi-page record reads during concurrent modifications #3396
Conversation
fix: add automatic retry for multi-page record reads to prevent ConcurrentModificationException on queries

Read-only queries (e.g. MATCH ()-[r]->() RETURN COUNT(r)) on multi-page records could fail with ConcurrentModificationException when concurrent writes modified the same record. This happened because loadMultiPageRecord detected version mismatches but threw immediately without retrying, and the query execution path (database.query()) has no retry loop, unlike database.transaction().

The fix adds an automatic retry loop inside loadMultiPageRecord itself, using the existing TX_RETRIES configuration (default: 3). On version mismatch, it re-fetches the first page and re-reads all chunks. This is safe because it's a pure read operation. The exception is only thrown after exhausting all retries.

https://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV
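The retry pattern the fix describes can be sketched generically. This is a minimal illustration only; the names `readWithRetry` and `MAX_RETRIES` are hypothetical stand-ins, not the actual `LocalBucket` code, and `MAX_RETRIES` stands in for the `TX_RETRIES` setting.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

public class RetryRead {
  // Stand-in for the TX_RETRIES configuration (default: 3); illustrative only.
  static final int MAX_RETRIES = 3;

  // Retries a pure read when a version mismatch is detected; the last
  // ConcurrentModificationException is rethrown only after all attempts fail.
  static <T> T readWithRetry(final Supplier<T> read) {
    ConcurrentModificationException last = null;
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      try {
        return read.get(); // on retry, re-fetch the first page and re-read all chunks
      } catch (final ConcurrentModificationException e) {
        last = e; // safe to retry: a read has no side effects
      }
    }
    throw last;
  }
}
```

The key property is that the loop either returns the successfully read value or rethrows the last mismatch exception, so callers see a failure only after every attempt is exhausted.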
Summary of Changes

Hello @ExtReMLapin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the database's resilience to concurrent modifications by introducing an automatic retry mechanism for multi-page record reads. Previously, read-only queries could fail due to transient conflicts when records were modified during the read process. The implemented solution transparently retries these operations, preventing unnecessary query failures and improving overall stability.
🧪 CI Insights: all jobs passed for CI run e4360eb. 🟢
Code Review
This pull request introduces an automatic retry mechanism for multi-page record reads to handle concurrent modifications, which is a solid improvement for read-only query resilience. The implementation is generally well-done. However, I've identified a potential race condition in the retry logic where a deleted page or record isn't handled, which could lead to a NullPointerException. I've provided a suggestion to add checks for this scenario. Additionally, the new test case uses a broad exception catch block that might hide other errors; I've suggested a change to make the test more robust.
firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true);
recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage));
recordSize = firstPage.readNumberAndSize(recordPositionInPage);
In the retry logic, the call to database.getPageManager().getImmutablePage(...) on line 1316 could return null if the page was deleted concurrently. The current code doesn't handle this, which would lead to a NullPointerException on the next line. Similarly, getRecordPositionInPage(...) on line 1317 can return 0 if the record was deleted from the page, which would cause readNumberAndSize to read from an incorrect offset. It's important to add checks for these conditions to make the retry logic more robust.
firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true);
if (firstPage == null)
// Page may have been deleted, so we cannot continue.
throw new ConcurrentModificationException("First page of multi-page record " + originalRID + " was removed during read. Please retry the operation");
recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage));
if (recordPositionInPage == 0)
// Record was deleted from page.
throw new ConcurrentModificationException("Multi-page record " + originalRID + " was deleted during read. Please retry the operation");
recordSize = firstPage.readNumberAndSize(recordPositionInPage);

} catch (final Exception e) {
  if (e.getMessage() != null && e.getMessage().contains("was modified during read"))
    readErrors.incrementAndGet();
}
This catch block is too broad and silently ignores any exceptions other than the expected one about a record being modified during read. This could hide other bugs. The test should fail if any unexpected exceptions occur. I recommend re-throwing other exceptions as RuntimeException to make them visible and ensure the test fails as expected.
} catch (final Exception e) {
if (e.getMessage() != null && e.getMessage().contains("was modified during read")) {
readErrors.incrementAndGet();
} else {
throw new RuntimeException("Unexpected exception in reader thread", e);
}
}
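The reader-side pattern under review can be shown as a self-contained sketch. Names here are generic (not the actual test class); `readErrors` mirrors the test's counter, and the `Runnable` stands in for a hypothetical read operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderLoopSketch {
  static final AtomicInteger readErrors = new AtomicInteger();

  // Counts the expected "modified during read" failures, but rethrows
  // anything else so unrelated bugs fail the test visibly.
  static void runReader(final Runnable read) {
    try {
      read.run();
    } catch (final Exception e) {
      if (e.getMessage() != null && e.getMessage().contains("was modified during read"))
        readErrors.incrementAndGet();
      else
        throw new RuntimeException("Unexpected exception in reader thread", e);
    }
  }
}
```

The design point is that only the one anticipated failure mode is counted; every other exception propagates, so a broken test environment or an unrelated regression cannot silently pass.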
The Claude-provided fix WORKS, but again, I'm not a Java guy, so another review would be appreciated. Since reproducing the bug is not easy, the test makes it much easier to verify.
Not sure about the retry strategy.
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #3396 +/- ##
============================================
+ Coverage 59.52% 59.58% +0.05%
- Complexity 2444 14693 +12249
============================================
Files 1180 1182 +2
Lines 82893 82986 +93
Branches 16934 16952 +18
============================================
+ Hits 49343 49448 +105
+ Misses 26379 26361 -18
+ Partials 7171 7177 +6

☔ View full report in Codecov by Sentry.
@ExtReMLapin this is the same behavior we have in a transaction. How are you executing the read? Are you using a transaction? Because if you wrap it in a transaction, it should work out of the box.
We don't use transactions at all. We could use them in the write sequences to do clean cancellations… but in read operations…?

Edit: wrapping all my read queries in transactions seems wrong.
ACID compliance is only for transactions? Isn't that a bit weird? And if there is an x number of retries, it's not real ACID compliance, right?
What are you using to execute the query? Embedded, HTTP protocol, gRPC, Postgres or something else?
HTTP REST
A retry inside this low-level API could cause multiple issues. The safest way is to always wrap concurrent reads into a transaction. The new
I thought about this and I think you're right. Because a record is split across multiple pages that could be updated while reading, this should not be an issue the user has to handle; it should be considered internal.
@claude please address the concerns from Gemini and review this PR
I'll analyze this and get back to you.
From Claude:
Review
The approach is sound — Luca agreed that multi-page record reading is an internal concern and the retry belongs inside loadMultiPageRecord. Two issues need fixing before merge:
1. (High) Null/zero guards missing in retry path
As Gemini flagged, when retrying on lines 1316-1318, getImmutablePage() can return null (page evicted/deleted) and getRecordPositionInPage() can return 0 (record deleted). This would cause NPE or reading from an incorrect offset. See inline suggestion.
2. (Medium) Broad catch in test hides unexpected errors
The reader catch block silently swallows all exceptions, only counting "was modified during read" ones. Other unexpected exceptions are lost, making the test pass when something else is broken. Re-throw unexpected exceptions so the test fails visibly. See inline suggestion.
Minor observations

- The shared Random instance across threads is not thread-safe. Each thread should use its own Random or ThreadLocalRandom.
- The unreachable final throw after the for-loop (line 1330) is dead code, since the loop always either returns or throws. Could be removed but harmless.
- Treating currentFirstPage == null as consistent (line 1307) is fine: the data was already read from valid cached pages; eviction just means it left the cache, not that it was deleted.
engine/src/test/java/com/arcadedb/ConcurrentMultiPageRecordReadTest.java
private static final int EMBEDDING_DIM = 3072; // Forces multi-page records
private static final int WRITER_THREADS = 40;
private static final int READER_THREADS = 80;
private static final int OPERATIONS_PER_THREAD = 2000;
Minor: This shared Random is accessed from multiple threads without synchronization. Random is not thread-safe — each thread should use its own instance (e.g. ThreadLocalRandom.current() or a per-thread new Random()).
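For reference, a minimal illustration of the per-thread alternative. The helper name and `bound` parameter are hypothetical, not taken from the actual test.

```java
import java.util.concurrent.ThreadLocalRandom;

public class PerThreadRandom {
  // ThreadLocalRandom keeps one generator per thread, so concurrent reader
  // and writer threads never contend on, or corrupt, shared Random state.
  static int randomIndex(final int bound) {
    return ThreadLocalRandom.current().nextInt(bound); // uniform in [0, bound)
  }
}
```

ThreadLocalRandom.current() is the usual choice in thread-pool code because it requires no per-thread bookkeeping, unlike allocating a new Random per thread.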
Thanks @ExtReMLapin
Alright 🖖🏻
Add automatic retry for multi-page record reads during concurrent modifications (#3396)

* fix: add automatic retry for multi-page record reads to prevent ConcurrentModificationException on queries
* more aggressive tests
* Update engine/src/main/java/com/arcadedb/engine/LocalBucket.java
* Update engine/src/test/java/com/arcadedb/ConcurrentMultiPageRecordReadTest.java

Co-authored-by: Claude <[email protected]>
Co-authored-by: CNE Pierre FICHEPOIL <[email protected]>
Co-authored-by: Luca Garulli <[email protected]>

(cherry picked from commit 61c7624)

What does this PR do?
Fixes #3393
This PR modifies the loadMultiPageRecord method in LocalBucket to automatically retry reading multi-page records when concurrent modifications are detected, instead of immediately throwing a ConcurrentModificationException. The retry logic respects the TX_RETRIES configuration setting, ensuring read-only queries transparently handle transient conflicts without failing.

Motivation

Previously, read-only queries (e.g., MATCH ()-[r]->() RETURN COUNT(r)) could fail with ConcurrentModificationException when concurrent writes modified multi-page records during the read operation. This was problematic. The fix moves the retry logic from the caller into the engine itself, making read-only queries resilient to concurrent modifications.
Technical Details

Changes to LocalBucket.loadMultiPageRecord:

- Retries the read on version mismatch (up to TX_RETRIES attempts)
- Throws ConcurrentModificationException only after exhausting all retries

New test: ConcurrentMultiPageRecordReadTest

- Stress-tests concurrent readers and writers on multi-page records and verifies reads no longer fail with ConcurrentModificationException

Related issues
This addresses a regression where concurrent modifications to multi-page records would cause read-only queries to fail unexpectedly.
Additional Notes

- Retry logging at Level.FINE helps diagnose retry behavior in production

Checklist

- Tested with the mvn clean package command

https://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV