feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes#425
feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes#425Shazwazza wants to merge 14 commits intorelease/4.0from
Conversation
- Add UseTaxonomyIndex property to LuceneIndexOptions (default: true for backwards compatibility) - Update IDirectoryFactory.CreateTaxonomyDirectory to return nullable Directory - Update DirectoryFactoryBase and FileSystemDirectoryFactory to check UseTaxonomyIndex option - Update LuceneIndex to handle null TaxonomyWriter when taxonomy is disabled - Add IsTaxonomyEnabled property to LuceneIndex for runtime checks - Update IndexCommitter to handle null TaxonomyWriter - SyncedFileSystemDirectoryFactory still requires taxonomy (throws if disabled) - Update PublicAPI.Unshipped.txt with new API surface BREAKING CHANGE: IDirectoryFactory.CreateTaxonomyDirectory now returns Directory? instead of Directory
- Document new UseTaxonomyIndex and IsTaxonomyEnabled properties - Mark CreateTaxonomyDirectory methods as returning nullable Directory? - Note that SyncedFileSystemDirectoryFactory requires taxonomy enabled
…y searcher - Add LuceneNonTaxonomySearcher class to handle searches when taxonomy is disabled - Update LuceneIndex.CreateSearcher() to return appropriate searcher based on UseTaxonomyIndex option - Add _nrtReopenThreadNoTaxonomy for NRT support without taxonomy - Update WaitForChanges() to use correct NRT thread - Update Dispose() to clean up non-taxonomy NRT thread - Add 4 new tests for SyncedFileSystemDirectoryFactory without taxonomy: - Given_NoTaxonomyDirectory_When_CreatingDirectory_Then_IndexCreatedSuccessfully - Given_NoTaxonomyDirectory_When_IndexingData_Then_SearchSucceeds - Given_CorruptMainIndex_And_HealthyLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_LocalIndexSyncedToMain - Given_CorruptMainIndex_And_CorruptLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_NewIndexesCreatedAndUsable
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
When taxonomy is disabled, the non-taxonomy overload of FacetsConfig.Build(doc) must still be called to process SortedSetDocValuesFacetField entries into proper SortedSetDocValuesField entries. Without this, documents containing facet fields throw ArgumentException during indexing, causing silent failures where no items are indexed and IndexCommitted events never fire. This restores the behavior from v4.0.0-beta.1 where FacetsConfig.Build(doc) was always called regardless of taxonomy configuration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a SearchContext is created during application startup before any documents have been indexed, SearchableFields reads from the empty index reader and caches an empty array. After documents are indexed and the NRT reader is refreshed, IsSearcherCurrent() returns true (the refreshed reader IS current), so the SearchContext is reused with its stale empty _searchableFields cache. This causes ManagedQuery/Search(string) to generate queries with no fields, returning zero results even though documents exist in the index. Fix: only cache SearchableFields when the result is non-empty. An empty index has nothing to search anyway, and re-reading on each call has negligible cost. Once documents are indexed and fields exist, the non-empty result is cached normally. Applied to both SearchContext (non-taxonomy) and TaxonomySearchContext. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move WaitForChanges() into CommitNow() before the Committed event fires, and remove the redundant call from TimerRelease(). Previously, in the async commit path (timer-based), Committed fired before WaitForChanges() completed, creating a race condition where consumers reacting to the Committed/IndexCommitted event could search with a stale NRT reader that hadn't yet been refreshed to include the just-committed changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CommitNow() now calls WaitForChanges() internally, so the explicit calls after CommitNow() in the !RunAsync paths of PerformIndexItemsInternal and PerformDeleteFromIndexInternal are redundant no-ops. Remove them and update comments for clarity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@greptile review |
Greptile SummaryThis PR makes the taxonomy index opt-in in Examine v4, adds Umbraco API compatibility, and fixes three critical bugs related to Key changes:
Notable concerns:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[LuceneIndex.CreateSearcher] --> B{IsTaxonomyEnabled?}
B -->|Yes - TaxonomyWriter != null| C[SearcherTaxonomyManager]
B -->|No - TaxonomyWriter == null| D[SearcherManager]
C --> E{NrtEnabled?}
E -->|Yes| F[ControlledRealTimeReopenThread\nSearcherAndTaxonomy]
E -->|No| G[MaybeRefreshBlocking]
F --> H[WaitForChanges]
G --> H
H --> I[return LuceneSearcher]
D --> J{NrtEnabled?}
J -->|Yes| K[ControlledRealTimeReopenThread\nIndexSearcher]
J -->|No| L[MaybeRefreshBlocking]
K --> M[WaitForChanges]
L --> M
M --> N[return LuceneNonTaxonomySearcher]
I --> O[ILuceneTaxonomySearcher\nWithFacets supported]
N --> P[BaseLuceneSearcher\nWithFacets throws NotSupportedException]
subgraph IndexCommitter
Q[CommitNow] --> R[TaxonomyWriter?.Commit]
R --> S[IndexWriter.Commit]
S --> T[WaitForChanges]
T --> U[Committed event fires]
end
subgraph DirectoryResolution
V[LuceneIndex ctor] --> W{DirectoryFactory is\nITaxonomyDirectoryFactory?}
W -->|Yes| X[CreateTaxonomyDirectory\nreturns Directory?]
W -->|No| Y[return null - taxonomy disabled]
X -->|null returned| Y
X -->|non-null| Z[Taxonomy enabled]
end
|
| WaitForChanges(); | ||
| } | ||
| else | ||
| { | ||
| // wait for most recent changes when first creating the searcher | ||
| searcherManager.MaybeRefreshBlocking(); | ||
| } | ||
| // wait for most recent changes when first creating the searcher | ||
| WaitForChanges(); | ||
|
|
||
| return new LuceneSearcher(name + "Searcher", searcherManager, FieldValueTypeCollection, new SearcherOptions(FieldAnalyzer, _options.FacetsConfig), _options.NrtEnabled); |
There was a problem hiding this comment.
Redundant WaitForChanges() call when NRT is enabled
In both the taxonomy-enabled and non-taxonomy branches of CreateSearcher(), WaitForChanges() is called twice when NRT is enabled: once inside the if (_options.NrtEnabled) block (lines 1254 / 1293) and again unconditionally after the if/else (lines 1262 / 1301).
When NRT is disabled, WaitForChanges() is a no-op (it's guarded by if (_options.NrtEnabled)), so the outer call is harmless. But when NRT is enabled and _latestGen already has a value (i.e. indexing started before the searcher was first accessed), the second call could block for another 5-second timeout window unnecessarily.
The simplest fix is to remove the duplicate outer WaitForChanges() call in both branches, since the inner call (inside the NRT block) already handles it:
// taxonomy branch
if (_options.NrtEnabled)
{
_nrtReopenThread = new ControlledRealTimeReopenThread<...>(...) { ... };
_nrtReopenThread.Start();
WaitForChanges();
}
else
{
searcherManager.MaybeRefreshBlocking();
}
// Remove the duplicate WaitForChanges() here
return new LuceneSearcher(...);The same applies to the non-taxonomy branch around line 1301.
| /// Gets or sets whether a sidecar taxonomy index should be used for faceting. | ||
| /// When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index. | ||
| /// Default is true for backwards compatibility. | ||
| /// </summary> | ||
| public bool UseTaxonomyIndex { get; set; } = true; |
There was a problem hiding this comment.
Misleading doc comment on UseTaxonomyIndex
The summary says "When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index", but the actual implementation does not automatically switch to SortedSetDocValues-based faceting. When this flag is false, the taxonomy directory is simply not created and TaxonomyWriter returns null. Faceted taxonomy search (ILuceneTaxonomySearcher) becomes unavailable, and WithFacets() will throw NotSupportedException.
SortedSetDocValuesFacetField entries are still processed via FacetsConfig.Build(doc) (the fix for bug #1), so SortedSetDocValues-based facets can work at the field level, but the searcher itself does not automatically expose them via the taxonomy-based API.
The comment should more accurately describe the actual behavior to avoid consumer confusion:
| /// Gets or sets whether a sidecar taxonomy index should be used for faceting. | |
| /// When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index. | |
| /// Default is true for backwards compatibility. | |
| /// </summary> | |
| public bool UseTaxonomyIndex { get; set; } = true; | |
| /// Gets or sets whether a sidecar taxonomy index should be used for faceting. | |
| /// When set to false, the taxonomy directory is not created and taxonomy-based faceted search | |
| /// (via <see cref="Providers.ILuceneTaxonomySearcher"/>) will not be available. | |
| /// SortedSetDocValues-based facets can still be used at the field level. | |
| /// Default is true for backwards compatibility. | |
| /// </summary> | |
| public bool UseTaxonomyIndex { get; set; } = true; |
| public abstract IOrdering SelectAllFields(); | ||
|
|
||
| /// <inheritdoc/> | ||
| public abstract IQueryExecutor WithFacets(Action<IFacetOperations> facets); | ||
| /// <remarks> | ||
| /// The default implementation throws <see cref="NotSupportedException"/>. | ||
| /// Providers that support faceted search should override this method. | ||
| /// </remarks> | ||
| public virtual IQueryExecutor WithFacets(Action<IFacetOperations> facets) | ||
| => throw new NotSupportedException("Faceted search is not supported by this provider."); |
There was a problem hiding this comment.
Silently breaking existing subclasses by changing abstract to virtual
WithFacets was previously abstract, forcing all subclasses to provide an implementation at compile time. Changing it to virtual with a throw new NotSupportedException(...) body means any existing subclass that previously relied on the compiler to enforce implementation will now silently compile — but throw at runtime when WithFacets is called.
For consumers using a custom LuceneBooleanOperationBase subclass in a taxonomy-enabled setup (the most common case), this is a regression: they had a working implementation before, and after this change they'd inherit the throwing default if they happen to recompile. The [Obsolete] attribute or a clearer breaking-change note in PublicAPI-Changes.md would help, but the real concern is that there's no compile-time signal for affected subclasses.
Consider whether keeping it abstract and instead providing a separate non-abstract subtype for non-taxonomy queries would be safer — that's the approach the LuceneNonTaxonomySearcher / LuceneSearcher split already takes at the searcher level.
Summary
This PR makes the taxonomy directory optional in Examine v4, ensures backward compatibility with Umbraco CMS and Umbraco.Cms.Search, and fixes three critical bugs discovered during compatibility testing.
Features
Optional Taxonomy Directory
Makes the taxonomy index an opt-in/opt-out feature, enabling lighter-weight index configurations when faceted taxonomy search is not needed.
UseTaxonomyIndexproperty toLuceneIndexOptions(default:truefor backward compat)IsTaxonomyEnabledproperty toLuceneIndexfor runtime checksITaxonomyDirectoryFactoryinterface separated fromIDirectoryFactoryfor cleaner abstractionIDirectoryFactory.CreateTaxonomyDirectorynow returnsDirectory?instead ofDirectoryDirectoryFactoryBaseas[Obsolete]— it exists only for compatibility and adds no valueFileSystemDirectoryFactoryimplementsITaxonomyDirectoryFactorywith type-check at usage sitesSyncedFileSystemDirectoryFactoryupdated to handle optional taxonomy dirLuceneNonTaxonomySearcherfor efficient searching when taxonomy is disabledUmbraco API Compatibility
Bug Fixes
1. FacetsConfig.Build not called for non-taxonomy indexes
Commit:
384550eeWhen taxonomy was disabled,
FacetsConfig.Build(doc)was not being called. This is required even for non-taxonomy indexes to processSortedSetDocValuesFacetFieldentries into properSortedSetDocValuesFieldentries. Without it, documents with facet fields threwArgumentExceptionduring indexing, causing silent failures where no items were indexed andIndexCommittedevents never fired.2. SearchableFields caching empty results from initially empty indexes (fixes #426)
Commit:
7e672a87When a
SearchContextwas created during application startup before any documents were indexed,SearchableFieldsread from the empty index reader and cached an empty array. After documents were indexed and the NRT reader was refreshed,IsSearcherCurrent()returnedtrue(the refreshed reader IS current), so theSearchContextwas reused with its stale empty_searchableFieldscache. This causedManagedQuery/Search(string)to generate queries with no fields, returning zero results even though documents existed in the index.Fix: Only cache
SearchableFieldswhen the result is non-empty. Applied to bothSearchContextandTaxonomySearchContext.3. NRT reader not refreshed before Committed event fires (fixes #427)
Commits:
d606a41d,6f040af3In the async commit path (timer-based), the
Committedevent fired beforeWaitForChanges()completed, creating a race condition where consumers reacting to theCommitted/IndexCommittedevent could search with a stale NRT reader. Consumers would get zero or incomplete results even though the commit had completed.Fix: Move
WaitForChanges()intoCommitNow()before theCommittedevent fires. Also removed the now-redundantWaitForChanges()calls in the synchronous!RunAsyncpaths.Breaking Changes
IDirectoryFactory.CreateTaxonomyDirectoryreturnsDirectory?IDirectoryFactoryimplementationsnullto disable taxonomy, or keep returning a directoryTests
SyncedFileSystemDirectoryFactorywithout taxonomyFiles Changed (22 files, +1741/-372)
Core Changes
src/Examine.Lucene/LuceneIndexOptions.cs—UseTaxonomyIndexoptionsrc/Examine.Lucene/Providers/LuceneIndex.cs— Null taxonomy handling, non-taxonomy NRT, FacetsConfig.Build fix, redundant WaitForChanges cleanupsrc/Examine.Lucene/Providers/IndexCommitter.cs— Race condition fix, null taxonomy writersrc/Examine.Lucene/Providers/LuceneNonTaxonomySearcher.cs— New non-taxonomy searchersrc/Examine.Lucene/Search/SearchContext.cs— Empty cache fixsrc/Examine.Lucene/Search/TaxonomySearchContext.cs— Empty cache fixDirectory Infrastructure
src/Examine.Lucene/Directories/IDirectoryFactory.cs— Nullable return typesrc/Examine.Lucene/Directories/ITaxonomyDirectoryFactory.cs— New interfacesrc/Examine.Lucene/Directories/DirectoryFactory.cs— Updated implsrc/Examine.Lucene/Directories/DirectoryFactoryBase.cs— Marked obsoletesrc/Examine.Lucene/Directories/FileSystemDirectoryFactory.cs— ITaxonomyDirectoryFactorysrc/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs— Optional taxonomy supportPublic API
src/Examine.Lucene/PublicAPI.Unshipped.txt— New API surface entriesRelated Issues
Backport
Bugs #426 and #427 have been backported to
support/3.xin PR #428.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com