feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes #425

Open
Shazwazza wants to merge 14 commits into release/4.0 from feature/optional-taxonomy-directory

Owner

@Shazwazza Shazwazza commented Feb 25, 2026

Summary

This PR makes the taxonomy directory optional in Examine v4, ensures backward compatibility with Umbraco CMS and Umbraco.Cms.Search, and fixes three critical bugs discovered during compatibility testing.

Features

Optional Taxonomy Directory

Makes the taxonomy index an opt-in/opt-out feature, enabling lighter-weight index configurations when faceted taxonomy search is not needed.

  • Add UseTaxonomyIndex property to LuceneIndexOptions (default: true for backward compat)
  • Add IsTaxonomyEnabled property to LuceneIndex for runtime checks
  • Introduce ITaxonomyDirectoryFactory interface separated from IDirectoryFactory for cleaner abstraction
  • IDirectoryFactory.CreateTaxonomyDirectory now returns Directory? instead of Directory
  • Mark DirectoryFactoryBase as [Obsolete] — it exists only for compatibility and adds no value
  • FileSystemDirectoryFactory implements ITaxonomyDirectoryFactory with type-check at usage sites
  • SyncedFileSystemDirectoryFactory updated to handle optional taxonomy dir
  • Add LuceneNonTaxonomySearcher for efficient searching when taxonomy is disabled
  • NRT (near-real-time) reopen thread management updated for non-taxonomy path
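
The "type-check at usage sites" idea from the list above can be sketched roughly as follows. This is a minimal illustration only: `FakeDirectory`, `IFakeDirectoryFactory`, `IFakeTaxonomyDirectoryFactory`, `PlainFactory`, `TaxonomyFactory`, and `TaxonomyResolver` are hypothetical stand-ins, not the real Examine or Lucene.NET types, and the actual method signatures in this PR may differ.

```csharp
#nullable enable
using System;

// Demo: resolve the taxonomy directory for three factory configurations.
Console.WriteLine(TaxonomyResolver.Resolve(new PlainFactory()) is null);                  // True
Console.WriteLine(TaxonomyResolver.Resolve(new TaxonomyFactory(enabled: true)) is null);  // False
Console.WriteLine(TaxonomyResolver.Resolve(new TaxonomyFactory(enabled: false)) is null); // True

// Hypothetical stand-in for a Lucene directory.
public sealed class FakeDirectory
{
    public string Name { get; init; } = "";
}

// Hypothetical stand-in for the main directory factory abstraction.
public interface IFakeDirectoryFactory
{
    FakeDirectory CreateDirectory();
}

// Hypothetical stand-in for the separate taxonomy factory abstraction.
// A null return means "no taxonomy directory for this index".
public interface IFakeTaxonomyDirectoryFactory
{
    FakeDirectory? CreateTaxonomyDirectory();
}

// A factory that knows nothing about taxonomy.
public sealed class PlainFactory : IFakeDirectoryFactory
{
    public FakeDirectory CreateDirectory() => new FakeDirectory { Name = "main" };
}

// A factory that can optionally provide a taxonomy directory.
public sealed class TaxonomyFactory : IFakeDirectoryFactory, IFakeTaxonomyDirectoryFactory
{
    private readonly bool _enabled;

    public TaxonomyFactory(bool enabled) => _enabled = enabled;

    public FakeDirectory CreateDirectory() => new FakeDirectory { Name = "main" };

    public FakeDirectory? CreateTaxonomyDirectory()
        => _enabled ? new FakeDirectory { Name = "taxonomy" } : null;
}

public static class TaxonomyResolver
{
    // Taxonomy counts as enabled only when the factory implements the taxonomy
    // interface AND returns a non-null directory.
    public static FakeDirectory? Resolve(IFakeDirectoryFactory factory)
        => factory is IFakeTaxonomyDirectoryFactory taxonomyFactory
            ? taxonomyFactory.CreateTaxonomyDirectory()
            : null;
}
```

In this sketch, callers treat a null result as "taxonomy disabled" and pick the non-taxonomy code path.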

Umbraco API Compatibility

  • Ensures compatibility with latest Umbraco CMS API surface area for Examine v4 consumers

Bug Fixes

1. FacetsConfig.Build not called for non-taxonomy indexes

Commit: 384550ee

When taxonomy was disabled, FacetsConfig.Build(doc) was not being called. This is required even for non-taxonomy indexes to process SortedSetDocValuesFacetField entries into proper SortedSetDocValuesField entries. Without it, documents with facet fields threw ArgumentException during indexing, causing silent failures where no items were indexed and IndexCommitted events never fired.

2. SearchableFields caching empty results from initially empty indexes (fixes #426)

Commit: 7e672a87

When a SearchContext was created during application startup before any documents were indexed, SearchableFields read from the empty index reader and cached an empty array. After documents were indexed and the NRT reader was refreshed, IsSearcherCurrent() returned true (the refreshed reader IS current), so the SearchContext was reused with its stale empty _searchableFields cache. This caused ManagedQuery/Search(string) to generate queries with no fields, returning zero results even though documents existed in the index.

Fix: Only cache SearchableFields when the result is non-empty. Applied to both SearchContext and TaxonomySearchContext.
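
A minimal sketch of the "cache only when non-empty" pattern described above. `FieldCache` and its delegate are hypothetical stand-ins for illustration; the real SearchContext reads field infos from an index reader, which is not modelled here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Simulates an index that has no documents (and so no fields) at startup.
var fieldsInIndex = new List<string>();
var context = new FieldCache(() => fieldsInIndex.ToArray());

Console.WriteLine(context.SearchableFields.Length); // 0: empty result is NOT cached

fieldsInIndex.Add("title");                          // documents are indexed later
Console.WriteLine(context.SearchableFields.Length); // 1: re-read, now cached

fieldsInIndex.Add("body");
Console.WriteLine(context.SearchableFields.Length); // 1: the non-empty result stays cached

// Hypothetical stand-in for the caching part of a search context.
public sealed class FieldCache
{
    private readonly Func<string[]> _readFields;
    private string[]? _searchableFields;

    public FieldCache(Func<string[]> readFields) => _readFields = readFields;

    public string[] SearchableFields
    {
        get
        {
            // Serve the cached value once a non-empty result has been seen.
            if (_searchableFields is not null)
            {
                return _searchableFields;
            }

            var fields = _readFields();

            // Only cache non-empty results; an empty array may simply mean
            // that nothing has been indexed yet.
            if (fields.Length > 0)
            {
                _searchableFields = fields;
            }

            return fields;
        }
    }
}
```

The third read still reports one field because this sketch never invalidates its cache; in the scenario above, picking up later fields would depend on a new context being created when the reader changes.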

3. NRT reader not refreshed before Committed event fires (fixes #427)

Commits: d606a41d, 6f040af3

In the async commit path (timer-based), the Committed event fired before WaitForChanges() completed, creating a race condition where consumers reacting to the Committed/IndexCommitted event could search with a stale NRT reader. Consumers would get zero or incomplete results even though the commit had completed.

Fix: Move WaitForChanges() into CommitNow() before the Committed event fires. Also removed the now-redundant WaitForChanges() calls in the synchronous !RunAsync paths.
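
The ordering guarantee described in the fix can be illustrated with a small sketch. `FakeCommitter` is a hypothetical stand-in that only records the order of steps; it does not use the real IndexCommitter, IndexWriter, or NRT reopen thread.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var log = new List<string>();
var committer = new FakeCommitter(log);

// A consumer that reacts to the event, for example by running a search.
committer.Committed += (_, _) => log.Add("Committed handler runs");

committer.CommitNow();
Console.WriteLine(string.Join(" -> ", log));
// prints "commit -> reader refreshed -> Committed handler runs"

// Hypothetical stand-in for the committer; it only logs the step order.
public sealed class FakeCommitter
{
    private readonly List<string> _log;

    public FakeCommitter(List<string> log) => _log = log;

    public event EventHandler? Committed;

    public void CommitNow()
    {
        _log.Add("commit");

        // Refresh the reader BEFORE raising the event so that handlers
        // never observe a stale reader.
        WaitForChanges();

        Committed?.Invoke(this, EventArgs.Empty);
    }

    private void WaitForChanges() => _log.Add("reader refreshed");
}
```

Because the refresh happens inside `CommitNow()`, every caller gets the same ordering, whether the commit was triggered synchronously or by a timer.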

Breaking Changes

| Change | Impact | Migration |
| --- | --- | --- |
| IDirectoryFactory.CreateTaxonomyDirectory returns Directory? | Low (only affects custom IDirectoryFactory implementations) | Return null to disable taxonomy, or keep returning a directory |

Tests

  • Examine v4: 888 tests pass (296 x 3 TFMs: net8.0, net9.0, net10.0), 0 failed
  • Umbraco.Cms.Search: 628 passed, 0 failed, 7 skipped
  • 4 new tests for SyncedFileSystemDirectoryFactory without taxonomy
  • Existing tests for optional taxonomy searcher behavior

Files Changed (22 files, +1741/-372)

Core Changes

  • src/Examine.Lucene/LuceneIndexOptions.cs — UseTaxonomyIndex option
  • src/Examine.Lucene/Providers/LuceneIndex.cs — Null taxonomy handling, non-taxonomy NRT, FacetsConfig.Build fix, redundant WaitForChanges cleanup
  • src/Examine.Lucene/Providers/IndexCommitter.cs — Race condition fix, null taxonomy writer
  • src/Examine.Lucene/Providers/LuceneNonTaxonomySearcher.cs — New non-taxonomy searcher
  • src/Examine.Lucene/Search/SearchContext.cs — Empty cache fix
  • src/Examine.Lucene/Search/TaxonomySearchContext.cs — Empty cache fix

Directory Infrastructure

  • src/Examine.Lucene/Directories/IDirectoryFactory.cs — Nullable return type
  • src/Examine.Lucene/Directories/ITaxonomyDirectoryFactory.cs — New interface
  • src/Examine.Lucene/Directories/DirectoryFactory.cs — Updated impl
  • src/Examine.Lucene/Directories/DirectoryFactoryBase.cs — Marked obsolete
  • src/Examine.Lucene/Directories/FileSystemDirectoryFactory.cs — ITaxonomyDirectoryFactory
  • src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs — Optional taxonomy support

Public API

  • src/Examine.Lucene/PublicAPI.Unshipped.txt — New API surface entries

Related Issues

Backport

Bugs #426 and #427 have been backported to support/3.x in PR #428.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add UseTaxonomyIndex property to LuceneIndexOptions (default: true for backwards compatibility)
- Update IDirectoryFactory.CreateTaxonomyDirectory to return nullable Directory
- Update DirectoryFactoryBase and FileSystemDirectoryFactory to check UseTaxonomyIndex option
- Update LuceneIndex to handle null TaxonomyWriter when taxonomy is disabled
- Add IsTaxonomyEnabled property to LuceneIndex for runtime checks
- Update IndexCommitter to handle null TaxonomyWriter
- SyncedFileSystemDirectoryFactory still requires taxonomy (throws if disabled)
- Update PublicAPI.Unshipped.txt with new API surface

BREAKING CHANGE: IDirectoryFactory.CreateTaxonomyDirectory now returns Directory? instead of Directory
- Document new UseTaxonomyIndex and IsTaxonomyEnabled properties
- Mark CreateTaxonomyDirectory methods as returning nullable Directory?
- Note that SyncedFileSystemDirectoryFactory requires taxonomy enabled
…y searcher

- Add LuceneNonTaxonomySearcher class to handle searches when taxonomy is disabled
- Update LuceneIndex.CreateSearcher() to return appropriate searcher based on UseTaxonomyIndex option
- Add _nrtReopenThreadNoTaxonomy for NRT support without taxonomy
- Update WaitForChanges() to use correct NRT thread
- Update Dispose() to clean up non-taxonomy NRT thread
- Add 4 new tests for SyncedFileSystemDirectoryFactory without taxonomy:
  - Given_NoTaxonomyDirectory_When_CreatingDirectory_Then_IndexCreatedSuccessfully
  - Given_NoTaxonomyDirectory_When_IndexingData_Then_SearchSucceeds
  - Given_CorruptMainIndex_And_HealthyLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_LocalIndexSyncedToMain
  - Given_CorruptMainIndex_And_CorruptLocalIndex_NoTaxonomy_When_CreatingDirectory_Then_NewIndexesCreatedAndUsable

Shazwazza and others added 6 commits February 25, 2026 09:05
When taxonomy is disabled, the non-taxonomy overload of FacetsConfig.Build(doc)
must still be called to process SortedSetDocValuesFacetField entries into proper
SortedSetDocValuesField entries. Without this, documents containing facet fields
throw ArgumentException during indexing, causing silent failures where no items
are indexed and IndexCommitted events never fire.

This restores the behavior from v4.0.0-beta.1 where FacetsConfig.Build(doc) was
always called regardless of taxonomy configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a SearchContext is created during application startup before any
documents have been indexed, SearchableFields reads from the empty index
reader and caches an empty array. After documents are indexed and the
NRT reader is refreshed, IsSearcherCurrent() returns true (the refreshed
reader IS current), so the SearchContext is reused with its stale empty
_searchableFields cache. This causes ManagedQuery/Search(string) to
generate queries with no fields, returning zero results even though
documents exist in the index.

Fix: only cache SearchableFields when the result is non-empty. An empty
index has nothing to search anyway, and re-reading on each call has
negligible cost. Once documents are indexed and fields exist, the
non-empty result is cached normally.

Applied to both SearchContext (non-taxonomy) and TaxonomySearchContext.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move WaitForChanges() into CommitNow() before the Committed event fires,
and remove the redundant call from TimerRelease(). Previously, in the
async commit path (timer-based), Committed fired before WaitForChanges()
completed, creating a race condition where consumers reacting to the
Committed/IndexCommitted event could search with a stale NRT reader that
hadn't yet been refreshed to include the just-committed changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CommitNow() now calls WaitForChanges() internally, so the explicit
calls after CommitNow() in the !RunAsync paths of
PerformIndexItemsInternal and PerformDeleteFromIndexInternal are
redundant no-ops. Remove them and update comments for clarity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Shazwazza Shazwazza changed the title from "Feature/optional taxonomy directory" to "feat: Optional taxonomy directory, Umbraco compat, and critical bug fixes" Feb 26, 2026
@Shazwazza
Owner Author

@greptile review

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR makes the taxonomy index optional (still enabled by default) in Examine v4, adds Umbraco API compatibility, and fixes three critical bugs related to FacetsConfig.Build, stale SearchableFields caching, and a race condition between CommitNow() and the Committed event.

Key changes:

  • Optional taxonomy: UseTaxonomyIndex option on LuceneIndexOptions; IDirectoryFactory.CreateTaxonomyDirectory removed and moved to new ITaxonomyDirectoryFactory; LuceneNonTaxonomySearcher introduced for the non-taxonomy path; separate ControlledRealTimeReopenThread<IndexSearcher> thread managed for NRT without taxonomy
  • Bug fix #1 (FacetsConfig): FacetsConfig.Build(doc) (non-taxonomy overload) is now called in UpdateLuceneDocument when TaxonomyWriter is null, preventing silent indexing failures for SortedSetDocValuesFacetField documents
  • Bug fix #2 (SearchableFields caching): _searchableFields is no longer cached when the result is an empty array, so an initially-empty index correctly re-reads fields after documents are indexed; applied identically to both SearchContext and TaxonomySearchContext
  • Bug fix #3 (CommitNow race): WaitForChanges() moved inside CommitNow() before Committed fires, eliminating the race where consumers reacting to IndexCommitted searched with a stale NRT reader

Notable concerns:

  • LuceneBooleanOperationBase.WithFacets was changed from abstract to virtual with a NotSupportedException default — existing subclasses that previously implemented it are unaffected, but any subclass that did not implement it (previously a compile error) will now silently compile and throw at runtime
  • The UseTaxonomyIndex doc comment states faceting falls back to SortedSetDocValues, which is inaccurate — taxonomy-based facet search is simply disabled
  • CreateSearcher() calls WaitForChanges() twice when NRT is enabled in both the taxonomy and non-taxonomy branches (redundant but benign at startup)

Confidence Score: 4/5

  • This PR is safe to merge with minor cleanup items; the three critical bug fixes are correct and the optional taxonomy feature is well-structured.
  • The three critical bug fixes are sound and backed by 888 passing tests across three TFMs. The optional taxonomy feature is cleanly implemented with proper null-safety throughout. The deductions are: (1) a redundant double WaitForChanges() in CreateSearcher() when NRT is enabled, benign at startup but technically wrong; (2) a misleading doc comment on UseTaxonomyIndex; (3) the abstract → virtual change on WithFacets is a silent breaking change for third-party subclasses that did not previously implement it.
  • src/Examine.Lucene/Search/LuceneBooleanOperationBase.cs (abstract→virtual breaking change) and src/Examine.Lucene/Providers/LuceneIndex.cs (double WaitForChanges in CreateSearcher)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Examine.Lucene/Providers/LuceneIndex.cs | Core index provider heavily refactored to support optional taxonomy; introduces IsTaxonomyEnabled, separate NRT thread for non-taxonomy path, and FacetsConfig.Build fix. Contains a redundant double WaitForChanges() call in CreateSearcher() when NRT is enabled, but otherwise the logic is sound. |
| src/Examine.Lucene/Providers/IndexCommitter.cs | Race condition fix: WaitForChanges() moved before Committed event fires, and redundant post-commit WaitForChanges() removed from the async timer path. TaxonomyWriter?.Commit() correctly made null-safe. Changes are clean. |
| src/Examine.Lucene/Providers/LuceneNonTaxonomySearcher.cs | New internal searcher for non-taxonomy indexes backed by SearcherManager instead of SearcherTaxonomyManager. IsSearcherCurrent check and volatile _searchContext field follow the same pattern as LuceneSearcher; the check-then-write is not strictly thread-safe but is a benign race yielding equivalent objects. |
| src/Examine.Lucene/Search/SearchContext.cs | Bug fix: _searchableFields is no longer cached when the result is empty, preventing stale empty-array caching after NRT reader refresh. Minor concern: permanently empty indexes will re-read fields on every SearchableFields call. |
| src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs | Significant refactor to handle optional taxonomy in the sync flow. New SyncIndexWithoutTaxonomy method and nullable taxonomy parameters throughout TryGetIndexWriters. Logic is consistent and backed by new tests. |
| src/Examine.Lucene/ExamineReplicator.cs | Replicator correctly selects IndexRevision vs IndexAndTaxonomyRevision and IndexReplicationHandler vs IndexAndTaxonomyReplicationHandler based on whether taxonomy is enabled. Changes are clean. |
| src/Examine.Lucene/Search/LuceneBooleanOperationBase.cs | WithFacets changed from abstract to virtual with a NotSupportedException default. This is a silent breaking change for any existing subclass: it will compile but throw at runtime instead of failing at compile time. |
| src/Examine.Lucene/LuceneIndexOptions.cs | New UseTaxonomyIndex option added with sensible default. Doc comment incorrectly states SortedSetDocValues faceting is used as a fallback; the actual behaviour is that taxonomy-based facet search is simply disabled. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LuceneIndex.CreateSearcher] --> B{IsTaxonomyEnabled?}
    B -->|Yes - TaxonomyWriter != null| C[SearcherTaxonomyManager]
    B -->|No - TaxonomyWriter == null| D[SearcherManager]

    C --> E{NrtEnabled?}
    E -->|Yes| F[ControlledRealTimeReopenThread\nSearcherAndTaxonomy]
    E -->|No| G[MaybeRefreshBlocking]
    F --> H[WaitForChanges]
    G --> H
    H --> I[return LuceneSearcher]

    D --> J{NrtEnabled?}
    J -->|Yes| K[ControlledRealTimeReopenThread\nIndexSearcher]
    J -->|No| L[MaybeRefreshBlocking]
    K --> M[WaitForChanges]
    L --> M
    M --> N[return LuceneNonTaxonomySearcher]

    I --> O[ILuceneTaxonomySearcher\nWithFacets supported]
    N --> P[BaseLuceneSearcher\nWithFacets throws NotSupportedException]

    subgraph IndexCommitter
        Q[CommitNow] --> R[TaxonomyWriter?.Commit]
        R --> S[IndexWriter.Commit]
        S --> T[WaitForChanges]
        T --> U[Committed event fires]
    end

    subgraph DirectoryResolution
        V[LuceneIndex ctor] --> W{DirectoryFactory is\nITaxonomyDirectoryFactory?}
        W -->|Yes| X[CreateTaxonomyDirectory\nreturns Directory?]
        W -->|No| Y[return null - taxonomy disabled]
        X -->|null returned| Y
        X -->|non-null| Z[Taxonomy enabled]
    end

Comments Outside Diff (1)

  1. src/Examine.Lucene/Search/SearchContext.cs, lines 67-81

    SearchableFields repeatedly re-reads from the index for permanently empty indexes

    The fix correctly avoids caching an empty result, so an initially-empty index will re-read once documents are indexed. However, since the guard is if (_searchableFields == null), a permanently-empty index (no documents ever indexed, e.g. a freshly created and unused index) will acquire the searcher, iterate all field infos, filter them, and return an empty array on every call to SearchableFields.

    In practice this only matters if SearchableFields is called frequently on empty indexes (e.g. ManagedQuery or Search(string) calls). A simple guard like a flag or a version counter would avoid the repeated reads, but given the fix is targeted at a specific startup race condition, this is a low-priority concern — just worth being aware of in high-throughput search paths on empty indexes.

Last reviewed commit: e244ac9

Comment on lines +1254 to +1264
    WaitForChanges();
}
else
{
    // wait for most recent changes when first creating the searcher
    searcherManager.MaybeRefreshBlocking();
}
// wait for most recent changes when first creating the searcher
WaitForChanges();

return new LuceneSearcher(name + "Searcher", searcherManager, FieldValueTypeCollection, new SearcherOptions(FieldAnalyzer, _options.FacetsConfig), _options.NrtEnabled);

Redundant WaitForChanges() call when NRT is enabled

In both the taxonomy-enabled and non-taxonomy branches of CreateSearcher(), WaitForChanges() is called twice when NRT is enabled: once inside the if (_options.NrtEnabled) block (lines 1254 / 1293) and again unconditionally after the if/else (lines 1262 / 1301).

When NRT is disabled, WaitForChanges() is a no-op (it's guarded by if (_options.NrtEnabled)), so the outer call is harmless. But when NRT is enabled and _latestGen already has a value (i.e. indexing started before the searcher was first accessed), the second call could block for another 5-second timeout window unnecessarily.

The simplest fix is to remove the duplicate outer WaitForChanges() call in both branches, since the inner call (inside the NRT block) already handles it:

// taxonomy branch
if (_options.NrtEnabled)
{
    _nrtReopenThread = new ControlledRealTimeReopenThread<...>(...) { ... };
    _nrtReopenThread.Start();
    WaitForChanges();
}
else
{
    searcherManager.MaybeRefreshBlocking();
}
// Remove the duplicate WaitForChanges() here

return new LuceneSearcher(...);

The same applies to the non-taxonomy branch around line 1301.

Comment on lines +44 to +48
/// Gets or sets whether a sidecar taxonomy index should be used for faceting.
/// When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index.
/// Default is true for backwards compatibility.
/// </summary>
public bool UseTaxonomyIndex { get; set; } = true;

Misleading doc comment on UseTaxonomyIndex

The summary says "When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index", but the actual implementation does not automatically switch to SortedSetDocValues-based faceting. When this flag is false, the taxonomy directory is simply not created and TaxonomyWriter returns null. Faceted taxonomy search (ILuceneTaxonomySearcher) becomes unavailable, and WithFacets() will throw NotSupportedException.

SortedSetDocValuesFacetField entries are still processed via FacetsConfig.Build(doc) (the fix for bug #1), so SortedSetDocValues-based facets can work at the field level, but the searcher itself does not automatically expose them via the taxonomy-based API.

The comment should more accurately describe the actual behavior to avoid consumer confusion:

Suggested change
- /// Gets or sets whether a sidecar taxonomy index should be used for faceting.
- /// When set to false, faceting will use SortedSetDocValues instead of a separate taxonomy index.
- /// Default is true for backwards compatibility.
- /// </summary>
- public bool UseTaxonomyIndex { get; set; } = true;
+ /// Gets or sets whether a sidecar taxonomy index should be used for faceting.
+ /// When set to false, the taxonomy directory is not created and taxonomy-based faceted search
+ /// (via <see cref="Providers.ILuceneTaxonomySearcher"/>) will not be available.
+ /// SortedSetDocValues-based facets can still be used at the field level.
+ /// Default is true for backwards compatibility.
+ /// </summary>
+ public bool UseTaxonomyIndex { get; set; } = true;

Comment on lines 137 to +145
  public abstract IOrdering SelectAllFields();

  /// <inheritdoc/>
- public abstract IQueryExecutor WithFacets(Action<IFacetOperations> facets);
+ /// <remarks>
+ /// The default implementation throws <see cref="NotSupportedException"/>.
+ /// Providers that support faceted search should override this method.
+ /// </remarks>
+ public virtual IQueryExecutor WithFacets(Action<IFacetOperations> facets)
+     => throw new NotSupportedException("Faceted search is not supported by this provider.");

Silently breaking existing subclasses by changing abstract to virtual

WithFacets was previously abstract, forcing all subclasses to provide an implementation at compile time. Changing it to virtual with a throw new NotSupportedException(...) body means any existing subclass that previously relied on the compiler to enforce implementation will now silently compile — but throw at runtime when WithFacets is called.

For consumers using a custom LuceneBooleanOperationBase subclass in a taxonomy-enabled setup (the most common case), this is a regression: they had a working implementation before, and after this change they'd inherit the throwing default if they happen to recompile. The [Obsolete] attribute or a clearer breaking-change note in PublicAPI-Changes.md would help, but the real concern is that there's no compile-time signal for affected subclasses.

Consider whether keeping it abstract and instead providing a separate non-abstract subtype for non-taxonomy queries would be safer — that's the approach the LuceneNonTaxonomySearcher / LuceneSearcher split already takes at the searcher level.
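
To make the compile-time versus runtime difference concrete, here is a minimal sketch of the C# semantics involved. `OperationBase`, `FacetingOperation`, and `PlainOperation` are hypothetical stand-ins with a simplified `WithFacets()` signature; they are not the real LuceneBooleanOperationBase hierarchy.

```csharp
using System;

// A subclass that overrides the method keeps working as before.
var withOverride = new FacetingOperation();
Console.WriteLine(withOverride.WithFacets()); // prints "facets applied"

// A subclass that omits the override now compiles, but throws when called.
var withoutOverride = new PlainOperation();
try
{
    withoutOverride.WithFacets();
}
catch (NotSupportedException ex)
{
    Console.WriteLine(ex.Message); // prints "Faceted search is not supported by this provider."
}

// Hypothetical base class: the member is virtual with a throwing default.
// If it were abstract, PlainOperation below would fail to compile.
public abstract class OperationBase
{
    public virtual string WithFacets()
        => throw new NotSupportedException("Faceted search is not supported by this provider.");
}

// An "override" written against an abstract member remains a valid override
// once the base member becomes virtual.
public sealed class FacetingOperation : OperationBase
{
    public override string WithFacets() => "facets applied";
}

// No override here: accepted by the compiler, fails only at runtime.
public sealed class PlainOperation : OperationBase
{
}
```

In this sketch the subclass with an existing override is unaffected; the risk is confined to subclasses that never override the member, which the compiler no longer rejects.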
