
Updated to handle partial page returns and query limiter feature #3637

Open
ivakegg wants to merge 8 commits into integration from task/runningQuery2

@ivakegg ivakegg commented Jun 24, 2026

Collaborator

flagging

  • Updated the running query to use the async results thread by default
  • Added a flag in the QueryLogic used to force synchronous running query
  • Added the capability for the query expiration bean to detect when a partial page should be returned and attempt to force it (includes detecting where the RunningQuery may be stuck if at all)
  • Added a flag in the Query Logic used to bypass the QueryLimiter mechanism (potentially for short lived queries such as UUID lookups)

Comment thread core/query/src/main/java/datawave/core/query/logic/QueryLogic.java Outdated

@apmoriarty apmoriarty left a comment

Collaborator


Code Review

Summary

This PR makes four related changes:

  1. Introduces isShortRunningQuery() on QueryLogic to optionally skip the async results thread.
  2. Introduces bypassQueryLimiter() on QueryLogic to skip ZooKeeper overhead for certain fast queries.
  3. Flips the default for RunningQuery.useResultsThread from false to true (all queries are now async by default).
  4. Adds a partial-page-return mechanism: QueryExpirationBean can now detect a stuck-but-has-results query and call RunningQuery.attemptForcedPageReturn().

Bugs

1. Debug t.printStackTrace() left in production code — QueryExecutorBean.java ~L468

} catch (Throwable t) {
    try {
        if (null != qd.logic) {
            t.printStackTrace();   // ← should not be here
            qd.logic.close();
        }
    }

This prints a raw stack trace to stderr, bypassing the logger. It must be removed before merge; if the failure needs to be recorded, log it through the class logger instead.
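
As a rough illustration of the logger-based alternative, here is a minimal sketch. It assumes java.util.logging only to stay self-contained (the project's own logging framework would be used in practice), and the class and method names are hypothetical, not taken from QueryExecutorBean.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the catch block in QueryExecutorBean; all names are illustrative.
class LoggedCloseSketch {
    private static final Logger log = Logger.getLogger(LoggedCloseSketch.class.getName());

    /**
     * Records the failure through the logger, then closes the logic.
     * Returns true only if close() ran and completed without throwing.
     */
    static boolean closeAfterFailure(AutoCloseable logic, Throwable cause) {
        if (logic == null) {
            return false;
        }
        // The stack trace goes to the configured log handlers, not straight to stderr.
        log.log(Level.SEVERE, "Query failed; closing query logic", cause);
        try {
            logic.close();
            return true;
        } catch (Exception closeFailure) {
            log.log(Level.WARNING, "Failed to close query logic", closeFailure);
            return false;
        }
    }
}
```

Passing the Throwable as the last argument keeps the full stack trace in the log record, so nothing is lost by dropping printStackTrace().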


2. NullPointerException risk in CachedResultsBean.getQueryById() finally block

The refactoring introduced a try/finally to ensure query.setActiveCall(false) always runs:

try {
    if (null == query) {
        List<Query> queries = persister.findById(id);
        if (null == queries || queries.isEmpty())
            throw new NotFoundQueryException(...);   // query still null here
        if (queries.size() > 1)
            throw new NotFoundQueryException(...);   // query still null here
        else {
            // ... query gets set ...
        }
    } else { ... }
} finally {
    query.setActiveCall(false);   // NPE if query was never assigned
}

If query is null on entry and persister.findById() throws, returns empty, or returns multiple results, the finally block throws NullPointerException on query.setActiveCall(false), masking the original exception. Fix: if (query != null) query.setActiveCall(false); in the finally.
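
The guarded finally can be sketched in isolation as follows. FakeQuery and lookup are hypothetical stand-ins for RunningQuery and getQueryById(), and IllegalStateException substitutes for NotFoundQueryException so that the example compiles on its own.

```java
import java.util.List;

class NullSafeFinallySketch {
    /** Hypothetical stand-in for RunningQuery; only the activeCall flag is modeled. */
    static class FakeQuery {
        boolean activeCall = true;

        void setActiveCall(boolean active) {
            this.activeCall = active;
        }
    }

    /** Resolves a query; the finally block tolerates a query that was never assigned. */
    static FakeQuery lookup(FakeQuery cached, List<FakeQuery> persisted) {
        FakeQuery query = cached;
        try {
            if (query == null) {
                if (persisted == null || persisted.size() != 1) {
                    // query is still null here; the guard below keeps this exception visible
                    throw new IllegalStateException("expected exactly one persisted query");
                }
                query = persisted.get(0);
            }
            return query;
        } finally {
            if (query != null) {
                query.setActiveCall(false);
            }
        }
    }
}
```

With the guard in place, a failed lookup surfaces the original exception rather than a NullPointerException thrown from the finally block.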


3. currentThread field is not volatile

private Thread currentThread;   // written in next(), read in QueryExpirationBean thread

This field is written at the start of next() on the query thread and read by QueryExpirationBean on a separate scheduler thread. Without volatile, the Java memory model does not guarantee that the write is visible to the reading thread, so getCurrentThread() could return a stale or null reference and the subsequent getStackTrace() call could report the wrong thread or fail. Needs private volatile Thread currentThread;.
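
A minimal sketch of the suggested declaration, with a reader that copies the field into a local before using it. The class and the snapshotStack() helper are hypothetical and only model the field in question.

```java
// Hypothetical model of the currentThread bookkeeping; not the actual RunningQuery class.
class CurrentThreadSketch {
    // volatile: a write made in next() on the query thread is visible to the monitoring thread
    private volatile Thread currentThread;

    /** Called on the query thread; records which thread is servicing the call. */
    void next() {
        currentThread = Thread.currentThread();
    }

    Thread getCurrentThread() {
        return currentThread;
    }

    /** Reads the field once into a local so the null check and the use see the same value. */
    StackTraceElement[] snapshotStack() {
        Thread t = currentThread;
        return (t == null) ? new StackTraceElement[0] : t.getStackTrace();
    }
}
```

Reading the field once also avoids a second race in the reader, where the reference could change between the null check and the getStackTrace() call.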


Design / Regression Concerns

4. Default useResultsThread changed from false to true is a major behavioral change

- private boolean useResultsThread = false;
+ private boolean useResultsThread = true;

Previously, the async results thread was only used when isLongRunningQuery() or allowIntermediateEmptyPages was true. Now all queries use it by default unless isShortRunningQuery() returns true. Since no existing QueryLogic implementations override isShortRunningQuery() in this PR, all current queries silently switch to the async path.

The async thread introduces non-trivial overhead (thread handoff, queue synchronization, Object.wait/notify cycles). For short queries such as index lookups this could be a latency regression. The intent is documented, but the default-to-async change should be validated against performance benchmarks, or the default should remain false with an explicit opt-in.


5. isShortRunningQuery / bypassQueryLimiter in ShardQueryConfiguration appear unused

The PR adds fields, getters, setters, equals, hashCode, and copyFrom for these two flags to ShardQueryConfiguration. However, none of the callers in QueryExecutorBean, CachedResultsBean, or QueryExpirationBean read them from configuration — they call logic.isShortRunningQuery() and logic.bypassQueryLimiter() directly. If these config fields are intended to allow configuration-driven overrides, the wiring is missing; if not, they are dead code.
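
If configuration-driven overrides are the intent, one possible wiring is for the logic to answer both questions from its configuration object. The sketch below is only an assumption about how that could look; FlagConfigSketch and ConfigBackedLogicSketch are hypothetical stand-ins for ShardQueryConfiguration and a QueryLogic implementation.

```java
// Hypothetical stand-in for ShardQueryConfiguration; only the two new flags are modeled.
class FlagConfigSketch {
    private boolean shortRunningQuery = false;
    private boolean bypassQueryLimiter = false;

    boolean isShortRunningQuery() { return shortRunningQuery; }
    void setShortRunningQuery(boolean value) { this.shortRunningQuery = value; }

    boolean getBypassQueryLimiter() { return bypassQueryLimiter; }
    void setBypassQueryLimiter(boolean value) { this.bypassQueryLimiter = value; }
}

// Hypothetical logic class that delegates both answers to its configuration.
class ConfigBackedLogicSketch {
    private final FlagConfigSketch config;

    ConfigBackedLogicSketch(FlagConfigSketch config) {
        this.config = config;
    }

    /** Callers such as the executor bean keep calling the logic; the value comes from config. */
    boolean isShortRunningQuery() {
        return config.isShortRunningQuery();
    }

    boolean bypassQueryLimiter() {
        return config.getBypassQueryLimiter();
    }
}
```

With this shape the callers stay unchanged, and the configuration fields stop being dead code.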


6. isShortRunningQuery() and bypassQueryLimiter() are non-default interface methods

// QueryLogic.java
boolean isShortRunningQuery();
boolean bypassQueryLimiter();

These are added as abstract interface methods (not default). Any QueryLogic implementation that does not extend BaseQueryLogic will fail to compile. Adding default boolean isShortRunningQuery() { return false; } and default boolean bypassQueryLimiter() { return false; } would make this non-breaking for downstream consumers.
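
The suggested default methods can be sketched as below. The interface is a reduced, hypothetical model of QueryLogic, and the two implementing classes are invented for illustration.

```java
// Reduced, hypothetical model of the QueryLogic interface with non-breaking defaults.
interface QueryLogicFlagsSketch {
    String getLogicName();

    // Existing implementations inherit these and keep compiling unchanged.
    default boolean isShortRunningQuery() {
        return false;
    }

    default boolean bypassQueryLimiter() {
        return false;
    }
}

// A pre-existing implementation that knows nothing about the new flags.
class LegacyLogicSketch implements QueryLogicFlagsSketch {
    @Override
    public String getLogicName() {
        return "LegacyLogic";
    }
}

// A logic that opts in, for example for a UUID lookup.
class UuidLookupLogicSketch implements QueryLogicFlagsSketch {
    @Override
    public String getLogicName() {
        return "UuidLookupLogic";
    }

    @Override
    public boolean isShortRunningQuery() {
        return true;
    }

    @Override
    public boolean bypassQueryLimiter() {
        return true;
    }
}
```

Only implementations that want the new behavior need to override anything; everything else keeps the previous semantics.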


Minor Issues

7. Typo in comment — RunningQuery.java

// force us to not 8use asynchronous results thread is a short running query

Should be: // do not use asynchronous results thread for a short running query


8. Missing test coverage for attemptForcedPageReturn() and partial-page-return detection

QueryExpirationBean.isNextTooLong() has substantially new logic: it detects when currentPageCount > 0, logs the current thread stack trace, and calls query.attemptForcedPageReturn(). This new path has no unit tests. Given the complexity of the forcedReturn / hasNext / gotNext synchronization, this is high-risk untested code. At minimum there should be a test that verifies attemptForcedPageReturn() breaks out of the results loop in next().
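
To show the kind of behavior such a test would pin down, here is a simplified model of a results loop that honors a forced return once the page holds at least one result. It is not the RunningQuery implementation: the queue, the polling interval, and every name except attemptForcedPageReturn() and next() are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical model of a paged results loop with a forced partial return.
class ForcedPageReturnSketch {
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final AtomicBoolean forcedReturn = new AtomicBoolean(false);
    private final int pageSize;

    ForcedPageReturnSketch(int pageSize) {
        this.pageSize = pageSize;
    }

    /** Producer side: makes one result available to next(). */
    void offer(String result) {
        results.add(result);
    }

    /** Monitor side: asks next() to return whatever it has collected so far. */
    void attemptForcedPageReturn() {
        forcedReturn.set(true);
    }

    /** Collects up to pageSize results, returning early if forced and the page is non-empty. */
    List<String> next() {
        List<String> page = new ArrayList<>();
        try {
            while (page.size() < pageSize) {
                if (forcedReturn.get() && !page.isEmpty()) {
                    break; // partial page: return what we have
                }
                String r = results.poll(10, TimeUnit.MILLISECONDS);
                if (r != null) {
                    page.add(r);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and return the partial page
        } finally {
            forcedReturn.set(false); // the request applies to a single call only
        }
        return page;
    }
}
```

A test against the real class would drive next() on one thread and call attemptForcedPageReturn() from another, then assert that next() returns a non-empty page smaller than the configured page size.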


Minor Positives

  • The try/finally guards for setActiveCall(false) throughout QueryExecutorBean and CachedResultsBean are a sound correctness improvement — previously exceptions could leave activeCall in a bad state.
  • The isIdleTooLong / isNextTooLong refactoring (moving the hasActiveCall() guard inside each method) is cleaner.
  • forcedReturn synchronization using a dedicated AtomicBoolean monitor with bounds at the start and end of next() properly constrains the currentPageCount lifecycle.

ivakegg added 3 commits June 26, 2026 14:28
@ivakegg ivakegg force-pushed the task/runningQuery2 branch from 7b03a3b to 2269083 June 26, 2026 14:29