UN-3266 [FIX] Re-indexing issue in Prompt Studio by harini-venkataraman · Pull Request #1907 · Zipstack/unstract

harini-venkataraman · 2026-04-08T10:12:32Z

What

Fixes two Prompt Studio IDE regressions introduced by the Phase 4 async-executor migration:

Symptom 1: After clicking Index on a document, the first Answer Prompt click triggers a visible re-index round-trip in the worker logs. Subsequent Answer Prompts do not re-index.
Symptom 2: After clicking Index on a document, the first Answer Prompt returns "no context found" with zero chunks retrieved. Clicking Index again then Answer Prompt produces a valid answer.

Both symptoms share a single root cause that this PR addresses surgically with three changes.

Why

Pre-Phase-4 behavior (worked)

The synchronous PromptTool HTTP path marked IndexManager.extraction_status[x2text_config_hash] = {extracted: True, ...} as part of indexing. On the next Answer Prompt, check_extraction_status found the COMPLETED marker and skipped re-extraction. The extracted .txt file stayed byte-identical to what was indexed, so the doc_id hash computed at retrieval time matched the doc_id the VDB was keyed under.

Phase-4 behavior (broken)

Phase 4 replaced the synchronous HTTP call with ExecutionDispatcher.dispatch() to the executor worker. The new callback chain (ide_index_complete) calls mark_document_indexed (Redis cache) and update_index_manager (raw_index_id), but never calls mark_extraction_status; the worker client did not even have that method. With no extraction marker written, check_extraction_status finds nothing on the first Answer Prompt and forces the re-extraction path.
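
To make the missing-marker effect concrete, here is a minimal sketch of the lookup, assuming the marker is a dict keyed by x2text_config_hash with an "extracted" flag as described above. The function name check_extraction_status_sketch and the sample hash value are hypothetical stand-ins, not the real helper.

```python
from typing import Any, Optional


def check_extraction_status_sketch(
    extraction_status: Optional[dict[str, Any]], x2text_config_hash: str
) -> bool:
    """Return True only if a completed marker exists for this config hash.

    Hypothetical stand-in for the backend check; the real helper may differ.
    """
    entry = (extraction_status or {}).get(x2text_config_hash)
    return bool(entry and entry.get("extracted"))


# Pre-Phase-4: indexing wrote the marker, so re-extraction is skipped.
pre_phase4 = {"cfg-hash-1": {"extracted": True}}

# Phase 4 before this fix: the callback chain never wrote the marker.
phase4_before_fix: dict[str, Any] = {}

marker_found = check_extraction_status_sketch(pre_phase4, "cfg-hash-1")
marker_missing = check_extraction_status_sketch(phase4_before_fix, "cfg-hash-1")
```

In this sketch the second lookup returns False, which corresponds to the path that triggers re-extraction on the first Answer Prompt.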

How

Three surgical changes:

Change 1 — Wire `mark_extraction_status` through the Phase 4 callback chain (primary fix)

Backend — new internal endpoint v1/prompt-studio/extraction-status/ in internal_views.py + internal_urls.py that resolves profile_manager_id → ProfileManager and delegates to PromptStudioIndexHelper.mark_extraction_status.

No changes to build_index_payload — all four required cb_kwargs fields were already stashed during Phase 4 (lines 577-594).

Change 2 — Fix `_handle_index` unconditional re-index (defense-in-depth)

In workers/executor/executors/legacy_executor.py, _handle_index now early-returns with the existing doc_id when doc_id_found and not reindex, preventing re-writing the same chunks into the VDB on cache misses. The Index button dispatches with reindex=True (prompt_studio_helper.py:545).
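
The guard can be sketched roughly as follows. This is an illustration of the control flow only, assuming an index object that exposes is_document_indexed and perform_indexing; FakeIndex and handle_index_sketch are hypothetical names, and the real _handle_index does considerably more.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIndex:
    """Hypothetical stand-in that records which methods were called."""

    indexed: bool
    calls: list = field(default_factory=list)

    def is_document_indexed(self, doc_id: str) -> bool:
        self.calls.append("is_document_indexed")
        return self.indexed

    def perform_indexing(self, doc_id: str) -> str:
        self.calls.append("perform_indexing")
        return doc_id


def handle_index_sketch(index: FakeIndex, doc_id: str, reindex: bool) -> str:
    doc_id_found = index.is_document_indexed(doc_id)
    # Early return: the document is already in the VDB and nobody asked
    # for a re-index, so do not write the same chunks again.
    if doc_id_found and not reindex:
        return doc_id
    return index.perform_indexing(doc_id)


# Already indexed, reindex=False: indexing is skipped.
skip_case = FakeIndex(indexed=True)
handle_index_sketch(skip_case, "doc-1", reindex=False)

# Already indexed, reindex=True (the Index button path): indexing still runs.
force_case = FakeIndex(indexed=True)
handle_index_sketch(force_case, "doc-1", reindex=True)
```

Because the Index button dispatches with reindex=True, it always takes the second path in this sketch.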

Change 3 — Fix `mark_extraction_status` dict-replacement bug

In backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py, replaced the update_or_create(defaults={"extraction_status": {hash: data}}) pattern with a select_for_update().get_or_create() + in-place dict merge + save(update_fields=["extraction_status"]).
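
The difference between the two patterns can be illustrated with plain dictionaries standing in for the extraction_status JSONField. This sketch deliberately leaves out the ORM and row locking; mark_replace and mark_merge are hypothetical names used only for illustration.

```python
from copy import deepcopy


def mark_replace(stored: dict, config_hash: str, data: dict) -> dict:
    """Old pattern: the whole field value is replaced by a one-entry dict."""
    return {config_hash: data}


def mark_merge(stored: dict, config_hash: str, data: dict) -> dict:
    """New pattern: keep existing entries and set only this hash's entry."""
    merged = deepcopy(stored)
    merged[config_hash] = data
    return merged


# An entry for one x2text config already exists on the row.
stored = {"hash_a": {"extracted": True}}

# A second profile (or x2text config) reports its extraction status.
replaced = mark_replace(stored, "hash_b", {"extracted": True})
merged = mark_merge(stored, "hash_b", {"extracted": True})
```

With replacement, the hash_a entry is lost; with the merge, both entries survive. In the single-hash case the two functions return the same value, which matches the safety argument given for Change 3.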

Can this PR break any existing features

No. Here's why each change is safe:

Change 1 (mark_extraction_status wiring) — Only adds a new call in the success branch of ide_index_complete. The new call is wrapped in a log-only try/except, so even if the backend endpoint is unreachable or returns an error, primary indexing is unaffected and the existing success path continues. The new endpoint is purely additive.
Change 2 (executor early-return) — The early-return only fires when doc_id_found=True AND reindex=False. The Prompt Studio Index button dispatches with reindex=True, so the user-facing "Index" action is unchanged. This guard matches the code's clear original intent (why else would is_document_indexed be called and logged?). Existing test test_reindex_passed_through updated to explicitly assert perform_indexing is still called when reindex=True.
Change 3 (dict-merge fix) — For the single-hash case that's in production today, the observable result is identical (the stored dict has exactly one entry either way). The new code path only diverges when multiple hashes coexist — which is a bug in the old code, not a behavior change we're altering.
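
The log-only fallback described for Change 1 might look roughly like this. It is a sketch under assumptions: on_index_success, UnreachableBackendClient, and the shape of cb_kwargs and the result dict are hypothetical, and only the keyword names passed to mark_extraction_status are taken from the PR.

```python
import logging

logger = logging.getLogger("ide_callback_sketch")


class UnreachableBackendClient:
    """Hypothetical client whose backend call always fails."""

    def mark_extraction_status(self, **payload):
        raise ConnectionError("backend unreachable")


def on_index_success(client, cb_kwargs: dict) -> dict:
    # Primary result of the success branch; built before the extra call.
    result = {"status": "completed", "document_id": cb_kwargs["document_id"]}

    x2text_config_hash = cb_kwargs.get("x2text_config_hash")
    profile_manager_id = cb_kwargs.get("profile_manager_id")
    if x2text_config_hash and profile_manager_id:
        try:
            client.mark_extraction_status(
                document_id=cb_kwargs["document_id"],
                profile_manager_id=profile_manager_id,
                x2text_config_hash=x2text_config_hash,
                enable_highlight=cb_kwargs.get("enable_highlight", False),
            )
        except Exception:
            # Log-only: a failure here must not affect the indexing result.
            logger.warning("mark_extraction_status failed", exc_info=True)
    return result


outcome = on_index_success(
    UnreachableBackendClient(),
    {
        "document_id": "doc-1",
        "profile_manager_id": "pm-1",
        "x2text_config_hash": "cfg-hash-1",
        "enable_highlight": False,
    },
)
```

Even though the client raises, the function still returns the success result, which is the property the safety argument relies on.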

Database Migrations

None. The extraction_status JSONField and IndexManager unique constraint already exist in prompt_studio/prompt_studio_index_manager_v2/migrations/0002_indexmanager_extraction_status.py.

Env Config

None.

Relevant Docs

architecture-migration-phases.md (Phase 4 async-executor migration)
backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py (helper docstrings)

Related Issues or PRs

Depends on Phase 4 scaffolding in fix/agentic-executor-queue branch (this PR's base).
Fixes a regression introduced by the Phase 4 merge that replaced synchronous PromptTool HTTP calls with ExecutionDispatcher.dispatch().

Dependencies Versions

None changed.

Notes on Testing

Automated

# Workers test suite — all targeted suites pass
cd workers
/home/harini/Documents/Workspace/unstract-poc/clean/unstract/backend/venv/bin/uv run pytest -v \
  tests/test_legacy_executor_index.py \
  tests/test_sanity_phase2.py \
  tests/test_sanity_phase4.py

Results:

test_legacy_executor_index.py: 13/13 passed (including new test_already_indexed_no_reindex_short_circuits)
test_sanity_phase2.py: passing
test_sanity_phase4.py: passing
Full workers suite: 540/545 passed — the 5 failures in test_answer_prompt.py / test_sanity_phase3.py were verified to exist on the baseline (pre-stash) and are unrelated to this fix.

Manual E2E repro (primary validation)

1. Bring up the backend and workers stack with docker-compose up.
2. Open a chunking-enabled tool with a single profile manager.
3. Click Index on a document and wait for success.
4. Click Answer Prompt on any prompt.
   - Symptom 2 check: the answer should contain valid context, NOT "no context found".
   - Symptom 1 check: tail the ide_callback worker logs; the first Answer Prompt should NOT show a re-index round-trip (no second ide_index dispatch).
5. Click Answer Prompt a second time; it should behave identically (no re-index).
6. Click the explicit Reindex button (or toggle reindex=True) and confirm the executor still runs full re-indexing and produces a valid result.

Multi-profile regression check (Change 3)

If the tool has multiple profiles, invoke mark_extraction_status through two different profiles on the same document (or through two different x2text configs)
Verify IndexManager.extraction_status contains BOTH hash entries rather than just the latest

Screenshots

N/A — purely backend fix, no UI surface changed.

Checklist

I have read and understood the Contribution Guidelines.
Automated tests added for the new executor short-circuit path.
Existing tests run and pass (no regressions introduced by this PR).
Log-only fallback for the new callback call (primary indexing path unaffected on failure).
No database migrations required.
No env/config changes required.

Files Modified

backend/prompt_studio/prompt_studio_core_v2/internal_views.py — new extraction_status POST view
backend/prompt_studio/prompt_studio_core_v2/internal_urls.py — wire new URL
backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py — dict-merge fix
workers/shared/clients/prompt_studio_client.py — new mark_extraction_status client method
workers/ide_callback/tasks.py — call mark_extraction_status from ide_index_complete success branch
workers/executor/executors/legacy_executor.py — _handle_index early-return guard
workers/tests/test_legacy_executor_index.py — new test for early-return, strengthened reindex test

Conflicts resolved:
- docker-compose.yaml: Use main's dedicated dashboard_metric_events queue for worker-metrics
- PromptCard.jsx: Keep tool_id matching condition from our async socket feature
- PromptRun.jsx: Merge useEffect import from main with our branch
- ToolIde.jsx: Keep fire-and-forget socket approach (spinner waits for socket event)
- SocketMessages.js: Keep both session-store and socket-custom-tool imports + updateCusToolMessages dep
- SocketContext.js: Keep simpler path-based socket connection approach
- usePromptRun.js: Keep Celery fire-and-forget with socket delivery over polling
- setupProxy.js: Accept main's deletion (migrated to Vite)

Collapse multi-line `<Typography.Text>null</Typography.Text>` JSX to a single line so biome's formatter passes in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a defensive guard in `UsageHelper.get_usage_by_model()` that drops `Usage` rows where `usage_type == "llm"` and `llm_usage_reason` is empty. Per the Usage model contract, an empty reason is only valid when `usage_type == "embedding"`; an empty reason combined with `usage_type == "llm"` is a producer-side bug (an LLM call site forgot to pass `llm_usage_reason` in `usage_kwargs`). Without this guard the row surfaces in API deployment responses as a malformed bare `"llm"` bucket with no token breakdown alongside the legitimate `"extraction_llm"` bucket. The guard logs a warning on every dropped row so future producer regressions are detectable.

Adds three regression tests in `backend/usage_v2/tests/test_helper.py` that stub `account_usage.models` and `usage_v2.models` in `sys.modules` so the helper can be imported without Django being set up:
- `test_unlabeled_llm_row_is_dropped`: bare "llm" bucket disappears
- `test_embedding_row_is_preserved`: guard is scoped to LLM rows
- `test_all_three_llm_reasons_coexist`: extraction/challenge/summarize

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- legacy_executor: extract _run_pipeline_answer_step helper to drop _handle_structure_pipeline cognitive complexity from 18 to under 15
- legacy_executor: bundle 9 prompt-run scalars into a prompt_run_args dict so _run_line_item_extraction has 8 params (was 15, limit 13)
- legacy_executor: merge implicitly concatenated log string
- structure_tool_task: extract _write_pipeline_outputs helper used by both _execute_structure_tool_impl and _run_agentic_extraction to remove the duplicated INFILE / COPY_TO_FOLDER write block (fixes the 6.1% duplication on new code)
- test_context_retrieval_metrics: use pytest.approx for float compare, drop unused executor local, drop always-true if is_single_pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ming

Drop _inject_context_retrieval_metrics and its call site in _handle_single_pass_extraction. The helper was timing a second fs.read against a warm cache (the cloud plugin had already read the file to build its combined prompt) and reporting that under context_retrieval, which is a fabricated number, not a measurement. The cloud plugin is the source of the file read for single-pass and is responsible for populating context_retrieval in its returned metrics. Updated the docstring to spell out the contract.

Also fix misleading "Completed prompt" streaming in the table and line-item extraction wrappers: the message was firing on both the success and failure branches, and on failure the user never saw the error (it only went to logger.error). Move the success-only message into the success branch and stream the error at LogLevel.ERROR on the failure branch. Fall back to "unknown error" when the plugin returns an empty result.error.

Drop the now-orphan TestInjectContextRetrievalMetrics test class (six tests calling the deleted method) and update the module docstring. Surviving classes (TestSinglePassChunkSizeForcing, TestPipelineIndexUsageKwargsPropagation) cover unrelated invariants and are kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-08T10:12:41Z

Summary by CodeRabbit

New Features
- Added document extraction status tracking to monitor text processing results, including completion status and error details.
Bug Fixes
- Optimized indexing performance by skipping already-indexed documents during routine operations unless a full re-index is explicitly requested.

Walkthrough

The changes introduce a new extraction-status API endpoint that allows background workers to report document extraction completion status tied to specific profiles and text-extraction configurations. The system atomically persists this status to prevent concurrent overwrites. Executor logic is updated to skip unnecessary re-indexing when documents are already indexed and reindex is disabled.

Changes

Cohort / File(s)	Summary
API Routing & Endpoint `backend/prompt_studio/prompt_studio_core_v2/internal_urls.py`, `backend/prompt_studio/prompt_studio_core_v2/internal_views.py`	Added new internal POST endpoint `extraction-status/` that validates required fields (`document_id`, `profile_manager_id`, `x2text_config_hash`, `enable_highlight`), calls `PromptStudioIndexHelper.mark_extraction_status(...)`, and returns success/error responses with proper HTTP status codes.
Extraction Status Persistence `backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py`	Modified `mark_extraction_status` to use atomic row locking with `select_for_update()` and in-place dictionary merging instead of full-replacement `update_or_create`, ensuring concurrent extraction status updates for different config hashes do not overwrite each other.
Executor Early Return `workers/executor/executors/legacy_executor.py`	Added early-return logic in `_handle_index` when document is already indexed and `reindex=False`, preventing redundant indexing operations and simplifying conditional structure.
Worker Status Reporting `workers/ide_callback/tasks.py`	Extended `ide_index_complete` to extract and forward extraction metadata (`x2text_config_hash`, `enable_highlight`) to the new `mark_extraction_status` API endpoint after indexing completes, with warning-level error handling.
Client API Method `workers/shared/clients/prompt_studio_client.py`	Added `mark_extraction_status(...)` method to `PromptStudioAPIClient` that POSTs extraction payload including document ID, profile manager ID, config hash, highlight flag, and optional error message.
Executor Tests `workers/tests/test_legacy_executor_index.py`	Added two test cases: one asserting `perform_indexing()` is called when `reindex=True` despite document being indexed, and another verifying the short-circuit behavior when `reindex=False` skips indexing entirely.

Sequence Diagram

sequenceDiagram
    participant Executor as Executor
    participant Callback as ide_callback
    participant Client as PromptStudioAPIClient
    participant API as extraction-status<br/>Endpoint
    participant Helper as IndexHelper
    participant DB as Database

    Executor->>Executor: Check is_document_indexed()
    alt Document Not Indexed
        Executor->>Executor: perform_indexing()
        Executor->>Callback: ide_index_complete(callback_kwargs)
    else Document Indexed & reindex=False
        Executor->>Callback: ide_index_complete(callback_kwargs)
    end
    
    Callback->>Callback: Extract x2text_config_hash,<br/>enable_highlight
    Callback->>Client: mark_extraction_status(document_id,<br/>profile_manager_id, ...)
    
    Client->>API: POST /extraction-status/
    API->>API: Validate required fields
    API->>Helper: mark_extraction_status(...)
    
    Helper->>DB: select_for_update().get_or_create()
    DB-->>Helper: IndexManager instance (locked)
    Helper->>Helper: Read existing<br/>extraction_status dict
    Helper->>Helper: Merge/update entry<br/>for x2text_config_hash
    Helper->>DB: save(update_fields=<br/>["extraction_status"])
    DB-->>Helper: Confirm update
    
    Helper-->>API: Success/Result
    API-->>Client: {"success": true}
    Client-->>Callback: Response
    Callback->>Callback: Log result (warn on error)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title clearly identifies the main change: fixing a re-indexing issue in Prompt Studio introduced by Phase 4 migration. It is specific, concise, and directly reflects the primary objective of the PR.
Description check	✅ Passed	Description is comprehensive and well-structured, covering all template sections: What (two symptoms and root cause), Why (pre/post-Phase-4 behavior), How (three surgical changes), impact analysis, testing strategy, and relevant context. All required sections are filled with substantive detail.
Docstring Coverage	✅ Passed	Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.

sonarqubecloud · 2026-04-08T10:14:02Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

greptile-apps · 2026-04-08T10:19:16Z

Greptile Summary

This PR fixes two Prompt Studio IDE regressions introduced by the Phase 4 async-executor migration. Both symptoms (unwanted re-index on first Answer Prompt, and "no context found" on first answer) share the same root cause: the Phase 4 callback chain never called mark_extraction_status, meaning check_extraction_status always returned False and forced a re-extraction path on every Answer Prompt.

The fix is clean and surgical:

Change 1: Adds a new v1/prompt-studio/extraction-status/ internal endpoint that the ide_index_complete callback now calls (non-fatally) after successful indexing, wiring the missing link in the Phase 4 callback chain.
Change 2: Adds a guard in _handle_index to early-return when doc_id_found=True and reindex=False, preventing spurious VDB writes. Both _handle_structure_pipeline (explicit reindex=True at line 841) and the IDE Index button (explicit reindex=True at prompt_studio_helper.py:545) are unaffected by this guard.
Change 3: Replaces update_or_create(defaults={"extraction_status": {hash: data}}) with a select_for_update().get_or_create() + in-place dict merge, fixing a bug where each call would clobber prior hash entries in multi-profile scenarios.

Key observations:

The extraction_status internal view follows the exact same pattern as the existing index_update and indexing_status endpoints (csrf_exempt, broad exception handler returning 500).
The new mark_extraction_status call in ide_index_complete is wrapped in an isolated try/except with exc_info=True logging, so failures are non-fatal and won't break primary indexing.
All three callers of _handle_index (_handle_ide_index, _handle_structure_pipeline, and direct Operation.INDEX) either explicitly pass reindex=True or represent the user-facing reindex path, so the early-return guard is safe.
New test test_already_indexed_no_reindex_short_circuits correctly asserts the short-circuit path, and the existing test_reindex_passed_through is strengthened with an explicit perform_indexing.assert_called_once() check.

Confidence Score: 5/5

Safe to merge — all three changes are surgical, well-tested, and non-breaking.

No P0 or P1 issues found. The early-return guard in _handle_index is safe because every real caller passes reindex=True. The dict-merge fix is strictly an improvement. The new extraction-status callback path is isolated behind a try/except and matches existing internal endpoint patterns. 13/13 targeted tests pass with new coverage for the short-circuit path. All remaining observations are P2 or lower and were already addressed in previous review threads.

No files require special attention.

Vulnerabilities

No security concerns identified. The new extraction-status internal endpoint is @csrf_exempt and follows the same security posture as all other internal endpoints in this module. It does not expose any user-controlled data beyond what already flows through the existing index callback chain.

Important Files Changed

Filename	Overview
backend/prompt_studio/prompt_studio_core_v2/internal_views.py	Adds new extraction_status POST endpoint mirroring the existing pattern of index_update/indexing_status; resolves ProfileManager by PK, delegates to PromptStudioIndexHelper.mark_extraction_status, and returns {success: bool}. Clean and consistent with the file's existing conventions.
backend/prompt_studio/prompt_studio_core_v2/internal_urls.py	Wires the new extraction-status/ URL path; change is minimal and correct.
backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py	Replaces update_or_create(defaults={extraction_status: {...}}) with select_for_update().get_or_create() + in-place dict merge + save(update_fields=[...]), correctly fixing the dict-clobber bug for multi-hash scenarios without changing single-hash behavior.
workers/ide_callback/tasks.py	Adds the missing mark_extraction_status call in ide_index_complete's success branch; correctly guarded behind 'if x2text_config_hash and profile_manager_id' and wrapped in isolated try/except so failure is non-fatal to primary indexing.
workers/shared/clients/prompt_studio_client.py	Adds mark_extraction_status client method with correct payload construction matching the new backend endpoint's expected JSON fields.
workers/executor/executors/legacy_executor.py	Adds early-return in _handle_index when doc_id_found=True and reindex=False; safe because both primary callers (_handle_structure_pipeline at line 841 and the IDE Index button at prompt_studio_helper.py:545) explicitly pass reindex=True.
workers/tests/test_legacy_executor_index.py	Adds test_already_indexed_no_reindex_short_circuits covering the new early-return path, and strengthens test_reindex_passed_through with an explicit perform_indexing.assert_called_once() assertion.

Sequence Diagram

sequenceDiagram
    participant UI as Prompt Studio UI
    participant BE as Backend (Django)
    participant EW as Executor Worker
    participant CB as ide_callback Worker
    participant VDB as Vector DB
    participant ORM as Database (ORM)

    UI->>BE: POST /index (Index button, reindex=True)
    BE->>EW: dispatch ide_index (reindex=True in index_params)
    EW->>EW: _handle_extract → extracted_text
    EW->>VDB: is_document_indexed(doc_id)
    Note over EW: reindex=True → skip early-return guard
    EW->>VDB: perform_indexing(doc_id, extracted_text)
    EW-->>CB: ide_index_complete(result_dict, cb_kwargs)
    CB->>BE: POST /indexing-status/ (mark_document_indexed)
    CB->>BE: POST /index/ (update_index_manager)
    CB->>BE: POST /extraction-status/ NEW
    BE->>ORM: select_for_update().get_or_create(doc, profile)
    BE->>ORM: merge extraction_status dict + save()
    CB-->>UI: WebSocket index_document completed

    UI->>BE: POST /answer-prompt (Answer Prompt)
    BE->>BE: check_extraction_status(x2text_config_hash)
    Note over BE: Now finds COMPLETED marker - skips re-extraction
    BE->>EW: dispatch answer_prompt (no re-index)
    EW-->>CB: ide_prompt_complete
    CB-->>UI: WebSocket fetch_response completed (with valid context)

Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/agentic-exe..."

github-actions · 2026-04-08T10:40:34Z

Test Results

Summary

✅ Runner Tests: 11 passed, 0 failed (11 total)
✅ SDK1 Tests: 178 passed, 0 failed (178 total)

Runner Tests - Full Report

filepath	function	passed	SUBTOTAL
runner/src/unstract/runner/clients/test_docker.py	test_logs	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup_skip	1	1
runner/src/unstract/runner/clients/test_docker.py	test_client_init	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_exists	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config_without_mount	1	1
runner/src/unstract/runner/clients/test_docker.py	test_run_container	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_for_sidecar	1	1
runner/src/unstract/runner/clients/test_docker.py	test_sidecar_container	1	1
TOTAL		11	11

SDK1 Tests - Full Report

coderabbitai

🧹 Nitpick comments (1)

workers/tests/test_legacy_executor_index.py (1)

227-255: Consider asserting VectorDB cleanup in the new short-circuit test.

Since Line 249 exercises the new early-return branch, it would be useful to assert the mocked VectorDB is still closed in that path.

✅ Suggested test tweak

-        mock_vdb_cls.return_value = MagicMock()
+        mock_vdb = MagicMock()
+        mock_vdb_cls.return_value = mock_vdb
         mock_get_fs.return_value = MagicMock()
@@
         assert result.success is True
         assert result.data[IKeys.DOC_ID] == "doc-already-indexed"
         mock_index.is_document_indexed.assert_called_once()
         mock_index.perform_indexing.assert_not_called()
+        mock_vdb.close.assert_called_once()
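
As a standalone illustration of what the suggested assertion checks, the sketch below uses unittest.mock against a simplified handler. It assumes the executor closes the VectorDB in a finally block; that is an assumption about the real _handle_index, and handle_index_sketch is a hypothetical stand-in rather than the project's code.

```python
from unittest.mock import MagicMock


def handle_index_sketch(vdb_factory, index, doc_id, reindex):
    """Simplified handler: early-return guard plus cleanup in finally."""
    vdb = vdb_factory()
    try:
        if index.is_document_indexed(doc_id) and not reindex:
            return doc_id
        return index.perform_indexing(doc_id)
    finally:
        # Assumed cleanup; runs on the early-return path as well.
        vdb.close()


mock_vdb = MagicMock()
mock_vdb_cls = MagicMock(return_value=mock_vdb)
mock_index = MagicMock()
mock_index.is_document_indexed.return_value = True

result = handle_index_sketch(
    mock_vdb_cls, mock_index, "doc-already-indexed", reindex=False
)

assert result == "doc-already-indexed"
mock_index.perform_indexing.assert_not_called()
mock_vdb.close.assert_called_once()
```

Binding the mock to a local name (mock_vdb), as the suggested diff does, is what makes the close assertion straightforward to write.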

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workers/tests/test_legacy_executor_index.py` around lines 227 - 255, The test
test_already_indexed_no_reindex_short_circuits should also assert that the
VectorDB instance is cleaned up when the executor short-circuits: after calling
executor.execute(ctx) add an assertion that the mocked VectorDB
(mock_vdb_cls.return_value) had its close/cleanup method invoked (e.g., assert
mock_vdb_cls.return_value.close.called or assert_called_once()), so locate the
mock_vdb_cls usage in this test and verify its returned mock was closed when
mock_index.is_document_indexed returned True and perform_indexing was not
called.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 65586c7c-474a-4856-b5de-bf896366d0c7

📥 Commits

Reviewing files that changed from the base of the PR and between c9ffd9d and f1f071e.

📒 Files selected for processing (7)

backend/prompt_studio/prompt_studio_core_v2/internal_urls.py
backend/prompt_studio/prompt_studio_core_v2/internal_views.py
backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py
workers/executor/executors/legacy_executor.py
workers/ide_callback/tasks.py
workers/shared/clients/prompt_studio_client.py
workers/tests/test_legacy_executor_index.py

chandrasekharan-zipstack

@harini-venkataraman the methods mention "mark_extraction_status" but it's called from ide_index_complete. Does this mean that extraction is not handled in a separate worker but clubbed with indexing?

harini-venkataraman · 2026-04-08T11:14:26Z

> @harini-venkataraman the methods mention "mark_extraction_status" but its called from ide_index_complete. Does this mean that extraction is not handled in a separate worker but clubbed with indexing?

Yes, ide_index is an operation that runs extract+index in the same Celery task (see _handle_ide_index in legacy_executor.py). extraction_status is marked from the index callback because, by the time ide_index_complete fires, extraction has already succeeded as part of the same task.

harini-venkataraman and others added 30 commits February 19, 2026 20:39

Execution backend - revamp

2da4907

async flow

41eeef8

Streaming progress to FE

f66dfb2

Removing multi hop in Prompt studio ide and structure tool

95c6592

Merge remote-tracking branch 'origin/main' into feat/execution-backend

44a2b3f

UN-3234 [FIX] Add beta tag to agentic prompt studio navigation item

2f4f2dc

Added executors for agentic prompt studio

d041201

Merge branch 'main' of github.com:Zipstack/unstract into feat/executi…

0a0cfb1

…on-backend

Merge branch 'main' of github.com:Zipstack/unstract into feat/executi…

a4e1fd7

…on-backend

Added executors for agentic prompt studio

ae77d6a

Added executors for agentic prompt studio

5c22956

Removed redundant envs

3cc3213

Removed redundant envs

d0532f8

Removed redundant envs

6173df5

[pre-commit.ci] auto fixes from pre-commit.com hooks

bbe6f58

for more information, see https://pre-commit.ci

Removed redundant envs

a3dc912

Merge branch 'main' of github.com:Zipstack/unstract into feat/executi…

98c8071

…on-backend

Merge branch 'feat/execution-backend' of github.com:Zipstack/unstract…

21157ac

… into feat/execution-backend

Removed redundant envs

0216b59

Removed redundant envs

db81b9d

Removed redundant envs

e1da202

Removed redundant envs

d119797

Removed redundant envs

fbadbf8

Removed redundant envs

882296e

Removed redundant envs

6d3bbbf

[pre-commit.ci] auto fixes from pre-commit.com hooks

292460b

for more information, see https://pre-commit.ci

Removed redundant envs

f35c0e6

Merge branch 'feat/execution-backend' of github.com:Zipstack/unstract…

9bcb458

… into feat/execution-backend

adding worker for callbacks

0cbd10a

harini-venkataraman and others added 13 commits April 6, 2026 19:12

Fixing null fonts

0a91221

Merge branch 'main' into fix/agentic-executor-queue

e1bbc80

Signed-off-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f2ce13

for more information, see https://pre-commit.ci

Fix biome formatting in DisplayPromptResult

0533ced

Collapse multi-line `<Typography.Text>null</Typography.Text>` JSX to a single line so biome's formatter passes in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

095c7d1

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

7421f3b

for more information, see https://pre-commit.ci

Addressing greptile comments

1a79030

Addressing greptile comments

5c3b67c

[pre-commit.ci] auto fixes from pre-commit.com hooks

adda29e

for more information, see https://pre-commit.ci

Fixing re-indexing marker

10b2431

Merge branch 'main' into fix/agentic-executor-queue

f1f071e

greptile-apps bot reviewed Apr 8, 2026

workers/tests/test_legacy_executor_index.py (resolved)

backend/prompt_studio/prompt_studio_core_v2/internal_views.py (resolved)

harini-venkataraman changed the title ~~Fix/agentic executor queue~~ UN-3266 [FIX] Re-indexing issue in Prompt Studio Apr 8, 2026

harini-venkataraman marked this pull request as ready for review April 8, 2026 10:39

harini-venkataraman requested review from Deepak-Kesavan and chandrasekharan-zipstack April 8, 2026 10:41

coderabbitai bot reviewed Apr 8, 2026

chandrasekharan-zipstack reviewed Apr 8, 2026

backend/prompt_studio/prompt_studio_core_v2/internal_views.py (resolved)

backend/prompt_studio/prompt_studio_index_manager_v2/prompt_studio_index_helper.py (resolved)

chandrasekharan-zipstack approved these changes Apr 8, 2026

pk-zipstack approved these changes Apr 8, 2026

harini-venkataraman merged commit 8dec64e into main Apr 8, 2026
10 checks passed

harini-venkataraman deleted the fix/agentic-executor-queue branch April 8, 2026 11:27

harini-venkataraman merged 153 commits into main from fix/agentic-executor-queue

harini-venkataraman commented Apr 8, 2026 (edited)

coderabbitai bot commented Apr 8, 2026 (edited)

greptile-apps bot commented Apr 8, 2026 (edited)

Greptile Summary

Confidence Score: 5/5

github-actions bot commented Apr 8, 2026

coderabbitai bot left a comment

chandrasekharan-zipstack left a comment

harini-venkataraman commented Apr 8, 2026

6 participants

sonarqubecloud bot commented Apr 8, 2026

Quality Gate passed
