
UN-2807 [MISC] Changed user_data to custom_data in variable replacement #1548

Merged
jaseemjaskp merged 12 commits into main from feature/user-data-variable-support
Sep 22, 2025

Conversation

@jaags-dev
Contributor

@jaags-dev jaags-dev commented Sep 22, 2025

What

  • Add custom_data variable support
  • Implement custom_data field validation and processing for API deployments
  • Support nested JSON object access in variable replacement (e.g., {{custom_data.name}}, {{custom_data.address.city}})
  • Enable custom_data variables in both direct API deployments and exported Prompt Studio tools
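For illustration, a request body exercising these variables might look like the sketch below. The endpoint and surrounding request fields are hypothetical; only the custom_data key and the {{custom_data.*}} placeholder syntax come from this PR.

```python
import json

# Hypothetical execution request body; only "custom_data" and the
# {{custom_data.*}} placeholders below are from this PR.
request_body = {
    "custom_data": {
        "name": "Acme Corp",
        "address": {"city": "Berlin"},
    }
}

# A prompt template could then reference nested values via dot notation:
template = (
    "Summarize the contract for {{custom_data.name}}, "
    "based in {{custom_data.address.city}}."
)

print(json.dumps(request_body["custom_data"], sort_keys=True))
```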

Why

  • Users need ability to pass dynamic JSON data to prompts during API deployment execution
  • Current variable system only supports static and dynamic variables, missing support for user-provided JSON data
  • Prompt Studio tools exported as containers need access to the same custom_data functionality as direct API calls
  • Enhances flexibility for users to create more dynamic and data-driven prompt templates

How

  • Enhanced ExecutionRequestSerializer with custom_data JSONField including JSON validation
  • Added CUSTOM_DATA variable type and regex pattern matching in prompt service constants
  • Implemented dot notation parsing for nested JSON traversal in variable replacement engine
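The dot-notation step above can be sketched as follows. This is a minimal illustration, not the service code: the real pattern lives in VariableConstants.CUSTOM_DATA_VARIABLE_REGEX in the prompt service, and this sketch assumes {{...}}-delimited placeholders.

```python
import re

# Assumed placeholder syntax: {{custom_data.<dot.separated.path>}}
CUSTOM_DATA_VARIABLE_REGEX = r"\{\{custom_data\.([a-zA-Z0-9_.]+)\}\}"

def resolve_path(data: dict, dot_path: str):
    """Walk a nested dict with a dot-separated path; raises KeyError if a key is missing."""
    value = data
    for key in dot_path.split("."):
        value = value[key]
    return value

def replace_custom_data_variables(prompt: str, custom_data: dict) -> str:
    """Substitute each {{custom_data.<path>}} placeholder with its resolved value."""
    def _sub(match: re.Match) -> str:
        return str(resolve_path(custom_data, match.group(1)))
    return re.sub(CUSTOM_DATA_VARIABLE_REGEX, _sub, prompt)

data = {"name": "Acme", "address": {"city": "Berlin"}}
text = "Customer {{custom_data.name}} is based in {{custom_data.address.city}}."
print(replace_custom_data_variables(text, data))
# Customer Acme is based in Berlin.
```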

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No, this PR should not break any existing features because:

  • custom_data field is optional (required=False, allow_null=True) in API serializer
  • Added to EXECUTION_EXCLUDED_PARAMS to prevent passing to incompatible methods
  • Variable replacement maintains backward compatibility with existing static/dynamic variables
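The optional-field behavior described above can be sketched in plain Python. The actual check lives in ExecutionRequestSerializer (a DRF JSONField with required=False, allow_null=True); the helper name here is illustrative.

```python
import json

def validate_custom_data(raw):
    """Accept None (field omitted/null) or a JSON object; reject anything else.

    Mirrors the serializer-level behavior sketched above: the field is
    optional, so None passes through untouched.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        raw = json.loads(raw)  # tolerate a JSON-encoded string payload
    if not isinstance(raw, dict):
        raise ValueError("custom_data must be a JSON object")
    return raw

print(validate_custom_data(None))          # None: omitted field stays optional
print(validate_custom_data('{"k": "v"}'))  # {'k': 'v'}
```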

Database Migrations

  • No database schema changes required
  • All changes are at the application/service layer

Env Config

  • No new environment variables required
  • Uses existing workflow execution and metadata infrastructure

Relevant Docs

Related Issues or PRs

Zipstack/unstract-sdk#202

Dependencies Versions

Notes on Testing

  • Tested custom_data JSON validation in API serializer
  • Verified dot notation variable replacement for nested objects (custom_data.address.city)
  • Confirmed exported Prompt Studio tools receive custom_data through metadata
  • Validated backward compatibility with existing static/dynamic variables
  • Tested error handling for invalid JSON and missing object keys
  • Verified workflow execution pipeline passes custom_data correctly

Screenshots

N/A - Backend API feature

Checklist

I have read and understood the Contribution Guidelines.

@coderabbitai
Contributor

coderabbitai bot commented Sep 22, 2025

Summary by CodeRabbit

  • Refactor

    • Renamed the “user_data” field to “custom_data” across APIs, workflows, metadata, and prompt services.
    • Updated request/response payloads, serializer fields, method parameters, and variable replacement to use custom_data (e.g., prompt variables: custom_data.path.to.value).
    • Execution exclusions and metadata keys now reference custom_data.
  • Chores

    • Bumped Structure Tool version to 0.0.88.
  • Note

    • This is a breaking change: update client requests, templates, and integrations to use the custom_data key.

Walkthrough

A project-wide rename changes the user-provided payload key from user_data to custom_data. This propagates through API constants, serializers, helpers, workflow orchestration, file metadata handling, prompt-service variable replacement, and tool settings. Environment/tool versions are bumped from 0.0.87 to 0.0.88. One serializer file contains merge-conflict markers.

Changes

Cohort / File(s) Summary of changes
API v2 surface
backend/api_v2/constants.py, backend/api_v2/deployment_helper.py, backend/api_v2/api_deployment_views.py
Rename USER_DATA → CUSTOM_DATA; function parameters and calls now use custom_data; serializer access key switched accordingly.
API v2 serializers
backend/api_v2/serializers.py
ExecutionRequestSerializer field user_data → custom_data; validator renamed and messages updated; note: merge-conflict markers present in docstring/field area.
Workflow manager orchestration
backend/workflow_manager/workflow_v2/workflow_helper.py, backend/workflow_manager/workflow_v2/file_execution_tasks.py, backend/workflow_manager/endpoint_v2/source.py, backend/workflow_manager/workflow_v2/dto.py
Public signatures updated to custom_data; EXECUTION_EXCLUDED_PARAMS filters custom_data; FileData now exposes custom_data; calls to add_file_to_volume pass custom_data; minor formatting-only hunks elsewhere.
Workflow execution layer
unstract/workflow-execution/src/unstract/workflow_execution/constants.py, unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py
MetaDataKey.USER_DATA → CUSTOM_DATA; ExecutionFileHandler.add_metadata_to_volume parameter renamed to custom_data; metadata writes use CUSTOM_DATA key.
Prompt service constants and variable replacement
prompt-service/src/unstract/prompt_service/constants.py, prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py, prompt-service/src/unstract/prompt_service/services/variable_replacement.py, prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py
Public constants and enum updated to CUSTOM_DATA; regex targets custom_data; replacement helper renamed replace_custom_data_variable; service/controller accept and propagate custom_data instead of user_data.
Structure tool
tools/structure/src/constants.py, tools/structure/src/main.py, tools/structure/src/config/properties.json
SettingsKeys.USER_DATA → CUSTOM_DATA; payload now writes custom_data; toolVersion bumped 0.0.87 → 0.0.88.
Env version bump
backend/sample.env
STRUCTURE_TOOL_IMAGE_URL and STRUCTURE_TOOL_IMAGE_TAG updated to 0.0.88.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant API as API v2
    participant DH as DeploymentHelper
    participant WH as WorkflowHelper
    participant Q as Task Queue
    participant EFH as ExecutionFileHandler
    participant SRC as SourceConnector
    participant VOL as Volume/Metadata

    note over C,API: Request contains custom_data
    C->>API: POST /execute (custom_data)
    API->>DH: execute_workflow(custom_data)
    DH->>WH: execute_workflow_async(custom_data)
    WH->>Q: Enqueue task (custom_data)
    Q-->>WH: Task started
    WH->>SRC: add_file_to_volume(..., custom_data)
    SRC->>EFH: add_metadata_to_volume(..., custom_data)
    EFH->>VOL: Write metadata { ..., custom_data }
    VOL-->>EFH: OK
    EFH-->>SRC: OK
    SRC-->>WH: File prepared
    WH-->>DH: Execution progressing
    DH-->>API: Ack
    API-->>C: 202 Accepted
    note right of VOL: Metadata field key: "custom_data"

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 57.14%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title Check — ✅ Passed: the title references a real part of the changeset (renaming user_data → custom_data in variable replacement) but understates the scope, since this PR performs a cross-cutting rename and API/serializer changes across multiple modules (constants, serializers, workflow, metadata, prompt service, etc.). As written, the title is only partially representative of the main change.
  • Description Check — ✅ Passed: the description follows the repository template, fills the required What/Why/How sections, includes a "Can this PR break..." justification, and documents testing and related PRs. Some non-critical sections (Relevant Docs, Dependencies Versions) are blank, and the description does not call out the unresolved merge-conflict markers observed in serializers.py or give migration guidance for consumers still sending "user_data".
✨ Finishing touches
  • 📝 Generate Docstrings
  • 🧪 Generate unit tests
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch feature/user-data-variable-support


@jaags-dev changed the title from "Feature/user data variable support" to "UN-2807 [FEAT] Add custom_data variable support for Prompt Studio" on Sep 22, 2025
@jaags-dev changed the title from "UN-2807 [FEAT] Add custom_data variable support for Prompt Studio" to "UN-2807 [FEAT] Changed user_data to custom_data in variable replacement" on Sep 22, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
backend/workflow_manager/workflow_v2/file_execution_tasks.py (3)

555-563: Crash risk: _build_final_result called with workflow_file_execution=None.

This path passes workflow_file_execution=None, but _build_final_result dereferences .id, causing AttributeError. Fix in _build_final_result to handle None.

See proposed fix in the comment for Lines 1128-1192.


564-584: UnboundLocalError risk: destination/workflow_log may be undefined in except.

If the exception occurs before these vars are assigned, referencing them will raise. Initialize both before the try and guard their use.

     def _process_file(
         cls,
         current_file_idx: int,
         total_files: int,
         file_data: FileData,
         file_hash: FileHash,
         workflow_execution: WorkflowExecution,
         workflow_file_execution: WorkflowFileExecution | None = None,
     ) -> FileExecutionResult:
@@
-        try:
+        # Ensure variables exist for exception paths
+        destination: DestinationConnector | None = None
+        workflow_log: WorkflowLog | None = None
+        try:
@@
-        except Exception as error:
+        except Exception as error:
             if isinstance(error, UnsupportedMimeTypeError):
                 error_msg = str(error)
             else:
                 error_msg = f"File execution failed: {error}"
-                workflow_log.log_error(
-                    logger=logger, message=error_msg, exc_info=True, stack_info=True
-                )
+                if workflow_log:
+                    workflow_log.log_error(
+                        logger=logger, message=error_msg, exc_info=True, stack_info=True
+                    )
+                else:
+                    logger.error(error_msg, exc_info=True, stack_info=True)
             workflow_file_execution.update_status(
                 status=ExecutionStatus.ERROR, execution_error=error_msg[:500]
             )
             result = FinalOutputResult(output=None, metadata=None, error=error_msg)
             return cls._build_final_result(
                 workflow_execution=workflow_execution,
                 file_hash=file_hash,
                 result=result,
                 workflow_file_execution=workflow_file_execution,
                 error=error_msg,
-                is_api=destination.is_api if destination else False,
+                is_api=destination.is_api if destination else False,
                 destination=destination,
             )

1128-1192: Null dereference: workflow_file_execution may be None in _build_final_result.

Guard all usages and skip tracker/usage updates when not available.

     def _build_final_result(
         cls,
         workflow_execution: WorkflowExecution,
         file_hash: FileHash,
         result: FinalOutputResult,
-        workflow_file_execution: WorkflowFileExecution | None = None,
+        workflow_file_execution: WorkflowFileExecution | None = None,
         error: str | None = None,
         is_api: bool = False,
         destination: DestinationConnector | None = None,
     ) -> FileExecutionResult:
         """Construct and cache the final execution result."""
-        final_result = FileExecutionResult(
+        file_execution_id = (
+            str(workflow_file_execution.id) if workflow_file_execution else ""
+        )
+        final_result = FileExecutionResult(
             file=file_hash.file_name,
-            file_execution_id=str(workflow_file_execution.id),
+            file_execution_id=file_execution_id,
             error=error,
             result=result.output,
             metadata=result.metadata,
         )
 
-        if is_api:
+        if is_api and workflow_file_execution:
             # Update cache with final result
             ResultCacheUtils.update_api_results(
                 workflow_id=workflow_execution.workflow.id,
                 execution_id=str(workflow_execution.id),
                 api_result=final_result,
             )
@@
-                APIHubUsageUtil.track_api_hub_usage(
-                    workflow_execution_id=str(workflow_execution.id),
-                    workflow_file_execution_id=str(workflow_file_execution.id),
-                    organization_id=organization_id,
-                )
+                if workflow_file_execution:
+                    APIHubUsageUtil.track_api_hub_usage(
+                        workflow_execution_id=str(workflow_execution.id),
+                        workflow_file_execution_id=str(workflow_file_execution.id),
+                        organization_id=organization_id,
+                    )
@@
-        cls._update_file_execution_tracker(
-            execution_id=str(workflow_execution.id),
-            file_execution_id=str(workflow_file_execution.id),
-            stage=FileExecutionStage.COMPLETED,
-            status=status,
-            error=error,
-        )
-        cls.delete_tool_execution_tracker(
-            execution_id=str(workflow_execution.id),
-            file_execution_id=str(workflow_file_execution.id),
-        )
+        if workflow_file_execution:
+            cls._update_file_execution_tracker(
+                execution_id=str(workflow_execution.id),
+                file_execution_id=str(workflow_file_execution.id),
+                stage=FileExecutionStage.COMPLETED,
+                status=status,
+                error=error,
+            )
+            cls.delete_tool_execution_tracker(
+                execution_id=str(workflow_execution.id),
+                file_execution_id=str(workflow_file_execution.id),
+            )
 
         return final_result
unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (1)

98-106: Public API break: rename user_data→custom_data without alias.

This method is likely imported by other modules. Provide a backward‑compatible alias for user_data (deprecated).

-    def add_metadata_to_volume(
+    def add_metadata_to_volume(
         self,
         input_file_path: str,
         file_execution_id: str,
         source_hash: str,
         tags: list[str],
         llm_profile_id: str | None = None,
-        custom_data: dict[str, Any] | None = None,
+        custom_data: dict[str, Any] | None = None,
+        **kwargs,
     ) -> None:
@@
-        # Add custom_data to metadata if provided
-        if custom_data:
+        # Back-compat: allow legacy 'user_data' kwarg
+        if custom_data is None and "user_data" in kwargs:
+            custom_data = kwargs.get("user_data")
+        # Add custom_data to metadata if provided
+        if custom_data:
             content[MetaDataKey.CUSTOM_DATA] = custom_data
backend/api_v2/constants.py (1)

14-15: Remove merge conflict markers in serializers and add user_data fallback in views

  • In backend/api_v2/serializers.py (around lines 214–219), remove the leftover conflict markers (=======, >>>>>>>) so only the custom_data doc remains.
  • In backend/api_v2/api_deployment_views.py (line 73), fall back to the legacy key:
custom_data = (
    serializer.validated_data.get(ApiExecution.CUSTOM_DATA)
    or serializer.validated_data.get('user_data')
)

This ensures clients posting user_data continue working during the deprecation window.

prompt-service/src/unstract/prompt_service/services/variable_replacement.py (1)

83-97: Stop logging the fully rendered prompt (PII/secret leakage).

Full prompts may contain sensitive CUSTOM_DATA; emitting them to logs violates least‑data and can breach compliance.

Apply this diff to redact:

-            app.logger.info(
-                f"[{tool_id}] Prompt after variable replacement: {prompt_text}"
-            )
+            app.logger.info(f"[{tool_id}] Prompt after variable replacement: [redacted]")
             publish_log(
                 log_events_id,
                 {
                     "tool_id": tool_id,
                     "prompt_key": prompt_name,
                     "doc_name": doc_name,
                 },
                 LogLevel.DEBUG,
                 RunLevel.RUN,
-                f"Prompt after variable replacement:{prompt_text} ",
+                "Prompt after variable replacement: [redacted]",
             )

If you must debug content, gate with a secure feature flag and redact values. I can provide a minimal redactor if needed.

🧹 Nitpick comments (17)
backend/workflow_manager/workflow_v2/file_execution_tasks.py (1)

821-828: Bug: provider_file_uuid mismatch logged without comparison.

This warns even when IDs are equal. Compare before warning.

-        if file_history.provider_file_uuid and file_hash.provider_file_uuid:
-            logger.warning(
-                f"Provider file UUID mismatch for file '{file_hash.file_name}' in workflow '{workflow}'"
-            )
+        if (
+            file_history.provider_file_uuid
+            and file_hash.provider_file_uuid
+            and file_history.provider_file_uuid != file_hash.provider_file_uuid
+        ):
+            logger.warning(
+                f"Provider file UUID mismatch for file '{file_hash.file_name}' in workflow '{workflow}'"
+            )
unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (2)

112-121: Docstring missing param for custom_data.

Add param details for custom_data.

         Parameters:
             input_file_path (str): The path of the input file.
             file_execution_id (str): Unique execution id for the file.
             source_hash (str): The hash value of the source/input file.
             tags (list[str]): Tag names associated with the workflow execution.
             llm_profile_id (str, optional): LLM profile ID for overriding tool settings.
+            custom_data (dict[str, Any], optional): Arbitrary user-provided metadata to persist with the file's METADATA.json.

153-155: Tiny log grammar nit.

Consider: “metadata for … is added into execution directory.”

prompt-service/src/unstract/prompt_service/constants.py (1)

175-176: Regex covers dot-paths; consider BC alias if templates still use user_data.

If you need a deprecation window, support both patterns temporarily at the extractor.

prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (1)

56-56: BC: accept legacy 'user_data' from payload if CUSTOM_DATA absent.

Prevents breaking existing API clients.

-    custom_data: dict[str, Any] = payload.get(PSKeys.CUSTOM_DATA, {})
+    custom_data: dict[str, Any] = payload.get(PSKeys.CUSTOM_DATA) or payload.get("user_data", {})
+    if not isinstance(custom_data, dict):
+        custom_data = {}
tools/structure/src/main.py (1)

223-225: BC: accept legacy 'user_data' if CUSTOM_DATA absent and validate type.

tools/structure/src/constants.py defines CUSTOM_DATA (line 83); tools/structure/src/main.py (lines 223–225) currently falls back to {} and will ignore legacy "user_data" — add a fallback to "user_data" and ensure custom_data is a dict.

-        custom_data = self.get_exec_metadata.get(SettingsKeys.CUSTOM_DATA, {})
-        payload["custom_data"] = custom_data
+        custom_data = self.get_exec_metadata.get(SettingsKeys.CUSTOM_DATA)
+        # Back-compat: fallback to legacy key if present in exec metadata
+        if custom_data is None:
+            custom_data = self.get_exec_metadata.get("user_data")
+        if not isinstance(custom_data, dict):
+            self.stream_log("Ignoring non-dict custom_data in exec metadata")
+            custom_data = {}
+        payload["custom_data"] = custom_data
tools/structure/src/constants.py (1)

83-83: Add a transitional alias for backward compatibility (optional).

If any external tools/configs still send "user_data", consider a short-lived alias to de-risk the rollout.

Apply this diff:

@@
-    CUSTOM_DATA = "custom_data"
+    CUSTOM_DATA = "custom_data"
+    # TODO: remove after one minor release
+    USER_DATA = "custom_data"

Also, minor nit: SettingsKeys contains duplicate names (e.g., NAME, OUTPUTS, TOOL_ID) earlier in the class—worth consolidating separately.

unstract/workflow-execution/src/unstract/workflow_execution/constants.py (1)

49-49: Provide a migration-friendly alias (optional).

Existing METADATA.json written with "user_data" may still be present in volumes. A temporary alias helps readers tolerate old artifacts.

@@
-    CUSTOM_DATA = "custom_data"
+    CUSTOM_DATA = "custom_data"
+    # TODO: remove after one minor release
+    USER_DATA = "custom_data"

Please verify readers/writers of metadata now use MetaDataKey.CUSTOM_DATA everywhere and gracefully handle old artifacts. If you want, I can script-check the repo for remaining "user_data" metadata usages.

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (1)

64-68: Use re.search() instead of re.findall() for presence check.

Small clarity/perf win; avoids building a list when only existence matters.

-        custom_data_pattern = re.compile(VariableConstants.CUSTOM_DATA_VARIABLE_REGEX)
-        if re.findall(custom_data_pattern, variable):
+        custom_data_pattern = re.compile(VariableConstants.CUSTOM_DATA_VARIABLE_REGEX)
+        if re.search(custom_data_pattern, variable):
             variable_type = VariableType.CUSTOM_DATA
backend/api_v2/serializers.py (1)

234-235: Expose custom_data field (OK). Consider accepting legacy user_data for a deprecation window.

To avoid breaking existing clients, optionally accept user_data write-only and map it to custom_data if custom_data is absent.

@@
-    custom_data = JSONField(required=False, allow_null=True)
+    custom_data = JSONField(required=False, allow_null=True)
+    # Backward-compat: accept legacy key, write-only
+    user_data = JSONField(required=False, allow_null=True, write_only=True)

Add mapping in validate (outside the shown hunk):

# Insert at the start of ExecutionRequestSerializer.validate()
legacy = data.pop("user_data", None)
if legacy is not None and data.get("custom_data") is None:
    data["custom_data"] = legacy
elif legacy is not None and data.get("custom_data") is not None:
    raise ValidationError({"custom_data": "Provide either custom_data or user_data, not both."})

If you prefer a hard cutover, skip the alias; otherwise, I can open a follow-up PR with tests and docs for the transition.

prompt-service/src/unstract/prompt_service/services/variable_replacement.py (4)

37-38: Fix implicit Optional typing (RUF013).

Use explicit union for optional types.

Apply this diff:

-        custom_data: dict[str, Any] = None,
+        custom_data: dict[str, Any] | None = None,

101-104: Fix implicit Optional typing (RUF013).

Mirror the public signature fix here.

Apply this diff:

-        prompt_text: str,
-        variable_map: dict[str, Any],
-        custom_data: dict[str, Any] = None,
+        prompt_text: str,
+        variable_map: dict[str, Any],
+        custom_data: dict[str, Any] | None = None,

126-131: Handle missing/empty custom_data when CUSTOM_DATA variables are present.

The `and custom_data` guard skips replacement for empty dicts and silently leaves placeholders in the prompt. Prefer failing fast or explicit no-data behavior.

Option A (fail fast):

-            elif variable_type == VariableType.CUSTOM_DATA and custom_data:
-                prompt_text = VariableReplacementHelper.replace_custom_data_variable(
+            elif variable_type == VariableType.CUSTOM_DATA:
+                if custom_data is None:
+                    raise KeyError(f"Missing custom_data for variable: {variable}")
+                prompt_text = VariableReplacementHelper.replace_custom_data_variable(
                     prompt=prompt_text,
                     variable=variable,
                     custom_data=custom_data,
                 )

Please confirm expected behavior when CUSTOM_DATA variables exist but custom_data is {} or None.


16-24: Docstring arg name mismatch.

Arg doc refers to prompt but function param is prompt_text.

Apply this diff:

-        Args:
-            prompt (str): Prompt to check
+        Args:
+            prompt_text (str): Prompt to check
backend/workflow_manager/workflow_v2/workflow_helper.py (3)

154-156: Signature change LGTM; update docstrings where applicable.

Parameter renamed to custom_data: dict[str, Any] | None = None. Ensure any docstrings/comments reflect this.


445-446: Celery payload risk: size and JSON‑serializability of custom_data.

Large or non‑JSON‑serializable custom_data can break task enqueueing or exceed broker limits.

  • Enforce JSON‑serializable dicts and consider a size cap (e.g., 256–512 KB).
  • Optionally strip/whitelist keys before enqueue.

Example pre‑validation (before send_task):

@@
-            async_execution: AsyncResult = celery_app.send_task(
+            # Ensure custom_data is JSON-serializable and bounded
+            if custom_data is not None:
+                try:
+                    _cd_json = json.dumps(custom_data)
+                    # Optional: cap at 512KB
+                    if len(_cd_json.encode("utf-8")) > 512 * 1024:
+                        raise ValueError("custom_data too large for async payload")
+                except TypeError as e:
+                    raise ValueError(f"custom_data must be JSON-serializable: {e}")
+            async_execution: AsyncResult = celery_app.send_task(

485-486: Consider sanitizing custom_data before passing to Celery.

Pass sanitized object to reduce risk and copy by value.

Apply this diff and helper:

-                    "custom_data": custom_data,
+                    "custom_data": custom_data if custom_data is not None else None,

Optional helper (outside this hunk) to centralize logic:

def _sanitize_custom_data(obj: dict[str, Any] | None, max_bytes: int = 512 * 1024) -> dict[str, Any] | None:
    if obj is None:
        return None
    try:
        data = json.loads(json.dumps(obj))  # ensure JSON-serializable copy
    except TypeError as e:
        raise ValueError(f"custom_data must be JSON-serializable: {e}")
    if len(json.dumps(data).encode("utf-8")) > max_bytes:
        raise ValueError("custom_data too large")
    return data

Then call _sanitize_custom_data(custom_data) before enqueue.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d40ea54 and c953332.

📒 Files selected for processing (18)
  • backend/api_v2/api_deployment_views.py (2 hunks)
  • backend/api_v2/constants.py (1 hunks)
  • backend/api_v2/deployment_helper.py (3 hunks)
  • backend/api_v2/serializers.py (3 hunks)
  • backend/sample.env (1 hunks)
  • backend/workflow_manager/endpoint_v2/source.py (2 hunks)
  • backend/workflow_manager/workflow_v2/dto.py (1 hunks)
  • backend/workflow_manager/workflow_v2/file_execution_tasks.py (3 hunks)
  • backend/workflow_manager/workflow_v2/workflow_helper.py (8 hunks)
  • prompt-service/src/unstract/prompt_service/constants.py (3 hunks)
  • prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (2 hunks)
  • prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2 hunks)
  • prompt-service/src/unstract/prompt_service/services/variable_replacement.py (4 hunks)
  • tools/structure/src/config/properties.json (1 hunks)
  • tools/structure/src/constants.py (1 hunks)
  • tools/structure/src/main.py (1 hunks)
  • unstract/workflow-execution/src/unstract/workflow_execution/constants.py (1 hunks)
  • unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
backend/api_v2/serializers.py

261-261: Avoid specifying long messages outside the exception class

(TRY003)

prompt-service/src/unstract/prompt_service/services/variable_replacement.py

37-37: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


103-103: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (18)
prompt-service/src/unstract/prompt_service/constants.py (2)

151-152: Enum rename confirmed — no VariableType.USER_DATA references remain.
rg output shows only VariableType.CUSTOM_DATA occurrences; no VariableType.USER_DATA found.


70-71: Approve — rename clear; verify no remaining USER_DATA references

File: prompt-service/src/unstract/prompt_service/constants.py (CUSTOM_DATA, TEXT). rg in prompt-service returned no matches; run a repo-wide search for 'USER_DATA' / 'user_data' and confirm all downstream consumers are updated.

tools/structure/src/config/properties.json (1)

5-5: Approve: version bump and image tags aligned to 0.0.88

Verified — tools/structure/src/config/properties.json and backend/sample.env both reference 0.0.88; no 0.0.87 occurrences found.

backend/workflow_manager/workflow_v2/file_execution_tasks.py (2)

922-925: Confirmed — signature matches; no change required.
Definition: def log_total_cost_per_file(self, run_id: str, file_name: str) in backend/workflow_manager/workflow_v2/execution.py:279; call uses run_id and file_name in backend/workflow_manager/workflow_v2/file_execution_tasks.py:922-925.


749-756: Confirm add_file_to_volume signature accepts custom_data (not user_data).

Verify backend/workflow_manager/endpoint_v2/source.py (def add_file_to_volume — ~line 918) accepts the keyword parameter custom_data; update either the callee or the callsites to match to avoid a runtime TypeError.

prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (1)

90-100: replace_variables_in_prompt accepts custom_data — no action required. The signature in prompt-service/src/unstract/prompt_service/services/variable_replacement.py declares custom_data: dict[str, Any] = None and internal calls pass custom_data.

backend/api_v2/serializers.py (1)

255-263: Validator rename aligned; message clear.

No issues.

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (1)

103-156: CUSTOM_DATA_VARIABLE_REGEX exposes the path in group(1) — resolved.
Regex is defined as r"custom_data.([a-zA-Z0-9_.]+)" in prompt-service/src/unstract/prompt_service/constants.py:175, so custom_data_match.group(1) correctly yields the path; no change required.

backend/api_v2/api_deployment_views.py (1)

73-74: LGTM — confirm ApiExecution.USER_DATA is removed and execute_workflow is called with the custom_data keyword.
Sandbox ripgrep produced no output, so this could not be verified here; re-run the two checks locally or paste their output.
Location: backend/api_v2/api_deployment_views.py (≈lines 73–91).

backend/workflow_manager/endpoint_v2/source.py (1)

924-925: Signature rename approved — downstream handler accepts custom_data.
add_metadata_to_volume includes the custom_data parameter and writes it as content[MetaDataKey.CUSTOM_DATA] = custom_data in unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (def at line 98, write at line 147).

backend/workflow_manager/workflow_v2/dto.py (1)

159-166: Confirm legacy user_data mapping necessity
I couldn’t find any occurrences of "user_data" in the repo; please verify whether any external producers still send this legacy key before adding the backward-compat mapping.

backend/api_v2/deployment_helper.py (1)

158-159: Verified — WorkflowHelper accepts and propagates custom_data.
execute_workflow_async is defined and custom_data is passed through call sites (backend/workflow_manager/workflow_v2/workflow_helper.py — lines ~206, 325, 433, 702), and propagated to downstream callers (file_execution_tasks.py:754; endpoint_v2/source.py:966).

backend/workflow_manager/workflow_v2/workflow_helper.py (5)

274-275: Propagation to run_workflow LGTM.


325-326: Forwarding custom_data into process_input_files LGTM.


703-703: Propagation into execute_workflow LGTM.


206-207: custom_data is supported in DTO and carried in task payloads; DB persistence is explicitly excluded — confirm intent.

  • FileData declares custom_data and its to_dict()/from_dict() include it (backend/workflow_manager/workflow_v2/dto.py).
  • WorkflowHelper builds FileData(custom_data=...) and sends FileBatchData.to_dict() to Celery (backend/workflow_manager/workflow_v2/workflow_helper.py).
  • FileExecutionTasks reconstructs FileBatchData/FileData and uses file_data.custom_data during processing (backend/workflow_manager/workflow_v2/file_execution_tasks.py).
  • create_workflow_execution explicitly excludes "custom_data" via EXECUTION_EXCLUDED_PARAMS, so custom_data is not persisted to the workflow execution DB and no cache-persist behavior was found — confirm whether custom_data should be stored or intentional omission is desired.

67-68: Excluding "custom_data" is correct — create_workflow_execution has no matching parameter.
Signature at backend/workflow_manager/workflow_v2/execution.py:126 does not include "custom_data", so excluding it prevents silent drops/TypeErrors.

prompt-service/src/unstract/prompt_service/services/variable_replacement.py (1)

69-74: Propagation looks correct — add unit tests for the CUSTOM_DATA path.

replace_custom_data_variable is present (prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py) and is invoked from the service (prompt-service/src/unstract/prompt_service/services/variable_replacement.py, ~lines 126–129); controllers pull CUSTOM_DATA at prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py:56. Add unit tests for:

  • Prompt with CUSTOM_DATA variable + present data (assert replacement).
  • Prompt with CUSTOM_DATA variable + empty/missing data (assert no crash and expected fallback/behavior).

@github-actions
Contributor
| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| **TOTAL** | | **11** | **11** |

@jaags-dev jaags-dev changed the title UN-2807 [FEAT] Changed user_data to custom_data in variable replacement UN-2807 [MISC] Changed user_data to custom_data in variable replacement Sep 22, 2025
@sonarqubecloud

@jaseemjaskp jaseemjaskp merged commit 29b89ff into main Sep 22, 2025
4 checks passed
@jaseemjaskp jaseemjaskp deleted the feature/user-data-variable-support branch September 22, 2025 13:16

4 participants