
Conversation


cyyeh commented on Jul 1, 2025

Summary by CodeRabbit

  • New Features

    • Added a "dry plan" mode to the SQL generation and correction pipelines, which uses project metadata (including the data source) to dry-plan generated SQL, with an optional fallback when the dry plan fails.
    • Exposed new API endpoints to list and execute available pipelines dynamically.
  • Improvements

    • Standardized request and response models to require non-optional types with default values, improving data consistency and validation.
    • Enhanced SQL correction prompts with explicit examples for clearer guidance.
    • Explicitly associated SQL-related pipelines with the Qdrant document store across multiple configuration files.
  • Chores

    • Updated environment configuration to use newer versions of core components.
  • Refactor

    • Removed unused dummy endpoints and related logic to streamline the API.
    • Simplified type annotations throughout the codebase for improved reliability.
    • Added test skips to several test suites to temporarily disable them during runs.

cyyeh requested a review from yichieh-lu on Jul 1, 2025, 15:00
cyyeh added the module/ai-service and ci/ai-service labels (both ai-service related) on Jul 1, 2025

coderabbitai bot commented Jul 1, 2025

Walkthrough

This update standardizes type annotations throughout the codebase, removing the use of Optional for parameters and attributes that have default values and should not accept None. It also introduces new dry plan support for SQL-related pipelines and exposes dynamic pipeline execution endpoints in the development web API, while updating environment version numbers.

Changes

Files/Paths and change summary:

  • .../chart_generation.py, .../chart.py: Changed remove_data_from_chart_schema from Optional[bool] to bool.
  • .../followup_sql_generation.py, .../sql_generation.py, .../sql_correction.py: Added dry plan support with new parameters (use_dry_plan, allow_dry_plan_fallback, data_source), refactored metadata retrieval, and made engine_timeout non-optional.
  • .../sql_regeneration.py, .../relationship_recommendation.py, .../sql_executor.py, .../sql_functions.py: Changed engine_timeout and related parameters from optional to non-optional types.
  • .../utils/sql.py: Made the timeout parameter non-optional in SQLGenPostProcessor.run, added the SQL_CORRECTION_EXAMPLES constant.
  • .../db_schema.py, .../db_schema_retrieval.py, .../historical_question_retrieval.py: Made batch size and threshold parameters non-optional.
  • .../instructions.py, .../sql_pairs.py: Made project_id and related parameters non-optional strings, fixed mutable default arguments.
  • .../instructions.py (retrieval), .../sql_pairs_retrieval.py: Made similarity and retrieval size parameters non-optional.
  • .../litellm.py: Made the timeout parameter non-optional in the embedder provider.
  • .../question_recommendation.py, .../question_recommendation.py (services): Made max_questions, max_categories, regenerate, and response fields non-optional.
  • .../ask.py: Added use_dry_plan and allow_dry_plan_fallback to AskRequest, made several booleans non-optional, propagated the new flags.
  • .../web/development.py: Removed dummy endpoints, added dynamic pipeline listing and execution endpoints.
  • .../services/__init__.py: Made language and timezone fields non-optional in Configuration.
  • tools/dev/.env: Bumped component version numbers.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant WebAPI
    participant Pipelines
    participant MetadataStore

    Client->>WebAPI: POST /pipelines/{pipeline_name} (with kwargs)
    WebAPI->>Pipelines: pipeline.run(**kwargs)
    alt use_dry_plan enabled
        Pipelines->>MetadataStore: get_project_metadata(project_id)
        MetadataStore-->>Pipelines: metadata dict
        Pipelines->>Pipelines: extract data_source from metadata
    else use_dry_plan disabled
        Pipelines->>Pipelines: use empty metadata
    end
    Pipelines-->>WebAPI: result
    WebAPI-->>Client: result
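For orientation, here is a minimal, self-contained Python sketch of the branch the diagram shows. The helper name retrieve_metadata and the data_source key are taken from the review diffs further down; everything else (the stub return value, the run_sql_generation function name) is illustrative, not the repository's actual implementation.

import asyncio
from typing import Any, Dict

async def retrieve_metadata(project_id: str, retriever: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for the real helper in src/pipelines/common.py;
    # the returned value here is illustrative only.
    return {"data_source": "postgres"}

async def run_sql_generation(
    query: str,
    project_id: str = "",
    use_dry_plan: bool = False,
    allow_dry_plan_fallback: bool = True,
    retriever: Any = None,
) -> Dict[str, Any]:
    # Mirrors the "alt use_dry_plan" branch: metadata is fetched only when
    # dry plan mode is enabled; otherwise an empty dict is used.
    if use_dry_plan:
        metadata = await retrieve_metadata(project_id or "", retriever)
        data_source = metadata.get("data_source", "")
    else:
        metadata, data_source = {}, ""
    # ...SQL generation would happen here, with use_dry_plan,
    # allow_dry_plan_fallback, and data_source forwarded to post-processing.
    return {"metadata": metadata, "data_source": data_source}

if __name__ == "__main__":
    print(asyncio.run(run_sql_generation("how many users signed up?", project_id="p1", use_dry_plan=True)))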

Possibly related PRs

  • feat(wren-ai-service): add custom instructions #1756: The main PR changes the remove_data_from_chart_schema parameter of ChartGeneration.run from Optional[bool] to bool, while the related PR adds an optional custom_instruction parameter to the same method and related functions. Because the main PR's method signature already includes custom_instruction, both PRs touch the same ChartGeneration.run signature and parameters, so they are directly related.

Suggested reviewers

  • yichieh-lu

Poem

A hop and a skip through type annotation land,
Where Optional is gone, and all types are planned.
Pipelines now flex with dry plan in tow,
Endpoints dynamic, ready to go!
Versions are bumped, the future looks bright—
This rabbit approves, with code running right! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74d083c and 2e1b7f7.

📒 Files selected for processing (3)
  • wren-ai-service/tests/pytest/services/test_instructions.py (6 hunks)
  • wren-ai-service/tests/pytest/services/test_sql_pairs.py (6 hunks)
  • wren-ai-service/tests/pytest/test_main.py (4 hunks)
✅ Files skipped from review due to trivial changes (3)
  • wren-ai-service/tests/pytest/test_main.py
  • wren-ai-service/tests/pytest/services/test_instructions.py
  • wren-ai-service/tests/pytest/services/test_sql_pairs.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


coderabbitai bot left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (1)
wren-ai-service/src/web/development.py (1)

67-92: Refactor to reduce duplication and add parameter validation.

This endpoint has several issues:

  1. Duplicates the pipeline extraction logic from the GET endpoint
  2. No validation of request parameters against expected pipeline parameters
  3. Error messages could expose internal implementation details

Extract the pipeline collection logic and add parameter validation:

+def _get_all_pipelines(service_container) -> Dict[str, Any]:
+    """Extract all pipelines from the service container."""
+    pipelines = {}
+    for attr_name, service in service_container.__dict__.items():
+        if attr_name.startswith("_"):
+            continue
+        if hasattr(service, "_pipelines"):
+            pipelines.update(service._pipelines)
+    return pipelines
+
 @router.post("/pipelines/{pipeline_name}")
 async def run_pipeline(pipeline_name: str, request_body: Dict[str, Any]) -> dict:
+    if os.getenv("ENVIRONMENT", "production") == "production":
+        raise HTTPException(status_code=403, detail="This endpoint is only available in development")
+    
     service_container = app.state.service_container
-    pipe_components = {}
-
-    for _, service in service_container.__dict__.items():
-        if hasattr(service, "_pipelines"):
-            for _pipeline_name, pipeline_instance in service._pipelines.items():
-                pipe_components[_pipeline_name] = pipeline_instance
+    pipe_components = _get_all_pipelines(service_container)
 
     if pipeline_name not in pipe_components:
         logger.error(f"Pipeline {pipeline_name} not found")
         raise HTTPException(
             status_code=404, detail=f"Pipeline '{pipeline_name}' not found"
         )
 
     try:
         pipeline = pipe_components[pipeline_name]
+        
+        # Validate parameters
+        expected_params = _extract_run_method_params(pipeline)
+        unexpected_params = set(request_body.keys()) - set(expected_params.keys())
+        if unexpected_params:
+            logger.warning(f"Unexpected parameters for pipeline '{pipeline_name}': {unexpected_params}")
+        
         result = await pipeline.run(**request_body)
         return result
     except Exception as e:
         logger.error(f"Error running pipeline {pipeline_name}: {e}")
         raise HTTPException(
-            status_code=500, detail=f"Error running pipeline '{pipeline_name}': {e}"
+            status_code=500, detail=f"Error running pipeline '{pipeline_name}': {type(e).__name__}"
         )

Also update the GET endpoint to use the new helper function:

 @router.get("/pipelines")
 async def get_pipelines() -> dict:
+    if os.getenv("ENVIRONMENT", "production") == "production":
+        raise HTTPException(status_code=403, detail="This endpoint is only available in development")
+    
     service_container = app.state.service_container
-    pipeline_params = {}
-
-    # Extract pipelines from all services
-    for _, service in service_container.__dict__.items():
-        if hasattr(service, "_pipelines"):
-            for pipeline_name, pipeline_instance in service._pipelines.items():
-                pipeline_params[pipeline_name] = _extract_run_method_params(
-                    pipeline_instance
-                )
+    pipelines = _get_all_pipelines(service_container)
+    
+    pipeline_params = {
+        name: _extract_run_method_params(pipeline)
+        for name, pipeline in pipelines.items()
+    }
 
     return pipeline_params
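For manual testing, a hedged usage sketch of the two endpoints (the base URL, route prefix, pipeline name, and request body below are assumptions about a local setup, not values taken from the repository):

import requests

BASE = "http://localhost:5556"  # hypothetical local wren-ai-service address; adjust as needed
# Note: the development router may be mounted under a prefix; prepend it if so.

# GET /pipelines: list registered pipelines and the parameters their run() methods accept.
pipelines = requests.get(f"{BASE}/pipelines").json()
print(sorted(pipelines.keys()))

# POST /pipelines/{pipeline_name}: the JSON body is forwarded to pipeline.run(**kwargs).
resp = requests.post(
    f"{BASE}/pipelines/sql_generation",  # pipeline name is illustrative
    json={"query": "how many orders were placed last month?", "project_id": "demo"},
)
print(resp.status_code, resp.json())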
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/generation/sql_generation.py (1)

188-217: Validate project_id when dry plan is enabled.

The implementation looks good, but consider validating project_id when use_dry_plan is True to avoid potential issues with metadata retrieval.

     logger.info("SQL Generation pipeline is running...")
 
     if use_dry_plan:
+        if not project_id:
+            logger.warning("use_dry_plan is True but project_id is not provided, metadata retrieval may fail")
         metadata = await retrieve_metadata(project_id or "", self._retriever)
     else:
         metadata = {}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20ab7e2 and f96349f.

📒 Files selected for processing (24)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/relationship_recommendation.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/sql_correction.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/sql_generation.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/utils/sql.py (3 hunks)
  • wren-ai-service/src/pipelines/indexing/db_schema.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/instructions.py (4 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs.py (6 hunks)
  • wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/instructions.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_functions.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1 hunks)
  • wren-ai-service/src/providers/embedder/litellm.py (1 hunks)
  • wren-ai-service/src/web/development.py (1 hunks)
  • wren-ai-service/src/web/v1/routers/question_recommendation.py (1 hunks)
  • wren-ai-service/src/web/v1/services/__init__.py (1 hunks)
  • wren-ai-service/src/web/v1/services/ask.py (6 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (1 hunks)
  • wren-ai-service/src/web/v1/services/question_recommendation.py (2 hunks)
  • wren-ai-service/tools/dev/.env (1 hunks)
🧰 Additional context used
🧠 Learnings (17)
wren-ai-service/tools/dev/.env (2)
Learnt from: paopa
PR: Canner/WrenAI#976
File: wren-ai-service/tools/dev/.env:17-17
Timestamp: 2025-02-03T06:25:14.605Z
Learning: The IBIS server version sha-ce21e44 refers to a commit in the internal wren-engine repository (https://github.com/Canner/wren-engine), not the upstream ibis-project/ibis repository. This version includes fixes for server startup issues.
Learnt from: cyyeh
PR: Canner/WrenAI#1293
File: wren-ai-service/pyproject.toml:38-38
Timestamp: 2025-02-12T22:05:37.109Z
Learning: The qdrant-client package version in wren-ai-service must match the Qdrant server version (1.11.0) to maintain compatibility.
wren-ai-service/src/web/v1/services/chart.py (3)
Learnt from: andreashimin
PR: Canner/WrenAI#1117
File: wren-ui/src/apollo/server/backgrounds/chart.ts:86-86
Timestamp: 2025-01-15T03:50:47.868Z
Learning: In the WrenAI codebase, ChartType is maintained as a string in the web database, while the ChartType enum is specifically used for AI adaptor input/output interfaces. Therefore, string case conversions like toUpperCase() are intentional for maintaining data consistency in the web DB.
Learnt from: andreashimin
PR: Canner/WrenAI#1117
File: wren-ui/src/apollo/server/repositories/threadResponseRepository.ts:37-37
Timestamp: 2025-01-15T03:51:14.432Z
Learning: In the ThreadResponseRepository, the `chartType` property is intentionally typed as string in the database layer, while the ChartType enum is used specifically for AI adaptor input/output in the application layer.
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
wren-ai-service/src/web/v1/services/__init__.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
Learnt from: andreashimin
PR: Canner/WrenAI#1117
File: wren-ui/src/apollo/server/backgrounds/chart.ts:86-86
Timestamp: 2025-01-15T03:50:47.868Z
Learning: In the WrenAI codebase, ChartType is maintained as a string in the web database, while the ChartType enum is specifically used for AI adaptor input/output interfaces. Therefore, string case conversions like toUpperCase() are intentional for maintaining data consistency in the web DB.
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/pipelines/generation/relationship_recommendation.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/pipelines/indexing/instructions.py (2)
Learnt from: paopa
PR: Canner/WrenAI#1376
File: wren-ai-service/src/pipelines/indexing/instructions.py:100-107
Timestamp: 2025-03-13T03:48:24.664Z
Learning: The `InstructionsCleaner.run()` method in the instructions indexing pipeline only accepts `instruction_ids` and `project_id` parameters, and doesn't support a `delete_all` parameter for unconditional deletion.
Learnt from: onlyjackfrost
PR: Canner/WrenAI#1392
File: wren-ui/src/apollo/server/services/instructionService.ts:107-146
Timestamp: 2025-03-17T09:16:53.878Z
Learning: In the WrenAI project, the instructionResolver merges the instruction ID (from args.where) and project ID (from the current project context) with instruction data before passing to the instructionService, allowing the GraphQL schema to have separate input types while the service layer works with complete objects.
wren-ai-service/src/web/v1/routers/question_recommendation.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
wren-ai-service/src/pipelines/generation/utils/sql.py (1)
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
wren-ai-service/src/web/v1/services/ask.py (3)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
Learnt from: cyyeh
PR: Canner/WrenAI#1112
File: wren-ai-service/src/web/v1/services/sql_question.py:17-28
Timestamp: 2025-01-14T07:45:32.410Z
Learning: In the SqlQuestionRequest model (Python/FastAPI), the query_id field is implemented using a protected _query_id attribute with getter/setter properties, rather than a direct field, as per the team's preference.
Learnt from: cyyeh
PR: Canner/WrenAI#1112
File: wren-ai-service/src/web/v1/services/sql_question.py:101-109
Timestamp: 2025-01-14T07:45:56.117Z
Learning: Warning about accessing `query_id` during exception handling in SQL question service can be skipped as per team's decision. This applies to the pattern where `sql_question_request.query_id` is accessed within the exception handling block.
wren-ai-service/src/pipelines/generation/sql_regeneration.py (2)
Learnt from: wwwy3y3
PR: Canner/WrenAI#1585
File: wren-ui/src/pages/api/v1/generate_sql.ts:98-106
Timestamp: 2025-04-24T16:10:43.308Z
Learning: In the generate_sql API, allow users to specify language codes not predefined in the WrenAILanguage enum, passing them through directly rather than strictly validating against the enum.
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/web/v1/services/question_recommendation.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
wren-ai-service/src/providers/embedder/litellm.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/web/development.py (1)
Learnt from: wwwy3y3
PR: Canner/WrenAI#1388
File: wren-ui/src/apollo/server/services/askingService.ts:599-604
Timestamp: 2025-03-17T19:26:10.835Z
Learning: Error handling for non-existent asking tasks in WrenAI is handled at the resolver level rather than in the service implementation.
wren-ai-service/src/pipelines/generation/sql_correction.py (3)
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:26.608Z
Learning: The enum names `IDENTIED_AS_GENERAL` and `IDENTIED_AS_MISLEADING_QUERY` in GeneralErrorCodes enum in wren-ui/src/apollo/server/utils/error.ts have typos (missing "FI" in "IDENTIFIED") but fixing them is not in scope for the current PR.
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/pipelines/generation/sql_generation.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (2)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
🧬 Code Graph Analysis (2)
wren-ai-service/src/pipelines/generation/sql_correction.py (1)
wren-ai-service/src/pipelines/common.py (1)
  • retrieve_metadata (87-105)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (4)
wren-ai-service/src/pipelines/indexing/instructions.py (3)
  • run (31-52)
  • run (61-76)
  • run (149-164)
wren-ai-service/src/pipelines/retrieval/instructions.py (2)
  • run (23-34)
  • run (177-187)
wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (2)
  • run (23-33)
  • run (146-156)
wren-ai-service/src/pipelines/indexing/__init__.py (3)
  • run (25-47)
  • run (57-72)
  • run (78-87)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (41)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

440-441: LGTM! Type annotations improved for better clarity.

The removal of Optional typing from parameters with default values is a good practice. These parameters will never be None in practice, so the updated type annotations (int instead of Optional[int]) provide better clarity and type safety.
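As a generic illustration of the pattern being standardized (the parameter name here is made up, not copied from the file):

from typing import Optional

# Before: the annotation admits None even though the default is never None,
# forcing needless None checks downstream.
def retrieve_before(table_retrieval_size: Optional[int] = 10) -> int:
    return table_retrieval_size or 10

# After: the non-optional annotation matches how the parameter is actually used.
def retrieve_after(table_retrieval_size: int = 10) -> int:
    return table_retrieval_size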

wren-ai-service/src/providers/embedder/litellm.py (1)

168-168: LGTM! Timeout parameter type annotation standardized.

The change from Optional[float] to float with a default value is consistent with the codebase-wide effort to remove optionality from parameters that have defaults and should never be None.

wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)

125-125: LGTM! Similarity threshold parameter properly typed.

The change from Optional[float] to float with a default value of 0.9 improves type safety and aligns with the PR's goal of standardizing parameter typing by removing optionality where defaults are provided.

wren-ai-service/src/web/v1/services/chart.py (1)

20-20: LGTM! Boolean flag properly typed.

The change from Optional[bool] to bool with a default value ensures the flag will always be a boolean value, eliminating potential None-related issues and improving type clarity.

wren-ai-service/src/pipelines/retrieval/sql_executor.py (1)

69-69: LGTM! Engine timeout parameter properly typed.

The change from Optional[float] to float with a default value of 30.0 is consistent with the codebase-wide standardization effort and ensures the timeout parameter will always have a valid float value.

wren-ai-service/src/web/v1/services/__init__.py (1)

29-30: LGTM! Type annotation improvement enhances type safety.

Removing Optional in favor of non-optional types with default values ensures these configuration fields always have valid values and cannot be None. This improves type safety while maintaining backward compatibility.

wren-ai-service/src/pipelines/generation/relationship_recommendation.py (1)

190-190: LGTM! Consistent timeout parameter typing improvement.

Changing from Optional[float] to float = 30.0 ensures the timeout parameter always has a valid numeric value, preventing potential None-related errors and making the timeout behavior more predictable.

wren-ai-service/src/pipelines/generation/sql_regeneration.py (1)

153-153: LGTM! Consistent with timeout parameter standardization.

This change aligns with the broader effort to standardize timeout parameters across pipeline classes, ensuring they always have valid float values instead of potentially being None.

wren-ai-service/src/pipelines/indexing/db_schema.py (1)

339-339: LGTM! Batch size parameter typing improvement.

Changing from Optional[int] to int = 50 ensures the batch size parameter always has a valid integer value, preventing potential issues in column batching logic.

wren-ai-service/src/web/v1/routers/question_recommendation.py (1)

78-80: LGTM! Request model typing improvements enhance API consistency.

Removing Optional types in favor of non-optional types with sensible defaults (max_questions=5, max_categories=3, regenerate=False) ensures these API parameters always have valid values, improving predictability and reducing the need for None checks in downstream processing.

wren-ai-service/src/pipelines/generation/chart_generation.py (1)

157-157: LGTM! Type annotation improvement.

Removing Optional from the type annotation is correct since this parameter has a default value and should never be None. This improves type safety while maintaining the same functionality.

wren-ai-service/src/pipelines/retrieval/sql_functions.py (1)

97-98: LGTM! Type annotation improvements.

Removing Optional from engine_timeout and ttl parameters is correct since they have default values and should always be concrete numeric values for proper functionality.

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1)

121-122: LGTM! Type annotation improvements.

Removing Optional from similarity threshold and retrieval size parameters is appropriate since they have default values and should always be concrete values for proper retrieval functionality.

wren-ai-service/src/pipelines/retrieval/instructions.py (1)

153-154: LGTM! Type annotation improvements.

Removing Optional from similarity_threshold and top_k parameters is correct since they have default values and should always be concrete values for proper retrieval filtering.

wren-ai-service/src/web/v1/services/question_recommendation.py (2)

25-25: LGTM! Type annotation improvement.

Removing Optional from the response attribute is correct since it has a default dictionary value and should never be None.


159-161: LGTM! Type annotation improvements.

Removing Optional from max_questions, max_categories, and regenerate parameters is appropriate since they all have default values and should always be concrete values for proper request handling.

wren-ai-service/src/pipelines/generation/utils/sql.py (3)

2-2: LGTM: Type annotation standardization

Removing the Optional import aligns with the broader effort to standardize type annotations by using non-optional types with default values instead of optional types.


32-32: LGTM: Consistent timeout parameter handling

The change from Optional[float] to float = 30.0 standardizes timeout handling across SQL-related pipelines and eliminates potential null pointer issues while maintaining the same default behavior.


247-252: LGTM: Valuable SQL correction examples

The addition of SQL_CORRECTION_EXAMPLES provides concrete guidance for SQL correction scenarios, specifically addressing UNION query syntax issues with parentheses. This will enhance the effectiveness of the SQL correction prompt.
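A sketch of what such a constant could look like; the actual examples in utils/sql.py are not reproduced here, and the SQL below is illustrative only (it shows the common case where UNION branches carrying their own ORDER BY/LIMIT need parentheses):

# Hypothetical shape only; the real SQL_CORRECTION_EXAMPLES lives in
# wren-ai-service/src/pipelines/generation/utils/sql.py.
SQL_CORRECTION_EXAMPLES = """
Invalid SQL (LIMIT inside a UNION branch without parentheses):
    SELECT id FROM orders ORDER BY id LIMIT 1
    UNION
    SELECT id FROM refunds ORDER BY id LIMIT 1
Corrected SQL:
    (SELECT id FROM orders ORDER BY id LIMIT 1)
    UNION
    (SELECT id FROM refunds ORDER BY id LIMIT 1)
"""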

wren-ai-service/src/web/v1/services/ask.py (4)

28-31: LGTM: Well-designed dry plan configuration

The new boolean fields use_dry_plan and allow_dry_plan_fallback provide clear control over dry plan behavior with sensible defaults. The use_dry_plan=False default maintains backward compatibility, while allow_dry_plan_fallback=True provides a safe fallback mechanism.
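A minimal sketch of how these fields could be declared on the request model (field names and defaults are taken from the comment above; the rest of the model is simplified and not the real AskRequest):

from pydantic import BaseModel

class AskRequest(BaseModel):  # simplified stand-in, not the full model
    query: str
    use_dry_plan: bool = False            # opt-in, so existing callers are unaffected
    allow_dry_plan_fallback: bool = True  # degrade gracefully if the dry plan fails

req = AskRequest(query="total revenue by month")
print(req.use_dry_plan, req.allow_dry_plan_fallback)  # False True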


83-83: LGTM: Type annotation consistency

Changing is_followup from Optional[bool] to bool = False aligns with the broader type annotation standardization effort and eliminates potential null handling issues.


213-214: LGTM: Proper parameter extraction

The extraction of use_dry_plan and allow_dry_plan_fallback from the request follows the established pattern for other configuration parameters in this method.


501-502: LGTM: Consistent parameter propagation

The dry plan parameters are correctly passed to all relevant SQL pipelines (followup_sql_generation, sql_generation, and sql_correction), ensuring consistent behavior across different execution paths.

Also applies to: 518-519, 561-562

wren-ai-service/src/pipelines/indexing/instructions.py (1)

31-31: LGTM: Consistent type annotation standardization

The changes from Optional[str] to str = "" for project_id parameters across multiple functions maintain the same default behavior while improving type safety by eliminating potential null values. This aligns with the broader codebase standardization effort.

Also applies to: 86-86, 104-104, 152-152

wren-ai-service/src/pipelines/generation/sql_correction.py (4)

3-3: LGTM: Type annotation standardization

Removing the Optional import and changing engine_timeout from Optional[float] to float = 30.0 aligns with the codebase standardization effort and eliminates potential null handling issues.

Also applies to: 115-115


16-16: LGTM: Enhanced SQL correction guidance

The import and integration of SQL_CORRECTION_EXAMPLES into the system prompt provides concrete examples for SQL correction scenarios, which should improve the LLM's ability to fix syntax errors in SQL queries.

Also applies to: 32-39


91-91: LGTM: Proper data source parameter integration

Adding the data_source parameter to the post_process function signature enables the pipeline to handle different data sources appropriately during SQL validation and correction.


153-156: LGTM: Efficient conditional metadata retrieval

The conditional metadata retrieval based on use_dry_plan is implemented efficiently - only fetching metadata when needed for dry plan validation, with proper fallback to empty dict when not required.

wren-ai-service/src/pipelines/generation/followup_sql_generation.py (4)

3-3: LGTM: Type annotation consistency

Removing the Optional import and changing engine_timeout to float = 30.0 maintains consistency with other SQL pipeline modules and improves type safety.

Also applies to: 153-153


12-13: LGTM: Proper dependency integration

The addition of DocumentStoreProvider dependency and retriever initialization enables metadata retrieval for dry plan functionality. The implementation follows the established pattern from other SQL generation pipelines.

Also applies to: 151-151, 156-158


129-129: LGTM: Consistent parameter handling

The addition of data_source, use_dry_plan, and allow_dry_plan_fallback parameters to the post_process function maintains consistency with the updated SQLGenPostProcessor interface.

Also applies to: 131-132, 138-140


194-195: LGTM: Complete dry plan integration

The implementation properly integrates dry plan functionality with conditional metadata retrieval and parameter propagation through the pipeline execution. The pattern matches the implementation in other SQL generation pipelines, ensuring consistent behavior.

Also applies to: 199-202, 218-220

wren-ai-service/src/web/development.py (1)

1-3: LGTM!

The imports are appropriate for the introspection functionality.

wren-ai-service/src/pipelines/generation/sql_generation.py (3)

3-13: LGTM!

The import changes are appropriate for the dry plan functionality.


120-136: LGTM!

The function correctly propagates the dry plan parameters to the post processor.


143-154: LGTM!

The constructor changes properly initialize the document store retriever for metadata access.

wren-ai-service/src/pipelines/indexing/sql_pairs.py (5)

31-31: LGTM!

The type annotation change is consistent with the codebase standardization.


111-111: LGTM!

The type annotation standardization is consistent.


129-129: LGTM!

The type annotation change follows the established pattern.


171-171: LGTM!

Removing Optional from a parameter with a non-None default value improves type accuracy.


196-209: Excellent fix for the mutable default argument!

These changes improve the code by:

  1. Standardizing the project_id type annotation
  2. Fixing the mutable default argument anti-pattern for external_pairs
  3. Properly handling the None case with (external_pairs or {}); a generic sketch of this pattern follows below
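A generic sketch of the anti-pattern and the fix (names are illustrative, not copied from sql_pairs.py):

# Anti-pattern: the default dict is created once at definition time and shared
# across calls, so mutations leak from one call to the next.
def index_pairs_bad(external_pairs: dict = {}) -> dict:
    external_pairs["calls"] = external_pairs.get("calls", 0) + 1
    return external_pairs

# Fix: default to None and materialize a fresh dict per call via `or {}`.
def index_pairs_good(external_pairs: dict | None = None) -> dict:
    pairs = external_pairs or {}
    pairs["calls"] = pairs.get("calls", 0) + 1
    return pairs

print(index_pairs_bad(), index_pairs_bad())    # both show {'calls': 2}: shared state
print(index_pairs_good(), index_pairs_good())  # {'calls': 1} {'calls': 1}: independent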


coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f96349f and a552ed1.

📒 Files selected for processing (1)
  • wren-ai-service/tests/data/config.test.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
wren-ai-service/tests/data/config.test.yaml (2)
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
Learnt from: cyyeh
PR: Canner/WrenAI#1236
File: wren-ai-service/src/providers/llm/litellm.py:49-49
Timestamp: 2025-01-27T08:55:44.919Z
Learning: In the LitellmLLMProvider class, model parameters (`self._model_kwargs`) are intentionally designed to take precedence over runtime parameters (`generation_kwargs`), which differs from other LLM providers. This is the intended behavior specific to this provider.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/tests/data/config.test.yaml (1)

2-2: Double-check provider IDs

provider: litellm_llm and provider: litellm_embedder must exactly match the identifiers registered in the service’s provider registry. A mismatch will make pipeline resolution fail at runtime.

Please verify the canonical names in src/providers/__init__.py (or equivalent).

Also applies to: 17-17

github-actions bot changed the title from "chore(wren-ai-service): minor updates" to "chore(wren-ai-service): minor updates (ai-env-changed)" on Jul 1, 2025
cyyeh merged commit eba270d into main on Jul 2, 2025
11 checks passed
cyyeh deleted the chore/ai-service/improve-text2sql branch on July 2, 2025 at 02:20