chore(wren-ai-service): improve text2sql process #1070

cyyeh · 2024-12-27T06:55:53Z

Summary by CodeRabbit

New Features
- Enhanced query handling by prioritizing historical questions for SQL result generation.
- Improved error handling and response generation based on query outcomes.
Bug Fixes
- Refined logic for managing scenarios with no relevant SQL results.

coderabbitai · 2024-12-27T06:56:01Z

Walkthrough

The pull request modifies the ask method in the AskService class, focusing on enhancing the query processing logic. The primary change involves prioritizing historical question processing before intent classification. The method now first checks for historical questions, generating SQL results if found. If no historical questions are detected, it falls back to intent classification. The changes aim to improve the handling of user queries by providing a more streamlined approach to result generation, with refined error handling and response management.

Changes

File	Change Summary
`wren-ai-service/src/web/v1/services/ask.py`	Refactored `ask` method to prioritize historical question processing, modify intent classification logic, and improve result generation and error handling

Sequence Diagram

sequenceDiagram
    participant User
    participant AskService
    participant HistoricalQuestionProcessor
    participant IntentClassifier

    User->>AskService: Submit query
    AskService->>HistoricalQuestionProcessor: Check historical questions
    alt Historical questions found
        HistoricalQuestionProcessor-->>AskService: Generate SQL results
    else No historical questions
        AskService->>IntentClassifier: Perform intent classification
        IntentClassifier-->>AskService: Classify intent and generate results
    end
    AskService-->>User: Return query response
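
The ordering in the diagram can be sketched in a few lines of Python. This is only an illustration of the control flow described in the walkthrough, not the actual `ask` method: the function name `ask_flow`, the `Pipeline` alias, and the idea of passing the two stages in as callables are all assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical type for a pipeline stage: takes the query, returns result dicts.
Pipeline = Callable[[str], Awaitable[List[Dict[str, Any]]]]


async def ask_flow(
    query: str,
    historical_question: Pipeline,
    classify_and_generate: Pipeline,
) -> List[Dict[str, Any]]:
    # Step 1: check historical questions before anything else.
    historical = await historical_question(query)
    if historical:
        # Keep only the top match, mirroring the [:1] slice in the review.
        return historical[:1]
    # Step 2: nothing matched, so fall back to intent classification.
    return await classify_and_generate(query)
```

With stub coroutines for the two stages, a historical hit short-circuits the call and the classifier is never awaited.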

Possibly related PRs

chore(wren-ai-service): fix historical question query input #1064: Modifies the ask method in AskService with a focus on historical question handling

Suggested reviewers

paopa

Poem

🐰 In the realm of queries, a rabbit's delight,
Historical whispers now shine so bright
Intent takes a step back, data leads the way
SQL results dance in a more elegant play
Wren's AI service, smarter than before
A leap of logic through an open door! 🔍

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

wren-ai-service/src/web/v1/services/ask.py (6)

145-145: Consider clarifying the purpose of api_results.
While initializing api_results to an empty list is straightforward, it might be beneficial to rename it to something more specific (e.g., historical_or_generated_results) to reflect how it may hold data from different pipelines.

160-163: Clarify slicing to a single document.
Slicing the first result ([:1]) works fine, but if only one document is needed, you could consider returning either a single item or None for clarity.
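
One way to express "a single item or None", as a minimal sketch. The helper name `first_or_none` is hypothetical and does not appear in the reviewed file.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(documents: Sequence[T]) -> Optional[T]:
    """Return the first document, or None when the sequence is empty."""
    return documents[0] if documents else None
```

The trade-off is that callers must then handle `None` explicitly, whereas the `[:1]` slice lets them keep iterating over a list of length zero or one.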

206-225: Consider capturing results from data_assistance.
Currently, data_assistance is run in a separate task, and its response is never stored. If real-time or post-processing usage is required, you might want to capture the results or recheck the pipeline outcome.

273-283: DRY up the followup vs. direct SQL generation.
The logic branches into two pipelines, "followup_sql_generation" vs. "sql_generation". If they share much of the same core logic, consider introducing a helper function to reduce duplication.
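
A possible shape for such a helper, assuming the branch depends only on whether conversation history is present. The function name and the `history` parameter are illustrative; only the two pipeline names come from the review.

```python
from typing import Any, Optional


def select_generation_pipeline(history: Optional[Any]) -> str:
    """Pick the SQL generation pipeline name in one place."""
    # A follow-up question carries history; a fresh question does not.
    return "followup_sql_generation" if history else "sql_generation"
```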

335-346: Finalize when api_results is present.
This completion logic properly updates the ask status and returns the final results. For readability, consider centralizing success/failure outcome setting in a single helper method.

347-360: logger.exception used for a non-exception scenario.
logger.exception attaches the stack trace of the exception currently being handled, so it is only meaningful inside an except block. When the code is simply reporting that no relevant SQL was found, logger.warning or logger.error is more appropriate.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd4dc69 and 719f188.

📒 Files selected for processing (1)

wren-ai-service/src/web/v1/services/ask.py (3 hunks)

🔇 Additional comments (12)

wren-ai-service/src/web/v1/services/ask.py (12)

155-157: Validate pipeline availability.
Consider verifying that "historical_question" is a valid key in self._pipelines before attempting to run it, to avoid potential KeyError.
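
A minimal sketch of such a check, assuming `self._pipelines` behaves like a plain mapping from names to pipeline objects. The helper `get_pipeline` and the logger name are hypothetical.

```python
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger("ask_service_example")


def get_pipeline(pipelines: Mapping[str, Any], name: str) -> Optional[Any]:
    """Look up a pipeline by name without raising KeyError."""
    pipeline = pipelines.get(name)
    if pipeline is None:
        # Make the misconfiguration visible instead of crashing mid-request.
        logger.warning("pipeline %r is not configured", name)
    return pipeline
```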

165-176: Check for missing keys in pipeline results.
The code assumes "statement" and "viewId" exist in each result. Consider handling potential missing keys or invalid structures from upstream pipeline data.
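
One defensive variant, sketched under the assumption that each result is a dict. `HistoricalResult` is a stand-in for illustration, not the project's `AskResult` class, and skipping entries without a statement is a design choice the real code may not want.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class HistoricalResult:
    sql: str
    view_id: Optional[str] = None


def build_results(docs: Iterable[Dict[str, Any]]) -> List[HistoricalResult]:
    results = []
    for doc in docs:
        statement = doc.get("statement")
        if not statement:
            # Without SQL there is nothing useful to return; skip the entry.
            continue
        # "viewId" is treated as optional and defaults to None.
        results.append(HistoricalResult(sql=statement, view_id=doc.get("viewId")))
    return results
```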

177-188: Account for partially populated classification results.
If intent_classification_result lacks expected fields like "intent", "rephrased_question", or "reasoning", this code could cause unexpected behaviors. Ensure you handle missing or malformed data gracefully.
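
A sketch of reading those three fields with explicit defaults. The function name is hypothetical, and defaulting a missing intent to "TEXT_TO_SQL" is an assumption made here for illustration; the real service may prefer to fail instead.

```python
from typing import Any, Mapping, Optional, Tuple


def read_classification(result: Mapping[str, Any]) -> Tuple[str, Optional[str], str]:
    """Extract (intent, rephrased_question, reasoning) with safe defaults."""
    post = result.get("post_process") or {}
    intent = post.get("intent") or "TEXT_TO_SQL"  # assumed default
    rephrased = post.get("rephrased_question") or None
    reasoning = post.get("reasoning") or ""
    return intent, rephrased, reasoning
```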

190-193: Edge case with empty rephrased_question.
if not rephrased_question: evaluates to True for an empty string as well as for None. Verify that this logic covers all intended scenarios.
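
The truthiness behaviour can be checked with a small stand-in. `pick_user_query` is a hypothetical helper that assumes the reviewed lines fall back to the original query when no rephrased question is available.

```python
from typing import Optional


def pick_user_query(query: str, rephrased_question: Optional[str]) -> str:
    # `not rephrased_question` is True for both None and "", so either
    # value falls back to the original query.
    return query if not rephrased_question else rephrased_question
```

If an empty string should be treated differently from None, the condition would need to be `rephrased_question is None` instead.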

196-205: Misleading query logic is clear.
The flow for handling a "MISLEADING_QUERY" intent looks consistent, sets the proper status, and returns.

227-233: Transition to TEXT_TO_SQL is handled well.
Setting up the response status and type for TEXT_TO_SQL is logically consistent.

265-265: Status progression to generating.
The code correctly updates the status to generating only if no results have been populated yet. No issues spotted.

284-292: Similar branching logic.
This block largely mirrors the logic in lines 273-283, so centralizing the pipeline decision in one place could make the code more maintainable.

294-305: Valid generation results successfully integrated.
Assigning valid SQL results to api_results is straightforward and well-structured.

306-310: Fallback to SQL correction looks good.
Switching to SQL correction if we detect failed dry-run results is a logical flow to ensure query accuracy.

311-320: Tracking status as correcting.
Setting the status to correcting prior to invoking the correction pipeline allows clear progress monitoring.

322-332: Integrating valid generation results post-correction.
Capturing the first valid corrected SQL and storing it in api_results aligns well with the pipeline design.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

wren-ai-service/src/web/v1/services/ask.py (5)

165-176: Building AskResult from historical question results
The list comprehension is neat. Ensuring the fields statement and viewId exist in result is crucial; otherwise, consider default values or error handling if they might be absent.

190-193: Consider inlining user_query
The inline conditional is readable. As a minor nitpick, you could inline user_query usage wherever needed, but this approach is perfectly fine if you want to keep logic clear.

218-233: Fallback for unexpected intents
This else branch covers the "TEXT_TO_SQL" flow, but any unknown or unsupported intent returned by the pipeline also lands here. Consider handling unexpected intents explicitly, or at least calling logger.warning when the pipeline yields a value the code does not recognize.
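
A sketch of making that fallback explicit. The helper name, the logger name, and the idea of passing the set of known intents in as a parameter are assumptions; only "MISLEADING_QUERY" and "TEXT_TO_SQL" are named in this review.

```python
import logging
from typing import Collection

logger = logging.getLogger("ask_service_example")


def normalize_intent(
    intent: str,
    known_intents: Collection[str],
    default: str = "TEXT_TO_SQL",
) -> str:
    """Return the intent unchanged if recognized, else log and use the default."""
    if intent in known_intents:
        return intent
    logger.warning("unexpected intent %r, falling back to %s", intent, default)
    return default
```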

265-320: Consider refactoring to reduce complexity
The logic here (generating SQL or handling invalid generation through corrections) is comprehensive but significantly increases method length. Consider extracting the text-to-SQL generation and correction logic into helper functions to improve readability and reusability.

322-360: Use logger.error instead of logger.exception for logical errors
Line 347, for example, uses logger.exception outside of an actual except block. If the situation is not triggered by an exception, use logger.error or logger.warning. Reserve logger.exception for cases within an actual exception handler to preserve stack trace usage.
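
The distinction, as a minimal sketch using the standard `logging` module. The function names and log messages are illustrative and not taken from ask.py.

```python
import logging
from typing import Callable

logger = logging.getLogger("ask_service_example")


def report_no_relevant_sql(query_id: str) -> None:
    # No exception is being handled here, so there is no traceback worth
    # attaching. A plain warning describes the situation accurately.
    logger.warning("no relevant SQL found for query %s", query_id)


def run_step(step: Callable[[], object]) -> bool:
    try:
        step()
        return True
    except Exception:
        # Inside an except block, logger.exception records the active traceback.
        logger.exception("pipeline step failed")
        return False
```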

🔇 Additional comments (5)

wren-ai-service/src/web/v1/services/ask.py (5)

145-145: Initialize api_results earlier for clarity
This line introduces the api_results list. It's a straightforward initialization, but it's good practice to declare such tracking lists at the start of the method for clarity and maintainability.

155-157: Verify the pipeline key "historical_question"
Ensure that "historical_question" is a valid key in the self._pipelines dictionary and handle the possibility of missing or misnamed pipelines (e.g., with a try-except, fallback, or logging).

160-163: Slicing the historical question results
Slicing to the top result [:1] is a clear way to limit the returned documents. This approach is concise and maintains the code’s readability.

177-188: Intent classification result usage
This block correctly retrieves the post-processed classification data. The .get("post_process", {}) usage safely handles missing keys. Ensure that any unexpected or missing data triggers appropriate logging or error handling if needed.

196-217: Double-check concurrency and background task handling
Creating a background task via asyncio.create_task detaches its execution from the main flow. Make sure exceptions in the data_assistance pipeline won't silently fail. Consider adding a callback or logging to handle potential background errors.
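
One common way to surface such failures is a done callback on the task. This is a sketch with hypothetical helper names (`spawn`, `log_task_failure`), not the approach the service actually takes.

```python
import asyncio
import logging
from typing import Any, Coroutine

logger = logging.getLogger("ask_service_example")


def log_task_failure(task: "asyncio.Task[Any]") -> None:
    # Cancelled tasks raise CancelledError from .exception(), so check first.
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        logger.error("background task %s failed", task.get_name(), exc_info=exc)


def spawn(coro: Coroutine[Any, Any, Any]) -> "asyncio.Task[Any]":
    """Create a background task whose exceptions are always logged."""
    task = asyncio.create_task(coro)
    task.add_done_callback(log_task_failure)
    return task
```

`spawn` must be called from within a running event loop, as with `asyncio.create_task` itself. Keeping a reference to the returned task also prevents it from being garbage-collected before it finishes.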

update

719f188

cyyeh added the module/ai-service and ci/ai-service labels Dec 27, 2024

cyyeh requested a review from paopa December 27, 2024 06:55

coderabbitai bot reviewed Dec 27, 2024

paopa approved these changes Dec 27, 2024

paopa merged commit e64e0ea into main Dec 27, 2024
13 of 14 checks passed

paopa deleted the chore/ai-service/improve-historical-question branch December 27, 2024 07:08

This was referenced Jan 23, 2025

chore(wren-ai-service): improve text2sql #1208

Merged

chore(wren-ai-service): add planning stage #1214

Merged

chore(wren-ai-service): minor updates #1219

Merged

coderabbitai bot mentioned this pull request Feb 4, 2025

chore(wren-ai-service): fix followup ask #1256

Merged

This was referenced Mar 4, 2025

feat(wren-ai-service): Add invalid SQL tracking to AskResultResponse #1356

Merged

feat(wren-ai-service): Implement Chat History Management with Max History Limit #1377

Merged

This was referenced Mar 13, 2025

feat(wren-ai-service): Add Instructions for SQL Generation #1376

Merged

chore(wren-ai-service): add followup sql generation reasoning #1407

Merged

chore(wren-ai-service): improve sql pairs and instructions #1422

Merged

This was referenced Mar 27, 2025

chore(wren-ai-service): minor updates #1470

Merged

chore(wren-ai-service): minor updates #1503

Merged

[DON'T MERGE]: chore(wren-ai-service): add name, score to retrieved tables #1510

Closed

3 participants
