fix(sql): dialect sql parser fixed #1778

ArslanSaleem · 2025-07-11T07:25:42Z

- [Sqlglot parse error due to LLM Code Response (tested with GPT-4) #1729] Closes #xxxx (Replace xxxx with the GitHub issue number).
- [x] Tests added and passed if fixing a bug or adding a new feature.

Important

Fixes SQL dialect handling in the SQL parser by using the dialect from the DataFrame context, and adds a get_dialect() method that determines the dialect based on the data source.

Behavior:
- Fixes SQL dialect handling in _clean_sql_query() in code_cleaning.py by using the dialect from the DataFrame context.
- Adds get_dialect() to DataFrame in base.py to determine the SQL dialect based on the data source.

Tests:
- Updates test_clean_sql_query() and test_validate_and_make_table_name_case_sensitive() in test_code_cleaning.py to mock the get_dialect() method.
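The behavior change can be sketched as follows. This is a minimal illustration under stated assumptions, not the pandasai code: `FakeDataFrame`, `RecordingSQLParser`, and `clean_sql_query` are hypothetical stand-ins, and the real `SQLParser.extract_table_names()` signature may differ. The point it shows is that the cleaner reads the dialect from the first DataFrame in the context and passes it through to the parser.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDataFrame:
    """Hypothetical stand-in for a pandasai DataFrame exposing get_dialect()."""

    dialect: str

    def get_dialect(self) -> str:
        return self.dialect


@dataclass
class RecordingSQLParser:
    """Hypothetical parser that records which dialect each call used."""

    calls: list = field(default_factory=list)

    def extract_table_names(self, sql_query: str, dialect: str = "postgres") -> list:
        self.calls.append(dialect)
        # Naive extraction for illustration only: the token that follows FROM.
        tokens = sql_query.replace(";", " ").split()
        upper = [t.upper() for t in tokens]
        return [
            tokens[i + 1]
            for i, t in enumerate(upper)
            if t == "FROM" and i + 1 < len(tokens)
        ]


def clean_sql_query(sql_query: str, dfs: list, parser: RecordingSQLParser) -> list:
    # Resolve the dialect from the first DataFrame in the context and hand it
    # to the parser, so parsing follows the data source's SQL flavor.
    dialect = dfs[0].get_dialect()
    return parser.extract_table_names(sql_query, dialect=dialect)
```

For example, `clean_sql_query("SELECT * FROM orders", [FakeDataFrame("mysql")], parser)` returns `["orders"]`, and the parser records that it was called with `"mysql"`.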

This description was created by Ellipsis for 3ea585c. You can customize this summary. It will automatically update as commits are pushed.

codecov · 2025-07-11T07:27:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.23%. Comparing base (20241be) to head (3ea585c).
Report is 3 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1778      +/-   ##
==========================================
+ Coverage   91.22%   91.23%   +0.01%
==========================================
  Files          70       70
  Lines        2711     2715       +4
==========================================
+ Hits         2473     2477       +4
  Misses        238      238
```

| Flag | Coverage Δ |
|---|---|
| unittests | `91.23% <100.00%> (+0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.


ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 3ea585c in 1 minute and 44 seconds.

Reviewed 69 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.

1. pandasai/core/code_generation/code_cleaning.py:58

Draft comment:
Ensure that 'self.context.dfs' is not empty before using index 0 to call get_dialect(). This assumption may lead to an IndexError if dfs is empty.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The comment points out a real potential issue: an IndexError could occur if dfs is empty. However, in the broader context, this is part of SQL query cleaning, where table names are validated against the available dataframes. If there are no dataframes, there should be no SQL queries to clean in the first place, so the code reasonably assumes a valid context with at least one dataframe. There may be edge cases where the code is called with an empty context, and the comment does identify a real potential runtime error. Still, although technically correct, it describes an invalid state that shouldn't occur in normal operation, since the class is designed to work with contexts that have dataframes. The comment should be removed because it suggests defensive programming for an invalid state: if dfs is empty, there would be no tables to validate against anyway.
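If the defensive check were wanted anyway, it could look like the sketch below. This is hypothetical and not part of the PR (the bot judged the guard unnecessary): `resolve_context_dialect` and `FALLBACK_DIALECT` are illustrative names, and falling back to `"postgres"` is an assumption chosen to mirror the no-source default in `get_dialect()`.

```python
from typing import Protocol, Sequence


class HasDialect(Protocol):
    """Anything exposing get_dialect(), such as a pandasai DataFrame."""

    def get_dialect(self) -> str: ...


# Hypothetical fallback; mirrors the "postgres" default used when a
# DataFrame has no source. Not defined in the actual code base.
FALLBACK_DIALECT = "postgres"


def resolve_context_dialect(
    dfs: Sequence[HasDialect], fallback: str = FALLBACK_DIALECT
) -> str:
    """Return the first DataFrame's dialect, or `fallback` for an empty context."""
    if not dfs:
        # Avoids the IndexError the draft comment warned about.
        return fallback
    return dfs[0].get_dialect()
```

The trade-off is the one the bot describes: the guard hides an invalid state instead of failing loudly, so raising a clear error may be preferable to silently choosing a dialect.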

2. pandasai/dataframe/base.py:135

Draft comment:
The new get_dialect method correctly determines the SQL dialect. Consider adding a docstring to explain its behavior and defaulting logic, ensuring clarity for future maintainers.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%.

3. tests/unit_tests/core/code_generation/test_code_cleaning.py:74

Draft comment:
The tests repeatedly mock 'get_dialect' for the dataframe. Consider refactoring this setup into the setUp method to adhere to DRY principles.
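A minimal sketch of the suggested refactor, assuming `unittest` with `unittest.mock.MagicMock`. `FakeCleaner` and the test names are hypothetical; the real test file's fixtures and cleaner class may differ. The mock for `get_dialect()` is configured once in `setUp`, and individual tests override `return_value` only when they need a different dialect.

```python
import unittest
from unittest.mock import MagicMock


class FakeCleaner:
    """Hypothetical stand-in for the code cleaner; reads context.dfs[0]."""

    def __init__(self, context):
        self.context = context

    def current_dialect(self) -> str:
        return self.context.dfs[0].get_dialect()


class TestCodeCleaningDialect(unittest.TestCase):
    def setUp(self):
        # Configure the get_dialect() mock once instead of in every test.
        self.df = MagicMock()
        self.df.get_dialect.return_value = "postgres"
        self.context = MagicMock()
        self.context.dfs = [self.df]
        self.cleaner = FakeCleaner(self.context)

    def test_uses_mocked_dialect(self):
        self.assertEqual(self.cleaner.current_dialect(), "postgres")
        self.df.get_dialect.assert_called_once_with()

    def test_dialect_can_be_overridden_per_test(self):
        # A single test can still change the shared mock's behavior.
        self.df.get_dialect.return_value = "duckdb"
        self.assertEqual(self.cleaner.current_dialect(), "duckdb")
```

Because `setUp` runs before each test, the override in the second test does not leak into the first.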
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_00Gjr7nJajjXNIgS


Copilot

Pull Request Overview

This PR centralizes SQL dialect resolution in the DataFrame API and ensures the SQL parser uses the correct dialect from the DataFrame context.

- Adds a new get_dialect() method to determine the SQL dialect based on the DataFrame’s source.
- Refactors serialize_dataframe() to call get_dialect().
- Updates _clean_sql_query() to pass the resolved dialect into SQLParser.extract_table_names().
- Adjusts existing unit tests to mock get_dialect() where needed.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/unit_tests/core/code_generation/test_code_cleaning.py | Mock `get_dialect()` in `test_clean_sql_query` and `test_validate_and_make_table_name_case_sensitive` |
| pandasai/dataframe/base.py | Introduce `get_dialect()` and replace inline dialect logic in `serialize_dataframe()` |
| pandasai/core/code_generation/code_cleaning.py | Update `_clean_sql_query()` to fetch and pass the DataFrame dialect to the SQL parser |

Comments suppressed due to low confidence (2)

pandasai/dataframe/base.py:135

[nitpick] Consider adding a docstring to get_dialect() that clearly explains the fallback rules (defaulting to 'postgres' when no source, mapping local sources to 'duckdb', etc.) to improve readability.

    def get_dialect(self):

pandasai/dataframe/base.py:135

Add unit tests for get_dialect() covering cases where schema.source is None, where source.type is in LOCAL_SOURCE_TYPES, and where it's an external type to ensure all branches are validated.

    def get_dialect(self):
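Both suppressed suggestions (a docstring and branch coverage) can be illustrated together. This is a sketch under assumptions: it is written as a standalone function over hypothetical `Source`/`Schema` dataclasses rather than as the `DataFrame` method, the contents of `LOCAL_SOURCE_TYPES` are invented for illustration, and the `if source:` check is assumed because the PR diff only shows the two branch bodies.

```python
import unittest
from dataclasses import dataclass
from typing import Optional

# Hypothetical contents; the real LOCAL_SOURCE_TYPES may differ.
LOCAL_SOURCE_TYPES = {"csv", "parquet"}


@dataclass
class Source:
    type: str


@dataclass
class Schema:
    source: Optional[Source] = None


def get_dialect(schema: Schema) -> str:
    """Return the SQL dialect for a schema's data source.

    - No source: default to "postgres".
    - Local source (type in LOCAL_SOURCE_TYPES): "duckdb".
    - Any other source: the source type itself, e.g. "mysql".
    """
    source = schema.source
    if source:  # Assumed condition; the diff shows only the branch bodies.
        dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type
    else:
        dialect = "postgres"
    return dialect


class TestGetDialect(unittest.TestCase):
    def test_no_source_defaults_to_postgres(self):
        self.assertEqual(get_dialect(Schema(source=None)), "postgres")

    def test_local_source_maps_to_duckdb(self):
        self.assertEqual(get_dialect(Schema(source=Source(type="csv"))), "duckdb")

    def test_external_source_uses_its_type(self):
        self.assertEqual(get_dialect(Schema(source=Source(type="mysql"))), "mysql")
```

One test per branch keeps a failure message pointing directly at the fallback rule that broke.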

Copilot · 2025-07-13T11:51:46Z

pandasai/dataframe/base.py

```diff
+            dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type
+        else:
+            dialect = "postgres"
```


[nitpick] Extract the literal strings 'duckdb' and 'postgres' into named constants (e.g., DEFAULT_SQL_DIALECT, LOCAL_SQL_DIALECT) to avoid magic literals and reduce the risk of typos.

Suggested change

```diff
-            dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type
-        else:
-            dialect = "postgres"
+            dialect = LOCAL_SQL_DIALECT if source.type in LOCAL_SOURCE_TYPES else source.type
+        else:
+            dialect = DEFAULT_SQL_DIALECT
```
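A minimal sketch of the constants-based variant. The constant names come from the suggestion above and their values match the literals in the diff. `dialect_for_source_type` is a hypothetical helper name, and the contents of `LOCAL_SOURCE_TYPES` are assumed for illustration.

```python
from typing import Optional

# Constant names from the review suggestion; values match the PR's literals.
LOCAL_SQL_DIALECT = "duckdb"
DEFAULT_SQL_DIALECT = "postgres"
# Hypothetical contents; the real LOCAL_SOURCE_TYPES may differ.
LOCAL_SOURCE_TYPES = frozenset({"csv", "parquet"})


def dialect_for_source_type(source_type: Optional[str]) -> str:
    """Map a source type (None when there is no source) to a SQL dialect name."""
    if source_type is None:
        return DEFAULT_SQL_DIALECT
    return LOCAL_SQL_DIALECT if source_type in LOCAL_SOURCE_TYPES else source_type
```

Taking the source type as a plain string, rather than the whole DataFrame, keeps the mapping easy to test in isolation; a method like `get_dialect()` could then delegate to it.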

fix(sql): dialect sql parser fixed

3ea585c

ArslanSaleem requested a review from gventuri July 11, 2025 07:25

ellipsis-dev bot reviewed Jul 11, 2025


gventuri requested a review from Copilot July 13, 2025 11:51

Copilot AI reviewed Jul 13, 2025


gventuri merged commit b119213 into main Jul 13, 2025
15 checks passed
