-
Notifications
You must be signed in to change notification settings - Fork 2.3k
fix(sql): dialect sql parser fixed #1778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1778 +/- ##
==========================================
+ Coverage 91.22% 91.23% +0.01%
==========================================
Files 70 70
Lines 2711 2715 +4
==========================================
+ Hits 2473 2477 +4
Misses 238 238
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed everything up to 3ea585c in 1 minute and 44 seconds. Click for details.
- Reviewed
69lines of code in3files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pandasai/core/code_generation/code_cleaning.py:58
- Draft comment:
Ensure that 'self.context.dfs' is not empty before using index 0 to call get_dialect(). This assumption may lead to an IndexError if dfs is empty. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment points out a real potential issue - an IndexError could occur if dfs is empty. However, looking at the broader context, this is part of SQL query cleaning where we're validating table names against available dataframes. If there are no dataframes, there shouldn't be any SQL queries to clean in the first place. The code assumes a valid context with at least one dataframe, which seems reasonable given the class's purpose. I might be missing some edge cases where the code could be called with an empty context. The comment does point out a real potential runtime error. While technically correct about the potential error, this seems like an invalid state that shouldn't occur in normal operation. The class appears to be designed to work with contexts that have dataframes. The comment should be removed as it suggests defensive programming for what appears to be an invalid state. If dfs is empty, there would be no tables to validate against anyway.
2. pandasai/dataframe/base.py:135
- Draft comment:
The new get_dialect method correctly determines the SQL dialect. Consider adding a docstring to explain its behavior and defaulting logic, ensuring clarity for future maintainers. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. tests/unit_tests/core/code_generation/test_code_cleaning.py:74
- Draft comment:
The tests repeatedly mock 'get_dialect' for the dataframe. Consider refactoring this setup into the setUp method to adhere to DRY principles. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_00Gjr7nJajjXNIgS
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR centralizes SQL dialect resolution in the DataFrame API and ensures the SQL parser uses the correct dialect from the DataFrame context.
- Adds a new
get_dialect()method to determine SQL dialect based on the DataFrame’s source. - Refactors
serialize_dataframe()to callget_dialect(). - Updates
_clean_sql_query()to pass the resolved dialect intoSQLParser.extract_table_names(). - Adjusts existing unit tests to mock
get_dialect()where needed.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit_tests/core/code_generation/test_code_cleaning.py | Mock get_dialect() in test_clean_sql_query and test_validate_and_make_table_name_case_sensitive |
| pandasai/dataframe/base.py | Introduce get_dialect() and replace inline dialect logic in serialize_dataframe() |
| pandasai/core/code_generation/code_cleaning.py | Update _clean_sql_query() to fetch and pass the DataFrame dialect to the SQL parser |
Comments suppressed due to low confidence (2)
pandasai/dataframe/base.py:135
- [nitpick] Consider adding a docstring to
get_dialect()that clearly explains the fallback rules (defaulting to 'postgres' when no source, mapping local sources to 'duckdb', etc.) to improve readability.
def get_dialect(self):
pandasai/dataframe/base.py:135
- Add unit tests for
get_dialect()covering cases whereschema.sourceisNone, wheresource.typeis inLOCAL_SOURCE_TYPES, and where it's an external type to ensure all branches are validated.
def get_dialect(self):
| dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type | ||
| else: | ||
| dialect = "postgres" |
Copilot
AI
Jul 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Extract the literal strings 'duckdb' and 'postgres' into named constants (e.g., DEFAULT_SQL_DIALECT, LOCAL_SQL_DIALECT) to avoid magic literals and reduce the risk of typos.
| dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type | |
| else: | |
| dialect = "postgres" | |
| dialect = LOCAL_SQL_DIALECT if source.type in LOCAL_SOURCE_TYPES else source.type | |
| else: | |
| dialect = DEFAULT_SQL_DIALECT |
Important
Fixes SQL dialect handling in SQL parser by using dialect from DataFrame context and adds
get_dialect()method to determine dialect based on data source._clean_sql_query()incode_cleaning.pyby using dialect fromDataFramecontext.get_dialect()toDataFrameinbase.pyto determine SQL dialect based on data source.test_clean_sql_query()andtest_validate_and_make_table_name_case_sensitive()intest_code_cleaning.pyto mockget_dialect()method.This description was created by
for 3ea585c. You can customize this summary. It will automatically update as commits are pushed.