Skip to content

Conversation

@ArslanSaleem
Copy link
Contributor

@ArslanSaleem ArslanSaleem commented Jul 11, 2025


Important

Fixes SQL dialect handling in SQL parser by using dialect from DataFrame context and adds get_dialect() method to determine dialect based on data source.

  • Behavior:
    • Fixes SQL dialect handling in _clean_sql_query() in code_cleaning.py by using dialect from DataFrame context.
    • Adds get_dialect() to DataFrame in base.py to determine SQL dialect based on data source.
  • Tests:
    • Updates test_clean_sql_query() and test_validate_and_make_table_name_case_sensitive() in test_code_cleaning.py to mock get_dialect() method.

This description was created by Ellipsis for 3ea585c. You can customize this summary. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri July 11, 2025 07:25
@codecov
Copy link

codecov bot commented Jul 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.23%. Comparing base (20241be) to head (3ea585c).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1778      +/-   ##
==========================================
+ Coverage   91.22%   91.23%   +0.01%     
==========================================
  Files          70       70              
  Lines        2711     2715       +4     
==========================================
+ Hits         2473     2477       +4     
  Misses        238      238              
Flag Coverage Δ
unittests 91.23% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 3ea585c in 1 minute and 44 seconds. Click for details.
  • Reviewed 69 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pandasai/core/code_generation/code_cleaning.py:58
  • Draft comment:
    Ensure that 'self.context.dfs' is not empty before using index 0 to call get_dialect(). This assumption may lead to an IndexError if dfs is empty.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment points out a real potential issue - an IndexError could occur if dfs is empty. However, looking at the broader context, this is part of SQL query cleaning where we're validating table names against available dataframes. If there are no dataframes, there shouldn't be any SQL queries to clean in the first place. The code assumes a valid context with at least one dataframe, which seems reasonable given the class's purpose. I might be missing some edge cases where the code could be called with an empty context. The comment does point out a real potential runtime error. While technically correct about the potential error, this seems like an invalid state that shouldn't occur in normal operation. The class appears to be designed to work with contexts that have dataframes. The comment should be removed as it suggests defensive programming for what appears to be an invalid state. If dfs is empty, there would be no tables to validate against anyway.
2. pandasai/dataframe/base.py:135
  • Draft comment:
    The new get_dialect method correctly determines the SQL dialect. Consider adding a docstring to explain its behavior and defaulting logic, ensuring clarity for future maintainers.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. tests/unit_tests/core/code_generation/test_code_cleaning.py:74
  • Draft comment:
    The tests repeatedly mock 'get_dialect' for the dataframe. Consider refactoring this setup into the setUp method to adhere to DRY principles.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_00Gjr7nJajjXNIgS

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@gventuri gventuri requested a review from Copilot July 13, 2025 11:51
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR centralizes SQL dialect resolution in the DataFrame API and ensures the SQL parser uses the correct dialect from the DataFrame context.

  • Adds a new get_dialect() method to determine SQL dialect based on the DataFrame’s source.
  • Refactors serialize_dataframe() to call get_dialect().
  • Updates _clean_sql_query() to pass the resolved dialect into SQLParser.extract_table_names().
  • Adjusts existing unit tests to mock get_dialect() where needed.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/unit_tests/core/code_generation/test_code_cleaning.py Mock get_dialect() in test_clean_sql_query and test_validate_and_make_table_name_case_sensitive
pandasai/dataframe/base.py Introduce get_dialect() and replace inline dialect logic in serialize_dataframe()
pandasai/core/code_generation/code_cleaning.py Update _clean_sql_query() to fetch and pass the DataFrame dialect to the SQL parser
Comments suppressed due to low confidence (2)

pandasai/dataframe/base.py:135

  • [nitpick] Consider adding a docstring to get_dialect() that clearly explains the fallback rules (defaulting to 'postgres' when no source, mapping local sources to 'duckdb', etc.) to improve readability.
    def get_dialect(self):

pandasai/dataframe/base.py:135

  • Add unit tests for get_dialect() covering cases where schema.source is None, where source.type is in LOCAL_SOURCE_TYPES, and where it's an external type to ensure all branches are validated.
    def get_dialect(self):

Comment on lines +138 to +140
dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type
else:
dialect = "postgres"
Copy link

Copilot AI Jul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Extract the literal strings 'duckdb' and 'postgres' into named constants (e.g., DEFAULT_SQL_DIALECT, LOCAL_SQL_DIALECT) to avoid magic literals and reduce the risk of typos.

Suggested change
dialect = "duckdb" if source.type in LOCAL_SOURCE_TYPES else source.type
else:
dialect = "postgres"
dialect = LOCAL_SQL_DIALECT if source.type in LOCAL_SOURCE_TYPES else source.type
else:
dialect = DEFAULT_SQL_DIALECT

Copilot uses AI. Check for mistakes.
@gventuri gventuri merged commit b119213 into main Jul 13, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants