Skip to content

Conversation

@ArslanSaleem
Copy link
Contributor

@ArslanSaleem ArslanSaleem commented Apr 8, 2025

Important

Avoid lowercasing column names in SQL queries by using quote_identifiers in sanitize_view_column_name.

  • Behavior:
    • sanitize_view_column_name in sql_sanitizer.py now uses quote_identifiers to avoid lowercasing column names.
    • normalize_view_column_name and normalize_view_column_alias in ViewQueryBuilder updated to use sanitize_view_column_name directly.
  • Tests:
    • Updated expected results in test_sql_sanitizer.py and test_view_query_builder.py to reflect quoted column names.
    • Added tests for SQL injection scenarios in test_view_query_builder.py.

This description was created by Ellipsis for 38bdf96. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri April 8, 2025 15:42
@codecov
Copy link

codecov bot commented Apr 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.72%. Comparing base (dab41c2) to head (38bdf96).
Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1703   +/-   ##
=======================================
  Coverage   91.72%   91.72%           
=======================================
  Files          72       72           
  Lines        2803     2803           
=======================================
  Hits         2571     2571           
  Misses        232      232           
Flag Coverage Δ
unittests 91.72% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to d8d90d1 in 2 minutes and 54 seconds

More details
  • Looked at 421 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 15 drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:9
  • Draft comment:
    Using parse_one with transform(quote_identifiers) ensures columns keep proper quoting. Ensure that invalid SQL inputs are correctly handled when parse_one might throw errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The first part of the comment is just describing what the code does, which isn't useful. The second part asks to "ensure" error handling, which violates our rule about not asking authors to verify things. If error handling was critical, the comment should directly state what error handling to add. The function is called "sanitize" so some error handling might be appropriate, but the comment isn't making a clear case for it.
    Perhaps I'm being too harsh - error handling for parse_one could be important for a sanitization function. The comment might be trying to point out a legitimate issue.
    While error handling could be valuable, the comment doesn't make a clear case for why it's needed here or what specifically should be done. It's just asking the author to "ensure" something, which our rules explicitly say to avoid.
    Delete the comment. It's partly descriptive (not useful) and partly asking the author to verify something (explicitly against our rules). If error handling is needed, a new comment should clearly state what to add.
2. pandasai/query_builders/view_query_builder.py:26
  • Draft comment:
    normalize_view_column_name now delegates to sanitize_view_column_name. Confirm this change is intended and that it handles all edge cases compared to prior normalization.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to confirm their intention and to ensure that the change handles all edge cases. This violates the rule against asking the author to confirm their intention or to ensure behavior is intended. Therefore, this comment should not be approved.
3. pandasai/query_builders/view_query_builder.py:30
  • Draft comment:
    normalize_view_column_alias now replaces dots with underscores then sanitizes. Verify this meets alias formatting requirements consistently.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to verify if the change meets alias formatting requirements, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue with the code. It is not actionable or specific enough to be useful.
4. tests/unit_tests/query_builders/test_view_query_builder.py:118
  • Draft comment:
    Remove debugging print statements from test code.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    While removing debug print statements is generally good practice, these print statements are in test code rather than production code. Test code often uses print statements for debugging test failures. The print statements don't affect production behavior. Our rules don't specifically mention test code print statements.
    The comment assumes all print statements should be removed, but test code may legitimately use print statements for debugging. We may be being too strict about print statements in test files.
    The print statements don't affect production code and may be helpful for test debugging. Our rules focus on production code quality, not test code.
    We should not keep this comment since it refers to print statements in test code which may be legitimately useful for debugging and don't affect production behavior.
5. tests/unit_tests/query_builders/test_view_query_builder.py:181
  • Draft comment:
    Remove debugging print statement from test_column_name_injection.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    While print statements in production code should be removed, this is a test file. Print statements in tests can be useful for debugging test failures. The print statement shows the actual query when the test fails, which helps developers understand what went wrong. This is a common and acceptable practice in test files.
    Maybe the print statement adds noise to test output when all tests pass? Maybe it should be wrapped in a debug flag or only print on failure?
    The benefit of having the query visible when the test fails outweighs the minor noise in successful test runs. This is standard practice in test files.
    Do not remove the print statement as it serves a useful debugging purpose in this test file.
6. tests/unit_tests/query_builders/test_view_query_builder.py:337
  • Draft comment:
    Remove unnecessary debugging print statement in test_multiple_joins_same_table.
  • Reason this comment was not posted:
    Marked as duplicate.
7. tests/unit_tests/query_builders/test_view_query_builder.py:393
  • Draft comment:
    Remove debugging print statement from test_multiple_joins_same_table_with_aliases.
  • Reason this comment was not posted:
    Marked as duplicate.
8. pandasai/helpers/sql_sanitizer.py:12
  • Draft comment:
    Consider using a list comprehension for readability instead of list(map(...)).
  • Reason this comment was not posted:
    Comment was on unchanged code.
9. pandasai/query_builders/view_query_builder.py:30
  • Draft comment:
    Review the alias normalization logic: replacing '.' with '_' and then sanitizing may not cover all edge cases. Consider unifying this logic with normalize_view_column_name if appropriate.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment is speculative, using words like "may not cover all edge cases". It doesn't point to any specific issues or edge cases. The code has already been simplified and both methods use the same sanitization function. The comment suggests "consider unifying" but the code is already quite unified - they both use sanitize_view_column_name as their core sanitization.
    The comment could be right that there are edge cases we haven't considered. The simple string replace of "." with "_" might be insufficient for some column names.
    Without specific edge cases being identified, this is just speculation. The code is already using a dedicated sanitization function and has been simplified from a more complex version.
    The comment should be deleted as it's speculative and doesn't point to specific issues. It suggests unifying logic that is already largely unified.
10. pandasai/query_builders/view_query_builder.py:112
  • Draft comment:
    Alias generation logic is duplicated between _get_aliases and _get_columns. Consider consolidating this to adhere to the DRY principle.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. tests/unit_tests/query_builders/test_view_query_builder.py:118
  • Draft comment:
    Remove debugging print statements from tests.
  • Reason this comment was not posted:
    Marked as duplicate.
12. tests/unit_tests/query_builders/test_view_query_builder.py:181
  • Draft comment:
    Remove debugging print statements from tests.
  • Reason this comment was not posted:
    Marked as duplicate.
13. tests/unit_tests/query_builders/test_view_query_builder.py:250
  • Draft comment:
    Remove debugging print statements from tests.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    While print statements in production code should definitely be removed, print statements in test files are less problematic. They can actually be helpful during test development and debugging. The print statements here appear to be intentionally left to help debug test failures. They don't affect production code or test outcomes.
    The comment assumes all print statements are bad and should be removed, but test files have different standards than production code. Print statements in tests can serve a legitimate debugging purpose.
    While the comment identifies a real issue (print statements), the context of being in a test file makes these print statements potentially useful rather than problematic.
    The comment should be deleted since print statements in test files can be useful for debugging and don't impact production code.
14. tests/unit_tests/query_builders/test_view_query_builder.py:337
  • Draft comment:
    Remove debugging print statements from tests.
  • Reason this comment was not posted:
    Marked as duplicate.
15. tests/unit_tests/query_builders/test_view_query_builder.py:393
  • Draft comment:
    Remove debugging print statements from tests.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_9zzX56XhiI1f5j6i


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

0
].name = "column UNION SELECT username, password FROM users;"
query = view_query_builder.build_query()
print(query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debugging print statement from test_column_name_union_injection.

Suggested change
print(query)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 38bdf96 in 1 minute and 17 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. tests/unit_tests/query_builders/test_view_query_builder.py:178
  • Draft comment:
    Removed debugging print statement; it's best to avoid prints in production or test code.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. tests/unit_tests/query_builders/test_view_query_builder.py:247
  • Draft comment:
    Removed debugging print statement; it's best to avoid prints in production or test code.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. tests/unit_tests/query_builders/test_view_query_builder.py:179
  • Draft comment:
    Good removal of the debugging print statement. Avoid print statements in tests to keep output clean.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. tests/unit_tests/query_builders/test_view_query_builder.py:247
  • Draft comment:
    Nice removal of the extraneous print statement. Debug prints should not be in committed test code.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. tests/unit_tests/query_builders/test_view_query_builder.py:112
  • Draft comment:
    The function name 'test_get__group_by_columns' contains a double underscore after 'get'. This appears to be a typographical inconsistency as the method under test is _get_group_by_columns. Consider renaming the test to 'test_get_group_by_columns' for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_tlEkp2heL9PRdD3H


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit 455dfdb into main Apr 9, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants