fix(view_loader): avoid lower casing the column name #1703

ArslanSaleem · 2025-04-08T15:42:52Z

Important

Avoid lowercasing column names in SQL queries by using quote_identifiers in sanitize_view_column_name.

Behavior:
- sanitize_view_column_name in sql_sanitizer.py now uses quote_identifiers to avoid lowercasing column names.
- normalize_view_column_name and normalize_view_column_alias in ViewQueryBuilder updated to use sanitize_view_column_name directly.
Tests:
- Updated expected results in test_sql_sanitizer.py and test_view_query_builder.py to reflect quoted column names.
- Added tests for SQL injection scenarios in test_view_query_builder.py.
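A minimal, stdlib-only sketch of why quoting preserves case. In PostgreSQL, for example, unquoted identifiers are folded to lower case, while double-quoted identifiers are kept exactly as written. The function below is a hypothetical stand-in for `sanitize_view_column_name`. The PR itself relies on sqlglot's `quote_identifiers`, and this sketch does not reproduce that implementation.

```python
def quote_identifier(part: str) -> str:
    """Wrap one identifier in double quotes, doubling embedded quotes (SQL-standard escaping)."""
    return '"' + part.replace('"', '""') + '"'


def sanitize_view_column_name_sketch(name: str) -> str:
    """Hypothetical sketch: quote each dot-separated part so its casing survives."""
    return ".".join(quote_identifier(part) for part in name.split("."))


print(sanitize_view_column_name_sketch("Orders.TotalAmount"))
# "Orders"."TotalAmount"
```

Splitting on `.` assumes dots only separate the table from the column; the sketch does not handle column names that themselves contain dots.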

This description was created by Ellipsis for 38bdf96. It will automatically update as commits are pushed.

codecov · 2025-04-08T15:44:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.72%. Comparing base (dab41c2) to head (38bdf96).
Report is 5 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1703   +/-   ##
=======================================
  Coverage   91.72%   91.72%           
=======================================
  Files          72       72           
  Lines        2803     2803           
=======================================
  Hits         2571     2571           
  Misses        232      232

Flag	Coverage Δ
unittests	`91.72% <100.00%> (ø)`

Flags with carried forward coverage won't be shown.
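The headline figure follows directly from the diff above: coverage is hits divided by coverable lines, and hits plus misses must equal the line count. A quick check using only the numbers reported:

```python
hits, misses, lines = 2571, 232, 2803

# Every coverable line is either hit or missed.
assert hits + misses == lines

coverage_pct = round(100 * hits / lines, 2)
print(coverage_pct)  # 91.72
```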


ellipsis-dev

❌ Changes requested. Reviewed everything up to d8d90d1 in 2 minutes and 54 seconds


Looked at 421 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 15 drafted comments based on config settings.

1. pandasai/helpers/sql_sanitizer.py:9

Draft comment:
Using parse_one with transform(quote_identifiers) ensures columns keep proper quoting. Ensure that invalid SQL inputs are correctly handled when parse_one might throw errors.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The first part of the comment is just describing what the code does, which isn't useful. The second part asks to "ensure" error handling, which violates our rule about not asking authors to verify things. If error handling was critical, the comment should directly state what error handling to add. The function is called "sanitize" so some error handling might be appropriate, but the comment isn't making a clear case for it.
Perhaps I'm being too harsh - error handling for parse_one could be important for a sanitization function. The comment might be trying to point out a legitimate issue.
While error handling could be valuable, the comment doesn't make a clear case for why it's needed here or what specifically should be done. It's just asking the author to "ensure" something, which our rules explicitly say to avoid.
Delete the comment. It's partly descriptive (not useful) and partly asking the author to verify something (explicitly against our rules). If error handling is needed, a new comment should clearly state what to add.
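For reference, a concrete version of the error handling this draft gestures at would name the failure mode explicitly. The sketch below is hypothetical and stdlib-only: it rejects names with empty dotted parts (e.g. `orders..total`) with a `ValueError` before quoting. It does not model sqlglot's parser or the errors `parse_one` itself may raise.

```python
def is_valid_column_name(name: str) -> bool:
    """Hypothetical check: every dot-separated part must be non-blank."""
    return all(part.strip() for part in name.split("."))


def sanitize_or_raise(name: str) -> str:
    """Hypothetical sketch: reject malformed names up front, then quote each part."""
    if not is_valid_column_name(name):
        # Name the offending input instead of emitting malformed SQL downstream.
        raise ValueError(f"Invalid column name: {name!r}")
    return ".".join('"' + part.replace('"', '""') + '"' for part in name.split("."))


try:
    sanitize_or_raise("orders..total")
except ValueError as exc:
    print(exc)  # Invalid column name: 'orders..total'
```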

2. pandasai/query_builders/view_query_builder.py:26

Draft comment:
normalize_view_column_name now delegates to sanitize_view_column_name. Confirm this change is intended and that it handles all edge cases compared to prior normalization.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is asking the PR author to confirm their intention and to ensure that the change handles all edge cases. This violates the rule against asking the author to confirm their intention or to ensure behavior is intended. Therefore, this comment should not be approved.

3. pandasai/query_builders/view_query_builder.py:30

Draft comment:
normalize_view_column_alias now replaces dots with underscores then sanitizes. Verify this meets alias formatting requirements consistently.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is asking the author to verify if the change meets alias formatting requirements, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue with the code. It is not actionable or specific enough to be useful.
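Per the PR description, both normalizers now delegate to `sanitize_view_column_name`, and the alias variant first replaces dots with underscores. A minimal sketch of that shape, written as module-level functions for brevity and using a hypothetical stdlib `sanitize` helper in place of the sqlglot-based original:

```python
def sanitize(name: str) -> str:
    """Hypothetical stand-in for sanitize_view_column_name: quote each dotted part."""
    return ".".join('"' + p.replace('"', '""') + '"' for p in name.split("."))


def normalize_view_column_name(name: str) -> str:
    # Qualified references keep their table prefix as a separate identifier.
    return sanitize(name)


def normalize_view_column_alias(name: str) -> str:
    # Dots become underscores first, so the alias is a single identifier.
    return sanitize(name.replace(".", "_"))


print(normalize_view_column_name("orders.TotalAmount"))   # "orders"."TotalAmount"
print(normalize_view_column_alias("orders.TotalAmount"))  # "orders_TotalAmount"
```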

4. tests/unit_tests/query_builders/test_view_query_builder.py:118

Draft comment:
Remove debugging print statements from test code.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
While removing debug print statements is generally good practice, these print statements are in test code rather than production code. Test code often uses print statements for debugging test failures. The print statements don't affect production behavior. Our rules don't specifically mention test code print statements.
The comment assumes all print statements should be removed, but test code may legitimately use print statements for debugging. We may be being too strict about print statements in test files.
The print statements don't affect production code and may be helpful for test debugging. Our rules focus on production code quality, not test code.
We should not keep this comment since it refers to print statements in test code which may be legitimately useful for debugging and don't affect production behavior.

5. tests/unit_tests/query_builders/test_view_query_builder.py:181

Draft comment:
Remove debugging print statement from test_column_name_injection.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
While print statements in production code should be removed, this is a test file. Print statements in tests can be useful for debugging test failures. The print statement shows the actual query when the test fails, which helps developers understand what went wrong. This is a common and acceptable practice in test files.
Maybe the print statement adds noise to test output when all tests pass? Maybe it should be wrapped in a debug flag or only print on failure?
The benefit of having the query visible when the test fails outweighs the minor noise in successful test runs. This is standard practice in test files.
Do not remove the print statement as it serves a useful debugging purpose in this test file.
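This draft floats printing only on failure. A plain `assert` with a message already does that: Python evaluates the message (here, the generated query) only when the assertion fails, so no `print` is needed. In this sketch `build_query` is a hypothetical stand-in for `view_query_builder.build_query()`.

```python
def build_query(column: str) -> str:
    """Hypothetical stand-in for ViewQueryBuilder.build_query()."""
    quoted = '"' + column.replace('"', '""') + '"'
    return f'SELECT {quoted} FROM "orders"'


def test_column_name_is_quoted() -> None:
    query = build_query("TotalAmount")
    # The query appears in the failure report only if the assertion fails.
    assert '"TotalAmount"' in query, f"unexpected query: {query}"


test_column_name_is_quoted()
```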

6. tests/unit_tests/query_builders/test_view_query_builder.py:337

Draft comment:
Remove unnecessary debugging print statement in test_multiple_joins_same_table.
Reason this comment was not posted:
Marked as duplicate.

7. tests/unit_tests/query_builders/test_view_query_builder.py:393

Draft comment:
Remove debugging print statement from test_multiple_joins_same_table_with_aliases.
Reason this comment was not posted:
Marked as duplicate.

8. pandasai/helpers/sql_sanitizer.py:12

Draft comment:
Consider using a list comprehension for readability instead of list(map(...)).
Reason this comment was not posted:
Comment was on unchanged code.
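The two forms in draft comment 8 are interchangeable. A sketch with a hypothetical `quote` helper standing in for whatever function is mapped over the dotted parts:

```python
def quote(part: str) -> str:
    # Hypothetical per-part sanitizer.
    return '"' + part.replace('"', '""') + '"'


parts = "orders.TotalAmount".split(".")

with_map = ".".join(list(map(quote, parts)))
with_comprehension = ".".join([quote(p) for p in parts])
# str.join accepts any iterable, so the intermediate list is optional as well.
with_generator = ".".join(quote(p) for p in parts)

assert with_map == with_comprehension == with_generator == '"orders"."TotalAmount"'
```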

9. pandasai/query_builders/view_query_builder.py:30

Draft comment:
Review the alias normalization logic: replacing '.' with '_' and then sanitizing may not cover all edge cases. Consider unifying this logic with normalize_view_column_name if appropriate.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment is speculative, using words like "may not cover all edge cases". It doesn't point to any specific issues or edge cases. The code has already been simplified and both methods use the same sanitization function. The comment suggests "consider unifying" but the code is already quite unified - they both use sanitize_view_column_name as their core sanitization.
The comment could be right that there are edge cases we haven't considered. The simple string replace of "." with "_" might be insufficient for some column names.
Without specific edge cases being identified, this is just speculation. The code is already using a dedicated sanitization function and has been simplified from a more complex version.
The comment should be deleted as it's speculative and doesn't point to specific issues. It suggests unifying logic that is already largely unified.
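One concrete case behind the speculation in this draft: the dot-to-underscore replacement is not injective, so two distinct qualified names can map to the same alias. This is an illustration only; whether such names can co-occur in a pandasai view is not established here, and `alias_for` is hypothetical.

```python
def alias_for(name: str) -> str:
    """Hypothetical: mirrors only the '.' -> '_' step of alias normalization."""
    return name.replace(".", "_")


a, b = "sales.net_total", "sales_net.total"
print(alias_for(a), alias_for(b))    # sales_net_total sales_net_total
print(alias_for(a) == alias_for(b))  # True
```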

10. pandasai/query_builders/view_query_builder.py:112

Draft comment:
Alias generation logic is duplicated between _get_aliases and _get_columns. Consider consolidating this to adhere to the DRY principle.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
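One way to act on draft comment 10 is a single alias helper that both methods call. The class and method bodies below are hypothetical; only the names `_get_aliases` and `_get_columns` come from the comment.

```python
class ViewQueryBuilderSketch:
    """Hypothetical sketch, not pandasai's ViewQueryBuilder."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns

    @staticmethod
    def _quote(part: str) -> str:
        return '"' + part.replace('"', '""') + '"'

    def _alias(self, column: str) -> str:
        # The one place where alias formatting is defined.
        return self._quote(column.replace(".", "_"))

    def _get_aliases(self) -> list[str]:
        return [self._alias(c) for c in self.columns]

    def _get_columns(self) -> list[str]:
        return [
            ".".join(self._quote(p) for p in c.split(".")) + " AS " + self._alias(c)
            for c in self.columns
        ]


builder = ViewQueryBuilderSketch(["orders.TotalAmount"])
print(builder._get_aliases())  # ['"orders_TotalAmount"']
print(builder._get_columns())  # ['"orders"."TotalAmount" AS "orders_TotalAmount"']
```

Because both methods route through `_alias`, a change to alias formatting cannot leave the two out of sync.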

11. tests/unit_tests/query_builders/test_view_query_builder.py:118

Draft comment:
Remove debugging print statements from tests.
Reason this comment was not posted:
Marked as duplicate.

12. tests/unit_tests/query_builders/test_view_query_builder.py:181

Draft comment:
Remove debugging print statements from tests.
Reason this comment was not posted:
Marked as duplicate.

13. tests/unit_tests/query_builders/test_view_query_builder.py:250

Draft comment:
Remove debugging print statements from tests.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
While print statements in production code should definitely be removed, print statements in test files are less problematic. They can actually be helpful during test development and debugging. The print statements here appear to be intentionally left to help debug test failures. They don't affect production code or test outcomes.
The comment assumes all print statements are bad and should be removed, but test files have different standards than production code. Print statements in tests can serve a legitimate debugging purpose.
While the comment identifies a real issue (print statements), the context of being in a test file makes these print statements potentially useful rather than problematic.
The comment should be deleted since print statements in test files can be useful for debugging and don't impact production code.

14. tests/unit_tests/query_builders/test_view_query_builder.py:337

Draft comment:
Remove debugging print statements from tests.
Reason this comment was not posted:
Marked as duplicate.

15. tests/unit_tests/query_builders/test_view_query_builder.py:393

Draft comment:
Remove debugging print statements from tests.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_9zzX56XhiI1f5j6i


ellipsis-dev · 2025-04-08T15:45:54Z

tests/unit_tests/query_builders/test_view_query_builder.py

            0
        ].name = "column UNION SELECT username, password FROM users;"
        query = view_query_builder.build_query()
+        print(query)


Remove debugging print statement from test_column_name_union_injection.

Suggested change (delete this line):

print(query)
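For context, the test this comment targets exercises a UNION payload in a column name. A stdlib sketch of the property such a test can check, assuming the name is emitted as one double-quoted identifier; both helpers are hypothetical, and the real test's assertions are not shown here.

```python
def quote_identifier(name: str) -> str:
    # Hypothetical helper: one double-quoted identifier, embedded quotes doubled.
    return '"' + name.replace('"', '""') + '"'


def is_single_identifier(quoted: str) -> bool:
    """True if `quoted` is one double-quoted identifier with no unescaped inner quote."""
    if len(quoted) < 2 or not (quoted.startswith('"') and quoted.endswith('"')):
        return False
    return '"' not in quoted[1:-1].replace('""', "")


payload = "column UNION SELECT username, password FROM users;"
assert is_single_identifier(quote_identifier(payload))
# A payload that tries to close the identifier early is neutralised the same way.
assert is_single_identifier(quote_identifier('x" UNION SELECT 1 --'))
```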

ellipsis-dev

👍 Looks good to me! Incremental review on 38bdf96 in 1 minute and 17 seconds


Looked at 20 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. tests/unit_tests/query_builders/test_view_query_builder.py:178

Draft comment:
Removed debugging print statement; it's best to avoid prints in production or test code.
Reason this comment was not posted:
Comment looked like it was already resolved.

2. tests/unit_tests/query_builders/test_view_query_builder.py:247

Draft comment:
Removed debugging print statement; it's best to avoid prints in production or test code.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. tests/unit_tests/query_builders/test_view_query_builder.py:179

Draft comment:
Good removal of the debugging print statement. Avoid print statements in tests to keep output clean.
Reason this comment was not posted:
Comment looked like it was already resolved.

4. tests/unit_tests/query_builders/test_view_query_builder.py:247

Draft comment:
Nice removal of the extraneous print statement. Debug prints should not be in committed test code.
Reason this comment was not posted:
Comment looked like it was already resolved.

5. tests/unit_tests/query_builders/test_view_query_builder.py:112

Draft comment:
The function name 'test_get__group_by_columns' contains a double underscore after 'get'. This appears to be a typographical inconsistency as the method under test is _get_group_by_columns. Consider renaming the test to 'test_get_group_by_columns' for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_tlEkp2heL9PRdD3H


fix(view_loader): avoid lower casing the column name

d8d90d1

ArslanSaleem requested a review from gventuri April 8, 2025 15:42

ellipsis-dev bot reviewed Apr 8, 2025


fix: remove print statements

38bdf96

ellipsis-dev bot reviewed Apr 8, 2025


gventuri merged commit 455dfdb into main Apr 9, 2025
15 checks passed

3 participants
