Skip to content

Conversation

@scaliseraoul
Copy link
Contributor

@scaliseraoul scaliseraoul commented Jan 27, 2025


Important

Sanitize dataset name in _load_schema() in loader.py to prevent SQL injection or invalid table names.

  • Behavior:
    • Sanitize dataset name in _load_schema() in loader.py using sanitize_sql_table_name to prevent SQL injection or invalid table names.

This description was created by Ellipsis for 059533d. It will automatically update as commits are pushed.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jan 27, 2025
@scaliseraoul scaliseraoul changed the title fix: sanitize name of dataset passed as param in pai.load fix: sanitize name of dataset in pai.load Jan 27, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 059533d in 37 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. pandasai/data_loader/loader.py:82
  • Draft comment:
    Ensure that sanitize_sql_table_name is robust and covers all potential SQL injection vectors. This is crucial for security when dealing with SQL-related operations.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR introduces a sanitization step for the dataset name in the schema. This is a good security measure to prevent SQL injection or other issues related to unsanitized inputs. However, it's important to ensure that the sanitize_sql_table_name function is correctly implemented and covers all necessary cases.

Workflow ID: wflow_DGzktg6fVeOs9ma9


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.


with open(schema_path, "r") as file:
raw_schema = yaml.safe_load(file)
raw_schema["name"] = sanitize_sql_table_name(raw_schema["name"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that there is a corresponding unit test for the sanitize_sql_table_name function to verify that the dataset name is correctly sanitized in the schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests already defined in tests/unit_tests/helpers/test_sql_sanitizer.py

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scaliseraoul maybe we could also add a test to enforce the name of the schema loaded from the loader is sanitized (like in this case)?

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Jan 27, 2025
@gventuri gventuri merged commit c95eea7 into sinaptik-ai:release/v3 Jan 27, 2025
12 checks passed
@scaliseraoul scaliseraoul deleted the release/v3 branch January 27, 2025 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants