Skip to content

Conversation

@ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Apr 10, 2025

memory issues in production where parallel request is happening same dataset name can override the already registered from other user


Important

DuckDBConnectionManager is refactored to be non-singleton, adding unregister() and ensuring proper connection closure.

  • Behavior:
    • DuckDBConnectionManager is no longer a singleton; each instance manages its own connection.
    • Removes __new__ method and singleton pattern; adds __del__ to close connections.
    • Introduces unregister() method to remove tables from _registered_tables.
  • Tests:
    • Adds test_unregister in test_duckdbmanager.py to verify unregister() functionality.
    • Ensures close() does not throw errors when called multiple times.

This description was created by Ellipsis for 80007c0. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri April 10, 2025 16:41
@ArslanSaleem ArslanSaleem changed the title fix(duckdbConnection): make duckdb connection manager non singleton to avoid shared space of registered tables fix(duckdbConnection): make duckdb connection manager non singleton Apr 10, 2025
@codecov
Copy link

codecov bot commented Apr 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.71%. Comparing base (cd68a1d) to head (80007c0).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1710      +/-   ##
==========================================
- Coverage   91.71%   91.71%   -0.01%     
==========================================
  Files          72       72              
  Lines        2800     2799       -1     
==========================================
- Hits         2568     2567       -1     
  Misses        232      232              
Flag Coverage Δ
unittests 91.71% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 80c4e5a in 2 minutes and 20 seconds

More details
  • Looked at 64 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. pandasai/data_loader/duck_db_connection_manager.py:18
  • Draft comment:
    Consider adding an explicit type annotation for the 'df' parameter in register() for clarity (e.g., pandas.DataFrame).
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. pandasai/data_loader/duck_db_connection_manager.py:14
  • Draft comment:
    Using del for closing resources may not be reliably invoked. Consider providing an explicit close() method usage and clear documentation on lifecycle management.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    While the comment raises a valid general concern about del, the code already implements the suggested solution - it has an explicit close() method. The del is just a backup mechanism. The comment seems to be asking for something that's already there. The documentation could be improved, but that's not what the comment is specifically requesting.
    The comment might be hinting at a deeper issue about resource management patterns in Python that could be worth addressing. Maybe there should be context manager support?
    While context manager support could be nice, that's not what this comment is suggesting. The core concern about having an explicit close() method is already addressed in the code.
    The comment should be deleted because its main suggestion (having an explicit close method) is already implemented in the code.
3. pandasai/data_loader/duck_db_connection_manager.py:25
  • Draft comment:
    Ensure that your changes, particularly the new unregister() method and non-singleton pattern, are covered by unit tests to avoid potential regressions.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The rules explicitly state not to ask authors to "ensure" things or to "double-check". The comment starts with "Ensure that..." which is a red flag. While having tests is important, the comment is phrased as a verification request rather than pointing out a specific missing test. We don't have visibility into whether tests already exist.
    Tests are crucial for code quality, so removing this comment might lead to untested code. The changes are significant enough that tests would be valuable.
    While tests are important, the rules are clear about not making verification requests. If there's a specific missing test, it should be pointed out directly rather than asking for verification.
    The comment should be deleted as it violates the rule against asking authors to verify or ensure things, and we don't have clear evidence that tests are actually missing.
4. pandasai/data_loader/duck_db_connection_manager.py:36
  • Draft comment:
    The connection existence check can be simplified. Since self.connection is always initialized, consider replacing 'if hasattr(self, "connection") and self.connection:' with 'if self.connection is not None:'.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
5. pandasai/data_loader/duck_db_connection_manager.py:8
  • Draft comment:
    Update documentation and tests to reflect the new instance-based (non-singleton) connection management, ensuring that multiple connections do not lead to conflicts.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    This is a significant architectural change that affects how connections are managed. However, the comment is asking for documentation updates and test changes, which violates our rules about not commenting on cross-file issues (documentation and tests would be in different files). We also don't have access to the documentation or test files to verify if updates are actually needed.
    The architectural change is indeed significant and could have important implications for users of this class. Documentation and tests are crucial for such changes.
    While the concern is valid, we must follow the rules strictly - we can't verify documentation or test files, and we shouldn't comment on cross-file issues.
    The comment should be deleted as it refers to documentation and tests which are in other files, violating our rule about ignoring cross-file issues.

Workflow ID: wflow_rJsBf9YScE7zYkrY


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 80007c0 in 2 minutes and 6 seconds

More details
  • Looked at 30 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_duckdbmanager.py:17
  • Draft comment:
    When asserting table registration, consider adding an error message for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
2. pandasai/data_loader/duck_db_connection_manager.py:18
  • Draft comment:
    Removed auto-unregistration for duplicate table names. Ensure that duckdb.register handles duplicate registrations gracefully, and consider adding an inline comment explaining this design decision.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
3. tests/unit_tests/data_loader/test_duckdbmanager.py:14
  • Draft comment:
    Test accesses the private _registered_tables attribute directly. Consider providing a public accessor for better encapsulation.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None

Workflow ID: wflow_AyNtXeGQQM1KvlKf


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit 9b57945 into main Apr 11, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants