@matteocacciola
Contributor

@matteocacciola matteocacciola commented Jun 16, 2025

  • Closes #xxxx (Replace xxxx with the GitHub issue number).
  • Tests added and passed if fixing a bug or adding a new feature.
  • All code checks passed.

Important

Introduces read_excel() in pandasai/__init__.py to read Excel files with support for multiple sheets and adds comprehensive tests in test_pandasai_read_excel.py.

  • New Functionality:
    • Adds read_excel() in pandasai/__init__.py to read Excel files, accepting either a str path or a BytesIO buffer.
    • Handles single and multiple sheets, returning a DataFrame or a dictionary of DataFrames.
    • Supports optional sheet_name parameter to specify a sheet or return all sheets if not specified.
  • Tests:
    • Adds test_pandasai_read_excel.py with tests for read_excel() covering single/multiple sheets, str/BytesIO inputs, and nonexistent sheets.
    • Tests ensure correct handling of sheet_name and propagation of exceptions.
  • Misc:
    • Updates read_csv() in pandasai/__init__.py to accept BytesIO in addition to str.
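
As a rough illustration of the behavior summarized above, the following is a minimal sketch, not the code merged in this PR. The names read_excel_sketch and load_all_sheets are hypothetical, and raising ValueError for a missing sheet is an assumption. The default loader relies on pd.read_excel(..., sheet_name=None), which returns a dictionary keyed by sheet name.

```python
from io import BytesIO
from typing import Any, Callable, Dict, Optional, Union

Source = Union[str, BytesIO]
SheetLoader = Callable[[Source], Dict[str, Any]]


def _load_all_sheets_with_pandas(source: Source) -> Dict[str, Any]:
    """Default loader: sheet_name=None makes pandas return {sheet name: DataFrame}."""
    import pandas as pd  # imported lazily so the selection logic can run without pandas

    return pd.read_excel(source, sheet_name=None)


def read_excel_sketch(
    source: Source,
    sheet_name: Optional[str] = None,
    load_all_sheets: SheetLoader = _load_all_sheets_with_pandas,  # hypothetical hook
) -> Union[Any, Dict[str, Any]]:
    """Return one sheet when it is named or is the only one, else all sheets."""
    sheets = load_all_sheets(source)

    if sheet_name is not None:
        if sheet_name not in sheets:
            # Assumption: a missing sheet is reported as ValueError.
            raise ValueError(
                f"Sheet {sheet_name!r} not found; available: {list(sheets)}"
            )
        return sheets[sheet_name]

    if len(sheets) == 1:
        # A single-sheet workbook collapses to the sheet itself.
        return next(iter(sheets.values()))

    return sheets
```

Passing the loader as a parameter keeps the selection logic testable without a real workbook. Sheet names are matched as plain dictionary keys, so a name such as "Display Tools" needs no special handling in this sketch.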

This description was created by Ellipsis for 44300f0.

adding unit tests
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Caution

Changes requested ❌

Reviewed everything up to 44300f0 in 2 minutes and 44 seconds.
  • Reviewed 257 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_bAi4ME3yq48bqHTg

@matteocacciola matteocacciola mentioned this pull request Jun 16, 2025
3 tasks
@ArslanSaleem
Collaborator

@matteocacciola The current implementation breaks when the Excel sheet name contains spaces. I tested it with a file where the sheet name was "Display Tools".

Ideally, the behavior should match pd.read_excel — when a single sheet is passed or detected, it should return a DataFrame. However, in our case, it always returns a dictionary, which leads to inconsistencies.

@matteocacciola
Contributor Author

I will look into the bug with spaces in the sheet name. However, pd.read_excel returns a dictionary when the Excel file contains more than one sheet and no sheet name is provided.

@ArslanSaleem
Collaborator

I tested with an Excel file that has one sheet.

data = pd.read_excel("/pandas-ai/examples/data/Loan payments data.xlsx")  # this returns a DataFrame

data = pai.read_excel("/pandas-ai/examples/data/Loan payments data.xlsx")  # this returns a dictionary
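
For reference when comparing the two calls above: pandas documents sheet_name with a default of 0, meaning the first sheet is read and a single DataFrame comes back however many sheets the workbook has. A dictionary is returned only for sheet_name=None or a list of sheets. The helper below is a hypothetical model of that documented rule, written for illustration; it is not pandas or pandasai code.

```python
from typing import List, Union

SheetArg = Union[int, str, List[Union[int, str]], None]


def pandas_return_shape(sheet_name: SheetArg = 0) -> str:
    """Model the documented return type of pd.read_excel for a given sheet_name."""
    if sheet_name is None or isinstance(sheet_name, list):
        # None means "all sheets"; a list means "these sheets". Both give a dict.
        return "dict of DataFrames"
    # An int (position) or a str (name) selects exactly one sheet.
    return "DataFrame"
```

Under this rule, a wrapper that wants to match pd.read_excel would return a DataFrame whenever one sheet is named or detected, which is the behavior requested in this thread.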

@matteocacciola
Contributor Author

Requested changes applied. I hope the current version works for you.

@ArslanSaleem
Collaborator

ArslanSaleem commented Jun 18, 2025

@matteocacciola Thanks a lot! Just one last thing, could you please remove openpyxl if it's not being used? Everything seems to be working fine without it, so it might be an unnecessary dependency. Then we are good to go.

@matteocacciola
Contributor Author

It looks like this check failed because of the missing library.
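
Some context on why removing the package can break a check: pandas reads .xlsx files through the openpyxl engine, an optional dependency that is imported only when an Excel file is actually read, so a missing install tends to surface at call time rather than at import time. A minimal sketch of an up-front check follows; the helper name has_module is hypothetical and this is not part of the PR.

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not has_module("openpyxl"):
        print("openpyxl is not installed; reading .xlsx files with pandas would fail")
```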

@matteocacciola
Contributor Author

Hey @ArslanSaleem, can you please proceed with this PR?

@gventuri gventuri requested a review from ArslanSaleem June 26, 2025 07:27
@ArslanSaleem
Collaborator

Thank you @matteocacciola for the improvement.

@ArslanSaleem ArslanSaleem merged commit 20241be into sinaptik-ai:main Jun 29, 2025
12 checks passed
@matteocacciola matteocacciola deleted the feature/read_excel branch July 10, 2025 20:30