Skip to content

Conversation

@rmnskb
Copy link
Contributor

@rmnskb rmnskb commented Oct 30, 2025

Rationale for this change

See #47728. Check source argument in pyarrow.parquet.read_table if pyarrow.dataset is not available.

What changes are included in this PR?

Check the source argument, raise ValueError if the source argument is either a list of .parquet files or a directory.

Are these changes tested?

Yes

Are there any user-facing changes?

No

In case if the source argument is a directory, I decided not to check it directly, but to catch the exceptions coming from the fs.open_input_file, since it already checks for it, and add extra exception on top of the stack that explains the actual reason.

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Oct 31, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 2, 2025
@rmnskb rmnskb requested a review from raulcd November 2, 2025 12:20
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failure test_read_table_without_dataset failing on the CI jobs is related. It seems a test that was using a MockDataset with a non-existing file is failing with this change, we might have to revisit that test.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 3, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 4, 2025
@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 4, 2025

The test failure test_read_table_without_dataset failing on the CI jobs is related. It seems a test that was using a MockDataset with a non-existing file is failing with this change, we might have to revisit that test.

Yep, you're right, I've updated it with the new ValueError.

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for another contribution @rmnskb!
LGTM. I just added minor styling suggestions.

Comment on lines +1000 to +1001
def test_read_table_raises_value_error_when_ds_is_unavailable(
monkeypatch, source):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_read_table_raises_value_error_when_ds_is_unavailable(
monkeypatch, source):
def test_read_table_raises_value_error_when_ds_is_unavailable(monkeypatch, source):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants