
[BUG] dynamic_partition_pruning::read_table errors on single-file Parquet datasets #1268

@charlesbluca

Description


What happened:
While playing around with hooking dask-sql into Coiled's benchmarks, I hit an error in dynamic partition pruning (DPP) with test_query_3:

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Os { code: 20, kind: NotADirectory, message: "Not a directory" }
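The failure mode can be reproduced outside dask-sql. Below is a minimal standalone sketch, assuming a Linux host and an arbitrary temporary file name. It shows that std::fs::read_dir returns an Err for a path that is a regular file. On Linux the raw OS error is 20 (ENOTDIR), matching the code in the panic message.

```rust
use std::fs;
use std::io;

fn main() -> io::Result<()> {
    // Stand-in for a single-file Parquet dataset; only the path type matters here.
    let file = std::env::temp_dir().join("dpp_repro_single.parquet");
    fs::write(&file, b"")?;

    // read_dir on a file path yields Err instead of an iterator;
    // calling .unwrap() on this value is what panics in read_table.
    match fs::read_dir(&file) {
        Ok(_) => println!("unexpectedly listed a file as a directory"),
        Err(err) => {
            // On Linux this is ENOTDIR, i.e. raw OS error 20.
            println!("read_dir failed: {} (raw os error: {:?})", err, err.raw_os_error());
        }
    }

    fs::remove_file(&file)?;
    Ok(())
}
```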

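I think this is because dynamic_partition_pruning::read_table assumes the table's filepath is a directory of chunked Parquet files and has no handling for the case where it points to a single Parquet file. Calling fs::read_dir on a file path returns an Err, and the unwrap() turns that into the panic above: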

let paths = fs::read_dir(tables.get(&table_string).unwrap().filepath.clone()).unwrap();
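One way the path handling could cover both layouts is sketched below. It assumes the table's filepath may point either to a directory of Parquet chunks or to a single Parquet file, and that chunk files carry a `.parquet` extension. `list_parquet_files` is a hypothetical helper, not an existing dask-sql function. It checks the path type first and propagates I/O errors instead of unwrapping them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of dask-sql): return the Parquet files that make
/// up a table, whether `path` is a directory of chunks or a single file.
fn list_parquet_files(path: &Path) -> io::Result<Vec<PathBuf>> {
    // fs::metadata follows symlinks; a missing path is returned as Err, not unwrapped.
    let metadata = fs::metadata(path)?;
    if metadata.is_dir() {
        let mut files = Vec::new();
        for entry in fs::read_dir(path)? {
            let entry_path = entry?.path();
            // Assumption: chunks use a `.parquet` extension, so sidecar files
            // such as `_metadata` are skipped.
            let is_parquet = entry_path.extension().map_or(false, |ext| ext == "parquet");
            if entry_path.is_file() && is_parquet {
                files.push(entry_path);
            }
        }
        // read_dir order is platform-dependent; sort for deterministic output.
        files.sort();
        Ok(files)
    } else {
        // Single-file dataset: the path itself is the only file to read.
        Ok(vec![path.to_path_buf()])
    }
}

fn main() -> io::Result<()> {
    // Throwaway directory with two chunk files for demonstration.
    let dir = std::env::temp_dir().join("dpp_sketch_example");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("part.0.parquet"), b"")?;
    fs::write(dir.join("part.1.parquet"), b"")?;

    // Directory dataset: both chunks are listed.
    println!("{:?}", list_parquet_files(&dir)?);
    // Single-file dataset: only the file itself is returned.
    println!("{:?}", list_parquet_files(&dir.join("part.0.parquet"))?);

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Checking the path type up front keeps a single code path for the rest of the DPP logic, which only needs a list of files either way.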

What you expected to happen:
In general, I would expect this to emit a warning and skip DPP rather than bubble up as an error. That said, I don't think it would be too difficult to modify read_table to handle the single-file case. cc @sarahyurick
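The warn-and-skip behaviour could look roughly like the sketch below. `list_table_paths` and `paths_for_dpp` are hypothetical names, not dask-sql functions. `eprintln!` stands in for whatever logging dask-sql actually uses. Returning `None` signals the caller to run the query without DPP for that table instead of panicking.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the directory listing in read_table, returning an
/// io::Result instead of calling unwrap().
fn list_table_paths(path: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in fs::read_dir(path)? {
        paths.push(entry?.path());
    }
    paths.sort();
    Ok(paths)
}

/// Hypothetical wrapper: on failure, print a warning and return None so the
/// caller skips DPP for this table.
fn paths_for_dpp(table_name: &str, path: &Path) -> Option<Vec<PathBuf>> {
    match list_table_paths(path) {
        Ok(paths) => Some(paths),
        Err(err) => {
            // Assumption: eprintln! stands in for a real logging call.
            eprintln!(
                "warning: skipping dynamic partition pruning for table {} ({}): {}",
                table_name,
                path.display(),
                err
            );
            None
        }
    }
}

fn main() {
    // A single-file table: read_dir fails, so DPP is skipped with a warning.
    let file = std::env::temp_dir().join("dpp_warn_example.parquet");
    fs::write(&file, b"").expect("failed to create example file");
    match paths_for_dpp("my_table", &file) {
        Some(paths) => println!("pruning over {} file(s)", paths.len()),
        None => println!("DPP skipped for my_table"),
    }
    fs::remove_file(&file).expect("failed to remove example file");
}
```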

Environment:

  • dask-sql version: 2023.10.1
  • Python version: 3.9
  • Operating System: Ubuntu 20.04
  • Install method (conda, pip, source): conda

Labels: bug, rust