
@alamb
Contributor

@alamb alamb commented Jul 16, 2023

Which issue does this PR close?

Closes #6983
Closes #6908

Rationale for this change

The following code reads the file using only a single core:

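    // Build a DataFrame over the Parquet file (row data is not read yet)
    let _df = _ctx.read_parquet(FILENAME, _read_options).await.unwrap();
    // Execute the plan and buffer all of the results in memory
    let _cached = _df.cache().await;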
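As a rough mental model (an assumption, not DataFusion's actual code), caching executes the plan and buffers each output partition in memory, with each partition driven by a single task. If the scan produces only one partition, all of the decoding lands on one core. The std-only sketch below models this with scoped threads; `Batch` and `cache_partitions` are hypothetical names, and the "decoding" is a placeholder.

```rust
use std::thread;

/// Hypothetical stand-in for one decoded record batch.
type Batch = Vec<i64>;

/// Hypothetical model of caching: process every input partition on its own
/// scoped thread and buffer the results in memory, keeping partition order.
/// With a single partition, all of the work runs on one thread (one core).
fn cache_partitions(partitions: Vec<Vec<Batch>>) -> Vec<Vec<Batch>> {
    thread::scope(|s| {
        // Spawn one worker thread per partition.
        let handles: Vec<_> = partitions
            .into_iter()
            .map(|part| {
                s.spawn(move || {
                    // Placeholder for real decoding work: double every value.
                    part.into_iter()
                        .map(|batch| batch.into_iter().map(|v| v * 2).collect::<Batch>())
                        .collect::<Vec<Batch>>()
                })
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Four partitions, so up to four threads can decode concurrently.
    let parts = vec![vec![vec![1, 2]], vec![vec![3]], vec![vec![4, 5]], vec![vec![6]]];
    println!("{:?}", cache_partitions(parts));
}
```

The point of the model is that the degree of parallelism is bounded by the number of partitions the scan produces, not by the number of cores on the machine.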

What changes are included in this PR?

Read the file using multiple cores instead of a single one.
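One common way to get several cores working on a single file is to split its byte range into contiguous ranges and scan each range as its own partition. The sketch below assumes that approach and is not necessarily what this PR does; `split_byte_ranges` is a hypothetical helper, and mapping each range onto whole Parquet row groups is omitted.

```rust
use std::ops::Range;

/// Hypothetical helper: split `file_len` bytes into at most `n` contiguous,
/// non-overlapping byte ranges of near-equal size that cover the whole file.
fn split_byte_ranges(file_len: u64, n: u64) -> Vec<Range<u64>> {
    if file_len == 0 || n == 0 {
        return Vec::new();
    }
    // Ceiling division, so n chunks of this size always reach the end.
    let chunk = (file_len + n - 1) / n;
    (0..n)
        .map(|i| (i * chunk)..((i + 1) * chunk).min(file_len))
        // Drop empty or inverted ranges left over once chunks pass the end.
        .filter(|r| r.start < r.end)
        .collect()
}

fn main() {
    // A 10-byte "file" split across 4 partitions.
    for r in split_byte_ranges(10, 4) {
        println!("{r:?}");
    }
}
```

Each resulting range could then be handed to a separate worker, so a single large file no longer limits the scan to one partition.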

Tested with a release build (cargo --release):

With main (16 s):

    datafusion end -> 2023-07-16T09:07:29.895269-04:00 16.080133858s

With this branch (3 s, roughly 5.5x faster):

    datafusion end -> 2023-07-16T08:52:05.511984-04:00 2.947019517s
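The reported times can be turned into a speedup figure with simple arithmetic. The original benchmark harness is not shown, so the sketch below is an assumption about how such numbers might be measured: it times a closure with `std::time::Instant` and prints the elapsed `Duration` with `{:?}`, which produces the same `16.080133858s` style seen in the log lines. `time_it` and `speedup` are hypothetical helpers.

```rust
use std::time::{Duration, Instant};

/// Speedup of `candidate` relative to `baseline` (2.0 means twice as fast).
fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}

/// Hypothetical harness: run `f`, print the elapsed time in Duration's
/// Debug format, and return both the result and the elapsed time.
fn time_it<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    let elapsed = start.elapsed();
    println!("{label} end -> {elapsed:?}");
    (out, elapsed)
}

fn main() {
    // The two measurements reported for main and for this branch.
    let main_branch = Duration::new(16, 80_133_858);
    let this_branch = Duration::new(2, 947_019_517);
    println!("speedup: {:.2}x", speedup(main_branch, this_branch));

    // Timing an arbitrary workload with the harness.
    let (sum, _elapsed) = time_it("sum", || (0u64..1_000_000).sum::<u64>());
    println!("sum = {sum}");
}
```

With the two measurements above, the computed speedup is about 5.46x.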

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jul 16, 2023
@alamb alamb changed the title [DataFrame] Read files in parallel [DataFrame] Read files in parallel (4x faster) Jul 16, 2023
@alamb alamb force-pushed the alamb/parallel_read branch from e9d6a0e to 0dbdd94 July 16, 2023 13:14
@alamb alamb marked this pull request as ready for review July 16, 2023 13:14
@alamb alamb marked this pull request as draft July 16, 2023 13:32
@alamb
Contributor Author

alamb commented Jul 17, 2023

I think a better approach is described in a comment on #6983.

@alamb alamb closed this Jul 17, 2023
Linked issue: [DataFrame] Parallel Load into dataframe
