Support for LIMIT clause with DataFusion #529
Merged: ayushdg merged 69 commits into dask-contrib:datafusion-sql-planner from jdye64:datafusion-limit on May 24, 2022.
Commits (69):
- b1900cf Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral (jdye64)
- 1e48597 Updates for test_filter (jdye64)
- fd41a8c more of test_filter.py working with the exception of some date pytests (jdye64)
- 682c009 Add workflow to keep datafusion dev branch up to date (#440) (charlesbluca)
- ab69dd8 Include setuptools-rust in conda build recipie, in host and run (jdye64)
- ce4c31e Remove PyArrow dependency (jdye64)
- 8785b8c rebase with datafusion-sql-planner (jdye64)
- 3e45ab8 refactor changes that were inadvertent during rebase (jdye64)
- 1734b89 timestamp with loglca time zone (jdye64)
- ac7d9f6 Bump DataFusion version (#494) (andygrove)
- cbf5db0 Include RelDataType work (jdye64)
- d9380a6 Include RelDataType work (jdye64)
- ad56fc2 Introduced SqlTypeName Enum in Rust and mappings for Python (jdye64)
- 7b20e66 impl PyExpr.getIndex() (jdye64)
- 7dd2017 add getRowType() for logical.rs (jdye64)
- 984f523 Introduce DaskTypeMap for storing correlating SqlTypeName and DataTypes (jdye64)
- 1405fea use str values instead of Rust Enums, Python is unable to Hash the Ru… (jdye64)
- 789aaad linter changes, why did that work on my local pre-commit?? (jdye64)
- 652205e linter changes, why did that work on my local pre-commit?? (jdye64)
- 5127f87 Convert final strs to SqlTypeName Enum (jdye64)
- cf568dc removed a few print statements (jdye64)
- 4fb640e commit to share with colleague (jdye64)
- 32127e5 updates (jdye64)
- f5e24fe checkpoint (jdye64)
- 11cf212 Temporarily disable conda run_test.py script since it uses features n… (jdye64)
- 46dfb0a formatting after upstream merge (jdye64)
- fa71674 expose fromString method for SqlTypeName to use Enums instead of stri… (jdye64)
- f6e86ca expanded SqlTypeName from_string() support (jdye64)
- 3d1a5ad accept INT as INTEGER (jdye64)
- 384e446 tests update (jdye64)
- 199b9d2 checkpoint (jdye64)
- c9dffae checkpoint (jdye64)
- c9aad43 Refactor PyExpr by removing From trait, and using recursion to expand… (jdye64)
- 11100fa skip test that uses create statement for gpuci (jdye64)
- 643e85d Basic DataFusion Select Functionality (#489) (jdye64)
- b36ef16 updates for expression (jdye64)
- 5c94fbc uncommented pytests (jdye64)
- bb461c8 uncommented pytests (jdye64)
- f65b1ab code cleanup for review (jdye64)
- dc7553f code cleanup for review (jdye64)
- f1dc0b2 Enabled more pytest that work now (jdye64)
- 940e867 Enabled more pytest that work now (jdye64)
- 6769ca0 Output Expression as String when BinaryExpr does not contain a named … (jdye64)
- c4ed9bd Output Expression as String when BinaryExpr does not contain a named … (jdye64)
- 05c5788 Disable 2 pytest that are causing gpuCI issues. They will be address … (jdye64)
- a33aa63 Handle Between operation for case-when (jdye64)
- 20efd5c adjust timestamp casting (jdye64)
- 281baf7 merge with upstream (jdye64)
- d666bdd merge with upstream/datafusion-sql-planner (jdye64)
- 533f50a Refactor projection _column_name() logic to the _column_name logic in… (jdye64)
- a42a133 removed println! statements (jdye64)
- 10cd463 merge with upstream (jdye64)
- a1841c3 Updates from review (jdye64)
- 3001943 Add Offset and point to repo with offset in datafusion (jdye64)
- 7ec66da Introduce offset (jdye64)
- b72917b limit updates (jdye64)
- 651c9ab commit before upstream merge (jdye64)
- 4e69813 merged with upstream/datafusion-sql-planner (jdye64)
- 3219ad0 Code formatting (jdye64)
- 5a88155 Merge with upstream (jdye64)
- bd94ccf Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d… (jdye64)
- bf91e8f update Cargo.toml to use Arrow-DataFusion version with LIMIT logic (jdye64)
- 3dc6a89 Bump DataFusion version to get changes around variant_name() (jdye64)
- 08b38aa Use map partitions for determining the offset (jdye64)
- 7b52f41 Merge with upstream datafusion-crossjoin merge (jdye64)
- e129068 Refactor offset partition func (charlesbluca)
- 5e0de03 Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d… (jdye64)
- 2d11de5 Update to use TryFrom logic (jdye64)
- c993377 Add cloudpickle to independent scheduler requirements (charlesbluca)
New file (+35 lines):

```rust
use crate::expression::PyExpr;
use crate::sql::exceptions::py_type_err;

use datafusion::scalar::ScalarValue;
use pyo3::prelude::*;

use datafusion::logical_expr::{logical_plan::Limit, Expr, LogicalPlan};

#[pyclass(name = "Limit", module = "dask_planner", subclass)]
#[derive(Clone)]
pub struct PyLimit {
    limit: Limit,
}

#[pymethods]
impl PyLimit {
    #[pyo3(name = "getLimitN")]
    pub fn limit_n(&self) -> PyResult<PyExpr> {
        Ok(PyExpr::from(
            Expr::Literal(ScalarValue::UInt64(Some(self.limit.n.try_into().unwrap()))),
            Some(self.limit.input.clone()),
        ))
    }
}

impl TryFrom<LogicalPlan> for PyLimit {
    type Error = PyErr;

    fn try_from(logical_plan: LogicalPlan) -> Result<Self, Self::Error> {
        match logical_plan {
            LogicalPlan::Limit(limit) => Ok(PyLimit { limit }),
            _ => Err(py_type_err("unexpected plan")),
        }
    }
}
```
New file (+44 lines):

```rust
use crate::expression::PyExpr;
use crate::sql::exceptions::py_type_err;

use datafusion::scalar::ScalarValue;
use pyo3::prelude::*;

use datafusion::logical_expr::{logical_plan::Offset, Expr, LogicalPlan};

#[pyclass(name = "Offset", module = "dask_planner", subclass)]
#[derive(Clone)]
pub struct PyOffset {
    offset: Offset,
}

#[pymethods]
impl PyOffset {
    #[pyo3(name = "getOffset")]
    pub fn offset(&self) -> PyResult<PyExpr> {
        Ok(PyExpr::from(
            Expr::Literal(ScalarValue::UInt64(Some(self.offset.offset as u64))),
            Some(self.offset.input.clone()),
        ))
    }

    #[pyo3(name = "getFetch")]
    pub fn offset_fetch(&self) -> PyResult<PyExpr> {
        // TODO: still need to implement fetch size! For now return '0',
        // meaning fetch everything from the offset onward.
        Ok(PyExpr::from(
            Expr::Literal(ScalarValue::UInt64(Some(0))),
            Some(self.offset.input.clone()),
        ))
    }
}

impl TryFrom<LogicalPlan> for PyOffset {
    type Error = PyErr;

    fn try_from(logical_plan: LogicalPlan) -> Result<Self, Self::Error> {
        match logical_plan {
            LogicalPlan::Offset(offset) => Ok(PyOffset { offset }),
            _ => Err(py_type_err("unexpected plan")),
        }
    }
}
```
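As a plain-Python sketch of what the Python side can do with these plan values, the following illustrates OFFSET/LIMIT semantics over a partitioned table: first size each partition (mirroring the "Use map partitions for determining the offset" commit above), then slice. This is an illustration only, not the PR's actual code; the function name and list-of-lists representation are assumptions.

```python
# Illustrative only: partitions are modeled as plain lists of rows.
def apply_offset_limit(partitions, offset, limit=None):
    """Slice a list of per-partition row lists as if they were one table."""
    out, skipped, taken = [], 0, 0
    for part in partitions:
        start = max(0, offset - skipped)  # rows of this partition to skip
        skipped += len(part)
        rows = part[start:]
        if limit is not None:
            rows = rows[: limit - taken]  # take only what the LIMIT still allows
        taken += len(rows)
        out.append(rows)
        if limit is not None and taken >= limit:
            break  # later partitions need not be touched
    return [row for part in out for row in part]

print(apply_offset_limit([[1, 2, 3], [4, 5], [6, 7, 8]], offset=2, limit=4))
# [3, 4, 5, 6]
```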
Review comment:
My personal opinion here is to still check whether the first partition has enough elements and, if not, call `head` with `npartitions=-1`. @charlesbluca do you think this is worth a broader issue/discussion to see how this can be optimized? In cases like this one, `head` with `npartitions=-1` is going to read every single partition before returning 100 rows, which isn't ideal.
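The optimization suggested above can be sketched in plain Python (illustrative only; the real code would operate on Dask DataFrame partitions, and the function name here is an assumption): try the first partition alone, and only fall back to a full scan, the equivalent of `head(n, npartitions=-1)`, when it is too small.

```python
# Illustrative only: partitions are modeled as plain lists of rows.
def take_n(partitions, n):
    first = partitions[0]
    if len(first) >= n:
        return first[:n]  # fast path: first partition alone satisfies LIMIT n
    # Slow path: equivalent of head(n, npartitions=-1), touching all partitions.
    out = []
    for part in partitions:
        out.extend(part[: n - len(out)])
        if len(out) >= n:
            break
    return out
```

The fast path keeps the common case (a small LIMIT against a table whose first partition is large) from reading the whole dataset.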