-
Notifications
You must be signed in to change notification settings - Fork 72
Datafusion invalid projection #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
charlesbluca
merged 112 commits into
dask-contrib:datafusion-sql-planner
from
jdye64:datafusion-invalid-projection
Jun 14, 2022
Merged
Changes from 111 commits
Commits
Show all changes
112 commits
Select commit
Hold shift + click to select a range
b1900cf
Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral
jdye64 1e48597
Updates for test_filter
jdye64 fd41a8c
more of test_filter.py working with the exception of some date pytests
jdye64 682c009
Add workflow to keep datafusion dev branch up to date (#440)
charlesbluca ab69dd8
Include setuptools-rust in conda build recipie, in host and run
jdye64 ce4c31e
Remove PyArrow dependency
jdye64 8785b8c
rebase with datafusion-sql-planner
jdye64 3e45ab8
refactor changes that were inadvertent during rebase
jdye64 1734b89
timestamp with loglca time zone
jdye64 ac7d9f6
Bump DataFusion version (#494)
andygrove cbf5db0
Include RelDataType work
jdye64 d9380a6
Include RelDataType work
jdye64 ad56fc2
Introduced SqlTypeName Enum in Rust and mappings for Python
jdye64 7b20e66
impl PyExpr.getIndex()
jdye64 7dd2017
add getRowType() for logical.rs
jdye64 984f523
Introduce DaskTypeMap for storing correlating SqlTypeName and DataTypes
jdye64 1405fea
use str values instead of Rust Enums, Python is unable to Hash the Ru…
jdye64 789aaad
linter changes, why did that work on my local pre-commit??
jdye64 652205e
linter changes, why did that work on my local pre-commit??
jdye64 5127f87
Convert final strs to SqlTypeName Enum
jdye64 cf568dc
removed a few print statements
jdye64 4fb640e
commit to share with colleague
jdye64 32127e5
updates
jdye64 f5e24fe
checkpoint
jdye64 11cf212
Temporarily disable conda run_test.py script since it uses features n…
jdye64 46dfb0a
formatting after upstream merge
jdye64 fa71674
expose fromString method for SqlTypeName to use Enums instead of stri…
jdye64 f6e86ca
expanded SqlTypeName from_string() support
jdye64 3d1a5ad
accept INT as INTEGER
jdye64 384e446
tests update
jdye64 199b9d2
checkpoint
jdye64 c9dffae
checkpoint
jdye64 c9aad43
Refactor PyExpr by removing From trait, and using recursion to expand…
jdye64 11100fa
skip test that uses create statement for gpuci
jdye64 643e85d
Basic DataFusion Select Functionality (#489)
jdye64 b36ef16
updates for expression
jdye64 5c94fbc
uncommented pytests
jdye64 bb461c8
uncommented pytests
jdye64 f65b1ab
code cleanup for review
jdye64 dc7553f
code cleanup for review
jdye64 f1dc0b2
Enabled more pytest that work now
jdye64 940e867
Enabled more pytest that work now
jdye64 6769ca0
Output Expression as String when BinaryExpr does not contain a named …
jdye64 c4ed9bd
Output Expression as String when BinaryExpr does not contain a named …
jdye64 05c5788
Disable 2 pytest that are causing gpuCI issues. They will be address …
jdye64 a33aa63
Handle Between operation for case-when
jdye64 20efd5c
adjust timestamp casting
jdye64 281baf7
merge with upstream
jdye64 d666bdd
merge with upstream/datafusion-sql-planner
jdye64 533f50a
Refactor projection _column_name() logic to the _column_name logic in…
jdye64 a42a133
removed println! statements
jdye64 dc12f5d
introduce join getCondition() logic for retrieving the combining Rex …
jdye64 9dce68a
merge with upstream
jdye64 10cd463
merge with upstream
jdye64 a1841c3
Updates from review
jdye64 3001943
Add Offset and point to repo with offset in datafusion
jdye64 7ec66da
Introduce offset
jdye64 b72917b
limit updates
jdye64 651c9ab
commit before upstream merge
jdye64 4e69813
merged with upstream/datafusion-sql-planner
jdye64 3219ad0
Code formatting
jdye64 5a88155
Merge with upstream
jdye64 23adefa
Merge with upstream
jdye64 bd94ccf
Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d…
jdye64 bf91e8f
update Cargo.toml to use Arrow-DataFusion version with LIMIT logic
jdye64 3dc6a89
Bump DataFusion version to get changes around variant_name()
jdye64 08b38aa
Use map partitions for determining the offset
jdye64 7b52f41
Merge with upstream datafusion-crossjoin merge
jdye64 6638930
Added multiple LogicalPlan inputs for join conditions
jdye64 e24b97f
Merge with upstream LIMIT PR
jdye64 61bd864
Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d…
jdye64 e3b0d2b
Merge with upstream
jdye64 0407c6f
Rename underlying DataContainer's DataFrame instance to match the col…
jdye64 af1c138
Adjust ColumnContainer mapping after join.py logic to entire the bake…
jdye64 8853765
Add enumerate to column_{i} generation string to ensure columns exist…
jdye64 2adc5ce
Adjust join schema logic to perform merge instead of join on rust sid…
jdye64 6005018
Handle DataFusion COUNT(UInt8(1)) as COUNT(*)
jdye64 f640e1d
commit before merge
jdye64 f0cc07b
merge with upstream datafusion-sql-planner
jdye64 3159645
Update function for gathering index of a expression
jdye64 ba8cec2
Update for review check
jdye64 a8fba46
Adjust RelDataType to retrieve fully qualified column names
jdye64 8a1a865
Adjust base.py to get fully qualified column name
jdye64 6e966b6
Enable passing pytests in test_join.py
jdye64 b9604cc
Adjust keys provided by getting backend column mapping name
jdye64 014fe68
Adjust output_col to not use the backend_column name for special rese…
jdye64 5b0dba3
uncomment cross join pytest which works now
jdye64 d17d859
Uncomment passing pytests in test_select.py
jdye64 805ec8a
Review updates
jdye64 7728bd4
Add back complex join case condition, not just cross join but 'comple…
jdye64 6f8d0d9
Enable DataFusion CBO logic
jdye64 dad9eb4
Disable EliminateFilter optimization rule
jdye64 adc0083
updates
jdye64 4101d27
upstream merge
jdye64 be7d502
Disable tests that hit CBO generated plan edge cases of yet to be imp…
jdye64 a006def
[REVIEW] - Modifiy sql.skip_optimize to use dask_config.get and remov…
jdye64 6ba6edb
[REVIEW] - change name of configuration from skip_optimize to optimize
jdye64 984c5bb
[REVIEW] - Add OptimizeException catch and raise statements back
jdye64 e59cd1e
Found issue where backend column names which are results of a single …
jdye64 4edb4b5
Remove SQL from OptimizationException
jdye64 06e76ed
Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d…
jdye64 15115ab
Upstream merge and removed unused code imports
jdye64 da37517
skip tests that CBO plan reorganization causes missing features to be…
jdye64 a3633b2
If TableScan contains projections use those instead of all of the Tab…
jdye64 9d7166b
Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d…
jdye64 c91df1e
[REVIEW] remove compute(), remove temp row_type variable
jdye64 4bb73a1
[REVIEW] - Add test for projection pushdown
jdye64 7494f87
[REVIEW] - Add some more parametrized test combinations
jdye64 3d967a4
[REVIEW] - Use iterator instead of for loop and simplify contains_pro…
jdye64 4bbb9ed
Merge remote-tracking branch 'upstream/datafusion-sql-planner' into d…
jdye64 18f490a
[REVIEW] - merge upstream and adjust imports
jdye64 d828f19
[REVIEW] - Rename pytest function and remove duplicate table creation
jdye64 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| use crate::sql::exceptions::py_type_err; | ||
| use crate::sql::logical; | ||
| use datafusion_expr::logical_plan::TableScan; | ||
| use pyo3::prelude::*; | ||
|
|
||
| #[pyclass(name = "TableScan", module = "dask_planner", subclass)] | ||
| #[derive(Clone)] | ||
| pub struct PyTableScan { | ||
| pub(crate) table_scan: TableScan, | ||
| } | ||
|
|
||
| #[pymethods] | ||
| impl PyTableScan { | ||
| #[pyo3(name = "getTableScanProjects")] | ||
| fn scan_projects(&mut self) -> PyResult<Vec<String>> { | ||
| match &self.table_scan.projection { | ||
| Some(indices) => { | ||
| let schema = self.table_scan.source.schema(); | ||
| Ok(indices | ||
| .iter() | ||
| .map(|i| schema.field(*i).name().to_string()) | ||
| .collect()) | ||
| } | ||
| None => Ok(vec![]), | ||
| } | ||
| } | ||
|
|
||
| /// If the 'TableScan' contains columns that should be projected during the | ||
| /// read return True, otherwise return False | ||
| #[pyo3(name = "containsProjections")] | ||
| fn contains_projections(&self) -> bool { | ||
| self.table_scan.projection.is_some() | ||
| } | ||
| } | ||
|
|
||
| impl TryFrom<logical::LogicalPlan> for PyTableScan { | ||
| type Error = PyErr; | ||
|
|
||
| fn try_from(logical_plan: logical::LogicalPlan) -> Result<Self, Self::Error> { | ||
| match logical_plan { | ||
| logical::LogicalPlan::TableScan(table_scan) => Ok(PyTableScan { table_scan }), | ||
| _ => Err(py_type_err("unexpected plan")), | ||
| } | ||
| } | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -33,6 +33,9 @@ def convert( | |||||
| # There should not be any input. This is the first step. | ||||||
| self.assert_inputs(rel, 0) | ||||||
|
|
||||||
| # Rust table_scan instance handle | ||||||
| table_scan = rel.table_scan() | ||||||
|
|
||||||
| # The table(s) we need to return | ||||||
| table = rel.getTable() | ||||||
|
|
||||||
|
|
@@ -48,11 +51,15 @@ def convert( | |||||
| df = dc.df | ||||||
| cc = dc.column_container | ||||||
|
|
||||||
| # Make sure we only return the requested columns | ||||||
| row_type = table.getRowType() | ||||||
| field_specifications = [str(f) for f in row_type.getFieldNames()] | ||||||
| cc = cc.limit_to(field_specifications) | ||||||
| # If the 'TableScan' instance contains projected columns only retrieve those columns | ||||||
| # otherwise get all projected columns from the 'Projection' instance, which is contained | ||||||
| # in the 'RelDataType' instance, aka 'row_type' | ||||||
| if table_scan.containsProjections(): | ||||||
jdye64 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| field_specifications = table_scan.getTableScanProjects() | ||||||
| else: | ||||||
| field_specifications = [str(f) for f in table.getRowType().getFieldNames()] | ||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Do we have to loop over and cast field names with |
||||||
|
|
||||||
| cc = cc.limit_to(field_specifications) | ||||||
| cc = self.fix_column_to_row_type(cc, rel.getRowType()) | ||||||
| dc = DataContainer(df, cc) | ||||||
| dc = self.fix_dtype_to_row_type(dc, rel.getRowType()) | ||||||
|
|
||||||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.