Commit 230d726
Datafusion invalid projection (#571)
* Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral
* Updates for test_filter
* more of test_filter.py working with the exception of some date pytests
* Add workflow to keep datafusion dev branch up to date (#440)
* Include setuptools-rust in conda build recipe, in host and run
* Remove PyArrow dependency
* rebase with datafusion-sql-planner
* refactor changes that were inadvertent during rebase
* timestamp with local time zone
* Bump DataFusion version (#494)
* bump DataFusion version
* remove unnecessary downcasts and use separate structs for TableSource and TableProvider
* Include RelDataType work
* Include RelDataType work
* Introduced SqlTypeName Enum in Rust and mappings for Python
* impl PyExpr.getIndex()
* add getRowType() for logical.rs
* Introduce DaskTypeMap for storing correlating SqlTypeName and DataTypes
* use str values instead of Rust Enums, Python is unable to Hash the Rust Enums if used in a dict
* linter changes, why did that work on my local pre-commit??
* linter changes, why did that work on my local pre-commit??
* Convert final strs to SqlTypeName Enum
* removed a few print statements
* commit to share with colleague
* updates
* checkpoint
* Temporarily disable conda run_test.py script since it uses features not yet implemented
* formatting after upstream merge
* expose fromString method for SqlTypeName to use Enums instead of strings for type checking
* expanded SqlTypeName from_string() support
* accept INT as INTEGER
* tests update
* checkpoint
* checkpoint
* Refactor PyExpr by removing From trait, and using recursion to expand expression list for rex calls
* skip test that uses create statement for gpuci
* Basic DataFusion Select Functionality (#489)
* Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral
* Updates for test_filter
* more of test_filter.py working with the exception of some date pytests
* Add workflow to keep datafusion dev branch up to date (#440)
* Include setuptools-rust in conda build recipe, in host and run
* Remove PyArrow dependency
* rebase with datafusion-sql-planner
* refactor changes that were inadvertent during rebase
* timestamp with local time zone
* Include RelDataType work
* Include RelDataType work
* Introduced SqlTypeName Enum in Rust and mappings for Python
* impl PyExpr.getIndex()
* add getRowType() for logical.rs
* Introduce DaskTypeMap for storing correlating SqlTypeName and DataTypes
* use str values instead of Rust Enums, Python is unable to Hash the Rust Enums if used in a dict
* linter changes, why did that work on my local pre-commit??
* linter changes, why did that work on my local pre-commit??
* Convert final strs to SqlTypeName Enum
* removed a few print statements
* Temporarily disable conda run_test.py script since it uses features not yet implemented
* expose fromString method for SqlTypeName to use Enums instead of strings for type checking
* expanded SqlTypeName from_string() support
* accept INT as INTEGER
* Remove print statements
* Default to UTC if tz is None
* Delegate timezone handling to the arrow library
* Updates from review
Co-authored-by: Charles Blackmon-Luca <[email protected]>
* updates for expression
* uncommented pytests
* uncommented pytests
* code cleanup for review
* code cleanup for review
* Enabled more pytest that work now
* Enabled more pytest that work now
* Output Expression as String when BinaryExpr does not contain a named alias
* Output Expression as String when BinaryExpr does not contain a named alias
* Disable 2 pytests that are causing gpuCI issues. They will be addressed in a follow-up PR
* Handle Between operation for case-when
* adjust timestamp casting
* Refactor projection _column_name() logic to the _column_name logic in expression.rs
* removed println! statements
* introduce join getCondition() logic for retrieving the combining Rex logic for joining
* Updates from review
* Add Offset and point to repo with offset in datafusion
* Introduce offset
* limit updates
* commit before upstream merge
* Code formatting
* update Cargo.toml to use Arrow-DataFusion version with LIMIT logic
* Bump DataFusion version to get changes around variant_name()
* Use map partitions for determining the offset
* Merge with upstream
* Rename underlying DataContainer's DataFrame instance to match the column container names
* Adjust ColumnContainer mapping after join.py logic to ensure the backend mapping is reset
* Add enumerate to column_{i} generation string to ensure columns exist in both dataframes
* Adjust join schema logic to perform merge instead of join on rust side to avoid name collisions
* Handle DataFusion COUNT(UInt8(1)) as COUNT(*)
* commit before merge
* Update function for gathering index of an expression
* Update for review check
* Adjust RelDataType to retrieve fully qualified column names
* Adjust base.py to get fully qualified column name
* Enable passing pytests in test_join.py
* Adjust keys provided by getting backend column mapping name
* Adjust output_col to not use the backend_column name for special reserved exprs
* uncomment cross join pytest which works now
* Uncomment passing pytests in test_select.py
* Review updates
* Add back complex join case condition, not just cross join but 'complex' joins
* Enable DataFusion CBO logic
* Disable EliminateFilter optimization rule
* updates
* Disable tests that hit CBO generated plan edge cases of yet to be implemented logic
* [REVIEW] - Modify sql.skip_optimize to use dask_config.get and remove used method parameter
* [REVIEW] - change name of configuration from skip_optimize to optimize
* [REVIEW] - Add OptimizeException catch and raise statements back
* Found issue where a backend column name that results from a single aggregate column, COUNT(*) for example, is not valid, so the first agg df column must be used instead
* Remove SQL from OptimizationException
* skip tests where CBO plan reorganization exposes missing features
* If TableScan contains projections, use those instead of all of the TableColumns to limit the columns read during table_scan
* [REVIEW] remove compute(), remove temp row_type variable
* [REVIEW] - Add test for projection pushdown
* [REVIEW] - Add some more parametrized test combinations
* [REVIEW] - Use iterator instead of for loop and simplify contains_projections
* [REVIEW] - merge upstream and adjust imports
* [REVIEW] - Rename pytest function and remove duplicate table creation
Co-authored-by: Charles Blackmon-Luca <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
1 parent a52dd7b commit 230d726
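The projection pushdown described in the message above (use the TableScan's projections, when present, instead of every table column) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: a dict of lists stands in for a Dask DataFrame, and `table_scan` and its `projection` parameter are hypothetical names. Only the name `contains_projections` appears in the commit message, and its body here is a guess.

```python
from typing import Dict, List, Optional, Sequence

# Stand-in for a DataFrame: column name -> column values (assumption).
Table = Dict[str, List[object]]


def contains_projections(projection: Optional[Sequence[str]]) -> bool:
    """Return True when the scan carries a non-empty projection list."""
    return bool(projection)


def table_scan(table: Table, projection: Optional[Sequence[str]] = None) -> Table:
    """Read only the projected columns, or every column if none are projected."""
    if not contains_projections(projection):
        # No projection was pushed down: fall back to all table columns.
        return dict(table)

    missing = [name for name in projection if name not in table]
    if missing:
        raise KeyError(f"projected columns not in table: {missing}")

    # Keep the order requested by the projection, not the table's own order.
    return {name: table[name] for name in projection}


if __name__ == "__main__":
    df = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
    print(list(table_scan(df, ["c", "a"])))
    print(list(table_scan(df)))
```

Limiting the columns at scan time means later plan nodes never see the unused columns, which is presumably why the commit also adds a projection pushdown test.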
File tree
5 files changed, +93 -7 lines changed
- dask_planner/src/sql
  - logical
  - types
- dask_sql/physical/rel/logical
- tests/integration