Merged
Changes from all commits · 94 commits
23603de
Add basic predicate-pushdown optimization (#433)
rjzamora Mar 25, 2022
09c7bdf
Add workflow to keep datafusion dev branch up to date (#440)
charlesbluca Mar 25, 2022
9038b85
Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral
jdye64 Mar 26, 2022
2d16579
Updates for test_filter
jdye64 Mar 31, 2022
a4aeee5
more of test_filter.py working with the exception of some date pytests
jdye64 Mar 31, 2022
f6f8061
Updates to dates and parsing dates like postgresql does
jdye64 Mar 31, 2022
1b0b6f7
Update gpuCI `RAPIDS_VER` to `22.06` (#434)
github-actions[bot] Apr 1, 2022
a05138d
Bump black to 22.3.0 (#443)
charlesbluca Apr 4, 2022
ab2aa5a
Check for ucx-py nightlies when updating gpuCI (#441)
charlesbluca Apr 5, 2022
8eb70bb
Refactored to adjust for better type management
jdye64 Apr 6, 2022
d7d86c7
Refactor schema and statements
jdye64 Apr 6, 2022
812e85e
update types
jdye64 Apr 6, 2022
e04365b
fix syntax issues and renamed function name calls
jdye64 Apr 6, 2022
a28f757
Add handling for newer `prompt_toolkit` versions in cmd tests (#447)
charlesbluca Apr 6, 2022
486fc66
Fix version for gha-find-replace (#446)
charlesbluca Apr 6, 2022
f0e1cbb
Improved error handling and code clean up
jdye64 Apr 6, 2022
36909dd
move pieces of logical.rs to separate files to ensure code readability
jdye64 Apr 7, 2022
d1ea26a
left join working
jdye64 Apr 7, 2022
ce176e0
Update versions of Java dependencies (#445)
ayushdg Apr 7, 2022
50d95d2
Update jackson databind version (#449)
ayushdg Apr 7, 2022
37a3a61
Disable SQL server functionality (#448)
charlesbluca Apr 7, 2022
ffdc42f
Update dask pinnings for release (#450)
charlesbluca Apr 7, 2022
fa74aef
Add Java source code to source distribution (#451)
charlesbluca Apr 7, 2022
37ea6b6
Bump `httpclient` dependency (#453)
charlesbluca Apr 8, 2022
f19ee4d
Unpin Dask/distributed versions (#452)
charlesbluca Apr 11, 2022
1eb30c1
Add jsonschema to ci testing (#454)
ayushdg Apr 11, 2022
2bd1d18
Switch tests from `pd.testing.assert_frame_equal` to `dd.assert_eq` (…
charlesbluca Apr 11, 2022
263fdba
First basic working checkpoint for group by
jdye64 Apr 12, 2022
95b0dd0
Set max pin on antlr4-python-runtime (#456)
ayushdg Apr 12, 2022
1077da2
Updates to style
jdye64 Apr 12, 2022
653f6a8
stage pre-commit changes for upstream merge
jdye64 Apr 12, 2022
f53d24d
Merge with upstream/main
jdye64 Apr 12, 2022
78d59f0
Fix black failures
charlesbluca Apr 12, 2022
84f6b1d
Updates to Rust formatting
jdye64 Apr 12, 2022
ac6cf3a
Merge remote-tracking branch 'origin/datafusion-aggregate' into dataf…
jdye64 Apr 12, 2022
1ac78a5
Fix rust lint and clippy
jdye64 Apr 12, 2022
e38d9d2
Remove jar building step which is no longer needed
jdye64 Apr 12, 2022
c3da3e5
Remove Java from github workflows matrix
jdye64 Apr 12, 2022
d8450b6
Removes jar and Java references from test.yml
jdye64 Apr 12, 2022
a297185
Update Release workflow to remove references to Java
jdye64 Apr 12, 2022
54ddf39
Update rust.yml to remove references from linux-build-lib
jdye64 Apr 12, 2022
f3c9a5b
Add pre-commit.sh file to provide pre-commit support for Rust in a co…
jdye64 Apr 12, 2022
9bce14c
Removed overlooked jdk references
jdye64 Apr 12, 2022
c975304
cargo clippy auto fixes
jdye64 Apr 12, 2022
4ca5963
Address all Rust clippy warnings
jdye64 Apr 13, 2022
866f815
Include setuptools-rust in conda build recipe
jdye64 Apr 13, 2022
10f4550
Include setuptools-rust in conda build recipe, in host and run
jdye64 Apr 13, 2022
f4cf13d
Adjustments for conda build, committing for others to help with error…
jdye64 Apr 13, 2022
3e4dcbd
Include sql.yaml in package files
jdye64 Apr 13, 2022
f50d4eb
Include pyarrow in run section of conda build to ensure tests pass
jdye64 Apr 13, 2022
4483c5e
include setuptools-rust in host and run of conda since removing cause…
jdye64 Apr 13, 2022
9936c92
to_string() method had been removed in rust and not removed here, cau…
jdye64 Apr 13, 2022
17762cb
Replace commented out tests with pytest.skip and bump version of pyar…
jdye64 Apr 13, 2022
8f0ba93
Fix setup.py syntax issue introduced on last commit by find/replace
jdye64 Apr 13, 2022
7691bb2
Rename Datafusion -> DataFusion and Apache DataFusion -> Arrow DataFu…
jdye64 Apr 13, 2022
e7690ba
Fix docs build environment
jdye64 Apr 13, 2022
c3b905e
Include Rust compiler in docs environment
jdye64 Apr 13, 2022
9436f7d
Bump Rust compiler version to 1.59
jdye64 Apr 13, 2022
a3b43c0
Ok, well readthedocs didn't like that
jdye64 Apr 13, 2022
1fe10f0
Store libdask_planner.so and retrieve it between github workflows
jdye64 Apr 13, 2022
f4ad591
Cache the Rust library binary
jdye64 Apr 13, 2022
bb4d2c3
Remove Cargo.lock from git
jdye64 Apr 13, 2022
4cc1450
Remove unused datafusion-expr crate
jdye64 Apr 13, 2022
2251663
Build datafusion at each test step instead of caching binaries
jdye64 Apr 14, 2022
d12004d
Remove maven and jar cache steps from test-upstream.yaml
jdye64 Apr 14, 2022
e81a0c6
Removed dangling 'build' workflow step reference
jdye64 Apr 14, 2022
9293359
Lowered PyArrow version to 6.0.1 since cudf has a hard requirement on…
jdye64 Apr 14, 2022
24d057f
Add Rust build step to test in dask cluster
jdye64 Apr 14, 2022
e71e476
Install setuptools-rust for pip to use for bare requirements import
jdye64 Apr 14, 2022
0728c61
Include pyarrow 6.0.1 via conda as a bare minimum dependency
jdye64 Apr 14, 2022
496ba8d
Remove cudf dependency for python 3.9 which is causing build issues o…
jdye64 Apr 14, 2022
ed330b3
Address documentation from review
jdye64 Apr 14, 2022
542fc21
Install Rust as readthedocs post_create_environment step
jdye64 Apr 14, 2022
fc4d08b
Run rust install non-interactively
jdye64 Apr 14, 2022
d9cca16
Run rust install non-interactively
jdye64 Apr 14, 2022
a6030b9
Rust isn't available in PyPi so remove that dependency
jdye64 Apr 14, 2022
c82c062
Append ~/.cargo/bin to the PATH
jdye64 Apr 14, 2022
a6c3de6
Print out some environment information for debugging
jdye64 Apr 14, 2022
8aad550
Print out some environment information for debugging
jdye64 Apr 14, 2022
04fb814
More - Increase verbosity
jdye64 Apr 14, 2022
d6bea9d
More - Increase verbosity
jdye64 Apr 14, 2022
e24b77f
More - Increase verbosity
jdye64 Apr 14, 2022
f2a1071
Switch RTD over to use Conda instead of Pip since having issues with …
jdye64 Apr 14, 2022
f00d498
Try to use mamba for building docs environment
jdye64 Apr 14, 2022
02f05b5
Partial review suggestion address, checking CI still works
jdye64 Apr 18, 2022
0e53eab
Skip mistakenly enabled tests
jdye64 Apr 18, 2022
991cc5a
Use DataFusion master branch, and fix syntax issues related to the ve…
jdye64 Apr 18, 2022
daa7ee0
More updates after bumping DataFusion version to master
jdye64 Apr 18, 2022
3c83833
Use actions-rs in github workflows debug flag for setup.py
jdye64 Apr 18, 2022
1f704dd
Remove setuptools-rust from conda
jdye64 Apr 20, 2022
58c452d
Use re-exported Rust types for BuiltinScalarFunction
jdye64 Apr 20, 2022
2a9e9a5
Move python imports to TYPE_CHECKING section where applicable
jdye64 Apr 20, 2022
92d357b
Address review concerns and remove pre-commit.sh file
jdye64 Apr 21, 2022
9fc4fc3
Pin to a specific github rev for DataFusion
jdye64 Apr 21, 2022
2 changes: 2 additions & 0 deletions .github/docker-compose.yaml
@@ -11,5 +11,7 @@ services:
container_name: dask-worker
image: daskdev/dask:latest
command: dask-worker dask-scheduler:8786
environment:
EXTRA_CONDA_PACKAGES: "pyarrow>=4.0.0" # required for parquet IO
volumes:
- /tmp:/tmp
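The docker-compose change above injects `EXTRA_CONDA_PACKAGES` into the worker container; the `daskdev/dask` image conda-installs anything listed in that variable before `dask-worker` starts. A minimal, side-effect-free sketch of that assumed startup behaviour (the `conda install` is echoed rather than executed, and the entrypoint details are an assumption, not a guarantee of the image's exact script):

```shell
# Assumed behaviour of the daskdev/dask image entrypoint: packages listed
# in EXTRA_CONDA_PACKAGES are conda-installed before the worker command
# runs. Echoed here instead of executed so the sketch has no side effects.
EXTRA_CONDA_PACKAGES="pyarrow>=4.0.0"
if [ -n "$EXTRA_CONDA_PACKAGES" ]; then
    echo "conda install -y $EXTRA_CONDA_PACKAGES"
fi
```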
7 changes: 1 addition & 6 deletions .github/workflows/release.yml
@@ -16,11 +16,6 @@ jobs:
if: github.repository == 'dask-contrib/dask-sql'
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk11-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
@@ -29,7 +24,7 @@
python-version: "3.8"
channel-priority: strict
activate-environment: dask-sql
environment-file: continuous_integration/environment-3.8-jdk11-dev.yaml
environment-file: continuous_integration/environment-3.8-dev.yaml
- name: Install dependencies
run: |
pip install setuptools wheel twine
54 changes: 2 additions & 52 deletions .github/workflows/test-upstream.yml
@@ -5,64 +5,23 @@ on:
workflow_dispatch: # allows you to trigger the workflow run manually

jobs:
build:
# This build step should be similar to the deploy build, to make sure we actually test
# the future deployable
name: Build the jar on ubuntu
runs-on: ubuntu-latest
if: github.repository == 'dask-contrib/dask-sql'
defaults:
run:
shell: bash -l {0}
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk11-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
use-mamba: true
python-version: "3.8"
channel-priority: strict
activate-environment: dask-sql
environment-file: continuous_integration/environment-3.8-jdk11-dev.yaml
- name: Install dependencies and build the jar
run: |
python setup.py build_ext
- name: Upload the jar
uses: actions/upload-artifact@v1
with:
name: jar
path: dask_sql/jar/DaskSQL.jar

test-dev:
name: "Test upstream dev (${{ matrix.os }}, java: ${{ matrix.java }}, python: ${{ matrix.python }})"
needs: build
name: "Test upstream dev (${{ matrix.os }}, python: ${{ matrix.python }})"
runs-on: ${{ matrix.os }}
env:
CONDA_FILE: continuous_integration/environment-${{ matrix.python }}-jdk${{ matrix.java }}-dev.yaml
CONDA_FILE: continuous_integration/environment-${{ matrix.python }}-dev.yaml
defaults:
run:
shell: bash -l {0}
strategy:
fail-fast: false
matrix:
java: [8, 11]
os: [ubuntu-latest, windows-latest]
python: ["3.8", "3.9", "3.10"]
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0 # Fetch all history for all branches and tags.
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk${{ matrix.java }}-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
@@ -72,21 +31,12 @@ jobs:
channel-priority: strict
activate-environment: dask-sql
environment-file: ${{ env.CONDA_FILE }}
- name: Download the pre-build jar
uses: actions/download-artifact@v1
with:
name: jar
path: dask_sql/jar/
- name: Install hive testing dependencies for Linux
if: matrix.os == 'ubuntu-latest'
run: |
mamba install -c conda-forge sasl>=0.3.1
docker pull bde2020/hive:2.3.2-postgresql-metastore
docker pull bde2020/hive-metastore-postgresql:2.3.0
- name: Set proper JAVA_HOME for Windows
if: matrix.os == 'windows-latest'
run: |
echo "JAVA_HOME=${{ env.CONDA }}\envs\dask-sql\Library" >> $GITHUB_ENV
- name: Install upstream dev Dask / dask-ml
run: |
python -m pip install --no-deps git+https://github.com/dask/dask
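After dropping the `jdk${{ matrix.java }}` component, both workflows derive the conda environment file purely from the matrix Python version via `${{ matrix.python }}` interpolation. GitHub Actions performs that substitution itself; a sketch of the equivalent shell expansion for one illustrative matrix entry:

```shell
# GitHub Actions substitutes ${{ matrix.python }} before the job runs;
# this is the shell equivalent for one (illustrative) matrix entry.
python_ver="3.8"
CONDA_FILE="continuous_integration/environment-${python_ver}-dev.yaml"
echo "$CONDA_FILE"   # -> continuous_integration/environment-3.8-dev.yaml
```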
100 changes: 28 additions & 72 deletions .github/workflows/test.yml
@@ -1,7 +1,7 @@
---
# Test the main branch and every pull request by
# 1. building the jar on ubuntu
# 2. testing code (using the build jar) on ubuntu and windows, with different java versions
# 1. build dask_planner (Arrow DataFusion Rust bindings) on ubuntu
# 2. testing code (using the build DataFusion bindings) on ubuntu and windows
name: Test Python package
on:
push:
@@ -36,55 +36,20 @@ jobs:
with:
keyword: "[test-upstream]"

build:
# This build step should be similar to the deploy build, to make sure we actually test
# the future deployable
name: Build the jar on ubuntu
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk11-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
use-mamba: true
python-version: "3.8"
channel-priority: strict
activate-environment: dask-sql
environment-file: continuous_integration/environment-3.8-jdk11-dev.yaml
- name: Build the jar
run: |
python setup.py build_ext
- name: Upload the jar
uses: actions/upload-artifact@v1
with:
name: jar
path: dask_sql/jar/DaskSQL.jar

test:
name: "Test (${{ matrix.os }}, java: ${{ matrix.java }}, python: ${{ matrix.python }})"
needs: [detect-ci-trigger, build]
name: "Build & Test (${{ matrix.os }}, python: ${{ matrix.python }}, Rust: ${{ matrix.toolchain }})"
needs: [detect-ci-trigger]
runs-on: ${{ matrix.os }}
env:
CONDA_FILE: continuous_integration/environment-${{ matrix.python }}-jdk${{ matrix.java }}-dev.yaml
CONDA_FILE: continuous_integration/environment-${{ matrix.python }}-dev.yaml
strategy:
fail-fast: false
matrix:
java: [8, 11]
os: [ubuntu-latest, windows-latest]
python: ["3.8", "3.9", "3.10"]
toolchain: [stable]
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk${{ matrix.java }}-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
@@ -94,21 +59,21 @@
channel-priority: strict
activate-environment: dask-sql
environment-file: ${{ env.CONDA_FILE }}
- name: Download the pre-build jar
uses: actions/download-artifact@v1
- name: Setup Rust Toolchain
uses: actions-rs/toolchain@v1
id: rust-toolchain
with:
name: jar
path: dask_sql/jar/
toolchain: stable
override: true
- name: Build the Rust DataFusion bindings
run: |
python setup.py build install
- name: Install hive testing dependencies for Linux
if: matrix.os == 'ubuntu-latest'
run: |
mamba install -c conda-forge sasl>=0.3.1
docker pull bde2020/hive:2.3.2-postgresql-metastore
docker pull bde2020/hive-metastore-postgresql:2.3.0
- name: Set proper JAVA_HOME for Windows
if: matrix.os == 'windows-latest'
run: |
echo "JAVA_HOME=${{ env.CONDA }}\envs\dask-sql\Library" >> $GITHUB_ENV
- name: Optionally install upstream dev Dask / dask-ml
if: needs.detect-ci-trigger.outputs.triggered == 'true'
run: |
@@ -133,15 +98,10 @@ jobs:

cluster:
name: "Test in a dask cluster"
needs: [detect-ci-trigger, build]
needs: [detect-ci-trigger]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk11-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
@@ -150,12 +110,16 @@
python-version: "3.8"
channel-priority: strict
activate-environment: dask-sql
environment-file: continuous_integration/environment-3.8-jdk11-dev.yaml
- name: Download the pre-build jar
uses: actions/download-artifact@v1
with:
name: jar
path: dask_sql/jar/
environment-file: continuous_integration/environment-3.8-dev.yaml
- name: Setup Rust Toolchain
uses: actions-rs/toolchain@v1
id: rust-toolchain
with:
toolchain: stable
override: true
- name: Build the Rust DataFusion bindings
run: |
python setup.py build install
- name: Install dependencies
run: |
mamba install python-blosc lz4 -c conda-forge
@@ -184,29 +148,21 @@

import:
name: "Test importing with bare requirements"
needs: [detect-ci-trigger, build]
needs: [detect-ci-trigger]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Cache local Maven repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-v1-jdk11-${{ hashFiles('**/pom.xml') }}
- name: Set up Python
uses: conda-incubator/setup-miniconda@v2
with:
python-version: "3.8"
mamba-version: "*"
channels: conda-forge,defaults
channel-priority: strict
- name: Download the pre-build jar
uses: actions/download-artifact@v1
with:
name: jar
path: dask_sql/jar/
- name: Install dependencies and nothing else
run: |
conda install setuptools-rust
conda install pyarrow>=4.0.0
pip install -e .

which python
59 changes: 45 additions & 14 deletions .github/workflows/update-gpuci.yml
@@ -13,39 +13,70 @@ jobs:
steps:
- uses: actions/checkout@v2

- name: Parse current axis YAML
uses: the-coding-turtle/[email protected]
with:
file: continuous_integration/gpuci/axis.yaml

- name: Get latest cuDF nightly version
id: latest_version
id: cudf_latest
uses: jacobtomlinson/[email protected]
with:
org: "rapidsai-nightly"
package: "cudf"
version_system: "CalVer"

- name: Strip git tags from versions
- name: Get latest cuML nightly version
id: cuml_latest
uses: jacobtomlinson/[email protected]
with:
org: "rapidsai-nightly"
package: "cuml"
version_system: "CalVer"

- name: Get latest UCX-Py nightly version
id: ucx_py_latest
uses: jacobtomlinson/[email protected]
with:
org: "rapidsai-nightly"
package: "ucx-py"
version_system: "CalVer"

- name: Get old RAPIDS / UCX-Py versions
env:
FULL_RAPIDS_VER: ${{ steps.latest_version.outputs.version }}
run: echo "RAPIDS_VER=${FULL_RAPIDS_VER::-10}" >> $GITHUB_ENV
FULL_CUDF_VER: ${{ steps.cudf_latest.outputs.version }}
FULL_CUML_VER: ${{ steps.cuml_latest.outputs.version }}
FULL_UCX_PY_VER: ${{ steps.ucx_py_latest.outputs.version }}
run: |
echo RAPIDS_VER=$RAPIDS_VER_0 >> $GITHUB_ENV
echo UCX_PY_VER=$(curl -sL https://version.gpuci.io/rapids/$RAPIDS_VER_0) >> $GITHUB_ENV
echo NEW_CUDF_VER=${FULL_CUDF_VER::-10} >> $GITHUB_ENV
echo NEW_CUML_VER=${FULL_CUML_VER::-10} >> $GITHUB_ENV
echo NEW_UCX_PY_VER=${FULL_UCX_PY_VER::-10} >> $GITHUB_ENV

- name: Find and Replace Release
uses: jacobtomlinson/gha-find-replace@0.1.4
- name: Update RAPIDS version
uses: jacobtomlinson/gha-find-replace@v2
with:
include: 'continuous_integration\/gpuci\/axis\.yaml'
find: "RAPIDS_VER:\n- .*"
replace: |-
RAPIDS_VER:
- "${{ env.RAPIDS_VER }}"
find: "${{ env.RAPIDS_VER }}"
replace: "${{ env.NEW_CUDF_VER }}"
regex: false

- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
# make sure ucx-py nightlies are available and that cuDF/cuML nightly versions match up
if: |
env.UCX_PY_VER != env.NEW_UCX_PY_VER &&
env.NEW_CUDF_VER == env.NEW_CUML_VER
with:
token: ${{ secrets.GITHUB_TOKEN }}
draft: true
commit-message: "Update gpuCI `RAPIDS_VER` to `${{ env.RAPIDS_VER }}`"
title: "Update gpuCI `RAPIDS_VER` to `${{ env.RAPIDS_VER }}`"
commit-message: "Update gpuCI `RAPIDS_VER` to `${{ env.NEW_CUDF_VER }}`"
title: "Update gpuCI `RAPIDS_VER` to `${{ env.NEW_CUDF_VER }}`"
team-reviewers: "dask/gpu"
author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
branch: "upgrade-gpuci-rapids"
body: |
A new cuDF nightly version has been detected.
New cuDF and ucx-py nightly versions have been detected.

Updated `axis.yaml` to use `${{ env.RAPIDS_VER }}`.
Updated `axis.yaml` to use `${{ env.NEW_CUDF_VER }}`.
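The renamed version step above strips the trailing git-tag suffix using bash's negative-length substring expansion, `${var::-10}` (bash 4.2+). A sketch with a hypothetical rapidsai-nightly CalVer string — the exact version format is an assumption for illustration:

```shell
# A rapidsai-nightly CalVer version looks roughly like "22.06.00a220413":
# YY.MM.patch plus an "aYYMMDD" nightly suffix. ${var::-10} drops the
# last 10 characters, leaving just the "YY.MM" used as RAPIDS_VER.
FULL_CUDF_VER="22.06.00a220413"          # hypothetical nightly version
NEW_CUDF_VER="${FULL_CUDF_VER::-10}"     # strips ".00a220413"
echo "$NEW_CUDF_VER"                     # -> 22.06
```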
1 change: 1 addition & 0 deletions .gitignore
@@ -60,3 +60,4 @@ dask_sql/jar
dask-worker-space/
node_modules/
docs/source/_build/
dask_planner/Cargo.lock