Commit b1dc02c
[SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions
This PR aims to test PySpark with Python 3.8 in Github Actions. On the script side, this is already supported: https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes several small related fixes:
1. Install Python 3.8.
2. Install only one Python implementation, instead of many, for the SQL and Yarn test cases; their tests only need a single Python executable newer than Python 2.
3. Do not install Python 2, which is no longer needed after Python 2 was dropped in SPARK-32138.
4. Remove a comment about installing PyPy3 on Jenkins (SPARK-32278); it is already installed.

Currently, only PyPy3 and Python 3.6 are tested with PySpark in Github Actions. We should also test the latest version of Python because some optimizations can only be enabled with Python 3.8+. See also apache#29114.

No user-facing changes; dev-only. Not tested locally; the Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
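The motivation above mentions optimizations that can only be enabled on Python 3.8+. A minimal sketch of the kind of interpreter-version guard involved (the function name here is hypothetical, not from the Spark codebase):

```python
import sys

def supports_py38_optimizations(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.8 or newer,
    i.e. when 3.8+-only optimizations could be enabled."""
    return tuple(version_info[:2]) >= (3, 8)
```

Testing the newest Python in CI ensures such guarded code paths are actually exercised, rather than silently skipped on older interpreters.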
1 parent c6bbdf6 commit b1dc02c

File tree

2 files changed (+18, -13 lines)

.github/workflows/master.yml

Lines changed: 17 additions & 12 deletions

@@ -110,37 +110,42 @@ jobs:
         java-version: ${{ matrix.java }}
     # PySpark
     - name: Install PyPy3
-      # SQL component also has Python related tests, for example, IntegratedUDFTestUtils.
       # Note that order of Python installations here matters because default python3 is
       # overridden by pypy3.
       uses: actions/setup-python@v2
-      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+      if: contains(matrix.modules, 'pyspark')
       with:
         python-version: pypy3
         architecture: x64
     - name: Install Python 2.7
       uses: actions/setup-python@v2
-      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+      if: contains(matrix.modules, 'pyspark')
       with:
         python-version: 2.7
         architecture: x64
-    - name: Install Python 3.6
+    - name: Install Python 3.8
       uses: actions/setup-python@v2
-      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+      # We should install one Python that is higher then 3+ for SQL and Yarn because:
+      # - SQL component also has Python related tests, for example, IntegratedUDFTestUtils.
+      # - Yarn has a Python specific test too, for example, YarnClusterSuite.
+      if: contains(matrix.modules, 'yarn') || contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       with:
-        python-version: 3.6
+        python-version: 3.8
         architecture: x64
-    - name: Install Python packages
-      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+    - name: Install Python packages (Python 2.7 and PyPy3)
+      if: contains(matrix.modules, 'pyspark')
       # PyArrow is not supported in PyPy yet, see ARROW-2651.
       # TODO(SPARK-32247): scipy installation with PyPy fails for an unknown reason.
       run: |
-        python3 -m pip install numpy pyarrow pandas scipy
-        python3 -m pip list
-        python2 -m pip install numpy pyarrow pandas scipy
-        python2 -m pip list
+        python2.7 -m pip install numpy pyarrow pandas scipy
+        python2.7 -m pip list
         pypy3 -m pip install numpy pandas
         pypy3 -m pip list
+    - name: Install Python packages (Python 3.8)
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+      run: |
+        python3.8 -m pip install numpy pyarrow pandas scipy
+        python3.8 -m pip list
     # SparkR
     - name: Install R 3.6
       uses: r-lib/actions/setup-r@v1
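The `if:` expressions in the workflow above decide which matrix entries get Python 3.8. A small Python mirror of that expression can make the logic easier to verify (the function name is illustrative; it assumes `modules` is the comma-joined `matrix.modules` string, so Python's `in` plays the role of the workflow's `contains()`):

```python
def needs_python38(modules: str) -> bool:
    # Mirrors the workflow condition:
    #   contains(matrix.modules, 'yarn')
    #   || contains(matrix.modules, 'pyspark')
    #   || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
    return (
        "yarn" in modules
        or "pyspark" in modules
        or ("sql" in modules and "sql-" not in modules)
    )
```

Note the `!contains(matrix.modules, 'sql-')` guard: it keeps narrower modules such as `sql-kafka-0-10` from matching the plain `sql` substring check.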

python/run-tests.py

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):


 def get_default_python_executables():
-    python_execs = [x for x in ["python2.7", "python3.6", "pypy", "pypy3"] if which(x)]
+    python_execs = [x for x in ["python2.7", "python3.8", "pypy", "pypy3"] if which(x)]
     if "python2.7" not in python_execs:
         LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
                        " back to `python` instead")
