[SPARK-27276][PYTHON][DOCS][FOLLOW-UP] Update documentation about Arrow version in PySpark as well #24504
cc @BryanCutler
I realised that other Arrow optimizations are likely to be placed elsewhere :).
Makes sense. The filename is sql-pyspark...
Test build #105055 has finished for PR 24504 at commit
viirya left a comment
This looks good.
There is another section:
Supported SQL Types
Currently, all Spark SQL data types are supported by Arrow-based conversion except MapType, ArrayType of TimestampType, and nested StructType. BinaryType is supported only when installed PyArrow is equal to or higher than 0.10.0.
Since the minimum supported version is now 0.12.1, isn't the last sentence redundant?
Yeah, thanks for checking it.
Test build #105058 has finished for PR 24504 at commit
Force-pushed from 11a50df to ffdb362.
Test build #105061 has finished for PR 24504 at commit
BryanCutler left a comment
LGTM
Merged to master, thanks @HyukjinKwon!
…ow version in PySpark as well

## What changes were proposed in this pull request?

Looks updating documentation from 0.8.0 to 0.12.1 was missed.

## How was this patch tested?

N/A

Closes apache#24504 from HyukjinKwon/SPARK-27276-followup.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
* [SPARK-27276][PYTHON][SQL] Increase minimum version of pyarrow to 0.12.1 and remove prior workarounds

This increases the minimum support version of pyarrow to 0.12.1 and removes workarounds in pyspark to remain compatible with prior versions. This means that users will need to have at least pyarrow 0.12.1 installed and available in the cluster or an `ImportError` will be raised to indicate an upgrade is needed.

Existing tests using:
Python 2.7.15, pyarrow 0.12.1, pandas 0.24.2
Python 3.6.7, pyarrow 0.12.1, pandas 0.24.0

Closes apache#24298 from BryanCutler/arrow-bump-min-pyarrow-SPARK-27276.

Authored-by: Bryan Cutler <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

* Fix pandas infer_dtype warning

* [SPARK-27276][PYTHON][DOCS][FOLLOW-UP] Update documentation about Arrow version in PySpark as well

## What changes were proposed in this pull request?

Looks updating documentation from 0.8.0 to 0.12.1 was missed.

## How was this patch tested?

N/A

Closes apache#24504 from HyukjinKwon/SPARK-27276-followup.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>

Co-authored-by: Bryan Cutler <[email protected]>
Co-authored-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
It looks like updating the documentation from 0.8.0 to 0.12.1 was missed.
How was this patch tested?
N/A