[SPARK-20952] ParquetFileFormat should forward TaskContext to its forkjoinpool #18176
Jenkins this is ok to test
Test build #77681 has finished for PR 18176 at commit
06405ef to 6430dca
Test build #77753 has finished for PR 18176 at commit
cc @JoshRosen who had some concerns with shared threadpool objects
Discussion on this is happening on JIRA: https://issues.apache.org/jira/browse/SPARK-20952
6430dca to 25e8408
Test build #78531 has finished for PR 18176 at commit
Jenkins this is ok to test
Test build #78719 has finished for PR 18176 at commit
Can one of the admins verify this patch?
…kjoinpool

See ticket [1] and the PR [2] @robert3005 opened on apache/spark. It seems upstream wasn't convinced by our use-case.

We need this change because Spark reads Parquet footers on a different thread. Without this change, that thread doesn't inherit the thread-local that stores the TaskContext, meaning we don't have access to the properties stored inside that TaskContext. Our internal filesystem needs those properties in the task context and fails to read footers without them.

[1] https://issues.apache.org/jira/browse/SPARK-20952
[2] apache#18176

Co-authored-by: Robert Kruszewski <[email protected]>
Co-authored-by: Josh Casale <[email protected]>
Co-authored-by: Will Raschkowski <[email protected]>
What changes were proposed in this pull request?
Make the TaskContext reference an InheritableThreadLocal so that thread pools spun up inside tasks have access to the reference.
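To illustrate the mechanism this change relies on, here is a minimal standalone sketch in plain Java. It is not Spark's code: the class name, the two holder fields, and the "task-42" value are hypothetical stand-ins for the TaskContext holder. It only shows that a thread created by the task thread sees an InheritableThreadLocal value but not a plain ThreadLocal one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritableContextSketch {
    // Hypothetical stand-ins for a task-scoped context holder (not Spark's actual fields).
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITABLE = new InheritableThreadLocal<>();

    /**
     * Sets both holders on the calling thread, then reads them back from a
     * worker thread of a pool that is created after the values were set.
     * Returns { value seen via PLAIN, value seen via INHERITABLE }.
     */
    static String[] readFromChildThread(String value) throws Exception {
        PLAIN.set(value);
        INHERITABLE.set(value);
        // The worker thread is constructed by the calling thread on first submit,
        // which is the moment inheritable values are copied to the new thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String[]> read = () -> new String[] { PLAIN.get(), INHERITABLE.get() };
            return pool.submit(read).get();
        } finally {
            pool.shutdown();
            PLAIN.remove();
            INHERITABLE.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] seen = readFromChildThread("task-42");
        // prints "plain=null inheritable=task-42"
        System.out.println("plain=" + seen[0] + " inheritable=" + seen[1]);
    }
}
```

Inheritable values are copied when the child thread is constructed, so a pool thread that already exists and is reused across tasks keeps whatever value it received at creation. That may be relevant to the concern about shared thread pool objects raised in the conversation.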
How was this patch tested?
Added tests