
[SPARK-26228][MLLIB] OOM issue encountered when computing Gramian matrix #23600

Closed
srowen wants to merge 1 commit into apache:master from srowen:SPARK-26228

Conversation

@srowen
Member

@srowen srowen commented Jan 21, 2019

What changes were proposed in this pull request?

Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as the aggregation zeroValue, so that a large zero matrix does not have to be serialized as part of the task closure.

How was this patch tested?

Existing tests.
Note that it's hard to test the case that triggers this issue, as it would require a large amount of memory and take a while to run. I confirmed locally that computing a 16K x 16K Gramian failed even with a generous driver memory setting before this change, and no longer failed up front afterward.
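The idea behind the change can be illustrated with a small sketch. This is not the actual RowMatrix code from this PR: `GramianSketch`, its `aggregate` helper (a plain-Scala stand-in for Spark's `RDD.treeAggregate`), and `gramianUpper` are hypothetical names for illustration. The point is that the zeroValue of an aggregation travels with the serialized task closure, so a dense zero buffer of n*(n+1)/2 doubles (roughly 1 GB for n = 16K) would be shipped with every task; starting from null and allocating the buffer lazily inside seqOp avoids that.

```scala
// Hedged sketch only: GramianSketch, aggregate, and gramianUpper are
// illustrative names, not Spark APIs or the code changed in this PR.
object GramianSketch {
  // Stand-in for RDD.treeAggregate: in Spark, zeroValue is captured in the
  // task closure, so a huge zero array would be serialized with every task.
  def aggregate[T, U](data: Seq[T])(zero: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U =
    data.grouped(2).map(_.foldLeft(zero)(seqOp)).reduce(combOp)

  // Accumulates the upper triangle of the Gramian G = sum(row * row^T),
  // packed row-major, starting from a null accumulator.
  def gramianUpper(rows: Seq[Array[Double]], n: Int): Array[Double] = {
    val len = n * (n + 1) / 2
    aggregate[Array[Double], Array[Double]](rows)(null)(
      (acc, row) => {
        // Allocate the buffer lazily on first use instead of serializing
        // a zero array of length n*(n+1)/2 as the zeroValue.
        val buf = if (acc == null) new Array[Double](len) else acc
        var k = 0
        for (i <- 0 until n; j <- i until n) {
          buf(k) += row(i) * row(j)
          k += 1
        }
        buf
      },
      (a, b) => {
        // Either side may still be null if a partition saw no rows.
        if (a == null) b
        else if (b == null) a
        else {
          var i = 0
          while (i < len) { a(i) += b(i); i += 1 }
          a
        }
      })
  }
}
```

With null as the zeroValue, only a reference is captured in the closure, and each partition pays the allocation cost once, locally, which is why the 16K x 16K case no longer fails up front.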

…s (>= 16K rows/cols) by using null as zeroValue
@srowen srowen self-assigned this Jan 21, 2019
@srowen srowen requested a review from mengxr January 21, 2019 01:33
@SparkQA

SparkQA commented Jan 21, 2019

Test build #101457 has finished for PR 23600 at commit eef9ea0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 6dcad38 Jan 23, 2019
srowen added a commit that referenced this pull request Jan 23, 2019
Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as zeroValue


Closes #23600 from srowen/SPARK-26228.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 6dcad38)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
srowen added a commit that referenced this pull request Jan 23, 2019
Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as zeroValue


Closes #23600 from srowen/SPARK-26228.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 6dcad38)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
@srowen
Member Author

srowen commented Jan 23, 2019

Merged to master/2.4/2.3

@srowen srowen deleted the SPARK-26228 branch February 14, 2019 21:52
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as zeroValue

Closes apache#23600 from srowen/SPARK-26228.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as zeroValue


Closes apache#23600 from srowen/SPARK-26228.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 6dcad38)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
Avoid memory problems in closure cleaning when handling large Gramians (>= 16K rows/cols) by using null as zeroValue


Closes apache#23600 from srowen/SPARK-26228.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 6dcad38)
Signed-off-by: Sean Owen <sean.owen@databricks.com>