@AnthonyTruchet

Backport apache#16037 to criteo-2.0 branch

What changes were proposed in this pull request?

CostFun used to ship a dense vector of zeroes in the closure of a
treeAggregate call. To avoid that, we change the aggregation operations
to convert sparse vectors into dense vectors on the fly when needed, and
we pass a sparse zero vector, which is lightweight.
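
For illustration only, here is a minimal pure-Python sketch of that idea. It does not use Spark or reproduce the actual patch: `SparseZero`, `to_dense`, `seq_op`, `comb_op` and `aggregate` are hypothetical names, and the accumulator is a plain vector sum plus a record count rather than CostFun's gradient and loss. The point it tries to show is that the zero value handed to every partition stays tiny, and the dense buffer is only allocated once a partition sees its first record.

```python
from functools import reduce


class SparseZero:
    """Hypothetical stand-in for a sparse all-zero vector: stores only its size."""

    def __init__(self, size):
        self.size = size


def to_dense(vec):
    # Densify on the fly: allocate the full buffer only when it is first needed.
    if isinstance(vec, SparseZero):
        return [0.0] * vec.size
    return vec


def seq_op(acc, features):
    # Fold one record into the partition-local accumulator.
    total, count = acc
    total = to_dense(total)
    for i, x in enumerate(features):
        total[i] += x
    return total, count + 1


def comb_op(a, b):
    # Merge two partial results; an untouched sparse zero contributes nothing.
    (ta, ca), (tb, cb) = a, b
    if isinstance(ta, SparseZero):
        return tb, ca + cb
    if isinstance(tb, SparseZero):
        return ta, ca + cb
    for i, x in enumerate(tb):
        ta[i] += x
    return ta, ca + cb


def aggregate(partitions, zero, seq_op, comb_op):
    # Every partition starts from the same lightweight zero value.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials)


partitions = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [], [[1.0, 1.0, 1.0]]]
total, count = aggregate(partitions, (SparseZero(3), 0), seq_op, comb_op)
```

In this toy version an empty partition never allocates a dense vector at all, which is why `comb_op` has to accept a still-sparse operand.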

How was this patch tested?

Unit tests for the mllib module were run locally to check correctness.

As for performance, we ran a heavy optimization on our production data (50 iterations on 128 MB weight vectors) and saw a significant decrease both in runtime and in the number of containers killed for lack of off-heap memory.

Author: Anthony Truchet [email protected]
Author: sethah [email protected]

…rs of 0

CostFun used to ship a dense vector of zeroes in the closure of a
treeAggregate call. To avoid that, we replace treeAggregate with
mapPartitions + treeReduce, creating the zero vector in place inside the
mapPartitions block.
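
As a rough illustration of this alternative, the pure-Python sketch below mimics the shape of the approach without Spark. All names (`partition_sum`, `merge_partials`, `tree_reduce_local`, `fan_in`) are hypothetical, and the accumulator is again a simple vector sum plus a count rather than the real gradient and loss. The zero vector is created inside the per-partition function, so nothing large needs to travel with the closure; the partial results are then merged in rounds, loosely imitating a tree-shaped reduce.

```python
from functools import reduce


def partition_sum(records, size):
    # The zero vector is created here, inside the per-partition function.
    total = [0.0] * size
    count = 0
    for features in records:
        for i, x in enumerate(features):
            total[i] += x
        count += 1
    return total, count


def merge_partials(a, b):
    # Combine two partial (vector, count) results, reusing a's buffer.
    (ta, ca), (tb, cb) = a, b
    for i, x in enumerate(tb):
        ta[i] += x
    return ta, ca + cb


def tree_reduce_local(values, merge, fan_in=2):
    # Merge in rounds of `fan_in` values each until one result is left.
    # Assumes at least one value and fan_in >= 2.
    values = list(values)
    while len(values) > 1:
        values = [
            reduce(merge, values[i:i + fan_in])
            for i in range(0, len(values), fan_in)
        ]
    return values[0]


parts = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]], []]
partials = [partition_sum(p, 2) for p in parts]
grand_total, n_records = tree_reduce_local(partials, merge_partials)
```

Unlike the sparse-zero variant, every partition here allocates a dense vector, even an empty one; the saving is only on what is serialized with the task.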

@Willymontaz Willymontaz deleted the SPARK-18471-branch-2.0 branch April 2, 2019 15:06
@AnthonyTruchet

Abandoned
