Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -992,7 +992,20 @@ object Matrices {
new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
case sm: BSM[Double] =>
// There is no isTranspose flag for sparse matrices in Breeze
new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)

// Some Breeze CSCMatrices may have extra trailing zeros in
Copy link
Contributor

@hhbyyh hhbyyh May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: make this comment more compact in fewer lines.

// .rowIndices and .data, which are added after some matrix
// operations for efficiency.
//
// Therefore the last element of sm.colPtrs would no longer be
// coherent with the size of sm.rowIndices and sm.data
// despite sm being a valid CSCMatrix.
// We need to truncate both arrays (rowIndices, data)
// to the real size of the vector sm.activeSize to allow valid conversion

Copy link
Contributor

@hhbyyh hhbyyh May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add if (sm.activeSize != sm.rowIndices.length) here, since this is only needed when necessary.
Please refer to https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/CSCMatrix.scala#L130

val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
val truncData = sm.data.slice(0, sm.activeSize)
Copy link
Contributor

@hhbyyh hhbyyh May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as calling compact(). To make it less sensitive to the breeze internal implementation, how about:

val matCopy = sm.copy
matCopy.compact()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm implementing both suggestions, however, wouldn't be the sm.copy more expensive than just doing those 2 slices?

new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, truncRowIndices, truncData)
case _ =>
throw new UnsupportedOperationException(
s"Do not support conversion from type ${breeze.getClass.getName}.")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite {
}
}

test("Test Breeze Conversion Bug - SPARK-20687") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specific name: Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros.

And move the test after another unit test "fromBreeze with sparse matrix"

// (2, 0, 0)
// (2, 0, 0)
val mat1Brz = Matrices.sparse(2, 3, Array(0, 2, 2, 2), Array(0, 1), Array(2, 2)).asBreeze
// (2, 1E-15, 1E-15)
// (2, 1E-15, 1E-15)
val mat2Brz = Matrices.sparse(2, 3,
Array(0, 2, 4, 6),
Array(0, 0, 0, 1, 1, 1),
Array(2, 1E-15, 1E-15, 2, 1E-15, 1E-15)).asBreeze
val t1Brz = mat1Brz - mat2Brz
val t2Brz = mat2Brz - mat1Brz
// The following operations raise exceptions on un-patch Matrices.fromBreeze
val t1 = Matrices.fromBreeze(t1Brz)
val t2 = Matrices.fromBreeze(t2Brz)
// t1 == t1Brz && t2 == t2Brz
assert((t1.asBreeze - t1Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
assert((t2.asBreeze - t2Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
}

test("sparse matrix construction") {
val m = 3
val n = 4
Expand Down