Skip to content

Conversation

@huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Update HashingTF to use new implementation of MurmurHash3
Make HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded

How was this patch tested?

Change existing unit tests. Also add one unit test to make sure HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded

@SparkQA
Copy link

SparkQA commented Jul 30, 2019

Test build #108419 has finished for PR 25303 at commit ad82dac.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(hashingTF.indexOf("d") === 90)
}

test("Load HashingTF prior to Spark 3.0") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a compatibility bug fix, could you add SPARK-23469 prefix?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-23469] HashingTF should use corrected MurmurHash3 implementation [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation Jul 30, 2019
@SparkQA
Copy link

SparkQA commented Jul 30, 2019

Test build #108421 has finished for PR 25303 at commit 293700d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 30, 2019

Test build #108422 has finished for PR 25303 at commit e7419e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// We support loading old `HashingTF` saved by previous Spark versions.
// Previous `HashingTF` uses `mllib.feature.HashingTF.murmur3Hash`, but new `HashingTF` uses
// `ml.Feature.FeatureHasher.murmur3Hash`.
val (majorVersion, minorVersion) = majorMinorVersion(metadata.sparkVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch here.
Nit: you can replace minorVersion with _ just for tidiness

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108478 has finished for PR 25303 at commit 3de1929.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 2, 2019

Merged to master. I added a release note for this change.

@srowen srowen closed this in 660423d Aug 2, 2019
@huaxingao
Copy link
Contributor Author

Thanks for your help! @srowen @dongjoon-hyun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants