Commit 7e72c60

addressed @srowen's comments regarding implicit feedback
1 parent 5af5577 commit 7e72c60

2 files changed

Lines changed: 23 additions & 25 deletions

docs/ml-collaborative-filtering.md

Lines changed: 14 additions & 15 deletions
@@ -33,16 +33,16 @@ following parameters:
 
 The standard approach to matrix factorization based collaborative filtering treats
 the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+For example, users giving ratings to movies.
 
-It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
-clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
-from
-[Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
-Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
-as a combination of binary preferences and *confidence values*. The ratings are then related to the
-level of confidence in observed user preferences, rather than explicit ratings given to items. The
-model then tries to find latent factors that can be used to predict the expected preference of a
-user for an item.
+It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
+clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
+from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
+as the number of observations of user actions. Those numbers are then related to the level of
+confidence in observed user preferences, rather than explicit ratings given to items. The model
+then tries to find latent factors that can be used to predict the expected preference of a user for
+an item.
 
 ### Scaling of the regularization parameter

@@ -51,9 +51,8 @@ the number of ratings the user generated in updating user factors,
 or the number of ratings the product received in updating product factors.
 This approach is named "ALS-WR" and discussed in the paper
 "[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
-It makes `regParam` less dependent on the scale of the dataset.
-So we can apply the best parameter learned from a sampled subset to the full dataset
-and expect similar performance.
+It makes `regParam` less dependent on the scale of the dataset, so we can apply the
+best parameter learned from a sampled subset to the full dataset and expect similar performance.
 
 ## Examples

@@ -73,7 +72,7 @@ for more details on the API.
 
 {% include_example scala/org/apache/spark/examples/ml/ALSExample.scala %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `true` to get
 better results:

@@ -104,7 +103,7 @@ for more details on the API.
 
 {% include_example java/org/apache/spark/examples/ml/JavaALSExample.java %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `true` to get
 better results:

@@ -135,7 +134,7 @@ for more details on the API.
 
 {% include_example python/ml/als_example.py %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `True` to get
 better results:
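
Reviewer note on the implicit-feedback wording in this file: the cited paper turns each observation count into a binary preference plus a confidence weight. The sketch below illustrates that mapping in plain Python. It is not Spark code; the function name, the `views` data, and the `alpha` value are hypothetical and chosen only for illustration.

```python
def to_preference_and_confidence(observations, alpha=1.0):
    """Map raw observation counts to (preference, confidence) pairs.

    observations: dict of (user, item) -> non-negative count of user actions.
    alpha: confidence scaling constant (illustrative value, not a recommendation).
    """
    mapped = {}
    for key, count in observations.items():
        # Preference is binary: did the user interact with the item at all?
        preference = 1.0 if count > 0 else 0.0
        # Confidence grows linearly with the number of observations.
        confidence = 1.0 + alpha * count
        mapped[key] = (preference, confidence)
    return mapped


# Hypothetical view counts, used only to exercise the function.
views = {("alice", "movie_1"): 3, ("alice", "movie_2"): 0, ("bob", "movie_1"): 1}
mapped = to_preference_and_confidence(views, alpha=40.0)
```

A pair with zero observations still gets a confidence of 1, so it contributes weakly to the fit as a "no preference" signal instead of being ignored.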

docs/mllib-collaborative-filtering.md

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -32,16 +32,16 @@ following parameters:
3232

3333
The standard approach to matrix factorization based collaborative filtering treats
3434
the entries in the user-item matrix as *explicit* preferences given by the user to the item.
35+
For example, users giving ratings to movies.
3536

3637
It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
3738
clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
38-
from
39-
[Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
40-
Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
41-
as a combination of binary preferences and *confidence values*. The ratings are then related to the
42-
level of confidence in observed user preferences, rather than explicit ratings given to items. The
43-
model then tries to find latent factors that can be used to predict the expected preference of a
44-
user for an item.
39+
from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
40+
Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
41+
as the number of observations of user actions. Those numbers are then related to the level of
42+
confidence in observed user preferences, rather than explicit ratings given to items. The model
43+
then tries to find latent factors that can be used to predict the expected preference of a user for
44+
an item.
4545

4646
### Scaling of the regularization parameter
4747

@@ -50,9 +50,8 @@ the number of ratings the user generated in updating user factors,
 or the number of ratings the product received in updating product factors.
 This approach is named "ALS-WR" and discussed in the paper
 "[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
-It makes `lambda` less dependent on the scale of the dataset.
-So we can apply the best parameter learned from a sampled subset to the full dataset
-and expect similar performance.
+It makes `lambda` less dependent on the scale of the dataset, so we can apply the
+best parameter learned from a sampled subset to the full dataset and expect similar performance.
 
 ## Examples
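
Reviewer note on the regularization paragraph: the sketch below shows what scaling `lambda` by the number of ratings looks like for a single user's least squares update. It is plain Python, not Spark's implementation, and it assumes a rank-1 model so the solution has a closed form. The function name and the sample numbers are hypothetical.

```python
def update_user_factor(ratings, item_factors, reg_param):
    """Solve one user's rank-1 least squares problem with ALS-WR style scaling.

    Minimizes sum_j (r_j - u * m_j)^2 + reg_param * n_u * u^2,
    where n_u is the number of ratings this user generated.
    """
    n_u = len(ratings)
    numerator = sum(r * m for r, m in zip(ratings, item_factors))
    # The regularization term is multiplied by n_u, so its weight relative to
    # the data term stays roughly constant as the user rates more items.
    denominator = sum(m * m for m in item_factors) + reg_param * n_u
    return numerator / denominator


# Hypothetical ratings and item factors for one user.
user_factor = update_user_factor([2.0, 4.0], [1.0, 2.0], reg_param=0.1)
```

Because the penalty grows with the number of ratings, a value tuned on a sampled subset should stay in a sensible range on the full dataset, which is the behavior the changed sentence describes.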
