@@ -33,16 +33,16 @@ following parameters:
 
 The standard approach to matrix factorization based collaborative filtering treats
 the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+For example, users giving ratings to movies.
 
-It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
-clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
-from
-[Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
-Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
-as a combination of binary preferences and *confidence values*. The ratings are then related to the
-level of confidence in observed user preferences, rather than explicit ratings given to items. The
-model then tries to find latent factors that can be used to predict the expected preference of a
-user for an item.
+It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
+clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
+from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
+as the number of observations of user actions. Those numbers are then related to the level of
+confidence in observed user preferences, rather than explicit ratings given to items. The model
+then tries to find latent factors that can be used to predict the expected preference of a user for
+an item.
 
 ### Scaling of the regularization parameter
 
@@ -51,9 +51,8 @@ the number of ratings the user generated in updating user factors,
 or the number of ratings the product received in updating product factors.
 This approach is named "ALS-WR" and discussed in the paper
 "[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
-It makes `regParam` less dependent on the scale of the dataset.
-So we can apply the best parameter learned from a sampled subset to the full dataset
-and expect similar performance.
+It makes `regParam` less dependent on the scale of the dataset, so we can apply the
+best parameter learned from a sampled subset to the full dataset and expect similar performance.
 
 ## Examples
 
@@ -73,7 +72,7 @@ for more details on the API.
 
 {% include_example scala/org/apache/spark/examples/ml/ALSExample.scala %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `true` to get
 better results:
 
@@ -104,7 +103,7 @@ for more details on the API.
 
 {% include_example java/org/apache/spark/examples/ml/JavaALSExample.java %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `true` to get
 better results:
 
@@ -135,7 +134,7 @@ for more details on the API.
 
 {% include_example python/ml/als_example.py %}
 
-If the rating matrix is derived from another source of information (e.g. it is
+If the rating matrix is derived from another source of information (i.e. it is
 inferred from other signals), you can set `implicitPrefs` to `True` to get
 better results:
 