Standardize predict interface using SAR standard #1039
miguelgfierro merged 10 commits into staging
    def compute_ranking_predictions(
    def recommend_k_items(
This function does not recommend k items. It computes the predictions for all users and items.
Ohh, good catch. The cut at k is in fact done when the metric is computed (see https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb), so the output is a massive n_users x n_items matrix instead of n_users x k.
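To illustrate the size difference, here is a minimal sketch with made-up data: the full output has one entry per (user, item) pair, and cutting at k afterwards keeps only k entries per user. The tuple layout and the helper name `cut_top_k` are assumptions for this example, not the repository's API.

```python
from collections import defaultdict


def cut_top_k(all_predictions, k):
    """Keep the k highest-scored (item, score) pairs per user (hypothetical helper)."""
    by_user = defaultdict(list)
    for user, item, score in all_predictions:
        by_user[user].append((item, score))
    return {
        user: sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        for user, pairs in by_user.items()
    }


# Toy scores for 2 users x 3 items: the full output has 6 entries.
all_predictions = [
    (1, 10, 3.5), (1, 20, 4.8), (1, 30, 2.1),
    (2, 10, 4.0), (2, 20, 1.5), (2, 30, 4.9),
]
top_k = cut_top_k(all_predictions, k=2)  # 2 users x k=2 -> 4 pairs in total
```

With many users and items the full output grows as n_users x n_items, while the cut result grows only as n_users x k.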
After the meeting, we need to check whether the compute_ranking_predictions method is used in another function that needs the full matrix. If not, we can optimize it along these lines:
    top_k = {}
    for user in data[usercol].unique():
        preds_lst = []
        for item in data[itemcol].unique():
            preds_lst.append(algo.predict(user, item).est)
        # sort descending and keep only the k highest scores for this user
        top_k[user] = sorted(preds_lst, reverse=True)[:k]
also move this up:
    if remove_seen:
        tempdf = pd.concat(
            [
                data[[usercol, itemcol]],
                pd.DataFrame(
                    data=np.ones(data.shape[0]), columns=["dummycol"], index=data.index
                ),
            ],
            axis=1,
        )
        merged = pd.merge(tempdf, all_predictions, on=[usercol, itemcol], how="outer")
        return merged[merged["dummycol"].isnull()].drop("dummycol", axis=1)
    else:
        return all_predictions
so we remove the seen items before sorting
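A minimal sketch of that ordering, using plain Python rather than the DataFrame merge: seen (user, item) pairs are filtered out first, and only the remaining candidates are ranked. Here `score_fn` stands in for `algo.predict(user, item).est`, and the function name `recommend_unseen_top_k` is hypothetical.

```python
def recommend_unseen_top_k(users, items, seen, score_fn, k):
    """Rank only unseen items per user and keep the top k (illustrative)."""
    recommendations = {}
    for user in users:
        # Filter first so seen items never occupy a top-k slot.
        candidates = [item for item in items if (user, item) not in seen]
        ranked = sorted(candidates, key=lambda item: score_fn(user, item), reverse=True)
        recommendations[user] = ranked[:k]
    return recommendations


# Toy scorer (an assumption for the example): higher item id -> higher score.
seen = {("u1", 3), ("u2", 1)}
recs = recommend_unseen_top_k(["u1", "u2"], [1, 2, 3], seen, lambda u, i: float(i), k=2)
```

If the cut at k happened before the filter, user `u1` would end up with a single recommendation, because item 3 would take one of the two slots and then be dropped.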
also check with @yueguoguo whether or not we are using the threshold cut #1041
If we don't want to work with the scores as a matrix like SAR does, we could use a heap:
    from heapq import heappush, heappushpop
    import numpy as np
    users = data[usercol].unique()
    items = data[itemcol].unique()
    preds_lst = np.zeros([len(users), k])
    for user_idx, user in enumerate(users):
        heap = []  # min-heap holding the k best scores seen so far
        for item in items:
            score = algo.predict(user, item).est
            if len(heap) < k:
                heappush(heap, score)
            elif score > heap[0]:  # heap[0] is the smallest of the current top k
                heappushpop(heap, score)
        # a heap is not fully ordered, so sort it to get the scores best first
        preds_lst[user_idx] = sorted(heap, reverse=True)
heapq also has an nlargest() method. Not sure which way would be faster (depends on which algorithm it uses).
true, but it wasn't clear to me if that method tries to sort the items (which would be unnecessary) or leverages the heap?
https://github.com/python/cpython/blob/61b3484cdf27ceca1c1069a351487d2db4b2b48c/Lib/heapq.py#L395
It looks similar to your for loop.
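For comparison, a sketch of the same per-user top-k written with `heapq.nlargest`, which returns the k largest elements in descending order. The scorer here is a toy stand-in for `algo.predict(user, item).est`, and `top_k_scores` is a hypothetical name.

```python
from heapq import nlargest


def top_k_scores(users, items, score_fn, k):
    """Return the k highest scores per user, best first (illustrative)."""
    return {
        user: nlargest(k, (score_fn(user, item) for item in items))
        for user in users
    }


# Toy scorer (an assumption): the score depends only on the item id.
scores = top_k_scores(["u1"], [1, 5, 3, 4, 2], lambda user, item: float(item), k=3)
```

Passing `key=` to `nlargest` would let it rank (item, score) pairs by score instead of returning bare scores.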
Hey guys @anargyri @gramhagen, I reverted the ranking interface in surprise so that the work can be done in a different PR. The related issue is #1042. Please let me know if you see anything else on this PR.
Description
for rating: predict
for ranking: recommend_top_k
also applied black
Related Issues
Checklist:
staging and not master.