Skip to content

Standardize predict interface using SAR standard#1039

Merged
miguelgfierro merged 10 commits intostagingfrom
miguel/predict_interface
Jan 24, 2020
Merged

Standardize predict interface using SAR standard#1039
miguelgfierro merged 10 commits intostagingfrom
miguel/predict_interface

Conversation

@miguelgfierro
Copy link
Collaborator

Description

for rating: predict
for ranking: recommend_top_k
also applied black

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging and not master.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.



def compute_ranking_predictions(
def recommend_k_items(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not recommend k items. It computes the predictions for all users and items.

Copy link
Collaborator Author

@miguelgfierro miguelgfierro Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh good catch, the cut of k is done in fact when the metric is computed https://github.com/microsoft/recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb, the output is a massive matrix instead of n_users x k

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the meeting, we need to check whether the compute_ranking_pred method is used in another function that needs the full matrix, if not, we can optimize it and

for user in data[usercol].unique():
        for item in data[itemcol].unique():
            preds_lst.append(algo.predict(user, item).est)
       preds = sort(preds_lst)
       preds = preds[:k]
       
       preds_lst.append(preds)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also move this up:

    if remove_seen:
        tempdf = pd.concat(
            [
                data[[usercol, itemcol]],
                pd.DataFrame(
                    data=np.ones(data.shape[0]), columns=["dummycol"], index=data.index
                ),
            ],
            axis=1,
        )
        merged = pd.merge(tempdf, all_predictions, on=[usercol, itemcol], how="outer")
        return merged[merged["dummycol"].isnull()].drop("dummycol", axis=1)
    else:
        return all_predictions

so we remove the seen items before sorting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also check with @yueguoguo whether or not we are using the threshold cut #1041

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't want to work with the scores as a matrix like sar does we could use a heap

from heapq import heappush, heappushpop

users = data[usercol].unique()
items = data[itemcol].unique()
preds_lst = np.zeros([len(users), k])

for user_idx, user in enumerate(users):
    heap = []
    for item in items:
        score = algo.predict(user, item).est
        if len(heap) < k:
            heappush(heap, score)
        elif score > heap[k - 1]:
            heappushpop(heap, score);
    preds_lst[user_idx] = heap[::-1]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heapq also has an nlargest() method. Not sure which way would be faster (depends on which algorithm it uses).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but it wasn't clear to me if that method tries to sort the items (which would be unnecessary) or leverages the heap?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miguelgfierro
Copy link
Collaborator Author

miguelgfierro commented Jan 24, 2020

hey guys @anargyri @gramhagen, I reverted the interface of ranking in surprise to do the work on a different PR. The related issue is here: #1042.

Please let me know if you see anything else on this PR

@miguelgfierro miguelgfierro merged commit 9f3dfdd into staging Jan 24, 2020
@miguelgfierro miguelgfierro deleted the miguel/predict_interface branch January 24, 2020 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants