Python evaluator module fix by loomlike · Pull Request #863 · recommenders-team/recommenders

loomlike · 2019-07-12T02:26:21Z

Description

The Python evaluation module's ranking metric functions contain redundant sorting code.
For example,

df_hit["rank"] = df_hit.groupby(col_user)[col_prediction].rank(
    method="first", ascending=False
)

does not need rank(), because df_hit is already sorted by user and rating: it is produced by a groupby on user (the sort argument of pandas groupby defaults to True) followed by nlargest on the ratings.

This change removes those redundant sorts and also refactors get_top_k_items to return a DataFrame with a 'rank' column, so that it behaves the same as our pyspark evaluation module.
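A minimal sketch of the equivalence described above, on hypothetical toy data. The column names and values are illustrative assumptions, not the library's defaults, and the per-user top-k is built with pd.concat over the groups rather than groupby().apply() so that the sketch does not depend on how a given pandas version treats the grouping column inside apply.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "userID":     [2, 1, 2, 1, 2, 1],
    "itemID":     [10, 11, 12, 13, 14, 15],
    "prediction": [0.2, 0.9, 0.8, 0.1, 0.5, 0.4],
})
k = 2

# Iterating a groupby visits users in sorted key order (sort=True by default),
# and nlargest returns each user's rows in descending prediction order.
top_k = pd.concat(
    [group.nlargest(k, "prediction") for _, group in df.groupby("userID")],
    ignore_index=True,
)

# Rank computed by sorting values again within each user.
rank_via_rank = top_k.groupby("userID")["prediction"].rank(
    method="first", ascending=False
)

# Rank read off the row position within each user; no extra sort needed.
rank_via_position = top_k.groupby("userID", sort=False).cumcount() + 1

print(top_k.assign(rank=rank_via_position))
```

On data that is already ordered this way, both approaches assign the same ranks, which is why the rank() call can be replaced by a positional count.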

Related Issues

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.

Remove redundant and unnecessary sortings
Refactor get_top_k_items to return DataFrame with 'rank' column same as pyspark's

miguelgfierro

LGTM

gramhagen

this is great, small improvement suggested

gramhagen · 2019-07-12T13:46:49Z

reco_utils/evaluation/python_evaluation.py

        .apply(lambda x: x.nlargest(k, col_rating))
        .reset_index(drop=True)
    )
+    top_k_items["rank"] = top_k_items.groupby(col_user).cumcount() + 1


you can avoid the repeated groupby too

groups = dataframe.groupby(col_user, as_index=False)
top_k_items = groups.apply(lambda x: x.nlargest(k, col_rating)).reset_index(drop=True)
top_k_items["rank"] = groups.cumcount() + 1

gramhagen · 2019-07-12T13:47:27Z

reco_utils/evaluation/python_evaluation.py


    Returns:
-        pd.DataFrame: DataFrame of top k items for each user
+        pd.DataFrame: DataFrame of top k items for each user, sorted by `col_user` and `"rank"`


i would remove the double quotes from rank to match just the backticks like col_user

also, in the returns section of get_top_k_items =)

good catch!

yueguoguo

Great

gramhagen

one more "rank" is there, if you can fix that then we're good

loomlike · 2019-07-12T18:28:57Z

@gramhagen A few changes since the last review:

Changed "rank" to rank.
Caching groups turns out not to work, since nlargest sorts the ratings while the cached group object still contains the unsorted ratings. I changed it back to call groupby again, but added sort=False so that the groupby can be performed efficiently (groupby without sorting still preserves the existing row order, and the rows were already sorted by nlargest).
I found the above issue through Spark's unit tests, which match the Spark evaluation function results against Python's. The Python evaluation tests couldn't catch the error because the test-case users and items were already sorted. I made a simple tweak to the test case so that it can catch such errors in the future.
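The pitfall with the cached group object can be reproduced with a small sketch. The data and column names here are hypothetical, and the top-k frame is assembled with pd.concat over the groups as a stand-in for the apply-based code in the diff; the point is only that cumcount on groups of the original frame counts rows in their original order, not in rating order.

```python
import pandas as pd

# Hypothetical toy data where ratings are NOT pre-sorted within each user.
df = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 2],
    "itemID":     [10, 11, 12, 10, 11, 12],
    "prediction": [0.1, 0.9, 0.5, 0.7, 0.2, 0.3],
})
k = 2

groups = df.groupby("userID")  # cached groupby over the unsorted frame
top_k = pd.concat(
    [group.nlargest(k, "prediction") for _, group in groups],
    ignore_index=True,
)

# Reusing the cached groups: cumcount numbers the rows of the ORIGINAL frame,
# and the assignment aligns on index labels, so the ranks are wrong.
wrong = top_k.copy()
wrong["rank"] = groups.cumcount() + 1

# Grouping the already-sorted result again (sort=False skips key sorting)
# counts positions in rating order, which is the intended rank.
top_k["rank"] = top_k.groupby("userID", sort=False).cumcount() + 1

print(wrong)
print(top_k)
```

If the input happens to be sorted by rating within each user, both variants can agree, which is consistent with the original Python test case not exposing the problem.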

gramhagen · 2019-07-12T18:38:16Z

oh interesting, i didn't realize we use the python evaluation to validate test results for spark, we should remove that linkage, I'll add a separate feature request

gramhagen · 2019-07-12T18:53:30Z

oh, i take it back, I guess that's an additional check just to ensure they match. i guess it helped in this case.

miguelgfierro · 2019-07-15T12:04:16Z

@loomlike feel free to merge when you think it is convenient

* Python evaluator module fix
  Remove redundant and unnecessary sortings
  Refactor get_top_k_items to return DataFrame with 'rank' column same as pyspark's
* Update test to catch corner case

Python evaluator module fix

3c429f2

loomlike requested review from gramhagen, miguelgfierro and yueguoguo July 12, 2019 02:26

miguelgfierro approved these changes Jul 12, 2019

gramhagen requested changes Jul 12, 2019

more clean-up

9954dc3

yueguoguo approved these changes Jul 12, 2019

gramhagen approved these changes Jul 12, 2019

Update test to catch corner case

f3c5b92

loomlike merged commit 793799a into staging Jul 15, 2019

loomlike deleted the jumin/evaluation-fix branch July 15, 2019 14:49

loomlike merged 3 commits into staging from jumin/evaluation-fix
