Skip to content

addition of item similarity measure - python version#1522

Merged
anargyri merged 2 commits intostagingfrom
zhangya/itemsimilaritypythonversion
Sep 20, 2021
Merged

addition of item similarity measure - python version#1522
anargyri merged 2 commits intostagingfrom
zhangya/itemsimilaritypythonversion

Conversation

@YanZhangADS
Copy link
Collaborator

Description

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • [x ] I have added tests covering my contributions.
  • [x ] I have updated the documentation accordingly.
  • [x ] This PR is being made to staging branch and not to main branch.

Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yan this is really nice, I have one point for discussion

@codecov-commenter
Copy link

Codecov Report

Merging #1522 (20b0b53) into staging (1cab0e2) will increase coverage by 0.17%.
The diff coverage is 94.01%.

Impacted file tree graph

@@             Coverage Diff             @@
##           staging    #1522      +/-   ##
===========================================
+ Coverage    62.03%   62.20%   +0.17%     
===========================================
  Files           84       84              
  Lines         8397     8441      +44     
===========================================
+ Hits          5209     5251      +42     
- Misses        3188     3190       +2     
Impacted Files Coverage Δ
recommenders/evaluation/python_evaluation.py 93.68% <94.01%> (-3.21%) ⬇️
recommenders/datasets/movielens.py 77.52% <0.00%> (+1.12%) ⬆️
recommenders/datasets/criteo.py 86.79% <0.00%> (+3.77%) ⬆️
recommenders/datasets/split_utils.py 88.00% <0.00%> (+4.00%) ⬆️
recommenders/utils/spark_utils.py 96.00% <0.00%> (+8.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3a9ac3...20b0b53. Read the comment docs.

Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good Yan, you have changed the code super quickly.
The only thing I see is that these changes also affect the notebooks right? but the tests didn't fail, so are we not testing the diversity notebook?

@YanZhangADS
Copy link
Collaborator Author

This is really good Yan, you have changed the code super quickly.
The only thing I see is that these changes also affect the notebooks right? but the tests didn't fail, so are we not testing the diversity notebook?

diversity notebook only use spark version as an example. We do not have example diversity notebook for python version.

Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is super good Yan

# diversity metrics
class PythonDiversityEvaluation:
"""Python Diversity Evaluator"""
def check_column_dtypes_diversity_serendipity(func):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have this decorator private (with an '_' at the front of the name)?

Copy link
Collaborator Author

@YanZhangADS YanZhangADS Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not use _ in existing code, e.g. the function "check_column_dtypes" does not have _ in front of the function name. Therefore I don't add _ to be consistent.

Copy link
Collaborator

@anargyri anargyri Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant adding just an underscore, without any quotation marks. Currently you see these methods in readthedocs, where they are probably not needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another issue with the docstrings. The args are missing in the python functions e.g. here because before they were inside the encapsulating class but now they are required.

* Serendipity - The "unusualness" or "surprise" of recommendations to a user. When 'col_relevance' is used, it indicates how "pleasant surprise" of recommendations is to a user.

The metric definitions/formulations are based on the following references with modification:
def check_column_dtypes_novelty_coverage(func):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, should this be private?

Copy link
Collaborator

@anargyri anargyri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, thanks!

@anargyri anargyri merged commit d43e450 into staging Sep 20, 2021
@anargyri
Copy link
Collaborator

There are some long lines (caught by flake).

@YanZhangADS
Copy link
Collaborator Author

There are some long lines (caught by flake).

I used "black" to format the files.

@YanZhangADS
Copy link
Collaborator Author

Thanks @anargyri for catching many issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants