add mind utils #1247
…nto v-jinyi/add-news-reco-methods
@miguelgfierro Could you please help review the PR?
@@ -0,0 +1,528 @@
{
Minor detail: would you mind changing the private function _download_and_extract_globe to a public download_and_extract_globe? In the other notebooks we don't import private functions.
Can you please move all imports to the first cell? We follow that convention in the rest of the notebooks.
Would you mind moving this function to the libraries? Maybe into the MIND utils in reco_utils.dataset.mind. Would you also please explain in the docstring what the regex does?
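To illustrate the kind of docstring being asked for, here is a minimal sketch of a regex-based tokenizer. The function name word_tokenize and the pattern itself are assumptions made for illustration; the actual function and regex in the notebook may differ.

```python
import re


def word_tokenize(sent):
    r"""Split a sentence into lowercase word and punctuation tokens.

    Hypothetical example: the pattern below is an assumption, not
    necessarily the one used in the PR.

    The pattern r"[\w]+|[.,!?;|]" has two alternatives:
      * [\w]+     one or more letters, digits, or underscores (a word)
      * [.,!?;|]  a single punctuation mark from this set
    Anything else (spaces, quotes, hyphens, ...) is dropped.

    Args:
        sent (str): Input sentence.

    Returns:
        list: Tokens in order of appearance; empty if sent is not a string.
    """
    pat = re.compile(r"[\w]+|[.,!?;|]")
    if isinstance(sent, str):
        return pat.findall(sent.lower())
    return []
```

Spelling out each alternative of the pattern in the docstring lets a reader see which characters become tokens and which are silently discarded.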
Same here: please move this to the utils and add docstrings. Also, it might be easier for our users to understand what this function does if its name is more explicit. Some ideas would be load_glove_matrix, generate_embeddings, generate_embedding_matrix, or any other you think is better.
It would also be nice to have some comments in the code explaining what's going on there, e.g. what is the data in l[0] vs. l[1:]? People who are not familiar with the MIND dataset, like me :-), will appreciate those comments.
wordvec = [float(x) for x in l[1:]]
if word in word_dict:
In this part, it seems we don't need to initialize wordvec if word is not in word_dict, so wordvec = ... can move under the if statement. That means we can use a single if statement like:
if len(word) > 0 and word in word_dict:
    do something here
or, if we are sure word_dict doesn't include any word with len(word) == 0, simply:
if word in word_dict:
without checking len(word) > 0.
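Putting the suggestions in this thread together, a minimal sketch of the refactored function could look like the following. The name load_glove_matrix comes from the ideas above; the signature, the word_dict layout (word mapped to a row index starting at 1, with row 0 left as zeros), and the use of plain lists instead of an array library are assumptions made for illustration, not the code in the PR.

```python
def load_glove_matrix(lines, word_dict, embedding_dim):
    """Build an embedding matrix for the words in word_dict.

    Hypothetical sketch. Each input line is assumed to follow the GloVe
    text format "<word> <v1> <v2> ... <vN>", so after splitting on
    whitespace l[0] is the word and l[1:] are its vector components.

    Args:
        lines (iterable of str): Lines of a GloVe-style text file.
        word_dict (dict): Maps word -> row index (assumed to start at 1).
        embedding_dim (int): Length of each embedding vector.

    Returns:
        tuple: (matrix, found) where matrix is a list of rows and found
        lists the words from word_dict that had a pretrained vector.
    """
    # Row 0 stays all zeros; words without a pretrained vector also stay zero.
    matrix = [[0.0] * embedding_dim for _ in range(len(word_dict) + 1)]
    found = []
    for line in lines:
        l = line.split()
        if not l:
            continue  # skip blank lines
        word = l[0]
        # Parse the vector only when the word is actually needed.
        if word in word_dict:
            matrix[word_dict[word]] = [float(x) for x in l[1:]]
            found.append(word)
    return matrix, found
```

Because the vector is parsed inside the if block, lines for words outside the vocabulary cost only a split and a dictionary lookup.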
miguelgfierro left a comment
This is really good; I have just a few small formatting suggestions. Thanks @yjw1029!!
tests/unit/test_notebooks_python.py
Outdated
    ),
)


@pytest.mark.notebooks
How long does this test take? If it takes too long (e.g. more than a couple of minutes), it might be better to move it to the smoke tests.
Hi @yjw1029, I would like to follow up to see whether you have seen the comments. Thanks!
Sorry for the late update. I have already made the changes.
Great @yjw1029, thanks for the contribution!
Description
add examples/01_prepare_data/prepare_mind_utils.ipynb to generate
Related Issues
#1182
#1238
Checklist:
staging and not master.