Kdd2020 tutorial updated #1208
Merged
33 commits
- `ffbce15` add kdd2020 tutorials for knowledge-aware recommendations (Leavingseason)
- `141eb91` v0: ready for running (Leavingseason)
- `184d289` add environment config files (Leavingseason)
- `8f37eb8` text changes (Leavingseason)
- `70f0c47` update notebook step1 (Leavingseason)
- `eacac58` update notebook step2 (Leavingseason)
- `9db5623` update notebook step3 (Leavingseason)
- `a38528d` update notebook steps (Leavingseason)
- `aa6d9d9` add README (yueguoguo)
- `1949734` update readme (yueguoguo)
- `6238d41` Merge pull request #1164 from microsoft/le/kdd_tutorial (Leavingseason)
- `171d244` update notebooks; move functions to utils (Leavingseason)
- `681239e` update notebook step 3 (Leavingseason)
- `c101ad7` update step1 and step5 (Leavingseason)
- `5918168` fix LightGCN bug and update step2 step5 (Leavingseason)
- `d840596` add reco_gpu_kdd.yaml (Leavingseason)
- `d7c0c0e` delete unused folder; add cpu yaml (Leavingseason)
- `1b40882` update reco_cpu_kdd.yaml (Leavingseason)
- `a2679a6` update yaml config: remove pytorch and fastai (Leavingseason)
- `950dfd8` Update README.md (Leavingseason)
- `a9aa7ed` add scripts for subgraph analysis (Leavingseason)
- `cc9c645` Update reco_gpu_kdd.yaml (miguelgfierro)
- `03d3b19` Merge branch 'staging' into kdd2020_tutorial (Leavingseason)
- `283a3bd` Merge branch 'staging' into kdd2020_tutorial (Leavingseason)
- `e884a69` update yaml (Leavingseason)
- `d854c39` Adjust structure; update comments (Leavingseason)
- `df9d996` add test cases (Leavingseason)
- `9394ede` add gensim to yaml env config (Leavingseason)
- `464f5fb` add liscense info (Leavingseason)
- `b55f3d3` move the tutorial to examples/07_tutorials (Leavingseason)
- `7058113` add yaml and sh files (Leavingseason)
- `e13cf67` update step4 (Leavingseason)
- `2d7249d` update README (Leavingseason)
@@ -0,0 +1,46 @@

# Environment setup
The following setup instructions assume a Linux system; they were tested on Ubuntu Linux.
We use conda to install packages and manage the virtual environment. Run `conda list` to check whether conda is installed on your machine. If it is not, follow the instructions at https://conda.io/projects/conda/en/latest/user-guide/install/linux.html to install either Miniconda or Anaconda (preferred) before proceeding.

1. Clone the repository:
   ```bash
   git clone https://github.com/microsoft/recommenders
   ```

1. Navigate to the tutorial folder. The tutorial materials are located in `recommenders/examples/07_tutorials/KDD2020-tutorial`.
   ```bash
   cd recommenders/examples/07_tutorials/KDD2020-tutorial
   ```
1. Download the dataset:
   1. Download the dataset for the hands-on experiments and unzip it into `data_folder`:
      ```bash
      wget https://recodatasets.blob.core.windows.net/kdd2020/data_folder.zip
      unzip data_folder.zip -d data_folder
      ```
      After unzipping, `data_folder` contains two folders: `raw` and `my_cached`. The `raw` folder contains the original .txt files from the COVID MAG dataset. The `my_cached` folder contains processed data files; if you miss some steps during the hands-on tutorial, you can catch up by copying the corresponding files into the experiment folders.
1. Install the dependencies:
   1. Model pre-training uses a tool that converts the original data into embeddings, and this tool requires `g++`. The following command installs `g++` on Ubuntu and other Debian-based Linux systems:
      ```bash
      sudo apt-get install g++
      ```
   1. The Python code runs in a conda environment with the dependencies installed. Create and activate this environment from the `reco_gpu_kdd.yaml` file provided with the tutorial:
      ```bash
      conda env create -n kdd_tutorial_2020 -f reco_gpu_kdd.yaml
      conda activate kdd_tutorial_2020
      ```
   1. The tutorial is conducted with Jupyter notebooks. Register the newly created conda environment as a kernel with the Jupyter notebook server:
      ```bash
      python -m ipykernel install --user --name kdd_tutorial_2020 --display-name "Python (kdd tutorial)"
      ```

# Tutorial notebooks/scripts
After setup, launch the notebooks locally with:
```bash
jupyter notebook --port=8080
```
Then open the notebooks in a browser at `localhost:8080`.
Alternatively, if the Jupyter notebook server runs on a remote machine, launch it with the following command, replacing `10.214.70.89` with that machine's IP address:
```bash
jupyter notebook --no-browser --ip=10.214.70.89 --port=8080
```
From a local browser, open the notebooks at `10.214.70.89:8080`.
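Before opening the notebooks, it may help to confirm that the setup steps above took effect. The sketch below is a hypothetical helper (`check_setup` is not part of the repository) and assumes it is run from the tutorial folder; it only checks that `g++` is on the `PATH` and that `data_folder` contains the `raw` and `my_cached` subfolders described above. It does not verify the conda environment or the Jupyter kernel.

```python
import shutil
from pathlib import Path


def check_setup(tutorial_dir="."):
    """Hypothetical preflight check; returns a list of problems (empty if none found)."""
    problems = []
    # g++ is required by the embedding tool used during model pre-training.
    if shutil.which("g++") is None:
        problems.append("g++ not found on PATH")
    # The unzipped dataset is expected to provide these two subfolders.
    data_dir = Path(tutorial_dir) / "data_folder"
    for sub in ("raw", "my_cached"):
        if not (data_dir / sub).is_dir():
            problems.append(f"missing folder: {data_dir / sub}")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("Setup looks complete." if not issues else "\n".join(issues))
```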
@@ -0,0 +1,61 @@
```yaml
data:
  doc_size: 15 # each feature has a fixed length of doc_size: documents longer than doc_size words are truncated to doc_size words, and shorter ones are padded with 0
  his_size: 20 # max length of a user's click history: only the last his_size clicks are kept when the history is longer, and shorter histories are padded with 0
  word_size: 194755 # word vocabulary size
  entity_size: 57267 # entity vocabulary size
  data_format: dkn

info:
  metrics:
    - auc
  pairwise_metrics:
    - group_auc
    - mean_mrr
    - ndcg@2;4;6
  show_step: 10000 # print the loss every show_step batches

model:
  method : classification
  activation:
    - sigmoid
  attention_activation: relu
  attention_dropout: 0.0
  attention_layer_sizes: 32
  dim: 32 # word embedding dim
  use_entity: true # use entity embedding
  use_context: true # use context embedding

  entity_dim: 32 # entity embedding dim
  entity_embedding_method: TransE
  transform: true # add a transform layer for entity and context embeddings

  dropout:
    - 0.0
  filter_sizes: # window sizes of the KCNN filters
    - 1
    - 2
    - 3
  layer_sizes: # layer sizes for the final prediction score layer
    - 300
  # model_type: DKN_without_context
  model_type: dkn
  num_filters: 50 # number of filters for each filter size in the KCNN part
  infer_model_name : epoch_2

train:
  batch_size: 100
  embed_l1: 0.000
  embed_l2: 0.000001
  epochs: 50
  init_method: uniform
  init_value: 0.01
  layer_l1: 0.000
  layer_l2: 0.000001
  learning_rate: 0.00005
  loss: log_loss
  optimizer: adam
  save_model: True
  save_epoch : 1 # save the model every save_epoch epochs
  enable_BN : False
  is_clip_norm: False
  max_grad_norm: 0.5
```
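The `doc_size` and `his_size` comments in the DKN config above describe two fixed-length rules. The sketch below illustrates them in plain Python under stated assumptions: the helper names `fix_doc_length` and `fix_history_length` are hypothetical (they are not the repository's preprocessing code), and because the config does not say where the zero padding goes, this sketch assumes it is appended at the end of each sequence.

```python
DOC_SIZE = 15  # data.doc_size in the DKN config above
HIS_SIZE = 20  # data.his_size in the DKN config above
PAD_ID = 0     # the config pads with 0


def fix_doc_length(word_ids, doc_size=DOC_SIZE):
    """Keep the first doc_size IDs, or pad with PAD_ID (assumed: at the end)."""
    trimmed = list(word_ids[:doc_size])
    return trimmed + [PAD_ID] * (doc_size - len(trimmed))


def fix_history_length(clicked_ids, his_size=HIS_SIZE):
    """Keep the last his_size clicks, then pad with PAD_ID (assumed: at the end)."""
    # Guard his_size == 0, because clicked_ids[-0:] would return the whole list.
    kept = list(clicked_ids[-his_size:]) if his_size > 0 else []
    return kept + [PAD_ID] * (his_size - len(kept))


print(fix_doc_length([5, 8, 13], doc_size=5))      # [5, 8, 13, 0, 0]
print(fix_history_length(list(range(1, 26)))[:3])  # [6, 7, 8]
```

The second call keeps clicks 6 to 25 out of 1 to 25, matching "keep the last `his_size` clicks".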
@@ -0,0 +1,22 @@
```yaml
#model
model:
  model_type : "lightgcn"
  embed_size : 64 # the embedding dimension of users and items
  n_layers : 3 # number of layers of the model

#train
train:
  batch_size : 1024
  decay : 0.0001 # l2 regularization for embedding parameters
  epochs : 1000 # number of epochs for training
  learning_rate : 0.001
  eval_epoch : -1 # if not -1, evaluate the model every eval_epoch epochs; -1 means no evaluation is performed during training
  top_k : 20 # number of items to recommend when calculating evaluation metrics

#show info
#metric : "recall", "ndcg", "precision", "map"
info:
  save_model : True # whether to save the model
  save_epoch : 1 # if save_model is True, save the model every save_epoch epochs
  metrics : ["recall", "ndcg", "precision", "map"] # metrics for evaluation
  MODEL_DIR : ./tests/resources/deeprec/lightgcn/model/lightgcn_model/ # directory of saved models
```
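`embed_size` and `n_layers` in the LightGCN config above control LightGCN's light graph convolution: layer-0 embeddings are propagated over the user-item interaction graph for `n_layers` steps with symmetric normalization 1/sqrt(|N_u| |N_i|), and the final embedding averages the layer-0 to layer-K embeddings with uniform weights 1/(K+1). The pure-Python sketch below illustrates that propagation and a `top_k` ranking on toy data; it is not the repository's implementation, the function names are hypothetical, training (and the `decay` regularizer) is omitted, and a real evaluation would typically exclude items the user already interacted with in training.

```python
import math
import random


def lightgcn_propagate(interactions, n_users, n_items, embed_size=64, n_layers=3, seed=0):
    """Return final (user, item) embeddings after n_layers of LightGCN propagation."""
    rng = random.Random(seed)
    # Layer-0 embeddings (the trainable parameters in LightGCN); randomly initialized here.
    users = [[rng.gauss(0, 0.1) for _ in range(embed_size)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(embed_size)] for _ in range(n_items)]

    # Node degrees |N_u| and |N_i| for the symmetric normalization.
    deg_u, deg_i = [0] * n_users, [0] * n_items
    for u, i in interactions:
        deg_u[u] += 1
        deg_i[i] += 1

    # Running sums over layers, starting with layer 0.
    sum_u = [row[:] for row in users]
    sum_i = [row[:] for row in items]

    for _ in range(n_layers):
        new_u = [[0.0] * embed_size for _ in range(n_users)]
        new_i = [[0.0] * embed_size for _ in range(n_items)]
        for u, i in interactions:
            w = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for d in range(embed_size):
                new_u[u][d] += w * items[i][d]  # user aggregates neighbouring items
                new_i[i][d] += w * users[u][d]  # item aggregates neighbouring users
        users, items = new_u, new_i
        for acc, layer in ((sum_u, users), (sum_i, items)):
            for row_acc, row in zip(acc, layer):
                for d in range(embed_size):
                    row_acc[d] += row[d]

    # Uniform layer combination: alpha_k = 1 / (n_layers + 1).
    scale = 1.0 / (n_layers + 1)
    return ([[x * scale for x in r] for r in sum_u],
            [[x * scale for x in r] for r in sum_i])


def top_k_items(user_emb, item_embs, k=20):
    """Rank items by inner-product score and return the indices of the top k."""
    scores = [sum(a * b for a, b in zip(user_emb, e)) for e in item_embs]
    return sorted(range(len(item_embs)), key=lambda i: scores[i], reverse=True)[:k]


user_embs, item_embs = lightgcn_propagate([(0, 0), (0, 1), (1, 1)], n_users=2, n_items=3)
print(top_k_items(user_embs[0], item_embs, k=2))
```

LightGCN has no feature transformation or nonlinearity between layers, which is why each layer above is only a normalized neighbourhood sum.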