Repository for our TACL 2022 paper "MuSiQue: Multi-hop Questions via Single-hop Question Composition"
MuSiQue is distributed under a CC BY 4.0 License.
Usage Caution: MuSiQue was created by composing questions from seed single-hop datasets (SQuAD, T-REx, Natural Questions, MLQA, Zero Shot RE). If you use any of these seed datasets in any way (e.g., pretraining on them), note that single-hop questions used in MuSiQue's dev/test sets may occur in the training sets of these seed datasets. To help avoid information leakage, we are releasing the IDs of the single-hop questions used in the MuSiQue dev/test sets. Once you download the data below, these IDs and the corresponding questions will be in data/dev_test_singlehop_questions_v1.0.json. If you use our seed single-hop datasets in any way in your model, please be sure to avoid using any single-hop question whose ID is present in this file.
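As a minimal sketch of this kind of filtering, assuming the released file can be reduced to a set of single-hop question IDs and that each seed example carries an `id` field (both are assumptions; check the actual schema of data/dev_test_singlehop_questions_v1.0.json and of your seed dataset first). The helper name `drop_leaked_examples` is hypothetical and not part of this repository.

```python
from typing import Dict, Iterable, List, Set


def drop_leaked_examples(
    seed_examples: Iterable[Dict], leaked_ids: Set[str], id_field: str = "id"
) -> List[Dict]:
    """Keep only seed examples whose ID is not used in MuSiQue dev/test.

    `id_field` is a hypothetical field name; adjust it to the seed dataset.
    """
    return [ex for ex in seed_examples if ex[id_field] not in leaked_ids]


# Toy data for illustration only; real IDs come from the released file.
leaked = {"q2"}
seed = [{"id": "q1", "question": "A?"}, {"id": "q2", "question": "B?"}]
filtered = drop_leaked_examples(seed, leaked)
print([ex["id"] for ex in filtered])  # prints ['q1']
```

Building the ID set once and testing membership per example keeps the filter linear in the size of the seed dataset.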
To download MuSiQue, either run the following script or download it manually from here.
bash download_data.sh
The result will be stored in the data/ directory. It contains (i) the train, dev and test sets of MuSiQue-Ans and MuSiQue-Full, and (ii) the single-hop questions and IDs from the source datasets (squad, natural questions, trex, mlqa, zerore) that are part of the dev or test sets of MuSiQue.
We're releasing the model predictions (in the official format) for 4 models on the dev sets of MuSiQue-Ans and MuSiQue-Full. To get them, you can run the following script or download them manually from here.
bash download_predictions.sh
You can use evaluate_v1.0.py to evaluate your predictions against the ground truth. For example:
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
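For intuition about a metric such as answer_f1, here is a sketch of a token-overlap F1 between a predicted and a gold answer. It assumes a SQuAD-style definition with lowercasing and whitespace tokenization; that is an assumption, and evaluate_v1.0.py remains the authoritative implementation (it may normalize answers differently). The function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 (assumed SQuAD-style; not the official scorer)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("Barack Obama", "Obama"), 3))  # prints 0.667
```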
These are the results you would get on the MuSiQue-Answerable and MuSiQue-Full validation sets for each of the four models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model).
# MuSiQue-Answerable
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.423, "support_f1": 0.676}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.473, "support_f1": 0.723}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.456, "support_f1": 0.778}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.497, "support_f1": 0.792}
# MuSiQue-Full
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.406, "support_f1": 0.325, "group_answer_sufficiency_f1": 0.22, "group_support_sufficiency_f1": 0.252}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.486, "support_f1": 0.522, "group_answer_sufficiency_f1": 0.344, "group_support_sufficiency_f1": 0.42}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.463, "support_f1": 0.75, "group_answer_sufficiency_f1": 0.321, "group_support_sufficiency_f1": 0.447}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.498, "support_f1": 0.777, "group_answer_sufficiency_f1": 0.328, "group_support_sufficiency_f1": 0.431}
We have two leaderboards for MuSiQue: MuSiQue-Answerable and MuSiQue-Full.
Once you have the test set predictions in the official format, you only need to upload the files to the above leaderboards. Feel free to contact me (Harsh) if you have any questions.
We've released the code that we used for the experiments in the paper. If you're interested in trying our trained models, training them from scratch, viewing their predictions, or generating predictions from your own trained model, follow the steps below.
# Set env.
conda create -n musique python=3.8 -y && conda activate musique
# Set allennlp in root directory
git clone https://github.com/allenai/allennlp
cd allennlp
git checkout v2.1.0
git apply ../allennlp.diff # small diff to get longformer global attention to work correctly.
cd ..
pip install allennlp==2.1.0 # we only need dependencies of allennlp
pip uninstall -y allennlp
pip install gdown==v4.5.1
python -m nltk.downloader stopwords
pip uninstall -y transformers
pip install transformers==4.7.0 # we used this version of transformers
Our models were developed using a different (non-official) format of the dataset files. So to run our code, you'll first need to download the dataset files in the raw format.
python download_raw_data.py
Note that the officially released data and the data we've used here differ only in format (e.g., they use different names for JSON fields); they are not qualitatively different. Take a look at raw_data_to_official_format.py if you're interested.
We've done experiments on 4 datasets (MuSiQue-Ans, MuSiQue-Full, HotpotQA-20K, 2WikiMultihopQA-20K) with 4 multihop models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model) where possible. See Table 1. You can explore each combination using the instruction toggle below.
For each combination, you'll see instructions on how to (i) download a trained model, (ii) train a model from scratch, (iii) download model predictions, and (iv) generate predictions with a trained or a downloaded model.
Our models are implemented in allennlp. If you're familiar with it, using the code should be pretty straightforward. The only difference is that instead of using the allennlp command, we use run.py as the entrypoint, which mainly loads allennlp_lib so that our allennlp code (readers, models, predictors, etc.) is available.
MuSiQue-Ans
End2End Model [EE]
# Experiment name
end2end_model_for_musique_ans_dataset
# Download the trained model
python download_models.py end2end_model_for_musique_ans_dataset
# Train the model from scratch
python run.py train experiment_configs/end2end_model_for_musique_ans_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_musique_ans_dataset
# Download model predictions
python download_raw_predictions.py end2end_model_for_musique_ans_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/end2end_model_for_musique_ans_dataset/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl
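The prediction commands for the single-model case follow a regular pattern, so they can be assembled programmatically. The sketch below only reuses the flags and paths shown above; the helper name `build_predict_command` is hypothetical, and the output path convention (same file name as the input, under the experiment's predictions directory) is inferred from the example rather than documented.

```python
import shlex
from typing import List


def build_predict_command(
    experiment: str,
    input_path: str,
    predictor: str,
    batch_size: int = 16,
    cuda_device: int = 0,
) -> List[str]:
    """Assemble the `run.py predict` argv for one experiment (a sketch)."""
    model_dir = f"serialization_dir/{experiment}"
    # Assumed convention: the output keeps the input file's base name.
    input_name = input_path.split("/")[-1]
    return [
        "python", "run.py", "predict", f"{model_dir}/model.tar.gz", input_path,
        "--output-file", f"{model_dir}/predictions/{input_name}",
        "--predictor", predictor,
        "--batch-size", str(batch_size),
        "--cuda-device", str(cuda_device),
        "--silent",
    ]


cmd = build_predict_command(
    "end2end_model_for_musique_ans_dataset",
    "raw_data/musique_ans_dev.jsonl",
    "transformer_rc",
)
print(shlex.join(cmd))
```

Returning a list rather than a single string lets you pass the result directly to `subprocess.run(cmd, check=True)` without shell quoting concerns.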
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
# Experiment name
select_and_answer_model_selector_for_musique_ans
# Download the trained model
python download_models.py select_and_answer_model_selector_for_musique_ans
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_selector_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_ans
# Download model predictions
python download_raw_predictions.py select_and_answer_model_selector_for_musique_ans
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_train.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
# Experiment name
select_and_answer_model_answerer_for_musique_ans
# Download the trained model
python download_models.py select_and_answer_model_answerer_for_musique_ans
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_ans
# Download model predictions
python download_raw_predictions.py select_and_answer_model_answerer_for_musique_ans
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl
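When one model consumes another model's predictions, the output file name in the commands above appears to be the upstream prediction path with every "/" replaced by "__". The sketch below reproduces that pattern; it is inferred from the example commands, not from the code, and the helper name `chained_output_path` is hypothetical.

```python
def chained_output_path(experiment: str, upstream_predictions: str) -> str:
    """Derive the output path for a model that reads upstream predictions.

    Assumed convention: flatten the upstream path by replacing "/" with "__".
    """
    flattened = upstream_predictions.replace("/", "__")
    return f"serialization_dir/{experiment}/predictions/{flattened}"


path = chained_output_path(
    "select_and_answer_model_answerer_for_musique_ans",
    "serialization_dir/select_and_answer_model_selector_for_musique_ans"
    "/predictions/musique_ans_dev.jsonl",
)
print(path)
```

Flattening the upstream path into the file name keeps predictions produced from different upstream models from overwriting each other in the same predictions directory.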
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
# Experiment name
execution_model_decomposer_for_musique_ans_and_full
# Download the trained model
python download_models.py execution_model_decomposer_for_musique_ans_and_full
# Train the model from scratch
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
# Download model predictions
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
# Experiment name
execution_by_end2end_model_for_musique_ans
# Download the trained model
python download_models.py execution_by_end2end_model_for_musique_ans
# Train the model from scratch
python run.py train experiment_configs/execution_by_end2end_model_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_musique_ans
# Download model predictions
python download_raw_predictions.py execution_by_end2end_model_for_musique_ans
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_end2end_model_for_musique_ans/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl
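The value passed to --predictor-args is a JSON object, which is easy to mistype by hand. As a sketch, it can be generated from a Python dict; the keys below are the ones shown in the command above, and the compact separators are chosen only to reproduce the same string.

```python
import json
import shlex

# Keys copied from the command above; values match the MuSiQue-Ans setting.
predictor_args = {
    "predict_answerability": False,
    "skip_distractor_paragraphs": False,
    "use_predicted_decomposition": True,
}

# Compact separators avoid spaces, so the JSON matches the form used above.
encoded = json.dumps(predictor_args, separators=(",", ":"))

# shlex.quote wraps the JSON in single quotes for safe use in a shell.
print("--predictor-args " + shlex.quote(encoded))
```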
Execution by Select+Answer Model [EX(SA)]
The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.
# Decomposer Model
# Experiment name
execution_model_decomposer_for_musique_ans_and_full
# Download the trained model
python download_models.py execution_model_decomposer_for_musique_ans_and_full
# Train the model from scratch
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
# Download model predictions
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
# Experiment name
execution_by_select_and_answer_model_selector_for_musique_ans
# Download the trained model
python download_models.py execution_by_select_and_answer_model_selector_for_musique_ans
# Train the model from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans
# Download model predictions
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_ans
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_single_hop_version_train.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer of Executor Model
# Experiment name
execution_by_select_and_answer_model_answerer_for_musique_ans
# Download the trained model
python download_models.py execution_by_select_and_answer_model_answerer_for_musique_ans
# Train the model from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans
# Download model predictions
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_ans
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl
MuSiQue-Full
End2End Model [EE]
# Experiment name
end2end_model_for_musique_full_dataset
# Download the trained model
python download_models.py end2end_model_for_musique_full_dataset
# Train the model from scratch
python run.py train experiment_configs/end2end_model_for_musique_full_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_musique_full_dataset
# Download model predictions
python download_raw_predictions.py end2end_model_for_musique_full_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/end2end_model_for_musique_full_dataset/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
# Experiment name
select_and_answer_model_selector_for_musique_full
# Download the trained model
python download_models.py select_and_answer_model_selector_for_musique_full
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_selector_for_musique_full.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_full
# Download model predictions
python download_raw_predictions.py select_and_answer_model_selector_for_musique_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_train.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
# Experiment name
select_and_answer_model_answerer_for_musique_full
# Download the trained model
python download_models.py select_and_answer_model_answerer_for_musique_full
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_full.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_full
# Download model predictions
python download_raw_predictions.py select_and_answer_model_answerer_for_musique_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_full/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
# Experiment name
execution_model_decomposer_for_musique_ans_and_full
# Download the trained model
python download_models.py execution_model_decomposer_for_musique_ans_and_full
# Train the model from scratch
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
# Download model predictions
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
# Experiment name
execution_by_end2end_model_for_musique_full
# Download the trained model
python download_models.py execution_by_end2end_model_for_musique_full
# Train the model from scratch
python run.py train experiment_configs/execution_by_end2end_model_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_musique_full
# Download model predictions
python download_raw_predictions.py execution_by_end2end_model_for_musique_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_end2end_model_for_musique_full/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl
Execution by Select+Answer Model [EX(SA)]
The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.
# Decomposer Model
# Experiment name
execution_model_decomposer_for_musique_ans_and_full
# Download the trained model
python download_models.py execution_model_decomposer_for_musique_ans_and_full
# Train the model from scratch
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
# Download model predictions
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
# Experiment name
execution_by_select_and_answer_model_selector_for_musique_full
# Download the trained model
python download_models.py execution_by_select_and_answer_model_selector_for_musique_full
# Train the model from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full
# Download model predictions
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_single_hop_version_train.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer of Executor Model
# Experiment name
execution_by_select_and_answer_model_answerer_for_musique_full
# Download the trained model
python download_models.py execution_by_select_and_answer_model_answerer_for_musique_full
# Train the model from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full
# Download model predictions
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_full
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl
HotpotQA-20K
End2End Model [EE]
# Experiment name
end2end_model_for_hotpotqa_20k_dataset
# Download the trained model
python download_models.py end2end_model_for_hotpotqa_20k_dataset
# Train the model from scratch
python run.py train experiment_configs/end2end_model_for_hotpotqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_hotpotqa_20k_dataset
# Download model predictions
python download_raw_predictions.py end2end_model_for_hotpotqa_20k_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/end2end_model_for_hotpotqa_20k_dataset/model.tar.gz \
raw_data/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
# Experiment name
select_and_answer_model_selector_for_hotpotqa_20k
# Download the trained model
python download_models.py select_and_answer_model_selector_for_hotpotqa_20k
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_selector_for_hotpotqa_20k.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k
# Download model predictions
python download_raw_predictions.py select_and_answer_model_selector_for_hotpotqa_20k
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
raw_data/hotpotqa_train_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
raw_data/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
# Experiment name
select_and_answer_model_answerer_for_hotpotqa_20k
# Download the trained model
python download_models.py select_and_answer_model_answerer_for_hotpotqa_20k
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_answerer_for_hotpotqa_20k.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k
# Download model predictions
python download_raw_predictions.py select_and_answer_model_answerer_for_hotpotqa_20k
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl
2WikiMultihopQA-20K
End2End Model [EE]
# Experiment name
end2end_model_for_2wikimultihopqa_20k_dataset
# Download the trained model
python download_models.py end2end_model_for_2wikimultihopqa_20k_dataset
# Train the model from scratch
python run.py train experiment_configs/end2end_model_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset
# Download model predictions
python download_raw_predictions.py end2end_model_for_2wikimultihopqa_20k_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
# Experiment name
select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
# Download the trained model
python download_models.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
# Download model predictions
python download_raw_predictions.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_train_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
# Experiment name
select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
# Download the trained model
python download_models.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
# Train the model from scratch
python run.py train experiment_configs/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
# Download model predictions
python download_raw_predictions.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
# Experiment name
execution_model_decomposer_for_2wikimultihopqa
# Download the trained model
python download_models.py execution_model_decomposer_for_2wikimultihopqa
# Train the model from scratch
python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa
# Download model predictions
python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa
# Generate predictions with a trained or a downloaded model
python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
# Experiment name: execution_by_end2end_model_for_2wikimultihopqa
# Download the trained model
python download_models.py execution_by_end2end_model_for_2wikimultihopqa
# Or train it from scratch
python run.py train experiment_configs/execution_by_end2end_model_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_2wikimultihopqa
# Download the raw predictions
python download_raw_predictions.py execution_by_end2end_model_for_2wikimultihopqa
# Or generate the predictions with the model
python run.py predict serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/model.tar.gz \
serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl
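The `--predictor-args` value above is a JSON object passed as a single shell argument, which is easy to get wrong when editing by hand. As a sketch, the same string can be generated in Python. The keys and values are copied from the EX(EE) command above, and the variable names are illustrative only.

```python
import json
import shlex

# Same keys and values as in the EX(EE) predict command above.
predictor_args = {
    "predict_answerability": False,
    "skip_distractor_paragraphs": False,
    "use_predicted_decomposition": True,
}

# Compact separators reproduce the whitespace-free form used in the command.
predictor_args_json = json.dumps(predictor_args, separators=(",", ":"))

# shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
print("--predictor-args " + shlex.quote(predictor_args_json))
```

Writing the options as a Python dict and serializing them avoids mismatched quotes and Python-style `True`/`False` literals, which are not valid JSON.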
Execution by Select+Answer Model [EX(SA)]
The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.
# Decomposer Model
# Experiment name: execution_model_decomposer_for_2wikimultihopqa
# Download the trained model
python download_models.py execution_model_decomposer_for_2wikimultihopqa
# Or train it from scratch
python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa
# Download the raw predictions
python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa
# Or generate the predictions with the model
python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
# Experiment name: execution_by_select_and_answer_model_selector_for_2wikimultihopqa
# Download the trained model
python download_models.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa
# Or train it from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa
# Download the raw predictions
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa
# Or generate the predictions with the model
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_single_hop_version_train_20k.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
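The two selector prediction runs above differ only in the input file. The following sketch builds both command lines from one template. The flags are copied from the commands above; the constant and function names are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List

EXPERIMENT = "execution_by_select_and_answer_model_selector_for_2wikimultihopqa"
INPUT_FILES = [
    "2wikimultihopqa_single_hop_version_train_20k.jsonl",
    "2wikimultihopqa_single_hop_version_dev.jsonl",
]


def selector_predict_command(input_file: str) -> List[str]:
    """Return the argument list for one selector prediction run."""
    return [
        "python", "run.py", "predict",
        f"serialization_dir/{EXPERIMENT}/model.tar.gz",
        f"raw_data/{input_file}",
        "--output-file", f"serialization_dir/{EXPERIMENT}/predictions/{input_file}",
        "--predictor", "inplace_text_ranker",
        "--batch-size", "16",
        "--cuda-device", "0",
        "--silent",
    ]


for input_file in INPUT_FILES:
    # Printed for inspection; to execute, pass the list to
    # subprocess.run(command, check=True) from the repository root.
    print(shlex.join(selector_predict_command(input_file)))
```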
# Answerer of Executor Model
# Experiment name: execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
# Download the trained model
python download_models.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
# Or train it from scratch
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
# Download the raw predictions
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
# Or generate the predictions with the model
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/model.tar.gz \
serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl
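The EX(SA) answerer prediction step above reads three artifacts produced by earlier steps: the decomposer predictions, the selector model archive (via `selector_model_path`), and the answerer model archive. Here is a small pre-flight check, offered as a sketch. The paths are copied from the commands above, the helper name `missing_files` is hypothetical, and the current directory is assumed to be the repository root.

```python
from pathlib import Path
from typing import Iterable, List

# Files the EX(SA) answerer predict command reads, copied from the commands above.
REQUIRED_FILES = [
    "serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl",
    "serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz",
    "serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/model.tar.gz",
]


def missing_files(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the paths (relative to root) that are not existing files."""
    return [path for path in paths if not (Path(root) / path).is_file()]


for path in missing_files(REQUIRED_FILES):
    # Each missing file points at a download or train/predict step to run first.
    print(f"missing: {path}")
```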
If you use this in your work, please cite us:
@article{trivedi2021musique,
title={{M}u{S}i{Q}ue: Multihop Questions via Single-hop Question Composition},
author={Trivedi, Harsh and Balasubramanian, Niranjan and Khot, Tushar and Sabharwal, Ashish},
journal={Transactions of the Association for Computational Linguistics},
year={2022},
publisher={MIT Press}
}