Description
Lines 380 to 393 in c36b74e
```python
if task.name in pretrain_task_names:
    log.info("\tCreating trimmed pretraining-only version of " + task.name + " train.")
    task.train_data = _get_instance_generator(
        task.name, "train", preproc_dir, fraction=args.pretrain_data_fraction
    )
    pretrain_tasks.append(task)
# When using target_train_data_fraction, we need modified iterators
# only for training datasets at do_target_task_training time.
if task.name in target_task_names:
    log.info("\tCreating trimmed target-only version of " + task.name + " train.")
    task.train_data = _get_instance_generator(
        task.name, "train", preproc_dir, fraction=args.target_train_data_fraction
    )
    target_tasks.append(task)
```
If a task is listed in both pretrain_tasks and target_tasks, target_train_data_fraction always overrides pretrain_data_fraction, because the second assignment to task.train_data replaces the first. This happens at line 391 of jiant/preprocess.py (see the snippet above), and it is true even if do_target_task_training = 0. Note that the instructions in defaults.conf say: "If you want to train and evaluate on a single task without doing any new pretraining, you should set target_tasks and pretraining_tasks to the same task, set do_pretrain to 1, and do_target_task_training to 0."
Thus, if I follow that recommendation and set pretrain_data_fraction < 1 without overriding the default target_train_data_fraction, the data fraction is overridden to 1.
As an example, this occurs if I pass in the following as overrides for defaults.conf: "pretrain_tasks = qnli, target_tasks = qnli, pretrain_data_fraction = 0.1, do_pretrain = 1, do_target_task_training = 0, do_full_eval = 1".
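For anyone who wants to see the override in isolation, here is a minimal sketch of the same control flow. It is not jiant code: `_get_instance_generator` is replaced by a hypothetical stub that only records the requested fraction, `assign_train_data` is a made-up wrapper around the two `if` blocks, and the default `target_train_data_fraction = 1` is assumed from the behavior described above.

```python
from types import SimpleNamespace


def _get_instance_generator(task_name, split, preproc_dir, fraction=None):
    # Hypothetical stand-in for the jiant helper: instead of building a
    # data iterator, it just records which fraction was requested.
    return {"task": task_name, "split": split, "fraction": fraction}


def assign_train_data(task, pretrain_task_names, target_task_names, args, preproc_dir="."):
    # Hypothetical wrapper mirroring the quoted snippet: two independent
    # `if` statements, so the second assignment wins when the task is in
    # both lists, regardless of do_target_task_training.
    if task.name in pretrain_task_names:
        task.train_data = _get_instance_generator(
            task.name, "train", preproc_dir, fraction=args.pretrain_data_fraction
        )
    if task.name in target_task_names:
        task.train_data = _get_instance_generator(
            task.name, "train", preproc_dir, fraction=args.target_train_data_fraction
        )
    return task


# Same overrides as in the example above; target_train_data_fraction is
# left at its assumed default of 1.
args = SimpleNamespace(
    pretrain_data_fraction=0.1,
    target_train_data_fraction=1,
    do_pretrain=1,
    do_target_task_training=0,
)
task = SimpleNamespace(name="qnli", train_data=None)
assign_train_data(task, ["qnli"], ["qnli"], args)
print(task.train_data["fraction"])  # prints 1, not 0.1
```

With qnli in only the pretraining list, the same call leaves the fraction at 0.1, which suggests the override comes purely from the task appearing in both lists.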