If task is in pretrain_tasks and target_tasks, target_train_data_fraction always overrides pretrain_data_fraction #918

@avnermay

jiant/jiant/preprocess.py

Lines 380 to 393 in c36b74e

if task.name in pretrain_task_names:
    log.info("\tCreating trimmed pretraining-only version of " + task.name + " train.")
    task.train_data = _get_instance_generator(
        task.name, "train", preproc_dir, fraction=args.pretrain_data_fraction
    )
    pretrain_tasks.append(task)
# When using target_train_data_fraction, we need modified iterators
# only for training datasets at do_target_task_training time.
if task.name in target_task_names:
    log.info("\tCreating trimmed target-only version of " + task.name + " train.")
    task.train_data = _get_instance_generator(
        task.name, "train", preproc_dir, fraction=args.target_train_data_fraction
    )
    target_tasks.append(task)

If a task is specified in both pretrain_tasks and target_tasks, then target_train_data_fraction always overrides pretrain_data_fraction. This happens at line 391 of jiant/preprocess.py (see link above): both branches assign to the same task.train_data attribute, so the second assignment replaces the first. This is true even if do_target_task_training = 0. Note that the instructions in defaults.conf say: "If you want to train and evaluate on a single task without doing any new pretraining, you should set target_tasks and pretraining_tasks to the same task, set do_pretrain to 1, and do_target_task_training to 0."
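To make the overwrite concrete, here is a minimal standalone sketch of the same control flow. It does not import jiant: `get_instance_generator` and `build_tasks_as_reported` are hypothetical stand-ins that only record which fraction was requested, and the task and args objects are plain `SimpleNamespace` values.

```python
from types import SimpleNamespace


def get_instance_generator(task_name, split, fraction):
    """Hypothetical stand-in for _get_instance_generator: records the request only."""
    return {"task": task_name, "split": split, "fraction": fraction}


def build_tasks_as_reported(tasks, pretrain_task_names, target_task_names, args):
    """Mirror the two consecutive `if` blocks quoted from preprocess.py."""
    pretrain_tasks, target_tasks = [], []
    for task in tasks:
        if task.name in pretrain_task_names:
            task.train_data = get_instance_generator(
                task.name, "train", args.pretrain_data_fraction
            )
            pretrain_tasks.append(task)
        if task.name in target_task_names:
            # Same attribute on the same object: this replaces the value set above.
            task.train_data = get_instance_generator(
                task.name, "train", args.target_train_data_fraction
            )
            target_tasks.append(task)
    return pretrain_tasks, target_tasks


# Assumed values: pretrain fraction 0.1, target fraction left at 1.
args = SimpleNamespace(pretrain_data_fraction=0.1, target_train_data_fraction=1)
qnli = SimpleNamespace(name="qnli", train_data=None)
pretrain_tasks, target_tasks = build_tasks_as_reported([qnli], ["qnli"], ["qnli"], args)

# Both lists hold the same object, so the pretraining entry sees the target fraction.
print(pretrain_tasks[0].train_data["fraction"])
```

Because both lists hold a reference to the same task object, the entry in `pretrain_tasks` ends up with the target-task fraction regardless of what was requested for pretraining.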

Thus, if I do what is recommended above and set pretrain_data_fraction < 1, but don't override the default target_train_data_fraction, then the data fraction gets overridden to be 1.
As an example, this occurs if I pass in the following as overrides for defaults.conf: "pretrain_tasks = qnli, target_tasks = qnli, pretrain_data_fraction = 0.1, do_pretrain = 1, do_target_task_training = 0, do_full_eval = 1".
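One possible direction for a fix, sketched here under assumptions rather than taken from the jiant codebase, is to keep a separate train generator per phase so that neither assignment can replace the other. The attribute name `train_data_by_phase`, the phase keys, and the helpers `make_generator` and `assign_train_data` are all hypothetical; the training code would also need to pick the right entry for each phase.

```python
from types import SimpleNamespace


def make_generator(task_name, fraction):
    """Hypothetical stand-in for _get_instance_generator: records the fraction only."""
    return {"task": task_name, "split": "train", "fraction": fraction}


def assign_train_data(task, pretrain_task_names, target_task_names, args):
    """Store one train generator per phase instead of a single shared attribute."""
    task.train_data_by_phase = {}  # hypothetical attribute, not part of jiant
    if task.name in pretrain_task_names:
        task.train_data_by_phase["pretrain"] = make_generator(
            task.name, args.pretrain_data_fraction
        )
    if task.name in target_task_names:
        task.train_data_by_phase["target_train"] = make_generator(
            task.name, args.target_train_data_fraction
        )
    return task


# Same overrides as in the example above: qnli in both task lists.
args = SimpleNamespace(pretrain_data_fraction=0.1, target_train_data_fraction=1)
qnli = assign_train_data(SimpleNamespace(name="qnli"), ["qnli"], ["qnli"], args)

# Each phase keeps the fraction that was requested for it.
print(qnli.train_data_by_phase["pretrain"]["fraction"])
print(qnli.train_data_by_phase["target_train"]["fraction"])
```

Whether this fits jiant's trainer is an open question. A smaller change might be to apply the target fraction only when do_target_task_training is set, but that would still overwrite the pretraining fraction whenever both phases run.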

Labels: bug (something isn't working), help wanted (extra attention is needed), high-priority (fix this before addressing any other major issue), jiant-v1-legacy (relevant to versions <= v1.3.2)
