Skip to content

Fix target_train_data_fraction overriding pretrain_data_fraction#1070

Merged
pyeres merged 10 commits intomasterfrom
py/fix_data_fraction
Apr 23, 2020
Merged

Fix target_train_data_fraction overriding pretrain_data_fraction#1070
pyeres merged 10 commits intomasterfrom
py/fix_data_fraction

Conversation

@pyeres
Copy link
Contributor

@pyeres pyeres commented Apr 16, 2020

This PR addresses the bug described in issue #1066 ("Results with pretrain_data_fraction args").

This PR...

  • Adds a new private field to the Task object (_instance_generators) to store instance generators with the appropriate phase-specific data fraction settings.
  • Adds new methods, getter get_instance_generator and setter set_instance_generator, to Task to access the _instance_generators.
  • Updates preprocessing, training, and evaluation code to use the new getter and setter.
  • Updates a test that uses evaluation logic to use the new getter and setter.
  • Adds tests for Task's that an exception is raised when training instance generator is requested without specifying the phase (because phase determines data fraction).

Validation: the config provided in issue #1066 was run with these code changes. Results logged for the task (MNLI) using a pretraining data fraction were more in line with expectation: after 25k steps training accuracy was ~100%, while val and test accuracy were ~75%.

@pyeres pyeres linked an issue Apr 16, 2020 that may be closed by this pull request
@pyeres pyeres changed the title [WIP] Fix target_train_data_fraction overriding pretrain_data_fraction Fix target_train_data_fraction overriding pretrain_data_fraction Apr 16, 2020
@pyeres pyeres marked this pull request as ready for review April 16, 2020 17:45
@HaokunLiu HaokunLiu mentioned this pull request Apr 16, 2020
Copy link
Contributor

@sleepinyourhat sleepinyourhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some naming issues that I'd recommend fixing, but the logic all looks reasonable. Merge when ready.

@pyeres pyeres merged commit 14fae87 into master Apr 23, 2020
@pyeres pyeres deleted the py/fix_data_fraction branch April 23, 2020 17:46
@jeswan jeswan added the jiant-v1-legacy Relevant to versions <= v1.3.2 label Sep 17, 2020
leo-liuzy pushed a commit to leo-liuzy/dynamic_jiant that referenced this pull request Nov 11, 2020
…-mll#1070)

* add private instance generator field to Task w/ getter and setter

* update build_tasks to use new instance generator Task field setter

* update trainer to get phase-appropriate instance generators

* update evaluate to use new Task instance generator getter

* update pred writing task to use new instance generator getter/setter

* add tests for new Task instance generator functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jiant-v1-legacy Relevant to versions <= v1.3.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Results with pretrain_data_fraction args

3 participants