Description
Repetitive elements are a major component of eukaryotic genomes and are often interspersed with functional elements.

Several papers have modified the training loss to downweight repeats, e.g., the GPN family, the PCAD family, and Evo 2. It would be valuable to better understand the effect of the downweighting hyperparameter, and whether there are alternatives (e.g., randomly shuffling repeats with a certain probability, similar in spirit to the data augmentation in GPN-MSA).
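As a rough illustration of the two options above, here is a minimal sketch, not the implementation used by any of the cited models. It assumes repeats are marked by lowercase soft-masking (a common convention in reference genome FASTA files). `repeat_weights`, `weighted_nll`, and `shuffle_repeats` are hypothetical helpers, and the default weight of 0.1 is a placeholder rather than a recommended value. The shuffling variant is one possible reading of the idea: each contiguous soft-masked run is shuffled with probability `p`.

```python
import math
import random
from typing import List, Sequence


def repeat_weights(seq: str, repeat_weight: float = 0.1) -> List[float]:
    # Assumption: repeats are soft-masked (lowercase); uppercase bases get full weight.
    # repeat_weight is the downweighting hyperparameter (placeholder default).
    return [repeat_weight if base.islower() else 1.0 for base in seq]


def weighted_nll(log_probs: Sequence[Sequence[float]],
                 targets: Sequence[int],
                 weights: Sequence[float]) -> float:
    # Weighted mean of per-position negative log-likelihoods.
    # Normalizing by the sum of weights (not the sequence length) is one design
    # choice: the result stays on the scale of a per-token NLL.
    total = sum(-w * lp[t] for lp, t, w in zip(log_probs, targets, weights))
    return total / sum(weights)


def shuffle_repeats(seq: str, p: float, rng: random.Random) -> str:
    # Hypothetical augmentation: with probability p, shuffle the bases inside
    # each contiguous soft-masked run, keeping its composition but destroying order.
    out = list(seq)
    i = 0
    while i < len(seq):
        if seq[i].islower():
            j = i
            while j < len(seq) and seq[j].islower():
                j += 1
            if rng.random() < p:
                run = out[i:j]
                rng.shuffle(run)
                out[i:j] = run
            i = j
        else:
            i += 1
    return "".join(out)


seq = "ACGTacgtAC"
w = repeat_weights(seq, repeat_weight=0.1)
# Uniform predictions over A, C, G, T at every position.
log_probs = [[math.log(0.25)] * 4 for _ in seq]
targets = ["ACGT".index(b.upper()) for b in seq]
print(weighted_nll(log_probs, targets, w))  # ≈ 1.386, i.e. log(4), since every position has the same NLL
print(shuffle_repeats(seq, p=1.0, rng=random.Random(0)))
```

Either mechanism could be swept over its single hyperparameter (`repeat_weight` or `p`) when comparing against the unweighted baseline.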
Hypothesis or Goal
Downweighting repeats will generally improve downstream task performance, but the optimal weight might depend on species and/or repeat family.
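To test whether the optimal weight depends on species and/or repeat family, the single weight could be replaced by a lookup table. The sketch below assumes per-position repeat family labels are available (e.g., derived from RepeatMasker annotations). The species names, family keys, and weight values are hypothetical placeholders, not tuned results.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical weight tables, one per species; values are placeholders to be tuned.
REPEAT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "species_a": {"LINE": 0.1, "SINE": 0.1, "Simple_repeat": 0.5},
    "species_b": {"LINE": 0.2, "LTR": 0.05},
}


def family_weights(families: Sequence[Optional[str]],
                   species: str,
                   default_repeat_weight: float = 0.1) -> List[float]:
    # None marks a non-repeat position (full weight); families missing from the
    # species table, or unknown species, fall back to a shared default weight.
    table = REPEAT_WEIGHTS.get(species, {})
    return [1.0 if fam is None else table.get(fam, default_repeat_weight)
            for fam in families]


print(family_weights([None, "LINE", "DNA", "Simple_repeat"], "species_a"))
# [1.0, 0.1, 0.1, 0.5]
```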
Links
Results
Promoters, VEP task:
- Downweighting repeats improves downstream task performance and also seems to stabilize training.