Experiment: repeat downweighting #9

@gonzalobenegas

Description

Repetitive elements are a major component of eukaryotic genomes and are often interspersed with functional elements.

Several papers have modified the training loss to downweight repeats, e.g. the GPN family, the PCAD family, and Evo 2. It would be valuable to better understand the downweighting hyperparameter, and whether there are alternatives (e.g. randomly shuffling repeats with some probability, similar in spirit to the data augmentation in GPN-MSA).
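
To make the two options concrete, here is a minimal pure-Python sketch of both. It is not taken from any of the codebases mentioned above. It assumes repeats are soft-masked, i.e. written in lowercase in the input sequence. The function names and the default values (`repeat_weight=0.1`, `prob=0.5`) are hypothetical placeholders for the hyperparameters this experiment would sweep.

```python
import math
import random


def repeat_weights(seq, repeat_weight=0.1):
    """Per-position loss weights.

    Assumption: repeats are soft-masked (lowercase). Lowercase positions get
    `repeat_weight`; every other position gets 1.0.
    """
    return [repeat_weight if base.islower() else 1.0 for base in seq]


def weighted_nll(probs_of_true_base, weights):
    """Weighted mean negative log-likelihood over positions.

    `probs_of_true_base[i]` is the model probability assigned to the true
    base at position i. Assumes at least one weight is positive.
    """
    total = sum(-w * math.log(p) for p, w in zip(probs_of_true_base, weights))
    return total / sum(weights)


def shuffle_repeats(seq, prob=0.5, rng=None):
    """Alternative to downweighting: for each contiguous lowercase run,
    shuffle its bases in place with probability `prob`."""
    rng = rng or random.Random()
    out = []
    i = 0
    while i < len(seq):
        j = i
        is_repeat = seq[i].islower()
        # Extend j to the end of the current run (repeat or non-repeat).
        while j < len(seq) and seq[j].islower() == is_repeat:
            j += 1
        run = list(seq[i:j])
        if is_repeat and rng.random() < prob:
            rng.shuffle(run)
        out.extend(run)
        i = j
    return "".join(out)


if __name__ == "__main__":
    seq = "ACgtacGG"
    print(repeat_weights(seq))
    print(shuffle_repeats(seq, prob=1.0, rng=random.Random(0)))
```

Setting `repeat_weight=1.0` recovers the unweighted loss and `repeat_weight=0.0` ignores repeats entirely, so a sweep over this value covers both extremes. Shuffling keeps the base composition of each repeat run but destroys its order, and non-repeat positions are left untouched.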

Hypothesis or Goal

Downweighting repeats will generally improve downstream task performance, but the optimal weight might depend on species and/or repeat family.

Links

Wandb report

Results

Promoters, VEP task:

  • Downweighting repeats improves downstream task performance and also seems to stabilize training.
