Closed
Description
- Gather genomic region statistics:
  - Results from Genomic region statistics #36.
  - Results from other regions (e.g. cCREs tend to be 150–350 bp).
- Propose context sizes, train and evaluate performance on downstream tasks.
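
One possible way to turn region statistics into a context-size proposal is to check what fraction of regions fits entirely inside each candidate window. The sketch below is only an illustration: `coverage_by_context` is a hypothetical helper, and the example lengths are made up, not results from #36.

```python
from bisect import bisect_right


def coverage_by_context(lengths, context_sizes):
    """Return, for each context size, the fraction of regions that fit in it.

    lengths: region lengths in bp.
    context_sizes: candidate context sizes in bp.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths)
    total = len(ordered)
    # bisect_right counts how many sorted lengths are <= the context size.
    return {size: bisect_right(ordered, size) / total for size in context_sizes}


# Hypothetical region lengths in bp, for illustration only.
example_lengths = [150, 200, 250, 300, 350, 600, 1200]
for size, fraction in coverage_by_context(example_lengths, [128, 256, 512]).items():
    print(f"context {size} bp covers {fraction:.0%} of regions")
```

A coverage table like this only says whether a region fits in the window; it says nothing about downstream performance, which still has to be measured by training and evaluation.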
Hypothesis or Goal
A small context size (e.g. 128, 256, or 512 bp) could be enough for good performance on variant effect prediction (VEP). Models with a small context size could afterwards be fine-tuned on longer-range tasks and still reach good performance, perhaps using a hierarchical model in which this gLM operates at high resolution but low context, and subsequent layers operate at lower resolution but higher context.
Links
Training code
Analysis code
Wandb: 1
Results
- On VEP (see Experiment: promoters YOLO run #21), it's unclear whether there is any difference between the 512 bp and 256 bp context sizes (the latter was trained with double the batch size, so tokens per batch are equal).
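
For reference, the equal-tokens-per-batch comparison can be sketched as below. This assumes one token per base pair (so context size in bp equals sequence length in tokens); `batch_size_for_context` and the reference batch size of 256 are hypothetical and not taken from the training code.

```python
def batch_size_for_context(context_size, tokens_per_batch):
    """Return the batch size that keeps tokens per batch fixed.

    Assumes one token per bp, so each sequence contributes
    `context_size` tokens to the batch.
    """
    if context_size <= 0:
        raise ValueError("context_size must be positive")
    if tokens_per_batch % context_size != 0:
        raise ValueError("tokens_per_batch must be divisible by context_size")
    return tokens_per_batch // context_size


# Hypothetical reference run: 512 bp context with batch size 256.
reference_context = 512
reference_batch_size = 256
tokens_per_batch = reference_context * reference_batch_size

for context_size in (128, 256, 512):
    batch_size = batch_size_for_context(context_size, tokens_per_batch)
    print(f"context {context_size} bp -> batch size {batch_size}")
```

Halving the context doubles the batch size, which matches the 512 bp vs. 256 bp setup described above. Note that this holds tokens per step constant, but the number of sequences per step differs between runs.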
