Experiment: context size #37

## Description

- Gather genomic region statistics:
  - Results from #36.
  - Results from other regions (e.g. [cCREs tend to be 150–350 bp](https://www.nature.com/articles/s41586-025-09909-9)).
- Propose context sizes, train and evaluate performance on downstream tasks.
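
One way the region statistics above could inform candidate context sizes is to check what fraction of regions fit entirely inside each candidate window. The snippet below is only a minimal sketch: `coverage_by_context` is a hypothetical helper, and the lengths are illustrative placeholders, not results from #36 or from the cited paper.

```python
from bisect import bisect_right


def coverage_by_context(region_lengths, context_sizes):
    """Return, for each context size, the fraction of regions whose
    full length (in bp) fits within that context size."""
    if not region_lengths:
        raise ValueError("region_lengths must not be empty")
    sorted_lengths = sorted(region_lengths)
    n = len(sorted_lengths)
    # bisect_right counts how many lengths are <= the context size.
    return {c: bisect_right(sorted_lengths, c) / n for c in context_sizes}


# Illustrative placeholder lengths in bp (NOT measured data).
example_lengths = [150, 200, 250, 300, 350, 600]
coverage = coverage_by_context(example_lengths, [128, 256, 512])
for context_size, fraction in coverage.items():
    print(f"context {context_size} bp covers {fraction:.0%} of regions")
```

With real interval files, the placeholder list would be replaced by the observed lengths (end minus start) for each region type.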

## Hypothesis or Goal

A small context size (e.g. 128, 256, or 512 bp) could be enough for good performance on variant effect prediction (VEP). Models with a small context size could afterwards be finetuned on longer-range tasks and reach good performance, perhaps using a hierarchical model where this gLM operates at high resolution but low context, and subsequent layers operate at lower resolution but higher context.

## Links

- [Training code](https://github.com/marin-community/marin/blob/eac18a8a29f8a30dba7018ad54d68e3fd5721e4c/experiments/dna/exp37_context_size.py)
- [Analysis code](https://github.com/Open-Athena/bolinas-dna/tree/main/snakemake/analysis/evals_v1)
- Wandb: [1](https://wandb.ai/gonzalobenegas/marin/runs/exp37-genomes-v4-genome_set-animals-intervals-v1_256_128-r02-4e2580?nw=nwusergonzalobenegas)

## Results

- On VEP (see #21), it is unclear whether there is any difference between the 512 bp and 256 bp context sizes (the latter was trained with double the batch size, so tokens per batch are equal).

<img width="1296" height="566" alt="Image" src="https://github.com/user-attachments/assets/24af7e25-f296-4953-84ea-83e4c88591ad" />
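
The batch-size matching used for the 256 bp run can be written out as simple arithmetic. This is a sketch under two assumptions: one token per base pair, and a hypothetical reference batch size of 1024 sequences (the actual value is in the training code, not stated here).

```python
def tokens_per_batch(batch_size, context_size):
    """Total tokens processed per optimizer step."""
    return batch_size * context_size


def matched_batch_size(ref_batch_size, ref_context, new_context):
    """Batch size that keeps tokens per batch equal when the context changes."""
    total = tokens_per_batch(ref_batch_size, ref_context)
    if total % new_context != 0:
        raise ValueError("token budget is not divisible by the new context size")
    return total // new_context


# Hypothetical reference batch size for the 512 bp run (assumption).
ref_batch = 1024
new_batch = matched_batch_size(ref_batch, ref_context=512, new_context=256)

# Halving the context doubles the batch size, so the token budget is unchanged.
assert tokens_per_batch(new_batch, 256) == tokens_per_batch(ref_batch, 512)
print(new_batch)
```

Matching tokens per batch keeps the per-step token budget equal across runs, though the number of sequences per step (and so the gradient noise) still differs.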
