Merged
26 changes: 22 additions & 4 deletions README.md
@@ -4,11 +4,12 @@ This branch contains test data to be used for automated testing with the [nf-cor

## Content of this repository

`design.csv`: Experiment design file
`design.csv`: Experiment design file for minimal test dataset
`design_full.csv`: Experiment design file for full test dataset
`reference/`: Genome reference files (iGenomes R64-1-1 Ensembl release)
`testdata/` : FastQ files sub-sampled to 100,000 paired-end reads

## Dataset origin
## Minimal test dataset origin

*S. cerevisiae* paired-end ATAC-seq dataset was obtained from:

@@ -23,7 +24,7 @@ Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structu
| GSM1621343 | SRR1822157 | Osmotic Stress Time 15 C rep1 |
| GSM1621344 | SRR1822158 | Osmotic Stress Time 15 C rep2 |

## Sampling procedure
### Sampling procedure

The example command below was used to sub-sample the raw paired-end FastQ files to 100,000 reads (see [seqtk](https://github.com/lh3/seqtk)).

@@ -33,7 +34,7 @@ seqtk sample -s100 SRR1822153_1.fastq.gz 100000 | gzip > ./sample/SRR1822153_1.f
seqtk sample -s100 SRR1822153_2.fastq.gz 100000 | gzip > ./sample/SRR1822153_2.fastq.gz
```
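
The two commands above use the same seed (`-s100`) for both mates, which is what keeps the sub-sampled read pairs in sync. As a minimal sketch, the same command lines could be generated for several runs as follows. The function name `seqtk_commands` and its defaults are assumptions made for illustration; the script only builds the command strings and does not run seqtk.

```python
import shlex


def seqtk_commands(run_id, n_reads=100000, seed=100, out_dir="./sample"):
    """Build one seqtk pipeline per mate (hypothetical helper).

    Both mates share the same seed so that the sampled reads stay paired.
    """
    commands = []
    for mate in (1, 2):
        fastq = f"{run_id}_{mate}.fastq.gz"
        out_path = f"{out_dir}/{fastq}"
        commands.append(
            f"seqtk sample -s{seed} {shlex.quote(fastq)} {n_reads} "
            f"| gzip > {shlex.quote(out_path)}"
        )
    return commands


if __name__ == "__main__":
    # Run accessions taken from the sample table and commands in this README.
    for run_id in ("SRR1822153", "SRR1822157", "SRR1822158"):
        for command in seqtk_commands(run_id):
            print(command)
```

The printed lines can be reviewed and then pasted into a shell, which avoids executing anything from Python directly.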

## Expected output
### Expected output

To track and test the reproducibility of the pipeline with default parameters, some of the expected outputs are listed below.

@@ -54,3 +55,20 @@ To track and test the reproducibility of the pipeline with default parameters be
| OSMOTIC_STRESS_T15 | 1395 |

These are just guidelines and will change with the use of different software, and with any restructuring of the pipeline away from the current defaults.

## Full test dataset origin

*H. sapiens* paired-end ATAC-seq dataset was obtained from:

Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213-8. [Pubmed](https://www.ncbi.nlm.nih.gov/pubmed/24097267) [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47753)

### Sample information

| GEO_ID | SRA_ID | SAMPLE_NAME |
|-------------|-------------|------------------------|
| GSM1155964 | SRR891275 | CD4+_ATACseq_Day1_Rep1 |
| GSM1155965 | SRR891276 | CD4+_ATACseq_Day1_Rep2 |
| GSM1155966 | SRR891277 | CD4+_ATACseq_Day2_Rep1 |
| GSM1155967 | SRR891278 | CD4+_ATACseq_Day2_Rep2 |
| GSM1155968 | SRR891279 | CD4+_ATACseq_Day3_Rep1 |
| GSM1155969 | SRR891280 | CD4+_ATACseq_Day3_Rep2 |
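
The FastQ locations listed in `design_full.csv` follow a regular pattern built from the SRA run accession. The sketch below reproduces that pattern; it is inferred only from the six rows of that file, so it is restricted to nine-character accessions such as `SRR891275` and may not hold for other accessions. The function name `ena_fastq_urls` is hypothetical.

```python
def ena_fastq_urls(run_id, base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"):
    """Return the (mate 1, mate 2) FastQ URLs for a paired-end run.

    Assumption: the layout <base>/<first 6 chars>/<run>/<run>_<mate>.fastq.gz,
    as seen in design_full.csv for nine-character accessions.
    """
    if len(run_id) != 9:
        raise ValueError(f"pattern only inferred for 9-character accessions: {run_id}")
    prefix = run_id[:6]
    return tuple(
        f"{base}/{prefix}/{run_id}/{run_id}_{mate}.fastq.gz" for mate in (1, 2)
    )


if __name__ == "__main__":
    for url in ena_fastq_urls("SRR891275"):
        print(url)
```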
7 changes: 7 additions & 0 deletions design_full.csv
@@ -0,0 +1,7 @@
group,replicate,fastq_1,fastq_2
CD4_DAY1,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_2.fastq.gz
CD4_DAY1,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_2.fastq.gz
CD4_DAY2,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891277/SRR891277_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891277/SRR891277_2.fastq.gz
CD4_DAY2,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891278/SRR891278_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891278/SRR891278_2.fastq.gz
CD4_DAY3,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891279/SRR891279_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891279/SRR891279_2.fastq.gz
CD4_DAY3,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891280/SRR891280_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891280/SRR891280_2.fastq.gz
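
A design file with this layout can be sanity-checked before it is handed to the pipeline. The following is a minimal sketch, not the pipeline's own validation: the function `check_design` and the specific checks (exact header, integer replicates, no repeated replicate within a group, matching run accession for both mates) are assumptions chosen for illustration.

```python
import csv
import io
from collections import defaultdict

EXPECTED_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2"]

# First two data rows of design_full.csv, used here as example input.
DESIGN_TEXT = """group,replicate,fastq_1,fastq_2
CD4_DAY1,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_2.fastq.gz
CD4_DAY1,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_2.fastq.gz
"""


def run_accession(url, mate):
    """Strip the directory and the _<mate>.fastq.gz suffix from a FastQ URL."""
    name = url.rsplit("/", 1)[-1]
    suffix = f"_{mate}.fastq.gz"
    if not name.endswith(suffix):
        raise ValueError(f"expected a file ending in {suffix}: {url}")
    return name[: -len(suffix)]


def check_design(handle):
    """Validate a design file and return {group: [replicate, ...]}."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    replicates = defaultdict(list)
    for row in reader:
        replicate = int(row["replicate"])  # raises ValueError if not an integer
        if run_accession(row["fastq_1"], 1) != run_accession(row["fastq_2"], 2):
            raise ValueError(f"mates come from different runs: {row}")
        if replicate in replicates[row["group"]]:
            raise ValueError(f"duplicate replicate {replicate} in {row['group']}")
        replicates[row["group"]].append(replicate)
    return dict(replicates)


if __name__ == "__main__":
    print(check_design(io.StringIO(DESIGN_TEXT)))
```

To check the real file, an open file handle can be passed instead of the in-memory example, for instance `check_design(open("design_full.csv", newline=""))`.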