diff --git a/README.md b/README.md
index 299778186..e90ecd3de 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,12 @@ This branch contains test data to be used for automated testing with the [nf-cor
 
 ## Content of this repository
 
-`design.csv`: Experiment design file
+`design.csv`: Experiment design file for minimal test dataset
+`design_full.csv`: Experiment design file for full test dataset
 `reference/`: Genome reference files (iGenomes R64-1-1 Ensembl release)
 `testdata/` : FastQ files sub-sampled to 100,000 paired-end reads
 
-## Dataset origin
+## Minimal test dataset origin
 
 *S. cerevisiae* paired-end ATAC-seq dataset was obtained from:
 
@@ -23,7 +24,7 @@ Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structu
 | GSM1621343 | SRR1822157 | Osmotic Stress Time 15 C rep1 |
 | GSM1621344 | SRR1822158 | Osmotic Stress Time 15 C rep2 |
 
-## Sampling procedure
+### Sampling procedure
 
 The example command below was used to sub-sample the raw paired-end FastQ files to 100,000 reads (see [seqtk](https://github.com/lh3/seqtk)).
 
@@ -33,7 +34,7 @@ seqtk sample -s100 SRR1822153_1.fastq.gz 100000 | gzip > ./sample/SRR1822153_1.f
 seqtk sample -s100 SRR1822153_2.fastq.gz 100000 | gzip > ./sample/SRR1822153_2.fastq.gz
 ```
 
-## Expected output
+### Expected output
 
 To track and test the reproducibility of the pipeline with default parameters below are some of the expected outputs.
 
@@ -54,3 +55,20 @@ To track and test the reproducibility of the pipeline with default parameters be
 | OSMOTIC_STRESS_T15 | 1395 |
 
 These are just guidelines and will change with the use of different software, and with any restructuring of the pipeline away from the current defaults.
+
+## Full test dataset origin
+
+*H. sapiens* paired-end ATAC-seq dataset was obtained from:
+
+Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213-8. [Pubmed](https://www.ncbi.nlm.nih.gov/pubmed/24097267) [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47753)
+
+### Sample information
+
+| GEO_ID      | SRA_ID      | SAMPLE_NAME            |
+|-------------|-------------|------------------------|
+| GSM1155964  | SRR891275   | CD4+_ATACseq_Day1_Rep1 |
+| GSM1155965  | SRR891276   | CD4+_ATACseq_Day1_Rep2 |
+| GSM1155966  | SRR891277   | CD4+_ATACseq_Day2_Rep1 |
+| GSM1155967  | SRR891278   | CD4+_ATACseq_Day2_Rep2 |
+| GSM1155968  | SRR891279   | CD4+_ATACseq_Day3_Rep1 |
+| GSM1155969  | SRR891280   | CD4+_ATACseq_Day3_Rep2 |
diff --git a/design_full.csv b/design_full.csv
new file mode 100644
index 000000000..36de0d29b
--- /dev/null
+++ b/design_full.csv
@@ -0,0 +1,7 @@
+group,replicate,fastq_1,fastq_2
+CD4_DAY1,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_2.fastq.gz
+CD4_DAY1,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_2.fastq.gz
+CD4_DAY2,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891277/SRR891277_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891277/SRR891277_2.fastq.gz
+CD4_DAY2,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891278/SRR891278_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891278/SRR891278_2.fastq.gz
+CD4_DAY3,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891279/SRR891279_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891279/SRR891279_2.fastq.gz
+CD4_DAY3,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891280/SRR891280_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891280/SRR891280_2.fastq.gz
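
One way to sanity-check the new `design_full.csv` before it is used in a test run is sketched below. It assumes only the four columns shown in the diff (`group`, `replicate`, `fastq_1`, `fastq_2`) and checks that each row names a matching `_1`/`_2` FastQ pair and that no group/replicate combination is repeated. The helper name `check_design` and the particular checks are illustrative assumptions, not the pipeline's own validation logic.

```python
import csv
import io
import re

# Column layout taken from the header row of design_full.csv in the diff.
EXPECTED_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2"]


def check_design(text):
    """Parse design-file text and return (group, replicate, run_id) tuples.

    Hypothetical helper for illustration; raises ValueError on a bad
    header, a mismatched FastQ pair, or a duplicate group/replicate.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    samples = []
    seen = set()
    for row in reader:
        # Extract the run accession from the final path component of each URL.
        m1 = re.search(r"/([^/]+)_1\.fastq\.gz$", row["fastq_1"])
        m2 = re.search(r"/([^/]+)_2\.fastq\.gz$", row["fastq_2"])
        if not m1 or not m2 or m1.group(1) != m2.group(1):
            raise ValueError(f"mismatched FastQ pair: {row}")

        key = (row["group"], int(row["replicate"]))
        if key in seen:
            raise ValueError(f"duplicate group/replicate: {key}")
        seen.add(key)
        samples.append((key[0], key[1], m1.group(1)))
    return samples


# First two data rows of design_full.csv, copied from the diff.
EXAMPLE = (
    "group,replicate,fastq_1,fastq_2\n"
    "CD4_DAY1,1,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_1.fastq.gz,"
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891275/SRR891275_2.fastq.gz\n"
    "CD4_DAY1,2,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_1.fastq.gz,"
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR891/SRR891276/SRR891276_2.fastq.gz\n"
)

if __name__ == "__main__":
    for sample in check_design(EXAMPLE):
        print(sample)
```

The same function could be pointed at the real file by reading it with `open("design_full.csv").read()`; the check is purely textual and never downloads the FastQ files.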