Small change in dataset yields large change in results

Hello,

I am running SCENIC+ on a paired scATAC/snRNA dataset. I ran the pipeline twice with the same parameters on two almost identical datasets: the first consists of 8100 cells, and the second consists of those same 8100 cells plus an additional 100 cells. The dataset contains 14 cell types, and the extra 100 cells are split fairly evenly across them.

However, the two runs yield quite different results, particularly in the regulons identified for each cell type. When I assign regulons to cell types based on the extended_gene_based_AUC scores and compare the assignments per cell type across the two runs, there is little overlap. For example, the first run (8100 cells) identified 12 regulons as active in Cell Type A, but the second run (with the 100 extra cells added) identified only 5, and the Jaccard index between the two sets is 0.2. I see the same pattern for the other 13 cell types in the dataset.
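
A minimal sketch of the per-cell-type comparison described above, assuming the regulon assignments from each run have already been collected into a dict that maps cell type to a set of regulon names. The function names (`jaccard`, `compare_runs`) and the regulon names are hypothetical placeholders, not SCENIC+ API calls and not results from either run.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of regulon names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def compare_runs(run1, run2):
    """Return {cell_type: Jaccard index} over every cell type seen in either run.

    run1, run2: dicts mapping cell type -> iterable of regulon names assigned
    to that cell type (assumed to be derived beforehand from the AUC scores).
    """
    cell_types = sorted(set(run1) | set(run2))
    return {ct: jaccard(run1.get(ct, ()), run2.get(ct, ())) for ct in cell_types}


# Placeholder assignments for illustration only.
run1 = {"Cell Type A": {"TF1", "TF2", "TF3", "TF4"}}
run2 = {"Cell Type A": {"TF1", "TF5"}}

scores = compare_runs(run1, run2)
print(scores)
```

In this toy input the two sets share one regulon out of five distinct ones, so the index for Cell Type A is 1/5.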

I am quite surprised that an additional 100 cells (less than 2% of the dataset) would make such a difference in the SCENIC+ results, particularly because they are spread out across 14 different cell types. Do you know why this might be happening?

Small change in dataset yields large change in results #627
