Commit 045a3ed

committed
update
1 parent 3e72c85 commit 045a3ed

2 files changed: +49 -25 lines changed
README.md

Lines changed: 33 additions & 16 deletions
@@ -7,28 +7,39 @@ regulatory network (GRN) construction, driver regulator identification and regul
 
 ![Overview.png](https://github.com/WPZgithub/CEFCON/blob/main/Overview.png)
 
+CEFCON first uses the graph attention neural networks under a contrastive learning framework to construct reliable GRNs
+for individual developmental cell lineages. Then, CEFCON characterizes the gene regulatory dynamics from a perspective
+of network control theory and identifies the driver regulators that steer cell fate decisions.
+CEFCON also detects the gene regulatory modules (i.e., RGMs) involving the identified driver regulators and measure
+their activities based on the [AUCell](https://github.com/aertslab/AUCell) method.
+
 ## Installation
+This code was originally run on a Linux x86_64 machine with a GTX3090 NVIDIA GPU.
 ### Requirements
+Please ensure that the following packages are installed in order to run the codes.
 - python>=3.8
-- numpy, scipy, pandas, scikit-learn, tqdm
 - [pytorch>=1.8.0](https://pytorch.org/get-started/locally/)
 - [torch-geometric>=2.1.0](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
-- [scanpy>=1.9.0](https://scanpy.readthedocs.io/en/stable/installation.html)
+- [scanpy>=1.8.2](https://scanpy.readthedocs.io/en/stable/installation.html)
 - networkx>=2.8.0
 - cvxpy>=1.2.0
 - gurobipy>=9.5.0
 - [pyscenic>=0.12.0](https://pyscenic.readthedocs.io/en/latest/installation.html)
+- numpy, scipy, pandas, scikit-learn, tqdm
 - Recommended: An NVIDIA GPU with CUDA support for GPU acceleration
-### Optional (for evaluation and other analyses)
+### Optional (for performance evaluation and other analyses)
 - rpy2>=3.4.1
 - R>=3.6
 - PRROC (R package)
-- matplotlib-venn
-- palantir
+- matplotlib>=3.5.3
+- matplotlib-venn>=0.11.7
+- seaborn>=0.12.1
+- [palantir==1.0.1](https://github.com/dpeerlab/palantir)
 ### Install using pip
 ```
 pip install git+https://github.com/WPZgithub/CEFCON.git
 ```
+It may take about 10-20 minutes to install these dependencies.
 
 ### Using GUROBI
 
@@ -46,27 +57,33 @@ We recommend using GPU. If so, you will need to install the GPU version of PyTor
 ## Input data
 
 - `Prior gene interaction network`: an edgelist formatted network file.\
-   We provide prior gene interaction networks for human and mouse respectively, located in `/prior_data`.
+ We provide prior gene interaction networks for human and mouse respectively, located in `/prior_data`.
 - `scRNA-seq data`: a '.csv' file in which rows represent cells and columns represent genes, or a '.h5ad' formatted file with AnnData objects.
 - `Differential expression level`: a 'csv' file contains the log fold change of each gene.
 
-Examples of input data are located in `/example_data`.\
-The pre-processed data in the paper can be downloaded from [here](https://zenodo.org/record/7564872).
+An example of input data (i.e., the hESC dataset with 1,000 highly variable genes) are located in `/example_data`
+All the input data in the paper can be downloaded from [here](https://zenodo.org/record/7564872).
 
 ## Usage example
 ### Command line usage
 ```
-cefcon [-h] --input_expData PATH --input_priorNet PATH [--input_genesDE PATH] [--TFs PATH] [--additional_edges_pct ADDITIONAL_EDGES_PCT] [--cuda CUDA] [--seed SEED] [--hidden_dim HIDDEN_DIM] [--output_dim OUTPUT_DIM] [--heads HEADS] [--attention {COS,AD,SD}] [--miu MIU] [--epochs EPOCHS] [--repeats REPEATS] [--edge_threshold_param EDGE_THRESHOLD_PARAM] [--remove_self_loops] [--topK_drivers TOPK_DRIVERS] --out_dir OUT_DIR
+cefcon [-h] --input_expData PATH --input_priorNet PATH [--input_genesDE PATH] [--TFs PATH] \
+[--additional_edges_pct ADDITIONAL_EDGES_PCT] [--cuda CUDA] [--seed SEED] \
+[--hidden_dim HIDDEN_DIM] [--output_dim OUTPUT_DIM] [--heads HEADS] [--attention {COS,AD,SD}] \
+[--miu MIU] [--epochs EPOCHS] [--repeats REPEATS] [--edge_threshold_param EDGE_THRESHOLD_PARAM] \
+[--remove_self_loops] [--topK_drivers TOPK_DRIVERS] --out_dir OUT_DIR
 ```
-Please use `cefcon -h` to view parameters information. \
+Please use `cefcon.py -h` to view parameters information. \
 You can run the `run_CEFCON.sh` bash file for a usage example.
 
-- Output (in the output folder `${OUT_DIR}/`):
-- The constructed cell-lineage-specific GRN with default name "cl_GRN.csv";
-- The obtained gene embeddings with default name "gene_embs.csv";
-- A list of identified driver regulators with default name "driver_regulators.csv";
-- A list of obtained RGMs with default name "RGMs.csv";
-- The AUCell activity matrix of the obtained RGMs with default name "AUCell_mtx.csv".
+- The output results can be found in the folder `${OUT_DIR}/`:
+- "cl_GRN.csv": the constructed cell-lineage-specific GRN;
+- "gene_embs.csv": the obtained gene embeddings;
+- "driver_regulators.csv": a list of identified driver regulators;
+- "RGMs.csv": a list of obtained RGMs;
+- "AUCell_mtx.csv": the AUCell activity matrix of the obtained RGMs.
+
+It may take about 2-5 minutes to run on the example data.
 
 ## Citation
 Please cite the following paper, if you find the repository or the paper useful.
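
As a rough illustration of the three input types listed under `## Input data` in the README diff above, the following sketch writes tiny toy files and reads them back. It is not taken from the CEFCON repository: the file names, the gene symbols, the header names (`cell`, `from`, `to`, `gene`, `logFC`) and the comma delimiter are all assumptions made for the example, so the files in `/example_data` remain the reference for the exact layout.

```python
import csv
import tempfile
from pathlib import Path


def write_toy_inputs(out_dir: Path) -> dict:
    """Write three tiny files mirroring the input types named in the README."""
    genes = ["GATA1", "TAL1", "KLF1"]  # illustrative gene symbols (assumption)

    # Expression matrix: rows are cells, columns are genes.
    exp_path = out_dir / "toy_expData.csv"
    with exp_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cell"] + genes)
        writer.writerow(["cell_1", 0.0, 2.5, 1.0])
        writer.writerow(["cell_2", 3.1, 0.0, 0.4])

    # Prior network as an edgelist: one edge per row (header names assumed).
    net_path = out_dir / "toy_priorNet.csv"
    with net_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["from", "to"])
        writer.writerow(["GATA1", "KLF1"])
        writer.writerow(["TAL1", "GATA1"])

    # Differential expression: one log fold change per gene (header assumed).
    de_path = out_dir / "toy_genesDE.csv"
    with de_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["gene", "logFC"])
        for gene, lfc in zip(genes, [1.2, -0.3, 0.8]):
            writer.writerow([gene, lfc])

    return {"expData": exp_path, "priorNet": net_path, "genesDE": de_path}


def read_rows(path: Path) -> list:
    """Return all rows of a CSV file, header included."""
    with path.open(newline="") as fh:
        return list(csv.reader(fh))


with tempfile.TemporaryDirectory() as tmp:
    paths = write_toy_inputs(Path(tmp))
    exp_rows = read_rows(paths["expData"])
    n_cells = len(exp_rows) - 1      # minus the header row
    n_genes = len(exp_rows[0]) - 1   # minus the cell-name column
    n_edges = len(read_rows(paths["priorNet"])) - 1
    print(n_cells, n_genes, n_edges)
```

Files of this shape would correspond to the `--input_expData`, `--input_priorNet` and `--input_genesDE` options in the usage string; whether CEFCON expects headers in exactly this form is not stated in the diff.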

cefcon/CEFCON.py

Lines changed: 16 additions & 9 deletions
@@ -7,7 +7,7 @@
 from .utils import *
 
 def main():
-    parser = argparse.ArgumentParser(prog='cefcon', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    parser = argparse.ArgumentParser(prog='CEFCON', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
     parser = add_main_args(parser)
     args = parser.parse_args()
 
@@ -51,11 +51,18 @@ def main():
     print('Identifying driver regulators...')
     critical_genes, out_critical_genes, in_critical_genes = highly_weighted_genes(gene_influence_scores,
                                                                                   topK=args.topK_drivers)
-    cellFate_drivers_set, MDS_driver_set, MFVS_driver_set, _ = driver_regulators(G_predicted,
+    cellFate_drivers_set, MDS_driver_set, MFVS_driver_set, a = driver_regulators(G_predicted,
                                                                                  gene_influence_scores,
                                                                                  topK=args.topK_drivers,
                                                                                  driver_union=True,
                                                                                  plot_Venn=False)
+    ### Temp for Case analysis
+    import pickle
+    DriverSet = {'N_genes':data.n_vars, 'MDS':MDS_driver_set, 'MFVS':MFVS_driver_set, 'Critical':a}
+    DriverSet_file = open(fspath(p/'DriverSet.pkl'), 'wb')
+    pickle.dump(DriverSet, DriverSet_file)
+    DriverSet_file.close()
+    ###
 
     # Driver genes ranking save to file
     drivers_results = gene_influence_scores.loc[gene_influence_scores.index.isin(list(cellFate_drivers_set)), :].copy()
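
The temporary block added in this hunk opens the pickle file and closes it by hand, so the handle stays open if `pickle.dump` raises. A minimal sketch of the same save step with a context manager, plus the matching load, is shown below. The dictionary keys (`N_genes`, `MDS`, `MFVS`, `Critical`) come from the diff; the helper names `save_driver_sets` and `load_driver_sets`, the toy gene symbols, and the assumption that the driver collections are Python sets are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_driver_sets(path, n_genes, mds, mfvs, critical):
    """Pickle the driver sets; the 'with' block closes the file even on error."""
    driver_set = {
        "N_genes": n_genes,
        "MDS": set(mds),          # assumed to be set-like
        "MFVS": set(mfvs),        # assumed to be set-like
        "Critical": set(critical),
    }
    with open(path, "wb") as fh:
        pickle.dump(driver_set, fh)


def load_driver_sets(path):
    """Read the dictionary back for downstream case analysis."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "DriverSet.pkl"
    # Toy values only; real runs would pass the sets computed in main().
    save_driver_sets(pkl_path, 1000, {"GATA1", "TAL1"}, {"TAL1", "KLF1"}, {"TAL1"})
    loaded = load_driver_sets(pkl_path)
    shared = loaded["MDS"] & loaded["MFVS"]  # drivers found by both criteria
    print(sorted(shared))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.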
@@ -85,15 +92,15 @@ def add_main_args(parser: argparse.ArgumentParser):
     # Input data
     input_parser = parser.add_argument_group(title='Input data options')
     input_parser.add_argument('--input_expData', type=str, required=True, metavar='PATH',
-                              help='input expression data file')
+                              help='path to the input gene expression data')
     input_parser.add_argument('--input_priorNet', type=str, required=True, metavar='PATH',
-                              help='input prior network file')
+                              help='path to the input prior gene interaction network')
     input_parser.add_argument('--input_genesDE', type=str, default=None, metavar='PATH',
-                              help='input differential expression score file')
+                              help='path to the input gene differential expression score')
     input_parser.add_argument('--TFs', type=str, default=None, metavar='PATH',
-                              help='input transcriptional factors list')
+                              help='path to the input transcriptional factors list')
     input_parser.add_argument('--additional_edges_pct', type=float, default=0.01,
-                              help='percentage of additional interactions with highly co-expressions')
+                              help='proportion of high co-expression interactions to be added')
 
     # GRN
     grn_parser = parser.add_argument_group(title='Cell-lineage-specific GRN construction options')
@@ -109,7 +116,7 @@ def add_main_args(parser: argparse.ArgumentParser):
     grn_parser.add_argument("--heads", type=int, default=4,
                             help="number of heads")
     grn_parser.add_argument("--attention", type=str, default='COS', choices=['COS', 'AD', 'SD'],
-                            help="type of attention function")
+                            help="type of attention scoring function")
     grn_parser.add_argument('--miu', type=float, default=0.5,
                             help='parameter for considering the importance of attention coefficients of the first GNN layer')
     grn_parser.add_argument('--epochs', type=int, default=350,
@@ -125,7 +132,7 @@ def add_main_args(parser: argparse.ArgumentParser):
     # Driver regulators
     driver_parser = parser.add_argument_group(title='Driver regulator identification options')
     driver_parser.add_argument('--topK_drivers', type=int, default=50,
-                               help="number of candidate drivers genes according to the influence score")
+                               help="number of top-ranked candidate driver genes according to their influence scores")
 
     # Output dir
     parser.add_argument("--out_dir", type=str, required=True, default='./output',
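
The argument-group pattern in these hunks can be tried in isolation with the reduced sketch below. It reuses a few option names and help strings from the diff but is not the full `add_main_args`; the `build_parser` helper, the `out_dir` help text and the example file names are hypothetical. One detail worth noting: when an option is declared with `required=True`, argparse never falls back to its `default`, so the sketch omits the default on `--out_dir`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reduced stand-in for add_main_args: two groups plus the output option."""
    parser = argparse.ArgumentParser(
        prog="CEFCON",
        # Appends "(default: ...)" to every help string automatically.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    input_group = parser.add_argument_group(title="Input data options")
    input_group.add_argument("--input_expData", type=str, required=True, metavar="PATH",
                             help="path to the input gene expression data")
    input_group.add_argument("--additional_edges_pct", type=float, default=0.01,
                             help="proportion of high co-expression interactions to be added")

    driver_group = parser.add_argument_group(title="Driver regulator identification options")
    driver_group.add_argument("--topK_drivers", type=int, default=50,
                              help="number of top-ranked candidate driver genes "
                                   "according to their influence scores")

    # Required option: a default here would never be used, so none is given.
    parser.add_argument("--out_dir", type=str, required=True,
                        help="output directory (help text is illustrative)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input_expData", "expr.csv", "--out_dir", "./output"])
# Collapse whitespace because argparse wraps help text to the terminal width.
help_text = " ".join(build_parser().format_help().split())
print(args.topK_drivers, args.additional_edges_pct, args.out_dir)
```

Grouping options this way only affects how `-h` output is organised; the parsed values still land in a single flat namespace.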
