Vieira et al. - Machine Learning-Based Survival Prediction in Colorectal Cancer Combining Clinical and Biological Features
This project contains the data and scripts used to support the research findings. It is divided into two folders: feature extraction and model construction. The analysis is performed using both Python (Jupyter notebooks) and R scripts.
The project is organized into two main directories:
- 01-feature-extraction/ - Contains scripts for data preprocessing and feature extraction
- 02-model-construction/ - Contains scripts for feature selection and model building
If you want to run the complete analysis from scratch:
- First, run all scripts in 01-feature-extraction/ in numerical order:
  - 01-ceRNAs-rectosigmoid.r
  - 02-ceRNAs-rectum.r
  - 03-ceRNAs-colon.r
  - 04-hazard.r
  - 05-average-exression-charts.ipynb
  - 06-data-with-features-extraction.ipynb
- Then, proceed to 02-model-construction/:
  - 01-Feat_selection_and_model_construction.ipynb
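The run order above can be sketched as a small Python helper. This is only an illustration and not part of the repository: the names `build_command` and `plan` are hypothetical, and the sketch assumes that `Rscript` and `jupyter` are available on the PATH and that each script is meant to be run from inside its own folder.

```python
from pathlib import Path
from typing import List, Tuple

# Script order exactly as listed in this README.
FEATURE_EXTRACTION = [
    "01-ceRNAs-rectosigmoid.r",
    "02-ceRNAs-rectum.r",
    "03-ceRNAs-colon.r",
    "04-hazard.r",
    "05-average-exression-charts.ipynb",
    "06-data-with-features-extraction.ipynb",
]
MODEL_CONSTRUCTION = ["01-Feat_selection_and_model_construction.ipynb"]


def build_command(script: Path) -> List[str]:
    """Return the command line for one script (hypothetical helper)."""
    suffix = script.suffix.lower()
    if suffix == ".r":
        return ["Rscript", script.name]
    if suffix == ".ipynb":
        # Executes the notebook and writes the outputs back into the same file.
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", script.name]
    raise ValueError(f"Unsupported script type: {script}")


def plan(root: str = ".") -> List[Tuple[Path, List[str]]]:
    """List (working directory, command) pairs in the required order."""
    steps = []
    for folder, scripts in (
        ("01-feature-extraction", FEATURE_EXTRACTION),
        ("02-model-construction", MODEL_CONSTRUCTION),
    ):
        for name in scripts:
            script = Path(root) / folder / name
            steps.append((script.parent, build_command(script)))
    return steps


# Dry run: print what would be executed, without running anything.
for folder, command in plan():
    print(f"[{folder}] {' '.join(command)}")
```

To actually execute a step, each pair could be passed to `subprocess.run(command, cwd=folder, check=True)`, which stops at the first failing script so that later steps do not run on incomplete intermediate data.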
If you only want to see the final results, you can directly run:
02-model-construction/01-Feat_selection_and_model_construction.ipynb
- Python 3.x
- R
- Jupyter Notebook
- Required Python packages (to be installed via pip):
  - pandas
  - numpy
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyter
- Required R packages (to be installed via install.packages()):
  - dplyr
  - ggplot2
  - survival
  - other packages as specified in the R scripts
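Before running the notebooks, it may help to check which of the Python packages listed above are already installed. The following is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper, and the mapping from pip names to import names (for example `scikit-learn` to `sklearn`, and `jupyter` to `jupyter_core`) is an assumption about how these packages are normally imported.

```python
from importlib.util import find_spec

# pip package name -> module name used to detect it (assumed mapping).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "jupyter": "jupyter_core",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required Python packages were found.")
```

This only checks that each package is present. It does not compare versions, and it does not cover the R packages.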
- Input data should be placed in the respective data/ directories within each main folder
- The feature extraction phase generates intermediate data that is used by the model construction phase
- Make sure to run the scripts in the correct numerical order within each directory
- The feature extraction phase (01) must be completed before running the model construction phase (02) if you're doing a complete analysis
- Some scripts may take significant time to run depending on the size of the input data
- Remember to verify package versions against 01-feature-extraction/rsession.txt
- If needed, update the folder paths used to read input and save output