lmacielvieira/crc-bio-cli-ml

Vieira et al. - Machine Learning-Based Survival Prediction in Colorectal Cancer Combining Clinical and Biological Features

This project contains the data and scripts used to support the research findings. It is divided into two folders: feature extraction and model construction. The analysis is performed using both Python (Jupyter notebooks) and R scripts.

Project Structure

The project is organized into two main directories:

  1. 01-feature-extraction/ - Contains scripts for data preprocessing and feature extraction
  2. 02-model-construction/ - Contains scripts for feature selection and model building

Execution Order

Option 1: Complete Analysis

If you want to run the complete analysis from scratch:

  1. First, run all scripts in 01-feature-extraction/ in numerical order:

    • 01-ceRNAs-rectosigmoid.r
    • 02-ceRNAs-rectum.r
    • 03-ceRNAs-colon.r
    • 04-hazard.r
    • 05-average-exression-charts.ipynb
    • 06-data-with-features-extraction.ipynb
  2. Then, proceed to 02-model-construction/:

    • 01-Feat_selection_and_model_construction.ipynb
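The order above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function names `build_commands` and `run_all` are hypothetical, and it assumes that `Rscript` and `jupyter nbconvert` are available on the PATH and that each script should run from inside its own folder.

```python
import subprocess
from pathlib import Path

# File names are copied from the list above; everything else is an assumption.
FEATURE_EXTRACTION = [
    "01-ceRNAs-rectosigmoid.r",
    "02-ceRNAs-rectum.r",
    "03-ceRNAs-colon.r",
    "04-hazard.r",
    "05-average-exression-charts.ipynb",
    "06-data-with-features-extraction.ipynb",
]
MODEL_CONSTRUCTION = ["01-Feat_selection_and_model_construction.ipynb"]


def build_commands(root="."):
    """Return (working_dir, argv) pairs, one per script, in execution order."""
    root = Path(root)
    steps = [("01-feature-extraction", name) for name in FEATURE_EXTRACTION]
    steps += [("02-model-construction", name) for name in MODEL_CONSTRUCTION]
    commands = []
    for folder, name in steps:
        if name.lower().endswith(".r"):
            argv = ["Rscript", name]
        else:
            # Execute the notebook and write the results back into the same file.
            argv = ["jupyter", "nbconvert", "--to", "notebook",
                    "--execute", "--inplace", name]
        commands.append((root / folder, argv))
    return commands


def run_all(root=".", dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cwd, argv in build_commands(root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the sequence at the first failing script.
            subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_all()` from the repository root only prints the planned commands; `run_all(dry_run=False)` would execute them one after another.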

Option 2: Direct to Results

If you only want to see the final results, you can directly run:

  • 02-model-construction/01-Feat_selection_and_model_construction.ipynb

Requirements

  • Python 3.x

  • R

  • Jupyter Notebook

  • Required Python packages (to be installed via pip):

    • pandas
    • numpy
    • scikit-learn
    • matplotlib
    • seaborn
    • jupyter
  • Required R packages (to be installed via install.packages()):

    • dplyr
    • ggplot2
    • survival
    • other packages as specified in the R scripts
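As a quick check before running the notebooks, the Python packages listed above can be looked up with the standard library. This is a minimal sketch: the helper name `installed_versions` is hypothetical, and the package names are taken from the list above (the R packages are not covered here).

```python
from importlib import metadata

# Distribution names as they would be passed to pip.
PYTHON_PACKAGES = ["pandas", "numpy", "scikit-learn",
                   "matplotlib", "seaborn", "jupyter"]


def installed_versions(packages=PYTHON_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing `installed_versions()` shows at a glance which packages still need to be installed (those mapped to `None`).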

Data

  • Input data should be placed in the respective data/ directories within each main folder
  • The feature extraction phase generates intermediate data that is used by the model construction phase
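A small check like the following could confirm that the data/ directories exist before starting. It is a sketch under the assumption that each main folder holds its data directly in a subfolder named `data`; the helper name `missing_data_dirs` is hypothetical.

```python
from pathlib import Path

# Main folders as named in the Project Structure section.
PHASES = ["01-feature-extraction", "02-model-construction"]


def missing_data_dirs(root="."):
    """Return the expected data directories that do not exist under root."""
    root = Path(root)
    return [f"{phase}/data" for phase in PHASES
            if not (root / phase / "data").is_dir()]
```

An empty list from `missing_data_dirs()` means both expected directories are present; it does not verify that the input files themselves are there.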

Notes

  • Make sure to run the scripts in the correct numerical order within each directory
  • The feature extraction phase (01) must be completed before running the model construction phase (02) if you're doing a complete analysis
  • Some scripts may take significant time to run depending on the size of the input data
  • Remember to verify package versions against 01-feature-extraction/rsession.txt
  • If needed, update the folder paths that the scripts use to read input and save output.
