COVID19 Latin America Prediction and Analysis

Authors

Project by

Daniela A. Gomez-Cravioto and

Ramon E. Diaz-Ramos

Introduction

In this study, we fit linear, polynomial, and generalized logistic regression curves to evaluate the growth of COVID-19 cases in Mexico. Additionally, we use machine learning and time-series techniques to identify feature importance and to forecast daily cases and fatalities. The data analysis and modeling in this research are based on the publicly available datasets from the Resource Center at Johns Hopkins University. Additional features were added to this dataset to understand their effect on the number of daily cases: mobility rates obtained from Google's Mobility Reports and climate variables obtained from weather online.

Problem Statement

As referenced by the World Health Organization, the first case of COVID-19 (also known as 2019 Novel Coronavirus) was confirmed in Wuhan, China on December 31, 2019. Even though the disease has since been contained in China, it has spread all over the world. By May 21, 2020, there had been 5,102,424 confirmed cases, resulting in 332,924 fatalities worldwide. The pandemic is severe and continues to affect billions of people.

Motivation

The motivation of this study is to contribute to the knowledge needed to fight the disease and to characterize its course in Mexico, in order to improve preparedness and promote better-informed actions by policy makers and the general population.

Objectives

To identify which function best fits the growth of the infected population (COVID-19 cases).
To determine the feature importance of the climate and mobility variables.
To compare the results of a traditional time-series statistical model with a modern machine learning approach.

In this project, we use the number of confirmed cases and the number of deaths as target variables.
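As an illustration of the first objective, the sketch below compares a linear and a second-degree polynomial fit by their RMSE. It is a minimal example under stated assumptions: the `days` and `cases` arrays are synthetic placeholders rather than the project's data, and `numpy.polyfit` stands in for whatever fitting routine the notebook actually uses.

```python
import numpy as np

# Synthetic, illustrative series (NOT real case counts): a quadratic trend.
days = np.arange(30, dtype=float)
cases = 5.0 + 2.0 * days + 0.8 * days ** 2

# Fit a polynomial of each candidate degree and record its RMSE on the data.
rmse_by_degree = {}
for degree in (1, 2):
    coeffs = np.polyfit(days, cases, deg=degree)   # least-squares coefficients
    fitted = np.polyval(coeffs, days)              # fitted values on the same days
    rmse_by_degree[degree] = float(np.sqrt(np.mean((fitted - cases) ** 2)))

# The candidate with the lowest error is the best fit under this metric.
best_degree = min(rmse_by_degree, key=rmse_by_degree.get)
print(rmse_by_degree, best_degree)
```

Comparing candidates on the fitting data alone favors more flexible curves, which is why a penalized criterion and a held-out validation period are also useful.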

Evaluation Metrics

To measure the performance of each of the models, the following metrics were computed:

The Root Mean Squared Error (RMSE) measures how close the fitted values are to the real values.
The Akaike information criterion (AIC) estimates the relative quality of a model: it rewards goodness of fit while penalizing the number of parameters, which helps to test how well the model fits the data without overfitting it.

Root Mean Squared Error (RMSE): \begin{equation} \sqrt{\frac{1}{n}\sum_{i=1}^n (Prediction_i - Truth_i)^2} \end{equation}

$$ AIC = N \log(MSE) + 2n, $$

where,

N = number of observations,
MSE = Mean Squared Error, and
n = Number of independently adjusted parameters within the model.
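The two metrics above can be computed directly from their definitions. The following is a minimal sketch, assuming the natural logarithm in the AIC formula; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math

def mse(truth, prediction):
    """Mean squared error between observed and predicted values."""
    return sum((p - t) ** 2 for p, t in zip(prediction, truth)) / len(truth)

def rmse(truth, prediction):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(truth, prediction))

def aic(truth, prediction, n_params):
    """AIC = N * log(MSE) + 2 * n (natural log assumed).

    Undefined when the MSE is exactly zero, because log(0) diverges.
    """
    return len(truth) * math.log(mse(truth, prediction)) + 2 * n_params

# Illustrative values only (not project data).
truth = [10.0, 12.0, 15.0, 19.0]
prediction = [11.0, 12.0, 14.0, 21.0]

model_rmse = rmse(truth, prediction)
model_aic = aic(truth, prediction, n_params=2)
print(model_rmse, model_aic)
```

A lower AIC indicates a better trade-off between fit and complexity, so the value is only meaningful when compared across models fitted to the same data.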

Cross-Validation

When dealing with time series data, traditional cross-validation (like k-fold) should not be used for two reasons:

Temporal Dependencies
Arbitrary Choice of Test Set

https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9

In this study, we used hold-out validation, splitting the dataset chronologically into a training set and a validation set. We used the first 80% of the time-series dates for training and the last 20% for validation.
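A chronological hold-out split can be sketched as follows. This is an illustrative helper, not the notebook's code: `chronological_split` and the placeholder `daily_cases` series are assumptions, and the input is assumed to be already sorted by date.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a date-ordered sequence into an earlier and a later part.

    No shuffling is done, so every held-out observation comes after
    every observation in the first part.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Placeholder series standing in for 100 consecutive daily observations.
daily_cases = list(range(100))
train, validate = chronological_split(daily_cases)
print(len(train), len(validate))
```

Keeping the order intact respects the temporal dependencies mentioned above, since the model never sees observations from the period it is evaluated on.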

