n0rdp0l/Statistical_Learning

Statistical Learning Assignments

This repository contains my solutions to the assignments given in the statistical learning course, which covers both supervised and unsupervised learning techniques.

Course Description

The course provides a strong theoretical foundation for understanding and evaluating statistical learning techniques. It covers classical and state-of-the-art supervised learning methods such as linear regression, regularized regression (Ridge, Lasso, and other L1-methods), naive Bayes, decision trees, logistic regression, splines, random forests, support vector machines, and deep learning. The course also discusses model selection techniques, including various forms of cross-validation and permutation tests.

Additionally, the course covers unsupervised learning techniques such as clustering methods (k-means and more advanced methods) and dimension reduction methods (PCA, ICA, etc.).

Assignments

  1. Assignment 1: The assignment compares K-nearest neighbors and logistic regression with a lasso penalty for predicting the target variable Y from two sets of predictors: a small set with 3 relevant and 3 irrelevant predictors, and the full set of all 203 predictors. Both methods are applied to both sets and their results are compared. A separate unsupervised part uses the Coping dataset to perform cluster analysis with different techniques. Grade: 8.4
  2. Assignment 2: This assignment analyzes data from a subsample of 1152 adults in the Netherlands suffering from anxiety and/or depressive disorders, with the goal of predicting the severity of depressive symptoms after twelve months. There are 20 potential predictor variables, and the task is to select and apply three supervised learning methods to the dataset. The dataset is first randomly split into a training and a test set, and the models are evaluated and compared on their predictive accuracy on the test set. The important variables and effects in each model are described and interpreted, and overall conclusions are drawn about which predictors are related to the outcome. Finally, the severity of depressive symptoms in twelve months' time is estimated for a specific patient, David Sthymia, and a recommendation is made regarding referral to an intensive treatment program. Grade: 8.1
  3. Assignment 3: The goal is to select a dataset of your own choice and analyze it using the techniques learned in the course (or related techniques). The first step is to explore the data, formulate research questions and/or problems, and choose at least three different techniques that provide insight into them. Instead of a report, the assignment requires a group presentation to fellow students and instructors covering the dataset, research questions, methodology, results, and a discussion of the results. Grade: 8.5
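
The comparison at the heart of Assignment 1 can be illustrated with a small, self-contained sketch. It is not the code from this repository and does not use the course data: the simulated dataset (3 relevant and 30 irrelevant predictors), the fixed values `k = 5` and `lam = 0.05`, and all function names are assumptions made for illustration. In practice both tuning parameters would be chosen by cross-validation. The lasso-penalized logistic regression is fitted here with plain proximal gradient descent, which is one simple way to handle the L1 penalty, not necessarily the solver used in the course.

```python
import math
import random


def make_data(n, n_noise, rng):
    """Simulate n rows: 3 relevant predictors followed by n_noise irrelevant ones."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3 + n_noise)]
        # Only the first three columns (plus noise) determine the class label.
        signal = row[0] + row[1] + row[2] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if signal > 0 else 0)
    return X, y


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=200):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current weights.
        resid = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        b -= lr * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            step = w[j] - lr * grad
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b


def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def first3(X):
    """Keep only the three relevant predictors."""
    return [row[:3] for row in X]


rng = random.Random(0)
X_train, y_train = make_data(200, 30, rng)
X_test, y_test = make_data(100, 30, rng)

# K-nearest neighbors on the relevant predictors only, then on all predictors.
acc_knn_relevant = accuracy(
    [knn_predict(first3(X_train), y_train, x, 5) for x in first3(X_test)], y_test
)
acc_knn_all = accuracy(
    [knn_predict(X_train, y_train, x, 5) for x in X_test], y_test
)

# Lasso-penalized logistic regression on all predictors.
w, b = fit_lasso_logistic(X_train, y_train, lam=0.05)
acc_lasso = accuracy(
    [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0 for row in X_test],
    y_test,
)
n_selected = sum(wj != 0.0 for wj in w)

print("KNN, relevant predictors only:", acc_knn_relevant)
print("KNN, all predictors:          ", acc_knn_all)
print("Lasso logistic, all predictors:", acc_lasso)
print("Non-zero lasso coefficients:   ", n_selected, "of", len(w))
```

K-nearest neighbors weighs every predictor equally in its distance measure, so irrelevant predictors can dilute the signal, whereas the lasso penalty can shrink the coefficients of irrelevant predictors to exactly zero. Comparing the printed accuracies and the number of non-zero coefficients gives a feel for that difference on simulated data.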
