This project aims to develop predictive models for detecting heavy drinking episodes using mobile data. By combining smartphone accelerometer data with transdermal alcohol content (TAC) readings, the models identify heavy drinking episodes that may increase health risks, particularly for college students during social events.
Researchers from Harvard University and the University of Southern California collaborated to address heavy drinking among college students. This project leverages mobile data to create predictive models for intoxication levels, enhancing student safety and promoting responsible drinking behavior in real-time settings.
The goal is to develop accurate predictive models that can identify and preempt heavy drinking episodes using mobile data. Traditional methods like self-reporting and breathalyzer tests are ineffective in dynamic social settings. This project utilizes accelerometer data from smartphones and TAC readings from SCRAM bracelets to provide a real-time solution for detecting heavy drinking.
- Database: "Bar Crawl: Detecting Heavy Drinking"
- Source: Harvard University and the University of Southern California (May 2017)
- Participants: 13 participants with accelerometer data from smartphones and TAC data from SCRAM ankle bracelets.
- Attributes: Three-axis accelerometer data, phone types (iPhone, Android), and TAC readings.
- Missing Values: None
- clean_tac: Cleaned TAC data
- raw_tac: Raw TAC data
- all_accelerometer_data_pids_13.csv: Accelerometer data
- phone_types.csv: Phone types (iPhone, Android)
- pids.txt: Participant IDs
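
As a rough illustration of how these files might be read, the sketch below loads accelerometer samples and one participant's TAC readings with pandas. The column names (`time`, `pid`, `x`, `y`, `z`, `timestamp`, `TAC_Reading`), the timestamp units, and the participant ID `P01` are assumptions made for the example and should be checked against the actual files; the rows are synthetic.

```python
import io

import pandas as pd


def load_accelerometer(source) -> pd.DataFrame:
    """Read accelerometer samples from a path or file-like object."""
    df = pd.read_csv(source)
    # Assumed columns: time (Unix epoch, milliseconds), pid, x, y, z.
    df["time"] = pd.to_datetime(df["time"], unit="ms")
    return df.sort_values(["pid", "time"]).reset_index(drop=True)


def load_tac(source, pid: str) -> pd.DataFrame:
    """Read one participant's TAC readings and tag them with the participant ID."""
    df = pd.read_csv(source)
    # Assumed columns: timestamp (Unix epoch, seconds), TAC_Reading.
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
    df["pid"] = pid
    return df.sort_values("timestamp").reset_index(drop=True)


# Tiny synthetic stand-ins for the real files (values are made up).
accel_csv = io.StringIO(
    "time,pid,x,y,z\n"
    "1493735000025,P01,0.02,-0.01,0.98\n"
    "1493735000000,P01,0.01,0.00,0.99\n"
)
tac_csv = io.StringIO(
    "timestamp,TAC_Reading\n"
    "1493735000,0.000\n"
    "1493736800,0.008\n"
)

accel = load_accelerometer(accel_csv)
tac = load_tac(tac_csv, pid="P01")
print(accel)
print(tac)
```

Passing a file path instead of a `StringIO` object works the same way, since `pd.read_csv` accepts both.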
- Data Cleaning: Handling missing values, normalization, and removal of redundant columns.
- Visualization: Summary statistics, histograms, scatter plots, and correlation matrices.
- Correlation Analysis: Examined relationships between accelerometer readings and TAC levels.
- Time-Series Analysis: Visualized TAC readings over time and their correlation with accelerometer data.
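
The correlation step can be sketched roughly as follows: summarize the raw accelerometer samples into fixed windows, attach the most recent TAC reading to each window, and compute a correlation matrix. This is a minimal sketch on synthetic data, not the project's actual analysis; the column names (`t`, `x`, `y`, `z`, `tac`), the 10-second window, and the helper names `window_features` and `attach_tac` are all hypothetical.

```python
import numpy as np
import pandas as pd


def window_features(accel: pd.DataFrame, window_s: int = 10) -> pd.DataFrame:
    """Summarize raw samples into fixed-length windows of acceleration magnitude."""
    df = accel.copy()
    df["magnitude"] = np.sqrt(df["x"] ** 2 + df["y"] ** 2 + df["z"] ** 2)
    # Label each sample with the start time (in seconds) of its window.
    df["window"] = (df["t"] // window_s) * window_s
    return (
        df.groupby("window")["magnitude"]
        .agg(mag_mean="mean", mag_std="std")
        .reset_index()
    )


def attach_tac(features: pd.DataFrame, tac: pd.DataFrame) -> pd.DataFrame:
    """Give each window the most recent TAC reading at or before its start."""
    return pd.merge_asof(
        features.sort_values("window"),
        tac.sort_values("t"),
        left_on="window",
        right_on="t",
        direction="backward",
    )


# Synthetic data: one sample per second for 10 minutes, one TAC reading every 2 minutes.
rng = np.random.default_rng(0)
accel = pd.DataFrame(
    {
        "t": np.arange(600),
        "x": rng.normal(size=600),
        "y": rng.normal(size=600),
        "z": rng.normal(size=600),
    }
)
tac = pd.DataFrame({"t": np.arange(0, 600, 120), "tac": [0.00, 0.01, 0.03, 0.06, 0.09]})

merged = attach_tac(window_features(accel), tac)
corr = merged[["mag_mean", "mag_std", "tac"]].corr()  # Pearson correlation by default
print(corr.round(3))
```

Because TAC is sampled far less often than acceleration, many windows share the same TAC value; that is worth keeping in mind when reading the correlation coefficients.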

- Significant correlations between accelerometer data patterns and elevated TAC levels.
- High TAC readings often followed periods of high accelerometer activity.
Three supervised machine learning models were selected:
- Linear Regression: Baseline model.
- Random Forest Regressor: Handles non-linear relationships and feature interactions.
- Support Vector Regressor (SVR): Effective in high-dimensional spaces.
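
A minimal sketch of how the three regressors could be set up with scikit-learn is shown here. The hyperparameter values, the helper name `build_models`, and the synthetic data are assumptions for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR


def build_models() -> dict:
    """Return the three regressors listed above, keyed by a short label."""
    return {
        "LR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=100, random_state=0),
        # TAC targets are small (roughly 0 to 0.16 in the results table), so the
        # default epsilon of 0.1 would ignore most errors; a tighter tube is assumed.
        "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.005),
    }


# Synthetic stand-in data: 3 features and a mildly non-linear, TAC-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 0.1 * X[:, 0] ** 2 + 0.05 * X[:, 1] + rng.normal(scale=0.005, size=200)

fitted = {name: model.fit(X, y) for name, model in build_models().items()}
for name, model in fitted.items():
    print(name, model.predict(X[:3]).round(4))
```
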
- Split dataset into training and testing sets (80-20 split).
- Cross-validation for robustness.
- Hyperparameter optimization using grid search.
- Min-max normalization on the training split.
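
The training steps above could be combined roughly as in the following sketch, shown for SVR only. Placing `MinMaxScaler` inside a `Pipeline` means the scaling range is learned from training data alone, both within each cross-validation fold and in the final refit. The parameter grid, the 5-fold setting, and the synthetic data are assumptions, not the project's actual configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 3 features and a small, TAC-like target.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 3))
y = 0.05 + 0.02 * X[:, 0] + 0.01 * X[:, 1] ** 2 + rng.normal(scale=0.005, size=200)

# 80-20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The scaler is fitted only on the data the pipeline is trained on.
pipe = Pipeline([("scale", MinMaxScaler()), ("svr", SVR(kernel="rbf"))])

# Hypothetical grid; the real search space is not given in this README.
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.001, 0.005, 0.01],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test R^2:", round(search.best_estimator_.score(X_test, y_test), 3))
```

If neighbouring windows from the same participant are strongly correlated, a grouped or time-ordered split may give a more conservative estimate than the random split used here.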
- Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R²) scores.
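- Comparison: the table below lists the actual TAC next to each model's prediction and the class assigned to that prediction.

Classification thresholds based on the legal limit (0.08):
- tac_level < 0.002: Less Than Legal Limit
- 0.002 <= tac_level <= 0.08: About Legal Limit
- tac_level > 0.092: Above Legal Limit (labeled "Heavy Alcohol" in the table)

As written, these rules leave values between 0.08 and 0.092 unassigned; in the table, predictions in that range (for example, 0.088400) are labeled About Legal Limit.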
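
The thresholding can be expressed as a small function. This sketch uses the 0.002 and 0.092 cutoffs and the "Heavy Alcohol" label that appears in the results table; treating values between 0.08 and 0.092 as About Legal Limit is an assumption inferred from the table's labels, and the names `classify_tac`, `LOWER`, and `UPPER` are hypothetical.

```python
LOWER = 0.002  # below this: "Less Than Legal Limit"
UPPER = 0.092  # above this: "Heavy Alcohol"


def classify_tac(tac_level: float) -> str:
    """Map a predicted TAC value to one of the three classes."""
    if tac_level < LOWER:
        return "Less Than Legal Limit"
    if tac_level <= UPPER:
        # Assumption: the 0.08 to 0.092 gap in the stated rules is folded in here,
        # matching how the results table labels predictions in that range.
        return "About Legal Limit"
    return "Heavy Alcohol"


# Sample predictions taken from the results table.
for value in (-0.008847, 0.001997, 0.002438, 0.088400, 0.101784):
    print(f"{value:+.6f} -> {classify_tac(value)}")
```

Negative predictions, which the regressors occasionally produce, fall into the lowest class.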

| Time (17-05-03) | Actual TAC | LR Regression | RF Regression | SVR Regression | LR Classified | RF Classified | SVR Classified |
|---|---|---|---|---|---|---|---|
| 10:36:54 | 0.000 | 0.004841 | 0.018967 | 0.002438 | About Legal Limit | About Legal Limit | About Legal Limit |
| 11:20:57 | 0.000 | 0.017887 | 0.015987 | 0.001054 | About Legal Limit | About Legal Limit | Less Than Legal Limit |
| 11:26:26 | 0.000 | 0.021852 | 0.006375 | 0.005715 | About Legal Limit | About Legal Limit | About Legal Limit |
| 11:31:56 | 0.000 | 0.022301 | 0.014127 | 0.001997 | About Legal Limit | About Legal Limit | Less Than Legal Limit |
| 11:37:25 | 0.000 | 0.022071 | 0.015367 | 0.003187 | About Legal Limit | About Legal Limit | About Legal Limit |
| 11:48:23 | 0.008 | 0.032634 | 0.052058 | 0.016806 | About Legal Limit | About Legal Limit | About Legal Limit |
| 12:35:04 | 0.000 | 0.008847 | 0.048058 | -0.008847 | About Legal Limit | About Legal Limit | Less Than Legal Limit |
| 13:05:36 | 0.016 | 0.053409 | 0.036056 | 0.025607 | About Legal Limit | About Legal Limit | About Legal Limit |
| 15:11:57 | 0.000 | 0.026994 | 0.050443 | 0.015282 | About Legal Limit | About Legal Limit | About Legal Limit |
| 19:58:50 | 0.011 | 0.001037 | 0.055479 | 0.002945 | Less Than Legal Limit | About Legal Limit | About Legal Limit |
| 20:29:25 | 0.063 | 0.036046 | 0.036422 | 0.018911 | About Legal Limit | About Legal Limit | About Legal Limit |
| 22:01:47 | 0.149 | 0.073458 | 0.085755 | 0.080125 | About Legal Limit | About Legal Limit | About Legal Limit |
| 22:32:32 | 0.156 | 0.088400 | 0.112560 | 0.114255 | About Legal Limit | Heavy Alcohol | Heavy Alcohol |
| 00:04:46 | 0.152 | 0.101784 | 0.130890 | 0.114621 | Heavy Alcohol | Heavy Alcohol | Heavy Alcohol |
| 04:41:28 | 0.015 | 0.069027 | 0.036213 | 0.070655 | About Legal Limit | About Legal Limit | About Legal Limit |
| 05:42:33 | 0.000 | -0.021743 | 0.023145 | -0.010728 | Less Than Legal Limit | About Legal Limit | Less Than Legal Limit |
| 06:43:37 | 0.000 | 0.030835 | 0.022438 | -0.004068 | About Legal Limit | About Legal Limit | Less Than Legal Limit |
| 10:53:04 | 0.000 | 0.017760 | 0.046772 | 0.008141 | About Legal Limit | About Legal Limit | About Legal Limit |

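
The three evaluation metrics can be computed with scikit-learn as sketched below. The sample values are a handful of rows copied from the Actual TAC and SVR Regression columns of the table; the resulting numbers describe only this small subset and are not the project's reported scores. The helper name `regression_report` is hypothetical.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Compute MSE, MAE, and R-squared for one set of predictions."""
    return {
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


# Eight rows from the table: actual TAC and the SVR prediction at the same time.
actual = [0.000, 0.008, 0.016, 0.063, 0.149, 0.156, 0.152, 0.015]
svr_pred = [0.002438, 0.016806, 0.025607, 0.018911,
            0.080125, 0.114255, 0.114621, 0.070655]

report = regression_report(actual, svr_pred)
for name, value in report.items():
    print(f"{name}: {value:.6f}")
```

Running the same function on each model's predictions gives a like-for-like comparison across LR, RF, and SVR.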
The results compare each model's predicted TAC with the actual reading at each timestamp. Support vector regression generally stays closest to the actual values, while linear regression and random forest deviate further in some instances. Most predictions are classified as About Legal Limit, with a few classified as Heavy Alcohol around the peak readings. The SVR model showed the best performance and was selected as the final model. Its predictions follow the rise and fall of the actual TAC values, although all three models underestimate the peak readings (0.149 to 0.156), which supports its use for identifying heavy drinking episodes while leaving room for improvement at high TAC levels.
A mobile application was proposed to use real-time accelerometer data for predicting TAC levels. The app alerts users and designated contacts when heavy drinking episodes are detected, promoting timely intervention.
Data-driven insights from the predictive model informed the design of alert thresholds and the notification system.
A monitoring plan tracks the application's effectiveness using key performance indicators (KPIs) like the number of alerts generated, user feedback, and intervention outcomes.
Future research will focus on refining the model with additional data and incorporating other sensors (e.g., gyroscopes) to improve real-time prediction accuracy.
The Python code for data analysis and model training is available in the repo or on Colab: https://colab.research.google.com/drive/1Hkuwf_Oldo3IeQl2saySK-PEA-K-J6bj?usp=sharing#scrollTo=p6I2LqosiIez








