This repository contains my notes, practices, and solutions for the Applied Data Science course (EE-879) offered by Sharif University of Technology. The course has also been presented, and is currently being presented, at TeIAS, Khatam University. The content covers various data science concepts, hands-on exercises, and solutions to course assignments.
For more details about the course, visit the official course website (Applied Data Science, Sharif University of Technology, Spring 2025) or the Saloot GitHub repository.
The course covers the following key topics:
- 📚 Introduction to Pandas
- 🧹 Data Cleaning and Preprocessing
- 📊 Data Visualization
- ⚙️ Feature Engineering and Dimensionality Reduction
- 🎯 Different Problem Types and Accuracy Measures
- 📈 Regression Methods
- 🔍 Classification Methods
- 🌂 Multiclass/Multilabel Classification and Boosting
- 🧠 Neural Networks
- 🚀 Deep Learning
- 🖼️ Deep Learning Application: Image Classification
- 🤖 Generative AI
The following datasets are utilized in this repository:
- 🚗 Car Features and MSRP
- 🧠 Stroke Prediction
- 🏠 Ames Iowa Housing Data
- 🏨 Hotel Booking Demand
- 🥑 Avocado Prices
- Complete the Kaggle mini tutorial on Pandas: https://www.kaggle.com/learn/pandas
- Choose a dataset for future assignments.
- Perform EDA, cleaning, and preprocessing on the Stroke Prediction dataset: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
- Complete the Kaggle mini tutorial on Data Cleaning: https://www.kaggle.com/learn/data-cleaning
- Perform data visualization on the Hotel Booking Demand dataset: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand/data
- Complete the Kaggle mini tutorial on Data Visualization: https://www.kaggle.com/learn/data-visualization
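A first pass at the EDA and cleaning tasks above might look like the sketch below. The column names (`bmi`, `smoking_status`) mirror the Stroke Prediction dataset, but the frame itself is a small toy stand-in:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Stroke Prediction dataset; the column
# names below are illustrative, not the full real schema.
df = pd.DataFrame({
    "age": [67, 61, 80, 49],
    "bmi": [36.6, np.nan, 32.5, 34.4],
    "smoking_status": ["formerly smoked", "never smoked", None, "smokes"],
})

# Typical first-pass EDA and cleaning steps
print(df.isna().sum())                                         # missing values per column
df["bmi"] = df["bmi"].fillna(df["bmi"].median())               # impute numeric NaNs
df["smoking_status"] = df["smoking_status"].fillna("unknown")  # fill categorical gaps
df = pd.get_dummies(df, columns=["smoking_status"])            # one-hot encode
print(df.shape)
```

The same pattern (inspect missingness, impute, encode) carries over to the real dataset once it is loaded with `pd.read_csv`.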
- Web-scrape listings for 50 Samand cars from bama.ir (price, mileage, color, year, transmission, description)
- Create new features (ratio, binning, functional transforms, combining columns)
- Create date/time features
- Perform aggregation and counts
- Feature selection using Mutual Information
- PCA for dimensionality reduction
- Write explanation: When is feature engineering optional vs necessary?
- Optional: Kaggle mini tutorial on Feature Engineering
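The feature-selection and dimensionality-reduction steps above can be sketched on synthetic data; here the target is built to depend mostly on feature 0, so mutual information should rank it highest:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=200)  # target driven by feature 0

# Mutual information ranks features by dependence with the target
mi = mutual_info_regression(X, y, random_state=0)
print(mi.argmax())   # feature 0 should score highest

# PCA projects onto the directions of maximal variance
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)
```

On a real dataset the same two calls apply after the engineered features (ratios, bins, date parts) have been added as columns.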
- Compute regression metrics: MSE, MAE, MAPE, R²
- Compute binary classification metrics: Precision, Recall, F1
- Compute multi-class metrics: per-class Precision and Recall, macro/micro/weighted F1
- Answer question about multi-label metrics (football example)
- Use models or random predictions for metric evaluation
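The metric tasks above map directly onto `sklearn.metrics`; a sketch on hand-made predictions:

```python
from sklearn.metrics import (f1_score, mean_absolute_error,
                             mean_squared_error, precision_score,
                             r2_score, recall_score)

# Regression metrics on toy predictions
y_true_r = [3.0, 5.0, 2.0]
y_pred_r = [2.5, 5.0, 3.0]
mse = mean_squared_error(y_true_r, y_pred_r)
mae = mean_absolute_error(y_true_r, y_pred_r)
r2 = r2_score(y_true_r, y_pred_r)

# Binary classification metrics
y_true_c = [1, 0, 1, 1, 0]
y_pred_c = [1, 0, 0, 1, 1]
p = precision_score(y_true_c, y_pred_c)   # TP / (TP + FP) = 2/3
r = recall_score(y_true_c, y_pred_c)      # TP / (TP + FN) = 2/3
f1 = f1_score(y_true_c, y_pred_c)

# Multi-class averaging: macro = unweighted mean of per-class F1,
# micro = computed from global counts, weighted = weighted by support
y_true_m = [0, 1, 2, 2, 1]
y_pred_m = [0, 2, 2, 2, 1]
f1_macro = f1_score(y_true_m, y_pred_m, average="macro")
f1_micro = f1_score(y_true_m, y_pred_m, average="micro")
print(mse, mae, r2, p, r, f1, f1_macro, f1_micro)
```

Swapping random predictions or model outputs into `y_pred_*` reproduces the evaluation asked for in the last task.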
- Train and evaluate:
- Linear Regression
- Kernel Regression
- Logistic Regression
- Ridge Regression
- LASSO Regression
- Explain the kernel trick
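The regression line-up above fits in a few lines of scikit-learn; kernel ridge regression stands in for "kernel regression" here, and the nonlinear target makes the benefit of the kernel trick visible:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

models = {
    "linear": LinearRegression(),
    "ridge":  Ridge(alpha=1.0),           # L2 penalty shrinks coefficients
    "lasso":  Lasso(alpha=0.01),          # L1 penalty can zero coefficients out
    "kernel": KernelRidge(kernel="rbf"),  # kernel trick: nonlinear fit
}
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
print(scores)   # the RBF kernel model fits sin(x) far better than the linear ones
```

Logistic regression belongs to the next (classification) block despite the name, which is why it is omitted from this regression sketch.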
- Train and evaluate:
- Logistic Regression
- SVM (linear + kernel)
- KNN (implement + tune K)
- Decision Trees (implement + tune depth)
- Random Forest
- Explain three regularization techniques for decision trees
- Bonus: Achieve F1 > 0.9 on diabetes dataset
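The three decision-tree regularization techniques asked for above can be demonstrated directly through scikit-learn's constructor arguments, on a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three common ways to regularize a decision tree:
trees = {
    "depth limit":  DecisionTreeClassifier(max_depth=3, random_state=0),
    "leaf size":    DecisionTreeClassifier(min_samples_leaf=10, random_state=0),
    "cost pruning": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}
for name, t in trees.items():
    t.fit(X_tr, y_tr)
    print(name, round(t.score(X_te, y_te), 3))
```

Each constraint trades training fit for generalization: shallow depth and large leaves stop the tree early, while cost-complexity pruning (`ccp_alpha`) grows the full tree and then removes branches that buy too little impurity reduction.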
- Implement and evaluate:
- Multiclass SVM
- Multiclass Logistic Regression (OVR, multinomial, log loss)
- Multiclass KNN (implement + tune K)
- Multiclass Decision Trees
- Boosting methods: XGBoost, LightGBM, AdaBoost/CatBoost
- Perform grid search on one boosting method
- Explain how KNN and Decision Trees extend to multi-label classification
- Bonus: Achieve F1 > 0.6 on 12-class player-position dataset
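The grid-search task above can be sketched with scikit-learn's built-in `GradientBoostingClassifier` standing in for XGBoost/LightGBM (which expose analogous `n_estimators`/`learning_rate` parameters), on the 3-class iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)   # simple multiclass stand-in dataset

# Cross-validated grid search over two boosting hyperparameters
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "learning_rate": [0.05, 0.1]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same `GridSearchCV` wrapper works unchanged around an `XGBClassifier` or `LGBMClassifier`, since both follow the scikit-learn estimator API.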
- Train and evaluate:
- Scikit-Learn MLP (classification + regression)
- Keras 4-layer sequential network (classification + regression)
- PyTorch 4-layer network (classification + regression)
- Keras 4-layer non-sequential network
- Explain why neural networks are powerful and difficult to design
- Bonus: RNN (3-layer) if dataset includes time-series
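The scikit-learn MLP item above is the quickest entry point; a minimal classification sketch on a nonlinearly separable toy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A small multilayer perceptron: two hidden layers with ReLU activations
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)
print(round(mlp.score(X_te, y_te), 3))  # learns the curved moon boundary
```

The Keras and PyTorch versions of the assignment rebuild the same 4-layer architecture explicitly, layer by layer, instead of through a single constructor call.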
- Tune Keras network using at least 5 options:
- Optimizer
- Learning rate
- Learning rate decay
- Batch size
- Activation functions
- Weight initialization
- Network depth and width
- L1/L2 weight regularization
- L1/L2 activity regularization
- Dropout rate
- Explain why deeper networks are harder to train
- Use 4-fold cross validation for evaluation
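The tuning-plus-4-fold-CV workflow above can be sketched framework-agnostically; here scikit-learn's `MLPClassifier` stands in for a Keras model, with `hidden_layer_sizes` covering depth/width and `learning_rate_init` the learning rate (the other options map onto Keras optimizer, initializer, and regularizer arguments):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 4-fold cross-validated search over two of the tuning options
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid={
        "hidden_layer_sizes": [(32,), (64, 32)],  # depth and width
        "learning_rate_init": [1e-3, 1e-2],       # learning rate
    },
    cv=4,                                          # 4-fold cross validation
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For an actual Keras model the same loop is usually written with `scikeras.wrappers.KerasClassifier` or Keras Tuner, but the cross-validation logic is identical.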
- Build CNN with at least two convolutional layers
- Tune:
- Convolution kernel size
- Convolution stride
- Pooling size
- Pooling stride
- Perform data augmentation using Keras's ImageDataGenerator
- Perform transfer learning using two of the pretrained models (VGG19, ResNet, EfficientNet)
- Explain impact of receptive field size on performance
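The kernel/stride/pooling tuning and the receptive-field question above both come down to two standard formulas: the output size of a conv or pooling layer is `o = floor((i + 2p - k) / s) + 1`, and the receptive field grows by `(k - 1)` times the product of all earlier strides at each layer:

```python
# Output size of a conv/pool layer: o = floor((i + 2p - k) / s) + 1
def out_size(i, k, s=1, p=0):
    return (i + 2 * p - k) // s + 1

# Receptive field of a stack of (kernel, stride) layers:
# r grows by (k - 1) * jump, where jump is the product of earlier strides
def receptive_field(layers):
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(out_size(32, k=3, s=1, p=1))  # "same" padding keeps the spatial size
print(out_size(32, k=2, s=2))       # 2x2/stride-2 pooling halves it
# conv3x3, conv3x3, pool2x2/s2, conv3x3 -> receptive field in input pixels
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))
```

This makes the receptive-field explanation concrete: a conv layer placed after pooling "sees" twice as many input pixels per kernel step, which is why accuracy depends on how kernel sizes and strides are arranged across the network.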
- Build and train:
- Dense autoencoder
- Convolutional autoencoder
- Denoising autoencoder
- Train a GAN on CIFAR-10 dataset
- Use OpenAI API to generate:
- An image
- An audio voice reading generated text
- Explain the adversarial learning process
- Bonus: Build a Variational Autoencoder on Fashion-MNIST
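The core of the dense-autoencoder task above is just an encode/decode pair trained to reconstruct its input. A minimal NumPy sketch (linear layers only, for brevity; real assignments would use Keras/PyTorch with nonlinear activations):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))           # toy data standing in for flattened images

# Minimal dense autoencoder, 8 -> 3 -> 8, trained by plain gradient
# descent on the reconstruction error.
W_enc = 0.3 * rng.normal(size=(8, 3))   # encoder: input -> 3-unit bottleneck
W_dec = 0.3 * rng.normal(size=(3, 8))   # decoder: bottleneck -> reconstruction

def recon_error():
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

err_before = recon_error()
lr = 0.02
for _ in range(2000):
    Z = X @ W_enc                    # encode
    X_hat = Z @ W_dec                # decode
    G = 2 * (X_hat - X) / len(X)     # gradient of the loss w.r.t. X_hat
    grad_dec = Z.T @ G               # backprop into the decoder weights
    grad_enc = X.T @ (G @ W_dec.T)   # backprop into the encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(err_before, recon_error())     # error drops as the bottleneck learns
```

The denoising variant feeds a corrupted copy of `X` into the encoder while still measuring the error against the clean `X`; the convolutional variant swaps the matrix multiplications for conv/transposed-conv layers.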
- Practice 1: Explore pandas features on Car Features and MSRP dataset.
Maybe I will practice more in the future :).
- Notes → Summarized lecture notes & key concepts
- Practices → Hands-on coding exercises & projects
- Assignments → Solutions to course assignments