Production-ready, automated machine learning pipeline for building an electrical transformer load forecasting model using Python, YAML, AzureML, MLflow, Prophet, Scikit-learn, Docker, Terraform, and GitHub Actions.

MichaelMaio/ml_time_series_forecasting

Production-Ready, Reproducible, Secure, Cross-Cloud Machine Learning Pipeline

Engineer: Michael Maio
Last updated: 9/7/2025

image

Overview

This repo contains a working machine learning pipeline that addresses the following hypothetical scenario, one that a software company might plausibly face.

Problem: Data center power consumption is growing, causing a gradual, year-over-year increase in the hourly peak kilowatt load reported by a sensor on the local transformer.

Question: Assuming the power consumption trend remains unchanged, how long before the transformer becomes overloaded?

Solution: Build a machine learning pipeline that can process recent trend data and forecast when the transformer may eventually become overloaded, informing the necessary schedule for a preventative upgrade.
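To make the question concrete, here is a minimal sketch of the underlying arithmetic. It is not the repo's method: it swaps in a plain least-squares line for the Prophet model the pipeline uses, and the function name `hours_until_overload` and its `capacity_kw` parameter are hypothetical names chosen for illustration.

```python
def hours_until_overload(loads_kw, capacity_kw):
    """Estimate hours from the last sample until a linear trend reaches capacity.

    loads_kw: hourly peak loads in kilowatts, oldest first (illustrative input).
    capacity_kw: assumed transformer rating in kilowatts (hypothetical parameter).
    Returns None when the fitted trend is flat or falling.
    """
    n = len(loads_kw)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of load against the hour index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(loads_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(loads_kw))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no upward trend, so the line never reaches capacity

    crossing_hour = (capacity_kw - intercept) / slope
    # Measure from the most recent sample; clamp if capacity is already reached.
    return max(0.0, crossing_hour - (n - 1))
```

A straight line ignores the daily and seasonal cycles that a model such as Prophet can capture, so this only approximates the date on which the first hourly peak would exceed capacity.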

This machine learning pipeline uses:

  1. Python for the scripting.
  2. Docker containers to encapsulate training, promotion, and prediction jobs.
  3. MLflow for model management.
  4. YAML for job management.
  5. GitHub Actions to trigger a pipeline deployment.
  6. Terraform to create and update AzureML infrastructure from code.
  7. A managed identity to keep everything secure.

Items 1 through 3 allow the entire pipeline to run locally for quick feedback on changes before deploying to the cloud, with no Azure required. Item 6 allows other cloud providers, such as AWS or GCP, to be swapped in as needed.

AzureML In Action

This is the starting point: a high-level view of the experiment in Azure AI’s Machine Learning Studio.

image

Drilling into the experiment shows a list of its jobs. Each job represents a different deployment of the pipeline that trains the model, promotes the model (if it passed testing), and uses the model to make predictions.

image

Drilling into the latest job reveals a list of sub-jobs and how they are wired together. Below you can see that the sub-job that trains the model outputs a “trained_model” to the sub-job that promotes the model, which in turn outputs a “promoted_model” to the sub-job that uses the model to output “predictions”.

image

You can drill into each sub-job to view all kinds of details about it. Below you can see that the first sub-job, “Train Transformer Load Model”, did the following:

  1. It output the model twice: once when MLflow logged the model, and once to pass the model along to the promotion job.
  2. It applied some informative tags.
  3. It reported the metric “rmse” (aka Root Mean Squared Error), indicating how well the model performed during testing.
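For readers unfamiliar with the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values, so it is expressed in the same unit as the load (kilowatts). The sketch below shows the calculation in plain Python; the function name and inputs are illustrative and are not taken from the repo's training script.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared errors, then take the square root.
    mean_squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared_error)
```

Lower values indicate that the predictions sit closer to the held-out test data.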

image

You can drill into one of the model links to get more information on the model.

image

And drill into its artifacts.

image

Moving on to the “Promote Transformer Load Model” sub-job, you can see that it output the “promoted_model”, meaning the model passed testing during training and the Root Mean Squared Error of the model was sufficiently low for it to be useful in making predictions.
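The promotion decision described above amounts to a threshold check on the test metric. The following is a sketch under assumptions: the name `should_promote` and the `max_rmse` parameter are hypothetical, and the source does not state the actual threshold the repo applies.

```python
import math


def should_promote(rmse_value, max_rmse):
    """Return True when the test RMSE is low enough for the model to be promoted.

    max_rmse is a hypothetical acceptance threshold in kilowatts.
    """
    # A missing or NaN metric is treated as a failed test run.
    if rmse_value is None or math.isnan(rmse_value):
        return False
    return rmse_value <= max_rmse
```

Keeping the gate as a separate step means a model that fails testing never reaches the registry or the prediction job.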

image

If you view the AzureML model registry for the workspace, you can see that the promotion sub-job registered the model since it passed testing.

image

Moving on to the “Predict Transformer Overload” sub-job, we can see that it created the following:

  1. A tag reporting that the transformer is predicted to hit its first overload at 11pm on November 26th, 2027.
  2. A metric predicting that the maximum load over the entire 5-year period will be about 98 kilowatts.
  3. A metric predicting that the transformer will overload more than 4,623 times in the next 5 years given current usage trends.
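The three values above can all be derived from a single hourly forecast. The sketch below shows one way to do so, assuming the forecast is available as (timestamp, kilowatts) pairs and that an "overload" means an hourly load strictly above the transformer's capacity. The function name, the `capacity_kw` parameter, and that definition of overload are assumptions for illustration, not details confirmed by the source.

```python
from datetime import datetime, timedelta


def summarize_forecast(forecast, capacity_kw):
    """Summarize an hourly load forecast.

    forecast: non-empty list of (datetime, load_kw) pairs (assumed shape).
    capacity_kw: assumed transformer rating in kilowatts (hypothetical).
    """
    if not forecast:
        raise ValueError("forecast must not be empty")

    # Assumption: each forecast hour strictly above capacity counts as one overload.
    overload_times = [ts for ts, kw in forecast if kw > capacity_kw]
    return {
        "first_overload": min(overload_times) if overload_times else None,
        "max_load_kw": max(kw for _, kw in forecast),
        "overload_count": len(overload_times),
    }


# Usage with a tiny synthetic forecast: loads of 60, 61, 62, 63, 64 kW.
start = datetime(2025, 1, 1)
example = [(start + timedelta(hours=h), 60 + h) for h in range(5)]
summary = summarize_forecast(example, capacity_kw=62)
```

In this synthetic example only the last two hours exceed 62 kW, so the summary reports two overloads, the first at 03:00 on the start date, and a maximum load of 64 kW.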

image

You can also drill into the “predictions” output and see the files that the prediction job uploaded, including:

  1. The predicted transformer load in kilowatts for each hour during the next 5 years.
  2. The number of times the transformer is predicted to overload during that period.
  3. A chart of the predicted loads.

image
