
Commit 6ec87cf

feat: add amazon chronos benchmark (#257)
1 parent 3992ba5 commit 6ec87cf


7 files changed: 667 additions, 0 deletions

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# A Statistical Ensemble of traditional methods is 10% more accurate and 5x faster than Amazon Chronos

We present a comprehensive evaluation showing that a Statistical Ensemble of AutoARIMA, AutoETS, AutoCES, and DynamicOptimizedTheta outperforms Amazon Chronos, a foundation model for time series forecasting with up to 710 million parameters. Specifically, the **Statistical Ensemble scores 10%, 10%, and 11% better on CRPS, MASE, and SMAPE, respectively**, and it is **5x faster**. The analysis spans over 50,000 unique time series from the M1, M3, M4, and Tourism datasets.
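The headline figures compare an error metric, and the run time, between the two approaches. The exact formulas are not spelled out here, so the sketch below assumes that "X% better" means the relative reduction of an error metric, `1 - ensemble_error / chronos_error`, and that "5x faster" is the ratio of wall-clock times. The function names and the numbers in the usage lines are illustrative only, not values from the results table.

```python
def relative_improvement(ensemble_error: float, chronos_error: float) -> float:
    """Relative error reduction of the ensemble vs. Chronos (assumed formula; lower error is better)."""
    if chronos_error <= 0:
        raise ValueError("reference error must be positive")
    return 1.0 - ensemble_error / chronos_error


def speedup(ensemble_seconds: float, chronos_seconds: float) -> float:
    """How many times faster the ensemble ran than Chronos (assumed formula)."""
    if ensemble_seconds <= 0:
        raise ValueError("ensemble time must be positive")
    return chronos_seconds / ensemble_seconds


if __name__ == "__main__":
    # Illustrative values only, not taken from the benchmark.
    print(f"{relative_improvement(0.9, 1.0):.0%} lower error")  # 10% lower error
    print(f"{speedup(60.0, 300.0):.1f}x faster")  # 5.0x faster
```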
# Introduction

The rise of foundation models in time series forecasting, such as Amazon Chronos, represents a significant leap forward: these models use deep learning and pre-training on massive datasets to improve predictive accuracy. Amazon Chronos is particularly notable for its large parameter count and ambitious scope. However, our study shows that a much simpler approach, a Statistical Ensemble of traditional forecasting methods, achieves better accuracy with greater computational efficiency.
## Empirical Evaluation

This study considers over 50,000 unique time series from the M1, M3, M4, and Tourism datasets, covering yearly, quarterly, and monthly frequencies as well as the "other" subset of M3. Chronos did not use these datasets during training. We also include the Seasonal Naive model as a baseline for traditional forecasting methods.
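A Seasonal Naive forecast simply repeats the last observed season. Below is a minimal pure-Python sketch assuming that textbook definition; the `SeasonalNaive` results in the benchmark come from the statistical pipeline, which is not part of this diff, and the season length it uses for each frequency (for example 12 for monthly data) is an assumption here.

```python
from typing import List, Sequence


def seasonal_naive(y: Sequence[float], season_length: int, h: int) -> List[float]:
    """Forecast h steps ahead by repeating the last observed season.

    Step k (1-based) uses y[T - m + ((k - 1) % m)], with T = len(y), m = season_length.
    With season_length = 1 this reduces to the plain naive forecast.
    """
    if season_length < 1:
        raise ValueError("season_length must be >= 1")
    if len(y) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = list(y[-season_length:])
    # Cycle through the last season as many times as the horizon requires.
    return [last_season[k % season_length] for k in range(h)]


if __name__ == "__main__":
    print(seasonal_naive([1, 2, 3, 4, 5, 6, 7, 8], season_length=4, h=6))  # [5, 6, 7, 8, 5, 6]
```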
## Results

The following table reports performance on CRPS, MASE, SMAPE, and computational time (in seconds). The best results are highlighted in **bold**.

<img width="1099" alt="image" src="https://github.com/Nixtla/nixtla/assets/10517170/4d4fe9f3-4251-4b95-bd9b-248fc283e97b">
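The three accuracy metrics can be sketched in plain Python using common textbook definitions: MASE divides the forecast MAE by the in-sample MAE of a seasonal naive forecast, SMAPE follows the M4 competition's percentage form, and CRPS is approximated as twice the average pinball (quantile) loss over the predicted quantile levels. The evaluation code behind the table (`ExperimentHandler.evaluate_models`) is not shown in this diff and may scale or average these differently, so this is an illustration, not a reproduction of the reported numbers.

```python
from typing import Dict, Sequence


def mase(y: Sequence[float], y_hat: Sequence[float], y_train: Sequence[float], season_length: int) -> float:
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast."""
    if len(y_train) <= season_length:
        raise ValueError("y_train must be longer than one season")
    mae = sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)
    naive_errors = [abs(y_train[t] - y_train[t - season_length]) for t in range(season_length, len(y_train))]
    return mae / (sum(naive_errors) / len(naive_errors))


def smape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Symmetric MAPE in percent (M4 form); a step with y == y_hat == 0 contributes 0."""
    terms = []
    for a, f in zip(y, y_hat):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return 100.0 * sum(terms) / len(terms)


def pinball(y: float, q_pred: float, tau: float) -> float:
    """Quantile loss: tau * (y - q) if y >= q, else (1 - tau) * (q - y)."""
    return (tau if y >= q_pred else tau - 1.0) * (y - q_pred)


def crps_from_quantiles(y: Sequence[float], quantile_preds: Dict[float, Sequence[float]]) -> float:
    """Approximate CRPS as 2 x the pinball loss averaged over steps and quantile levels."""
    total = 0.0
    for tau, preds in quantile_preds.items():
        total += sum(pinball(a, q, tau) for a, q in zip(y, preds)) / len(y)
    return 2.0 * total / len(quantile_preds)


if __name__ == "__main__":
    y, y_hat = [10.0, 12.0], [11.0, 12.0]
    print(mase(y, y_hat, y_train=[1, 2, 3, 4, 5, 6], season_length=1))  # 0.5
    print(f"{smape(y, y_hat):.2f}")  # 4.76
    # With only the median, the approximation reduces to the MAE.
    print(crps_from_quantiles(y, {0.5: y_hat}))  # 0.5
```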
## Reproducibility

The Statistical Ensemble experiments were run on an AWS c5a.24xlarge instance with 96 vCPUs and 192 GiB of RAM. The Amazon Chronos experiments were run on an AWS g5.4xlarge GPU instance with 16 vCPUs, 64 GiB of RAM, and an NVIDIA A10G Tensor Core GPU with 24 GiB of memory. All code and detailed instructions needed to reproduce the experiments are available in this directory.
### Instructions

1. Set up a Python environment:

```bash
mamba env create -f environment.yml
conda activate amazon-chronos
```

2. Run the experiments reported in the table:

```bash
python -m src.main --mode fcst_statsforecast
python -m src.main --mode fcst_chronos
```

3. Evaluate the results:

```bash
python -m src.main --mode evaluation
```
### References

- **Statistical Ensemble Paper**: [A Simple Combination of Univariate Models](https://www.sciencedirect.com/science/article/abs/pii/S0169207019300585?via%3Dihub)
- **Amazon Chronos Paper**: [Chronos: Learning the Language of Time Series](https://arxiv.org/abs/2403.07815)
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
```yaml
name: amazon-chronos
channels:
  - conda-forge
  - defaults
  - anaconda
dependencies:
  - jupyterlab
  - pip
  - python=3.10
  - pip:
      - datasetsforecast
      - fire
      - gluonts
      - huggingface_hub[cli]
      - neuralforecast
      - orjson
      - statsforecast
      - utilsforecast
      - git+https://github.com/amazon-science/chronos-forecasting.git
```
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
```python
import logging
from typing import Iterable, List

import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline
from utilsforecast.processing import make_future_dataframe

logging.basicConfig(level=logging.INFO)
main_logger = logging.getLogger(__name__)


class TimeSeriesDataset:
    """Left-padded tensor of series values, served in fixed-size batches."""

    def __init__(
        self,
        data: torch.Tensor,
        uids: Iterable,
        last_times: Iterable,
        batch_size: int,
    ):
        self.data = data
        self.uids = uids
        self.last_times = last_times
        self.batch_size = batch_size
        # Ceiling division: a final, smaller batch holds any remainder.
        self.n_batches = len(data) // self.batch_size + (
            0 if len(data) % self.batch_size == 0 else 1
        )
        self.current_batch = 0

    @classmethod
    def from_df(cls, df: pd.DataFrame, batch_size: int):
        num_unique_ids = df["unique_id"].nunique()
        max_series_length = df["unique_id"].value_counts().max()
        # One row per series; shorter series are padded with NaN on the left so
        # the most recent observation always sits in the last column.
        # Values are stored in bfloat16 (about 2-3 significant decimal digits).
        padded_tensor = torch.full(
            size=(num_unique_ids, max_series_length),
            fill_value=torch.nan,
            dtype=torch.bfloat16,
        )  # type: ignore
        df_sorted = df.sort_values(by=["unique_id", "ds"])
        for idx, (_, group) in enumerate(df_sorted.groupby("unique_id")):
            series_length = len(group)
            padded_tensor[idx, -series_length:] = torch.tensor(
                group["y"].values,
                dtype=torch.bfloat16,
            )
        uids = df_sorted["unique_id"].unique()
        last_times = df_sorted.groupby("unique_id")["ds"].tail(1)
        return cls(padded_tensor, uids, last_times, batch_size)

    def __len__(self):
        return len(self.data)

    def make_future_dataframe(self, h: int, freq: str) -> pd.DataFrame:
        # Rows are grouped by unique_id (sorted), then ordered by ds.
        return make_future_dataframe(
            uids=self.uids,
            last_times=pd.to_datetime(self.last_times),
            h=h,
            freq=freq,
        )  # type: ignore

    def __iter__(self):
        self.current_batch = 0  # Reset for new iteration
        return self

    def __next__(self):
        if self.current_batch < self.n_batches:
            start_idx = self.current_batch * self.batch_size
            end_idx = start_idx + self.batch_size
            self.current_batch += 1
            return self.data[start_idx:end_idx]
        else:
            raise StopIteration


class AmazonChronos:
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model = ChronosPipeline.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        )

    def forecast(
        self,
        df: pd.DataFrame,
        h: int,
        freq: str,
        batch_size: int = 32,
        quantiles: List[float] | None = None,
        **predict_kwargs,
    ) -> pd.DataFrame:
        main_logger.info("transforming dataframe to tensor")
        dataset = TimeSeriesDataset.from_df(df, batch_size=batch_size)
        main_logger.info("forecasting")
        # ChronosPipeline.predict returns samples shaped (batch, num_samples, h).
        fcsts = [
            self.model.predict(batch, prediction_length=h, **predict_kwargs)
            for batch in dataset
        ]
        fcst = torch.cat(fcsts)
        main_logger.info("transforming forecast to dataframe")
        fcst = fcst.numpy()
        fcst_df = dataset.make_future_dataframe(h=h, freq=freq)
        # Reduce over the sample axis, then flatten series-major so the values
        # line up with the (unique_id, ds) rows of fcst_df.
        fcst_df[self.model_name] = np.median(fcst, axis=1).reshape(-1)
        if quantiles is not None:
            for q in quantiles:
                q_col = f"{self.model_name}-q-{q}"
                fcst_df[q_col] = np.quantile(fcst, q, axis=1).reshape(-1)
        return fcst_df


if __name__ == "__main__":
    # Smoke test on two copies of the AirPassengers series.
    df = pd.read_csv(
        "https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv"
    )
    df = df.rename(columns={"#Passengers": "y", "Month": "ds"})
    df["ds"] = pd.to_datetime(df["ds"])
    df.insert(0, "unique_id", "AirPassengers")
    df = pd.concat([df, df.assign(unique_id="AirPassengers2")])
    model = AmazonChronos("amazon/chronos-t5-small")
    fcst_df = model.forecast(df, h=12, freq="MS")
    print(fcst_df)
```
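`TimeSeriesDataset` and `AmazonChronos.forecast` rely on two layout conventions: series are right-aligned by NaN-padding on the left and served in ceiling-divided batches, and the `(series, sample, step)` forecast array is reduced over samples and flattened series-major so its rows line up with `make_future_dataframe`. The dependency-free sketch below mirrors both steps with nested Python lists standing in for bfloat16 tensors; the helper names `left_pad`, `iter_batches`, and `median_long_format` are hypothetical and not part of the code above.

```python
import math
import statistics
from typing import Dict, Iterator, List, Sequence


def left_pad(series: Dict[str, List[float]]) -> List[List[float]]:
    """Right-align series of unequal length by NaN-padding on the left, one row per id in sorted order."""
    width = max(len(values) for values in series.values())
    return [[math.nan] * (width - len(series[uid])) + list(series[uid]) for uid in sorted(series)]


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows; the last one may be shorter."""
    n_batches = -(-len(rows) // batch_size)  # ceiling division, as in TimeSeriesDataset
    for b in range(n_batches):
        yield rows[b * batch_size : (b + 1) * batch_size]


def median_long_format(samples: List[List[List[float]]]) -> List[float]:
    """Reduce (series, sample, step) paths to one median per (series, step), series-major."""
    out: List[float] = []
    for paths in samples:  # all sample paths of one series
        for t in range(len(paths[0])):
            out.append(statistics.median(path[t] for path in paths))
    return out


if __name__ == "__main__":
    padded = left_pad({"b": [1.0, 2.0], "a": [3.0, 4.0, 5.0]})
    print(padded)  # [[3.0, 4.0, 5.0], [nan, 1.0, 2.0]]
    print(list(iter_batches(padded, batch_size=1)))
    # Two series, three sample paths each, horizon 2.
    print(median_long_format([[[1, 10], [3, 30], [2, 20]], [[5, 5], [7, 7], [6, 6]]]))  # [2, 20, 6, 6]
```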
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
```python
from time import time
from typing import List, Tuple

import fire
import pandas as pd

from ..utils import ExperimentHandler
from .forecaster import AmazonChronos


def run_amazon_chronos(
    train_df: pd.DataFrame,
    model_name: str,
    horizon: int,
    freq: str,
    quantiles: List[float],
) -> Tuple[pd.DataFrame, float, str]:
    ac = AmazonChronos(model_name)
    # The timer starts after the model is loaded, so only forecasting is timed.
    init_time = time()
    fcsts_df = ac.forecast(
        df=train_df,
        h=horizon,
        freq=freq,
        batch_size=8,
        quantiles=quantiles,
        # parameters as in https://github.com/amazon-science/chronos-forecasting/blob/73be25042f5f587823d46106d372ba133152fb00/README.md?plain=1#L62-L65
        num_samples=20,
        temperature=1.0,
        top_k=50,
        top_p=1.0,
    )
    total_time = time() - init_time
    return fcsts_df, total_time, model_name


def main(dataset: str, model_name: str):
    exp = ExperimentHandler(dataset)
    fcst_df, total_time, model_name = run_amazon_chronos(
        train_df=exp.train_df,
        model_name=model_name,
        horizon=exp.horizon,
        freq=exp.freq,
        quantiles=exp.quantiles,
    )
    exp.save_results(fcst_df, total_time, model_name)


if __name__ == "__main__":
    fire.Fire(main)
```
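The sampling arguments in `run_amazon_chronos` (`num_samples=20`, `temperature=1.0`, `top_k=50`, `top_p=1.0`) control how each of the 20 sample paths per series is drawn; `forecast` then takes the median and quantiles over those paths. To our understanding, `ChronosPipeline` forwards these settings to Hugging Face's generation utilities. The sketch below is a conceptual pure-Python stand-in for one sampling step (temperature scaling, then top-k, then top-p), not Chronos or `transformers` code; `sample_token` is a hypothetical name. With `temperature=1.0` and `top_p=1.0`, only the top-k filter reshapes the distribution.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 50,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id after temperature, top-k and top-p filtering (conceptual sketch)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature: divide the logits; 1.0 leaves the distribution unchanged.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids (k <= 0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k] if top_k > 0 else ranked

    # Top-p (nucleus): keep the smallest prefix of the renormalised survivors whose
    # cumulative probability reaches top_p; top_p = 1.0 effectively keeps them all.
    kept_mass = sum(probs[i] for i in kept)
    nucleus: List[int] = []
    cumulative = 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # Sample among the surviving ids in proportion to their probabilities.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # One call per step of one sample path; num_samples=20 would mean 20 such paths.
    print([sample_token([0.5, 1.0, 0.2, -1.0], top_k=2, rng=rng) for _ in range(10)])
```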
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
```python
import logging
import subprocess

import fire
import pandas as pd

from src.utils import ExperimentHandler

# Configure a handler; otherwise logging's last-resort handler drops INFO records.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

datasets = [
    "m1_yearly",
    "m1_quarterly",
    "m1_monthly",
    "m3_yearly",
    "m3_quarterly",
    "m3_monthly",
    "m3_other",
    "tourism_yearly",
    "tourism_quarterly",
    "tourism_monthly",
    "m4_yearly",
    "m4_quarterly",
]

amazon_chronos_models = [
    "amazon/chronos-t5-large",
    "amazon/chronos-t5-tiny",
    "amazon/chronos-t5-mini",
    "amazon/chronos-t5-small",
    "amazon/chronos-t5-base",
]


def main(mode: str):
    prefix_process = ["python", "-m"]

    eval_df = None
    for dataset in datasets:
        logger.info(f"Evaluating {dataset}...")
        if mode in ["fcst_statsforecast", "fcst_chronos"]:
            suffix_process = ["--dataset", dataset]

            def process(middle_process):
                return prefix_process + middle_process + suffix_process

            # Each forecasting job runs in its own Python process.
            if mode == "fcst_statsforecast":
                logger.info("Running StatisticalEnsemble")
                subprocess.run(process(["src.statsforecast_pipeline"]))
            elif mode == "fcst_chronos":
                for model in amazon_chronos_models:
                    logger.info(f"Running Amazon Chronos {model}")
                    chronos_process = process(["src.amazon_chronos.pipeline"])
                    chronos_process.extend(["--model_name", model])
                    subprocess.run(chronos_process)
        elif mode == "evaluation":
            if eval_df is None:
                eval_df = []
            logger.info("Running dataset evaluation")
            exp = ExperimentHandler(dataset)
            try:
                eval_dataset_df = exp.evaluate_models(
                    amazon_chronos_models + ["StatisticalEnsemble", "SeasonalNaive"]
                )
                print(eval_dataset_df)
                eval_df.append(eval_dataset_df)
            except Exception as e:
                logger.error(e)
    # After the loop: combine the per-dataset evaluations into one file.
    if eval_df is not None:
        eval_df = pd.concat(eval_df).reset_index(drop=True)
        exp.save_dataframe(eval_df, "complete-results.csv")


if __name__ == "__main__":
    fire.Fire(main)
```
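`main()` fans out one subprocess per dataset and, in `fcst_chronos` mode, one per Chronos model size. The sketch below rebuilds the same argument lists without running them, which can help check what a mode will launch before starting a long job. `build_commands` is a hypothetical helper; unlike `main()`, it rejects unsupported modes instead of silently doing nothing.

```python
import shlex
from typing import List


def build_commands(mode: str, datasets: List[str], models: List[str]) -> List[List[str]]:
    """Return the argv lists that main() passes to subprocess.run for a forecasting mode."""
    if mode not in ("fcst_statsforecast", "fcst_chronos"):
        raise ValueError(f"not a forecasting mode: {mode!r}")
    commands: List[List[str]] = []
    for dataset in datasets:
        suffix = ["--dataset", dataset]
        if mode == "fcst_statsforecast":
            commands.append(["python", "-m", "src.statsforecast_pipeline"] + suffix)
        else:
            # One Chronos job per (dataset, model), datasets in the outer loop as in main().
            for model in models:
                commands.append(
                    ["python", "-m", "src.amazon_chronos.pipeline"] + suffix + ["--model_name", model]
                )
    return commands


if __name__ == "__main__":
    for cmd in build_commands("fcst_chronos", ["m1_yearly"], ["amazon/chronos-t5-tiny"]):
        print(shlex.join(cmd))
```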
