Submitted by fedegarzar t3_zk6h8q in MachineLearning

>TL;DR: We paid USD $800 and spent 4 hours in the AWS Forecast console so you don't have to.

In this reproducible experiment, we compare Amazon Forecast and StatsForecast, an open-source Python library for statistical methods.

Since AWS Forecast specializes in demand forecasting, we selected the M5 competition dataset as a benchmark; the dataset contains 30,490 series of daily Walmart sales.

We found that Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative on a simple cloud server.

We also provide a step-by-step guide to reproduce the results.

Results

Amazon Forecast:

  • achieved 1.617 in error (measured in wRMSSE, the official evaluation metric used in the competition),
  • took 4.1 hours to run,
  • and cost 803.53 USD.

An ensemble of statistical methods trained on a c5d.24xlarge EC2 instance:

  • achieved 0.669 in error (wRMSSE),
  • took 14.5 minutes to run,
  • and cost only 1.2 USD.

For this dataset, we therefore show that:

  • Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative on a simple cloud server.
  • Classical methods outperform Machine Learning methods in terms of speed, accuracy, and cost.

Although using StatsForecast requires some basic knowledge of Python and cloud computing, the results are better for this dataset.
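For readers unfamiliar with the metric, wRMSSE can be sketched roughly as follows. This is an illustrative implementation of the M5 definition (per-series RMSSE, then a weighted average), not the code used in the experiment:

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    """Root Mean Squared Scaled Error: forecast MSE scaled by the
    in-sample MSE of the naive one-step-ahead forecast."""
    y_train, y_true, y_pred = map(np.asarray, (y_train, y_true, y_pred))
    scale = np.mean(np.diff(y_train) ** 2)
    return np.sqrt(np.mean((y_true - y_pred) ** 2) / scale)

def wrmsse(rmsse_values, weights):
    """Weighted average of per-series RMSSE; in M5 the weights are
    proportional to each series' dollar sales over the last 28 days."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(np.asarray(rmsse_values) * w / w.sum()))
```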

Table

https://preview.redd.it/vt9ru0149i5a1.png?width=1274&format=png&auto=webp&s=64e6d4519f5934d56d25d76d17a58e6d03d70512

368

Comments


Zealousideal-Card637 t1_izy2kh1 wrote

Interesting comparison. I looked at the full experiments, and Amazon performs slightly better on the bottom level, the actual time series you are forecasting.

44

SherbertTiny2366 t1_izy50ew wrote

For hierarchical and sparse data it is quite common to see models achieve good accuracy at the bottom levels but be very bad at higher aggregation levels. This happens because the models are systematically under- or over-predicting.

29

Mark8472 t1_izyamjv wrote

How long was development time and required human resources (e.g. number of FTE days)?

How well do both scale?

How easily are they maintained / cost on the long run?

9

fedegarzar OP t1_izycx10 wrote

  1. We did not run those experiments, but in our opinion it's easier to maintain a Python pipeline than to use the UI or CLI of AWS.

  2. In terms of scalability, I think StatsForecast wins by far, given that it takes a lot less time to compute and supports integration with Spark and Ray.

  3. The point of the whole experiment is to show that the AutoML solution is far more expensive in the long run.

27

jedi-son t1_izykk0l wrote

Machine learning isn't as useful as basic statistical methods in 99% of real-world problems?! I'm shocked 😲

−16

marr75 t1_izywb2h wrote

If they were using a custom Python pipeline for the statistical models, yeah, I could see this argument. But, like many of the Nixtla tools, it boils down to a few lines (roughly; the actual StatsForecast API takes a long-format dataframe with unique_id, ds, and y columns):

# conda install -c conda-forge statsforecast
from statsforecast import StatsForecast
from statsforecast.models import AutoETS

sf = StatsForecast(models=[AutoETS(season_length=7)], freq="D")
sf.fit(train_df)          # train_df: long format (unique_id, ds, y)
preds = sf.predict(h=28)  # 28-day-ahead forecast per series

This is a pretty common "marketing" post format from Nixtla. I think they make good tools and good points, so I'm not at all mad about it. They're providing a ready-to-use tool (StatsForecast) and making a great point about its performance and cost vs the AWS alternative. Asking for the total cost of developing and maintaining StatsForecast means you'd also have to account for the total cost and complexity of developing and maintaining Amazon Forecast...

12

CyberPun-K t1_izywnhq wrote

There is a long way to go for AutoML solutions. Thanks for confirming I was not the only one.

23

dat_cosmo_cat t1_izyz5hj wrote

Several of our internal teams have arrived at similar conclusions when comparing AWS models to pre-trained open-source models. Specifically: zero-shot CLIP and a fine-tuned ResNet (ImageNet) outperformed Rekognition on various classification tasks (both on internal data sourced from 9 e-commerce catalogs and on Google Open Images v6), and zero-shot DETIC outperforms it on image tagging. We even collaborated with a technical team at AWS to ensure these comparisons were as favorable as possible (truncating some classes from our data, combining others, etc...).

38

mangotheblackcat89 t1_izzighp wrote

IMO, this is an important consideration. Sure, the target level is SKU-store, but at what level are the purchase orders being made? The M5 competition didn't say anything about this, but the SKU level is probably as important as the SKU-store level, if not more.

For retail data in general, I think we need to see how well a method performs at different levels of the hierarchy. I've seen commercial and finance teams prefer a forecast that is more accurate at the top over one that is slightly more accurate at the bottom.

4

new_name_who_dis_ t1_izzo58u wrote

I totally buy this. However you said

> Classical methods outperform Machine Learning methods in terms of speed, accuracy, and cost

those classical methods are also machine learning methods. "Classic AI methods" usually refers to non-statistical methods.

72

maxafrass t1_j00fw4z wrote

Thank you for the post and the discussion. Gives me much to consider as I prepare to look at AutoML and Azure and GCP based systems next year.

4

TaXxER t1_j00kcrd wrote

When I hear “classical methods” I associate that with traditional statistical methods that often aren’t even considered ML.

Note that frequentist stats also go by the name of classical methods (as opposed to Bayesian methods).

13

cajmorgans t1_j00x7cr wrote

The cloud has always been a scam in one way or another

0

Uptown-Dog t1_j01092x wrote

Yeah, Amazon's ML offerings performed very poorly the last time I tried them out. Kendra returned miserable results, and AWS Comprehend had a crappy (very limited) API and multiple serious bugs (like wholesale truncating input text segments in the response, not handling quotes consistently, etc.) that took them months to fix when we reported them. It never inspired huge amounts of confidence.

In all honesty, I'm not too surprised; my understanding is that AWS has a habit of grabbing open-source projects that kinda/sorta do what they need and building off of them internally, so you're not typically going to be exposed to unparalleled brilliance with their offerings. Mostly it will "kind of" work. But not much more than that.

(I wouldn't say I hate AWS because they do a reasonable job on several points, but they're no silver bullet across the board.)

6

-Rizhiy- t1_j018jx5 wrote

Do you by any chance have a resource that explains that a bit more?

I can't get my head around how a collection of accurate forecasts can produce an inaccurate aggregate.

Is it related to class imbalances or perhaps something like Simpson's paradox?

2

Delta-tau t1_j01aysc wrote

In statistics jargon, classical methods are all frequentist inference methods which rely on asymptotic theory and p-values. Some of them, like linear regression, logistic regression, or ARMA models are nowadays viewed as ML. I guess the "ML" label is a bit vague and changes over time.

13

Delta-tau t1_j01b1f2 wrote

Great post! Planning to publish this?

3

SherbertTiny2366 t1_j01t4du wrote

Imagine this toy example. You have 5 series which are very sparse, as is often the case in retail. For example, series 1 has sales on Mondays and 0's the rest of the days, series 2 on Tuesdays, series 3 on Wednesdays, and so on. For those individual series, a forecast close to 0 would be more or less accurate; however, when you add all the predictions up, the total will be way below the true value.
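The toy example above can be sketched numerically (illustrative numbers, not M5 data):

```python
import numpy as np

days, n_series = 28, 5
series = np.zeros((n_series, days))
for i in range(n_series):
    series[i, i::7] = 10  # series i sells 10 units on one weekday, 0 otherwise

# An "always zero" forecast looks fine series by series...
forecasts = np.zeros((n_series, days))
per_series_mae = np.abs(series - forecasts).mean(axis=1)
print(per_series_mae.round(2))  # ~1.43 for each series

# ...but systematically under-predicts the aggregate
aggregate_mae = np.abs(series.sum(axis=0) - forecasts.sum(axis=0)).mean()
print(round(aggregate_mae, 2))  # ~7.14: five times worse at the top level
```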

5

TaXxER t1_j01xpkt wrote

Yeah I’m aware that linear and logistic regression are classical methods and are in the weird spot where they sometimes are and sometimes are not regarded as ML.

My comment was mostly aimed to argue against this claim in the comment that I replied to:

> Classic AI methods usually refers to non-statistical methods

3

nickkon1 t1_j01yzso wrote

While I believe your results, isn't the whole point of AutoML that non-ML people can easily create models (e.g. via drag & drop)? While you didn't do much here, you selected models and specified their seasonality, both of which the target audience of AutoML would not do. The alternative to AutoML is not necessarily "make a model yourself" but often "you will not have a model at all".

3

Living_Discipline244 t1_j020owa wrote

So if you want to be a statistical powerhouse, you'd better hop on board the AutoML train. Just don't forget your space suit and your grim reaper scythe, because with great power comes great responsibility. And if you're not careful, you might just end up dooming humanity to a future ruled by sentient algorithms. But hey, at least you'll have impressive machine learning models, right?

1

new_name_who_dis_ t1_j020yfl wrote

Search + hard-coded (expert-provided) rules, for example. Deep Blue, which beat Kasparov, didn't have any statistics in it, IIRC.

Deductive reasoning (as opposed to inductive which is what statistical/ML methods are), so like reasoning from first principles that are hard coded into the system.
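That style of system is essentially search plus a hand-written evaluation rule. As a toy sketch (nothing like Deep Blue's actual engine), with no learned parameters anywhere:

```python
def minimax(state, depth, maximizing, moves, evaluate):
    """Plain minimax: deductive search over moves(state) using a
    hand-coded evaluate(state) rule -- no statistics involved."""
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)
    values = (minimax(s, depth - 1, not maximizing, moves, evaluate)
              for s in successors)
    return max(values) if maximizing else min(values)

# Toy "game": states are ints, each state branches to 2*s and 2*s+1,
# and the expert rule just scores a state by its value.
moves = lambda s: [2 * s, 2 * s + 1] if s < 4 else []
evaluate = lambda s: s
print(minimax(1, 2, True, moves, evaluate))  # 6
```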

1

Living_Discipline244 t1_j021ja0 wrote

When it comes to comparing Amazon's AutoML and open source statistical methods, open source methods come out on top. While AutoML may be easy to use and can quickly train models, it lacks the flexibility and control of open source tools. With open source methods, you can fine-tune your models to your specific needs and goals, and you have access to a wide range of algorithms and techniques to choose from. Additionally, the open source community is constantly developing new methods and techniques, so you can always stay on the cutting edge of statistical analysis.

Furthermore, open source methods are often more cost-effective than commercial solutions like AutoML. While AutoML may seem like a quick and easy way to build machine learning models, the costs can quickly add up, especially for large or complex projects. In contrast, open source tools are typically free to use and can be easily integrated into your existing workflow.

So if you want to take control of your statistical analysis and have access to the latest and greatest methods, open source tools are the way to go. Just remember, with great power comes great responsibility, so be sure to use your newfound statistical prowess wisely.

2

new_name_who_dis_ t1_j021jgc wrote

I have the same association as you if I hear classic (ML) methods. But not classic (AI) methods, those I associate with good old fashioned AI, which aren't statistical.

Maybe it's just me, idk. I studied AI in philosophy long before I took an ML class. And I took my first intro-to-ML class before they were teaching deep learning in intro-to-ML classes (though I missed this cut-off only by a year or two haha).

1

chief167 t1_j027csj wrote

Honestly, if you want decent AutoML results, you should only consider DataRobot. Everything else is noticeably worse.

We are a customer of theirs and it's a game changer. Yes, it's not aimed at hobbyists and it's, like, super expensive. But it's good.

If I find the time, I shall upload this dataset into our system and check the results. Remind me later if I forget.

−1

chief167 t1_j027p7v wrote

Don't waste your time. Check DataRobot (H2O is the closest competition).

Everybody else plainly sucks at AutoML, sorry to put it so bluntly, but it's true.

I am a happy customer of theirs, and it took a mountain of effort to convince our IT teams to move away from Microsoft, Databricks, etc., but the results were just in another ballpark, so we had a strong business case.

−2

xgboostftw t1_j02butn wrote

would be nice to disclose that the study was sponsored (and conducted?) by StatsForecast...

0

xgboostftw t1_j02ifkc wrote

I think the terminology is more common in the forecasting niche, where (especially since the M4 and M5 competitions) they started to separate out tree and NN architectures as "ML", while all the other methods used for the last 50 years are deemed "classical".

2

xgboostftw t1_j03y82b wrote

Seems like a poorly planned attempt at promoting your own tool.
Looking briefly at the notebook, it seems like a lot of the M5 features were excluded and only item_id was kept: https://nixtla.github.io/statsforecast/examples/aws/statsforecast.html#read-data
M5 has additional features like department, category, store, state, and of course the events table. These features are very helpful and would obviously be present in a real-life retail forecasting scenario (along with many others).
The code with the parameters used to train the AWS Forecast models also seems to be missing from the "reproducible experiment" notebook 😂.
Not sure the study is worth taking seriously. It seems like a quick attempt at marketing rather than a study with any meaningful level of rigor. "My Corolla is faster and cheaper than a Porsche 911 when I use vegetable oil to fuel them and don't show you the Porsche."
Where does your result land on the Kaggle leaderboard?

−1

fedegarzar OP t1_j048qe0 wrote

Here is the step-by-step guide to reproducing Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html

As you can see, all the exogenous variables of M5 are included in Amazon Forecast.

Concretely, if you read the same link you posted, we even provide links to the static and temporal exogenous variables you mention.

From the ReadMe:

The data are ready for download at the following URLs:

2

fedegarzar OP t1_j04jp9e wrote

2

geneman101 t1_j1ectfo wrote

Agreed with OP through trial and error!

1