Databricks Serverless Forecasting as a powerful tool for time-series model training 

16 January 2025

 

Time series forecasting is essential in many domains, such as finance, sales, and operations. It involves predicting future values of a variable based on its past values and other relevant features.

However, traditional forecasting methods can be time-consuming and require expertise in selecting and tuning algorithms. In this article, I will show you how to use serverless forecasting with AutoML, simplifying the process by automatically selecting the best algorithm and hyperparameters. 

 

A quick look at Serverless Forecasting

Forecasting with classic compute has been available in AutoML for some time. However, it has its limitations. The biggest one was the short list of algorithms available in the AutoML process: Prophet and Auto-ARIMA.

Serverless forecasting allows models to be trained not only with statistical models but also with DeepAR, a neural-network algorithm. Additionally, Serverless Forecasting provides new features: one-click model-serving deployment, custom train/validate/test splits, and custom weights for individual time series.

On the infrastructure side, no user-configured compute is needed for serverless AutoML. Databricks manages the compute configuration and automatically optimizes it for cost and performance. The process is fully integrated with Unity Catalog: all input data should be stored as tables in UC, and all models and results will be registered and saved there.

The only weakness of this new approach to forecasting is that it is not integrated with the Feature Store. However, integration with Unity Catalog should be enough for most use cases.

 

Use case – WIG20 index 

As a first attempt, I built a forecasting model for the WIG20 Index. The WIG20 is a capitalization-weighted stock market index of the twenty largest companies on the Warsaw Stock Exchange. I used three years of data aggregated as the daily average of the WIG20 index. I decided to manually split the data into train, validation, and test subsets:
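A chronological split like this can be prepared as an extra label column before uploading the table. Below is a minimal sketch on synthetic data; the column names (`ds`, `wig20_avg`, `split`) and the 80/10/10 proportions are my assumptions, not values mandated by Databricks.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for three years of daily WIG20 averages
# (hypothetical column names).
dates = pd.date_range("2022-01-01", "2024-12-31", freq="D")
df = pd.DataFrame({
    "ds": dates,
    "wig20_avg": np.random.default_rng(0).normal(2000, 100, len(dates)),
})

# Chronological split: oldest 80% train, next 10% validate, newest 10% test.
# Time series must never be split randomly, or the model would "see the future".
n = len(df)
df["split"] = "train"
df.loc[int(n * 0.8):, "split"] = "validate"
df.loc[int(n * 0.9):, "split"] = "test"
```

The resulting `split` column can then be pointed to in the AutoML advanced options as the custom split column.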

 

After uploading data to Unity Catalog, I was ready to build a model with AutoML. 

 

Step-by-step guide for Serverless Forecasting 

1. Start by selecting Forecasting in the Experiments section.

2. Next, find the table with the data in Unity Catalog. Select the column representing time, provide the data frequency, and decide for how many intervals the forecast should be generated.

 

3. In the prediction section, select the column from the source table that should be used for model training and provide the location and table name for the forecast results. 

 

4. The model registration section automatically registers the best model in Unity Catalog under the provided location and model name. This feature simplifies model deployment because, at the end, AutoML can help create a serving endpoint based on the registered model.

 

5. In the advanced options, additional parameters can be provided, such as the evaluation metric, the ML algorithms to try, and a split column. A country's holiday region can also be selected to include bank holidays in training. In my scenario, this feature was very useful because the stock market is closed on weekends and holidays.
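Taken together, the steps above roughly map onto the AutoML forecasting Python API. The sketch below shows the classic-compute flavor of that API (the serverless experience described here is UI-driven); the table, column, and schema names are hypothetical, and the exact parameter set may differ between releases.

```python
from databricks import automl

# Hypothetical Unity Catalog table and column names.
summary = automl.forecast(
    dataset=spark.table("main.finance.wig20_daily"),
    time_col="ds",
    target_col="wig20_avg",
    frequency="d",                   # daily data
    horizon=7,                       # forecast the next 7 intervals
    country_code="PL",               # include Polish bank holidays
    output_database="main.finance",  # where forecast results are written
)
```

The returned summary points at the MLflow experiment, so the run comparison described below works the same way regardless of whether the experiment was started from the UI or from code.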

 

 

Results 

1. In the AutoML experiment, one can see a list of runs completed for different algorithms and sets of hyperparameters.


2. As in all MLflow experiments, we can easily compare runs to see differences and trends among them. Here is an example of validation MAE (mean absolute error) in relation to the maximum number of epochs and batch size across the runs completed for the DeepAR neural network.
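For reference, the validation MAE that AutoML reports is simply the mean absolute error between forecasts and actual values on the validation split. A minimal sketch with made-up index values:

```python
def mean_absolute_error(actual, predicted):
    """MAE: the average absolute difference between actuals and forecasts."""
    assert len(actual) == len(predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy example: absolute errors are 10, 10, and 5, so MAE = 25 / 3 ≈ 8.33.
mae = mean_absolute_error(
    [2300.0, 2310.0, 2295.0],  # actual daily index averages (made up)
    [2290.0, 2320.0, 2300.0],  # model forecasts (made up)
)
```

A lower MAE is better, and because it is expressed in the units of the target (index points here), it is easy to interpret when comparing runs.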

3. For the best model, AutoML automatically generates a shortcut for creating a serving endpoint, a batch-inference notebook, and predictions saved in a Unity Catalog table:

  • If model registration was configured at the beginning of the AutoML experiment, the model is ready to deploy for online serving 
  • A notebook ready to use in pipelines and scheduled processes 
  • Prediction results for the next 7 days saved in Unity Catalog
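Because the best model is registered in Unity Catalog, it can also be loaded outside the generated notebook for batch scoring via MLflow. A sketch, assuming a hypothetical model name and version; the expected input schema is the one shown in the generated batch-inference notebook:

```python
import mlflow

# Point the MLflow client at the Unity Catalog model registry.
mlflow.set_registry_uri("databricks-uc")

# Hypothetical registered model name and version.
model = mlflow.pyfunc.load_model("models:/main.finance.wig20_forecaster/1")

# input_df must match the schema from the generated batch-inference notebook.
predictions = model.predict(input_df)
```

This is the same `models:/` URI scheme the generated serving endpoint uses, so batch and online inference stay backed by one registered artifact.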

 

Summary  

I imagine that if I had to build the forecasting model from scratch, I would spend many hours researching and building pipelines. Without intuition and experience in the business area, it would be hard to decide which algorithms and hyperparameters to use. Additionally, I am aware that predicting stock indexes is not an easy task. Otherwise, day trading would be the most profitable profession in the world.

However, with serverless forecasting, AutoML generated multiple models, evaluated them, and selected the best candidate in less than two hours. The model was immediately ready for deployment and inference. It sounds like an effective approach to the model delivery process.