# Model Types

Exabel's KPI Analyzer offers a selection of proprietary and open-source machine-learning models for KPI modelling. When you run custom models in the KPI Analyzer, you may choose from any of these model types.

# Exabel proprietary models

## Ratio Prediction

This is a proprietary prediction model developed by Exabel. This type of model is suitable when the input signals are proportional to the target. The prototypical example would be a card spend signal used to predict the revenue of a consumer company, where you expect that a 10% change in the credit card spend corresponds to a 10% change in the revenue of the company.

If the input time series is proportional to the target time series, you can use a linear regression, which would express that `target = k * input`. The problem is that the proportionality constant `k` will typically vary over time, and often cannot be modelled with a simple linear regression. The Ratio Prediction model instead treats `k` as a time-varying ratio, and builds a model for `k(t)`, which is then used to predict the target. By default, the model used for `k(t)` is an Unobserved Components model that takes into account the seasonality and level of this ratio (the ratio tends to drift over time, and the input time series may have a different seasonality pattern than the target time series).

The steps in this calculation are as follows:

- Calculate the ratio of the target time series to the input time series (the `k` in `target = k * input`)
- Model the resulting time series (the ratio) with an Unobserved Components model
- Predict what the ratio will be for the next quarter to be reported
- Multiply the predicted ratio with the input value for the next quarter to arrive at a prediction for the target
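
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_target` and the sample figures are made up, and the ratio forecast uses a simple seasonal-naive rule with a level adjustment as a stand-in for the Unobserved Components model, which is not reproduced here.

```python
from statistics import mean


def predict_target(inputs, targets, next_input, season_length=4):
    """Predict the next target value from a proportional input signal.

    inputs / targets: aligned quarterly histories, oldest first
    (inputs are assumed to be non-zero).
    next_input: input value for the quarter about to be reported.
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must cover the same quarters")
    if len(inputs) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")

    # Step 1: historical ratio k(t) = target / input.
    ratios = [target / value for value, target in zip(inputs, targets)]

    # Steps 2-3 (stand-in for the Unobserved Components model):
    # start from the ratio of the same quarter one year earlier and
    # shift it by the change in the average ratio level year over year.
    seasonal_ratio = ratios[-season_length]
    level_drift = mean(ratios[-season_length:]) - mean(
        ratios[-2 * season_length:-season_length]
    )
    predicted_ratio = seasonal_ratio + level_drift

    # Step 4: scale the next input value by the predicted ratio.
    return predicted_ratio * next_input


# Hypothetical data: two years of quarterly card spend and revenue,
# where the ratio follows a repeating seasonal pattern.
card_spend = [100.0, 110.0, 90.0, 120.0, 105.0, 115.0, 95.0, 125.0]
seasonal_k = [2.0, 2.2, 1.8, 2.0]
revenue = [x * seasonal_k[i % 4] for i, x in enumerate(card_spend)]

prediction = predict_target(card_spend, revenue, next_input=150.0)
print(round(prediction, 2))
```

Because the ratio is modelled separately from the input, a seasonal pattern in the ratio does not have to match the seasonality of the input itself.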

If there are multiple inputs in the model configuration, the above process is run for each input separately. That produces multiple predictions for the target value. The model then calculates a weighted average of all the predictions as its final output. The weights for this ensemble are determined by calculating the covariance between the historical prediction errors, and then minimizing the expected error. The weights are restricted to being non-negative and must add up to 1.
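
One way to picture the weighting step is as a small constrained optimization. The sketch below is an assumption about how such weights could be computed, not Exabel's actual solver: it builds a matrix of mean products of historical errors (uncentered, so that a consistent bias also counts as error) and minimizes the expected squared error of the combination with projected gradient descent. All function names are hypothetical.

```python
def project_to_simplex(values):
    """Euclidean projection onto {w : w >= 0, sum(w) == 1}."""
    ordered = sorted(values, reverse=True)
    cumulative, theta = 0.0, 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / count
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in values]


def error_moments(errors):
    """errors[m][t] is the historical error of prediction m at time t."""
    n_obs = len(errors[0])
    return [
        [sum(a * b for a, b in zip(e_i, e_j)) / n_obs for e_j in errors]
        for e_i in errors
    ]


def ensemble_weights(errors, steps=5000):
    """Weights minimizing w' C w subject to w >= 0 and sum(w) == 1."""
    c = error_moments(errors)
    m = len(c)
    # The trace of C bounds its largest eigenvalue, giving a safe step size.
    step = 1.0 / (2.0 * sum(c[i][i] for i in range(m)) + 1e-12)
    weights = [1.0 / m] * m
    for _ in range(steps):
        gradient = [2.0 * sum(c[i][j] * weights[j] for j in range(m))
                    for i in range(m)]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)]
        )
    return weights


# Hypothetical historical errors of two single-input predictions.
errors = [
    [1.0, -1.0, 1.0, -1.0],   # smaller errors
    [2.0, 2.0, -2.0, -2.0],   # larger errors, uncorrelated with the first
]
weights = ensemble_weights(errors)
predictions = [310.0, 290.0]
final_prediction = sum(w * p for w, p in zip(weights, predictions))
```

In this toy case the errors are uncorrelated, so the weights end up inversely proportional to the mean squared errors (roughly 0.8 and 0.2). A baseline forecast would simply be one more row in `errors`.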

Furthermore, this model allows for including a baseline, which is a univariate forecast of the target variable (using the Theta model). This baseline prediction is included in the ensemble along with the predictions stemming from each input, and weighted accordingly.

## Ratio Prediction ML

Experimental

This model is similar to Ratio Prediction, except that machine-learning models are used to predict the ratio and baseline. (Recall that Ratio Prediction uses statistical models - Unobserved Components and Theta - to make these predictions.)

Two separate machine-learning (LightGBM) models are used. The first model is used to predict the ratio (between the input and target KPI), using only historical values of the ratio, and is trained on the entire universe of curated vendor KPI mappings. The second model is used to predict the subsequent values of the target KPI using only historical reported values of the KPI, and is trained on the entire universe of reported KPI data.

## SARIMAX

SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) is a classical time series model. It is a regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.

Exabel’s implementation is based on StatsModels. A notebook with an explanation of the model and an example is available.

Exabel’s implementation extends the StatsModels model by adding Elastic Net regularization. Therefore there are two additional hyperparameters, the alpha and L1 ratio, that control the amount of regularization.

Furthermore, Exabel’s implementation allows specifying that all the coefficients for the exogenous variables (the model inputs) must be positive, which is useful as additional regularization when the inputs are known to correlate positively with the target variable.

# Open-source models

All of the model types below are variants of linear regression. Apart from the ordinary linear regression model, each model type adds some form of regularization to avoid overfitting (a common issue with regression models, since there are normally many inputs to the model and few data points with which to estimate the coefficients).

A key feature of the Linear Regression, Elastic Net and Elastic Net CV models is that they support monotone constraints. When building a custom model with any of these model types in the KPI Analyzer, the system will ensure that only positive coefficients are used for the KPI mappings used as inputs to the model. The KPI mappings are supposed to represent something that correlates positively with the KPI, and therefore negative coefficients are not allowed. For instance, it would not make sense if higher credit card spend data caused the model to predict lower revenue. This constraint is an effective way to regularize the model further and avoid overfitting to the historical data.
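
In scikit-learn, a sign constraint of this kind can be expressed with the `positive=True` option on `LinearRegression` and `ElasticNet`. The sketch below uses synthetic data with made-up signal names; whether the KPI Analyzer applies the constraint through this exact option is an assumption.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(0)
n_quarters = 24

# Hypothetical inputs: one informative signal and one that happens to
# move against revenue in this sample.
card_spend = rng.normal(100.0, 10.0, n_quarters)
web_traffic = rng.normal(50.0, 5.0, n_quarters)
revenue = (3.0 * card_spend - 0.5 * web_traffic
           + rng.normal(0.0, 5.0, n_quarters))

X = np.column_stack([card_spend, web_traffic])

# Unconstrained fit: coefficients may take either sign.
unconstrained = LinearRegression().fit(X, revenue)

# Constrained fits: every coefficient is forced to be >= 0.
constrained = LinearRegression(positive=True).fit(X, revenue)
constrained_enet = ElasticNet(alpha=0.1, l1_ratio=0.5, positive=True).fit(
    X, revenue
)

print("unconstrained:", unconstrained.coef_)
print("positive only:", constrained.coef_)
print("elastic net, positive only:", constrained_enet.coef_)
```

An input whose unconstrained coefficient would be negative typically ends up with a coefficient of zero under the constraint, which effectively removes it from the model.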

The ARD Regression and Huber Regression models do not support monotone constraints, and thus it is possible to end up with negative coefficients for some inputs with these models. However, both of these model types apply other techniques for regularizing the model, which have their own advantages.

## ARD Regression

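ARD (Automatic Relevance Determination) regression is a Bayesian linear regression method in which each weight of the regression model is assumed to follow a zero-mean Gaussian distribution with its own precision. Weights of inputs that do not help explain the target are shrunk toward zero, which acts as regularization.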

Exabel uses the scikit-learn implementation.

## Elastic Net

Elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
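
For reference, the objective minimized by scikit-learn's `ElasticNet` can be written as follows, where $n$ is the number of samples, $\alpha$ is the overall regularization strength and $\rho$ is the L1 ratio. The symbols are chosen here for illustration.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With $\rho = 1$ the penalty reduces to the lasso (pure L1); as $\rho$ approaches 0 it approaches ridge (pure L2).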

Exabel uses the scikit-learn implementation.

## Elastic Net CV

Elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. The CV version uses a grid search with cross-validation to choose the optimal alpha and L1 ratio hyperparameters.
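
A minimal sketch of this search with scikit-learn's `ElasticNetCV` is shown below. The synthetic data, the candidate L1 ratios and the number of folds are assumptions made for the example, not the settings used by the KPI Analyzer.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)

# Synthetic data: five inputs, only the first two drive the target.
X = rng.normal(size=(40, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=40)

# Candidate L1 ratios; the alpha grid is generated automatically.
l1_ratios = [0.1, 0.5, 0.9, 1.0]
model = ElasticNetCV(l1_ratio=l1_ratios, cv=5, positive=True)
model.fit(X, y)

# Hyperparameters selected by cross-validation.
chosen_alpha = model.alpha_
chosen_l1_ratio = model.l1_ratio_
print(chosen_alpha, chosen_l1_ratio, model.coef_)
```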

Exabel uses the scikit-learn implementation.

## Huber Regression

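Huber Regression is a linear regression model that is robust to outliers. It minimizes the Huber loss, which is quadratic for small residuals and linear for large ones, so extreme observations have less influence on the fit than under ordinary least squares.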
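
In its textbook form, the Huber loss for a residual $r$ and threshold $\delta$ is shown below. The notation is chosen here for illustration, and scikit-learn's `HuberRegressor` uses a rescaled variant with its own threshold parameter.

```latex
L_\delta(r) =
\begin{cases}
\tfrac{1}{2}\, r^2, & |r| \le \delta \\
\delta \left( |r| - \tfrac{1}{2}\,\delta \right), & |r| > \delta
\end{cases}
```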

Exabel uses the scikit-learn implementation.

## Linear Regression

Ordinary least squares linear regression model.

Exabel uses the scikit-learn implementation.
