View Jupyter notebook on the GitHub.

Forecasting interpretation#

Binder

This notebook contains examples of forecast decomposition into additive components for various models.

Table of contents

  • Loading dataset

  • Forecast decomposition

    • CatBoost

    • SARIMAX

    • BATS

    • In-sample and out-of-sample decomposition

  • Accessing target components

  • Regressors relevance

    • Feature relevance

    • Components relevance

[1]:
import warnings

warnings.filterwarnings("ignore")
[2]:
from copy import deepcopy

import pandas as pd

from etna.analysis import plot_backtest
from etna.analysis import plot_forecast_decomposition
from etna.datasets import TSDataset
from etna.metrics import MAE
from etna.pipeline import Pipeline
from etna.transforms import LagTransform
[3]:
HORIZON = 21

1. Loading dataset#

Consider the dataset data/example_dataset.csv.

We will use this data to demonstrate how model forecasts can be decomposed down to additive effects in the ETNA library.

Let’s load and plot this data to get a brief idea of how time series look.

[4]:
df = pd.read_csv("data/example_dataset.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")

ts.head(5)
[4]:
segment segment_a segment_b segment_c segment_d
feature target target target target
timestamp
2019-01-01 170 102 92 238
2019-01-02 243 123 107 358
2019-01-03 267 130 103 366
2019-01-04 287 138 103 385
2019-01-05 279 137 104 384
[5]:
ts.plot()
../_images/tutorials_304-forecasting_interpretation_8_0.png

Here we have four segments in the dataset. All segments have seasonalities, and some of them show signs of trend.

Some of the models need features in order to estimate forecasts. Here we will use lags of the target variable as features when it is necessary.

[6]:
transforms = [
    LagTransform(in_column="target", lags=list(range(21, 57, 7)), out_column="lag"),
]

2. Forecast decomposition#

This section shows how a model forecast can be decomposed down to individual additive components. There are two types of decomposition:

  • model-specific decomposition

  • model-agnostic decomposition

Model-specific decomposition uses model representation to compute components. Consider linear regression as an example. \(\hat y = w^T X + b\). We can slightly rewrite this equation to see additive components \(\hat y = \sum_{i=0}^m w_i x_i + b\). So in this case, the components are: \(w_i x_i\) and \(b\).

Model-agnostic decomposition uses separate estimators to compute the contribution of each component (e.g., SHAP).

The main feature of forecast decomposition in the ETNA library is guaranteed additive components.

There are several models with available decomposition:

  • SARIMAXModel

  • AutoARIMAModel

  • CatBoostPerSegmentModel, CatBoostMultiSegmentModel

  • SimpleExpSmoothingModel, HoltModel, HoltWintersModel

  • LinearPerSegmentModel, ElasticPerSegmentModel, LinearMultiSegmentModel, ElasticMultiSegmentModel

  • ProphetModel

  • MovingAverageModel, NaiveModel, DeadlineMovingAverageModel, SeasonalMovingAverageModel

  • BATS, TBATS

In this notebook, we will take a closer look at the forecast decomposition for CatBoost, SARIMAX and BATS.

Here is a function that will help us extract the target and its components from the forecast dataframe and transform this data to TSDataset.

[7]:
from etna.datasets.utils import match_target_components


def target_components_to_ts(df: pd.DataFrame) -> TSDataset:
    """Get components with target as `TSDataset`."""
    # find target components column names
    column_names = set(df.columns.get_level_values("feature"))
    components_columns = match_target_components(column_names)

    # select dataframe with components
    target_components = df.loc[:, pd.IndexSlice[:, tuple(components_columns)]]

    # select dataframe with target
    target = df.loc[:, pd.IndexSlice[:, "target"]]

    # creat dataset
    forecast_ts = TSDataset(df=target, freq="D")
    forecast_ts.add_target_components(target_components_df=target_components)

    return forecast_ts

2.1 CatBoost#

CatBoost uses model-agnostic forecast decomposition. Components contributions estimated with SHAP for each timestamp. Decomposition at timestamp \(t\) could be represented as:

\begin{equation} f(x_t) = \phi_0 + \sum_{i = 1}^m \phi_i \quad , \end{equation}

where \(\phi_i\) - SHAP values.

[8]:
from etna.models import CatBoostMultiSegmentModel
from etna.models import CatBoostPerSegmentModel

Here we create Pipeline with CatBoostPerSegmentModel and estimate forecast components.

[9]:
pipeline = Pipeline(model=CatBoostPerSegmentModel(), transforms=transforms, horizon=HORIZON)

_, forecast_df, _ = pipeline.backtest(ts=ts, metrics=[MAE()], n_folds=3, forecast_params={"return_components": True})
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.8s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    3.6s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    5.5s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    5.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.6s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[10]:
plot_backtest(forecast_df=forecast_df, ts=ts, history_len=50)
../_images/tutorials_304-forecasting_interpretation_19_0.png

Components could be plotted with the function plot_forecast_decomposition.

[11]:
forecast_ts = target_components_to_ts(df=forecast_df)
plot_forecast_decomposition(forecast_ts=forecast_ts, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_21_0.png

On these charts we can see how each feature contributed to the forecast for each timestamp. CatBoost uses features to estimate forecasts, so each forecast component corresponds to a particular feature.

Let’s run a similar procedure for CatBoostMultiSegmentModel.

[12]:
pipeline = Pipeline(model=CatBoostMultiSegmentModel(), transforms=transforms, horizon=HORIZON)

_, forecast_df, _ = pipeline.backtest(ts=ts, metrics=[MAE()], n_folds=3, forecast_params={"return_components": True})
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    1.2s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    2.3s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    3.6s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    3.6s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.7s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[13]:
plot_backtest(forecast_df=forecast_df, ts=ts, history_len=50)
../_images/tutorials_304-forecasting_interpretation_25_0.png
[14]:
forecast_ts = target_components_to_ts(df=forecast_df)
plot_forecast_decomposition(forecast_ts=forecast_ts, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_26_0.png

Here we obtained almost similar results. They are identical in structure but slightly different when comparing values. Such behaviour is explained by the use of a single model for all dataset segments.

2.2 SARIMAX#

SARIMAX uses model-specific forecast decomposition. So the main components of this model are: SARIMA and effects from exogenous variables.

Such constraints on decomposition form are due to implementation details of SARIMAX models, which are estimated as state-space models.

[15]:
from etna.models import SARIMAXModel

Let’s build Pipeline with SARIMAXModel and estimate forecast components on backtest.

[16]:
pipeline = Pipeline(model=SARIMAXModel(), transforms=transforms, horizon=HORIZON)

_, forecast_df, _ = pipeline.backtest(ts=ts, metrics=[MAE()], n_folds=3, forecast_params={"return_components": True})
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  4.90038D+00    |proj g|=  1.65569D-01

At iterate    5    f=  4.85680D+00    |proj g|=  2.10363D-02

At iterate   10    f=  4.85558D+00    |proj g|=  4.19482D-03

At iterate   15    f=  4.85420D+00    |proj g|=  2.79606D-02

At iterate   20    f=  4.85376D+00    |proj g|=  7.24822D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     22     27      1     0     0   1.529D-05   4.854D+00
  F =   4.8537570835596506

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  4.00764D+00    |proj g|=  2.03374D-01

At iterate    5    f=  3.97187D+00    |proj g|=  9.69939D-03

At iterate   10    f=  3.97079D+00    |proj g|=  7.25716D-03

At iterate   15    f=  3.97049D+00    |proj g|=  7.69284D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     18     24      1     0     0   1.597D-05   3.970D+00
  F =   3.9704937260313664

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
 This problem is unconstrained.
 This problem is unconstrained.
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.34706D+00    |proj g|=  1.90556D-02

At iterate    5    f=  5.34701D+00    |proj g|=  8.87463D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8      5      8      1     0     0   8.875D-05   5.347D+00
  F =   5.3470110236236659

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.41327D+00    |proj g|=  2.29720D-01

At iterate    5    f=  5.37417D+00    |proj g|=  1.87453D-02

At iterate   10    f=  5.37332D+00    |proj g|=  3.59795D-03

At iterate   15    f=  5.37261D+00    |proj g|=  2.76053D-02

At iterate   20    f=  5.37180D+00    |proj g|=  4.27622D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     22     26      1     0     0   1.226D-05   5.372D+00
  F =   5.3718041615424870

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  4.87110D+00    |proj g|=  1.54798D-01

 This problem is unconstrained.
 This problem is unconstrained.
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.4s
 This problem is unconstrained.
At iterate    5    f=  4.82551D+00    |proj g|=  4.04983D-02

At iterate   10    f=  4.82384D+00    |proj g|=  3.86702D-03

At iterate   15    f=  4.82355D+00    |proj g|=  1.76847D-02

At iterate   20    f=  4.82168D+00    |proj g|=  5.25136D-03

At iterate   25    f=  4.82166D+00    |proj g|=  1.07272D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     25     31      1     0     0   1.073D-04   4.822D+00
  F =   4.8216622881090307

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  3.97752D+00    |proj g|=  2.10289D-01

At iterate    5    f=  3.94348D+00    |proj g|=  8.76676D-03

At iterate   10    f=  3.94254D+00    |proj g|=  1.20056D-02

At iterate   15    f=  3.94231D+00    |proj g|=  2.84316D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     17     23      1     0     0   1.044D-05   3.942D+00
  F =   3.9423131829987743

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
 This problem is unconstrained.
 This problem is unconstrained.
 This problem is unconstrained.
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.31323D+00    |proj g|=  2.61433D-02

At iterate    5    f=  5.31314D+00    |proj g|=  8.69485D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8      5      8      1     0     0   8.695D-05   5.313D+00
  F =   5.3131410159230246

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.41440D+00    |proj g|=  3.15861D-01

At iterate    5    f=  5.35194D+00    |proj g|=  1.73456D-02

At iterate   10    f=  5.35100D+00    |proj g|=  3.07897D-03

At iterate   15    f=  5.34849D+00    |proj g|=  4.01631D-02

At iterate   20    f=  5.34688D+00    |proj g|=  9.88039D-04

At iterate   25    f=  5.34687D+00    |proj g|=  4.49029D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     26     31      1     0     0   3.804D-05   5.347D+00
  F =   5.3468739885960970

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.8s
 This problem is unconstrained.
 This problem is unconstrained.
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  4.90986D+00    |proj g|=  2.31991D-01

At iterate    5    f=  4.86531D+00    |proj g|=  1.31344D-02

At iterate   10    f=  4.86491D+00    |proj g|=  1.14106D-02

At iterate   15    f=  4.86340D+00    |proj g|=  3.90127D-02

At iterate   20    f=  4.86307D+00    |proj g|=  2.96271D-03

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     23     28      1     0     0   5.929D-05   4.863D+00
  F =   4.8630726840600804

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  4.00550D+00    |proj g|=  2.45513D-01

At iterate    5    f=  3.97878D+00    |proj g|=  5.34959D-03

At iterate   10    f=  3.97831D+00    |proj g|=  1.04488D-02

At iterate   15    f=  3.97818D+00    |proj g|=  2.67016D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     17     23      1     0     0   8.812D-06   3.978D+00
  F =   3.9781845691380524

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.28379D+00    |proj g|=  3.50254D-02

At iterate    5    f=  5.28362D+00    |proj g|=  8.71784D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8      5      8      1     0     0   8.718D-05   5.284D+00
  F =   5.2836224381312107

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  5.44469D+00    |proj g|=  3.22756D-01

At iterate    5    f=  5.39435D+00    |proj g|=  1.21107D-02

At iterate   10    f=  5.39418D+00    |proj g|=  5.90180D-03

At iterate   15    f=  5.39219D+00    |proj g|=  5.58459D-02

At iterate   20    f=  5.39176D+00    |proj g|=  7.63151D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    8     23     28      1     0     0   5.592D-05   5.392D+00
  F =   5.3917592312532499

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
 This problem is unconstrained.
 This problem is unconstrained.
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.3s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.3s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.4s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.8s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.2s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    1.2s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[17]:
plot_backtest(forecast_df=forecast_df, ts=ts, history_len=50)
../_images/tutorials_304-forecasting_interpretation_32_0.png
[18]:
forecast_ts = target_components_to_ts(df=forecast_df)
plot_forecast_decomposition(forecast_ts=forecast_ts, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_33_0.png

Here we can notice components for each exogenous variable and a separate component for the SARIMA effect. Note that this decomposition is completely different from the forecast decomposition of CatBoost models, even on the same data. This is due to different approaches for component contribution estimation.

2.3 BATS#

BATS uses model-specific forecast decomposition. The main components are:

  • trend

  • local level

  • seasonality for each specified period

  • errors ARMA(p, q)

The last two components are controlled by seasonal_periods and use_arma_erros parameters, respectively. The trend component is controlled by use_trend. These parameters add flexibility to the model representation and, hence, to the forecast decomposition.

[19]:
from etna.models import BATSModel

Let’s build Pipeline and estimate forecast decomposition with BATSModel, but now without any features.

[20]:
pipeline = Pipeline(model=BATSModel(seasonal_periods=[14, 21]), horizon=HORIZON)

_, forecast_df, _ = pipeline.backtest(ts=ts, metrics=[MAE()], n_folds=3, forecast_params={"return_components": True})
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:  3.0min
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:  6.5min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  9.8min
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:  9.8min
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.4s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.4s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   2 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done   3 tasks      | elapsed:    0.1s
[21]:
plot_backtest(forecast_df=forecast_df, ts=ts, history_len=50)
../_images/tutorials_304-forecasting_interpretation_39_0.png
[22]:
forecast_ts = target_components_to_ts(df=forecast_df)
plot_forecast_decomposition(forecast_ts=forecast_ts, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_40_0.png

Here we obtained components for trend, local level, seasonality, and ARMA errors. Note that each seasonality period is represented by a separate component. Such a form of decomposition helps to find causality with seasonal effects in the original data. Results show that almost all seasonal fluctuations come from seasonality in period 14.

2.4 In-sample and out-of-sample decomposition#

In the ETNA library we can perform decomposition of fitted and forecasted values separately.

Consider the BATS model with seasonal periods of 2 and 3 weeks. Let’s prepare data for training and forecasting and fit the model.

[23]:
train, _ = ts.train_test_split(test_size=HORIZON)
future = train.make_future(future_steps=HORIZON)

model = BATSModel(seasonal_periods=[14, 21])
model.fit(ts=train);

In-sample forecast decomposition could be estimated with predict method. To do this set return_components=True.

[24]:
in_sample_forecast = model.predict(ts=train, return_components=True)
plot_forecast_decomposition(forecast_ts=in_sample_forecast, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_46_0.png

For estimation of out-of-sample forecast decomposition, use forecast with return_components=True.

[25]:
out_of_sample_forecast = model.forecast(ts=future, return_components=True)
plot_forecast_decomposition(forecast_ts=out_of_sample_forecast, mode="joint", columns_num=2, show_grid=True)
../_images/tutorials_304-forecasting_interpretation_48_0.png

As expected, the decomposition form is the same for in-sample and out-of-sample cases.

3. Accessing target components#

This section shows how one can manipulate forecast components using the TSDataset interface.

Let’s populate our data with lag features and fit the CatBoostPerSegmentModel. We will use this setup to estimate forecast with additive components.

[26]:
ts_with_lags = deepcopy(ts)
ts_with_lags.fit_transform(transforms=transforms)

model = CatBoostPerSegmentModel()
model.fit(ts=ts_with_lags);

Now we can estimate forecasts with forecast decomposition.

[27]:
forecast = model.forecast(ts=ts_with_lags, return_components=True)
forecast
[27]:
segment segment_a ... segment_d
feature lag_21 lag_28 lag_35 lag_42 lag_49 lag_56 target target_component_lag_21 target_component_lag_28 target_component_lag_35 ... lag_42 lag_49 lag_56 target target_component_lag_21 target_component_lag_28 target_component_lag_35 target_component_lag_42 target_component_lag_49 target_component_lag_56
timestamp
2019-01-01 NaN NaN NaN NaN NaN NaN 368.960244 52.677657 45.848608 59.139714 ... NaN NaN NaN 440.294693 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-02 NaN NaN NaN NaN NaN NaN 368.960244 52.677657 45.848608 59.139714 ... NaN NaN NaN 440.294693 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-03 NaN NaN NaN NaN NaN NaN 368.960244 52.677657 45.848608 59.139714 ... NaN NaN NaN 440.294693 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-04 NaN NaN NaN NaN NaN NaN 368.960244 52.677657 45.848608 59.139714 ... NaN NaN NaN 440.294693 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-05 NaN NaN NaN NaN NaN NaN 368.960244 52.677657 45.848608 59.139714 ... NaN NaN NaN 440.294693 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-11-26 510.0 502.0 532.0 582.0 528.0 558.0 587.562771 92.622123 93.063689 102.406833 ... 931.0 879.0 1037.0 937.554113 186.185095 152.507876 144.561146 142.979337 137.269227 174.051432
2019-11-27 501.0 497.0 515.0 559.0 550.0 516.0 602.751237 93.055945 93.196738 98.617847 ... 897.0 886.0 969.0 941.766157 189.933535 153.970751 143.035390 142.433922 137.140238 175.252321
2019-11-28 525.0 501.0 520.0 543.0 547.0 489.0 556.927263 89.108979 94.226170 90.372332 ... 882.0 934.0 929.0 894.837318 181.561275 154.268641 139.273730 136.903530 130.007170 152.822973
2019-11-29 534.0 480.0 511.0 523.0 544.0 471.0 568.569172 96.026571 92.935954 88.949103 ... 843.0 885.0 874.0 899.518552 178.867363 163.091693 144.959843 132.615508 135.207526 144.776619
2019-11-30 430.0 384.0 502.0 422.0 423.0 371.0 499.505858 81.304877 65.772926 113.489016 ... 652.0 672.0 664.0 720.744048 108.165250 129.946841 111.854389 124.573300 121.123109 125.081158

334 rows × 52 columns

Note that we have additional columns prefixed with target_component_. These are individual effects that contributed to the overall forecast.

One can obtain forecast components separately from other forecast data using get_target_components.

[28]:
forecast.get_target_components()
[28]:
segment segment_a segment_b ... segment_c segment_d
feature target_component_lag_21 target_component_lag_28 target_component_lag_35 target_component_lag_42 target_component_lag_49 target_component_lag_56 target_component_lag_21 target_component_lag_28 target_component_lag_35 target_component_lag_42 ... target_component_lag_35 target_component_lag_42 target_component_lag_49 target_component_lag_56 target_component_lag_21 target_component_lag_28 target_component_lag_35 target_component_lag_42 target_component_lag_49 target_component_lag_56
timestamp
2019-01-01 52.677657 45.848608 59.139714 69.745969 69.187315 72.360981 29.190551 23.342434 27.285093 28.338942 ... 18.093097 13.173772 20.646325 19.443199 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-02 52.677657 45.848608 59.139714 69.745969 69.187315 72.360981 29.190551 23.342434 27.285093 28.338942 ... 18.093097 13.173772 20.646325 19.443199 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-03 52.677657 45.848608 59.139714 69.745969 69.187315 72.360981 29.190551 23.342434 27.285093 28.338942 ... 18.093097 13.173772 20.646325 19.443199 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-04 52.677657 45.848608 59.139714 69.745969 69.187315 72.360981 29.190551 23.342434 27.285093 28.338942 ... 18.093097 13.173772 20.646325 19.443199 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
2019-01-05 52.677657 45.848608 59.139714 69.745969 69.187315 72.360981 29.190551 23.342434 27.285093 28.338942 ... 18.093097 13.173772 20.646325 19.443199 58.143024 43.773274 92.546575 76.282320 79.442884 90.106617
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-11-26 92.622123 93.063689 102.406833 115.592269 74.912398 108.965459 39.646473 41.938191 48.057205 42.905137 ... 28.693191 44.746001 29.506321 24.909083 186.185095 152.507876 144.561146 142.979337 137.269227 174.051432
2019-11-27 93.055945 93.196738 98.617847 104.375155 114.013395 99.492157 44.803978 42.526073 44.869424 44.210031 ... 24.689802 48.773621 28.103322 29.071382 189.933535 153.970751 143.035390 142.433922 137.140238 175.252321
2019-11-28 89.108979 94.226170 90.372332 88.756972 112.596508 81.866302 43.636738 39.279155 41.455011 41.251943 ... 25.204180 49.273745 30.080450 29.670928 181.561275 154.268641 139.273730 136.903530 130.007170 152.822973
2019-11-29 96.026571 92.935954 88.949103 89.018761 117.011601 84.627183 41.949065 38.703364 42.617386 40.808739 ... 22.174713 48.547700 28.981421 26.047110 178.867363 163.091693 144.959843 132.615508 135.207526 144.776619
2019-11-30 81.304877 65.772926 113.489016 86.668521 78.966691 73.303827 38.118666 33.999578 31.914988 33.683397 ... 17.088639 45.224635 22.134809 24.102108 108.165250 129.946841 111.854389 124.573300 121.123109 125.081158

334 rows × 24 columns

There is a possibility to remove components. It can be achieved using drop_target_components. This function will remove components inplace. We can ensure that our forecast dataset no longer has any components by using the target_components_names property.

[29]:
components = forecast.get_target_components()
forecast.drop_target_components()
forecast.target_components_names
[29]:
()

There is a possibility to attach components to the forecast dataset as well. The main constraint on components is that, being attached, they should sum up to the target at each timestamp.

[30]:
forecast.add_target_components(components)
forecast.target_components_names
[30]:
('target_component_lag_21',
 'target_component_lag_28',
 'target_component_lag_35',
 'target_component_lag_42',
 'target_component_lag_49',
 'target_component_lag_56')

4. Regressors relevance#

This section shows how regressors relevance could be estimated.

There are two types of relevance estimation in the ETNA library

  • statistical, based on tsfresh (StatisticsRelevanceTable)

  • model based (ModelRelevanceTable)

[31]:
from sklearn.ensemble import RandomForestRegressor

from etna.analysis import ModelRelevanceTable
from etna.analysis import StatisticsRelevanceTable
from etna.analysis import plot_feature_relevance

4.1 Feature relevance#

The first step is to prepare the data. Here, we only need features and estimated forecasts.

[32]:
target_with_features = deepcopy(forecast)
target_with_features.drop_target_components()

Let’s estimate feature relevance using a statistical approach. Results can be shown with the help of the plot_feature_relevance function.

[33]:
plot_feature_relevance(ts=target_with_features, relevance_table=StatisticsRelevanceTable())
../_images/tutorials_304-forecasting_interpretation_67_0.png

A statistical approach shows that all features have the same relevance. The blue vertical line on the chart represents the significance level. Values that lie to the right of this line have p-value < alpha and values that lie to the left have p-value > alpha. Here we see that all features are significant at the default level of alpha = 0.05. Significance level available only for a statistical approach.

We can use a separate tree-based estimator that is able to provide feature relevance to estimate such values. In this case, we can use RandomForestRegressor.

[34]:
plot_feature_relevance(
    ts=target_with_features,
    relevance_table=ModelRelevanceTable(),
    relevance_params={"model": RandomForestRegressor(random_state=1)},
)
../_images/tutorials_304-forecasting_interpretation_70_0.png

We can see that results from this approach are completely different from statistical results. The main reason is the relevance estimation process. In ModelRelevanceTable, a separate estimator provides feature_importances_, while in StatisticsRelevanceTable statistics from tsfresh are used.

4.2 Components relevance#

We can use the same instruments to estimate relevance for each target component.

First, let’s select only the forecasts with corresponding components from the dataset.

[35]:
target_with_components = TSDataset(df=forecast[:, :, "target"], freq=forecast.freq)
target_components = forecast.get_target_components()
target_with_components.add_target_components(target_components)

Here we can use the same setup with tree-based relevance estimator.

[36]:
plot_feature_relevance(
    ts=target_with_components,
    relevance_table=ModelRelevanceTable(),
    relevance_params={"model": RandomForestRegressor(random_state=1)},
)
../_images/tutorials_304-forecasting_interpretation_75_0.png

Results show a strong correlation between the relevance of features and the relevance of corresponding components. Note that values are actually different when comparing components to features relevance.

In this notebook we showed how one can decompose in-sample and out-of-sample forecasts in the ETNA library. This decomposition has additive properties and hence may be used for forecast interpretation.

Also, we showed how to estimate regressors relevance with different methods and display the results.