VotingEnsemble
- class VotingEnsemble(pipelines: List[BasePipeline], weights: List[float] | Literal['auto'] | None = None, regressor: DecisionTreeRegressor | ExtraTreeRegressor | RandomForestRegressor | ExtraTreesRegressor | GradientBoostingRegressor | CatBoostRegressor | None = None, n_folds: int = 3, n_jobs: int = 1, joblib_params: Dict[str, Any] | None = None)
Bases: EnsembleMixin, SaveEnsembleMixin, BasePipeline
VotingEnsemble is a pipeline that forecasts future values by weighted averaging of its pipelines’ forecasts.
Examples
>>> from etna.datasets import generate_ar_df
>>> from etna.datasets import TSDataset
>>> from etna.ensembles import VotingEnsemble
>>> from etna.models import NaiveModel
>>> from etna.models import ProphetModel
>>> from etna.pipeline import Pipeline
>>> df = generate_ar_df(periods=30, start_time="2021-06-01", ar_coef=[1.2], n_segments=3)
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> prophet_pipeline = Pipeline(model=ProphetModel(), transforms=[], horizon=7)
>>> naive_pipeline = Pipeline(model=NaiveModel(lag=10), transforms=[], horizon=7)
>>> ensemble = VotingEnsemble(
...     pipelines=[prophet_pipeline, naive_pipeline],
...     weights=[0.7, 0.3]
... )
>>> _ = ensemble.fit(ts=ts)
>>> forecast = ensemble.forecast()
>>> forecast
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-07-01     -8.84   -186.67    130.99
2021-07-02     -8.96   -198.16    138.81
2021-07-03     -9.57   -212.48    148.48
2021-07-04    -10.48   -229.16    160.13
2021-07-05    -11.20   -248.93    174.39
2021-07-06    -12.47   -281.90    197.82
2021-07-07    -13.51   -307.02    215.73
Init VotingEnsemble.
- Parameters:
pipelines (List[BasePipeline]) – List of pipelines that should be used in ensemble
weights (List[float] | Literal['auto'] | None) – List of pipelines’ weights.
If None, use uniform weights.
If List[float], use these weights for the base estimators; weights will be normalized automatically.
If “auto”, use importances of the base estimators’ forecasts as weights of the base estimators.
regressor (DecisionTreeRegressor | ExtraTreeRegressor | RandomForestRegressor | ExtraTreesRegressor | GradientBoostingRegressor | CatBoostRegressor | None) – Regression model with fit/predict interface which will be used to evaluate weights of the base estimators. It should have a feature_importances_ property (e.g. all tree-based regressors in sklearn).
n_folds (int) – Number of folds to use in the backtest. Backtest is used to obtain the forecasts from the base estimators; forecasts will be used to evaluate the estimator’s weights.
n_jobs (int) – Number of jobs to run in parallel
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
- Raises:
ValueError: – If the number of the pipelines is less than 2 or pipelines have different horizons.
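The weight options above can be illustrated with a small pure-Python sketch. It is not etna's implementation: the helper names `normalize_weights` and `weighted_average` and the sample forecasts are hypothetical, and the "auto" option (which relies on a regressor's feature importances) is left out.

```python
from typing import List, Optional, Sequence


def normalize_weights(weights: Optional[Sequence[float]], n_pipelines: int) -> List[float]:
    """Return weights that sum to 1; None is treated as uniform weights."""
    if weights is None:
        return [1.0 / n_pipelines] * n_pipelines
    if len(weights) != n_pipelines:
        raise ValueError("Expected exactly one weight per pipeline")
    total = sum(weights)
    return [w / total for w in weights]


def weighted_average(forecasts: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine per-pipeline forecasts point by point."""
    horizon = len(forecasts[0])
    return [
        sum(w * forecast[i] for w, forecast in zip(weights, forecasts))
        for i in range(horizon)
    ]


# Hypothetical forecasts of two pipelines over a 3-step horizon
forecast_a = [10.0, 20.0, 30.0]
forecast_b = [20.0, 40.0, 60.0]

# Unnormalized weights 7 and 3 are rescaled so that they sum to 1
weights = normalize_weights([7, 3], n_pipelines=2)
combined = weighted_average([forecast_a, forecast_b], weights)
print(weights, combined)
```

Passing `None` instead of a list gives each pipeline the same weight, which reduces the ensemble to a plain mean of the forecasts.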
Methods
backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.
fit(ts) – Fit pipelines in ensemble.
forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset.
load(path[, ts]) – Load an object.
params_to_tune() – Get hyperparameter grid to tune.
predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset in a given range.
save(path) – Save the object.
set_params(**params) – Return new object instance with modified parameters.
to_dict() – Collect all information about etna object in dict.
Attributes
This class stores its __init__ parameters as attributes.
- backtest(ts: TSDataset, metrics: List[Metric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) -> Tuple[DataFrame, DataFrame, DataFrame]
Run backtest with the pipeline.
If refit != True and some component of the pipeline doesn’t support forecasting with gap, this component will raise an exception.
- Parameters:
ts (TSDataset) – Dataset to fit models in backtest
metrics (List[Metric]) – List of metrics to compute for each fold
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, it is set to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw metrics.
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) – Determines how often the pipeline should be retrained during iteration over folds.
If True: pipeline is retrained on each fold.
If False: pipeline is trained only on the first fold.
If value: int: pipeline is trained every value folds starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, it is set to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
- Returns:
metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds
- Return type:
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
- Raises:
ValueError: – If mode is set when n_folds is List[FoldMask].
ValueError: – If stride is set when n_folds is List[FoldMask].
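To make the mode, stride and refit options more concrete, here is a rough sketch of how fold boundaries and retraining decisions could be derived for an integer n_folds. It assumes folds are laid out backwards from the end of the series, so that the last fold's test window covers the final horizon points. That layout and the helper names `fold_ranges` and `refit_flags` are assumptions for illustration, not etna's actual code.

```python
from typing import List, Optional, Tuple, Union

# (train_start, train_end, test_start, test_end) as half-open index ranges
Fold = Tuple[int, int, int, int]


def fold_ranges(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Fold]:
    """Sketch of fold boundaries, assuming folds are counted back from the series end."""
    if mode not in ("expand", "constant"):
        raise ValueError("mode should be 'expand' or 'constant'")
    if stride is None:
        stride = horizon  # the documented default
    folds = []
    for k in range(n_folds):
        test_end = n_points - (n_folds - 1 - k) * stride
        test_start = test_end - horizon
        # 'expand': train always starts at 0; 'constant': the window slides by stride
        train_start = 0 if mode == "expand" else k * stride
        folds.append((train_start, test_start, test_start, test_end))
    return folds


def refit_flags(n_folds: int, refit: Union[bool, int] = True) -> List[bool]:
    """Return, for each fold, whether the pipeline would be retrained on it."""
    if refit is True:
        return [True] * n_folds
    if refit is False:
        return [True] + [False] * (n_folds - 1)
    # Integer value: retrain every `refit` folds, starting from the first
    return [k % refit == 0 for k in range(n_folds)]


expand_folds = fold_ranges(n_points=30, horizon=7, n_folds=3, mode="expand")
constant_folds = fold_ranges(n_points=30, horizon=7, n_folds=3, mode="constant")
print(expand_folds)
print(constant_folds)
print(refit_flags(5, refit=2))
```

In this sketch, 'expand' grows the training window fold by fold, while 'constant' keeps its length fixed at the size of the first fold's training window.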
- fit(ts: TSDataset) -> VotingEnsemble
Fit pipelines in ensemble.
- Parameters:
ts (TSDataset) – TSDataset to fit ensemble
- Returns:
Fitted ensemble
- Return type:
self
- forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) -> TSDataset
Make a forecast of the next points of a dataset.
The result of forecasting starts from the last point of ts, not including it.
- Parameters:
ts (TSDataset | None) – Dataset to forecast. If not given, the dataset given during fit is used.
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True additionally returns forecast components
- Returns:
Dataset with predictions
- Raises:
NotImplementedError: – Adding target components is not currently implemented
- Return type:
TSDataset
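As a small illustration of the statement that the forecast starts right after the last point of ts, the sketch below reproduces the timestamps from the class example (30 daily points starting on 2021-06-01, horizon 7). The helper `future_timestamps` is hypothetical and handles daily data only.

```python
from datetime import date, timedelta
from typing import List


def future_timestamps(last_timestamp: date, horizon: int) -> List[date]:
    """Daily timestamps that follow last_timestamp, not including it."""
    return [last_timestamp + timedelta(days=step) for step in range(1, horizon + 1)]


start = date(2021, 6, 1)
last_timestamp = start + timedelta(days=30 - 1)  # 30 daily points end on 2021-06-30
future = future_timestamps(last_timestamp, horizon=7)
print(future[0], future[-1])  # prints: 2021-07-01 2021-07-07
```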
- params_to_tune() -> Dict[str, BaseDistribution]
Get hyperparameter grid to tune.
Not implemented for this class.
- Returns:
Grid with hyperparameters.
- Return type:
Dict[str, BaseDistribution]
- predict(ts: TSDataset, start_timestamp: Timestamp | None = None, end_timestamp: Timestamp | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) -> TSDataset
Make in-sample predictions on dataset in a given range.
Currently, when segments start at different timestamps, this method is only guaranteed to work with start_timestamp >= the beginning of all segments.
- Parameters:
ts (TSDataset) – Dataset to make predictions on.
start_timestamp (Timestamp | None) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. The beginning of each segment is expected to be <= start_timestamp; if not set, the first timestamp where each segment began is taken.
end_timestamp (Timestamp | None) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components
- Returns:
Dataset with predictions in the [start_timestamp, end_timestamp] range.
- Raises:
ValueError: – Value of end_timestamp is less than start_timestamp.
ValueError: – Value of start_timestamp goes before point where each segment started.
ValueError: – Value of end_timestamp goes after the last timestamp.
NotImplementedError: – Adding target components is not currently implemented
- Return type:
TSDataset
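The defaults and error conditions for start_timestamp and end_timestamp described above can be sketched as a standalone check. This is only an illustration of the documented rules, assuming plain `date` objects. The function `resolve_predict_range` and its arguments are hypothetical and are not part of etna.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_predict_range(
    segment_starts: Dict[str, date],
    last_timestamp: date,
    start_timestamp: Optional[date] = None,
    end_timestamp: Optional[date] = None,
) -> Tuple[date, date]:
    """Apply the documented defaults and checks to a prediction range."""
    # First timestamp at which every segment has already started
    common_start = max(segment_starts.values())
    if start_timestamp is None:
        start_timestamp = common_start
    if end_timestamp is None:
        end_timestamp = last_timestamp
    if start_timestamp < common_start:
        raise ValueError("start_timestamp goes before the point where each segment started")
    if end_timestamp > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp")
    if end_timestamp < start_timestamp:
        raise ValueError("end_timestamp is less than start_timestamp")
    return start_timestamp, end_timestamp


# Hypothetical dataset: two segments that begin on different days
starts = {"segment_0": date(2021, 6, 1), "segment_1": date(2021, 6, 5)}
last = date(2021, 6, 30)

default_range = resolve_predict_range(starts, last)
print(default_range)
```

With no arguments given, the range spans from the latest segment start to the last timestamp, which matches the documented defaults.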
- set_params(**params: dict) -> Self
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters:
**params (dict) – Estimator parameters
- Returns:
New instance with changed parameters
- Return type:
Self
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
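The dotted-path convention can also be shown without etna. The sketch below applies the same kind of keys to a plain nested dict and list structure and returns a modified copy. It only mimics the idea: `set_nested_params` and the config layout are hypothetical, and etna's own implementation may differ.

```python
import copy
from typing import Any, Dict, Union


def _as_key(container: Union[dict, list], part: str) -> Union[str, int]:
    """Interpret a path component as a list index or as a dict key."""
    return int(part) if isinstance(container, list) else part


def set_nested_params(config: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with dotted-path parameters replaced."""
    new_config = copy.deepcopy(config)  # the original stays untouched
    for dotted_key, value in params.items():
        *path, last = dotted_key.split(".")
        target = new_config
        for part in path:
            target = target[_as_key(target, part)]
        target[_as_key(target, last)] = value
    return new_config


# Hypothetical plain-dict stand-in for a pipeline's parameters
config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
updated = set_nested_params(config, {"model.lag": 3, "transforms.0.value": 2})
print(updated)
```

Working on a deep copy mirrors the documented behaviour of returning a new instance instead of modifying the current one.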