
Forecast ensemble
Model

Combination of several base forecasters -- simple averages, weighted blends, or stacking -- to reduce single-model risk.

Why does averaging several mediocre forecasts almost always beat the single best individual forecast?

Background

Bates and Granger (1969) noticed something striking: a simple average of two airline-passenger forecasts beat each individual forecast. This was not a fluke. Four decades of subsequent evidence---surveyed by Timmermann (2006) and Genre, Kenny, Meyler, and Timmermann (2013)---confirms that forecast combination is one of the most reliable empirical regularities in economics. The "forecast combination puzzle" is that simple averages often beat sophisticated optimal weighting schemes, likely because the estimation error in optimal weights offsets their theoretical advantage in small samples.

The mechanism is variance reduction through diversification. If two forecasters make independent errors of equal variance, their average has half the error variance of either one. In practice, forecast errors are correlated, but never perfectly so. As long as individual models capture partially distinct information, combining them cancels some errors. The mathematics is identical to portfolio diversification: the combined forecast is a portfolio of predictions, and the combination weights are portfolio weights that minimize forecast-error variance subject to the constraint that weights sum to one.
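As a back-of-the-envelope check on this logic, the short simulation below (hypothetical numbers, plain NumPy, not drawn from any source cited here) compares the error variance of a 50/50 average of two equally accurate forecasters at several error correlations:

```python
import numpy as np

# Illustrative simulation: two forecasters with equal error variance sigma^2.
# The equal-weight average has error variance sigma^2 * (1 + rho) / 2, so the
# gain from combining grows as the error correlation rho falls.
rng = np.random.default_rng(0)
n = 100_000
sigma = 1.0  # common standard deviation of forecast errors (assumed)

for rho in [1.0, 0.8, 0.5, 0.0]:
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    e = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # errors of models 1 and 2
    e_avg = e.mean(axis=1)                                 # error of the 50/50 combination
    print(f"rho={rho:.1f}  var(individual)={e[:, 0].var():.3f}  "
          f"var(average)={e_avg.var():.3f}")
```

With perfectly correlated errors the average buys nothing; with independent errors the variance halves, which is the portfolio logic described above.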

Modern practice goes beyond simple averaging. The ECB combines density forecasts from its Survey of Professional Forecasters using optimal linear pools. The Bank of England's Monetary Policy Committee informally combines outputs from multiple internal models (COMPASS, DSGE, SVARs) when producing its Inflation Report projections. The Federal Reserve Bank of Atlanta's GDPNow is itself a weighted combination of bridge equations, each using different indicator subsets. Academic researchers use Bayesian model averaging (BMA), where the weights are posterior model probabilities proportional to marginal likelihoods.

The field has moved from static to adaptive combination. Stock and Watson (2004) showed that the best individual forecasting model changes over time---a model that dominated in the 1990s may fail in the 2000s. Fixed weights estimated from a historical sample can assign high weight to a model whose advantage has evaporated. Time-varying weights that adapt to recent forecast performance---exponentially weighted moving averages of squared errors, Bayesian learning, or regime-switching weights---address this instability. Adaptive combinations sacrifice some statistical efficiency for robustness to structural change.
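A minimal sketch of one such adaptive scheme, inverse discounted-MSE weights with an illustrative forgetting factor (the function name and parameter values are assumptions, not a description of any institution's production system):

```python
import numpy as np

def adaptive_weights(errors: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Inverse discounted-MSE weights, one row of weights per period.

    errors: (T, M) array of past forecast errors for M models.
    decay:  forgetting factor in (0, 1); smaller values react faster to
            recent performance (0.9 is an illustrative choice).
    """
    T, M = errors.shape
    ewma = np.full(M, np.mean(errors[0] ** 2) + 1e-12)  # initial discounted MSE
    weights = np.empty((T, M))
    for t in range(T):
        ewma = decay * ewma + (1.0 - decay) * errors[t] ** 2
        inv = 1.0 / (ewma + 1e-12)
        weights[t] = inv / inv.sum()  # normalise so the weights sum to one
    return weights

# Example: model 0's accuracy deteriorates halfway through, so its weight falls.
rng = np.random.default_rng(1)
err = rng.normal(0.0, 1.0, size=(200, 2))
err[100:, 0] *= 3.0  # structural break in model 0's errors
w = adaptive_weights(err)
print("weights at t=50 :", np.round(w[50], 2))
print("weights at t=199:", np.round(w[199], 2))
```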

How the Parts Fit Together

The inputs are $M$ individual forecasts $\hat{y}_1, \ldots, \hat{y}_M$ for a common target variable $y$ at a common horizon $h$. Each forecast comes from a distinct model (VAR, DSGE, ARIMA, factor model, judgmental survey, etc.) or from the same model class with different specifications. The forecasts may be point forecasts (scalars) or density forecasts (full predictive distributions). Historical forecast errors are needed for weight estimation: a holdout sample of $T_0$ periods where both the forecasts and the realized values are observed.

The combined forecast is $\hat{y}_c = \sum_{m=1}^M w_m \hat{y}_m$, where $w_m \geq 0$ and $\sum_m w_m = 1$. The weights $w_m$ are chosen to minimize the forecast-error variance of the combination. Under the assumption that forecast errors have constant covariance $\Sigma_e$ (an $M \times M$ matrix), the optimal weights are $w^* = \Sigma_e^{-1} \iota / (\iota' \Sigma_e^{-1} \iota)$, where $\iota$ is a vector of ones. This is the minimum-variance portfolio from finance. In practice, $\Sigma_e$ must be estimated from the holdout sample, introducing estimation error that can make the estimated optimal weights worse than equal weights.
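A minimal sketch of these minimum-variance weights, assuming a holdout array of forecast errors with one column per model (the names and the optional ridge term are illustrative additions):

```python
import numpy as np

def min_variance_weights(errors: np.ndarray, ridge: float = 0.0) -> np.ndarray:
    """Bates-Granger optimal weights w* = Sigma^{-1} iota / (iota' Sigma^{-1} iota).

    errors: (T0, M) holdout forecast errors, one column per model.
    ridge:  optional shrinkage added to the diagonal of the estimated
            covariance; stabilises the inverse when T0 is small relative
            to M (an illustrative device, not part of the formula above).
    """
    sigma = np.cov(errors, rowvar=False)           # estimated error covariance (M x M)
    sigma = sigma + ridge * np.eye(sigma.shape[0])
    iota = np.ones(sigma.shape[0])
    sigma_inv_iota = np.linalg.solve(sigma, iota)  # Sigma^{-1} iota without explicit inversion
    return sigma_inv_iota / (iota @ sigma_inv_iota)

# Usage: weights from 80 holdout periods, applied to three new point forecasts.
rng = np.random.default_rng(2)
holdout_errors = rng.normal(0.0, 1.0, size=(80, 3))
w = min_variance_weights(holdout_errors)
new_forecasts = np.array([2.1, 1.8, 2.4])          # hypothetical point forecasts
print("weights:", np.round(w, 3), " combined forecast:", round(float(w @ new_forecasts), 3))
```

Note that the closed form imposes only the sum-to-one constraint; it can deliver negative weights, so a non-negativity restriction has to be added separately if it is wanted.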

Weight estimation methods range from simple to complex. Equal weights ($w_m = 1/M$) ignore the covariance structure entirely but avoid estimation error. Inverse-MSE weights set $w_m \propto 1/\text{MSE}_m$, using only individual model performance without cross-model covariance. Regression-based weights run the regression $y_t = \beta_0 + \sum_m \beta_m \hat{y}_{m,t} + \varepsilon_t$ and use $\hat{\beta}_m$ as weights (Granger-Ramanathan, 1984), but this requires enough holdout data to estimate $M+1$ parameters reliably. Bayesian model averaging sets $w_m$ proportional to the posterior model probability $p(\mathcal{M}_m \mid \text{data})$, computed from the marginal likelihood of each model. Time-varying weights use exponential decay: $w_{m,t} \propto \exp(-\alpha \sum_{s=1}^{t} (y_s - \hat{y}_{m,s})^2)$ with a forgetting factor $\alpha$.
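The simpler schemes above fit in a few lines; the sketch below computes equal, inverse-MSE, and Granger-Ramanathan regression weights from the same hypothetical holdout sample (the data and array names are made up for illustration):

```python
import numpy as np

def equal_weights(M: int) -> np.ndarray:
    return np.full(M, 1.0 / M)

def inverse_mse_weights(y: np.ndarray, yhat: np.ndarray) -> np.ndarray:
    """Weights proportional to 1 / MSE_m over the holdout sample."""
    mse = np.mean((y[:, None] - yhat) ** 2, axis=0)
    w = 1.0 / mse
    return w / w.sum()

def granger_ramanathan_weights(y: np.ndarray, yhat: np.ndarray) -> np.ndarray:
    """Unconstrained OLS of realisations on the M forecasts plus a constant.

    Returns (beta_0, beta_1, ..., beta_M): the slopes act as weights and the
    intercept absorbs any common bias in the individual forecasts.
    """
    X = np.column_stack([np.ones(len(y)), yhat])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical holdout: T0 = 60 periods, M = 3 forecasts of the same target.
rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=60)
yhat = y[:, None] + rng.normal(0.0, [0.5, 0.8, 1.2], size=(60, 3))  # model-specific errors

print("equal              :", equal_weights(3))
print("inverse-MSE        :", np.round(inverse_mse_weights(y, yhat), 3))
print("Granger-Ramanathan :", np.round(granger_ramanathan_weights(y, yhat), 3))
```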

Applications

The ECB's Survey of Professional Forecasters (SPF) collects density forecasts from approximately 60 institutions for euro-area GDP growth, inflation, and unemployment. The ECB staff combines these into a weighted density forecast, where the weights reflect each forecaster's past predictive accuracy. The combined density is tighter (lower entropy) than most individual densities, demonstrating the variance-reduction benefit. This combined forecast enters the Governing Council's deliberations alongside the ECB's own staff projections.

The Federal Reserve Bank of Philadelphia's Survey of Professional Forecasters publishes a consensus (median and mean) forecast that is itself a forecast combination. Research by Reifschneider and Tulip (2019) shows that the SPF consensus systematically outperforms the Fed's Greenbook (now Tealbook) forecast at horizons beyond one quarter, even though Greenbook forecasters have access to confidential data. The combination of many diverse perspectives apparently compensates for the information advantage of a single well-resourced institution.

Central banks that maintain multiple internal models combine them informally. The Bank of England's Monetary Policy Committee sees output from at least four models (COMPASS central forecast, DSGE variant, reduced-form VARs, judgmental overlays). The committee members weight these outputs subjectively, but the process is structured to ensure that no single model dominates. Research by Kapetanios, Mitchell, Price, and Fawcett (2015) formalized this by showing that density combination with time-varying weights improves upon any fixed-weight scheme for the Bank's inflation forecast.

Forecast combination fails when the model pool lacks diversity. Combining five VARs with slightly different lag lengths provides little benefit because their errors are nearly identical. The gains come from combining structurally different approaches: a statistical model (VAR) with a structural model (DSGE), a machine-learning method with a judgmental forecast, or a domestic model with one that emphasizes international linkages. Combination also fails when the number of models $M$ is large relative to the holdout $T_0$, because weight estimation becomes unreliable. In this regime, equal weights dominate, and the ensemble reduces to a simple average.

Literature and Extensions

Key Papers

  • Bates, Granger (1969) --- foundational paper on forecast combination, established the basic variance-reduction principle
  • Granger, Ramanathan (1984) --- regression-based combination weights with and without constraints
  • Stock, Watson (2004) --- documented time variation in relative forecast performance, motivating adaptive combination
  • Timmermann (2006) --- comprehensive survey of forecast combination methods and the combination puzzle
  • Hall, Mitchell (2007) --- optimal combination of density forecasts using the logarithmic scoring rule

Named Variants

  • Bayesian model averaging --- weights proportional to posterior model probabilities
  • Exponentially weighted combination --- recent forecast errors weighted more heavily, adapting to structural change
  • Stacked generalization --- uses a meta-learner (regression, LASSO, neural network) to combine base model forecasts
  • Trimmed mean / median combination --- robust to outlier forecasts by discarding extreme predictions
  • Density forecast combination --- combines full predictive distributions, not just point forecasts

Open Questions

  • Why simple averages outperform estimated optimal weights in most empirical settings (the forecast combination puzzle)
  • How to optimally combine density forecasts when individual predictive distributions are misspecified
  • Whether machine-learning-based meta-learners can reliably improve upon equal weighting for macro forecast combination

Components

$\hat{y}_m$: Individual forecast

Point forecast from model $m$ for the target variable $y$ at forecast horizon $h$.

$w_m$: Combination weight

Weight assigned to model $m$ in the combined forecast. Non-negative, summing to one.

$\Sigma_e$: Forecast-error covariance

The $M \times M$ covariance matrix of individual forecast errors. Off-diagonal entries capture error correlation across models.

$\hat{y}_c$: Combined forecast

The weighted combination $\hat{y}_c = \sum_m w_m \hat{y}_m$. The main output of the ensemble.

$\text{MSE}_m$: Individual model MSE

Mean squared forecast error of model $m$ over the holdout period. Used for inverse-MSE weighting.

$T_0$: Holdout sample size

Number of past periods with observed forecasts and realizations, used to estimate combination weights.

Assumptions

Stationary forecast-error distribution (Testable)

The joint distribution of forecast errors $(e_1, \ldots, e_M)$ is stationary over the holdout and forecast periods.

If violated: Weights estimated from historical errors are no longer optimal. Structural breaks, policy shifts, or model degradation change the error covariance over time.

Unbiased individual forecasts (Testable)

Each individual forecast is unbiased: $E[y - \hat{y}_m] = 0$ for all $m$.

If violated: Biased forecasts transmit their bias to the combination unless the weights are constrained or a constant is included (Granger-Ramanathan regression). Mincer-Zarnowitz tests detect individual forecast bias.

Diversified information sets (Testable)

Individual models use partially non-overlapping information, so their errors are imperfectly correlated.

If violated: If all models make the same errors (perfect correlation), combination provides zero variance reduction. The benefit of combining scales with the degree of error diversification.

Sufficient holdout for weight estimation (Maintained)

The holdout sample $T_0$ is large enough to estimate the covariance $\Sigma_e$ reliably. For optimal weights, $T_0 \gg M$.

If violated: Estimated optimal weights overfit the holdout sample. Sample covariance inversion amplifies estimation error. Equal weights become preferable when $T_0 / M$ is small.

Correctly specified models for BMA (Maintained)

The true data-generating process is one of the $M$ candidate models (or well-approximated by one).

If violated: BMA posterior concentrates on the single best-approximating model, missing the combination benefits that arise from averaging across structurally different models.

Symmetric loss function (Maintained)

The loss function is quadratic (MSE). Over-prediction and under-prediction are equally costly.

If violated: Under asymmetric loss (e.g., underestimating inflation is worse than overestimating), MSE-optimal weights are suboptimal. Combination should target the asymmetric loss function directly.