Macro by Mark
© 2026 Mark Jayson Nation


Bayesian VAR (BVAR)
Model

Bayesian shrinkage on a vector autoregression -- Minnesota-style priors stabilize forecasts for medium-sized macro systems.

How do you estimate a high-dimensional VAR without drowning in parameter proliferation?

Background

The VAR's strength -- treating every variable as endogenous -- is also its weakness. A VAR(p) with n variables requires n^2*p slope coefficients. With n = 7 and p = 4 the system has 196 slopes plus 7 intercepts. Quarterly macro datasets rarely exceed 200 observations, so the parameter-to-observation ratio approaches the danger zone where OLS overfits noise. By the mid-1980s practitioners at the Federal Reserve Bank of Minneapolis had hit this wall repeatedly. Robert Litterman's 1986 Journal of Forecasting paper proposed the solution that became the standard: treat the VAR coefficients as random variables and impose a prior distribution that shrinks most of them toward zero. The resulting Bayesian VAR (BVAR) traded a small amount of bias for a large reduction in forecast variance.

The Minnesota prior -- named for the Minneapolis Fed where Litterman, Thomas Doan, and Christopher Sims developed it -- encodes a specific belief: each variable behaves roughly like a random walk with drift, and cross-variable lags are less important than own lags. Formally, the prior mean for a variable's own first lag is 1 (unit root belief); all other lag coefficients have prior mean 0. The prior variance on coefficient (j,k) at lag l shrinks at rate 1/l^2, penalizing distant lags, and cross-variable coefficients receive an additional shrinkage factor lambda_cross < 1. Two hyperparameters control the overall tightness (lambda) and the relative weight on cross-variable versus own-variable lags. When lambda approaches infinity the posterior collapses to the OLS VAR. When lambda approaches zero every variable becomes an independent random walk.
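The prior moments just described can be assembled mechanically. The sketch below is illustrative (the function name, defaults, and column layout are assumptions, not from any particular package); it builds the Minnesota prior mean and variance for an n-variable VAR(p), omitting intercepts and using residual standard deviations from univariate AR pre-regressions to put cross-variable coefficients on a comparable scale:

```python
import numpy as np

def minnesota_prior(n, p, sigma_ar, lam=0.2, lam_cross=0.5, d=2):
    """Minnesota prior mean and variance for an n-variable VAR(p).

    sigma_ar : length-n residual std devs from univariate AR pre-regressions.
    lam      : overall tightness; smaller = stronger shrinkage.
    lam_cross: extra shrinkage on cross-variable lags (< 1).
    d        : lag decay exponent (Minnesota default 2).
    Returns (mean, var), each shaped (n, n*p): row j holds the coefficients
    of equation j, columns ordered lag 1 first (intercepts omitted here).
    """
    mean = np.zeros((n, n * p))
    var = np.zeros((n, n * p))
    for j in range(n):                # equation
        for l in range(1, p + 1):     # lag
            for k in range(n):        # regressor variable
                col = (l - 1) * n + k
                if k == j and l == 1:
                    mean[j, col] = 1.0          # random-walk prior mean
                base = (lam / l**d) ** 2        # prior variance decays in l
                if k == j:
                    var[j, col] = base
                else:
                    # cross-variable lags: extra shrinkage, rescaled so
                    # coefficients are comparable across variable units
                    var[j, col] = base * lam_cross**2 * (sigma_ar[j] / sigma_ar[k]) ** 2
    return mean, var
```

The diagonal prior covariance is what makes the Minnesota prior cheap: each coefficient's variance is set independently from the three hyperparameters.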

Giannone, Lenza, and Primiceri (2015, Review of Economics and Statistics) showed how to choose the Minnesota hyperparameters by marginal likelihood maximization, removing the last subjective element. Their approach treats lambda as a continuous hyperparameter and optimizes the marginal data density over a grid. The result: BVAR forecasts that are competitive with or better than large factor models, mixed-frequency models, and professional consensus forecasts for GDP, inflation, and unemployment at horizons of 1-8 quarters. The ECB's suite of BVARs follows this approach almost exactly.

Extensions since the original Minnesota prior include stochastic volatility (Cogley and Sargent, 2005), time-varying parameters (Primiceri, 2005), large BVARs with global-local shrinkage (Banbura, Giannone, and Reichlin, 2010), and hierarchical priors that learn the prior tightness from the data jointly with the coefficients. The BVAR has become the default forecasting tool at central banks worldwide -- not because it always wins point-forecast accuracy contests, but because it consistently avoids the catastrophic forecast failures that plague unrestricted VARs in moderate-to-large systems.

How the Parts Fit Together

Inputs are identical to a standard VAR: an n x T matrix of endogenous variables y_t observed at uniform frequency, a chosen lag order p, and a deterministic component (intercept, trend, or neither). The additional ingredient is the prior specification: a prior mean vector b_0 for the stacked coefficient vector beta, and a prior precision matrix (or its parameterization through hyperparameters). Under the Minnesota prior, the user sets overall tightness lambda, cross-variable shrinkage lambda_cross, lag decay d (usually 2), and optionally a sum-of-coefficients or dummy-initial-observation prior to encode beliefs about unit roots and cointegration.

Estimation under the natural conjugate prior (normal-inverse-Wishart) is analytic. The posterior for the coefficient matrix B conditional on the innovation covariance Sigma is matrix-normal; the marginal posterior for Sigma is inverse-Wishart. No MCMC is needed. The posterior mean for B is a matrix-weighted average of the OLS estimate and the prior mean, where the weight on the prior increases with the prior precision relative to the data precision. With a non-conjugate prior (asymmetric shrinkage, stochastic volatility, or heavy-tailed innovations), Gibbs sampling or Hamiltonian Monte Carlo replaces the closed-form solution.
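The precision-weighted-average structure of the posterior mean is visible in a few lines for a single equation. This is a simplified sketch conditional on a known innovation variance (the full conjugate case integrates Sigma out); the function name and interface are assumptions:

```python
import numpy as np

def posterior_mean(X, y, b0, V0, sigma2):
    """Posterior mean of one VAR equation's coefficients under a normal
    prior b ~ N(b0, V0), conditional on innovation variance sigma2.
    A precision-weighted average of the prior mean and the OLS fit."""
    prior_prec = np.linalg.inv(V0)        # prior precision
    data_prec = X.T @ X / sigma2          # data precision
    return np.linalg.solve(prior_prec + data_prec,
                           prior_prec @ b0 + X.T @ y / sigma2)
```

As V0 shrinks toward zero the estimate collapses to b0; as V0 grows it converges to OLS, which is the lambda-limits behavior described above.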

Forecasting uses the posterior predictive distribution. For conjugate BVARs the h-step-ahead forecast distribution is a matrix-t with known location, scale, and degrees of freedom -- no simulation needed for point forecasts and intervals. For non-conjugate BVARs, forecasts are generated by iterating the VAR forward for each posterior draw and collecting the empirical distribution across draws. Structural identification (Cholesky, sign restrictions, proxy instruments) applies to BVARs exactly as it does to frequentist VARs, but posterior uncertainty about the reduced-form parameters propagates naturally into the identified impulse responses.
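The draw-by-draw simulation used for non-conjugate BVARs can be sketched as follows. This toy version omits intercepts and assumes coefficient and Cholesky draws are already available from the sampler (both assumptions of the sketch, not part of any standard API):

```python
import numpy as np

def forecast_paths(y_hist, coef_draws, chol_draws, h, rng):
    """Simulate h-step forecast paths for a VAR(p), one per posterior draw.

    y_hist     : (p, n) most recent observations, newest last.
    coef_draws : (D, n, n*p) reduced-form coefficient draws, lag 1 first
                 (intercepts omitted for simplicity).
    chol_draws : (D, n, n) Cholesky factors of the innovation covariance.
    Returns (D, h, n); quantiles across the D axis give fan charts.
    """
    D, n, _ = coef_draws.shape
    p = y_hist.shape[0]
    paths = np.empty((D, h, n))
    for draw in range(D):
        hist = [y_hist[i] for i in range(p - 1, -1, -1)]  # most recent first
        for step in range(h):
            x = np.concatenate(hist[:p])                  # stacked lags
            shock = chol_draws[draw] @ rng.standard_normal(n)
            y_new = coef_draws[draw] @ x + shock
            paths[draw, step] = y_new
            hist.insert(0, y_new)                         # roll the history
    return paths
```

Pointwise quantiles of `paths` across draws approximate the posterior predictive intervals; for conjugate BVARs the matrix-t result makes this simulation unnecessary.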

Applications

The European Central Bank maintains a suite of BVARs as part of its forecasting infrastructure. The Giannone-Lenza-Primiceri (2015) specification with marginal-likelihood-optimized hyperparameters is the baseline. Variables include real GDP growth, HICP inflation, the short-term interest rate, unemployment, credit growth, and the exchange rate. Staff produce density forecasts at horizons of 1-8 quarters and compare BVAR output against the DSGE model (NAWM II) and the broad macroeconomic projection exercise. BVARs consistently rank first or second in real-time forecast accuracy evaluations for euro area GDP and inflation.

Large BVARs (n = 20-130 variables) have become feasible through global-local shrinkage priors. Banbura, Giannone, and Reichlin (2010) showed that a BVAR with 130 variables and appropriate Minnesota shrinkage matches or beats factor models for U.S. macroeconomic forecasting. The Federal Reserve Bank of New York's nowcasting model incorporates BVAR components. The Bank of Canada's LENS model (Large Empirical and Semi-structural) uses a BVAR core with 40+ variables. These large systems offer a direct alternative to dynamic factor models when the user wants structural interpretability alongside forecast accuracy.

Structural identification in BVARs follows the same logic as in frequentist VARs -- Cholesky, sign restrictions, narrative identification, proxy instruments -- but with a key advantage: parameter uncertainty propagates into the identified impulse responses through the posterior. In a frequentist SVAR, confidence bands around IRFs are computed by delta method or bootstrap, both of which have known coverage problems in small samples. In a BVAR, credible bands from the posterior are typically better calibrated. Rubio-Ramirez, Waggoner, and Zha (2010) provide the algorithmic machinery for sign-restricted identification in BVARs.
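Propagating posterior uncertainty into Cholesky-identified impulse responses amounts to computing one IRF per draw and taking pointwise quantiles. A minimal sketch for the VAR(1) case (function name and band levels are illustrative):

```python
import numpy as np

def irf_bands(coef_draws, sigma_draws, horizon, shock=0, probs=(0.16, 0.84)):
    """Cholesky-identified impulse responses with posterior credible bands
    for a VAR(1). coef_draws: (D, n, n); sigma_draws: (D, n, n).
    Returns quantiles of the IRFs across draws, shape (len(probs), horizon+1, n)."""
    D, n, _ = coef_draws.shape
    irfs = np.empty((D, horizon + 1, n))
    for d in range(D):
        chol = np.linalg.cholesky(sigma_draws[d])
        resp = chol[:, shock]        # impact response to structural shock
        A = coef_draws[d]
        for h in range(horizon + 1):
            irfs[d, h] = resp
            resp = A @ resp          # propagate one period forward
    return np.quantile(irfs, probs, axis=0)   # pointwise credible bands
```

A VAR(p) works the same way after rewriting the system in companion form; sign-restricted identification replaces the Cholesky factor with rotated candidates accepted or rejected per draw.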

BVARs break down when the assumption of linearity and time-invariant parameters is seriously violated. Recessions involve nonlinear dynamics -- credit crunches, zero lower bound constraints, financial accelerator effects -- that a linear BVAR cannot represent. TVP-BVARs partially address this but at enormous computational cost and with new identification challenges. For systems with more than ~150 variables, even efficient shrinkage priors struggle with the O(n^2) scaling of the covariance matrix. Factor-augmented VARs or penalized reduced-rank regression handle ultra-high-dimensional panels more gracefully.

Literature and Extensions

Key Papers

  • Litterman (1986) 'Forecasting with Bayesian Vector Autoregressions -- Five Years of Experience': established the Minnesota prior and demonstrated consistent out-of-sample forecast improvements over unrestricted VARs for U.S. macro variables.
  • Doan, Litterman, and Sims (1984): introduced the sum-of-coefficients prior to handle unit roots in BVARs without differencing, preserving cointegrating relationships.
  • Giannone, Lenza, and Primiceri (2015) 'Prior Selection for Vector Autoregressions': closed the loop on hyperparameter choice by maximizing marginal likelihood, making the BVAR fully automatic.
  • Banbura, Giannone, and Reichlin (2010) 'Large Bayesian Vector Auto Regressions': showed that BVARs scale to 130+ variables with appropriate shrinkage, matching factor model forecast accuracy.
  • Primiceri (2005) 'Time Varying Structural Vector Autoregressions and Monetary Policy': introduced the TVP-BVAR with stochastic volatility, the standard tool for studying evolving macro dynamics.

Named Variants

  • Minnesota prior BVAR: the original Litterman specification with diagonal prior covariance, random-walk prior mean, and quadratic lag decay.
  • TVP-BVAR: time-varying parameter BVAR (Primiceri 2005) where coefficients follow random walks and the covariance matrix has stochastic volatility. Estimated by MCMC.
  • Large BVAR: systems with 20-130+ variables using tighter shrinkage (Banbura et al. 2010) or global-local priors (horseshoe, Dirichlet-Laplace).
  • BVAR with stochastic volatility: constant coefficients but time-varying Sigma_t. Lighter than full TVP-BVAR; captures the Great Moderation without changing the propagation mechanism.
  • Hierarchical BVAR: hyperparameters (lambda, lambda_cross) are given their own priors and sampled jointly with the model parameters, fully integrating over prior uncertainty.

Open Questions

  • How should BVARs handle the zero lower bound? When the policy rate is constrained at zero, the linear VAR dynamics misrepresent the monetary transmission mechanism. Shadow rate BVARs and censored-normal approaches exist but lack consensus.
  • Optimal shrinkage in ultra-large systems (n > 200): global-local priors, factor structure priors, and reduced-rank approaches compete without a clear winner. Computational scaling remains the binding constraint.
  • Prior specification for emerging economies: the Minnesota random-walk prior reflects U.S. macro dynamics. For countries with structural breaks, high inflation regimes, or short samples, the appropriate prior family is an open research question.

Components

β -- Stacked coefficient vector

The vec(B') of all n^2*p + n parameters in the reduced-form VAR, treated as a random vector with a prior distribution.

λ -- Overall shrinkage hyperparameter

Controls how tightly the posterior is pulled toward the prior mean. Small lambda means strong shrinkage; large lambda lets the data dominate.

λ_cross -- Cross-variable shrinkage

Relative scaling of prior variance on cross-variable coefficients versus own-variable coefficients. Values below 1 penalize cross-variable lags more heavily.

Σ -- Innovation covariance matrix

Same n x n positive-definite matrix as in the frequentist VAR. Under the conjugate prior, its marginal posterior is inverse-Wishart.

b_0 -- Prior mean vector

The prior expectation of beta. Under the Minnesota prior: 1 for own first lag, 0 elsewhere. Encodes the unit-root random-walk belief.

V_0 -- Prior covariance matrix

The prior covariance of beta. Diagonal under Minnesota, with entries shrinking at rate 1/l^d for lag l and scaled by lambda, lambda_cross, and residual variances from univariate AR(p) pre-regressions.

d -- Lag decay exponent

Controls how fast the prior variance shrinks with lag distance. d = 2 (quadratic decay) is the Minnesota default. Larger d penalizes remote lags more aggressively.

p(Y | λ) -- Marginal likelihood

The marginal data density integrated over the coefficient and covariance parameters. Used to select hyperparameters by maximizing p(Y|lambda) over a grid.
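The grid search over lambda can be illustrated in a deliberately simplified setting: a single regression with known noise variance, where the marginal likelihood has a closed form. This is a sketch of the idea behind the Giannone-Lenza-Primiceri procedure, not their actual estimator (which integrates over Sigma as well):

```python
import numpy as np

def log_marginal(y, X, b0, V_unit, lam, sigma2=1.0):
    """Log marginal likelihood p(y | lam) for y = Xb + e, e ~ N(0, sigma2 I),
    with prior b ~ N(b0, lam^2 * V_unit). Marginally, y is Gaussian with
    covariance sigma2*I + lam^2 * X V X', so the density is available in
    closed form (known-variance simplification)."""
    S = sigma2 * np.eye(len(y)) + lam**2 * X @ V_unit @ X.T
    r = y - X @ b0
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (logdet + r @ np.linalg.solve(S, r)
                   + len(y) * np.log(2 * np.pi))

# Hyperparameter choice by grid search, as described in the text:
# grid = np.geomspace(0.01, 10, 50)
# lam_hat = max(grid, key=lambda lam: log_marginal(y, X, b0, V, lam))
```

The marginal likelihood automatically trades off fit (the quadratic term) against complexity (the log-determinant term), which is why it penalizes both over- and under-shrinkage.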

Assumptions

Covariance stationarity (or known nonstationarity treatment) -- Testable

Eigenvalues of the companion matrix lie inside the unit circle, or unit-root behavior is absorbed by the prior (sum-of-coefficients dummy, dummy-initial-observation prior). Stationarity is not strictly required for Bayesian estimation, but the prior must be coherent with the data-generating process.

If violated: If variables are I(1) and the prior assumes stationarity, the posterior can concentrate on explosive parameter regions. The sum-of-coefficients prior of Doan, Litterman, and Sims (1984) handles this by shrinking the sum of lag coefficients toward 1.

Correct prior family -- Maintained

The prior distribution is a reasonable representation of beliefs about the parameter space. For the Minnesota prior: each variable is close to a random walk, cross-variable effects are smaller than own effects, and distant lags matter less than recent ones.

If violated: A badly misspecified prior can dominate the posterior in small samples. Symptoms: marginal likelihood that is flat or multimodal in the hyperparameter space, or posterior predictive checks that fail systematically for a subset of variables.

Linearity -- Testable

The conditional mean of y_t given past information is linear in the lagged values. No regime switches, threshold effects, or asymmetric responses.

If violated: Linear BVARs miss recession-era dynamics where responses are asymmetric. Threshold BVAR or Markov-switching BVAR extends the framework at the cost of much heavier computation.

Gaussian innovations (for conjugate inference) -- Testable

u_t ~ N(0, Sigma). Required for the normal-inverse-Wishart conjugate posterior to be exact. Not required for MCMC-based BVARs with flexible error distributions.

If violated: Heavy tails or skewness in the innovations invalidate conjugate credible intervals. Point forecasts remain reasonable but interval coverage deteriorates. Stochastic volatility or t-distributed errors are the standard fix.

No structural breaks -- Testable

The coefficient matrix B and the covariance matrix Sigma are constant over the sample period.

If violated: Time-varying parameter BVARs (Primiceri 2005) allow B_t and Sigma_t to drift as random walks. Constant-parameter BVARs estimated over samples spanning the Great Moderation and post-2008 period systematically overstate volatility in the later subsample.

Correct lag order -- Testable

The chosen lag order p is sufficient to capture the relevant dynamics. Underfitting omits important information; overfitting increases posterior uncertainty without improving forecasts.

If violated: Marginal likelihood comparison across p values is the Bayesian approach to lag selection. Selecting p by AIC/BIC and then conditioning on it ignores model uncertainty -- Bayesian model averaging over p is the principled solution.
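Turning log marginal likelihoods for candidate lag orders into model-averaging weights is a one-liner once the likelihoods are computed. A minimal sketch, assuming a uniform prior over p:

```python
import numpy as np

def model_weights(log_mls):
    """Posterior model probabilities from log marginal likelihoods under a
    uniform model prior, computed with the usual max-subtraction trick so
    the exponentials do not underflow."""
    log_mls = np.asarray(log_mls, dtype=float)
    w = np.exp(log_mls - log_mls.max())
    return w / w.sum()
```

Averaging the h-step forecasts from each VAR(p) with these weights gives the Bayesian-model-averaged forecast the text recommends, rather than conditioning on a single selected p.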