Data-Driven Models
Time-varying-parameter VAR that lets transmission coefficients and shock volatilities drift over time.
How do you estimate a VAR when the transmission mechanism itself changes over time - when the coefficients and the volatility of shocks are both drifting?
Constant-parameter VARs assume the economy's dynamic structure stayed the same across the entire sample. That assumption is defensible over short, stable windows. Over a 60-year US sample spanning the Great Inflation, the Volcker disinflation, the Great Moderation, and the Global Financial Crisis, it isn't. Primiceri (2005) proposed a VAR in which every coefficient and every element of the error covariance structure follow independent random walks, allowing both the transmission mechanism and the shock volatility to evolve smoothly over time. This model extends the VAR and Bayesian VAR, already documented in the VAR and BVAR pages of this platform.
The state-space structure is the backbone. The VAR coefficients, stacked into a vector \(\beta_t\), are treated as latent states evolving via \(\beta_t = \beta_{t-1} + \nu_t\). The error covariance matrix \(\Omega_t\) is factored as \(\Omega_t = A_t^{-1} H_t (A_t^{-1})'\), where \(A_t\) is a lower-triangular matrix with ones on the diagonal (time-varying contemporaneous relationships) and \(H_t = \text{diag}(h_{1,t}, \ldots, h_{n,t})\) collects the stochastic volatilities. The log-volatilities \(\ln h_{i,t}\) follow independent random walks. Everything is Gaussian conditional on the states, so the Carter-Kohn (1994) forward-filtering, backward-sampling algorithm draws the entire state path in one block.
Central banks adopted TVP-VARs almost immediately after Primiceri's paper. Cogley and Sargent (2005) used a simpler variant (drifting coefficients, no time-varying contemporaneous relationships) to study changes in the persistence of US inflation. The Bank of England uses TVP-VARs to assess whether the monetary transmission mechanism changed after the 2008 financial crisis. The ECB estimates time-varying Phillips curves and output-gap uncertainty using the same framework. The Federal Reserve Bank of Minneapolis and the Federal Reserve Board both maintain TVP-VAR codebases for monitoring structural change in real time.
The computational cost is the main barrier. Each MCMC iteration involves multiple simulation-smoother passes - one for \(\beta_t\), one for the free elements of \(A_t\), and \(n\) separate passes, one for each log-volatility \(\ln h_{i,t}\). A 3-variable TVP-VAR with 4 lags on 200 quarterly observations has \(3 \times (3 \times 4 + 1) = 39\) coefficients drifting at every time period, plus 3 log-volatilities and 3 off-diagonal elements of \(A_t\). Running 50,000 MCMC draws takes minutes to hours depending on implementation. Koop and Korobilis (2013) proposed computationally lighter approximations using forgetting factors, but the full Primiceri MCMC remains the gold standard for inference.
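The state count quoted above follows from simple arithmetic, sketched here in Python. The function name and the dictionary keys are illustrative assumptions, not part of any existing codebase.

```python
def tvp_var_state_counts(n: int, p: int, T: int) -> dict:
    """Count latent states in a TVP-VAR with n variables, p lags and an intercept."""
    n_coef = n * (n * p + 1)        # intercepts and lag coefficients (coefficient vector)
    n_contemp = n * (n - 1) // 2    # free below-diagonal elements of the triangular matrix
    n_vol = n                       # one log-volatility per shock
    per_period = n_coef + n_contemp + n_vol
    return {
        "coefficients": n_coef,
        "contemporaneous": n_contemp,
        "log_volatilities": n_vol,
        "states_per_period": per_period,
        "states_total": per_period * T,
    }


counts = tvp_var_state_counts(n=3, p=4, T=200)
print(counts)
```

For the 3-variable, 4-lag example this gives 39 coefficients, 3 contemporaneous parameters, and 3 log-volatilities per period. The count grows with the square of the number of variables, which is why larger systems become expensive quickly.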
The measurement equation is a VAR with time-varying coefficients: \(y_t = X_t' \beta_t + u_t\), where \(y_t\) is the \(n \times 1\) vector of observables, \(X_t' = I_n \otimes (1, y_{t-1}', \ldots, y_{t-p}')\) is the Kronecker-structured regressor block, \(\beta_t\) stacks all VAR coefficients at time \(t\), and \(u_t \sim N(0, \Omega_t)\). The factored error structure \(\Omega_t = A_t^{-1} H_t (A_t^{-1})'\) separates the contemporaneous interactions (in \(A_t\)) from the idiosyncratic shock variances (in \(H_t\)).
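As a minimal sketch of the Kronecker-structured regressor block, the Python function below builds, for each period, the identity matrix of size n Kronecker-multiplied with the row vector (1, first lag, ..., p-th lag). The function name `build_regressors` and the array layout are assumptions made for illustration; the coefficient vector is assumed to stack the equations one after another (intercept first, then lags).

```python
import numpy as np


def build_regressors(y: np.ndarray, p: int):
    """Return (Y, X) for a VAR(p) with intercept.

    y : (T_full, n) array of observables.
    Y : (T_full - p, n) left-hand-side observations.
    X : (T_full - p, n, n*(n*p+1)) array; X[t] = I_n kron (1, y_{t-1}', ..., y_{t-p}').
    """
    T_full, n = y.shape
    blocks = []
    for t in range(p, T_full):
        lags = np.concatenate([y[t - j] for j in range(1, p + 1)])  # most recent lag first
        x_t = np.concatenate(([1.0], lags))                          # length n*p + 1
        blocks.append(np.kron(np.eye(n), x_t[None, :]))              # n x n(np+1)
    return y[p:], np.stack(blocks)
```

With this layout, `X[t] @ beta_t` returns the n-vector of conditional means at period t, so each row of `X[t]` picks out the coefficients of one equation.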
The transition equations are random walks. Coefficients: \(\beta_t = \beta_{t-1} + \nu_t\), \(\nu_t \sim N(0, Q)\). Free elements of \(A_t\): \(\alpha_t = \alpha_{t-1} + \zeta_t\), \(\zeta_t \sim N(0, S)\). Log-volatilities: \(\ln h_{i,t} = \ln h_{i,t-1} + \sigma_i \eta_{i,t}\), \(\eta_{i,t} \sim N(0, 1)\). The random-walk specification is parsimonious - it avoids estimating mean-reversion parameters - but implies that the coefficients can wander arbitrarily far from their initial values, which occasionally produces explosive draws that must be discarded.
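The three random walks can be simulated directly, which makes it easy to see how far the states wander for a given choice of innovation variances. This is a sketch only: the function name, argument names, and the use of one scale per log-volatility (`sigma`) are assumptions chosen to match the notation on this page.

```python
import numpy as np


def simulate_states(T, beta0, alpha0, lnh0, Q, S, sigma, seed=0):
    """Simulate driftless random walks for coefficients, contemporaneous
    parameters and log-volatilities over T periods."""
    rng = np.random.default_rng(seed)
    k, m, n = len(beta0), len(alpha0), len(lnh0)
    nu = rng.multivariate_normal(np.zeros(k), Q, size=T)    # coefficient innovations, N(0, Q)
    zeta = rng.multivariate_normal(np.zeros(m), S, size=T)  # contemporaneous innovations, N(0, S)
    eta = rng.standard_normal((T, n))                       # N(0, 1), scaled by sigma below
    beta = beta0 + np.cumsum(nu, axis=0)
    alpha = alpha0 + np.cumsum(zeta, axis=0)
    lnh = lnh0 + np.cumsum(sigma * eta, axis=0)
    return beta, alpha, lnh
```

Because the walks are driftless, the variance of each state grows linearly with the horizon. Simulating paths under a candidate `Q` is a quick way to see whether the implied drift is economically plausible.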
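Estimation proceeds via Gibbs sampling. Writing \(x_{1:T}\) for the full path of a state from \(t = 1\) to \(T\), the algorithm cycles through: (i) draw \(\beta_{1:T}\) from \(p(\beta_{1:T} \mid y_{1:T}, A_{1:T}, H_{1:T}, Q)\) using the Carter-Kohn simulation smoother; (ii) draw \(\alpha_{1:T}\) from \(p(\alpha_{1:T} \mid y_{1:T}, \beta_{1:T}, H_{1:T}, S)\) using the same algorithm on the partially orthogonalized residuals; (iii) draw each \(\ln h_{i,1:T}\) from \(p(\ln h_{i,1:T} \mid y_{1:T}, \beta_{1:T}, A_{1:T}, \sigma_i^2)\) using the Kim, Shephard, and Chib (1998) mixture-of-normals approximation; (iv) draw the hyperparameters \(Q\), \(S\), \(\sigma_i^2\) from their conditional posteriors (inverse-Wishart or inverse-gamma). Convergence is monitored via trace plots and the Geweke (1992) diagnostic.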
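Step (i) is the workhorse of the sampler. The sketch below implements forward filtering and backward sampling for a random-walk state with a known, time-varying measurement covariance. It is a minimal illustration under stated assumptions, not the code used in any of the cited papers: the function name, the argument layout, and the prior mean `b0` and covariance `P0` for the initial state are all hypothetical.

```python
import numpy as np


def carter_kohn(Y, X, Omega, Q, b0, P0, rng):
    """Draw one path of a random-walk state b_t given the model
        Y[t] = X[t] @ b_t + u_t,  u_t ~ N(0, Omega[t]),
        b_t  = b_{t-1} + v_t,     v_t ~ N(0, Q).
    Y: (T, n), X: (T, n, k), Omega: (T, n, n), Q: (k, k), b0: (k,), P0: (k, k).
    """
    T, k = Y.shape[0], b0.size
    b_filt = np.empty((T, k))
    P_filt = np.empty((T, k, k))
    b, P = b0, P0
    # Forward pass: Kalman filter
    for t in range(T):
        P_pred = P + Q                                   # random-walk prediction step
        F = X[t] @ P_pred @ X[t].T + Omega[t]            # forecast-error covariance
        K = P_pred @ X[t].T @ np.linalg.inv(F)           # Kalman gain
        b = b + K @ (Y[t] - X[t] @ b)
        P = P_pred - K @ X[t] @ P_pred
        P = 0.5 * (P + P.T)                              # enforce symmetry numerically
        b_filt[t], P_filt[t] = b, P
    # Backward pass: sample b_T, then b_t | b_{t+1} for t = T-1, ..., 1
    draws = np.empty((T, k))
    draws[-1] = rng.multivariate_normal(b_filt[-1], P_filt[-1])
    for t in range(T - 2, -1, -1):
        G = P_filt[t] @ np.linalg.inv(P_filt[t] + Q)
        mean = b_filt[t] + G @ (draws[t + 1] - b_filt[t])
        cov = P_filt[t] - G @ P_filt[t]
        cov = 0.5 * (cov + cov.T)
        draws[t] = rng.multivariate_normal(mean, cov)
    return draws
```

In a full Gibbs sampler this function would be called once per iteration with the current draws of the covariance path and `Q`. Production implementations typically replace the explicit inverses with Cholesky solves for speed and numerical stability.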
Primiceri (2005) estimated a 3-variable TVP-VAR (inflation, unemployment, interest rate) on US data from 1953Q1 to 2001Q3 and found that both the systematic and the non-systematic components of monetary policy changed over the sample: the interest-rate response to inflation and unemployment became more aggressive, and the volatility of the shocks declined during the Great Moderation. He also concluded that the changes in systematic policy had only a small effect on the rest of the economy, and that non-policy shocks mattered more for the episodes of high inflation and unemployment. The Bank of England replicates this exercise with UK data to assess whether the inflation-targeting regime altered the Phillips curve slope. The ECB uses TVP-VARs to monitor whether the euro-area transmission mechanism fragmented during the sovereign debt crisis.
Time-varying impulse responses are the primary output. At each point in the sample, the researcher extracts a full set of IRFs, producing a three-dimensional object: variable × horizon × time. Comparing IRFs across decades reveals structural change directly. Cogley and Sargent (2005) used this to show that the persistence of US inflation declined after 1980 - the largest autoregressive root of the inflation equation fell from 0.95 in the 1970s to 0.8 in the 1990s.
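A sketch of how impulse responses at a single date can be computed from one draw of the states, assuming Cholesky-type (recursive) identification and that the coefficient vector stacks the equations one after another, each as intercept followed by lag coefficients. The function names and argument layout are illustrative assumptions.

```python
import numpy as np


def companion(B_lags: np.ndarray, n: int, p: int) -> np.ndarray:
    """Companion matrix for lag coefficients B_lags = [B_1, ..., B_p], shape n x (n*p)."""
    C = np.zeros((n * p, n * p))
    C[:n, :] = B_lags
    if p > 1:
        C[n:, :-n] = np.eye(n * (p - 1))
    return C


def irf_at_t(beta_t, A_t, h_t, n, p, horizons):
    """Responses at one date: out[h, i, j] is the response of variable i
    at horizon h to a one-standard-deviation shock j."""
    B = beta_t.reshape(n, n * p + 1)                      # row i holds equation i
    C = companion(B[:, 1:], n, p)                         # drop the intercept column
    impact = np.linalg.inv(A_t) @ np.diag(np.sqrt(h_t))   # recursive impact matrix
    out = np.empty((horizons + 1, n, n))
    C_power = np.eye(n * p)
    for h in range(horizons + 1):
        out[h] = C_power[:n, :n] @ impact
        C_power = C_power @ C
    return out
```

Calling `irf_at_t` for every date and every retained MCMC draw, then taking posterior quantiles across draws, gives the time-indexed impulse-response object described above. Responses computed this way hold the parameters fixed at their date-t values over the response horizon, which is a common simplification.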
TVP-VARs also serve as a flexible forecasting device. Because the parameters adapt to the most recent data, the model automatically downweights distant observations that may reflect an obsolete regime. D'Agostino, Gambetti, and Giannone (2013) showed that TVP-VARs produce superior out-of-sample forecasts of US inflation compared to constant-parameter VARs and random-walk benchmarks, especially around turning points.
The model breaks down when the sample is too short for the time variation to be identified, when the number of variables exceeds 4-5 (the parameter space explodes), and when the researcher needs structural identification beyond the Cholesky ordering that Primiceri used. Combining TVP-VARs with sign restrictions or proxy instruments is possible but doubles the computational cost. For large systems, the Koop and Korobilis (2013) forgetting-factor approximation or the Chan and Eisenstat (2018) equation-by-equation approach is necessary.
\(\beta_t\): Stacked vector of all intercepts and lag coefficients at time \(t\). Dimension \(n(np+1) \times 1\) for a VAR(p) with \(n\) variables and an intercept.
\(A_t\): Lower-triangular matrix with ones on the diagonal. The free elements \(\alpha_t\) below the diagonal capture time-varying contemporaneous relationships among the variables.
\(H_t = \text{diag}(h_{1,t}, \ldots, h_{n,t})\), where each \(h_{i,t} = \exp(\ln h_{i,t})\) is the exponential of a random-walk log-volatility. The variance of each structural shock evolves independently over time.
\(Q\): Covariance matrix of the innovation \(\nu_t\) to the coefficient random walk \(\beta_t = \beta_{t-1} + \nu_t\). Controls how fast the VAR coefficients can drift.
\(\Omega_t = A_t^{-1} H_t (A_t^{-1})'\). The full time-varying covariance matrix of the reduced-form VAR residuals, combining contemporaneous interactions and volatility.
\(\sigma_i^2\): Variance of the random-walk innovation to \(\ln h_{i,t}\). Larger values allow the stochastic volatility to change more rapidly.
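The definitions above combine into the reduced-form covariance matrix in a few lines. This sketch assumes the free contemporaneous parameters are stored row by row below the diagonal; the function name and that ordering are illustrative choices.

```python
import numpy as np


def build_omega(alpha_t: np.ndarray, lnh_t: np.ndarray) -> np.ndarray:
    """Reduced-form covariance A^{-1} H (A^{-1})' from the free elements of the
    unit lower-triangular matrix A and the log-volatilities."""
    n = lnh_t.size
    A = np.eye(n)
    A[np.tril_indices(n, k=-1)] = alpha_t   # fill below-diagonal entries row by row
    H = np.diag(np.exp(lnh_t))              # shock variances are exp(log-volatility)
    A_inv = np.linalg.inv(A)
    return A_inv @ H @ A_inv.T


omega = build_omega(np.array([0.5]), np.log(np.array([1.0, 4.0])))
print(omega)
```

The result is symmetric and positive definite for any real-valued inputs, which is the practical reason for parameterizing the covariance through a triangular matrix and log-volatilities rather than modelling its elements directly.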
Coefficients \(\beta_t\), contemporaneous parameters \(\alpha_t\), and log-volatilities \(\ln h_{i,t}\) all follow driftless random walks.
If violated: If the true process has mean-reverting parameters, the random-walk prior over-smooths fast changes and under-smooths slow ones. Forecasts of the parameters themselves are martingales, which may not match the actual dynamics of structural change.
The innovations to \(\beta_t\), \(\alpha_t\), and \(\ln h_{i,t}\) are mutually uncorrelated. The hyperparameters \(Q\), \(S\), and \(\sigma_i^2\) are treated as constant.
If violated: Correlated state innovations would mean coefficient changes and volatility changes are simultaneous and linked. The block-diagonal assumption misses this co-movement but is needed for tractable Gibbs sampling.
Conditional on the states \(\beta_t\), \(A_t\), \(H_t\), the reduced-form residuals are Gaussian.
If violated: Fat-tailed or skewed innovations make the Gaussian likelihood misspecified. The stochastic volatility component absorbs some of the excess kurtosis, but not all.
For each time period \(t\), the VAR coefficient vector \(\beta_t\) implies a stationary VAR - the companion matrix has all eigenvalues inside the unit circle.
If violated: Explosive draws occasionally occur when \(\beta_t\) drifts into an unstable region. Standard practice discards these draws. If explosive draws are frequent, the prior on \(Q\) is too loose.
The initial state \(\beta_0\) is calibrated from a training sample (typically the first 10 years of data), and the prior on \(Q\) is scaled to the OLS coefficient variance.
If violated: Poor initialization forces the sampler to spend many iterations burning in, and the early-sample estimates of time variation are unreliable.
The sample must be long enough to identify both the time variation and the level of the parameters. Primiceri (2005) used 1953Q1-2001Q3 (195 observations).
If violated: Short samples cannot distinguish genuine parameter drift from estimation noise. The posterior on \(Q\) collapses toward zero, and the model reverts to a constant-parameter VAR.
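The stationarity condition listed among the assumptions above is usually enforced draw by draw. A minimal sketch of that check follows, assuming the coefficient vector stacks the equations one after another (intercept first, then lags). The function names are hypothetical, and rejecting the whole path when any date is unstable is one common convention rather than the only one.

```python
import numpy as np


def is_stable(beta_t: np.ndarray, n: int, p: int) -> bool:
    """True if the VAR implied by beta_t has all companion eigenvalues inside the unit circle."""
    B_lags = beta_t.reshape(n, n * p + 1)[:, 1:]   # drop intercepts
    C = np.zeros((n * p, n * p))
    C[:n, :] = B_lags
    if p > 1:
        C[n:, :-n] = np.eye(n * (p - 1))
    return bool(np.max(np.abs(np.linalg.eigvals(C))) < 1.0)


def path_is_stable(beta_path: np.ndarray, n: int, p: int) -> bool:
    """Accept a sampled coefficient path only if every date is stable."""
    return all(is_stable(b, n, p) for b in beta_path)
```

Tracking the share of rejected paths during sampling gives a direct reading on whether the prior on the coefficient innovation covariance is too loose.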