Data-Driven Models
State-space decomposition that separates a macro series into latent trend, cycle, seasonal, and irregular components.
How do you decompose a macroeconomic time series into trend, cycle, seasonal, and irregular components when none of them are directly observed?
Andrew Harvey introduced the structural time-series model in a 1985 Journal of Forecasting paper and gave it a full textbook treatment in 'Forecasting, Structural Time Series Models and the Kalman Filter' (1989). The central idea: every component of a time series (trend, cycle, seasonal, irregular) gets its own explicit stochastic law of motion, rather than being recovered as a residual from an ad-hoc filter. Harvey's framework unified earlier work by Nerlove, Grether, and Carvalho (1979) on signal extraction with the state-space machinery of Kalman (1960). The result was a model-based alternative to mechanical filters such as the Census Bureau's X-11 procedure, which had dominated seasonal adjustment since the 1960s, and the Hodrick-Prescott filter.
The unobserved components model (UCM) writes the observed series as a sum of latent components: y_t = mu_t + psi_t + gamma_t + epsilon_t, where mu_t is trend, psi_t is cycle, gamma_t is seasonal, and epsilon_t is the irregular (white noise measurement error). Each component follows its own transition equation. The local linear trend has a stochastic level and slope. The cycle is a damped stochastic sinusoid with frequency lambda_c. The seasonal uses trigonometric terms at the fundamental and harmonic frequencies. Because every component has a disturbance, the model nests deterministic components as special cases (set the relevant disturbance variance to zero). Estimation stacks all components into a single state vector and runs the Kalman filter to evaluate the innovation (prediction-error) likelihood.
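The nesting of deterministic components can be illustrated with a minimal sketch of the local linear trend recursion mu_t = mu_{t-1} + beta_{t-1} + eta_t, beta_t = beta_{t-1} + zeta_t. The function name, its arguments, and the seed handling are assumptions made for illustration; only the Python standard library is used.

```python
import random

def simulate_local_linear_trend(n, mu0, beta0, sigma_eta, sigma_zeta, seed=0):
    """Simulate n steps of a local linear trend (hypothetical helper).

    sigma_eta and sigma_zeta are the standard deviations of the level
    and slope disturbances; setting both to zero gives a straight line.
    """
    rng = random.Random(seed)
    mu, beta = mu0, beta0
    path = []
    for _ in range(n):
        # Level moves by last period's slope plus a level shock.
        mu = mu + beta + rng.gauss(0.0, sigma_eta)
        # Slope follows a random walk.
        beta = beta + rng.gauss(0.0, sigma_zeta)
        path.append(mu)
    return path

# With both disturbance standard deviations at zero the trend is deterministic.
line = simulate_local_linear_trend(5, mu0=1.0, beta0=0.5, sigma_eta=0.0, sigma_zeta=0.0)
```

With sigma_zeta = 0 and sigma_eta > 0 the same function produces a random walk with drift beta0, the other special case commonly used in practice.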
Central banks and statistical agencies adopted UCMs for two reasons. First, the decomposition is interpretable: each component maps to a concept economists actually discuss (underlying trend growth, business-cycle position, seasonal adjustment). Second, the framework handles missing data, mixed frequencies, and multiple series naturally through the state-space representation. The Bank of England's output-gap estimates, the ECB's trend-inflation measures, and Statistics Netherlands' seasonal adjustment system (SEATS/TRAMO via Gomez and Maravall) all rest on UCM principles. The U.S. Congressional Budget Office uses a variant to estimate potential GDP.
The model has a close relationship to ARIMA models through a result Harvey called the 'reduced form.' Every UCM implies a specific ARIMA representation for the observed series y_t, but the mapping is many-to-one: multiple UCM specifications can yield the same ARIMA process. The UCM adds structure by assigning economic meaning to each component. This structural interpretation comes at a cost: if the component specification is wrong (e.g., the cycle is not well-described by a single damped sinusoid), the decomposition inherits the misspecification. Model selection typically involves comparing nested UCMs via likelihood-ratio tests or information criteria, plus diagnostic checks on the standardized innovations.
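As a worked instance of the reduced form, consider the simplest UCM, a local level plus irregular. Its first difference is eta_t + epsilon_t - epsilon_{t-1}, which has the autocorrelation structure of an MA(1), so y_t is ARIMA(0,1,1). The sketch below maps the signal-to-noise ratio q = sigma_eta^2 / sigma_epsilon^2 to the invertible MA coefficient under the sign convention Delta y_t = xi_t + theta * xi_{t-1}. The function name is hypothetical.

```python
import math

def local_level_ma_coefficient(q):
    """Invertible MA(1) coefficient implied by a local level model.

    q is the signal-to-noise ratio sigma_eta^2 / sigma_epsilon^2.
    Matching the lag-1 autocorrelation -1 / (q + 2) of the differenced
    series to theta / (1 + theta^2) gives theta^2 + (q + 2) theta + 1 = 0;
    the root with |theta| <= 1 is returned.
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    return (-(q + 2.0) + math.sqrt(q * q + 4.0 * q)) / 2.0

theta = local_level_ma_coefficient(1.0)
```

The restriction -1 <= theta <= 0 shows why the mapping is not one-to-one in reverse: an ARIMA(0,1,1) with a positive MA coefficient has no local level counterpart with independent disturbances.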
The UCM requires a single observed series (or a vector of series in the multivariate case) and a specification of which components to include: local level or local linear trend, stochastic cycle, trigonometric seasonal, and irregular. Each component's disturbance variance is a free parameter. The cycle adds two parameters (damping factor rho and frequency lambda_c). The seasonal requires a choice of period s (e.g., s = 4 for quarterly, s = 12 for monthly). All parameters are collected into a vector theta and estimated by maximizing the innovation likelihood produced by the Kalman filter.
The state vector alpha_t stacks the internal states of all components. For a local linear trend, alpha_t includes the level mu_t and slope beta_t. The stochastic cycle contributes a two-dimensional state (psi_t, psi_t^*) that generates damped sinusoidal dynamics. Trigonometric seasonals add s - 1 state elements: for even s, the harmonic at the Nyquist frequency pi needs only one state, and retaining its redundant starred state would give 2 * floor(s/2) = s elements instead. The irregular epsilon_t enters only through the measurement equation. The full state dimension therefore ranges from 1 (local level + irregular) to 15 (local linear trend + cycle + monthly trigonometric seasonal: 2 + 2 + 11), and higher in multivariate or multi-cycle variants.
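The stacking can be made concrete with a short sketch that assembles the block-diagonal transition matrix and the measurement vector for a local linear trend, one stochastic cycle, and a trigonometric seasonal. All function names are hypothetical, matrices are plain nested lists, and the Nyquist harmonic is given a single state for even s.

```python
import math

def rotation(lam, rho=1.0):
    """2x2 damped rotation block used by the cycle and seasonal harmonics."""
    c, sn = math.cos(lam), math.sin(lam)
    return [[rho * c, rho * sn], [-rho * sn, rho * c]]

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, val in enumerate(row):
                out[offset + r][offset + c] = val
        offset += len(b)
    return out

def ucm_transition(rho, lam_c, s):
    """Transition matrix: local linear trend + cycle + seasonal of period s."""
    blocks = [[[1.0, 1.0], [0.0, 1.0]], rotation(lam_c, rho)]
    for j in range(1, s // 2 + 1):
        if s % 2 == 0 and j == s // 2:
            blocks.append([[-1.0]])  # Nyquist harmonic: cos(pi) = -1, one state
        else:
            blocks.append(rotation(2.0 * math.pi * j / s))
    return block_diag(blocks)

def ucm_measurement(s):
    """Measurement vector: picks level, psi_t, and each harmonic's first state."""
    z = [1.0, 0.0, 1.0, 0.0]
    for j in range(1, s // 2 + 1):
        z += [1.0] if (s % 2 == 0 and j == s // 2) else [1.0, 0.0]
    return z

T_monthly = ucm_transition(rho=0.9, lam_c=0.3, s=12)
Z_monthly = ucm_measurement(12)
```

For monthly data this yields a 15 x 15 transition matrix (2 trend states, 2 cycle states, 11 seasonal states); for quarterly data the same construction gives 7 states.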
The Kalman filter runs the prediction and update recursions on this stacked state-space system. At each step, the filter produces the one-step-ahead prediction error v_t and its variance F_t. For a univariate series the log-likelihood is L(theta) = -(T/2) log(2 pi) - (1/2) sum_t [log |F_t| + v_t' F_t^{-1} v_t]. Numerical optimization (typically quasi-Newton or EM) finds the theta that maximizes L. After estimation, the Kalman smoother (Rauch-Tung-Striebel backward pass) produces smoothed estimates of each component using the full sample, yielding the final decomposition.
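A minimal sketch of the prediction-error likelihood for the simplest case, a local level plus irregular, may help fix ideas. It assumes a proper Gaussian prior mu_1 ~ N(a1, p1) rather than a diffuse initialization, and the function and argument names are hypothetical. In the scalar case v_t and F_t are plain numbers, so log |F_t| is log(F_t).

```python
import math

def local_level_loglik(y, sigma_eps2, sigma_eta2, a1, p1):
    """Gaussian log-likelihood of a local level model via the Kalman filter.

    Model: y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t, mu_1 ~ N(a1, p1).
    """
    a, p = a1, p1          # predicted state mean and variance
    loglik = 0.0
    for obs in y:
        v = obs - a        # one-step-ahead prediction error v_t
        f = p + sigma_eps2 # prediction-error variance F_t
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
        k = p / f          # Kalman gain
        a = a + k * v      # update, then predict (random-walk level)
        p = p * (1.0 - k) + sigma_eta2
    return loglik

ll = local_level_loglik([1.0, 2.0], sigma_eps2=1.0, sigma_eta2=0.5, a1=0.0, p1=2.0)
```

Passing this function to a numerical optimizer over (sigma_eps2, sigma_eta2) would give maximum likelihood estimates; a diffuse prior on mu_1 would need an exact diffuse filter or a large p1 with the first term dropped.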
Identification requires that the disturbance variances be non-negative and that the model not be over-parameterized relative to the data. A common pitfall: including both a stochastic slope and a stochastic cycle can create near-collinearity in the state-space, leading to flat likelihood surfaces. Practitioners often fix one component to be deterministic (e.g., fixed slope) and let the data determine whether the remaining variances are significantly different from zero.
The Bank of England estimates the UK output gap using a bivariate UCM where GDP and inflation are jointly decomposed into trend and cycle components. The Phillips-curve link between the inflation cycle and the output gap provides cross-equation identifying information. The European Central Bank's trend-inflation estimates follow a similar logic: a UCM decomposes HICP inflation into trend, energy-price cycle, and irregular, with the smoothed trend feeding into medium-term inflation projections.
Statistical agencies use UCM-based seasonal adjustment as an alternative to X-11-type moving-average filters. The SEATS method (Gomez and Maravall 1996) is essentially a signal-extraction procedure derived from the UCM representation of an ARIMA model. Statistics Netherlands and Eurostat have adopted TRAMO-SEATS for official seasonal adjustment of employment, industrial production, and GDP series. The advantage over X-11-type methods: the seasonal pattern is allowed to evolve stochastically, and the decomposition comes with model-based standard errors.
The UCM breaks down when the data do not support the assumed component structure. If the series is too short (T < 40 for quarterly data), the likelihood surface is flat and component variances are poorly identified. If the true data-generating process involves regime switches (recessions versus expansions), the UCM's fixed-parameter cycle cannot capture the asymmetric dynamics. If multiple cycles operate at different frequencies (e.g., business cycle plus financial cycle), a single-cycle UCM conflates them. In each case, extensions exist (Markov-switching UCM, multi-cycle UCM) but at the cost of additional parameters and potential over-fitting.
Clark (1987) used the UCM to decompose U.S. real GDP into permanent (random walk with drift) and transitory (AR(2) cycle) components, providing an early model-based estimate of the output gap. This specification became a workhorse in central banking. Morley, Nelson, and Zivot (2003) showed that allowing correlated trend and cycle disturbances dramatically changes the decomposition, attributing most GDP variation to the permanent component - a finding that challenged the conventional view of large transitory business-cycle fluctuations.
Trend (mu_t): stochastic level plus slope, mu_t = mu_{t-1} + beta_{t-1} + eta_t, beta_t = beta_{t-1} + zeta_t. Captures the slowly moving underlying trajectory of the series.
Cycle (psi_t): damped sinusoid. The bivariate state (psi_t, psi_t^*) rotates at frequency lambda_c with damping rho in (0,1), generating quasi-periodic fluctuations with stochastic amplitude.
Seasonal (gamma_t): sum of floor(s/2) stochastic harmonic pairs at frequencies 2*pi*j/s, j = 1, ..., floor(s/2). Each pair evolves as a rotation with its own disturbance variance.
Irregular (epsilon_t): white noise, epsilon_t ~ N(0, sigma_epsilon^2). Captures high-frequency variation not attributable to trend, cycle, or seasonal.
Variance of the shock to the trend level. When sigma_eta^2 = 0, the level evolves deterministically given the slope.
Variance of the shock to the trend slope. When sigma_zeta^2 = 0, the slope is constant and the trend is a random walk with drift.
rho in (0,1) is the damping factor (persistence of cycle amplitude). lambda_c in (0, pi) is the cycle frequency in radians, implying period 2*pi/lambda_c.
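The roles of rho and lambda_c can be seen in a small simulation of the cycle's damped rotation: psi_t = rho * (cos(lambda_c) * psi_{t-1} + sin(lambda_c) * psi_{t-1}^*) + kappa_t, with the companion state psi_t^* rotating in quadrature. The sketch assumes the two cycle disturbances share one standard deviation sigma_kappa, a common convention that the text above does not state; the function name and defaults are hypothetical.

```python
import math
import random

def simulate_cycle(n, rho, lam_c, sigma_kappa, psi0=1.0, psi_star0=0.0, seed=0):
    """Simulate n steps of the stochastic cycle state (psi_t, psi_t^*)."""
    rng = random.Random(seed)
    c, sn = math.cos(lam_c), math.sin(lam_c)
    psi, psi_star = psi0, psi_star0
    path = []
    for _ in range(n):
        # Rotate by lam_c, shrink by rho, then add independent shocks.
        psi, psi_star = (
            rho * (c * psi + sn * psi_star) + rng.gauss(0.0, sigma_kappa),
            rho * (-sn * psi + c * psi_star) + rng.gauss(0.0, sigma_kappa),
        )
        path.append((psi, psi_star))
    return path

# A 20-period cycle with no shocks: one full rotation, amplitude scaled by rho**20.
period = 20
no_shock = simulate_cycle(period, rho=0.9, lam_c=2.0 * math.pi / period, sigma_kappa=0.0)
```

Without shocks the state returns to its starting direction after 2*pi/lambda_c periods with its amplitude scaled by rho to that power, which is why rho below 1 makes the cycle die out unless disturbances keep re-exciting it.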
The observed series is the sum of latent components: y_t = mu_t + psi_t + gamma_t + epsilon_t. No interaction terms.
If violated: Multiplicative or nonlinear component interactions (common in series with level-dependent volatility) require log-transformation or a multiplicative UCM variant.
All component disturbances (eta_t, zeta_t, kappa_t, kappa_t*, omega_jt, epsilon_t) are mutually independent Gaussian white noise.
If violated: Non-Gaussianity does not bias the point estimates (the Kalman filter remains the best linear estimator in the minimum mean-squared-error sense), but confidence intervals, likelihood-based inference, and model selection become unreliable. Use robust standard errors or a bootstrap.
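Disturbances belonging to different components are uncorrelated at all leads and lags: the cross-covariance is 0 for all component pairs.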
If violated: Correlated disturbances make components unidentifiable without additional restrictions. The decomposition becomes sensitive to which correlation structure is imposed.
The number and type of components (trend order, cycle count, seasonal period) correctly describe the data-generating process.
If violated: Omitting a genuine component (e.g., leaving out the cycle) forces its dynamics into the trend or irregular, distorting the decomposition. Innovation diagnostics and information criteria help detect this.
Disturbance variances, damping factor rho, and cycle frequency lambda_c are constant over the sample.
If violated: Parameter instability (e.g., a shift in trend volatility after the Great Moderation) produces a decomposition that averages across regimes. Use a time-varying-parameter (TVP) extension or split-sample estimation.
The damping factor satisfies 0 < rho < 1, ensuring the cycle component is stationary.
If violated: If rho = 1, the cycle becomes a unit-root process indistinguishable from the trend. The optimizer may push rho toward the boundary, signaling that trend and cycle are not separately identified.