Macro by Mark
© 2026 Mark Jayson Nation



Dynamic factor model
Model

Extracts a small number of latent factors from many macro indicators and uses those factors to produce a current-quarter (nowcast) read on activity.

How do you extract the common business-cycle signal from a large panel of economic indicators?

Background

Macroeconomic datasets contain hundreds of time series -- output measures, employment, prices, financial indicators, surveys, trade flows -- that all respond to a small number of underlying shocks: aggregate demand, aggregate supply, monetary policy, global risk appetite. The dynamic factor model (DFM) formalizes this observation. A small number of latent factors (typically 1-5) drive the co-movement across the entire panel. Each observed variable is a linear combination of these common factors plus an idiosyncratic (variable-specific) component. The factors are unobserved; they are inferred from the cross-sectional covariance structure of the data.

Geweke (1977) and Sargent and Sims (1977) introduced the static factor model to macroeconomics. Stock and Watson (1989, 2002a, 2002b) developed the modern dynamic factor framework used in applied work today. Their key innovation: when the cross-section n is large (n > 20), the factors can be consistently estimated by principal components of the data matrix, even without specifying the factor dynamics or the idiosyncratic error structure. This 'large n, large T' asymptotic framework made factor models practical for panels of 100-200 variables without requiring full maximum likelihood estimation.

The state-space representation (Forni, Hallin, Lippi, Reichlin 2000; Doz, Giannone, Reichlin 2011) casts the DFM as a Kalman filter problem: factors follow a VAR in the state equation, and observed variables are noisy linear projections of the factors in the observation equation. The Kalman filter handles missing data, mixed frequencies, and ragged edges naturally. The New York Fed's nowcasting model and the ECB's real-time factor model both use this state-space approach.

DFMs dominate macroeconomic forecasting competitions. Stock and Watson (2002b) showed that factor-augmented regressions beat univariate benchmarks for 215 U.S. macro series at horizons of 1-24 months. The factor-augmented VAR (FAVAR, Bernanke, Boivin, Eliasz 2005) combines extracted factors with policy variables in a VAR to identify monetary policy effects in a data-rich environment. Central banks worldwide -- the Fed, ECB, Bank of Canada, Reserve Bank of Australia -- maintain DFM-based forecasting and monitoring systems.

How the Parts Fit Together

The input is an n x T panel of stationary time series, typically standardized to zero mean and unit variance. The panel contains variables from multiple economic categories: output/production, employment/labor, prices/wages, interest rates, exchange rates, money/credit, surveys, trade. The cross-section dimension n ranges from 20 (small DFM) to 200+ (large DFM). Each variable is classified by category and transformation (level, first difference, log difference) to ensure stationarity.

The model separates each variable into two components: a common component driven by r latent factors, and an idiosyncratic component specific to that variable. The common factors follow a VAR(p) in the state equation: f_t = A_1 f_{t-1} + ... + A_p f_{t-p} + eta_t. The observation equation links each variable to the factors: x_{i,t} = lambda_i' f_t + e_{i,t}, where lambda_i is the factor loading for variable i. The idiosyncratic errors e_{i,t} can be serially correlated (dynamic idiosyncratic) or white noise (exact factor model).
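The two equations can be made concrete with a minimal simulation of an exact DFM (white-noise idiosyncratic errors, VAR(1) factors). All dimensions and coefficient values below are illustrative choices, not taken from any specific application:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, r = 200, 50, 2

# State equation: f_t = A f_{t-1} + eta_t (a stable VAR(1) for simplicity)
A = np.array([[0.7, 0.1],
              [0.0, 0.5]])
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = A @ F[t - 1] + rng.standard_normal(r)

# Observation equation: x_{i,t} = lambda_i' f_t + e_{i,t}
Lam = rng.standard_normal((n, r))   # factor loadings, n x r
E = rng.standard_normal((T, n))     # idiosyncratic errors (white noise here)
X = F @ Lam.T + E                   # the observed T x n panel
```

Because the two factors load on all fifty variables, the common component dominates the panel's variance, which is exactly the eigenvalue structure that principal components exploits.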

Estimation follows one of two paths. (A) Principal components: compute the eigendecomposition of the n x n sample covariance matrix of the data; the first r eigenvectors (scaled) are the estimated factor loadings lambda_hat, and the estimated factors are f_hat = X' lambda_hat (or equivalently, the first r principal components of the T x n data matrix). (B) State-space/Kalman filter: specify the full state-space model (state equation + observation equation), estimate parameters by EM algorithm or quasi-maximum likelihood, and extract factors by the Kalman smoother. Path (A) is fast and requires no distributional assumptions; path (B) handles missing data and mixed frequencies but requires iterative estimation.
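Path (A) is short enough to sketch directly. A minimal numpy version (function name and the standardization step are my own choices; sign and rotation of the estimated factors are indeterminate, as always with PCA):

```python
import numpy as np

def pca_factors(X, r):
    """Estimate r factors by principal components from a T x n panel X.

    Columns are assumed stationary; each is standardized to zero mean
    and unit variance before the eigendecomposition.
    """
    Z = (X - X.mean(0)) / X.std(0)
    # eigendecomposition of the n x n sample covariance matrix
    cov = Z.T @ Z / Z.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)   # ascending order
    Lam_hat = eigvec[:, ::-1][:, :r]       # first r eigenvectors = loadings
    F_hat = Z @ Lam_hat                    # estimated factors, T x r
    return F_hat, Lam_hat, eigval[::-1]
```

With a strong factor structure, the first estimated factor tracks the true factor up to sign, even though no dynamics were specified anywhere in the estimation.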

Applications

The New York Fed's Weekly Economic Index (WEI) and its predecessor, the dynamic factor model used for nowcasting (Bok et al. 2018), extract common factors from a panel of 37 monthly and quarterly indicators covering output, labor, housing, consumption, surveys, and financial conditions. The model operates in state-space form with the Kalman filter, handling mixed frequencies (monthly and quarterly data) and ragged edges (different publication dates) within a unified framework. The extracted factor serves as a coincident economic index and a direct input to the GDP nowcast.
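The mixed-frequency and ragged-edge mechanics reduce to one rule: at each date, run the Kalman update only over the series that have actually been published. A minimal filter sketch under simplifying assumptions (known parameters, VAR(1) factors; this is not the NY Fed's code, and all names are illustrative):

```python
import numpy as np

def kalman_filter_dfm(X, Lam, A, Q, R):
    """Kalman filter for a DFM in state-space form.

    State:       f_t = A f_{t-1} + eta_t,  eta_t ~ N(0, Q)
    Observation: x_t = Lam f_t + e_t,      e_t  ~ N(0, R)
    NaNs in X (T x n) are treated as missing and skipped in the
    update step -- this is how the filter handles ragged edges.
    """
    T, n = X.shape
    r = A.shape[0]
    f = np.zeros(r)       # filtered state mean
    P = np.eye(r)         # filtered state covariance
    F = np.zeros((T, r))  # filtered factors
    for t in range(T):
        # predict
        f = A @ f
        P = A @ P @ A.T + Q
        # update using only the observed entries of x_t
        obs = ~np.isnan(X[t])
        if obs.any():
            H = Lam[obs]                       # loadings of observed series
            S = H @ P @ H.T + R[np.ix_(obs, obs)]
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            f = f + K @ (X[t, obs] - H @ f)
            P = P - K @ H @ P
        F[t] = f
    return F
```

A Kalman smoother (backward pass) and EM parameter estimation would sit on top of this in a full implementation such as Banbura and Modugno (2014).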

The FAVAR (Bernanke, Boivin, Eliasz 2005) is the dominant tool for studying monetary policy transmission in a data-rich environment. The idea: extract factors from a large panel (100+ macro and financial variables), then include the extracted factors alongside the federal funds rate in a VAR. Structural identification (Cholesky, sign restrictions) applied to the VAR yields impulse responses of every variable in the panel to a monetary policy shock. The FAVAR avoids the dimensionality problem of including all 100+ variables directly in the VAR.
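The mechanics can be sketched in a few lines: stack the extracted factors with the policy rate, fit a VAR(1) by OLS, and read off Cholesky impulse responses with the policy rate ordered last. This is a deliberately simplified one-lag sketch; applied FAVAR work uses more lags, rotates the factors, and cleans them of contemporaneous policy information:

```python
import numpy as np

def favar_irf(F_hat, policy, horizons=12):
    """Minimal FAVAR sketch: extracted factors plus a policy rate in a
    VAR(1), with a Cholesky-identified policy shock (policy ordered last).

    F_hat: T x k matrix of estimated factors; policy: length-T rate.
    Returns a horizons x (k+1) array of impulse responses.
    """
    Y = np.column_stack([F_hat, policy])  # T x m, policy variable last
    Ylag, Ycur = Y[:-1], Y[1:]
    # OLS VAR(1): Ycur = Ylag @ B + U  (no intercept; data assumed demeaned)
    B, *_ = np.linalg.lstsq(Ylag, Ycur, rcond=None)
    U = Ycur - Ylag @ B
    P = np.linalg.cholesky(np.cov(U.T))   # lower-triangular impact matrix
    irf = [P[:, -1]]                      # impact of the policy shock
    for _ in range(horizons - 1):
        irf.append(B.T @ irf[-1])         # propagate through the VAR
    return np.array(irf)
```

Because the policy rate is ordered last in the Cholesky factorization, the policy shock moves only the policy rate on impact; the factors (and hence every variable in the underlying panel, via the loadings) respond from the next period on.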

The ECB's area-wide model and the Eurosystem's real-time forecasting infrastructure use DFMs to handle the heterogeneous data availability across euro area member states. Some countries publish GDP estimates earlier than others; some indicators are monthly in one country and quarterly in another. The state-space DFM imputes missing values, aggregates across countries, and produces a euro area factor that tracks aggregate economic activity. The Banbura and Modugno (2014) implementation is the methodological backbone.

DFMs struggle when the data-generating process changes structurally: the COVID-19 pandemic produced factor estimates that were orders of magnitude larger than any historical observation, rendering the Gaussian state-space model's forecast intervals meaningless. The linear factor structure also misses sector-specific dynamics: a shock concentrated in one sector (e.g., oil prices affecting energy production) may not be captured by factors estimated from the full panel. Factor-augmented models with sector-specific factors (Kose, Otrok, Whiteman 2003) partially address this at the cost of additional complexity.

Literature and Extensions

Key Papers

  • Stock and Watson (2002a) 'Forecasting Using Principal Components from a Large Number of Predictors': demonstrated that principal component factors from a large panel improve forecasts of individual macro variables across the board.
  • Stock and Watson (2002b) 'Macroeconomic Forecasting Using Diffusion Indexes': the companion paper providing the theoretical framework for large-n factor-augmented forecasting.
  • Bernanke, Boivin, and Eliasz (2005) 'Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach': combined factor extraction with structural VAR identification.
  • Bai and Ng (2002) 'Determining the Number of Factors in Approximate Factor Models': proposed the IC_p1, IC_p2, IC_p3 criteria for selecting the number of factors.
  • Doz, Giannone, and Reichlin (2011) 'A Two-Step Estimator for Large Approximate Dynamic Factor Models Based on Kalman Filtering': provided the state-space estimation framework used by the ECB and NY Fed.

Named Variants

  • Static factor model: factors are extracted from the contemporaneous cross-sectional covariance, with no dynamics specified. Estimated by principal components. The simplest version.
  • Dynamic factor model (state-space): factors follow a VAR in the state equation. Estimated by Kalman filter + EM algorithm or quasi-MLE. Handles missing data and mixed frequencies.
  • FAVAR (factor-augmented VAR): extracted factors plus policy variables in a VAR. Used for structural identification of policy shocks in a data-rich environment.
  • Generalized dynamic factor model (Forni, Hallin, Lippi, Reichlin 2000): uses the spectral density matrix to separate common and idiosyncratic components. More general than the state-space version but harder to implement.
  • Time-varying factor model: factor loadings lambda_i(t) or factor dynamics A_j(t) change over time. Captures structural breaks and evolving economic relationships at heavy computational cost.

Open Questions

  • How many factors drive the macroeconomy? The Bai-Ng criteria typically select 2-6 for large U.S. panels. But the answer depends on the panel composition and variable transformations. There is no consensus on whether 'the' correct number is 2, 4, or 6.
  • Structural identification of factor shocks: the latent factors are identified only up to rotation (PCA identifies them up to an orthogonal rotation). Giving factors economic names (demand shock, supply shock, financial shock) requires additional restrictions. Different rotation schemes produce different structural interpretations.
  • Weak factors: some economically important factors (e.g., a housing-sector factor) load on only a small subset of variables. Standard PCA-based methods estimate them poorly because they do not produce diverging eigenvalues. Targeted factor estimation (Bai and Ng 2021) addresses this but is not yet standard.

Components

f_t -- Latent factor vector

The r x 1 vector of unobserved common factors at time t. These capture the co-movement across the n observed variables.

Lambda -- Factor loading matrix

The n x r matrix of factor loadings. Row i (lambda_i') measures how variable i loads on each of the r factors.

e_t -- Idiosyncratic error vector

The n x 1 vector of variable-specific errors. E[e_t f_t'] = 0: idiosyncratic and common components are orthogonal.

r -- Number of factors

The dimension of the latent factor space. Selected by information criteria (Bai and Ng 2002) or scree plot of eigenvalues.

A_j -- Factor VAR coefficient matrices

The r x r matrices governing the dynamics of the factors in the state equation. Capture the persistence and cross-dynamics of the common factors.

eta_t -- Factor innovation vector

The r x 1 vector of shocks to the factors. E[eta_t eta_t'] = Q. These are the primitive common shocks before rotation/identification.

R -- Idiosyncratic covariance matrix

The n x n covariance matrix of e_t. Diagonal in the exact factor model; block-diagonal or sparse in the approximate factor model.

Assumptions

Approximate factor structure (testable)

As n grows, the r largest eigenvalues of the n x n covariance matrix diverge (they grow with n), while all remaining eigenvalues stay bounded. The common component explains a non-vanishing fraction of the total variance in the cross-section.

If violated: When the factor structure is weak (eigenvalues grow slowly), principal component estimates of the factors are inconsistent. A weak factor loads on only a subset of variables and can be missed by methods that assume pervasive factors.

Stationarity (testable)

All variables in the panel are covariance stationary after appropriate transformations (differencing, log transformation). The factor dynamics are stable (companion matrix eigenvalues inside the unit circle).

If violated: Non-stationary variables create spurious factor estimates: unit-root-driven trends dominate the first principal component regardless of economic content.

Limited cross-sectional dependence in idiosyncratic errors (testable)

The idiosyncratic errors e_{i,t} are weakly cross-sectionally correlated. Strong clustering (e.g., all energy variables have correlated idiosyncratic shocks) violates the approximate factor model assumptions.

If violated: Strong idiosyncratic cross-correlation inflates the eigenvalues of the covariance matrix, leading the number-of-factors criteria (Bai-Ng) to overestimate r.

Correct number of factors (testable)

The number of common factors r is correctly specified. Bai-Ng IC criteria, the Onatski (2010) test, or the Ahn-Horenstein (2013) eigenvalue ratio test provide data-driven guidance.

If violated: Too few factors: the omitted factor's variance is absorbed into the idiosyncratic errors, creating cross-correlated residuals and biased factor-augmented regressions. Too many factors: noise is elevated to factor status, adding estimation variance without improving the common component.
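The eigenvalue-ratio idea from Ahn and Horenstein (2013) is simple to implement: estimate r as the position where adjacent eigenvalues of the sample covariance matrix drop most sharply. A sketch (function name and the `r_max` default are my own choices):

```python
import numpy as np

def eigenvalue_ratio_r(X, r_max=8):
    """Eigenvalue-ratio estimator of the number of factors.

    Picks the r that maximizes mu_r / mu_{r+1}, the ratio of adjacent
    eigenvalues of the sample covariance of the standardized panel X
    (T x n), in the spirit of Ahn and Horenstein (2013).
    """
    Z = (X - X.mean(0)) / X.std(0)
    mu = np.linalg.eigvalsh(Z.T @ Z / Z.shape[0])[::-1]  # descending
    ratios = mu[:r_max] / mu[1:r_max + 1]
    return int(np.argmax(ratios)) + 1                    # 1-indexed r
```

On a panel with a strong r-factor structure, the first r eigenvalues grow with n while the rest stay bounded, so the ratio at position r dominates.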

Linearity of the factor structure (maintained)

Each variable is a linear function of the common factors. No threshold effects, no nonlinear factor loadings.

If violated: Nonlinear dependence on the factors (e.g., financial variables respond asymmetrically to positive vs. negative factor realizations) is not captured. Nonlinear factor models exist but are computationally demanding and rarely used in macro practice.

Balanced panel or correctly handled missing data (testable)

The panel is either balanced (no missing observations) or missing data is handled by the EM algorithm / Kalman filter. Dropping variables with missing data instead of imputing can bias factor estimates if missingness is non-random.

If violated: Listwise deletion of variables with missing data reduces the effective cross-section and can eliminate informative series. The EM algorithm iteratively imputes missing values using the current factor estimates, then re-estimates factors.