Macro by Mark

Mixed-data sampling regressions that handle indicators arriving at different frequencies inside a single forecasting equation.

How do you use high-frequency data to predict a low-frequency target without first aggregating to the lower frequency?

Background

Bridge equations aggregate monthly indicators to the quarterly level before regression, discarding within-quarter timing information. If month-1 industrial production is a stronger predictor of quarterly GDP than month-3, a bridge equation cannot detect this because it averages all three months. Ghysels, Santa-Clara, and Valkanov (2004, 2006) introduced Mixed-Data Sampling (MIDAS) regression to address exactly this problem. MIDAS regresses the low-frequency target (quarterly GDP) directly on high-frequency lags (monthly indicators), keeping the original monthly observations intact. A parsimonious weighting function constrains the monthly lag coefficients so that the number of parameters does not grow with the number of lags.

The core idea: instead of one free coefficient per monthly lag, MIDAS parameterizes the lag profile with a smooth weighting function controlled by 2-3 parameters. The exponential Almon lag polynomial is the most popular: w(k; theta) = exp(theta_1 * k + theta_2 * k^2) / sum_{j=0}^{K} exp(theta_1 * j + theta_2 * j^2). This function can represent declining weights (recent months matter more), hump-shaped weights (middle months most informative), or flat weights (theta_1 = theta_2 = 0, i.e. equal weighting, which collapses to bridge-equation aggregation when the lags span a single quarter). The weighting shape is estimated from the data, not imposed a priori.
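The weighting function is easy to evaluate directly. The sketch below computes normalized exponential Almon weights and reproduces the three shapes just described. It uses Python with NumPy; the helper name exp_almon_weights and the parameter values are illustrative choices, not taken from any particular package.

```python
import numpy as np

def exp_almon_weights(theta1: float, theta2: float, K: int) -> np.ndarray:
    """Exponential Almon weights w(k; theta) for lags k = 0, 1, ..., K (illustrative helper).

    w(k) = exp(theta1*k + theta2*k^2) / sum_j exp(theta1*j + theta2*j^2)
    """
    k = np.arange(K + 1, dtype=float)
    log_w = theta1 * k + theta2 * k**2
    log_w -= log_w.max()      # shift before exp to avoid overflow; cancels after normalizing
    w = np.exp(log_w)
    return w / w.sum()        # normalize so the weights sum to 1

# Three shapes over K = 11 monthly lags (parameter values chosen for illustration)
declining = exp_almon_weights(-0.3, 0.0, 11)   # recent months matter more
hump = exp_almon_weights(0.6, -0.1, 11)        # exponent 0.6k - 0.1k^2 peaks at k = 3
flat = exp_almon_weights(0.0, 0.0, 11)         # equal weights of 1/12
```

With K = 2 (one quarter of monthly lags), theta = (0, 0) gives weights of 1/3 each, which is the equal-weight average a bridge equation uses.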

Foroni, Marcellino, and Schumacher (2015) introduced Unrestricted MIDAS (U-MIDAS), which drops the weighting function and estimates each monthly lag coefficient freely by OLS. U-MIDAS is valid when the number of high-frequency lags is small relative to the sample size (e.g., 3 monthly lags for quarterly prediction). When the frequency mismatch is large (daily to monthly, for instance), the restricted MIDAS with a parameterized weighting function remains necessary.
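When the number of lags is small, U-MIDAS is ordinary least squares with one column per high-frequency lag. The sketch below assumes the monthly lags have already been arranged one row per quarter, and it runs on simulated data. The helper name fit_umidas and all simulated parameters are hypothetical.

```python
import numpy as np

def fit_umidas(y: np.ndarray, X_hf: np.ndarray) -> np.ndarray:
    """U-MIDAS: OLS of y on an intercept and each high-frequency lag (illustrative helper).

    y    : (T,) low-frequency target
    X_hf : (T, K+1) matrix; column k holds x_{tm-k}^H for quarter t
    Returns [alpha, b_0, b_1, ..., b_K], one free coefficient per lag.
    """
    X = np.column_stack([np.ones(len(y)), X_hf])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated example (made-up numbers): 200 quarters, 3 monthly lags
rng = np.random.default_rng(0)
T = 200
X_hf = rng.standard_normal((T, 3))              # columns: x_{tm}, x_{tm-1}, x_{tm-2}
true_coef = np.array([0.5, 0.2, 0.6, 0.1])      # alpha, then lag 1 (middle month) matters most
y = true_coef[0] + X_hf @ true_coef[1:] + 0.3 * rng.standard_normal(T)

coef = fit_umidas(y, X_hf)                      # close to true_coef at this sample size
```

U-MIDAS estimates K + 2 coefficients (an intercept plus one per lag), while exponential Almon MIDAS estimates four (alpha, beta, theta_1, theta_2). With roughly 22 daily observations per month and several months of lags, the unrestricted version quickly runs out of degrees of freedom. That is the case where the restricted weighting function pays off.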

MIDAS has been extended in multiple directions. MIDAS-AR adds autoregressive lags of the low-frequency target. Markov-switching MIDAS (Guerin and Marcellino 2013) allows regime-dependent weights. Factor MIDAS combines factor extraction from a large monthly panel with MIDAS weighting for the quarterly target. Bayesian MIDAS (Rodriguez and Puggioni 2010) places priors on the weighting function parameters. The Federal Reserve Bank of Atlanta's GDPNow model uses MIDAS-type specifications for several GDP subcomponents.

How the Parts Fit Together

Inputs arrive at two frequencies: the low-frequency target y_t^L (quarterly GDP growth, indexed at the quarterly level) and the high-frequency predictor x_tau^H (monthly industrial production, indexed at the monthly level). The frequency ratio m links the two; for monthly-to-quarterly data, m = 3. Each quarterly observation y_t^L corresponds to the m = 3 monthly observations x_{tm}^H, x_{tm-1}^H, x_{tm-2}^H within the quarter, plus any additional monthly lags from prior quarters.
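Getting this index mapping right is most of the practical work. The sketch below builds the matrix whose row t holds x_{tm}^H, x_{tm-1}^H, ..., x_{tm-K}^H. It assumes the monthly series starts in the first month of a quarter and contains only complete quarters. midas_lag_matrix is a hypothetical helper, not a library function.

```python
import numpy as np

def midas_lag_matrix(x_hf: np.ndarray, m: int, K: int) -> np.ndarray:
    """Stack high-frequency lags for each low-frequency period (illustrative helper).

    x_hf : high-frequency series, oldest first; assumed to start at the first
           month of a quarter and to contain only complete quarters
    Row t, column k holds x_{tm-k}^H, where tm is the last month of quarter t.
    Quarters with fewer than K+1 months of history are dropped.
    """
    n_low = len(x_hf) // m
    rows = []
    for t in range(n_low):
        last = (t + 1) * m - 1                        # 0-based index of month tm
        if last - K < 0:
            continue                                  # not enough history for K lags
        rows.append(x_hf[last - K : last + 1][::-1])  # x_{tm}, x_{tm-1}, ..., x_{tm-K}
    return np.array(rows)

x = np.arange(1.0, 13.0)           # 12 months labelled 1..12, i.e. 4 quarters
L = midas_lag_matrix(x, m=3, K=5)  # 6 monthly lags per row
```

The first quarter lacks six months of history, so 4 quarters of data yield 3 usable rows. The first of these is [6, 5, 4, 3, 2, 1], which pairs quarter 2 with months 6 back to 1.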

The model regresses the quarterly target on a weighted sum of monthly lags: y_t^L = alpha + beta * sum_{k=0}^{K} w(k; theta) * x_{tm-k}^H + epsilon_t. The weighting function w(k; theta) assigns different importance to different monthly lags. K is the largest high-frequency lag included, so K + 1 monthly observations enter the sum (e.g., K = 11 covers 12 months, i.e. 4 quarters of monthly lags). The key parameter is theta = (theta_1, theta_2), which controls the shape of the weight function. The coefficient beta scales the overall effect: because the weights sum to 1, beta is the response of y_t^L to a one-unit increase in x^H at every included lag.
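Once the monthly lags are arranged one row per quarter, the fitted value is a weighted sum followed by a linear transformation. The sketch below implements the equation above under the assumption of exponential Almon weights. Function names are illustrative. The small example shows that theta = (0, 0) reproduces the bridge-equation average, while a strongly negative theta_1 puts essentially all weight on the latest month.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, K):
    """Normalized exponential Almon weights for lags k = 0, ..., K (illustrative helper)."""
    k = np.arange(K + 1, dtype=float)
    log_w = theta1 * k + theta2 * k**2
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def midas_fitted(alpha, beta, theta, x_lags):
    """alpha + beta * sum_k w(k; theta) * x_{tm-k}^H for every row of x_lags.

    x_lags : (T, K+1) array whose column k holds x_{tm-k}^H (latest month first)
    """
    K = x_lags.shape[1] - 1
    w = exp_almon_weights(theta[0], theta[1], K)
    return alpha + beta * (x_lags @ w)

# One quarter with monthly values x_{tm} = 1, x_{tm-1} = 2, x_{tm-2} = 3 (K = 2)
x_lags = np.array([[1.0, 2.0, 3.0]])
bridge_like = midas_fitted(0.5, 2.0, (0.0, 0.0), x_lags)    # equal weights: 0.5 + 2.0 * 2.0 = 4.5
latest_only = midas_fitted(0.5, 2.0, (-50.0, 0.0), x_lags)  # nearly all weight on k = 0: about 2.5
```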

Estimation is by nonlinear least squares (NLS) when the weighting function is nonlinear in theta (exponential Almon, beta polynomial). NLS minimizes the sum of squared residuals over (alpha, beta, theta). It requires starting values, and a grid search over theta followed by numerical optimization is standard. For U-MIDAS (no weighting function), estimation is OLS. Model selection involves choosing the maximum high-frequency lag K, the weighting-function family, and whether to include autoregressive terms.
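Once theta is fixed, the model is linear in (alpha, beta). A common way to run the grid-search step is therefore to concentrate alpha and beta out by OLS and search over theta alone. The NumPy-only sketch below does that on simulated data. Its re-gridding loop is a crude stand-in for the numerical optimizer; in practice one would pass the best grid point to a routine such as scipy.optimize.minimize. Function names, grids, and simulated parameters are all illustrative assumptions.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, K):
    """Normalized exponential Almon weights for lags k = 0, ..., K."""
    k = np.arange(K + 1, dtype=float)
    log_w = theta1 * k + theta2 * k**2
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def concentrated_ssr(theta, y, x_lags):
    """For fixed theta the model is linear in (alpha, beta): fit those by OLS
    and return (SSR, [alpha, beta])."""
    z = x_lags @ exp_almon_weights(theta[0], theta[1], x_lags.shape[1] - 1)
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid, coef

def best_on_grid(g1, g2, y, x_lags):
    """Theta pair on the grid g1 x g2 with the smallest concentrated SSR."""
    return min(((t1, t2) for t1 in g1 for t2 in g2),
               key=lambda th: concentrated_ssr(th, y, x_lags)[0])

def fit_midas_nls(y, x_lags, grid1, grid2, n_refine=3):
    """NLS over (alpha, beta, theta): coarse grid over theta, then re-grid a
    box around the current best point, shrinking it fivefold per pass."""
    best = best_on_grid(grid1, grid2, y, x_lags)
    step1, step2 = grid1[1] - grid1[0], grid2[1] - grid2[0]
    for _ in range(n_refine):
        best = best_on_grid(best[0] + np.linspace(-step1, step1, 11),
                            best[1] + np.linspace(-step2, step2, 11), y, x_lags)
        step1, step2 = step1 / 5, step2 / 5
    ssr, (alpha, beta) = concentrated_ssr(best, y, x_lags)
    return alpha, beta, np.array(best), ssr

# Simulated example: hump-shaped true weights (peak at lag 3), made-up parameters
rng = np.random.default_rng(42)
T, K = 300, 11
x_lags = rng.standard_normal((T, K + 1))   # random stand-in for a real lag matrix
y = 0.2 + 1.0 * (x_lags @ exp_almon_weights(0.6, -0.1, K)) + 0.2 * rng.standard_normal(T)

alpha, beta, theta, ssr = fit_midas_nls(
    y, x_lags, np.linspace(-1.0, 1.0, 21), np.linspace(-0.3, 0.1, 21))
```

Concentrating out alpha and beta turns a four-dimensional search into a two-dimensional one. This is what makes an exhaustive coarse grid cheap enough to serve as a guard against poor starting values.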

Applications

The Federal Reserve Bank of Atlanta's GDPNow tracker uses MIDAS-type specifications for several GDP subcomponents. For example, monthly personal income and expenditure data feeds a MIDAS regression targeting quarterly personal consumption expenditure growth. The exponential Almon weighting function determines whether early-month or late-month spending data is more informative. The model updates its nowcast each time a monthly data release occurs, leveraging the within-quarter timing that bridge equations discard.

Financial applications use daily-to-monthly MIDAS to forecast monthly volatility from daily returns. Ghysels, Santa-Clara, and Valkanov (2006) showed that MIDAS volatility forecasts using daily realized variance outperform GARCH models at monthly horizons. The frequency ratio is roughly m = 22 (the typical number of trading days in a month), and the weighting function determines how quickly past daily information decays. This is the original application that motivated the MIDAS framework.

The OECD's composite leading indicators use a MIDAS-inspired approach to combine monthly financial and survey indicators with quarterly real activity. Schumacher and Breitung (2008) applied factor MIDAS to German GDP nowcasting, extracting monthly factors from a panel of 100+ indicators and passing them through a MIDAS regression targeting quarterly GDP. The factor-MIDAS combination handles both the large-panel dimension reduction and the mixed-frequency aggregation.

MIDAS regressions can fail when the weighting function is misspecified and the sample is too short to detect the misspecification. Even with 12 monthly lags and only a two-parameter weighting function, the NLS objective need not be convex in theta, so the optimizer may converge to a local minimum; sensitivity to starting values is a known issue. Restricted MIDAS also struggles when the true lag profile is discontinuous or multimodal. In those situations U-MIDAS with free coefficients is preferable if the data permit.

Literature and Extensions

Key Papers

Ghysels, Santa-Clara, and Valkanov (2004) 'The MIDAS Touch: Mixed Data Sampling Regression Models': the original working paper introducing the MIDAS framework.
Ghysels, Santa-Clara, and Valkanov (2006) 'Predicting Volatility: Getting the Most out of Return Data Sampled at Different Frequencies': applied MIDAS to volatility forecasting, demonstrating gains from daily-to-monthly frequency mismatch.
Foroni, Marcellino, and Schumacher (2015) 'Unrestricted Mixed Data Sampling (MIDAS): MIDAS Regressions with Unrestricted Lag Polynomials': introduced U-MIDAS for low frequency ratios where free coefficients are feasible.
Andreou, Ghysels, and Kourtellos (2013) 'Should Macroeconomic Forecasters Use Daily Financial Data and How?': systematic evaluation of daily-to-quarterly MIDAS for macro forecasting.
Clements and Galvao (2008) 'Macroeconomic Forecasting with Mixed-Frequency Data: Forecasting Output Growth in the United States': compared MIDAS against bridge equations and direct multi-step forecasting.

Named Variants

Exponential Almon MIDAS: the standard specification. Weights are parameterized as w(k) = exp(theta_1*k + theta_2*k^2), normalized by the sum of the same expression over k = 0, ..., K. Two parameters control the lag-profile shape.
Beta polynomial MIDAS: weights based on the Beta distribution density. Can produce U-shaped or J-shaped lag profiles. Two parameters (a, b) with flexible shapes.
U-MIDAS (unrestricted): no weighting function. Each monthly lag gets its own OLS coefficient. Works when K is small relative to the number of low-frequency observations T.
MIDAS-AR: augments the MIDAS regression with autoregressive lags of the low-frequency target. Handles serial correlation in the forecast error.
Factor MIDAS: extract factors from a large monthly panel, then use factors as high-frequency predictors in a MIDAS regression targeting the quarterly variable.
Markov-switching MIDAS (Guerin and Marcellino 2013): the weighting function parameters switch between regimes, allowing recession vs. expansion dynamics.
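For the beta polynomial variant listed above, the weight on lag k is proportional to a Beta(a, b) density kernel evaluated at a point in (0, 1) that represents that lag. The sketch below assumes lags are mapped to evenly spaced points with a small epsilon kept off the endpoints. That mapping is an assumption here, and implementations may place the points differently. beta_weights is an illustrative helper name.

```python
import numpy as np

def beta_weights(a: float, b: float, K: int, eps: float = 1e-6) -> np.ndarray:
    """Beta-polynomial MIDAS weights for lags k = 0, ..., K (illustrative helper).

    Lag k is mapped to an evenly spaced point x_k in (0, 1) (assumed mapping);
    the weight is proportional to the Beta(a, b) kernel x^(a-1) * (1 - x)^(b-1).
    The Beta normalizing constant cancels once the weights are rescaled to sum to 1.
    """
    x = np.linspace(eps, 1.0 - eps, K + 1)   # stay off 0 and 1, where the kernel can be infinite
    log_w = (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

flat = beta_weights(1.0, 1.0, 11)        # a = b = 1: equal weights
declining = beta_weights(1.0, 5.0, 11)   # a = 1, b > 1: weights fall as the lag grows
hump = beta_weights(2.0, 3.0, 11)        # a, b > 1: interior peak (here at lag 4)
```

Values of a or b below 1 give the U- or J-shaped profiles mentioned above. Under this mapping, those shapes put very large weight on the end lags and depend on eps. With a > 1, the most recent lag (k = 0) receives essentially zero weight.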

Open Questions

Optimal choice of weighting function: exponential Almon and beta polynomial are the most common, but there is no theoretical basis for preferring one over the other. Data-driven model selection (AIC/BIC on the NLS objective) is the practical approach.
Real-time versus pseudo-real-time evaluation: most MIDAS evaluations use final-release data with timing mimicking real-time availability. True real-time vintage data may tell a different story, especially for heavily revised indicators.
MIDAS with many high-frequency predictors: extending MIDAS beyond three or four monthly predictors creates a high-dimensional NLS problem. Penalized MIDAS (LASSO-MIDAS) and factor MIDAS address this in different ways, and there is no consensus on which dominates.
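The information-criterion approach from the first open question is simple to compute once each candidate has been fitted by least squares on the same sample. The sketch below uses the Gaussian log-likelihood with additive constants dropped. The helper name is illustrative, and the SSR values in the example are made up, not results.

```python
import numpy as np

def information_criteria(ssr: float, n_obs: int, n_params: int) -> tuple[float, float]:
    """Gaussian AIC and BIC for a least-squares fit, additive constants dropped.

    n_params counts every estimated coefficient, including theta for
    restricted MIDAS (illustrative helper).
    """
    fit_term = n_obs * np.log(ssr / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * np.log(n_obs)
    return aic, bic

# Hypothetical SSRs from two fits on the same 100 quarters (made-up numbers)
aic_r, bic_r = information_criteria(ssr=50.0, n_obs=100, n_params=4)   # alpha, beta, theta_1, theta_2
aic_u, bic_u = information_criteria(ssr=45.0, n_obs=100, n_params=13)  # U-MIDAS, K = 11: alpha + 12 lags
```

In this example, the unrestricted fit lowers the SSR but not by enough to pay for nine extra coefficients, so both criteria favor the restricted model.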

Components

y_t^L

Low-frequency target

The variable to forecast, observed at the lower frequency (quarterly). Indexed by t at the quarterly level.

x_\tau^H

High-frequency predictor

The predictor variable observed at the higher frequency (monthly, weekly, or daily). Indexed by tau at the high-frequency level.

m

Frequency ratio

Number of high-frequency observations per low-frequency period. m = 3 for monthly-to-quarterly; m = 60-65 for daily-to-quarterly.

w(k; \boldsymbol{\theta})

MIDAS weighting function

Parameterized function assigning weights to the k-th high-frequency lag. Normalized to sum to 1. Controls the shape of the lag profile.

\boldsymbol{\theta}

Weighting function parameters

The 2-3 parameters governing the shape of the lag weights. For exponential Almon: theta = (theta_1, theta_2). For beta polynomial: theta = (a, b).

K

Maximum high-frequency lag

Largest lag index included, so K + 1 monthly (or daily) observations enter the regression. K = m*q - 1 covers q quarters of monthly lags (e.g., K = 11 for q = 4 when m = 3).

\beta

MIDAS slope coefficient

Scales the overall effect of the weighted high-frequency lags on the low-frequency target.

Assumptions

Correct weighting function family (testable)

The exponential Almon, beta polynomial, or chosen functional form can approximate the true lag profile between the high-frequency predictor and the low-frequency target.

If violated: if the true lag profile is non-smooth (e.g., only month 2 matters while months 1 and 3 do not), a smooth weighting function forces weight onto irrelevant lags. U-MIDAS avoids this but requires more data.

Stationarity of both series (testable)

Both the low-frequency target and the high-frequency predictor are covariance stationary after appropriate transformations.

If violated: Nonstationary variables produce spurious MIDAS regressions. Cointegrating MIDAS exists (Miller 2014) but is not standard.

Temporal alignment (testable)

The mapping between high-frequency periods and low-frequency periods is correct. Each quarterly observation y_t^L corresponds to the correct set of monthly observations x_{tm}^H through x_{tm-K}^H.

If violated: misalignment (off-by-one-month errors) shifts the estimated lag profile and biases the weight estimates. Such errors are common when end-of-quarter and within-quarter timing conventions are mixed.

Exogeneity of the high-frequency predictor (testable)

The monthly predictor is predetermined relative to the quarterly target. No feedback from y_t^L to contemporaneous x_tau^H.

If violated: Reverse causality (GDP growth affects monthly indicators within the same quarter) biases the MIDAS coefficient. Instrumental variable MIDAS (Ghysels and Wright 2009) addresses this.

Stable relationship over time (testable)

The weighting function parameters theta and the slope beta are constant over the sample period.

If violated: Structural breaks in the relationship between the high-frequency predictor and the quarterly target cause the estimated weights to be a compromise between two regimes. Regime-switching MIDAS handles this.

Limited serial correlation in errors (testable)

The low-frequency regression errors epsilon_t are serially uncorrelated or at most weakly dependent.

If violated: Serial correlation in epsilon_t indicates omitted low-frequency dynamics. Adding autoregressive terms (MIDAS-AR) is the standard fix.
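A quick way to see this is to check the lag-1 autocorrelation of the residuals before and after adding y_{t-1} as a regressor. The sketch below uses simulated data with genuine autoregressive dynamics. To keep the regression linear, it uses the unrestricted (U-MIDAS) form for the monthly lags. Helper names and parameters are illustrative; a formal test such as Ljung-Box or Breusch-Godfrey would be used in practice.

```python
import numpy as np

def ols_resid(y, X):
    """Residuals from OLS of y on an intercept and the columns of X (illustrative helper)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def lag1_autocorr(u):
    """Sample lag-1 autocorrelation of a residual series."""
    u = u - u.mean()
    return (u[1:] @ u[:-1]) / (u @ u)

# Simulated quarterly target with AR(1) dynamics (made-up parameters)
rng = np.random.default_rng(1)
T = 400
X_hf = rng.standard_normal((T, 3))               # three monthly lags per quarter
b = np.array([0.2, 0.6, 0.1])
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 + 0.5 * y[t - 1] + X_hf[t] @ b + 0.3 * rng.standard_normal()

# Drop the first quarter so that y_{t-1} exists for every row
rho_midas = lag1_autocorr(ols_resid(y[1:], X_hf[1:]))                               # omits y_{t-1}
rho_midas_ar = lag1_autocorr(ols_resid(y[1:], np.column_stack([y[:-1], X_hf[1:]])))  # MIDAS-AR form
```

Without the autoregressive term, the omitted 0.5 * y_{t-1} ends up in the residual and shows up as strong positive autocorrelation. Adding y_{t-1} removes it.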
