Macroeconomic model reference

Random forest model

Bagged decision-tree ensemble for nonlinear macro prediction and feature-importance diagnostics.

Random forest: question, structure, and use cases

How do you capture nonlinear interactions among hundreds of macro predictors without specifying any functional form in advance?

Background

Leo Breiman introduced the random forest algorithm in a 2001 paper that unified two earlier ideas: bagging (bootstrap aggregating, Breiman 1996) and random subspace selection (Ho, 1998). The problem Breiman addressed was the high variance of individual decision trees. A single CART tree fit to macroeconomic data can produce wildly different splits depending on small perturbations of the training sample. Bagging reduces this variance by averaging predictions across many trees, each grown on a bootstrap resample of the data. Breiman's key addition was to decorrelate the trees further by restricting each split to a random subset of $m$ features drawn from the full set of $p$ predictors. This feature subsampling prevents a single dominant predictor from appearing at the root of every tree, forcing the ensemble to spread its explanatory weight across many variables. The payoff shows up in the variance of the ensemble average: for $B$ identically distributed trees with variance $\sigma^2$ and pairwise correlation $\rho$, it equals $\rho\sigma^2 + (1-\rho)\sigma^2/B$. Only the second term shrinks as $B$ grows, at rate $1/B$. The first term is a floor set by tree correlation, which is what limits plain bagging and what feature subsampling lowers.
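A small numerical sketch can make the role of tree correlation concrete. It evaluates the standard variance formula for an average of $B$ identically distributed predictors with common variance $\sigma^2$ and pairwise correlation $\rho$, namely $\rho\sigma^2 + (1-\rho)\sigma^2/B$. The function name and the example values of $\rho$ are illustrative assumptions, not estimates from any data set.

```python
def ensemble_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the average of n_trees identically distributed predictors
    with common variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


if __name__ == "__main__":
    # Illustrative correlations only: 0.0 is the idealized uncorrelated case.
    for rho in (0.0, 0.3, 0.7):
        row = [round(ensemble_variance(1.0, rho, b), 4) for b in (1, 10, 100, 1000)]
        print(f"rho={rho}: {row}")
```

With $\rho = 0$ the variance falls all the way to $\sigma^2/B$. With any positive $\rho$ it levels off at $\rho\sigma^2$ no matter how many trees are added, which is why lowering $\rho$ matters more than raising $B$ once the forest is moderately large.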

The core mechanism operates at two levels. At the tree level, each base learner partitions the predictor space into axis-aligned rectangles and fits a constant (regression) or plurality vote (classification) within each rectangle. Trees are grown deep, typically to terminal nodes containing 5 or fewer observations, so each tree is a high-variance, low-bias estimator. At the ensemble level, the forest averages these overfit trees, exploiting the fact that their errors are approximately uncorrelated thanks to both bootstrap resampling and random feature selection. The prediction for a new observation is the average (regression) or majority vote (classification) across all $B$ trees. Because each tree sees a different bootstrap sample and a different feature subset at each split, the ensemble captures a rich mosaic of interaction patterns that no single tree or linear model could represent.

Random forests have become standard tools in applied macroeconomics and central bank research. Goulet Coulombe et al. (2022) showed that random forests match or beat penalized linear models for US inflation and GDP growth forecasting when the predictor set exceeds 100 variables. The Bank of Canada uses random forests in its nowcasting toolkit alongside factor models and MIDAS regressions. The European Central Bank applies them for recession probability estimation, where nonlinear interactions between financial conditions and real activity matter. In finance, Gu, Kelly, and Xiu (2020) found random forests competitive with neural networks for cross-sectional stock return prediction. The Federal Reserve Bank of Philadelphia uses tree-based methods in its Survey of Professional Forecasters analysis to detect structural breaks in forecaster behavior.

Extensions of the base algorithm include extremely randomized trees (Geurts, Ernst, Wehenkel, 2006), which randomize split points as well as feature subsets; quantile regression forests (Meinshausen, 2006), which estimate conditional quantiles for density forecasting; and causal forests (Athey, Tibshirani, Wager, 2019), which estimate heterogeneous treatment effects in policy evaluation. Macroeconomic causal forests have been applied to estimate heterogeneous fiscal multipliers across country-time cells and to personalize monetary policy transmission estimates.

How the Parts Fit Together

Inputs to a random forest follow the standard supervised learning setup: $n$ observations of a response variable $y$ (GDP growth, inflation rate, recession indicator) and $p$ predictor variables $X$ (financial conditions indices, labor market indicators, survey expectations, commodity prices, and their lags). Unlike penalized linear models, random forests do not require standardization, because tree splits depend only on the ordering of each predictor's values and are therefore invariant to monotone rescaling. The algorithm does not require stationarity transformations either, though differencing nonstationary series remains good practice. Trees can split on continuous, categorical, and binary predictors alike. Missing values can be accommodated through surrogate splits in implementations that support them: the tree identifies a backup splitting variable whose partition most closely matches the primary split.

The model consists of $B$ independently grown decision trees $\{T_b\}_{b=1}^B$. Each tree $T_b$ is constructed on a bootstrap sample $\mathcal{D}_b^*$ of size $n$ drawn with replacement from the training data. At each internal node of tree $T_b$, the algorithm selects a random subset of $m$ features from the full $p$ predictors (by default $m = \lfloor p/3 \rfloor$ for regression and $m = \lfloor \sqrt{p} \rfloor$ for classification), finds the best split among those $m$ features by minimizing the sum of squared residuals (regression) or Gini impurity (classification), and partitions the data accordingly. Each node is split further until it holds $n_{\min}$ or fewer observations (typically $n_{\min} = 5$), at which point it becomes a terminal node. No pruning is applied. The forest prediction is $\hat{f}(x) = B^{-1} \sum_{b=1}^B T_b(x)$ for regression.
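The construction described above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under simplifying assumptions, not a reference implementation: it covers regression only, scans every midpoint between neighbouring values, turns a node into a leaf when none of the sampled features varies, and also stops at pure nodes. All function names and the toy data are hypothetical.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean."""
    mu = fmean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(X, y, rows, features):
    """Return the (feature, threshold) pair with the lowest summed child SSE, or None."""
    best, best_sse = None, float("inf")
    for j in features:
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0                      # midpoint between neighbouring values
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            total = sse(left) + sse(right)
            if total < best_sse:
                best, best_sse = (j, t), total
    return best


def grow_tree(X, y, rows, m, n_min, rng):
    """Grow an unpruned regression tree on the given (bootstrap) row indices."""
    targets = [y[i] for i in rows]
    if len(rows) <= n_min or min(targets) == max(targets):
        return fmean(targets)                        # leaf: constant prediction
    features = rng.sample(range(len(X[0])), m)       # fresh random subset at every node
    split = best_split(X, y, rows, features)
    if split is None:                                # simplification: sampled features do not vary
        return fmean(targets)
    j, t = split
    left = [i for i in rows if X[i][j] <= t]
    right = [i for i in rows if X[i][j] > t]
    return (j, t, grow_tree(X, y, left, m, n_min, rng),
            grow_tree(X, y, right, m, n_min, rng))


def predict_tree(node, x):
    """Follow splits until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=100, m=None, n_min=5, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample of size n."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, p // 3) if m is None else m           # regression default floor(p/3)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        forest.append(grow_tree(X, y, rows, m, n_min, rng))
    return forest


def predict_forest(forest, x):
    """Forest prediction: the average of all tree predictions."""
    return fmean([predict_tree(tree, x) for tree in forest])


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(3)] for _ in range(120)]
    y = [1.0 if row[0] > 0.5 else 0.0 for row in X]  # toy threshold effect in feature 0
    forest = fit_forest(X, y, n_trees=30)
    print(predict_forest(forest, [0.9, 0.5, 0.5]), predict_forest(forest, [0.1, 0.5, 0.5]))
```

The toy target depends on a threshold in the first predictor only, so the two printed predictions should sit on opposite sides of 0.5 even though the threshold location was never supplied to the model.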

Beyond point predictions, the forest yields three diagnostics. First, the out-of-bag (OOB) error: each observation $i$ is excluded from roughly $1/e \approx 37\%$ of the bootstrap samples, so its OOB prediction is the average across only those trees that did not train on it. With independent observations, the OOB error is an approximately unbiased estimate of generalization error and does not require a separate validation set. Second, permutation importance: for each predictor $j$, the OOB error is recomputed after randomly shuffling the values of $X_j$ across observations. The increase in OOB error measures how much the forest relies on predictor $j$. Third, partial dependence plots: the marginal effect of one or two predictors on the prediction, averaged over the empirical distribution of the remaining predictors. Together, these outputs make a random forest interpretable even though it is nonparametric.
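The first two diagnostics can be sketched generically, treating each tree as a callable and recording which rows were in its bootstrap sample. This is a hedged illustration: the one-split `fit_stump` learner and `toy_forest` builder are hypothetical stand-ins for real trees, rows are assumed to be Python lists, and the importance follows the description above (shuffle $X_j$ across observations, then recompute the OOB error) rather than any particular library's variant.

```python
import random
from statistics import fmean


def oob_predictions(trees, in_bag, X):
    """Average, for each row, only the trees whose bootstrap sample excluded that row."""
    preds = []
    for i, x in enumerate(X):
        votes = [tree(x) for tree, bag in zip(trees, in_bag) if i not in bag]
        preds.append(fmean(votes) if votes else None)  # None: row was in every bag
    return preds


def oob_mse(trees, in_bag, X, y):
    """Mean squared OOB error over rows that have at least one OOB tree."""
    errors = [(p - t) ** 2
              for p, t in zip(oob_predictions(trees, in_bag, X), y) if p is not None]
    return fmean(errors)


def permutation_importance(trees, in_bag, X, y, j, rng):
    """Increase in OOB error after shuffling column j across observations."""
    baseline = oob_mse(trees, in_bag, X, y)
    shuffled = [row[j] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
    return oob_mse(trees, in_bag, X_perm, y) - baseline


def fit_stump(X, y, rows, j):
    """Toy base learner (demo assumption): one split on feature j at the in-bag median."""
    t = sorted(X[i][j] for i in rows)[len(rows) // 2]
    left = [y[i] for i in rows if X[i][j] <= t]
    right = [y[i] for i in rows if X[i][j] > t]
    lo = fmean(left)
    hi = fmean(right) if right else lo
    return lambda x: lo if x[j] <= t else hi


def toy_forest(X, y, n_trees, rng):
    """Bootstrap the rows, fit one stump per sample on a random feature, keep the bags."""
    trees, in_bag = [], []
    n, p = len(X), len(X[0])
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump(X, y, rows, rng.randrange(p)))
        in_bag.append(set(rows))
    return trees, in_bag


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [1.0 if row[0] > 0.5 else 0.0 for row in X]   # only feature 0 matters
    trees, in_bag = toy_forest(X, y, 60, rng)
    for j in range(2):
        print(j, permutation_importance(trees, in_bag, X, y, j, rng))
```

In the toy data only the first predictor drives the response, so its importance should be clearly positive while the second predictor's stays near zero.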

Applications

Goulet Coulombe et al. (2022) at the Bank of Canada conducted a large-scale forecasting comparison using 127 US macroeconomic predictors for GDP growth and inflation at horizons from 1 to 12 quarters. Random forests consistently matched or outperformed LASSO, ridge, and elastic net, with the largest gains at longer horizons where nonlinear interactions between financial conditions and real activity accumulate. The key advantage was the forest's ability to capture threshold effects (e.g., credit spreads above 200 basis points triggering recession dynamics) without requiring the analyst to specify the threshold location or the interaction terms in advance. The OOB error served as a reliable real-time model selection criterion, avoiding the need for separate validation sets that shorten already-limited macro time series.

The European Central Bank and the Federal Reserve use random forests for recession probability estimation and financial stress early warning systems. Recession prediction is a classification problem with severe class imbalance (recessions are rare), and random forests handle this through stratified bootstrap sampling that overrepresents recession quarters. Partial dependence plots from these models reveal that the yield curve slope, credit growth, and industrial production growth interact nonlinearly: the recession probability jumps sharply when the yield curve inverts while credit growth decelerates, but either signal alone produces only a modest increase. This interaction structure is invisible to logistic regression unless the analyst manually specifies the correct interaction terms.
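The stratified resampling idea mentioned above can be sketched as follows. The function name, the equal per-class sample size, and the 10% recession share in the demo are illustrative assumptions. The source does not specify how any institution configures its sampler.

```python
import random


def stratified_bootstrap(labels, n_per_class, rng):
    """Draw n_per_class row indices with replacement from each class separately."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    return sample


if __name__ == "__main__":
    # Hypothetical quarterly panel: 20 recession quarters out of 200.
    labels = [1] * 20 + [0] * 180
    rows = stratified_bootstrap(labels, n_per_class=100, rng=random.Random(0))
    share = sum(labels[i] for i in rows) / len(rows)
    print(f"recession share in bootstrap sample: {share:.2f}")
```

Each tree would then be grown on such a balanced sample instead of a plain bootstrap draw. Because resampling changes the class proportions the trees see, predicted probabilities are typically recalibrated before being read as recession probabilities.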

Causal forests (Athey, Tibshirani, Wager, 2019) extend the random forest framework to estimate heterogeneous treatment effects. In macroeconomics, this has been applied to fiscal multiplier estimation: Cloyne, Jordà, and Taylor (2023) use forest-based methods to estimate how fiscal multipliers vary with the state of the business cycle, the level of government debt, and the monetary policy regime. The causal forest produces observation-level treatment effect estimates, allowing policymakers to assess whether a proposed fiscal stimulus would be more effective in the current macroeconomic configuration than the historical average multiplier suggests.

Random forests should not be used when extrapolation is required. If the forecast horizon involves predictor values outside the training range (e.g., inflation at 15% when the training data only covers 0-8%), the forest will revert to the nearest historical analog and miss the tail dynamics. They should also be avoided when the primary goal is structural interpretation rather than prediction. While permutation importance ranks predictors, it does not produce coefficients, elasticities, or marginal effects with standard errors. For structural macro questions (what is the slope of the Phillips curve?), penalized linear models or Bayesian VARs are more appropriate. See the elastic net reference at /models/empirical/elastic-net and the LASSO reference at /models/empirical/lasso for linear alternatives.
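The extrapolation limit described above can be seen with a single regression tree on one predictor, since a forest is an average of such trees and inherits the behavior. The sketch below uses hypothetical names and a made-up linear relationship on the 0 to 8 range. It is a toy, not a calibrated inflation model.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean."""
    mu = fmean(values)
    return sum((v - mu) ** 2 for v in values)


def fit_tree_1d(xs, ys, n_min=2):
    """Grow a one-predictor regression tree; a leaf is a float, a split is (t, left, right)."""
    if len(xs) <= n_min or min(xs) == max(xs):
        return fmean(ys)
    pairs = sorted(zip(xs, ys))
    xs = [x for x, _ in pairs]
    ys = [v for _, v in pairs]
    best_k, best_sse = None, float("inf")
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue                                  # cannot split between equal values
        total = sse(ys[:k]) + sse(ys[k:])
        if total < best_sse:
            best_k, best_sse = k, total
    t = (xs[best_k - 1] + xs[best_k]) / 2.0
    return (t, fit_tree_1d(xs[:best_k], ys[:best_k], n_min),
            fit_tree_1d(xs[best_k:], ys[best_k:], n_min))


def predict_1d(node, x):
    """Follow splits until a leaf is reached."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node


if __name__ == "__main__":
    xs = [float(i) for i in range(9)]     # training range: 0 to 8
    ys = [2.0 * x for x in xs]            # toy linear relationship, maximum 16
    tree = fit_tree_1d(xs, ys)
    print("prediction at x=8 :", predict_1d(tree, 8.0))
    print("prediction at x=15:", predict_1d(tree, 15.0))   # same leaf as x=8
    print("linear trend at 15:", 2.0 * 15.0)
```

Every threshold is a midpoint between training values, so any input beyond the largest training value falls into the same terminal node as that largest value. The tree cannot return anything above the largest training response, however far the input moves.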

Components

$T_b(x)$

Individual decision tree

The $b$-th base learner, a fully grown regression or classification tree trained on bootstrap sample $\mathcal{D}_b^*$ with random feature selection at each split.

$\hat{f}(x)$

Forest ensemble prediction

$\hat{f}(x) = B^{-1} \sum_{b=1}^B T_b(x)$ for regression. The average of all tree predictions at input $x$.

$m$

Feature subsample size (mtry)

Number of randomly selected candidate features at each split. Controls the correlation between trees. Default: $\lfloor p/3 \rfloor$ for regression, $\lfloor \sqrt{p} \rfloor$ for classification.

$\text{OOB}_i$

Out-of-bag prediction

Prediction for observation $i$ using only trees whose bootstrap sample excluded $i$: $\text{OOB}_i = |\mathcal{B}_i^c|^{-1} \sum_{b \in \mathcal{B}_i^c} T_b(x_i)$, where $\mathcal{B}_i^c$ is the set of trees that did not sample $i$.

$\text{VIM}_j$

Permutation variable importance

Increase in OOB error when the values of predictor $j$ are randomly permuted: $\text{VIM}_j = \text{OOB}_{\text{perm},j} - \text{OOB}_{\text{orig}}$. Measures the forest's reliance on predictor $j$.

$n_{\min}$

Minimum node size

Minimum number of observations in a terminal node. Controls tree depth. Smaller values produce deeper, more overfit trees whose errors are reduced by ensemble averaging.
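As a quick worked example of the default rules for $m$ listed above, the helper below evaluates $\lfloor p/3 \rfloor$ and $\lfloor \sqrt{p} \rfloor$ for a given number of predictors. The function name is hypothetical, and the floor of 1 for very small $p$ is an added safeguard rather than something stated in this guide.

```python
import math


def default_mtry(p: int, task: str) -> int:
    """Default number of candidate features per split for p predictors."""
    if task == "regression":
        return max(1, p // 3)            # floor(p / 3), at least one feature
    if task == "classification":
        return max(1, math.isqrt(p))     # floor(sqrt(p)), at least one feature
    raise ValueError("task must be 'regression' or 'classification'")


if __name__ == "__main__":
    for p in (10, 127, 500):
        print(p, default_mtry(p, "regression"), default_mtry(p, "classification"))
```

For a predictor set of 127 series, this gives 42 candidate features per split in regression and 11 in classification, so classification trees are decorrelated much more aggressively by default.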

Assumptions

Sufficient sample size for tree diversity (Testable)

The training set must be large enough that bootstrap resamples produce meaningfully different trees. With $n < 50$ , bootstrap samples overlap heavily and tree correlation increases, reducing the variance-reduction benefit.

If violated: OOB error estimates become optimistic. The forest degenerates toward a single averaged tree, losing ensemble diversity.

No extrapolation beyond training support (Testable)

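Random forests cannot extrapolate. A regression forest's prediction is an average of training responses, so it always lies between the smallest and largest observed values of $y$. For inputs outside the observed range of any predictor, the prediction is a constant extension of the nearest terminal node.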

If violated: Forecasts in novel regimes (unprecedented inflation, zero lower bound, pandemic shock) revert to the nearest historical analog rather than extrapolating a trend. The forest cannot generate out-of-sample predictions that exceed the training range.

Approximate independence or weak dependence of observations (Testable)

Bootstrap resampling assumes observations are exchangeable or at most weakly dependent. For time-series data, standard i.i.d. bootstrap destroys temporal ordering.

If violated: OOB error underestimates true forecast error because bootstrap samples leak future information into training. Block bootstrap or temporal train/test splits are required for valid error estimation.
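The two remedies named above can be sketched as index generators. This is a minimal illustration under stated assumptions: a moving block bootstrap with a fixed block length, and an expanding-window split in which every training index precedes the test index. The function names, the block length, and the window sizes are hypothetical choices, not recommendations.

```python
import random


def moving_block_bootstrap(n, block_len, rng):
    """Resample indices 0..n-1 by concatenating randomly placed contiguous blocks."""
    n_starts = n - block_len + 1
    idx = []
    while len(idx) < n:
        start = rng.randrange(n_starts)
        idx.extend(range(start, start + block_len))   # keep within-block time order
    return idx[:n]


def expanding_window_splits(n, min_train, horizon=1):
    """Yield (train_indices, test_index) with all training rows before the test row."""
    for end in range(min_train, n - horizon + 1):
        yield list(range(end)), end + horizon - 1


if __name__ == "__main__":
    print(moving_block_bootstrap(12, block_len=4, rng=random.Random(0)))
    for train, test in expanding_window_splits(10, min_train=7):
        print(f"train on 0..{train[-1]}, evaluate at {test}")
```

Errors collected over the expanding-window test points estimate forecast accuracy without letting any tree see observations dated after the forecast origin, which the ordinary OOB error cannot guarantee for time series.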

Adequate feature subsampling rate (Maintained)

The feature subsample size $m$ must be small enough to decorrelate trees but large enough that relevant predictors appear frequently. The default $m = \lfloor p/3 \rfloor$ balances these requirements under moderate correlation.

If violated: If $m$ is too large, trees are highly correlated and variance reduction stalls. If $m$ is too small, individual trees are so weak that even the ensemble has high bias.

Axis-aligned decision boundaries are adequate (Maintained)

CART trees split on one variable at a time, producing rectangular partitions. The random forest inherits this geometry. Smooth, oblique decision boundaries require many more trees and observations to approximate well.

If violated: When the true relationship is a smooth function of a linear combination of predictors (e.g., a Phillips curve in the output gap), random forests approximate the smooth surface with a staircase. Penalized linear models, or gradient boosting with shallow trees that builds up an additive fit, may be more efficient.

Stationarity of the data-generating process (Testable)

The conditional distribution $P(y | X)$ is stable over the training and prediction windows. Random forests have no built-in mechanism for structural breaks or time-varying parameters.

If violated: Post-break predictions reflect the pre-break conditional distribution. The forest cannot adapt to regime changes unless the regime is encoded as a predictor or the training window is restricted to the current regime.
