Empirical Forecasting

Method notes for the Empirical Forecasting Lab.

The Empirical Forecasting Lab is built for short-run macro forecasting, structural analysis, and evaluation against held-out observations. The public method notes separate the statistical procedure from the data plumbing so readers can see which part of a result comes from a model and which part comes from the selected information set.

Current Model Families

  • Classical univariate forecasts use lagged values, residual diagnostics, and out-of-sample accuracy checks.
  • Multivariate forecasts use VAR-style systems when several series are selected.
  • Bayesian VAR runs use a conjugate normal-inverse-Wishart posterior with Minnesota-style shrinkage.
  • VECM workflows use Johansen rank testing and rank-restricted forecasts when cointegration is detected.
  • Structural VAR workflows report impulse responses and forecast error variance shares under the chosen identification scheme.
  • Short-horizon factor workflows summarize a panel of series into a few latent factors before producing near-term estimates.
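
As a rough illustration of the first family in the list above, the sketch below fits a one-lag autoregression by least squares, inspects the in-sample residuals, and scores one-step forecasts on held-out points. It is a minimal sketch, not the lab's engine: the function names (fit_ar1, one_step_forecasts, rmse) and the series values are invented for illustration.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return y_bar - phi * x_bar, phi


def one_step_forecasts(c, phi, last_train_value, holdout):
    """One-step-ahead forecasts over the holdout with coefficients held fixed."""
    preds, prev = [], last_train_value
    for actual in holdout:
        preds.append(c + phi * prev)
        prev = actual  # the realized value becomes the next lag
    return preds


def rmse(preds, actuals):
    """Root mean squared error of forecasts against realized values."""
    return (sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals)) ** 0.5


# Made-up series, for illustration only.
series = [2.0, 2.3, 2.1, 2.6, 2.4, 2.8, 2.7, 3.0, 2.9, 3.2]
train, holdout = series[:7], series[7:]

c, phi = fit_ar1(train)
preds = one_step_forecasts(c, phi, train[-1], holdout)

# Residual diagnostic: with an intercept, in-sample residuals average to about zero.
residuals = [yt - (c + phi * yl) for yl, yt in zip(train[:-1], train[1:])]
print("in-sample residual mean:", mean(residuals))
print("holdout RMSE:", rmse(preds, holdout))
```

The coefficients are estimated on the training portion only, so the holdout RMSE is an out-of-sample figure in the sense used later in these notes.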

Data Treatment

Each run carries a vintage mode, target series, transform choice, horizon, and evaluation window. A cached run should not be combined with or ranked against another run unless those fields are compatible.
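
One way to picture the compatibility rule is a small record per run plus a check that lists the fields on which two runs disagree. The sketch below is hypothetical: the RunSpec class, its field names and example values, and the choice of strict equality as the test for "compatible" are all assumptions, since these notes do not define the lab's actual rule.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunSpec:
    """Hypothetical per-run metadata mirroring the fields named in the notes."""
    vintage_mode: str
    target_series: str
    transform: str
    horizon: int
    eval_start: str
    eval_end: str


def incompatible_fields(a, b):
    """Return the names of fields where two runs differ.

    Assumption: 'compatible' is treated as strict equality on every field.
    """
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Example values are invented for illustration.
run_a = RunSpec("real_time", "gdp_growth", "log_diff", 4, "2015Q1", "2019Q4")
run_b = RunSpec("real_time", "gdp_growth", "levels", 4, "2015Q1", "2019Q4")

mismatch = incompatible_fields(run_a, run_b)
if mismatch:
    print("do not combine or rank these runs; differing fields:", mismatch)
```

Returning the list of differing fields, rather than a bare true/false, makes it easier to report why two cached runs were kept apart.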

Transforms must be chosen before model fitting. Common choices include levels, differences, log levels, and log differences. The lab does not silently reverse a transform and call that a structural result; any forecast shown in transformed units should be read as such unless the output names the back-transform.
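
To make the point about units concrete, the sketch below applies a log-difference transform and then names the back-transform explicitly. The function names and the numbers are illustrative assumptions, not the lab's API.

```python
import math


def log_diff(levels):
    """Log differences: g_t = log(y_t) - log(y_{t-1}). Requires positive levels."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


def back_transform_log_diff(last_level, growth_forecasts):
    """Cumulate log-difference forecasts back into levels from a known last level."""
    levels, level = [], last_level
    for g in growth_forecasts:
        level *= math.exp(g)
        levels.append(level)
    return levels


# Made-up levels, for illustration only.
history = [100.0, 102.0, 105.0]
transformed = log_diff(history)

# A forecast of 0.01 per period is in log-difference units (about 1% growth),
# not in levels.
forecast_log_diff = [0.01, 0.01]
forecast_levels = back_transform_log_diff(history[-1], forecast_log_diff)
print("forecast in log-difference units:", forecast_log_diff)
print("forecast back-transformed to levels:", forecast_levels)
```

The two printed forecasts describe the same path in different units, which is why an output should state which one it is showing.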

Evaluation Split

The release branch uses separate validation and terminal-test windows when combining forecasts. Validation is used to estimate weights or tune a rule. The terminal-test window is kept for final reporting.

This distinction matters. A method that looks best on the same observations used to tune it is not being tested out of sample.
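
The split can be sketched as follows: combination weights are estimated on the validation window only, and the terminal-test window is touched once, for reporting. The notes do not say which weighting rule the release branch uses; inverse-MSE weighting is an assumption chosen here for simplicity, and the model names and numbers are invented.

```python
def mse(preds, actuals):
    """Mean squared error of forecasts against realized values."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals)


def inverse_mse_weights(val_preds, val_actuals):
    """Weights proportional to 1 / validation MSE, normalized to sum to one.

    Assumes every model has a strictly positive validation MSE.
    """
    inv = {name: 1.0 / mse(p, val_actuals) for name, p in val_preds.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def combine(preds, weights):
    """Weighted average of the model forecasts, period by period."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]


# Made-up numbers, for illustration only.
val_actuals = [1.0, 2.0, 3.0]
val_preds = {"model_a": [1.1, 2.1, 3.1], "model_b": [1.2, 2.2, 3.2]}

test_actuals = [4.0, 5.0]
test_preds = {"model_a": [4.2, 4.9], "model_b": [3.7, 5.4]}

# Step 1: weights come from the validation window only.
weights = inverse_mse_weights(val_preds, val_actuals)

# Step 2: the terminal-test window is used once, with the weights held fixed.
combined = combine(test_preds, weights)
print("weights:", weights)
print("terminal-test MSE of combination:", mse(combined, test_actuals))
```

Re-estimating the weights after looking at the terminal-test error would turn that window into a second validation set and remove its value as a final check.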

Notebook Export

The empirical sandbox can export the current session as a Python notebook bundle. The bundle includes a Jupyter notebook, a Python script, the session snapshot, the runtime policy, requested packages, and a manifest digest.

Notebook export is an audit aid. It helps a researcher inspect the data, transform, model, and evaluation choices outside the guided UI. It is not the same as a completed hosted execution. Whether hosted execution is available depends on the environment and plan; notebook export remains useful even when it is not.
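
A manifest digest of this kind can be thought of as a hash over the per-file hashes of the bundle. The sketch below shows one way to compute and re-check such a digest. The file names, the use of SHA-256, and the canonical JSON layout are assumptions made for illustration; the notes do not specify the lab's actual manifest format.

```python
import hashlib
import json


def manifest_digest(files):
    """Return (per-file SHA-256 hashes, digest over the canonical manifest).

    `files` maps a bundle file name to its raw bytes.
    """
    entries = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    # Sorted keys and fixed separators make the digest independent of insertion order.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return entries, hashlib.sha256(canonical).hexdigest()


# Hypothetical bundle contents, for illustration only.
bundle = {
    "session.ipynb": b'{"cells": []}',
    "session.py": b"print('forecast run')\n",
    "snapshot.json": b'{"target": "example_series"}',
    "requirements.txt": b"numpy\n",
}

entries, digest = manifest_digest(bundle)
print("manifest digest:", digest)

# An auditor can recompute the digest from the received files and compare.
_, recomputed = manifest_digest(dict(bundle))
print("bundle intact:", recomputed == digest)
```

A check like this only shows that the exported files are the ones the manifest describes. It says nothing about whether the notebook was ever executed.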

Density Output

Density forecasts are described as model-native predictive distributions only when the engine supplies draws from a model-supported simulation path. Gaussian residual bands are labeled approximate. That label is intentional: a normal residual approximation can be useful, but it does not carry the same meaning as posterior predictive simulation or state-space simulation.
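
The difference between the two labels can be sketched in a few lines. The first function builds a symmetric band from a normal approximation to the residuals; the second reads quantiles directly off simulation draws. Function names, the label strings, and the stand-in draws are assumptions for illustration; in the lab the draws would come from the engine's own simulation path.

```python
import random
from statistics import NormalDist, pstdev


def gaussian_residual_band(point, residuals, coverage=0.9):
    """Symmetric band from a normal approximation to in-sample residuals."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    sigma = pstdev(residuals)
    return {"label": "approximate", "lower": point - z * sigma, "upper": point + z * sigma}


def empirical_quantile(sorted_draws, q):
    """Linear-interpolation quantile of already sorted draws."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac


def draw_based_band(draws, coverage=0.9):
    """Band read off predictive draws; it can be asymmetric."""
    s = sorted(draws)
    tail = (1 - coverage) / 2
    return {
        "label": "model-native",
        "lower": empirical_quantile(s, tail),
        "upper": empirical_quantile(s, 1 - tail),
    }


# Stand-in data: skewed draws play the role of engine-supplied simulation output.
rng = random.Random(0)
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
residuals = [-0.4, 0.1, 0.3, -0.2, 0.5, -0.3]

print(gaussian_residual_band(point=1.0, residuals=residuals))
print(draw_based_band(draws))
```

With skewed draws the draw-based band is not centered on the point forecast, which is something the symmetric Gaussian band cannot represent.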

Limits

  • The lab does not prove causal effects from forecast accuracy alone.
  • Structural VAR results depend on the identification scheme and ordering or restrictions selected by the user.
  • Very short samples can make lag selection, rank tests, and density calibration unstable.
  • Provider-scale ingestion and live external forecast uploads are separate implementation work.

References

  • Said and Dickey, 1984, unit-root testing in ARMA models.
  • Kwiatkowski, Phillips, Schmidt, and Shin, 1992, stationarity testing.
  • Johansen, 1988 and 1991, cointegration rank tests.
  • Sims, 1980, vector autoregressions in macroeconomics.
  • Litterman, 1986, Bayesian VAR shrinkage.
  • Gneiting and Raftery, 2007, proper scoring rules for probabilistic forecasts.