The Federal Reserve Bank of New York's nowcasting framework uses regularized regressions (including ridge) when estimating bridge equations from large sets of monthly indicators. With 100+ candidate predictors released on staggered schedules, OLS on the full set is infeasible for most vintages: the predictor count approaches or exceeds the number of usable observations, so the OLS normal equations are ill-conditioned or singular. Ridge provides stable coefficient estimates that update smoothly as new data arrive, which is critical for a real-time forecasting system where erratic coefficient jumps would undermine credibility.
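To make the real-time point concrete, here is a minimal sketch on synthetic data (this is not the New York Fed's code: the 120-predictor panel, the penalty value, and the "vintage" cutoffs are all made up for illustration). The ridge fit remains well defined even with more predictors than observations, and refitting after one additional observation barely moves the coefficients.

```python
# Sketch only: synthetic stand-in for a wide monthly indicator panel.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_obs, n_pred = 80, 120                      # fewer observations than predictors
X = rng.standard_normal((n_obs, n_pred))
beta_true = 0.1 * rng.standard_normal(n_pred)
y = X @ beta_true + rng.standard_normal(n_obs)

# "Vintage 1": all but the latest observation; "vintage 2": one more month of data.
coefs = []
for cutoff in (n_obs - 1, n_obs):
    model = Ridge(alpha=10.0, fit_intercept=False)   # penalty chosen arbitrarily
    model.fit(X[:cutoff], y[:cutoff])
    coefs.append(model.coef_)

# The coefficient update between vintages is small relative to the coefficient scale.
print("max |coefficient change|:", np.max(np.abs(coefs[1] - coefs[0])))
print("max |coefficient|:       ", np.max(np.abs(coefs[1])))
```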
Phillips-curve estimation illustrates the collinearity problem ridge was built for. If you include six measures of economic slack (output gap, unemployment gap, capacity utilization gap, employment-to-population ratio, vacancy-unemployment ratio, underemployment rate) plus their first and second lags, you have 18 highly correlated predictors. OLS assigns wild, offsetting coefficients to near-identical series. Ridge pulls these coefficients toward each other (and toward zero), implicitly producing a composite slack measure. De Mol, Giannone, and Reichlin (2008) showed that ridge regression on a large predictor set can perform comparably to principal-components-based factor models for macro forecasting.
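A rough simulation of this point (six noisy copies of a common slack factor rather than real macro series; the penalty value is arbitrary):

```python
# Illustrative sketch: near-identical "slack" regressors, outcome driven by the common factor.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 200
slack = rng.standard_normal(n)                            # the common slack factor
X = slack[:, None] + 0.05 * rng.standard_normal((n, 6))   # six noisy copies of it
y = -0.5 * slack + 0.2 * rng.standard_normal(n)           # inflation responds to slack alone

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))   # typically large and mutually offsetting
print("Ridge coefficients:", np.round(ridge.coef_, 2)) # similar small values that together track -0.5
```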
In academic applied work, ridge serves as a benchmark in forecast comparison exercises. Researchers evaluating new machine-learning methods for macro forecasting (random forests, neural networks, gradient boosting) almost always include ridge as a baseline because it is the simplest regularized method with a closed-form solution. When ridge beats a fancy model, it usually means the fancy model is overfitting. When a fancy model barely beats ridge, the gains from modeling nonlinearity are negligible.
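The closed form in question is beta_ridge = (X'X + λI)^{-1} X'y. A quick sanity check on synthetic data (no intercept, so no centering is needed; scikit-learn's Ridge minimizes the same penalized sum of squares) confirms the two agree:

```python
# Closed-form ridge solution versus scikit-learn, on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
y = rng.standard_normal(100)
lam = 2.0

# beta_ridge = (X'X + lam*I)^{-1} X'y
beta_closed_form = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.allclose(beta_closed_form, beta_sklearn))   # True
```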
Ridge fails when variable selection matters. It keeps every predictor in the model, shrinking weak ones toward zero but never eliminating them. For practitioners who need to identify which predictors drive the outcome (policy evaluation, structural analysis, model interpretability), ridge is the wrong tool. It also fails when the true model is very sparse: if only 5 of 100 predictors matter, LASSO or elastic net will outperform ridge by zeroing out the irrelevant 95. Finally, ridge treats all predictors as exchangeable through the isotropic penalty λI. If prior knowledge suggests that some groups of predictors should be penalized differently (e.g., financial variables less than survey variables), methods with group-specific penalties, such as the group LASSO, or a Bayesian hierarchical regression with group-specific priors, are more appropriate.
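A small simulated comparison of the sparse case (only 5 of 100 predictors matter; the penalty values are arbitrary) shows LASSO zeroing most of the irrelevant predictors while ridge keeps every coefficient nonzero:

```python
# Sketch of the sparsity point: 5 relevant predictors out of 100, simulated data.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, p, k = 300, 100, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 1.0                                   # only the first 5 predictors matter
y = X @ beta + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("nonzero LASSO coefficients:", np.sum(lasso.coef_ != 0))   # far fewer than 100
print("nonzero ridge coefficients:", np.sum(ridge.coef_ != 0))   # all 100 retained
```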