Discovering Structural Patterns in Statistical Models via Regularization: Asymptotics, Exact Recovery and Estimation

Author

Summary, in English

In the linear regression model, when the dimension p is fixed and n→∞, the asymptotic distribution of the estimation error for Lasso-type M-estimators is well established. In contrast, the convergence of the discrete structures (patterns) induced by these regularizers, such as sparsity or clustering, has remained largely unaddressed, even for the standard Lasso penalty. Because these lower-dimensional structures are sensitive to infinitesimal perturbations, the weak convergence of the continuous estimation error does not guarantee the convergence of the induced patterns.
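The sensitivity described above can be illustrated with a minimal sketch. It assumes an orthonormal design, under which the Lasso solution reduces to coordinate-wise soft-thresholding, and takes the Lasso pattern to be the sign vector of the estimate. The helper names and the numerical values are hypothetical and are not taken from the thesis.

```python
def soft_threshold(z, lam):
    """Lasso estimate under an orthonormal design (assumption):
    shrink each coordinate of z toward zero by lam."""
    return [((v > 0) - (v < 0)) * max(abs(v) - lam, 0.0) for v in z]


def sign_pattern(beta):
    """Sign vector in {-1, 0, 1}^p, used here as the Lasso pattern."""
    return [(b > 0) - (b < 0) for b in beta]


lam = 1.0
z_a = [2.0, 1.0, -0.5]          # second coordinate sits exactly at the threshold
z_b = [2.0, 1.0 + 1e-8, -0.5]   # the same input after a tiny perturbation

beta_a = soft_threshold(z_a, lam)
beta_b = soft_threshold(z_b, lam)

# The estimates are almost identical as vectors ...
max_gap = max(abs(a - b) for a, b in zip(beta_a, beta_b))

# ... but their patterns differ in the second coordinate.
pattern_a = sign_pattern(beta_a)
pattern_b = sign_pattern(beta_b)
print(max_gap, pattern_a, pattern_b)
```

The two estimates differ by about 1e-8 in the sup norm, yet one pattern has a zero where the other has a nonzero sign, which is why closeness of the estimation error alone does not pin down the pattern.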

This thesis develops a unified theoretical framework to resolve this discrepancy. We establish that for a range of non-differentiable regularizers with a polyhedral component (including the Lasso, Generalized Lasso, SLOPE, and Elastic Net), the estimated patterns converge toward a limiting pattern determined by an explicit asymptotic formula. This limit is characterized by the Fisher Information matrix of the loss, the score covariance matrix, and the directional derivative of the regularizer. In Paper I, this is achieved in the linear model by utilizing the Hausdorff distance as a suitable mode of set convergence involving subdifferentials. In Paper II, we extend the theory from Paper I to general statistical models via a Stochastic Lipschitz Differentiability (SLD) condition, which controls the fluctuations of the Taylor remainder and, beyond differentiable losses, encompasses robust, non-smooth functions such as the Huber and quantile losses.
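To make the notion of a pattern concrete for a penalty other than the Lasso, here is a small sketch for SLOPE. It assumes the convention common in the SLOPE literature, where the pattern records the sign of each coefficient together with the rank of its magnitude among the distinct nonzero magnitudes; whether the thesis uses exactly this definition is an assumption. The function names and the tolerance `tol` are hypothetical.

```python
def slope_norm(beta, lam):
    """Sorted-L1 (SLOPE) norm: sum_i lam[i] * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...
    and lam is a nonincreasing, nonnegative sequence."""
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, mags))


def slope_pattern(beta, tol=1e-12):
    """Assumed SLOPE pattern: sign(beta_i) times the rank of |beta_i| among the
    distinct nonzero magnitudes (rank 1 = smallest); zeros map to 0.
    Equal magnitudes share a rank, which encodes the clusters."""
    distinct = sorted({abs(b) for b in beta if abs(b) > tol})
    rank = {m: i + 1 for i, m in enumerate(distinct)}
    return [((b > 0) - (b < 0)) * rank[abs(b)] if abs(b) > tol else 0 for b in beta]


beta = [3.0, -3.0, 0.0, 1.5]
print(slope_pattern(beta))                       # [2, -2, 0, 1]
print(slope_norm(beta, [4.0, 3.0, 2.0, 1.0]))    # 24.0
```

Under this convention the first two coefficients form one cluster (equal magnitude, opposite signs), the third is zero, and the fourth forms a smaller cluster of its own, so the pattern captures sparsity and clustering at once.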

The asymptotic formula characterizes the specific asymptotic irrepresentability conditions under which the true signal pattern can be recovered with high probability. To bypass instances where these stringent conditions fail (typically in highly correlated predictor structures), we develop adaptive two-step procedures based on proximal operators. Furthermore, the theory identifies a critical degeneracy of the Fused Lasso, namely its inability to recover its own clusters under orthogonal design, which motivates the proposal of a Concavified Fused Lasso penalty that resolves this limitation.

Papers III and IV apply this asymptotic theory to graphical models and precision matrix estimation. Paper III establishes exact asymptotic limits for the Graphical SLOPE estimator under elliptically distributed data, revealing how it can outperform the Graphical Lasso when clustering structures are present. Paper IV focuses on the scale-invariant PCGLASSO method; we derive an irrepresentability condition under which the true sparsity structure can be recovered, theoretically justifying the method's empirical performance in discovering network hubs.

Paper V focuses on the high-dimensional regime (p ≫ n). By introducing a surrogate regularizer, a decomposable penalty that locally matches the original polyhedral regularizer around the true signal, we extend established high-dimensional estimation bounds to non-decomposable penalties. Using this surrogate construction, we derive explicit deterministic error bounds and exact pattern recovery conditions. While our primary application focuses on SLOPE, the surrogate approach also encompasses penalties such as the Generalized Lasso. Additionally, by linking the linear SLOPE sequence to the integrated Brownian bridge, we show that for this sequence the volume ratio between the dual SLOPE and dual Lasso balls decays at an exact rate of p^{-1/4}, and we conjecture a p^{-1/6} Gaussian volume ratio decay rate for the Benjamini-Hochberg sequence.

Publishing year

2026-05-19

Language

English

Document type

Dissertation

Publisher

Lund University (Media-Tryck)

Topic

  • Probability Theory and Statistics

Keywords

  • regularization
  • pattern convergence
  • exact recovery
  • irrepresentability condition
  • high-dimensional statistics
  • graphical models
  • non-decomposable penalization
  • Lasso
  • Fused Lasso
  • SLOPE

Status

Published

Supervisor

ISBN/ISSN/Other

  • ISBN: 978-91-90202-00-5
  • ISBN: 978-91-90202-01-2

Defence date

9 June 2026

Defence time

13:15

Defence place

EC3:207

Opponent

  • Piotr Zwiernik (Professor)