Discovering Structural Patterns in Statistical Models via Regularization : Asymptotics, Exact Recovery and Estimation

Author

Ivan Hejny

Summary, in English

In the linear regression model, when the dimension p is fixed and n→∞, the asymptotic distribution of the estimation error for Lasso-type M-estimators is well established. In contrast, the convergence of the discrete structures (patterns) induced by these regularizers, such as sparsity or clustering, has remained largely unaddressed, even for the standard Lasso penalty. Because these lower-dimensional structures are sensitive to infinitesimal perturbations, the weak convergence of the continuous estimation error does not guarantee the convergence of the induced patterns.
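The sensitivity of patterns to small perturbations can be illustrated with a minimal sketch. It assumes an orthonormal design, under which the Lasso solution reduces to coordinate-wise soft thresholding; the function names and the numerical values below are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, lam):
    """Lasso solution under an orthonormal design (assumed here):
    shrink each coordinate of z toward zero by lam."""
    return [max(abs(v) - lam, 0.0) * (1 if v > 0 else -1) for v in z]


def sign_pattern(b):
    """Sparsity/sign pattern of a coefficient vector: entries in {-1, 0, 1}."""
    return [(v > 0) - (v < 0) for v in b]


lam = 1.0
# Two inputs that differ by 2e-9 in a single coordinate (illustrative values).
z_a = [3.0, 1.0 + 1e-9, -0.2]
z_b = [3.0, 1.0 - 1e-9, -0.2]

b_a = soft_threshold(z_a, lam)
b_b = soft_threshold(z_b, lam)

# The estimates are nearly identical as vectors...
max_gap = max(abs(x - y) for x, y in zip(b_a, b_b))

# ...but their patterns differ in the second coordinate.
pattern_a = sign_pattern(b_a)
pattern_b = sign_pattern(b_b)
```

The two estimates are within about 1e-9 of each other, yet one selects the second coordinate and the other does not, which is why closeness of the continuous error does not by itself control the pattern.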

This thesis develops a unified theoretical framework to resolve this discrepancy. We establish that for a range of non-differentiable regularizers with a polyhedral component (including the Lasso, Generalized Lasso, SLOPE, and Elastic Net), the estimated patterns converge toward a limiting pattern determined by an explicit asymptotic formula. This limit is characterized by the Fisher information matrix of the loss, the score covariance matrix, and the directional derivative of the regularizer. In Paper I, this is achieved in the linear model by using the Hausdorff distance as a suitable mode of set convergence involving subdifferentials. In Paper II, we extend the theory from Paper I to general statistical models via a Stochastic Lipschitz Differentiability (SLD) condition. This condition controls the fluctuations of the Taylor remainder and, beyond differentiable losses, covers robust, non-smooth losses such as the Huber and quantile losses.
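For orientation, a limit of this kind can be written schematically in the style of the classical Lasso asymptotics of Knight and Fu (2000). The notation below (C for the Fisher information matrix, \Sigma for the score covariance, f for the regularizer, \beta^0 for the true parameter, and the \sqrt{n} scaling) is assumed here for illustration and is not taken from the thesis, whose exact statement may differ.

```latex
% Schematic limit of the rescaled error \sqrt{n}(\hat\beta_n - \beta^0),
% assuming the penalty is scaled at rate \sqrt{n}:
\hat{u} \;=\; \arg\min_{u \in \mathbb{R}^p}
  \Big\{ \tfrac{1}{2}\, u^\top C\, u \;-\; u^\top W \;+\; f'(\beta^0; u) \Big\},
\qquad W \sim \mathcal{N}(0, \Sigma),

% where f'(\beta^0; u) is the directional derivative of the regularizer:
f'(\beta^0; u) \;=\; \lim_{\varepsilon \downarrow 0}
  \frac{f(\beta^0 + \varepsilon u) - f(\beta^0)}{\varepsilon}.

% Example: for the Lasso penalty f(\beta) = \lambda \lVert \beta \rVert_1,
f'(\beta^0; u) \;=\; \lambda \sum_{j=1}^{p}
  \Big( u_j \,\operatorname{sgn}(\beta^0_j)\, \mathbf{1}\{\beta^0_j \neq 0\}
      \;+\; |u_j| \,\mathbf{1}\{\beta^0_j = 0\} \Big).
```

In this schematic form, the non-differentiable term f'(\beta^0; u) is what allows the minimizer \hat{u} to have exact zeros or ties with positive probability, so a limiting pattern can be read off from \hat{u}.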

The asymptotic formula characterizes the specific asymptotic irrepresentability conditions under which the true signal pattern can be recovered with high probability. To bypass instances where these stringent conditions fail (typically in highly correlated predictor structures), we develop adaptive two-step procedures based on proximal operators. Furthermore, the theory identifies a critical degeneracy in the Fused Lasso regarding its inability to recover its own clusters under orthogonal design, motivating the proposal of a Concavified Fused Lasso penalty that resolves this limitation.

Papers III and IV apply this asymptotic theory to graphical models and precision matrix estimation. Paper III establishes exact asymptotic limits for the Graphical SLOPE estimator under elliptically distributed data, revealing how it can outperform the Graphical Lasso when clustering structures are present. Paper IV focuses on the scale-invariant PCGLASSO method; we derive an irrepresentability condition under which the true sparsity structure can be recovered, theoretically justifying the method's empirical performance in discovering network hubs.

Paper V focuses on the high-dimensional regime (p ≫ n). By introducing a surrogate regularizer, a decomposable penalty that locally matches the original polyhedral regularizer around the true signal, we extend established high-dimensional estimation bounds to non-decomposable penalties. Using this surrogate construction, we derive explicit deterministic error bounds and exact pattern recovery conditions. While our primary application is SLOPE, the surrogate approach also covers penalties such as the Generalized Lasso. Additionally, by linking the linear SLOPE sequence to the integrated Brownian bridge, we show that the volume ratio between the dual SLOPE and dual Lasso balls decays at an exact rate of p^{-1/4} for the linear sequence, and we conjecture a p^{-1/6} Gaussian volume ratio decay rate for the Benjamini-Hochberg sequence.
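The objects named above can be made concrete with a small sketch. It assumes that the "linear sequence" means evenly spaced decreasing weights (normalized here so the largest weight is 1), uses the standard Benjamini-Hochberg-type SLOPE weights Φ^{-1}(1 − iq/(2p)), and takes the SLOPE pattern of a vector to be its signs together with the ranks of its distinct nonzero magnitudes. The function names, the normalization, and the level q = 0.1 are illustrative choices, not taken from the thesis.

```python
from statistics import NormalDist


def linear_sequence(p):
    """Evenly spaced decreasing weights 1, (p-1)/p, ..., 1/p (assumed normalization)."""
    return [(p - i) / p for i in range(p)]


def bh_sequence(p, q=0.1):
    """Benjamini-Hochberg-type weights: lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p."""
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(b, lam):
    """SLOPE penalty: pair the largest |b_i| with the largest weight, and so on."""
    mags = sorted((abs(v) for v in b), reverse=True)
    return sum(l * m for l, m in zip(lam, mags))


def slope_pattern(b, tol=1e-12):
    """Sign times rank of |b_i| among the distinct nonzero magnitudes; 0 stays 0.
    Coordinates with equal magnitude share a rank, i.e. form a cluster."""
    levels = sorted({abs(v) for v in b if abs(v) > tol})
    rank = {m: k + 1 for k, m in enumerate(levels)}
    return [0 if abs(v) <= tol else (1 if v > 0 else -1) * rank[abs(v)] for v in b]


b = [2.5, -2.5, 0.0, 1.0]
lam_lin = linear_sequence(4)          # [1.0, 0.75, 0.5, 0.25]
lam_bh = bh_sequence(4)               # decreasing normal quantiles
penalty = sorted_l1_norm(b, lam_lin)  # 1.0*2.5 + 0.75*2.5 + 0.5*1.0 + 0.25*0.0
pattern = slope_pattern(b)            # first two coordinates form one cluster
```

Here the first two coordinates share a magnitude and therefore a rank, which is the clustering structure that distinguishes a SLOPE pattern from a plain sparsity pattern.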

Department/s

Department of Statistics

Publishing year

2026-05-19

Language

English

Full text

Available as PDF - 2 MB

Links

Publication in Lund University research portal

Document type

Dissertation

Publisher

Lund University (Media-Tryck)

Topic

Probability Theory and Statistics

Keywords

regularization
pattern convergence
exact recovery
irrepresentability condition
high-dimensional statistics
graphical models
non-decomposable penalization
Lasso
Fused Lasso
SLOPE

Status

Published

Supervisor

Jonas Wallin
Malgorzata Bogdan

ISBN/ISSN/Other

ISBN: 978-91-90202-00-5
ISBN: 978-91-90202-01-2

Defence date

9 June 2026

Defence time

13:15

Defence place

EC3:207

Opponent

Piotr Zwiernik (Professor)
