Date: Wednesday, 14 May 2025
Location: S16-06-118, Seminar Room
Time: 3pm (Singapore time)
General first-order methods (GFOMs), including various gradient descent variants and approximate message passing algorithms, constitute a broad class of iterative algorithms widely used in modern statistical learning problems. Some GFOMs also serve as constructive proof devices that iteratively characterize the empirical distributions of statistical estimators in the large-system asymptotic regime, for a fixed number of iterations.
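For orientation, one common formalization in the literature (the notation here is illustrative, not necessarily the talk's own) lets a GFOM driven by a data matrix $A \in \mathbb{R}^{m \times n}$ run the two-sided recursion

$$u^{t+1} = A\, F_t(v^1, \dots, v^t) + G_t(u^1, \dots, u^t), \qquad v^{t+1} = A^\top \tilde{F}_t(u^1, \dots, u^{t+1}) + \tilde{G}_t(v^1, \dots, v^t),$$

where the maps $F_t, G_t, \tilde{F}_t, \tilde{G}_t$ act row-wise on the iterates (separately on each coordinate, possibly using side information such as the response vector); gradient descent on suitable losses and approximate message passing arise from particular choices of these maps.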
This talk presents a recent non-asymptotic, mean-field theory for the dynamics of a general class of GFOMs. Our characterizations capture the precise stochastic behavior of each coordinate of the GFOM iterates and, more importantly, hold universally across a broad class of heterogeneous random matrix models.
We demonstrate the utility of these general results through two applications. In the first application, we characterize the mean-field behavior of non-convex gradient descent algorithms in a broad class of empirical risk minimization problems. As a consequence, we show that the vanilla gradient descent algorithm can be augmented with a few auxiliary computations to produce data-driven estimates of the unknown generalization error at each iteration. This applies to models ranging from simple linear models to nonlinear multi-layer neural networks, and can be used to inform practical decisions such as early stopping and hyperparameter tuning during gradient descent training.
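As a schematic illustration (not the talk's specific construction), gradient descent with step size $\eta$ on the empirical risk of a generalized linear-type model with data $(x_i, y_i)_{i \le n}$ and design matrix $X \in \mathbb{R}^{n \times p}$ playing the role of $A$ above,

$$\theta^{t+1} = \theta^t - \frac{\eta}{n} \sum_{i=1}^n \partial_2 \ell\big(y_i, \langle x_i, \theta^t \rangle\big)\, x_i = \theta^t - \frac{\eta}{n} X^\top \partial_2 \ell\big(y, X\theta^t\big),$$

is itself a GFOM: multiplications by $X$ and $X^\top$ alternate with coordinate-wise maps, so its per-coordinate dynamics fall within the scope of the mean-field characterization, and the data-driven generalization-error estimates are obtained from a few auxiliary statistics computed along the same iterates.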
In the second application, we develop an algorithmic method for proving the universality of regularized regression estimators. Specifically, we systematically improve universality results for regularized regression estimators in the linear model and resolve the long-standing universality conjecture for (regularized) maximum likelihood estimators in the logistic regression model.
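For concreteness (illustrative notation, not necessarily the talk's), the ridge-regularized maximum likelihood estimator in the logistic regression model with labels $y_i \in \{-1, +1\}$ and covariates $x_i \in \mathbb{R}^p$ is

$$\hat{\theta}_\lambda = \arg\min_{\theta \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^n \log\big(1 + e^{-y_i \langle x_i, \theta \rangle}\big) + \frac{\lambda}{2} \|\theta\|_2^2.$$

Roughly speaking, universality here means that the asymptotic behavior of such estimators (e.g., their estimation error) is the same whether the covariates are Gaussian or drawn from a suitably normalized non-Gaussian design with matching moments.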