Event
Non-asymptotic Theory for Two-Layer Neural Networks: Beyond the Bias-Variance Trade-Off
Associate Professor Wei Lin
Peking University
Date: 6 February 2024, Tuesday
Location: S17-04-06, Seminar Room
Time: 3pm (Singapore time)
Large neural networks have proved remarkably effective in modern deep learning practice, even in the overparametrized regime where the number of active parameters is large relative to the sample size. This contradicts the classical view that a machine learning model must trade off bias and variance to generalize well. In this talk, we will first review recent efforts to resolve this conflict, and then present a unified generalization theory for two-layer neural networks with the ReLU activation function by incorporating scaled variation regularization. Interestingly, the regularizer is equivalent to ridge regression from the viewpoint of gradient-based optimization, yet plays a role similar to that of the group lasso in controlling model complexity. By exploiting this “ridge-lasso duality,” we obtain non-asymptotic prediction bounds valid for all network widths, which reproduce the double descent phenomenon. Moreover, the overparametrized minimum risk is lower than its underparametrized counterpart when the signal is strong, and is nearly minimax optimal over a suitable class of functions. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and are therefore suboptimal.