Event
Non-asymptotic Theory for Two-Layer Neural Networks: Beyond the Bias-Variance Trade-Off
Associate Professor Wei Lin
Peking University
Date: 6 February 2024, Tuesday
Location: S17-04-06, Seminar Room
Time: 3pm (Singapore time)
Large neural networks have proved remarkably effective in modern deep learning practice, even in the overparametrized regime where the number of active parameters is large relative to the sample size. This contradicts the classical view that a machine learning model must trade off bias and variance to generalize well. In this talk, we will first review recent efforts to resolve this conflict, and then present a unified generalization theory for two-layer neural networks with the ReLU activation function by incorporating scaled variation regularization. Interestingly, the regularizer is equivalent to ridge regression from the viewpoint of gradient-based optimization, yet plays a role similar to that of the group lasso in controlling model complexity. By exploiting this “ridge-lasso duality,” we obtain non-asymptotic prediction bounds valid for all network widths, which reproduce the double descent phenomenon. Moreover, the overparametrized minimum risk is lower than its underparametrized counterpart when the signal is strong, and is nearly minimax optimal over a suitable class of functions. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and are therefore suboptimal.