Date: Monday, 27 December 2021
Location: Zoom: https://nus-sg.zoom.us/j/87153758757?pwd=aHBpWC82M1B2RjJJcURuaHBiSTd6UT09
Time: 9am-10am, Singapore time
A mystery of modern neural networks is their surprising generalization power in the overparametrized regime: they comprise so many parameters that they can interpolate the training set even when the actual labels are replaced by purely random ones; despite this, they achieve good prediction error on unseen data.
To demystify these phenomena, we focus on two-layer neural networks in the neural tangent (NT) regime. Under a simple data model in which the n inputs are d-dimensional isotropic vectors and the network has N hidden neurons, we show that as soon as Nd >> n, the minimum eigenvalue of the empirical NT kernel is bounded away from zero, and therefore the network can exactly interpolate arbitrary labels.
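The minimum-eigenvalue claim can be explored numerically. Below is a minimal sketch, assuming a two-layer ReLU network with second-layer weights fixed to one, only the first layer trained, and inputs drawn isotropically with norm sqrt(d); these modelling choices are illustrative assumptions, not the paper's exact construction. It builds the empirical NT kernel from the first-layer gradients and prints its smallest eigenvalue, which one can watch stay bounded away from zero as Nd grows past n.

```python
import numpy as np

# Illustrative sketch (not the paper's exact setup): two-layer ReLU network
#   f(x) = (1/sqrt(N)) * sum_i a_i * relu(<w_i, x>),  a_i = 1, only W trained.
# The NT feature map is the gradient w.r.t. W, so the empirical NT kernel is
#   K[j, k] = (1/N) * sum_i 1{<w_i, x_j> > 0} * 1{<w_i, x_k> > 0} * <x_j, x_k>.

def empirical_nt_kernel(X, W):
    """Gram matrix of first-layer NT features for inputs X (n x d), weights W (N x d)."""
    S = (X @ W.T > 0).astype(float)          # n x N indicators relu'(<w_i, x_j>)
    return (S @ S.T) * (X @ X.T) / W.shape[0]

rng = np.random.default_rng(0)
n, d, N = 200, 50, 400                        # N * d = 20000 >> n = 200

Z = rng.standard_normal((n, d))
X = np.sqrt(d) * Z / np.linalg.norm(Z, axis=1, keepdims=True)  # isotropic inputs, norm sqrt(d)
W = rng.standard_normal((N, d))

K = empirical_nt_kernel(X, W)
print("lambda_min of empirical NT kernel:", np.linalg.eigvalsh(K).min())
```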
Next, we study the generalization error of NT ridge regression (including min-norm interpolation as the ridgeless limit). We show that, in the same overparametrization regime Nd >> n, the generalization error of NT ridge regression is well approximated by that of kernel ridge regression with the infinite-width kernel, which in turn is well approximated by polynomial ridge regression. A surprising phenomenon is the “self-induced” regularization due to the high-degree components of the activation function.
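To make the first approximation concrete, here is a hedged numerical sketch in the same illustrative ReLU, first-layer-only setting as above. NT ridge regression is run as kernel ridge regression with the empirical NT kernel (equivalent by the representer theorem) and compared against kernel ridge regression with the infinite-width kernel, which in this setting has the closed form K_inf(x, y) = <x, y> * (pi - arccos(rho)) / (2 * pi), where rho is the cosine of the angle between x and y. The target function, sample sizes, and ridge parameter below are hypothetical choices for illustration only.

```python
import numpy as np

def nt_kernel(X1, X2, W):
    """Finite-width NT kernel (first layer only, ReLU) between rows of X1 and X2."""
    S1 = (X1 @ W.T > 0).astype(float)
    S2 = (X2 @ W.T > 0).astype(float)
    return (S1 @ S2.T) * (X1 @ X2.T) / W.shape[0]

def infinite_nt_kernel(X1, X2):
    """Infinite-width limit: <x, y> * P_w(<w, x> > 0, <w, y> > 0) for Gaussian w."""
    G = X1 @ X2.T
    norms = np.linalg.norm(X1, axis=1)[:, None] * np.linalg.norm(X2, axis=1)[None, :]
    rho = np.clip(G / norms, -1.0, 1.0)
    return G * (np.pi - np.arccos(rho)) / (2 * np.pi)

def krr_predict(K_train, K_test_train, y, lam):
    """Kernel ridge regression: f(x) = K(x, X) (K + lam I)^{-1} y."""
    alpha = np.linalg.solve(K_train + lam * np.eye(K_train.shape[0]), y)
    return K_test_train @ alpha

rng = np.random.default_rng(1)
n, n_test, d, N = 300, 500, 40, 600
lam = 1e-3                                    # small ridge; lam -> 0 gives min-norm interpolation

def sample_inputs(m):
    Z = rng.standard_normal((m, d))
    return np.sqrt(d) * Z / np.linalg.norm(Z, axis=1, keepdims=True)

X, X_test = sample_inputs(n), sample_inputs(n_test)
f_star = lambda A: A[:, 0] * A[:, 1] + 0.5 * A[:, 0]   # hypothetical low-degree target
y = f_star(X) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, d))
for name, Ktr, Kte in [
    ("NT ridge (finite width)", nt_kernel(X, X, W),      nt_kernel(X_test, X, W)),
    ("KRR (infinite width)",    infinite_nt_kernel(X, X), infinite_nt_kernel(X_test, X)),
]:
    pred = krr_predict(Ktr, Kte, y, lam)
    print(name, "test MSE:", np.mean((pred - f_star(X_test)) ** 2))
```

Sending lam to zero recovers min-norm interpolation; the abstract's claim is that, once Nd >> n, the two test errors printed above are close, and both are in turn captured by polynomial ridge regression with a “self-induced” ridge coming from the high-degree components of the activation.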
Link to the arXiv paper: https://arxiv.org/abs/2007.12826