Despite the development of many new optimization techniques, constant step-size stochastic gradient descent remains the tool of choice in modern machine learning. It performs unreasonably well in practice, both in convex settings where there are poor guarantees, and also in non-convex settings, where there are often no guarantees. In this talk, we will sketch a proof of why it works for phase retrieval. This is the problem of solving systems of rank-1 quadratic equations, which can be formulated as the optimization of a non-convex, non-smooth objective.
Our analysis makes use of several new probabilistic arguments, including the "anti-concentration on wedges" property and a "summary state space" analysis.