Academic

Seminars Events Calendar


Filter by
Jump to
Search

Regret Minimization and Best Arm Identification Algorithms for Cascading Bandits

Associate Professor Vincent Y. F. TanDept. of Electrical and Computer Engineering/Dept. of Mathematics, NUS

Date:7 October 2020, Wednesday

Location:ZOOM: https://nus-sg.zoom.us/j/83837608158?pwd=Y0NDYmlVNWY3dlh0MG5oZG9YY0RRdz09

Time:04.00pm - 05.00pm, Singapore time

To join using H.323: https://wiki.nus.edu.sg/display/cit/Making+H.323+or+SIP+Calls

 

We design and analyze TS-Cascade and Cascade-BAI, respectively a Thompson sampling algorithm and a best-arm algorithm within the framework of cascading bandits, a popular model for recommendation systems. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-ã-vis existing UCB-based approaches. We also incorporate the empirical variance of each item’s click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form \tilde{O}(\sqrt{LKT}) where L and K are the number of ground items and the number of items in the chosen list respectively and T ≥ L is the number of Thompson sampling update steps. CascadeBAI is an algorithm for finding the best set of K items, also called an arm. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variables (r.v.’s) which we term as left-sided sub-Gaussian r.v.’s; these are r.v.’s whose cumulant generating functions (CGFs) can be bounded by a quadratic only for non-positive arguments of the CGFs. This enables the application of a sufficiently tight Bernstein-type concentration inequality. We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practically relevant regimes.  

This is joint work with Zixin Zhong (Dept of Maths NUS) and Wang Chi Cheung (Department of Industrial Systems and Management, NUS) and was published in AISTATS 2019 and ICML 2020.