Date: 12 November 2025, Wednesday
Location: S16-05-22
Time: 4pm, Singapore
Online learning is formulated as a sequential decision problem with partial or censored feedback: an agent acts in an unknown, oblivious environment, trading off information against reward to maximize cumulative return. Performance is measured by regret, and the aim is to design algorithms that balance exploration and exploitation and attain optimal finite-sample rates. This formulation serves as a unifying framework for repeated decision problems in digital markets. The presentation focuses on bilateral trade, cast as the problem of learning to broker trades between traders with unknown valuations, and considers multiple reward objectives (gain from trade, fairness, volume) under realistic feedback models (full information, two-bit, asynchronous, one-bit). Techniques are briefly outlined to show how reward, feedback, and structural assumptions govern learnability and the resulting rates.
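
For readers new to the setting, the following minimal Python sketch illustrates how the reward and feedback models named in the abstract might be written down. It is an illustration under stated assumptions, not the speaker's formulation: it assumes a trade at a posted price happens when the price lies between the two traders' valuations, that gain from trade is the gap between the valuations when a trade happens, and that regret is measured against the best fixed price in a finite candidate set. The names `Round`, `trade_happens`, `gain_from_trade`, `feedback`, and `regret` are hypothetical, and the asynchronous feedback model is omitted because the abstract does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    v: float  # valuation of the first trader (hidden from the learner)
    w: float  # valuation of the second trader (hidden from the learner)


def trade_happens(price, r):
    # Assumption: the trade goes through when the price separates the valuations.
    return min(r.v, r.w) <= price <= max(r.v, r.w)


def gain_from_trade(price, r):
    # Assumption: reward is the valuation gap if the trade happens, else zero.
    return abs(r.v - r.w) if trade_happens(price, r) else 0.0


def feedback(price, r, model):
    # What the learner observes after posting `price`, per feedback model.
    if model == "full":
        return (r.v, r.w)                    # both valuations revealed
    if model == "two-bit":
        return (price <= r.v, price <= r.w)  # one comparison bit per trader
    if model == "one-bit":
        return trade_happens(price, r)       # only whether the trade occurred
    raise ValueError(f"unknown feedback model: {model}")


def regret(prices, rounds, candidates):
    # Cumulative reward of the best fixed candidate price minus the learner's.
    earned = sum(gain_from_trade(p, r) for p, r in zip(prices, rounds))
    best = max(sum(gain_from_trade(c, r) for r in rounds) for c in candidates)
    return best - earned
```

Under these assumptions the same posted price yields progressively less information as one moves from full to two-bit to one-bit feedback, which is the sense in which the feedback model can affect the achievable regret rates.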