理论中心前沿系列讲座｜直播：匹配市场上的多臂老虎机算法

2022-11-02 | 作者：微软亚洲研究院

微软亚洲研究院理论中心前沿系列讲座第六期，将于 11 月 4 日（本周五）上午 10:00-11:00 与你相见。这一期，我们请到了上海交通大学约翰·霍普克罗夫计算机科学中心助理教授李帅，带来以“匹配市场上的多臂老虎机算法”为主题的讲座分享，届时请锁定 B 站“微软中国视频中心”直播间！
理论中心前沿系列讲座是微软亚洲研究院的常设系列直播讲座，将邀请全球站在理论研究前沿的研究者介绍他们的研究发现，主题涵盖大数据、人工智能以及其他相关领域的理论进展。通过这一系列讲座，我们期待与各位一起探索当前理论研究的前沿发现，并建立一个活跃的理论研究社区。
欢迎对理论研究感兴趣的老师同学们参与讲座并加入社区（加入方式见后文），共同推动理论研究进步，加强跨学科研究合作，助力打破 AI 发展瓶颈，实现计算机技术实质性发展！

直播信息

直播地址：B 站“微软中国视频中心”直播间
https://live.bilibili.com/730

直播时间：每两周直播一次，时间为周四上午 10:00-11:00（本周变更至周五）
扫码直达直播间

讲座信息

直播时间：11 月 4 日 10:00 - 11:00

Shuai Li is currently an Assistant Professor in the John Hopcroft Center of Shanghai Jiao Tong University. She received PhD degree in Computer Science from the Chinese University of Hong Kong, master degree in Mathematics from University of the Chinese Academy of Sciences and bachelor degree in Mathematics from Zhejiang University. Her research interests include machine learning theory, bandit algorithms and reinforcement learning algorithms. She has published 40+ papers in top machine learning conferences like ICML/NeurIPS/AAAI/IJCAI/KDD and serves as reviewers in these conferences. She is a recipient of Shanghai Sailing Program 2020 and Google PhD fellowship 2018.

报告题目:
Player-optimal Stable Regret for Bandit Learning in Matching Markets
报告摘要:
The problem of matching markets has been studied for a long history in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players’ least-preferred stable matching.
However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players’ profits, the player-optimal stable matching would be the most desirable Though Basu et al. [2021] successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players’ preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm and show that the optimal stable regret of each player can be upper bounded by O(K log T / ∆^2) where K is the number of arms, T is the horizon and ∆ is the players’ minimum preference gap. This result significantly improves previous works which either has a weaker player-pessimal stable matching objective or applies only for markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound This work is accepted at SODA 2023.

上期讲座回顾

在上次讲座中，来自宾夕法尼亚大学的苏炜杰教授分享了神经网络训练中的一些有趣的几何现象。特别地，苏教授探讨了著名的神经坍缩现象。他提出这一现象可以用一种只建模最后一层输出和权重的层剥离模型的优化过程来解释，并据此模型预测了非均衡学习中的一种少数坍缩现象。然后苏教授提出了他和合作者们最新观测到的等分律现象，即神经网络对数据的分离程度逐层随指数状态平稳提升。大家对这些现象背后的深层次原因及进一步的现实意义提出了问题，并得到了苏教授的解答。

回放地址：https://www.bilibili.com/video/BV1vg411a7ra

欢迎扫码加入理论研究社区，与关注理论研究的研究者交流碰撞，群内也将分享微软亚洲研究院理论中心前沿系列讲座的最新信息

也可以向MSRA.TheoryCenter@outlook.com 发送以"Subscribe the Lecture Series"为主题的邮件订阅讲座信息

关于微软亚洲研究院理论中心

2021 年 12 月，微软亚洲研究院理论中心正式成立，期待通过搭建国际学术交流与合作枢纽，促进理论研究与大数据和人工智能技术的深度融合，在推动理论研究进步的同时，加强跨学科研究合作，助力打破 AI 发展瓶颈，实现计算机技术实质性发展。目前，理论中心已经汇集了微软亚洲研究院内部不同团队和研究背景的成员，聚焦于解决包括深度学习、强化学习、动力系统学习和数据驱动优化等领域的基础性问题。

理论中心前沿系列讲座｜直播：匹配市场上的多臂老虎机算法

关注微软亚洲研究院