
2023-04-26 | Author: Microsoft Research Asia

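The tenth session of the Microsoft Research Asia Theory Center Frontier Lecture Series will take place on April 27 (this Thursday), 9:30-10:30 a.m. For this session we have invited Francesco Orabona, Associate Professor of Electrical & Computer Engineering at Boston University, to give a talk titled “Understanding Adam and AdamW through proximal updates, scale-freeness, and relaxed smoothness”. Tune in to the “微软科技” live room on Bilibili!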


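Faculty and students interested in theoretical research are welcome to attend the lecture and join the community (see below for how to join). Together we can advance theoretical research, strengthen interdisciplinary collaboration, help break through bottlenecks in AI development, and achieve substantive progress in computing technology!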



Live stream: “微软科技” live room on Bilibili
If you would like to interact with the speaker, you are welcome to join via Teams
Meeting ID: 235 209 711 091

Time: April 27 (this Thursday), 9:30 - 10:30




Francesco Orabona is an Associate Professor of Electrical & Computer Engineering at Boston University. His research interests lie in online learning, optimization, and statistical learning theory. He obtained his Ph.D. from the University of Genova in 2007. He was previously an Assistant Professor of Computer Science at Stony Brook University, a Senior Research Scientist at Yahoo Labs, and a Research Assistant Professor at the Toyota Technological Institute at Chicago. He received a Faculty Early Career Development (CAREER) award from NSF in 2021 and a Google Research Award in 2017.


Understanding Adam and AdamW through proximal updates, scale-freeness, and relaxed smoothness


Adam and AdamW are the most commonly used algorithms for training deep neural networks due to their remarkable performance. However, despite a massive amount of research, it is fair to say that we are still far from understanding the true reasons why they work so well. In this talk, I'll show you some recent results on unique characteristics of Adam and AdamW.

First, I'll show how AdamW can be easily understood as an approximation of a proximal update on the squared L2 regularizer. Next, I'll show that, contrary to Adam, AdamW's update is "scale-free", i.e., its update is invariant to component-wise rescaling of the gradients. I'll show how scale-freeness provides an automatic preconditioning and how it correlates with the better performance of AdamW over Adam on deep learning experiments. Finally, I'll show the first analysis of a (minor) variant of Adam that has a provable advantage over SGD for functions that satisfy a relaxed smoothness assumption, like the objective functions of Transformers.
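
The scale-freeness claim can be checked numerically with a small sketch. The code below is not taken from the talk: it is a minimal pure-Python illustration that assumes the standard Adam moment estimates with bias correction, sets the epsilon term to 0, and feeds both variants a fixed random gradient sequence. The helper names (run_adam, max_gap) and all hyperparameter values are hypothetical choices made for this example.

```python
import math
import random


def run_adam(grads, scales, decoupled, lr=0.1, wd=0.1, b1=0.9, b2=0.999):
    """Run Adam-style updates on a fixed gradient sequence (epsilon assumed 0).

    grads:     list of gradient vectors, one per step
    scales:    positive per-coordinate factors applied to every gradient
    decoupled: True  -> AdamW-style decoupled weight decay
               False -> Adam with the L2 term folded into the gradient
    """
    dim = len(scales)
    x = [1.0] * dim   # parameters
    m = [0.0] * dim   # first-moment estimate
    v = [0.0] * dim   # second-moment estimate
    for t, grad in enumerate(grads, start=1):
        for i in range(dim):
            g = scales[i] * grad[i]
            if not decoupled:
                g += wd * x[i]  # L2 regularization enters the gradient
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            step = lr * m_hat / math.sqrt(v_hat)
            if decoupled:
                x[i] = x[i] - lr * wd * x[i] - step  # decay applied outside the ratio
            else:
                x[i] = x[i] - step
    return x


def max_gap(a, b):
    """Largest coordinate-wise difference between two parameter vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


random.seed(0)
grads = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
scales = [1e-3, 1.0, 1e3]   # component-wise rescaling of the gradients
ones = [1.0, 1.0, 1.0]      # no rescaling

adamw_gap = max_gap(run_adam(grads, scales, True), run_adam(grads, ones, True))
adam_gap = max_gap(run_adam(grads, scales, False), run_adam(grads, ones, False))
print(f"AdamW gap under rescaling:     {adamw_gap:.2e}")
print(f"Adam + L2 gap under rescaling: {adam_gap:.2e}")
```

Under these assumptions the decoupled variant should end at the same point with or without rescaling, up to floating-point rounding, because the scale factor cancels in the ratio of the moment estimates. In the coupled variant the regularization term is mixed into the gradient before normalization, so the cancellation no longer holds and the two runs drift apart.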









You can also subscribe to lecture announcements by sending an email with the subject "Subscribe the Lecture Series" to MSRA.TheoryCenter@outlook.com



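The Microsoft Research Asia Theory Center was officially established in December 2021. It aims to build an international hub for academic exchange and collaboration, to promote the deep integration of theoretical research with big data and artificial intelligence technologies, and, while advancing theoretical research, to strengthen interdisciplinary collaboration, help break through bottlenecks in AI development, and achieve substantive progress in computing technology. The Theory Center currently brings together members from different teams and research backgrounds within Microsoft Research Asia, and focuses on fundamental problems in areas including deep learning, reinforcement learning, dynamical systems learning, and data-driven optimization. For more information about the Theory Center, visit https://www.microsoft.com/en-us/research/group/msr-asia-theory-center/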
