Theory Center Frontier Lecture Series | Livestream: Understanding Adam(W) through Proximal Updates, Scale-Freeness, and Relaxed Smoothness

2023-04-26 | By Microsoft Research Asia

The tenth installment of the Microsoft Research Asia Theory Center Frontier Lecture Series will take place this Thursday, April 27, from 9:30 to 10:30 a.m. For this session, we have invited Francesco Orabona, Associate Professor of Electrical & Computer Engineering at Boston University, to give a talk titled "Understanding Adam and AdamW through proximal updates, scale-freeness, and relaxed smoothness." Please tune in to the "微软科技" (Microsoft Tech) livestream room on Bilibili!

The Theory Center Frontier Lecture Series is an ongoing series of livestreamed lectures hosted by Microsoft Research Asia, inviting researchers at the forefront of theoretical research from around the world to present their findings, with topics covering theoretical advances in big data, artificial intelligence, and related fields. Through this series, we hope to explore the frontiers of current theoretical research together with you and to build an active theory research community.

Faculty and students interested in theoretical research are welcome to attend the lectures and join the community (see below for how to join), to jointly advance theoretical research, strengthen interdisciplinary collaboration, help break through the bottlenecks in AI development, and achieve substantive progress in computer technology!

 

Livestream Information

Livestream venue: the "微软科技" (Microsoft Tech) room on Bilibili
https://live.bilibili.com/730
If you would like to interact with the speaker, you are welcome to join via Teams.
Meeting link: https://aka.ms/AAkln3g
Meeting ID: 235 209 711 091
Meeting passcode: ZzSCyB

Livestream time: Thursday, April 27, 9:30-10:30 a.m.

Scan the QR code or click "Read the original" to go straight to the Bilibili livestream room.

 

Lecture Information

Francesco Orabona is an Associate Professor of Electrical & Computer Engineering at Boston University. His research interests lie in online learning, optimization, and statistical learning theory. He obtained his Ph.D. from the University of Genova in 2007. He was previously an Assistant Professor of Computer Science at Stony Brook University, a Senior Research Scientist at Yahoo Labs, and a Research Assistant Professor at the Toyota Technological Institute at Chicago. He received a Faculty Early Career Development (CAREER) Award from the NSF in 2021 and a Google Research Award in 2017.

Talk title:

Understanding Adam and AdamW through proximal updates, scale-freeness, and relaxed smoothness

Abstract:

Adam and AdamW are the most commonly used algorithms for training deep neural networks due to their remarkable performance. However, despite a massive amount of research, it is fair to say that we are still far from understanding the true reasons why they work so well. In this talk, I'll show you some recent results on unique characteristics of Adam and AdamW.

First, I'll show how AdamW can be easily understood as an approximation of a proximal update on the squared L2 regularizer. Next, I'll show that, contrary to Adam, AdamW's update is "scale-free", i.e., its update is invariant to component-wise rescaling of the gradients. I'll show how scale-freeness provides an automatic preconditioning and how it correlates with the better performance of AdamW over Adam in deep learning experiments. Finally, I'll present the first analysis of a (minor) variant of Adam that has a provable advantage over SGD for functions satisfying a relaxed smoothness assumption, such as the objective functions of Transformers.
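
To make the first two points concrete, here is a minimal NumPy sketch (our own illustration, not the speaker's code; the functions adam_l2_step and adamw_step and their hyperparameters are just the usual textbook forms). It contrasts Adam with the L2 penalty folded into the gradient against AdamW, whose decoupled weight decay can be read as a first-order approximation of a proximal step on the squared L2 regularizer, and it checks that, when the eps term is negligible, a single AdamW step is unchanged under a positive component-wise rescaling of the gradient, while the coupled-L2 step is not.

```python
# A minimal sketch (not the speaker's code): Adam with a coupled L2 penalty
# versus AdamW's decoupled weight decay, written in the usual textbook form.
import numpy as np


def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=1e-2):
    """Adam with the L2 penalty folded into the gradient: the regularizer
    passes through the adaptive moments, so the update is not scale-free."""
    g = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """AdamW: weight decay is applied outside the adaptive step, a first-order
    approximation of the proximal step on (weight_decay / 2) * ||w||^2
    (the exact prox would divide w by 1 + lr * weight_decay)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


if __name__ == "__main__":
    # Toy values; weight_decay = 1.0 is exaggerated only so the difference
    # is visible after a single step.
    w0 = np.array([2.0, -2.0])
    g = np.array([-0.5, 0.5])
    c = np.array([8.0, 3.0])  # positive, component-wise rescaling of the gradient

    # With eps = 0, one AdamW step from the same state is identical whether
    # the gradient is g or c * g: the update is scale-free.
    w1, _, _ = adamw_step(w0.copy(), g, np.zeros(2), np.zeros(2), t=1,
                          eps=0.0, weight_decay=1.0)
    w2, _, _ = adamw_step(w0.copy(), c * g, np.zeros(2), np.zeros(2), t=1,
                          eps=0.0, weight_decay=1.0)
    print(np.allclose(w1, w2))  # True

    # The same rescaling changes the Adam-with-coupled-L2 step, because the
    # rescaled gradient shifts the balance against the L2 term inside the moments.
    w3, _, _ = adam_l2_step(w0.copy(), g, np.zeros(2), np.zeros(2), t=1,
                            eps=0.0, weight_decay=1.0)
    w4, _, _ = adam_l2_step(w0.copy(), c * g, np.zeros(2), np.zeros(2), t=1,
                            eps=0.0, weight_decay=1.0)
    print(np.allclose(w3, w4))  # False
```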

 

Recap of the Previous Lecture

In the previous lecture, Professor Xiaotie Deng from Peking University presented his recent work on multi-agent game dynamics. In particular, he examined various research approaches to equilibrium computation and incentive-analysis modeling in game dynamics, and discussed computational complexity, sequential and interactive optimization, and equilibrium analysis in multi-agent systems. Audience members raised questions on incentive-analysis modeling and related topics, which Professor Deng answered.

Replay: https://aka.ms/AAkl7of

 

Join the Theory Research Community

You are welcome to scan the QR code to join the theory research community and exchange ideas with researchers interested in theoretical work; the latest information on the Microsoft Research Asia Theory Center Frontier Lecture Series will also be shared in the group.

[WeChat group QR code]

You can also subscribe to lecture updates by emailing MSRA.TheoryCenter@outlook.com with the subject line "Subscribe the Lecture Series".

 

About the Microsoft Research Asia Theory Center

The Microsoft Research Asia Theory Center was formally established in December 2021. By building a hub for international academic exchange and collaboration, the center aims to promote the deep integration of theoretical research with big data and artificial intelligence technologies, advancing theoretical research while strengthening interdisciplinary collaboration, helping to break through the bottlenecks in AI development, and driving substantive progress in computer technology. The center currently brings together members from different teams and research backgrounds across Microsoft Research Asia, focusing on foundational problems in areas including deep learning, reinforcement learning, dynamical-system learning, and data-driven optimization. For more information about the Theory Center, please visit https://www.microsoft.com/en-us/research/group/msr-asia-theory-center/
