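Talk Registration | ACE Talk Invites University of Chicago Associate Professor Bo Li to Discuss Methods for Evaluating the Safety and Trustworthiness of Large Language Models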

2023-11-20 | Author: Microsoft Research Asia

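The Microsoft Research Asia ACE Talk series invites outstanding rising academic stars to share their research, giving students and researchers a platform to exchange ideas, learn from one another, and keep up with the latest developments.

For the 14th ACE Talk, we have invited Bo Li, an associate professor in the Department of Computer Science at the University of Chicago, to give a talk titled "Assessing Trustworthiness and Risks of Generative Models". She will introduce DecodingTrust, the platform she developed for evaluating the safety and trustworthiness of large language models. Everyone is welcome to register!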

Talk Information

Time: Friday, November 24, 10:00 - 11:30

Location: Online

Agenda:

• Guest talk (10:00-11:00)

• Q&A (11:00-11:30)

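How to Register

Scan the QR code below to fill out the registration form. Once your registration is confirmed, you will receive an email containing the Teams link for the online talk.

Registration deadline: Wednesday, November 22, 12:00

Registration link: https://jinshuju.net/f/Rqt8fQ

About the Speaker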

Dr. Bo Li is an associate professor in the Department of Computer Science at the University of Chicago and the University of Illinois at Urbana-Champaign. She is the recipient of the IJCAI Computers and Thought Award, Alfred P. Sloan Research Fellowship, AI’s 10 to Watch, NSF CAREER Award, MIT Technology Review TR-35 Award, Dean's Award for Excellence in Research, C.W. Gear Outstanding Junior Faculty Award, Intel Rising Star Award, Symantec Research Labs Fellowship, Rising Star Award, research awards from tech companies such as Amazon, Meta, Google, Intel, IBM, and eBay, and best paper awards at several top machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, which lies at the intersection of machine learning, security, privacy, and game theory. She has designed several scalable frameworks for certifiably robust learning and privacy-preserving data publishing. Her work has been featured by several major publications and media outlets, including Nature, Wired, Fortune, and The New York Times.

Talk Abstract

Assessing Trustworthiness and Risks of Generative Models

Large language models have captivated both practitioners and the public with their remarkable capabilities. However, the extent of our understanding regarding the trustworthiness of these models remains limited. The allure of deploying adept generative pre-trained transformer (GPT) models in sensitive domains like healthcare and finance, where prediction errors can carry significant consequences, has prompted the need for a thorough investigation. In response, our recent research endeavors to design a unified trustworthiness evaluation platform for large language models from different safety and trustworthiness perspectives. In this talk, I will briefly introduce our platform DecodingTrust, and discuss our evaluation principles, red-teaming approaches, and findings, with a specific focus on GPT-4 and GPT-3.5.

The DecodingTrust evaluation platform encompasses a wide array of perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness against adversarial demonstrations, privacy, machine ethics, and fairness. Through these diverse lenses, we examine the multifaceted dimensions of trustworthiness that GPT models must adhere to in real-world applications. Our evaluations unearth previously undisclosed vulnerabilities that underscore potential trustworthiness threats. Among our findings, we show that GPT models can be easily steered to produce content that is toxic and biased, raising red flags for their deployment in sensitive contexts. Furthermore, our evaluation identifies instances of private information leakage from both training data and ongoing conversations, unveiling a dimension of concern that necessitates immediate attention. We also find that despite the superior performance of GPT-4 over GPT-3.5 on standard benchmarks, GPT-4 exhibits heightened vulnerability when faced with challenging adversarial prompts, possibly because it adheres to instructions more meticulously, even when those instructions are misleading. This talk aims to shed light on some critical gaps in the trustworthiness of language models and provide an illustrative demonstration of the safety evaluation platform DecodingTrust, hoping to pave the way for more responsible and secure machine learning systems in the future.

About the Host

Dr. Huishuai Zhang is a principal researcher at Microsoft Research Asia. His main research interests include machine learning, privacy, and optimization. His research aims to understand the fundamental limits of machine learning and to design efficient and robust learning algorithms.

About ACE Talk

The ACE (Accelerate, Create, Empower) Talk Series epitomizes our commitment across three dimensions: to accelerate the adoption of cutting-edge research by giving researchers and students a place to share the latest breakthroughs and advancements; to create an environment that nurtures novel ideas and fosters innovative solutions to complex problems; and to empower individuals, including our speakers and audiences, to drive positive change on a broader scale. Through this endeavor, we aspire to enhance global communication and cultivate a diverse academic atmosphere, connecting talented individuals worldwide and ultimately contributing to meaningful change within the academic research community.