Talk Registration | ACE Talk invites Weixin Liang and Hancheng Cao, PhD students in Stanford University's Department of Computer Science, to discuss the role of large language models in paper review
The Microsoft Research Asia ACE Talk series invites outstanding rising academic stars to share their research, providing students and researchers with a platform to learn from one another and keep abreast of frontier developments.
For the 13th ACE Talk, we are delighted to invite Weixin Liang and Hancheng Cao, PhD students in the Department of Computer Science at Stanford University, to present a talk titled "Can large language models provide useful feedback on research papers? A large-scale empirical analysis," introducing how large language models can help researchers with paper review. Everyone is welcome to register!
Time: Friday, November 17, 10:00–11:30
Please scan the QR code below to fill out the registration form. After successful registration, you will receive an email containing the Teams link for the online talk.
Registration deadline: Wednesday, November 15, 12:00
Weixin Liang is a third-year PhD student in Computer Science at Stanford University, advised by Professor James Zou. Previously, he obtained a master's degree in Electrical Engineering from Stanford University and a bachelor's degree in Computer Science from Zhejiang University. His research focuses on trustworthy AI, data-centric AI, and natural language processing.
Hancheng Cao is a final-year PhD candidate in computer science (minor in management science & engineering) at Stanford University, working with Prof. Daniel A. McFarland and Prof. Michael Bernstein. He works on computational social science, social computing, and human-AI interaction. He is a Stanford Interdisciplinary Graduate Fellow, and his work has received three best paper/honorable mention awards at top HCI venues (CHI, CSCW). He has interned at Microsoft Research and the Allen Institute for Artificial Intelligence, and he holds a bachelor's degree in electronic engineering from Tsinghua University.
Can large language models provide useful feedback on research papers? A large-scale empirical analysis.
Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and increasingly intricate knowledge specialization challenge conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain, and junior researchers or those from under-resourced settings have an especially hard time getting timely feedback. With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback on research manuscripts. However, the utility of LLM-generated feedback has not been systematically studied. To address this gap, we created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers. We evaluated the quality of GPT-4's feedback through two large-scale studies. We first quantitatively compared GPT-4's generated feedback with human peer reviewer feedback in 15 Nature family journals (3,096 papers in total) and the ICLR machine learning conference (1,709 papers). The overlap in the points raised by GPT-4 and by human reviewers (average overlap of 30.85% for Nature journals, 39.23% for ICLR) is comparable to the overlap between two human reviewers (average overlap of 28.58% for Nature journals, 35.25% for ICLR). The overlap between GPT-4 and human reviewers is larger for weaker papers (i.e., rejected ICLR papers; average overlap 43.80%). We then conducted a prospective user study with 308 researchers from 110 US institutions in the fields of AI and computational biology to understand how researchers perceive feedback generated by our GPT-4 system on their own papers. Overall, more than half (57.4%) of users found the GPT-4-generated feedback helpful or very helpful, and 82.4% found it more beneficial than feedback from at least some human reviewers. While our findings show that LLM-generated feedback can help researchers, we also identify several limitations.
For example, GPT-4 tends to focus on certain aspects of scientific feedback (e.g., "add experiments on more datasets") and often struggles to provide in-depth critique of method design. Together, our results suggest that LLM and human feedback can complement each other. While human expert review is, and should continue to be, the foundation of the rigorous scientific process, LLM feedback could benefit researchers, especially when timely expert feedback is unavailable and in the earlier stages of manuscript preparation, before peer review. The paper is available at https://arxiv.org/pdf/2310.01783.pdf
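To make the overlap numbers in the abstract concrete, here is a toy sketch of how such an overlap fraction between two reviewers' feedback points could be computed. This is an illustrative simplification only: the study matches feedback points semantically via an automated pipeline, whereas the sketch below (function name `overlap_fraction` and the example comments are my own) uses exact string matching.

```python
def overlap_fraction(points_a, points_b):
    """Fraction of reviewer A's feedback points also raised by reviewer B.

    Toy version using exact string matching; the actual study matches
    points semantically rather than verbatim.
    """
    if not points_a:
        return 0.0
    points_b_set = set(points_b)
    matched = sum(1 for p in points_a if p in points_b_set)
    return matched / len(points_a)


# Hypothetical feedback points for illustration.
llm_points = [
    "add experiments on more datasets",
    "clarify the ablation setup",
    "discuss limitations of the evaluation",
]
human_points = [
    "add experiments on more datasets",
    "improve the related-work section",
]

print(f"{overlap_fraction(llm_points, human_points):.2%}")  # prints 33.33%
```

One of the LLM's three points is also raised by the human reviewer, giving a 1/3 overlap; averaging such fractions over many papers yields aggregate statistics like those reported above.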
Dr. Xing Xie is a Senior Principal Research Manager at Microsoft Research Asia and a guest Ph.D. advisor at the University of Science and Technology of China. He received his B.S. and Ph.D. degrees in Computer Science from the University of Science and Technology of China in 1996 and 2001, respectively. He joined Microsoft Research Asia in July 2001 and works on data mining, social computing, and responsible AI. Over the years, he has published more than 300 papers and won the IEEE MDM 2023 test-of-time award, the ACM SIGKDD 2022 test-of-time award, and the ACM SIGKDD China 2021 test-of-time award, among others.
The ACE (Accelerate, Create, Empower) Talk series epitomizes our commitment across three dimensions: to accelerate the adoption of cutting-edge research by giving researchers and students a venue to share the latest breakthroughs and advancements; to create an environment that nurtures novel ideas and fosters innovative solutions to complex problems; and, at its core, to empower individuals, including our speakers and audiences, to drive positive change on a broader scale. Through this endeavor, we aspire to enhance global communication and cultivate a diverse academic atmosphere, connecting talented individuals worldwide and ultimately contributing to meaningful change within the academic research community.
Past ACE Talk Speakers
Ep.1: Yuhong Zhong, PhD student, Columbia University
Ep.2: Neil Gong, Assistant Professor, Duke University
Ep.3: Tao Yu, Assistant Professor, University of Hong Kong
Ep.4: Jieyu Zhao, Assistant Professor, University of Southern California
Ep.5: Song Han, Associate Professor, Massachusetts Institute of Technology
Ep.6: Diyi Yang, Assistant Professor, Stanford University
Ep.7: Srijan Kumar, Assistant Professor, Georgia Institute of Technology
Ep.8: Haiyi Zhu, Associate Professor, Carnegie Mellon University
Ep.9: Dongkuan Xu, Assistant Professor, North Carolina State University
Ep.10: Raj Reddy, Professor, Carnegie Mellon University
Ep.11: Yi Ma, Professor, University of Hong Kong
Ep.12: Yuan-Sen Ting, Professor, The Ohio State University