Academic Papers

A large-scale empirical study of commit message generation: models, datasets and evaluation
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric used in previous works affect the evaluation. (2) Most datasets are crawled only from Java repositories, while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. (4) For pre-trained models, fine-tuning with different multi-programming-language combinations can influence their performance. Based on these findings, we collect a large-scale, information-rich, Multi-language Commit Message Dataset (MCMD). Using MCMD, we conduct extensive experiments under different experimental settings, including splitting strategies and multi-programming-language combinations. Furthermore, we provide suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe our work can help practitioners and researchers better evaluate and select models for automatic commit message generation.
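Finding (1) is easy to reproduce at the sentence level: the same hypothesis/reference pair receives noticeably different scores under different BLEU smoothing variants. Below is a minimal, hypothetical illustration using NLTK; the commit messages are invented and this is not the paper's evaluation code:

```python
# Minimal sketch (not the paper's evaluation code) showing how BLEU
# variants can score the same generated commit message differently.
# Uses NLTK's sentence-level BLEU with different smoothing methods.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["fix", "null", "pointer", "exception", "in", "parser"]
candidate = ["fix", "npe", "in", "parser"]

smooth = SmoothingFunction()
# Short messages often have zero higher-order n-gram matches, so the
# chosen smoothing variant changes the reported score substantially.
for name, fn in [("no smoothing", smooth.method0),
                 ("method1 (add-epsilon)", smooth.method1),
                 ("method4 (short-sentence)", smooth.method4)]:
    score = sentence_bleu([reference], candidate, smoothing_function=fn)
    print(f"{name:24s} BLEU = {score:.4f}")
```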
DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms
User engagement prediction plays a critical role in designing interaction strategies to grow user engagement and increase revenue in online social platforms. Through an in-depth analysis of real-world data from the world's largest professional social platform, LinkedIn, we find that users exhibit diverse engagement patterns, and that a major reason for these differences is that users have different intents. That is, people have different intents when using LinkedIn, e.g., applying for jobs, building connections, or checking notifications, and these intents lead to quite different engagement patterns. Meanwhile, user intents and the corresponding engagement patterns may change over time. Although such pattern differences and dynamics are essential for user engagement prediction, differentiating user engagement patterns based on dynamic user intents for better engagement forecasting has not received enough attention in previous works. In this paper, we propose a Dynamic Intent Guided Meta Network (DIGMN), which can explicitly model user intent varying with time and perform differentiated user engagement forecasting. Specifically, we derive interpretable basic user intents as prior knowledge from data mining and incorporate these prior intents into the explicit modeling of dynamic user intent. Furthermore, based on the dynamic user intent representations, we propose a meta predictor to perform differentiated user engagement forecasting. In a comprehensive evaluation on anonymized LinkedIn user data, our method significantly outperforms state-of-the-art baselines, with absolute error reductions of 2.96% and 3.48% on coarse-grained and fine-grained user engagement prediction tasks, respectively, demonstrating its effectiveness.
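As a rough illustration of the meta-predictor idea above, the sketch below uses a hyper-network that maps a dynamic intent vector to the weights of a small per-user predictor, so users with different intents are scored by different predictors. All names, sizes, and the hyper-network design are assumptions made for illustration, not DIGMN's actual architecture:

```python
# Hedged sketch of a meta predictor: an intent representation generates
# the parameters of a small MLP applied to the user's behavior features.
# Layer sizes and names are illustrative assumptions, not DIGMN's code.
import torch
import torch.nn as nn

class MetaPredictor(nn.Module):
    def __init__(self, intent_dim: int, feat_dim: int, hidden: int = 32):
        super().__init__()
        self.feat_dim, self.hidden = feat_dim, hidden
        # Hyper-network: intent vector -> parameters of a one-hidden-layer MLP.
        n_params = feat_dim * hidden + hidden + hidden + 1
        self.hyper = nn.Linear(intent_dim, n_params)

    def forward(self, intent: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        p = self.hyper(intent)                                   # (B, n_params)
        i = 0
        w1 = p[:, i:i + self.feat_dim * self.hidden].view(-1, self.feat_dim, self.hidden)
        i += self.feat_dim * self.hidden
        b1 = p[:, i:i + self.hidden]; i += self.hidden
        w2 = p[:, i:i + self.hidden].unsqueeze(-1); i += self.hidden
        b2 = p[:, i:i + 1]
        # Apply the generated per-user predictor to the behavior features.
        h = torch.relu(torch.bmm(feats.unsqueeze(1), w1).squeeze(1) + b1)
        return torch.bmm(h.unsqueeze(1), w2).squeeze(1) + b2     # (B, 1) score

intent = torch.randn(8, 16)   # dynamic intent representation per user
feats = torch.randn(8, 24)    # behavior features per user
print(MetaPredictor(16, 24)(intent, feats).shape)  # torch.Size([8, 1])
```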
VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
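The three-stage recipe can be outlined as follows. Every component in this sketch is a generic stand-in chosen for illustration (an ImageNet-pretrained ResNet, behavior cloning in place of the paper's offline RL update), not the actual VRL3 implementation:

```python
# Hedged outline of the three-stage recipe described in the abstract.
import torch
import torchvision.models as models

def stage1_pretrained_encoder() -> torch.nn.Module:
    # Stage 1: task-agnostic visual representation from non-RL data.
    enc = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    enc.fc = torch.nn.Identity()  # keep the features, drop the classifier
    return enc

def stage2_offline_finetune(encoder, policy, demos, lr=1e-4):
    # Stage 2: convert task-agnostic features into task-specific ones using
    # offline data; plain behavior cloning is used here as a stand-in for
    # the paper's offline RL updates.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=lr)
    for obs, action in demos:
        loss = torch.nn.functional.mse_loss(policy(encoder(obs)), action)
        opt.zero_grad(); loss.backward(); opt.step()

encoder = stage1_pretrained_encoder()
policy = torch.nn.Linear(512, 4)                             # assumed 4-dim actions
demos = [(torch.randn(2, 3, 224, 224), torch.randn(2, 4))]  # toy demo batch
stage2_offline_finetune(encoder, policy, demos)
# Stage 3 (not sketched): fine-tune the same encoder and policy with
# online RL, e.g. a standard off-policy algorithm.
```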
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce Mixture-of-Modality-Experts (MoME) Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer. Because of the modeling flexibility of MoME, pretrained VLMo can be fine-tuned as a fusion encoder for vision-language classification tasks, or used as a dual encoder for efficient image-text retrieval. Moreover, we propose a stagewise pre-training strategy, which effectively leverages large-scale image-only and text-only data besides image-text pairs. Experimental results show that VLMo achieves state-of-the-art results on various vision-language tasks, including VQA and NLVR2. The code and pretrained models are available at https://aka.ms/vlmo.
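A MoME-style block can be sketched as a standard Transformer block whose feed-forward sub-layer is replaced by a pool of modality-specific experts, while the self-attention sub-layer is shared. The sizes and routing rule below are simplified assumptions, not VLMo's released code:

```python
# Simplified Mixture-of-Modality-Experts (MoME) style block: a shared
# self-attention layer plus a pool of modality-specific feed-forward
# experts selected by the input modality.
import torch
import torch.nn as nn

class MoMEBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One feed-forward expert per modality: vision, language, and
        # vision-language (for fused image-text sequences).
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
            for name in ["vision", "language", "vl"]
        })

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # shared attention
        x = x + self.experts[modality](self.norm2(x))      # modality expert
        return x

block = MoMEBlock()
img_tokens = torch.randn(2, 50, 256)
txt_tokens = torch.randn(2, 16, 256)
print(block(img_tokens, "vision").shape, block(txt_tokens, "language").shape)
```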
Visual Concepts Tokenization
Obtaining the human-like ability to abstract visual concepts from concrete pixels has always been a fundamental and important goal in machine learning research fields such as disentangled representation learning and scene decomposition. Towards this goal, we propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, which perceives an image as a set of disentangled visual concept tokens, with each concept token corresponding to one type of independent visual concept. In particular, to obtain these concept tokens, we use only cross-attention to extract visual information from the image tokens layer by layer, without self-attention between concept tokens, preventing information leakage across concept tokens. We further propose a Concept Disentangling Loss to encourage different concept tokens to represent independent visual concepts. The cross-attention and the disentangling loss play the roles of induction and mutual exclusion for the concept tokens, respectively. Extensive experiments on several popular datasets verify the effectiveness of VCT on the tasks of disentangled representation learning and scene decomposition. VCT achieves state-of-the-art results by a large margin.
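The core mechanism, concept tokens that read from image tokens through cross-attention only, with no self-attention among themselves, can be sketched as follows. Sizes, depth, and normalization choices are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch of cross-attention-only concept tokenization: learnable
# concept tokens query the image tokens layer by layer; concept tokens
# never attend to each other, which prevents information leakage.
import torch
import torch.nn as nn

class ConceptTokenizer(nn.Module):
    def __init__(self, n_concepts: int = 8, dim: int = 128, depth: int = 3):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(1, n_concepts, dim))
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
             for _ in range(depth)])
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # Queries are concept tokens; keys/values are image tokens only.
        c = self.concepts.expand(image_tokens.size(0), -1, -1)
        for attn, norm in zip(self.layers, self.norms):
            out, _ = attn(norm(c), image_tokens, image_tokens)
            c = c + out
        return c  # (B, n_concepts, dim) disentangled concept tokens

tokens = torch.randn(4, 196, 128)        # e.g. 14x14 patch embeddings
print(ConceptTokenizer()(tokens).shape)  # torch.Size([4, 8, 128])
```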
Revisiting Code Search in a Two-Stage Paradigm
With a good code search engine, developers can reuse existing code snippets and accelerate the software development process. Current code search methods fall into two categories: traditional information retrieval (IR) based approaches and deep learning (DL) based approaches, the latter comprising the cross-encoder paradigm and the bi-encoder paradigm. However, each approach has limitations: IR-based and bi-encoder models offer fast inference but insufficient accuracy, while cross-encoder models achieve higher search accuracy at a much greater computational cost. In this work, we propose TOSS, a two-stage fusion code search framework that combines the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates, and then uses cross-encoders for fine-grained reranking. Furthermore, we conduct extensive experiments on different code candidate volumes and multiple programming languages to verify the effectiveness of TOSS. We also compare TOSS with six data fusion methods. Experimental results show that TOSS is not only efficient, but also achieves state-of-the-art accuracy, with an overall mean reciprocal rank (MRR) score of 0.763 compared to the best baseline result of 0.713 on the CodeSearchNet benchmark.
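The recall-then-rerank pattern underlying TOSS can be sketched in a few lines. Both scoring functions below are deliberately trivial placeholders for the real IR/bi-encoder and cross-encoder models:

```python
# Two-stage search sketch: a cheap first stage recalls top-k candidates,
# then a more expensive fine-grained scorer reranks only those k.
import numpy as np

def bi_encoder_score(query_vec: np.ndarray, code_vecs: np.ndarray) -> np.ndarray:
    # Stage 1: fast dense retrieval via cosine similarity on precomputed
    # embeddings (stand-in for an IR or bi-encoder model).
    q = query_vec / np.linalg.norm(query_vec)
    c = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    return c @ q

def cross_encoder_score(query: str, snippet: str) -> float:
    # Stage 2 placeholder: a real cross-encoder jointly encodes the
    # (query, snippet) pair, which is accurate but too slow to apply to
    # the whole corpus, hence it runs only on the recalled candidates.
    return float(len(set(query.split()) & set(snippet.split())))

def two_stage_search(query, query_vec, corpus, code_vecs, k=10):
    recalled = np.argsort(-bi_encoder_score(query_vec, code_vecs))[:k]
    reranked = sorted(recalled,
                      key=lambda i: cross_encoder_score(query, corpus[i]),
                      reverse=True)
    return reranked  # candidate indices, best first

corpus = ["def add(a, b): return a + b", "def read_file(path): ..."]
vecs = np.random.rand(2, 64)
print(two_stage_search("add two numbers", np.random.rand(64), corpus, vecs, k=2))
```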
Research Topics
Privacy Computing
The Future of Blockchain from the Perspectives of Data Analysis and Cryptography
Drawing on two recent papers from the Trustworthy Systems group at Microsoft Research Asia, this article introduces the current state and technical trends of blockchain technology from the perspectives of data analysis and cryptography.
Latest Collaborative Research by Microsoft Research Asia with Nanjing University, USTC, and Others Enables Efficient Model Inference and Privacy Protection
Learn about the latest research on power optimization, efficient inference, and innovative privacy protection techniques.
New Advances in Machine Learning Privacy Research: Data Augmentation Risks Are Underestimated, and a New Algorithm Tames Dimensionality Dependence
This article presents Microsoft Research Asia's latest advances in machine learning privacy research and discusses privacy attacks and defenses in deep learning.
AI for Science
AI for Science (AI4Science): Empowering the Fifth Paradigm of Scientific Discovery
Microsoft Research establishes a new AI for Science team dedicated to making the fifth paradigm a reality.
Do You Really Understand Computational Biology and AI for Science?
Tie-Yan Liu, Vice President of Microsoft Research Asia, Principal Researcher Bin Shao, and Senior Researcher Tong Wang introduce the lab's latest research in computational biology and share their views on the future development and convergence of AI for Science.
AI Advances into the Life Sciences: Molecular Dynamics Simulation Accelerates Research on the Pathogenic Mechanisms of SARS-CoV-2
In collaboration with Tsinghua University, Microsoft Research Asia used molecular dynamics simulation to achieve important results on the mechanisms of SARS-CoV-2.
Sustainable Development
Climate Change, Pandemics, the Development Divide... What More Must We Do to Meet These Challenges?
Microsoft scientists from around the world discuss building a resilient and sustainable global society.
Microsoft Launches the Climate Research Initiative, Partnering with the Global Academic Community to Advance Transformative Innovation in Climate Science
Microsoft joins forces with domain experts to explore and empower global sustainable development.
How Can Deep Learning Improve the Estimation of Air Pollutant Emissions?
Using AI to help environmental scientists estimate emissions of pollutants such as NOx, SO2, VOCs, and primary PM2.5 more accurately, cutting relative estimation error by 20% and greatly improving precision.
Industry Empowerment
Deep Integration of AI and Education: What Is the Core Problem?
Chanjin Zheng, Associate Professor at the Shanghai Institute of Artificial Intelligence for Education at East China Normal University, and Yan Xia, Principal Development Manager at Microsoft Research Asia, held an in-depth dialogue on integrating AI with education and bringing it into practice.
Xu Tan: AI Music, Where Technology Meets Art
Learn about Microsoft Research Asia's research achievements in AI music composition and the research challenges currently facing AI music generation.
What Roles Can AI and Machine Learning Play in Materials Science Research?
Professor Lin-Wang Wang, Chief Scientist at the Institute of Semiconductors, Chinese Academy of Sciences, and Dr. Tie-Yan Liu, Vice President of Microsoft Research Asia, held an in-depth dialogue on the current state of materials science research, its challenges and open problems, and the directions and unresolved questions for applying AI in materials science.
Research Activities