A large-scale empirical study of commit message generation: models, datasets and evaluation
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric used in previous works affect the evaluation. (2) Most datasets are crawled only from Java repositories while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. (4) For pre-trained models, fune-tuning with different multi-programming-language combinations can influence their performance. Based on these findings, we collect a large-scale, information-rich, Multi-language Commit Message Dataset (MCMD). Using MCMD, we conduct extensive experiments under different experiment settings including splitting strategies and multi-programming-language combinations. Furthermore, we provide suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe our work can help practitioners and researchers better evaluate and select models for automatic commit message generation.
DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms
User engagement prediction plays a critical role for designing interaction strategies to grow user engagement and increase revenue in online social platforms. Through the in-depth analysis of the real-world data from the world’s largest professional social platforms, i.e., LinkedIn, we find that users expose diverse engagement patterns, and a major reason for the differences in user engagement patterns is that users have different intents. That is, people have different intents when using LinkedIn, e.g., applying for jobs, building connections, or checking notifications, which shows quite different engagement patterns. Meanwhile, user intents and the corresponding engagement patterns may change over time. Although such pattern differences and dynamics are essential for user engagement prediction, differentiating user engagement patterns based on user dynamic intents for better user engagement forecasting has not received enough attention in previous works. In this paper, we proposed a Dynamic Intent Guided Meta Network (DIGMN), which can explicitly model user intent varying with time and perform differentiated user engagement forecasting. Specifically, we derive some interpretable basic user intents as prior knowledge from data mining and introduce prior intents in explicitly modeling dynamic user intent. Furthermore, based on the dynamic user intent representations, we propose a meta predictor to perform differentiated user engagement forecasting. Through a comprehensive evaluation on LinkedIn anonymous user data, our method outperforms state-of-the-art baselines significantly, i.e., 2.96% and 3.48% absolute error reduction, on coarse-grained and fine-grained user engagement prediction tasks, respectively, demonstrating the effectiveness of our method.
VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce Mixture-of-Modality-Experts (MoME) Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer. Because of the modeling flexibility of MoME, pretrained VLMo can be fine-tuned as a fusion encoder for vision-language classification tasks, or used as a dual encoder for efficient image-text retrieval. Moreover, we propose a stagewise pre-training strategy, which effectively leverages large-scale image-only and text-only data besides image-text pairs. Experimental results show that VLMo achieves state-of-the-art results on various vision-language tasks, including VQA and NLVR2. The code and pretrained models are available at https://aka.ms/vlmo.
Visual Concepts Tokenization
Obtaining the human-like perception ability of abstracting visual concepts from concrete pixels has always been a fundamental and important target in machine learning research fields such as disentangled representation learning and scene decomposition. Towards this goal, we propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, to perceive an image into a set of disentangled visual concept tokens, with each concept token responding to one type of independent visual concept. Particularly, to obtain these concept tokens, we only use cross-attention to extract visual information from the image tokens layer by layer without self-attention between concept tokens, preventing information leakage across concept tokens. We further propose a Concept Disentangling Loss to facilitate that different concept tokens represent independent visual concepts. The cross-attention and disentangling loss play the role of induction and mutual exclusion for the concept tokens, respectively. Extensive experiments on several popular datasets verify the effectiveness of VCT on the tasks of disentangled representation learning and scene decomposition. VCT achieves the state-of-the-art results by a large margin.
Revisiting Code Search in a Two-Stage Paradigm
With a good code search engine, developers can reuse existing code snippets and accelerate software development process. Current code search methods can be divided into two categories: traditional information retrieval (IR) based and deep learning (DL) based approaches. DL-based approaches include the cross-encoder paradigm and the bi-encoder paradigm. However, both approaches have certain limitations. The inference of IR-based and bi-encoder models are fast, however, they are not accurate enough; while cross-encoder models can achieve higher search accuracy but consume more time. In this work, we propose TOSS, a two-stage fusion code search framework that can combine the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates, and then uses fine-grained cross-encoders for finer ranking. Furthermore, we conduct extensive experiments on different code candidate volumes and multiple programming languages to verify the effectiveness of TOSS. We also compare TOSS with six data fusion methods. Experimental results show that TOSS is not only efficient, but also achieves state-of-the-art accuracy with an overall mean reciprocal ranking (MRR) score of 0.763, compared to the best baseline result on the CodeSearchNet benchmark of 0.713.
你真的了解计算生物学和AI for Science吗?
微软亚洲研究院副院长刘铁岩、首席研究员邵斌和主管研究员王童介绍了微软亚洲研究院计算生物学领域的最新研究,并对未来 AI for Science 的发展和融合进行了分享
气候变化、流行病、发展鸿沟…… 应对这些挑战我们还要做些什么?
使用 AI 来帮助环境学家更精确地估算 NOx、SO2、VOC、一次 PM2.5 等污染物的排放量,并且把相对的估计误差降低了20%,极大提高了估算精度
华东师范大学上海智能教育研究院郑蝉金副教授与微软亚洲研究院首席开发经理夏炎就 AI 与教育的结合和落地等问题进行了一场深入探讨的对话
了解微软亚洲研究院在 AI 音乐创作领域的一系列研究成果,以及当前AI音乐生成所面临的研究挑战