Academic Papers

LEMON: Language-Based Environment Manipulation via Execution-Guided Pre-training
Language-based environment manipulation requires agents to manipulate an environment by following natural language instructions, which is challenging due to the huge space of possible environments. Various approaches have been proposed in recent work to address this challenge. Although these approaches work well for their intended environments, they are difficult to generalize across environments. In this work, we propose LEMON, a general framework for language-based environment manipulation tasks. Specifically, we first specify a task-agnostic approach to language-based environment manipulation that can handle various environments using the same generative language model. We then propose an execution-guided pre-training strategy to inject prior knowledge of environments into the language model using a purely synthetic pre-training corpus. Experimental results on Alchemy, Scene, Tangrams, ProPara, and Recipes demonstrate the effectiveness of LEMON: it achieves new state-of-the-art results on four of the tasks, and the execution-guided pre-training strategy brings remarkable improvements on all of them.
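To make the idea of an execution-guided synthetic pre-training corpus concrete, the minimal sketch below generates (state, action sequence, resulting state) pairs by executing random actions in a toy Alchemy-like beaker environment. The environment, action names, and text encoding are illustrative assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of execution-guided synthetic data generation in a toy
# Alchemy-like environment (an assumption for illustration, not LEMON's pipeline).
import random

def init_state(num_beakers=3):
    # Each beaker holds a stack of colored units, e.g. ["r", "g"].
    colors = "rgb"
    return [[random.choice(colors) for _ in range(random.randint(0, 2))]
            for _ in range(num_beakers)]

def execute(state, action):
    # Apply a single action ("pour i j" or "drain i") and return the new state.
    state = [list(b) for b in state]
    op, *args = action.split()
    if op == "pour":
        i, j = map(int, args)
        state[j].extend(state[i])
        state[i] = []
    elif op == "drain":
        i = int(args[0])
        if state[i]:
            state[i].pop()
    return state

def encode(state):
    # Serialize the environment state as plain text for a seq2seq model.
    return " | ".join("".join(b) or "_" for b in state)

def synthesize_example(num_actions=2):
    # Sample random actions, execute them, and emit a (source, target) pair
    # that a generative language model could be pre-trained on.
    start = init_state()
    state, actions = start, []
    for _ in range(num_actions):
        if random.random() < 0.5:
            i, j = random.sample(range(len(start)), 2)
            actions.append(f"pour {i} {j}")
        else:
            actions.append(f"drain {random.randrange(len(start))}")
        state = execute(state, actions[-1])
    source = f"state: {encode(start)} ; actions: {' , '.join(actions)}"
    return source, encode(state)

if __name__ == "__main__":
    for _ in range(3):
        print(synthesize_example())
```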
TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Bases
Pre-trained language models (PLMs) have shown their effectiveness in multiple scenarios. However, question answering over knowledge bases (KBQA) remains challenging, especially with respect to coverage and generalization. This is due to two main factors: i) understanding the semantics of both the question and the relevant knowledge from the KB; ii) generating executable logical forms with both semantic and syntactic correctness. In this paper, we present a new KBQA model, TIARA, which addresses these issues by applying multi-grained retrieval to help the PLM focus on the most relevant KB contexts, viz., entities, exemplary logical forms, and schema items. Moreover, constrained decoding is used to control the output space and reduce generation errors. Experiments on important benchmarks demonstrate the effectiveness of our approach. TIARA outperforms previous state-of-the-art models, including those using PLMs or oracle entity annotations, by at least 4.1 and 1.1 F1 points on GrailQA and WebQuestionsSP, respectively. Code is available at https://github.com/microsoft/KC/tree/main/papers/TIARA.
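As an illustration of constrained decoding in general, the sketch below builds a prefix trie over a few hypothetical logical-form token sequences and reports which tokens may legally follow a partially generated prefix; a beam-search decoder would mask out every other token. The schema items and tokenization are assumptions; TIARA's actual decoder operates over PLM sub-word vocabularies and S-expressions.

```python
# A minimal sketch of trie-based constrained decoding (illustrative only).
from typing import Dict, List

class PrefixTrie:
    """Stores allowed token sequences and answers 'what may come next?'."""
    def __init__(self, sequences: List[List[str]]):
        self.root: Dict = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: List[str]) -> List[str]:
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []          # prefix already violates the constraints
            node = node[tok]
        return list(node.keys())   # empty list means the sequence is complete

# Hypothetical schema items retrieved from the KB for one question.
schema_items = [
    ["(", "JOIN", "people.person.place_of_birth", "m.entity_1", ")"],
    ["(", "JOIN", "people.person.nationality", "m.entity_1", ")"],
]
trie = PrefixTrie(schema_items)

# During decoding, only the tokens returned here would be allowed next.
print(trie.allowed_next(["(", "JOIN"]))
# -> ['people.person.place_of_birth', 'people.person.nationality']
```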
SPINE: A Scalable Log Parser with Feedback Guidance
Log parsing, which extracts log templates and parameters, is a critical prerequisite for automated log analysis techniques. Although existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in industry. By studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. First, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Second, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome these challenges, we propose SPINE, a highly scalable log parser with user feedback guidance. Building on a log parser equipped with initial grouping and progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization on large-scale imbalanced log data. In addition, we introduce user feedback to enable the parser to adapt quickly to evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, outperforming state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, where it can parse 30 million logs in less than 8 minutes with 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE consistently achieves good accuracy under log evolution with a moderate amount of user feedback.
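For readers unfamiliar with the task, the toy sketch below shows what log parsing produces: a constant template with variable tokens replaced by wildcards, plus the extracted parameters, grouped by template. It only illustrates the input/output shape under a simple regex heuristic; SPINE's initial grouping, progressive clustering, scheduling, and feedback mechanisms are not reproduced here.

```python
# A toy illustration of log template and parameter extraction (not SPINE itself).
import re
from collections import defaultdict

# Heuristic for "variable-looking" tokens: numbers, hex values, addresses, paths.
VARIABLE = re.compile(r"^(\d+|0x[0-9a-fA-F]+|[\d.]+:\d+|/\S*)$")

def parse(line: str):
    template, params = [], []
    for token in line.split():
        if VARIABLE.match(token):
            template.append("<*>")   # replace a variable token with a wildcard
            params.append(token)
        else:
            template.append(token)
    return " ".join(template), params

logs = [
    "Connection from 10.0.0.1:8080 closed after 35 ms",
    "Connection from 10.0.0.7:9090 closed after 120 ms",
    "Disk /dev/sda1 usage at 91 percent",
]

groups = defaultdict(list)
for line in logs:
    template, params = parse(line)
    groups[template].append(params)

for template, param_lists in groups.items():
    print(template, param_lists)
```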
Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems
Numerical Question Answering is the task of answering questions that require numerical capabilities. Previous work introduces general adversarial attacks for Numerical Question Answering but does not systematically explore the numerical capabilities specific to the task. In this paper, we propose to conduct numerical capability diagnosis on a series of Numerical Question Answering systems and datasets. A series of numerical capabilities are highlighted, and corresponding dataset perturbations are designed. Empirical results indicate that existing systems are severely challenged by these perturbations. For example, Graph2Tree suffers a 53.83% absolute accuracy drop under the “Extra” perturbation on ASDiv-a, and BART suffers a 13.80% accuracy drop under the “Language” perturbation on the numerical subset of DROP. As a countermeasure, we also investigate the effectiveness of applying these perturbations as data augmentation to mitigate systems’ lack of robust numerical capabilities. Our experimental analysis and empirical studies demonstrate that Numerical Question Answering with robust numerical capabilities remains, to a large extent, an open problem. We discuss future directions for Numerical Question Answering and summarize guidelines for future dataset collection and system design.
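The sketch below illustrates the general idea of perturbation-based augmentation for numerical questions: substitute the numbers in a word problem and recompute the gold answer. The example problem and answer formula are hypothetical, and the paper's specific perturbations (e.g., “Extra”, “Language”) are defined over particular datasets rather than by this simple substitution.

```python
# A toy number-substitution perturbation for numerical QA data augmentation
# (an illustrative assumption, not the paper's perturbation suite).
import random
import re

def perturb(question, answer_fn, num_variants=3):
    """Generate perturbed copies of a word problem with recomputed answers."""
    numbers = [int(n) for n in re.findall(r"\d+", question)]
    variants = []
    for _ in range(num_variants):
        new_numbers = [random.randint(2, 50) for _ in numbers]
        replacements = iter(new_numbers)
        # Replace each number occurrence, in order, with its new value.
        new_question = re.sub(r"\d+", lambda m: str(next(replacements)), question)
        variants.append((new_question, answer_fn(*new_numbers)))
    return variants

# Hypothetical word problem; its gold answer is the sum of the two numbers.
question = "Tom had 12 apples and bought 7 more. How many apples does he have now?"
for q, a in perturb(question, answer_fn=lambda x, y: x + y):
    print(q, "->", a)
```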
FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information
Online forms are widely used to collect data from humans and constitute a multi-billion dollar market. Many software products provide online services for creating semi-structured forms in which questions and descriptions are organized by predefined structures. However, the form design and creation process is still tedious and requires expert knowledge. To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and recommend form creation ideas (including question/options recommendation and block type suggestion). For model training and evaluation, we collect the first public online form dataset with 62K online forms. Experimental results show that FormLM significantly outperforms general-purpose language models on all tasks, with improvements of 4.71 on Question Recommendation and 10.6 on Block Type Suggestion in terms of ROUGE-1 and Macro-F1, respectively.
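As a rough illustration of what modelling a form's semantic and structural information might involve, the sketch below linearizes a small semi-structured form (blocks, questions, options) into a tagged text sequence that a language model could consume. The form schema and tag names are hypothetical and do not reflect FormLM's actual structure encoding.

```python
# A hypothetical linearization of a semi-structured form for a language model
# (illustrative only; not FormLM's encoding).
form = {
    "title": "Event Registration",
    "blocks": [
        {"type": "choice", "question": "Which session will you attend?",
         "options": ["Morning", "Afternoon"]},
        {"type": "text", "question": "Any dietary restrictions?", "options": []},
    ],
}

def linearize(form):
    # Walk the form structure and emit tagged text segments in order.
    parts = [f"[TITLE] {form['title']}"]
    for block in form["blocks"]:
        parts.append(f"[BLOCK type={block['type']}] {block['question']}")
        for opt in block["options"]:
            parts.append(f"[OPTION] {opt}")
    return " ".join(parts)

print(linearize(form))
```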
A Disaggregate Data Collecting Approach for Loss-Tolerant Applications
Datacenters generate operational data at an extremely high rate, and operators collect and analyze these data for problem diagnosis, resource utilization improvement, and performance optimization. However, existing data collection methods fail to efficiently aggregate and store data at such extreme speed and scale. In this paper, we explore a new approach that leverages programmable switches to aggregate data and write it directly to the destination storage. Our proposed data collection system, ALT, uses programmable switches to control NVMe SSDs on remote hosts without involving the remote CPU. To tolerate loss, ALT uses an elegant data structure that enables efficient data recovery when the collected data is retrieved. We implement a prototype of our system on a Tofino-based programmable switch. Our evaluation shows that ALT can saturate the SSD’s peak performance without any CPU involvement.
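One common ingredient of loss-tolerant collection is stamping records with per-source sequence numbers so that gaps can be detected when the stored data is read back; the toy sketch below illustrates only that gap-detection idea. It is an assumption for illustration, not ALT's on-switch data structure or NVMe write path.

```python
# A toy sketch of detecting lost records via per-source sequence numbers
# (illustrative only; ALT's actual recovery structure is not shown here).
from collections import defaultdict

def detect_gaps(records):
    """records: iterable of (source_id, seq_no) tuples read back from storage."""
    seen = defaultdict(set)
    for source, seq in records:
        seen[source].add(seq)
    gaps = {}
    for source, seqs in seen.items():
        # Any sequence number missing between the min and max observed was lost.
        missing = sorted(set(range(min(seqs), max(seqs) + 1)) - seqs)
        if missing:
            gaps[source] = missing
    return gaps

collected = [("rack1", 0), ("rack1", 1), ("rack1", 3),   # seq 2 was lost
             ("rack2", 5), ("rack2", 6)]
print(detect_gaps(collected))   # -> {'rack1': [2]}
```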
Research Topics
Privacy Computing
The Future of Blockchain from the Perspectives of Data Analytics and Cryptography
Drawing on two recent papers published by the Trustworthy Systems research group at Microsoft Research Asia, this article introduces the current state and technical trends of blockchain technology from the perspectives of data analytics and cryptography.
Latest Collaborative Research by Microsoft Research Asia with Nanjing University, USTC, and Other Universities Supports Efficient Model Inference and Privacy Protection
Learn about the latest research on power optimization, efficient inference, and innovative privacy protection techniques.
New Advances in Machine Learning Privacy Research: Data Augmentation Risks Are Underestimated, and a New Algorithm Tames Dimensionality Dependence
This article presents Microsoft Research Asia's latest progress in machine learning privacy research and discusses privacy attacks and defenses in deep learning.
AI for Science
AI4Science: Empowering the Fifth Paradigm of Scientific Discovery
Microsoft Research has established a new AI4Science team dedicated to making the fifth paradigm a reality.
Do You Really Understand Computational Biology and AI for Science?
Tie-Yan Liu (Vice President of Microsoft Research Asia), Bin Shao (Principal Researcher), and Tong Wang (Senior Researcher) introduce Microsoft Research Asia's latest research in computational biology and share their views on the future development and integration of AI for Science.
AI Advances into the Life Sciences: Molecular Dynamics Simulation Accelerates Research on the Pathogenic Mechanisms of SARS-CoV-2
In collaboration with Tsinghua University, Microsoft Research Asia has used molecular dynamics simulation to achieve important results in research on the mechanisms of SARS-CoV-2.
Sustainability
Climate Change, Pandemics, the Development Gap… What More Must We Do to Meet These Challenges?
Microsoft scientists from around the world discuss how to build a resilient and sustainable global society.
Microsoft Launches the Climate Research Initiative to Promote Transformative Innovation in Climate Science Together with the Global Academic Community
Microsoft joins forces with domain experts to explore and empower global sustainable development.
How Can Deep Learning Be Used to Improve Estimates of Air Pollutant Emissions?
AI helps environmental scientists estimate emissions of pollutants such as NOx, SO2, VOCs, and primary PM2.5 more accurately, reducing the relative estimation error by 20% and greatly improving estimation precision.
Industry Empowerment
What Is the Core Problem in the Deep Integration of AI and Education?
Associate Professor Chanjin Zheng of the Shanghai Institute of Artificial Intelligence for Education at East China Normal University and Yan Xia, Principal Development Manager at Microsoft Research Asia, hold an in-depth conversation on integrating AI with education and putting it into practice.
Xu Tan: AI Music, Where Technology Meets Art
Learn about Microsoft Research Asia's series of research achievements in AI music composition and the research challenges currently facing AI music generation.
What Roles Can AI and Machine Learning Play in Materials Science Research?
Professor Lin-Wang Wang, Chief Scientist at the Institute of Semiconductors, Chinese Academy of Sciences, and Dr. Tie-Yan Liu, Vice President of Microsoft Research Asia, engage in an in-depth dialogue on the current state of materials research, the challenges and problems it faces, and the directions and open questions for applying AI in materials science.
Research Activities