Academic Papers

XInsight: eXplainable Data Analysis Through The Lens of Causality
In light of the growing popularity of Exploratory Data Analysis (EDA), understanding the underlying causes of the knowledge acquired by EDA is crucial, yet this remains under-researched. This study promotes a transparent and explicable perspective on data analysis, called eXplainable Data Analysis (XDA). To this end, we present XInsight, a general framework for XDA. XInsight provides data analysis with qualitative and quantitative explanations of causal and non-causal semantics, which significantly improves human understanding of and confidence in the outcomes of data analysis, facilitating accurate data interpretation and decision making in the real world. XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact. XInsight uses a set of design concepts and optimizations to address the inherent difficulties of integrating causality into XDA. Experiments on synthetic and real-world datasets, as well as a user study, demonstrate the highly promising capabilities of XInsight.
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer’s Disease Detection
With the global population aging rapidly, Alzheimer's disease (AD) has become particularly prominent in older adults; it has an insidious onset and leads to gradual, irreversible deterioration in cognitive domains such as memory and communication. Speech-based AD detection opens up the possibility of widespread screening and timely disease intervention. Recent advances in pre-trained models are motivating AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features. Building on these features, the paper also proposes a novel task-oriented approach that models the relationship between participants' descriptions and the cognitive task. Experiments are carried out on the ADReSS dataset in a binary classification setup, and models are evaluated on the unseen test set. Results and comparisons with recent literature demonstrate the efficiency and superior performance of the proposed acoustic, linguistic, and task-oriented methods. The findings also show the importance of semantic and syntactic information, and the feasibility of automation and generalization offered by the promising audio-only and task-oriented methods for AD detection.
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
As a common way of signaling emotion via non-linguistic vocalizations, vocal bursts (VBs) play an important role in daily social interaction. Understanding and modeling human vocal bursts are indispensable for developing robust and general artificial intelligence, and computational approaches to this problem are attracting increasing research attention. In this work, we propose a hierarchical framework, based on chain regression models, for affective recognition from VBs that explicitly considers multiple relationships: (i) between emotional states and diverse cultures; (ii) between the low-dimensional (arousal and valence) and high-dimensional (10 emotion classes) emotion spaces; and (iii) between the various emotion classes within the high-dimensional space. To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules. The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks. Experimental results on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of modeling multiple relationships with hierarchical regression chain models.
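The three relationships above can be sketched structurally. The snippet below is a minimal, untrained illustration of the chain-regression idea under our own assumptions (a pooled SSL embedding, a one-hot culture indicator, plain linear heads as stand-ins); it is not the authors' architecture, which additionally uses layer-wise and temporal aggregation of SSL representations.

import numpy as np

rng = np.random.default_rng(0)

def linear_head(in_dim, out_dim):
    # Stand-in regressor; in practice each chain element would be a trained model.
    w = rng.normal(0.0, 0.02, size=(in_dim, out_dim))
    return lambda x: x @ w

def chain_predict(ssl_embedding, culture_onehot, n_classes=10):
    """Hierarchical regression chain over one vocal burst (illustrative sketch only)."""
    # Relationship (i): condition on culture by concatenating a culture indicator.
    x = np.concatenate([ssl_embedding, culture_onehot])
    # Relationship (ii): first predict the low-dimensional space (arousal, valence) ...
    arousal_valence = linear_head(x.size, 2)(x)
    feats = np.concatenate([x, arousal_valence])
    # Relationship (iii): ... then predict the 10 emotion classes as a chain,
    # each class conditioned on the classes already predicted.
    scores = []
    for _ in range(n_classes):
        s = linear_head(feats.size, 1)(feats)
        scores.append(s.item())
        feats = np.concatenate([feats, s])
    return arousal_valence, np.array(scores)

# Example call with hypothetical dimensions: a 768-d pooled SSL embedding, 4 cultures.
av, emotions = chain_predict(rng.normal(size=768), np.eye(4)[0])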
Collaborative Pure Exploration in Kernel Bandit
In this paper, we propose a novel Collaborative Pure Exploration in Kernel Bandit (CoPE-KB) model, in which multiple agents collaborate to complete different but related tasks with limited communication. Our model generalizes the prior CoPE formulation, which is restricted to a single task and the classic MAB setting, to allow multiple tasks and general reward structures. We propose a novel communication scheme with an efficient kernelized estimator, and design the algorithms CoKernelFC and CoKernelFB for CoPE-KB under fixed-confidence and fixed-budget objectives, respectively. Sample and communication complexity bounds are provided to demonstrate the efficiency of our algorithms. Our theoretical results explicitly quantify how task similarities influence the learning speedup and depend only on the effective dimension of the feature space. Our novel techniques, including the efficient kernelized estimator and the decomposition of task similarities and arm features, overcome the communication difficulty in high-dimensional feature spaces, reveal the impact of task similarities on sample complexity, and may be of independent interest.
Combinatorial Pure Exploration of Causal Bandits
The combinatorial pure exploration of causal bandits is the following online learning task: given a causal graph with unknown causal inference distributions, in each round we either intervene on a subset of variables or perform no intervention, and then observe the random outcomes of all random variables; the goal is, using as few rounds as possible, to output an intervention that gives the best (or almost best) expected outcome on the reward variable Y with probability at least 1 − δ, where δ is a given confidence level. We provide the first gap-dependent and fully adaptive pure exploration algorithms for two types of causal models: the binary generalized linear model (BGLM) and general graphs. For BGLM, our algorithm is the first designed specifically for this setting and achieves polynomial sample complexity, whereas all existing algorithms for general graphs either have sample complexity exponential in the graph size or rely on unreasonably strong assumptions. For general graphs, our algorithm provides a significant improvement in sample complexity and nearly matches the lower bound we prove. Our algorithms achieve this improvement through a novel integration of prior causal bandit algorithms, which exploit the rich observational feedback in causal bandits but are not adaptive to reward gaps, with prior adaptive pure exploration algorithms, which are adaptive to reward gaps but do not exploit that observational feedback.
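As a reading aid, the pure-exploration goal stated in this abstract can be written compactly. The notation below (candidate intervention set \mathcal{A}, mean reward \mu_a under intervention a, accuracy parameter \varepsilon) is our own shorthand and is not taken verbatim from the paper:

\Pr\Bigl[\, \mu_{\hat{a}} \;\ge\; \max_{a \in \mathcal{A}} \mu_a - \varepsilon \,\Bigr] \;\ge\; 1 - \delta,
\qquad \text{where } \mu_a = \mathbb{E}\bigl[\, Y \mid \mathrm{do}(a) \,\bigr],

and \hat{a} is the intervention output after as few rounds as possible; the exactly best case corresponds to \varepsilon = 0.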
Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
Differentially private deep learning has recently witnessed advances in computational efficiency and the privacy-utility trade-off. We explore whether further improvements along these two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance in less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass the scaling challenges associated with clipping gradients that are distributed across multiple devices via \emph{per-device clipping}, which clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves task performance at $\epsilon = 1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
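To make the per-layer clipping idea concrete, the sketch below shows only the clip-and-noise arithmetic for one DP-SGD update; it is not the paper's implementation, and the function, dictionary layout, thresholds, and noise multiplier are hypothetical. The paper's efficiency gain comes from performing this clipping layer by layer during backpropagation, so per-example gradients of the full model never need to be materialized at once.

import numpy as np

def private_layerwise_step(per_example_grads, thresholds, noise_multiplier, lr, params, rng):
    """One DP-SGD update with per-layer clipping (illustrative sketch only).

    per_example_grads: dict  layer name -> array of shape (batch, *param_shape)
    thresholds:        dict  layer name -> clipping norm C_l for that layer
    params:            dict  layer name -> current parameter array
    """
    new_params = {}
    for name, grads in per_example_grads.items():
        c = thresholds[name]
        batch = grads.shape[0]
        flat = grads.reshape(batch, -1)
        # Clip each example's gradient for this layer to norm at most C_l.
        norms = np.linalg.norm(flat, axis=1, keepdims=True)
        clipped = flat * np.minimum(1.0, c / (norms + 1e-12))
        # Sum over the batch and add Gaussian noise calibrated to this layer's threshold.
        noisy = clipped.sum(axis=0) + rng.normal(0.0, noise_multiplier * c, size=flat.shape[1])
        grad_est = (noisy / batch).reshape(grads.shape[1:])
        new_params[name] = params[name] - lr * grad_est
    return new_params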
Research Topics
Multimodal
A New Member of the NUWA Family: NUWA-XL, a Model for Generating Extremely Long Videos
Recently, the NUWA family of multimodal generative models at Microsoft Research Asia welcomed a new member, NUWA-XL. With its innovative Diffusion over Diffusion architecture, NUWA-XL achieves, for the first time, parallel generation of high-quality, extremely long videos, offering a new approach to building large multimodal models.
Document Foundation Models Lead Document AI Toward Multimodal Unification
The series of document foundation models for multimodal tasks developed by Microsoft Research Asia in the field of Document AI has achieved excellent results on visually rich document datasets such as forms, receipts, invoices, and reports. These models have gained wide recognition in academia and industry, and have been applied in multiple Microsoft products to empower the digital transformation of enterprises and organizations.
BEiT-3, a General-Purpose Multimodal Foundation Model: Leading Text, Image, and Multimodal Pre-training Toward Unification
The BEiT-3 pre-trained model, developed jointly by Microsoft Research Asia and the Microsoft Turing team, achieves SOTA transfer performance on a wide range of vision and vision-language tasks.
AI for Science
AI for Science (AI4Science): Empowering the Fifth Paradigm of Scientific Discovery
Microsoft Research has established a new AI for Science team dedicated to turning the fifth paradigm into reality.
Do You Really Understand Computational Biology and AI for Science?
Tie-Yan Liu, Vice President of Microsoft Research Asia, Principal Researcher Bin Shao, and Senior Researcher Tong Wang introduce Microsoft Research Asia's latest research in computational biology and share their views on the future development and convergence of AI for Science.
AI Advances into the Life Sciences: Molecular Dynamics Simulation Accelerates Research on the Pathogenic Mechanisms of SARS-CoV-2
In collaboration with Tsinghua University, Microsoft Research Asia used molecular dynamics simulation to obtain important results in research on the mechanisms of SARS-CoV-2.
Sustainability
Climate Change, Pandemics, Development Gaps... What More Must We Do to Meet These Challenges?
Microsoft scientists from around the world discussed how to build a resilient and sustainable global society.
Microsoft Launches the Climate Research Initiative to Drive Transformative Innovation in Climate Science with the Global Academic Community
Microsoft works with domain experts to empower global sustainable development.
How Can Deep Learning Improve the Estimation of Air Pollutant Emissions?
Using AI to help environmental scientists estimate emissions of pollutants such as NOx, SO2, VOCs, and primary PM2.5 more accurately, this work reduces the relative estimation error by 20%, greatly improving estimation precision.
Industry Empowerment
AI + Biomedicine: How Can They Empower Each Other?
In recent years, advances in artificial intelligence have helped drive major breakthroughs in biomedical research, and "AI + biomedicine" has become a hot area of interest in both academia and industry. In the post-pandemic era, can "AI + biomedicine" maintain its strong momentum, and what opportunities and challenges will it face?
Deep Integration of AI and Education: What Is the Core Problem?
Associate Professor Chanjin Zheng of the Shanghai Institute of Artificial Intelligence for Education at East China Normal University and Yan Xia, Principal Development Manager at Microsoft Research Asia, held an in-depth conversation on integrating AI with education and putting it into practice.
What Roles Can AI and Machine Learning Play in Materials Science Research?
Professor Lin-Wang Wang, Chief Scientist of the Institute of Semiconductors, Chinese Academy of Sciences, and Dr. Tie-Yan Liu, Vice President of Microsoft Research Asia, held an in-depth dialogue on the current state of materials research, the challenges and problems it faces, and the directions and open problems for applying AI in materials science.
Research Activities