Academic Papers

Cascading Reinforcement Learning
Cascading bandits have gained popularity in recent years due to their applicability to recommendation systems and online advertising. In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability. The user then examines the list, clicks the first attractive item (if any), and the agent receives a reward. The goal of the agent is to maximize the expected cumulative reward. However, the prior literature on cascading bandits ignores the influence of user states (e.g., historical behaviors) on recommendations and how states change as the session proceeds. Motivated by this, we propose a generalized cascading RL framework, which incorporates user states and state transitions into decisions. In cascading RL, we need to select items not only with large attraction probabilities but also leading to good successor states. This poses a substantial computational challenge due to the combinatorial action space. To tackle this challenge, we delve into the properties of value functions and design an oracle BestPerm to efficiently find the optimal item list. Equipped with BestPerm, we develop two algorithms, CascadingVI and CascadingBPI, which are both computationally efficient and sample efficient, and provide near-optimal regret and sample complexity guarantees. Furthermore, we present experiments showing the improved computational and sample efficiency of our algorithms compared to straightforward adaptations of existing RL algorithms in practice.
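The interaction protocol described above (an ordered list, first-click feedback) can be sketched as a minimal simulator. This is only an illustration of the cascade model's feedback structure, not the paper's algorithm; the item pool, attraction probabilities, and recommended list below are hypothetical.

```python
import random

def cascade_step(item_list, attraction_probs, rng=random):
    """Simulate one user interaction in the cascade model.

    The user examines items in order and clicks the first one that
    attracts them; the agent observes a reward of 1 on a click and
    0 if the user abandons the list without clicking.
    """
    for position, item in enumerate(item_list):
        if rng.random() < attraction_probs[item]:
            return position, 1  # index of the clicked item, reward
    return None, 0  # no click

# Hypothetical pool of 5 items; the agent recommends 3 of them.
probs = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.2, 4: 0.05}
position, reward = cascade_step([0, 2, 3], probs, random.Random(0))
```

Note that the reward depends on the ordering, not just the chosen subset, which is why the paper needs an oracle over permutations rather than plain subset selection.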
Real-Time Neural Video Recovery and Enhancement on Mobile Devices
As mobile devices become increasingly popular for video streaming, it is crucial to optimize the streaming experience for them. Although deep learning-based video enhancement techniques are gaining attention, most cannot support real-time enhancement on mobile devices. Additionally, many of these techniques focus solely on super-resolution and cannot handle partial or complete loss or corruption of video frames, which is common on the Internet and in wireless networks. To overcome these challenges, we present a novel approach consisting of (i) a novel video frame recovery scheme, (ii) a new super-resolution algorithm, and (iii) a receiver enhancement-aware video bitrate adaptation algorithm. We have implemented our approach on an iPhone 12, where it supports 30 frames per second (FPS). We have evaluated our approach on various networks, including WiFi, 3G, 4G, and 5G. Our evaluation shows that it enables real-time enhancement and yields a significant increase in video QoE (Quality of Experience) of 24%–82% in our video streaming system.
Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning
Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning. Motivated by the observation that adding more concise CoT examples to the prompt can improve LLM reasoning performance, CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples. The pruner first selects as many crucial CoT examples as possible and then prunes unimportant tokens to fit the context window. A math reasoning dataset with diverse difficulty levels and reasoning steps is used to train the pruner, along with a math-specialized reinforcement learning approach. As a result, by enabling more CoT examples with double the context window size in tokens, CoT-Influx significantly outperforms various prompting baselines across multiple LLMs (LLaMA2-7B, 13B, 70B) and five math datasets, achieving up to a 4.55% absolute improvement. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Influx surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva 540B, etc.) on GSM8K. CoT-Influx serves as a plug-and-play module for LLMs and is compatible with most existing reasoning prompting techniques, such as self-consistency and self-verification.
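The coarse-to-fine pruning idea (keep whole examples first, then trim tokens to the budget) can be sketched as follows. This is a simplified stand-in: CoT-Influx learns both pruning stages with a math-specialized RL objective, whereas here the example-level and token-level importance scores are simply given as inputs, and the 2x-budget threshold for stage 1 is an arbitrary illustrative choice.

```python
def prune_prompt(examples, example_scores, token_scores, budget):
    """Coarse-to-fine prompt pruning sketch.

    examples:       list of tokenized CoT examples (lists of tokens)
    example_scores: one importance score per example
    token_scores:   per-token importance scores, parallel to examples
    budget:         maximum number of tokens in the final prompt
    """
    # Stage 1 (coarse): keep the highest-scored examples until about
    # twice the token budget is filled, leaving room for stage 2.
    order = sorted(range(len(examples)), key=lambda i: -example_scores[i])
    kept, used = [], 0
    for i in order:
        if used + len(examples[i]) <= 2 * budget:
            kept.append(i)
            used += len(examples[i])
    kept.sort()  # restore the original example order

    # Stage 2 (fine): if still over budget, drop the lowest-scored
    # tokens while preserving the order of the survivors.
    flat = [(tok, s) for i in kept
            for tok, s in zip(examples[i], token_scores[i])]
    if len(flat) > budget:
        keep_idx = sorted(range(len(flat)), key=lambda j: -flat[j][1])[:budget]
        flat = [flat[j] for j in sorted(keep_idx)]
    return [tok for tok, _ in flat]
```

The two stages operate at different granularities on purpose: dropping whole examples preserves coherent reasoning chains, while token-level pruning squeezes out the last redundancy once no more full examples fit.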
ORES: Open-vocabulary Responsible Visual Synthesis
Avoiding the synthesis of specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concepts that need to be avoided tend to be diverse, depending on region, context, and usage scenario. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), in which the synthesis model must avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instructions via a large language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images that avoid forbidden concepts while following the user's query as much as possible. To support evaluation on ORES, we provide a publicly available dataset, baseline models, and a benchmark. Experimental results demonstrate the effectiveness of our method in reducing the risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available.
HORIZON: High-Resolution Semantically Controlled Panorama Synthesis
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, the scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE, which, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with only up to 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window. This is achieved through three key innovations: (i) we identify and exploit two forms of non-uniformity in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k-length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k lengths to recover the short context window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
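The non-uniform positional interpolation at the heart of the method can be illustrated with a small sketch. Plain positional interpolation divides every rotary-embedding angle by one global factor; LongRoPE instead assigns a separate rescale factor per frequency dimension, found by search in the paper. The factors passed in below are hypothetical placeholders, not searched values.

```python
def rope_freqs(dim, base=10000.0):
    """Standard RoPE inverse frequencies, one per 2-D rotation pair."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def interpolated_angles(position, dim, scale_factors):
    """Rotation angles under per-dimension positional interpolation.

    Uniform interpolation is the special case where every entry of
    `scale_factors` equals the same extension ratio; non-uniform
    interpolation lets each frequency dimension be rescaled on its own.
    """
    return [position * f / s
            for f, s in zip(rope_freqs(dim), scale_factors)]
```

With all factors set to the extension ratio this reduces to ordinary positional interpolation; letting low-frequency and high-frequency dimensions scale differently is what the paper's search exploits.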
Research Topics
Societal AI
Value Compass: How Do We Align Large Models with Human Values?
Which values should AI be aligned with, and how should that alignment be achieved? These questions still have no definitive answers. To address these challenges, Microsoft Research Asia launched the Value Compass project, which takes an interdisciplinary perspective and draws on theories from ethics and sociology to tackle the definition, evaluation, and alignment of values.
In the Era of Large Models, How Do We Evaluate Artificial and Human Intelligence?
With the strong support and assistance of Professor Luo Fang of the Faculty of Psychology at Beijing Normal University, Microsoft Research Asia held the psychology and education session of its "Societal AI" workshop series. Leading experts from psychometrics, education, and computer science discussed the feasibility of applying psychometric techniques to AI evaluation, explored how large models can empower psychological assessment, and looked ahead to AI-assisted education.
Intellectual Property, Privacy, and Technology Misuse: Meeting the Legal and Ethical Challenges of the Large-Model Era
With the strong support and assistance of Associate Professor Guo Rui of Renmin University of China Law School, Microsoft Research Asia held the law and ethics session of its "Societal AI" workshop series. Leading experts from law and computer science focused on issues that AI development raises for legal norms and social ethics, including large models and intellectual property, large models and privacy, and the misuse of large-model technology, aiming to spark deeper reflection and exploration on this urgent and critical topic.
Multimodality
KOSMOS-2.5: A Multimodal Large Language Model for Reading Text-Intensive Images
KOSMOS-2.5 is a multimodal large language model for text-intensive images. Built on KOSMOS-2, it emphasizes multimodal reading and understanding of text-intensive images. Its goal is seamless processing of visual and textual data in text-rich images, so that it can understand image content and generate structured text descriptions.
DragNUWA: A Highly Controllable Video Generation Model
DragNUWA lets users drag objects or backgrounds directly in an image; the model then automatically translates the drag operations into plausible camera movements or object motions and generates the corresponding video. By fusing three key control factors (text, image, and trajectory), DragNUWA achieves excellent controllable video generation at the semantic, spatial, and temporal levels.
Document Foundation Models Lead Document Intelligence Toward Unified Multimodality
Microsoft Research Asia's series of document foundation models for multimodal tasks in document intelligence has achieved excellent results on visually rich text document datasets such as forms, receipts, invoices, and reports. The models have earned wide recognition from academia and industry, and have been deployed in multiple Microsoft products to empower the digital transformation of enterprises and institutions.
AI for Science
MatterSim: AI Unlocks Unlimited Possibilities in Materials Design
The physical and chemical properties of new materials are complex and variable; accurately predicting them, especially under real synthesis and usage conditions, is one of the core challenges in the digital transformation of the materials industry. To tackle this problem, Microsoft Research AI for Science developed the deep learning model MatterSim, which enables accurate and efficient materials simulation and property prediction across a wide range of elements, temperatures, and pressures, providing strong support for the digital transformation of materials design.
AI Powers M-OFDFT: An Electronic Structure Method with Both Accuracy and Efficiency
To push electronic structure methods beyond the molecular system sizes solvable by the widely used Kohn-Sham density functional theory (KSDFT), researchers at Microsoft Research AI for Science developed M-OFDFT, a new electronic structure calculation framework based on AI and orbital-free density functional theory (OFDFT). The framework maintains accuracy comparable to KSDFT while achieving significantly higher computational efficiency and excellent extrapolation performance, opening a new path for electronic structure methods, the foundation of many computational approaches in molecular science.
ViSNet: A Universal Molecular Structure Modeling Network for Property Prediction and Dynamics Simulation
Although geometric deep learning has revolutionized molecular modeling, state-of-the-art algorithms are still hindered in practice by insufficient use of geometric information and high computational cost. To address this, researchers at Microsoft Research AI4Science proposed ViSNet, a universal molecular structure modeling network that performs excellently across multiple molecular dynamics benchmarks.
Sustainability
Protecting Our Planet with Technology: Microsoft Research Asia Advances Sustainability
On the 50th World Environment Day in 2023, take a look at Microsoft Research Asia's research and application achievements in sustainability.
Microsoft Launches the Climate Research Initiative, Joining the Global Academic Community to Drive Transformative Innovation in Climate Science
Microsoft Collaborates with Domain Experts to Empower Global Sustainable Development
Climate Change, Pandemics, Development Gaps... What More Must We Do to Meet These Challenges?
Microsoft scientists from around the world discussed how to build a resilient and sustainable global society.
Empowering Industries
Safeguarding Human Health: AI Empowers Innovative Applications in Healthcare
From assisting early disease detection and predicting disease progression to personalized precision medicine and advancing medical research and drug discovery, AI has demonstrated unique value and potential. Over the past few years, Microsoft Research Asia has worked closely with experts at medical institutions and universities and recruited professionals in the healthcare field, aiming to deepen the application of AI in healthcare and help build a healthy global society.
BatteryML: An Open-Source One-Stop Machine Learning Tool for Analyzing and Predicting Battery Performance
To better analyze battery performance and predict battery lifetime, Microsoft Research Asia developed and open-sourced BatteryML, a one-stop machine learning tool, hoping to bring together more professional expertise to jointly advance battery research.
Qlib Upgraded: Can Reinforcement Learning Reshape Financial Decision-Making?
After more than two years of in-depth exploration, Qlib has received a major update: on top of the original AI quantitative finance framework, it adds new paradigms based on reinforcement learning and meta-learning, along with new scenarios for order execution optimization and market dynamics modeling, helping practitioners apply more advanced and diverse AI techniques to more complex financial challenges.
Research Activities