Learning Sequential Decisions from Multiple Sources via Group-Robust Markov Decision Processes

📅 2026-02-02

🤖 AI Summary

This work addresses policy learning from multi-source, heterogeneous offline data, where distribution shifts across sites hinder generalization. The authors propose an offline reinforcement learning framework based on a group-robust Markov decision process. All sites share a common feature map, and site-specific linear parameters capture their heterogeneity. On top of this, the method builds a feature-wise (d-rectangular) uncertainty set that preserves cross-site structure and keeps the robust Bellman recursion tractable. Bellman targets are estimated by site-wise ridge regression, aggregated by taking the worst case across sites separately for each feature, and adjusted by a data-dependent pessimism penalty. A clustering-based extension pools similar sites to improve sample efficiency, and the framework as a whole avoids the restrictive state-action rectangularity assumption. Under a robust partial coverage condition, the authors prove a suboptimality bound for the learned policy.
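The three algorithmic ingredients named above (site-wise ridge regression, feature-wise worst-case aggregation, and a diagonal-based pessimism penalty) can be illustrated at a single state-action pair. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the per-site Bellman targets y_k = r + V_next(s') are already computed and that features are nonnegative, as in standard d-rectangular linear MDPs. It also assumes that the penalty uses, for each feature, the inverse design matrix of the site that attains the minimum. This is only one plausible reading of "computed from the diagonals of the inverse design matrices." The function names `ridge_fit` and `robust_pessimistic_q`, and the parameters `beta`, `lam` and `q_max`, are hypothetical.

```python
import numpy as np


def ridge_fit(Phi, y, lam=1.0):
    """Ridge regression of targets y (n,) on features Phi (n, d).

    Returns weights w = Lambda^{-1} Phi^T y and Lambda^{-1},
    where Lambda = Phi^T Phi + lam * I is the design matrix.
    """
    d = Phi.shape[1]
    Lam_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
    return Lam_inv @ (Phi.T @ y), Lam_inv


def robust_pessimistic_q(site_data, phi_sa, beta=1.0, lam=1.0, q_max=1.0):
    """Hypothetical one-step robust, pessimistic Q estimate at one (s, a).

    site_data: list of (Phi_k, y_k); y_k are Bellman targets
               r + V_next(s') from site k (assumed precomputed).
    phi_sa:    feature vector phi(s, a), assumed nonnegative.
    """
    fits = [ridge_fit(Phi_k, y_k, lam) for Phi_k, y_k in site_data]
    W = np.stack([w for w, _ in fits], axis=1)           # (d, K): column k = site k
    D = np.stack([np.diag(L) for _, L in fits], axis=1)  # (d, K): diag of Lambda_k^{-1}

    # Feature-wise worst case: for each feature (row), keep the site with
    # the smallest coefficient. With phi_sa >= 0 this minimizes phi^T w.
    rows = np.arange(W.shape[0])
    worst = np.argmin(W, axis=1)
    w_robust = W[rows, worst]

    # Assumed penalty form: beta * sum_i phi_i * sqrt([Lambda_{k_i}^{-1}]_{ii}),
    # where k_i is the worst-case site for feature i.
    penalty = beta * phi_sa @ np.sqrt(D[rows, worst])

    # Clip to the valid value range, as in pessimistic value iteration.
    return float(np.clip(phi_sa @ w_robust - penalty, 0.0, q_max))


# Toy example: two sites, the second with uniformly lower coefficients.
rng = np.random.default_rng(0)
d, n = 3, 200
sites = []
for shift in (0.0, 0.3):
    Phi = rng.dirichlet(np.ones(d), size=n)  # nonnegative rows summing to 1
    y = Phi @ (np.array([0.5, 0.6, 0.7]) - shift) + 0.01 * rng.standard_normal(n)
    sites.append((Phi, y))

q = robust_pessimistic_q(sites, np.array([0.2, 0.3, 0.5]), beta=0.1)
print(round(q, 3))
```

Taking the minimum row by row, rather than picking one worst site for the whole vector, is what d-rectangularity buys. The adversary may choose a different site independently for each feature, so the worst case decomposes into d scalar minimizations instead of a search over the joint set of kernels.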

📝 Abstract

We often collect data from multiple sites (e.g., hospitals) that share common structure but also exhibit heterogeneity. This paper aims to learn robust sequential decision-making policies from such offline, multi-site datasets. To model cross-site uncertainty, we study distributionally robust MDPs with a group-linear structure: all sites share a common feature map, and both the transition kernels and expected reward functions are linear in these shared features. We introduce feature-wise (d-rectangular) uncertainty sets, which preserve tractable robust Bellman recursions while maintaining key cross-site structure. Building on this, we develop an offline algorithm based on pessimistic value iteration that includes: (i) per-site ridge regression for Bellman targets, (ii) feature-wise worst-case (row-wise minimization) aggregation, and (iii) a data-dependent pessimism penalty computed from the diagonal entries of the inverse design matrices. We further propose a cluster-level extension that pools similar sites to improve sample efficiency, guided by prior knowledge of site similarity. Under a robust partial coverage assumption, we prove a suboptimality bound for the resulting policy. Overall, our framework addresses multi-site learning with heterogeneous data sources and provides a principled approach to robust planning without relying on strong state-action rectangularity assumptions.
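The cluster-level extension trades bias for variance. Pooling sites that are believed to be similar enlarges the design matrix, which shrinks the diagonal of its inverse and therefore the pessimism penalty. The cost is that regression within a cluster is biased if its member sites actually differ. The following is a minimal sketch of that pooling step, assuming cluster labels are supplied from prior knowledge. It is not the paper's procedure. The helpers `pool_by_cluster` and `penalty_diag` are hypothetical, and the random targets `y_k` are placeholders.

```python
import numpy as np


def pool_by_cluster(site_data, cluster_of):
    """Hypothetical helper: concatenate (Phi_k, y_k) of sites sharing a label.

    site_data:  list of (Phi_k, y_k), one per site.
    cluster_of: cluster label for each site (assumed given by prior
                knowledge of site similarity).
    Returns a dict mapping cluster label -> (Phi_c, y_c).
    """
    pooled = {}
    for (Phi_k, y_k), c in zip(site_data, cluster_of):
        if c in pooled:
            Phi_c, y_c = pooled[c]
            pooled[c] = (np.vstack([Phi_c, Phi_k]), np.concatenate([y_c, y_k]))
        else:
            pooled[c] = (Phi_k, y_k)
    return pooled


def penalty_diag(Phi, lam=1.0):
    """sqrt of the diagonal of Lambda^{-1}, with Lambda = Phi^T Phi + lam * I."""
    d = Phi.shape[1]
    Lam = Phi.T @ Phi + lam * np.eye(d)
    return np.sqrt(np.diag(np.linalg.inv(Lam)))


# Four toy sites with 50 samples each; y_k are placeholder targets.
rng = np.random.default_rng(1)
d = 3
sites = [(rng.dirichlet(np.ones(d), size=50), rng.random(50)) for _ in range(4)]
clusters = pool_by_cluster(sites, cluster_of=["A", "A", "B", "B"])

site_pen = penalty_diag(sites[0][0])          # site 0 alone
cluster_pen = penalty_diag(clusters["A"][0])  # sites 0 and 1 pooled
print(site_pen, cluster_pen)
```

Because the pooled design matrix equals site 0's matrix plus the positive semidefinite term contributed by site 1, its inverse is smaller in the semidefinite order. Every entry of `cluster_pen` is therefore no larger than the corresponding entry of `site_pen`.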

Problem

multi-site learning

distributional robustness

Markov Decision Processes

heterogeneous data

offline reinforcement learning

Innovation

distributionally robust MDP

multi-site offline RL

feature-wise uncertainty