Research
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity [arXiv], [Code]
NeurIPS 2024
We present the Experts-as-Priors (ExPerior) algorithm for online decision-making from offline expert demonstrations when the experts act on contextual information that the learner does not observe. ExPerior frames this setting as a zero-shot meta-reinforcement learning task and uses a non-parametric empirical Bayes approach to construct a maximum-entropy informative prior from the demonstrations, which then guides online decision-making. Our method outperforms existing algorithms at leveraging expert demonstrations across a range of decision-making setups.
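To make the idea concrete, here is a toy sketch, not the released implementation: the finite set of latent contexts, the deterministic expert, the bandit setup, and the SciPy solver are all illustrative assumptions. We fit a maximum-entropy prior over latent contexts whose induced arm choices match the expert's observed arm frequencies, then sample from that prior to act.

```python
# Toy illustration of the max-entropy prior idea (assumptions: finite latent
# contexts, a deterministic expert per context, and a simple bandit).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_contexts, n_arms = 4, 3

# Hypothetical expert: in latent context z it pulls the arm with the highest mean reward.
mean_rewards = rng.uniform(size=(n_contexts, n_arms))
expert_arm = mean_rewards.argmax(axis=1)

# Offline demonstrations: arm frequencies observed from the expert (contexts unobserved).
demo_arms = rng.choice(expert_arm, size=500)
arm_freq = np.bincount(demo_arms, minlength=n_arms) / len(demo_arms)

# Map from a prior over contexts to the induced distribution over expert arms.
A = np.zeros((n_arms, n_contexts))
A[expert_arm, np.arange(n_contexts)] = 1.0

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))

constraints = [{"type": "eq", "fun": lambda p: A @ p - arm_freq},
               {"type": "eq", "fun": lambda p: p.sum() - 1.0}]
res = minimize(neg_entropy, np.full(n_contexts, 1.0 / n_contexts),
               bounds=[(0.0, 1.0)] * n_contexts, constraints=constraints)
prior = np.clip(res.x, 0.0, None)
prior /= prior.sum()
print("max-entropy prior over latent contexts:", prior.round(3))

# Online phase (posterior-sampling flavour): sample a context, act greedily for it.
z = rng.choice(n_contexts, p=prior)
print("sampled context:", z, "-> action:", int(mean_rewards[z].argmax()))
```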
Personalized Adaptation via In-Context Preference Learning [PDF]
NeurIPS 2024 Workshop on Adaptive Foundation Models
We introduce the Preference Pretrained Transformer (PPT), a new approach to personalizing language models that uses online user feedback to adapt to individual preferences, addressing the tendency of standard Reinforcement Learning from Human Feedback to optimize for aggregate rather than individual preferences. PPT exploits the in-context learning capabilities of transformers for adaptive personalization through two phases: an offline phase, in which a single policy model is trained with a history-dependent loss function, and an online phase, in which the model dynamically adjusts to each user's preferences.
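A minimal sketch of the two-phase structure, under illustrative assumptions: toy feature vectors stand in for text, a small transformer scorer stands in for the policy model, the Bradley-Terry style loss is a stand-in for the history-dependent objective, and names such as `HistoryConditionedScorer` are hypothetical.

```python
# Sketch: score candidate responses conditioned on an in-context history of
# past preference feedback; train offline, adapt online by growing the context.
import torch
import torch.nn as nn

class HistoryConditionedScorer(nn.Module):
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(8, d_model)            # toy feature dim for (prompt, response, label)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, history, candidate):
        # history: (B, T, 8) past interactions; candidate: (B, 8) current query/response pair
        tokens = torch.cat([history, candidate.unsqueeze(1)], dim=1)
        h = self.encoder(self.embed(tokens))
        return self.score(h[:, -1]).squeeze(-1)       # score of the candidate given history

def preference_loss(model, history, preferred, rejected):
    # Bradley-Terry style: the preferred response should score higher than the rejected one.
    return -torch.nn.functional.logsigmoid(
        model(history, preferred) - model(history, rejected)).mean()

# Offline phase: train a single policy on simulated users, each with a feedback history.
model = HistoryConditionedScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    history = torch.randn(16, 5, 8)                   # toy batch of per-user histories
    preferred, rejected = torch.randn(16, 8), torch.randn(16, 8)
    loss = preference_loss(model, history, preferred, rejected)
    opt.zero_grad(); loss.backward(); opt.step()

# Online phase: no gradient updates; personalization comes from the growing context.
history = torch.randn(1, 5, 8)                        # this user's feedback so far
cand_a, cand_b = torch.randn(1, 8), torch.randn(1, 8)
chosen = cand_a if model(history, cand_a) > model(history, cand_b) else cand_b
```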
Partial Identification of Treatment Effects with Implicit Generative Models [arXiv], [Code]
NeurIPS 2022
We address the challenge of partial identification, estimating bounds on treatment effects from observational data using deep generative modeling. We introduce a new method for bounding average treatment effects (ATEs) in general causal graphs, using implicit generative models that handle both continuous and discrete variables. Our algorithm converges to tight ATE bounds in linear structural causal models (SCMs), and for nonlinear SCMs the resulting bounds are tighter and more stable than those of existing methods.
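The flavour of the optimization can be illustrated with a toy sketch; this is not the paper's algorithm, and the single-confounder SCM, the MMD penalty, and all hyperparameters are illustrative assumptions. A neural generator is trained twice, once pushing the implied ATE down and once up, while a distribution-matching penalty keeps its observational samples close to the data.

```python
# Toy sketch: bound the ATE by optimizing an implicit SCM generator in two
# directions subject to matching the observational distribution (via MMD).
import torch
import torch.nn as nn

def mmd(x, y, sigma=1.0):
    # Simple RBF-kernel maximum mean discrepancy between two samples.
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class ToySCM(nn.Module):
    # Latent confounder U drives both treatment T and outcome Y.
    def __init__(self):
        super().__init__()
        self.f_t = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
        self.f_y = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def sample(self, n, do_t=None):
        u = torch.randn(n, 1)
        t = torch.sigmoid(self.f_t(u)) if do_t is None else torch.full((n, 1), float(do_t))
        y = self.f_y(torch.cat([u, t], dim=1))
        return torch.cat([t, y], dim=1)

    def ate(self, n=512):
        y1 = self.sample(n, do_t=1.0)[:, 1]
        y0 = self.sample(n, do_t=0.0)[:, 1]
        return (y1 - y0).mean()

# Toy observational data from an unknown confounded process.
torch.manual_seed(0)
u = torch.randn(1000, 1)
t_obs = (u + 0.5 * torch.randn(1000, 1) > 0).float()
y_obs = 2 * t_obs + u + 0.1 * torch.randn(1000, 1)
data = torch.cat([t_obs, y_obs], dim=1)

def solve(direction):
    model = ToySCM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(300):
        batch = data[torch.randint(0, 1000, (256,))]
        loss = direction * model.ate(256) + 10.0 * mmd(model.sample(256), batch)
        opt.zero_grad(); loss.backward(); opt.step()
    return model.ate().item()

lower, upper = solve(+1.0), solve(-1.0)   # minimize the ATE, then maximize it
print(f"estimated ATE bounds: [{lower:.2f}, {upper:.2f}]")
```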
Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes [arXiv], [Code]
Transactions on Machine Learning Research 2022
In this work, we propose an algorithm that enables reinforcement learning agents to operate under different levels of automation by dynamically switching control among agents. We introduce a framework using a 2-layer Markov decision process to formalize the problem of agent control switching. Our online learning algorithm employs upper confidence bounds on agents' policies and the environment's transition probabilities to identify optimal switching strategies. The algorithm achieves sublinear regret relative to the optimal switching policy.
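As a rough, heavily simplified sketch: a per-state bandit with a switching cost stands in for the full 2-layer MDP with optimistic planning over transition probabilities, and the toy environment and all names are assumptions, not the paper's setup.

```python
# Simplified sketch: choose which agent controls each environment state using
# count-based upper confidence bounds on past performance plus a switching cost.
import math
import random

N_STATES, N_AGENTS, SWITCH_COST = 5, 2, 0.1
counts = [[1] * N_AGENTS for _ in range(N_STATES)]
reward_sums = [[0.0] * N_AGENTS for _ in range(N_STATES)]

def choose_agent(state, current_agent, t):
    best, best_val = 0, -float("inf")
    for a in range(N_AGENTS):
        mean = reward_sums[state][a] / counts[state][a]
        bonus = math.sqrt(2 * math.log(t + 1) / counts[state][a])   # UCB exploration bonus
        val = mean + bonus - (SWITCH_COST if a != current_agent else 0.0)
        if val > best_val:
            best, best_val = a, val
    return best

def env_step(state, agent):
    # Toy environment: agent 1 performs better in "hard" states (state >= 3), agent 0 elsewhere.
    p_success = 0.8 if (agent == 1) == (state >= 3) else 0.4
    reward = 1.0 if random.random() < p_success else 0.0
    return random.randrange(N_STATES), reward

state, controller = 0, 0
for t in range(2000):
    controller = choose_agent(state, controller, t)
    next_state, r = env_step(state, controller)
    counts[state][controller] += 1
    reward_sums[state][controller] += r
    state = next_state
```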
Order-based Structure Learning with Normalizing Flows [arXiv], [Code]
Under Review at AAAI 2025
We introduce a framework to estimate causal structures from observational data without relying heavily on the typical assumptions of additive noise models (ANMs). Traditional methods face challenges due to the super-exponential scaling of the problem with graph size and often use continuous relaxations that restrict the data-generating process. Our method incorporates autoregressive normalizing flows to relax these constraints and employs a differentiable permutation learning method to efficiently search over topological orderings, ensuring acyclicity in structure discovery.
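A rough sketch of the permutation-learning ingredient, under simplifying assumptions: plain regressors stand in for autoregressive normalizing flows, the Sinkhorn relaxation is a generic choice for differentiable permutation learning rather than necessarily the paper's, and the toy data and names are illustrative.

```python
# Sketch: learn a soft topological ordering via a Sinkhorn-relaxed permutation and
# fit per-node predictors restricted to (softly) earlier nodes in the ordering.
import torch
import torch.nn as nn

def sinkhorn(log_alpha, n_iters=20):
    # Alternately normalize rows and columns to approach a doubly stochastic matrix.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return log_alpha.exp()

d, n = 3, 2000
torch.manual_seed(0)
# Toy ground truth x0 -> x1 -> x2 with multiplicative (non-additive) noise.
x0 = torch.randn(n, 1)
x1 = torch.tanh(x0) * (1 + 0.3 * torch.randn(n, 1))
x2 = torch.sin(x1) * (1 + 0.3 * torch.randn(n, 1))
X = torch.cat([x0, x1, x2], dim=1)

perm_scores = nn.Parameter(torch.randn(d, d))
regressors = nn.ModuleList([nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
                            for _ in range(d)])
opt = torch.optim.Adam([perm_scores, *regressors.parameters()], lr=1e-2)

# Strictly lower-triangular mask: a position may only depend on earlier positions.
L = torch.tril(torch.ones(d, d), diagonal=-1)

for step in range(500):
    P = sinkhorn(perm_scores)                 # soft permutation over variable orderings
    M = P.T @ L @ P                           # soft "allowed parents" mask in variable space
    loss = 0.0
    for j in range(d):
        pred = regressors[j](X * M[j])        # node j sees only its (soft) predecessors
        loss = loss + ((X[:, j:j + 1] - pred) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

order = sinkhorn(perm_scores).argmax(dim=1)   # hard read-out: variable at each position
print("learned ordering (position -> variable):", order.tolist())
```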