Research

My research develops mathematical and conceptual foundations for fallible AI systems. The result so far has been an elegant unifying picture that explains many standard but seemingly ad-hoc choices made in practice. A key technical ingredient is a class of models I invented called Probabilistic Dependency Graphs (PDGs), which subsume traditional graphical models yet can also model inconsistent beliefs and most scenarios in machine learning. Indeed, many important algorithms in AI turn out to be instances of an intuitive heuristic approach to resolving probabilistic inconsistency.

For an overview, see my research statement [last updated March 2024];
for (a great deal) more, see my dissertation.

Papers and Publications

Legend: thesis, workshop, preprint, conference, journal, spotlight

Local Inconsistency Resolution: The Interplay between Attention and Control in Probabilistic Models

Oliver Richardson, Mehran Shakerinava, Mandana Samiei, Abdessamad El Kabid, Joseph Viviano, Ali Parviz, and Yoshua Bengio
AISTATS, May 2026

Abstract. We present a generic algorithm for learning and approximate inference with an intuitive epistemic interpretation: iteratively focus on a subset of the model and resolve inconsistencies using the parameters under control. This framework, which we call Local Inconsistency Resolution (LIR), is built upon Probabilistic Dependency Graphs (PDGs), which provide a flexible representational foundation capable of capturing inconsistent beliefs. We show how LIR unifies and generalizes a wide variety of important algorithms in the literature, including the Expectation-Maximization (EM) algorithm, belief propagation, adversarial training, GANs, and GFlowNets. In the last case, LIR actually suggests a more natural loss, which we demonstrate improves GFlowNet convergence. Each method can be recovered as a specific instance of LIR by choosing a procedure to direct focus (attention and control). We implement this algorithm for discrete PDGs and study its properties on synthetically generated PDGs, comparing its behavior to the global optimization semantics of the full PDG.

Notes. Earlier versions of the core idea in this work appeared in “The Local Inconsistency Resolution Algorithm,” presented at ICML workshops in 2023, and in Chapter 7 of my PhD dissertation. My co-authors helped develop a general implementation and synthetic experiments, test the hypotheses posed by the paper, and generally bring it into its current form.

arXiv poster code
Language models recognize dropout and Gaussian noise applied to their activations

Damiano Fornasiere^*, Mirko Bronzi^*, Spencer Kitts^*, Alessandro Palmas, Yoshua Bengio^†, and Oliver Richardson^†
preprint (submitted to COLM), April 2026

arXiv webpage code
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning

Minsu Kim, Jean-Pierre R. Falet, Oliver Ethan Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, and Yoshua Bengio
ICLR, 2026

Abstract. Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we propose to augment each reasoning step in a CoT with a latent veracity (or correctness) variable. To efficiently explore this expanded space, we introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time verification method facilitates supervised fine-tuning of an Amortized Veracity Inference (AVI) machine by providing pseudo-labels for veracity. AVI generalizes VS, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that VS reliably identifies errors in logical (ProntoQA), mathematical (GSM8K), and commonsense (CommonsenseQA) reasoning benchmarks, with AVI achieving comparable zero-shot accuracy. Finally, we demonstrate the utility of latent veracity inference for providing feedback during self-correction and self-improvement.

arXiv
oral

Learning with Confidence

Oliver Richardson
UAI, 2025

Abstract. We characterize a notion of confidence that arises when learning or updating beliefs. This notion of trust, or learner’s confidence, can be used alongside (and is easily mistaken for) probability or likelihood, but it is fundamentally a different concept. Although perhaps not as useful as probability itself, our notion of confidence captures and unifies many concepts in the literature, from Shafer’s weight of evidence, to Kalman gain, as well as number of training epochs and learning rate. We provide a mathematical definition of what it means to learn with confidence, and give two canonical ways of measuring confidence on a continuum. Under additional assumptions, we derive more compact representations of confidence-based learning in terms of vector fields and loss functions. These representations induce an extended language of compound “parallel” observations. We illustrate our framework by analyzing standard ways of updating beliefs.
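
As a rough illustration of the Kalman-gain reading mentioned in the abstract (this is the textbook scalar Kalman measurement update, not the paper's own definition of confidence), the gain can be read as the fraction of the way a belief moves toward a new observation. The function and variable names below are hypothetical, chosen only for this sketch.

```python
def kalman_gain(prior_var: float, obs_var: float) -> float:
    """Standard scalar Kalman gain: a number in [0, 1] that says how far
    to move from the prior mean toward the observation."""
    return prior_var / (prior_var + obs_var)


def update(prior_mean: float, prior_var: float, obs: float, obs_var: float):
    """One scalar Kalman measurement update.

    Returns (posterior mean, posterior variance, gain).
    """
    k = kalman_gain(prior_var, obs_var)
    post_mean = prior_mean + k * (obs - prior_mean)
    post_var = (1.0 - k) * prior_var
    return post_mean, post_var, k


# A noisy observation (large obs_var) is trusted little;
# a precise observation (small obs_var) is trusted a lot.
low_trust = update(prior_mean=0.0, prior_var=1.0, obs=10.0, obs_var=9.0)
high_trust = update(prior_mean=0.0, prior_var=1.0, obs=10.0, obs_var=1.0 / 9.0)
```

In this sketch a gain near 0 leaves the belief almost unchanged and a gain near 1 nearly adopts the observation outright, which is the sense in which the gain behaves like a degree of trust in the incoming information.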

arXiv poster slides.pdf
Qualitative Mechanism Independence

Oliver Richardson, Spencer Peters, and Joseph Halpern
NeurIPS, December 2024

Abstract. We define what it means for a joint probability distribution to be compatible with a set of independent causal mechanisms, at a qualitative level—or, more precisely, with a directed hypergraph A, which is the qualitative structure of a probabilistic dependency graph (PDG). When A represents a qualitative Bayesian network, QIM-compatibility with A reduces to satisfying the appropriate conditional independencies. But giving semantics to hypergraphs using QIM-compatibility lets us do much more. For one thing, we can capture functional dependencies. For another, we can capture important aspects of causality using compatibility: we can use compatibility to understand cyclic causal graphs, and to demonstrate structural compatibility, we must essentially produce a causal model. Finally, QIM-compatibility has deep connections to information theory. Applying compatibility to cyclic structures helps to clarify a longstanding conceptual issue in information theory.

arXiv poster slides.pptx
A Unified Theory of Probabilistic Modeling, Dependence, and Inconsistency

Oliver Richardson
PhD Thesis, Cornell University, August 2024
Abstract. What should you do with conflicting information? To be rational in the traditional sense, you must immediately resolve the inconsistency, so as to maintain a consistent (probabilistic) picture of the world. But how? And is it really critical to do so immediately? Inconsistency is clearly undesirable, but, as we will soon show, we stand to gain a lot by representing it.

This thesis develops a broad theory of how to approach probabilistic modeling with possibly-inconsistent information, unifying and reframing much of the literature in graphical models and machine learning in the process. The key ingredient is a novel kind of graphical model, called a Probabilistic Dependency Graph (PDG), which allows for arbitrary (even conflicting) pieces of probabilistic information.
- In Part I, we establish PDGs as a generalization of other models of mental state, including traditional graphical models such as Bayesian Networks and Factor Graphs, as well as causal models, and even generalizations of probability distributions, such as Dempster-Shafer Belief functions.
- In Part II, we show that PDGs also capture modern neural representations. Surprisingly, standard loss functions can be viewed as the inconsistency of a PDG that models the situation appropriately. Furthermore, many important algorithms in AI are instances of a simple approach to resolving inconsistencies.
- In Part III, we provide algorithms for PDG inference, and uncover a deep algorithmic equivalence between the problems of inference and calculating a PDG’s numerical degree of inconsistency. We also develop powerful yet intuitive principles for reasoning with (and about) PDGs.
.pdf
Mixture Languages

Oliver Richardson and Jialu Bao
POPL, 2024 Languages for Inference (LAFI) Workshop

poster 2-page extended abstract slides.pptx
The Local Inconsistency Resolution Algorithm

Oliver Richardson
ICML, 2023 Workshop on Structured Probabilistic Inference & Generative Modeling (SPIGM); Workshop on Localized Learning (LLW)

poster 4-page extended abstract SPIGM.OpenReview LLW.OpenReview
spotlight

Inference for Probabilistic Dependency Graphs

Oliver Richardson, Joseph Halpern, and Christopher De Sa
UAI, 2023

arXiv poster code
oral

Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, not Your Loss Function

Oliver Richardson
AISTATS, 2022

Abstract. In a world blessed with a great diversity of loss functions, we argue that the choice between them is not a matter of taste or pragmatics, but of model. Probabilistic dependency graphs (PDGs) are probabilistic models that come equipped with a measure of "inconsistency". We prove that many standard loss functions arise as the inconsistency of a natural PDG describing the appropriate scenario, and use the same approach to justify a well-known connection between regularizers and priors. We also show that the PDG inconsistency captures a large class of statistical divergences, and detail benefits of thinking of them in this way, including an intuitive visual language for deriving inequalities between them. In variational inference, we find that the ELBO, a somewhat opaque objective for latent variable models, and variants of it arise for free out of uncontroversial modeling assumptions, as do simple graphical proofs of their corresponding bounds. Finally, we observe that inconsistency becomes the log partition function (free energy) in the setting where PDGs are factor graphs.
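
A small numerical sketch of the flavor of this claim, under a simplifying assumption of mine rather than the paper's exact definition: take the inconsistency of holding two unconditional beliefs p and q about the same variable, with equal confidence, to be the smallest total KL divergence from any single distribution mu to both. Under that assumption, the minimizer is the normalized geometric mean of p and q, and the minimum equals twice the Bhattacharyya distance, a standard statistical divergence. All function names here are hypothetical.

```python
import math


def kl(mu, p):
    """KL divergence D(mu || p) in nats, for discrete distributions given as lists."""
    return sum(m * math.log(m / pi) for m, pi in zip(mu, p) if m > 0)


def total_divergence(mu, p, q):
    """Assumed inconsistency objective: how far mu is from both beliefs at once."""
    return kl(mu, p) + kl(mu, q)


def geometric_mean_dist(p, q):
    """Normalized geometric mean of p and q, which minimizes total_divergence."""
    weights = [math.sqrt(a * b) for a, b in zip(p, q)]
    z = sum(weights)
    return [w / z for w in weights]


def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance: -log of the sum of sqrt(p * q)."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))


# Two conflicting beliefs about a three-valued variable.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
mu_star = geometric_mean_dist(p, q)
inconsistency = total_divergence(mu_star, p, q)
```

With these definitions, `inconsistency` agrees with `2 * bhattacharyya_distance(p, q)` up to floating-point error, and it is zero exactly when the two beliefs coincide.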

arXiv poster slides.pptx
Probabilistic Dependency Graphs

Oliver Richardson and Joseph Halpern
AAAI, 2021

arXiv poster code
Complexity and Scale: Understanding the Creative

Oliver Richardson
International Association of Computing and Philosophy (IACAP), 2014

conference paper
Capitalization in the St Petersburg Game: Why Statistical Distributions Matter

Mariam Thalos and Oliver Richardson
Politics, Philosophy & Economics, 2014

paper code

Academic Talks

Handling internal conflict in probabilistic models.
Invited Talk @ Center for Human-Compatible AI (CHAI) visiting speaker series, 30 Apr 2026.
The Inconsistency Quantification Problem.
@ Brown University Theory Seminar, 2 Apr 2026.
slides.pptx recording (YouTube)
Structural deficiency with respect to a directed hypergraph: information-theoretic constraints for generalized causal models.
@ MIT CSAIL Theory Lunch, 27 Mar 2026.
The Pursuit of Epistemic Consistency as a "Universal" Objective.
@ ILIAD Conference, 29 Jul 2024.
slides.pptx recording (YouTube)
A Probabilistic Model of Belief, Dependence, and Inconsistency.
PhD Defense / B-Exam @ Cornell CS Department, 16 Jul 2024.
slides.pptx recording
How to Compute with PDGs: Inference, Inconsistency Measurement, and the Close Relationship Between the Two.
Invited Talk @ Cornell CS Theory Seminar, 19 Feb 2024.
talk page slides.pptx
Learning, Inference, and the Pursuit of Consistency.
Invited Talk @ Cornell CS Student Colloquium, 7 Dec 2023.
slides.pptx
Probabilistic (In)consistency as a Basis for Learning and Inference.
Invited Talk @ University of Tennessee CS Dept, 29 Nov 2023.
slides.pptx recording
A Tutorial on Bialgebra.
@ Cornell Seminar on Programming Languages, 17 Sep 2022.
slides.pdf
Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, Not Your Loss Function.
Invited Talk @ Cornell Seminar on Artificial Intelligence, 2 Sep 2022.
talk page slides.pptx
Probabilistic Dependency Graphs and Inconsistency: How to Model, Measure, and Mitigate Internal Conflict.
PhD Proposal / A-Exam @ Cornell CS Department, 17 Sep 2021.
slides.pdf recording
Probabilistic Dependency Graphs.
@ Cornell CS Theory Tea, 17 Mar 2021.
slides.pdf
Cat Thoughts: An Intro to Categorical Thinking.
@ Cornell Graduate Student Seminar, 20 Feb 2019.
slides.pptx
