oli.io;

Research

My research develops mathematical and conceptual foundations for fallible AI systems. The result so far has been an elegant unifying picture that explains many standard but seemingly ad-hoc choices made in practice. A key technical ingredient is a class of models I invented called Probabilistic Dependency Graphs (PDGs), which subsume traditional graphical models, yet can also model inconsistent beliefs and most scenarios in machine learning. Indeed, many important algorithms in AI turn out to be instances of an intuitive heuristic approach to resolving probabilistic inconsistency.

For an overview, see my research statement [last updated March 2024];
  for (a great deal) more, see my dissertation.
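To make "inconsistent beliefs" concrete, the sketch below computes the inconsistency of a tiny PDG. It assumes inconsistency is measured as the smallest confidence-weighted sum of KL divergences from a single distribution μ to each belief (leaving out any structural term), and it handles only the special case where every belief is an unconditional distribution over the same finite variable. In that case the best compromise μ is a normalized weighted geometric mean of the beliefs, which yields a closed form. The function name is illustrative and not taken from any released code.

```python
import math


def inconsistency(dists, betas):
    """Toy PDG inconsistency: min over mu of sum_i betas[i] * KL(mu || dists[i]).

    Assumes every arc is an unconditional distribution over the same finite
    variable and every confidence is positive. The minimizing mu is then the
    normalized weighted geometric mean of the arcs, and the minimum value is
    -(sum of betas) * log Z, where Z is that mean's normalizing constant.
    Returns (inconsistency, minimizing mu).
    """
    total = sum(betas)
    weights = [b / total for b in betas]
    # Unnormalized weighted geometric mean; 0.0 ** w == 0.0 for any w > 0.
    geo = [math.prod(p[x] ** w for p, w in zip(dists, weights))
           for x in range(len(dists[0]))]
    z = sum(geo)
    if z == 0.0:
        return math.inf, None  # beliefs with disjoint supports: no mu reconciles them
    return -total * math.log(z), [g / z for g in geo]


if __name__ == "__main__":
    fair, biased = [0.5, 0.5], [0.9, 0.1]
    print(inconsistency([fair, fair], [1, 1])[0])    # ≈ 0 (consistent beliefs)
    print(inconsistency([fair, biased], [1, 1])[0])  # ≈ 0.2231 (= ln 1.25)
    # A model p(X) held with confidence 1, plus a certain observation X = 1:
    # the inconsistency is the surprisal -log p(1), whatever the observation's confidence.
    print(inconsistency([[0.2, 0.8], [0.0, 1.0]], [1, 5])[0], -math.log(0.8))
```

The last example illustrates, in this toy setting, the idea behind "Loss as the Inconsistency of a Probabilistic Dependency Graph" listed below: a model together with a certain observation is inconsistent by exactly the observation's log loss.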


Papers and Publications

Legend
thesis
workshop
preprint

conference
journal

  • spotlight
    Local Inconsistency Resolution: The Interplay between Attention and Control in Probabilistic Models
    Oliver Richardson, Mehran Shakerinava, Mandana Samiei, Abdessamad El Kabid, Joseph Viviano, Ali Parviz, and Yoshua Bengio
    AISTATS, May 2026

    Abstract. We present a generic algorithm for learning and approximate inference with an intuitive epistemic interpretation: iteratively focus on a subset of the model and resolve inconsistencies using the parameters under control. This framework, which we call Local Inconsistency Resolution (LIR), is built upon Probabilistic Dependency Graphs (PDGs), which provide a flexible representational foundation capable of capturing inconsistent beliefs. We show how LIR unifies and generalizes a wide variety of important algorithms in the literature, including the Expectation-Maximization (EM) algorithm, belief propagation, adversarial training, GANs, and GFlowNets. In the last case, LIR actually suggests a more natural loss, which we demonstrate improves GFlowNet convergence. Each method can be recovered as a specific instance of LIR by choosing a procedure to direct focus (attention and control). We implement this algorithm for discrete PDGs and study its properties on synthetically generated PDGs, comparing its behavior to the global optimization semantics of the full PDG.

    Notes. Earlier versions of the core idea in this work appeared in “The Local Inconsistency Resolution Algorithm”, presented at ICML workshops in 2023, and in Chapter 7 of my PhD dissertation. My co-authors helped develop a general implementation and the synthetic experiments, test the hypotheses posed by the paper, and generally bring the paper into its current form.

    arXiv poster code
  • Language models recognize dropout and Gaussian noise applied to their activations
    Damiano Fornasiere*, Mirko Bronzi*, Spencer Kitts*, Alessandro Palmas, Yoshua Bengio, and Oliver Richardson
    preprint (submitted to COLM), April 2026
  • Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
    Minsu Kim, Jean-Pierre R. Falet, Oliver Ethan Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, and Yoshua Bengio
    ICLR, 2026
    Abstract. Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we propose to augment each reasoning step in a CoT with a latent veracity (or correctness) variable. To efficiently explore this expanded space, we introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time verification method facilitates supervised fine-tuning of an Amortized Veracity Inference (AVI) machine by providing pseudo-labels for veracity. AVI generalizes VS, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that VS reliably identifies errors in logical (ProntoQA), mathematical (GSM8K), and commonsense (CommonsenseQA) reasoning benchmarks, with AVI achieving comparable zero-shot accuracy. Finally, we demonstrate the utility of latent veracity inference for providing feedback during self-correction and self-improvement.
    arXiv

  • oral
    Learning with Confidence
    Oliver Richardson
    UAI, 2025

    Abstract. We characterize a notion of confidence that arises when learning or updating beliefs. This notion of trust, or learner’s confidence, can be used alongside (and is easily mistaken for) probability or likelihood, but it is fundamentally a different concept. Although perhaps not as useful as probability itself, our notion of confidence captures and unifies many concepts in the literature, from Shafer’s weight of evidence to Kalman gain, as well as the number of training epochs and the learning rate. We provide a mathematical definition of what it means to learn with confidence, and give two canonical ways of measuring confidence on a continuum. Under additional assumptions, we derive more compact representations of confidence-based learning in terms of vector fields and loss functions. These representations induce an extended language of compound “parallel” observations. We illustrate our framework by analyzing standard ways of updating beliefs.

  • Qualitative Mechanism Independence
    Oliver Richardson, Spencer Peters, and Joseph Halpern
    NeurIPS, December 2024

    Abstract. We define what it means for a joint probability distribution to be compatible with a set of independent causal mechanisms, at a qualitative level—or, more precisely, with a directed hypergraph A, which is the qualitative structure of a probabilistic dependency graph (PDG). When A represents a qualitative Bayesian network, QIM-compatibility with A reduces to satisfying the appropriate conditional independencies. But giving semantics to hypergraphs using QIM-compatibility lets us do much more. For one thing, we can capture functional dependencies. For another, we can capture important aspects of causality using compatibility: we can use compatibility to understand cyclic causal graphs, and to demonstrate structural compatibility, we must essentially produce a causal model. Finally, QIM-compatibility has deep connections to information theory. Applying compatibility to cyclic structures helps to clarify a longstanding conceptual issue in information theory.

  • A Unified Theory of Probabilistic Modeling, Dependence, and Inconsistency
    Oliver Richardson
    PhD Thesis, Cornell University, August 2024

    Abstract. What should you do with conflicting information? To be rational in the traditional sense, you must immediately resolve the inconsistency, so as to maintain a consistent (probabilistic) picture of the world. But how? And is it really critical to do so immediately? Inconsistency is clearly undesirable, but, as we will soon show, we stand to gain a lot by representing it.

    This thesis develops a broad theory of how to approach probabilistic modeling with possibly-inconsistent information, unifying and reframing much of the literature in graphical models and machine learning in the process. The key ingredient is a novel kind of graphical model, called a Probabilistic Dependency Graph (PDG), which allows for arbitrary (even conflicting) pieces of probabilistic information.

    • In Part I, we establish PDGs as a generalization of other models of mental state, including traditional graphical models such as Bayesian Networks and Factor Graphs, as well as causal models, and even generalizations of probability distributions, such as Dempster-Shafer Belief functions.
    • In Part II, we show that PDGs also capture modern neural representations. Surprisingly, standard loss functions can be viewed as the inconsistency of a PDG that models the situation appropriately. Furthermore, many important algorithms in AI are instances of a simple approach to resolving inconsistencies.
    • In Part III, we provide algorithms for PDG inference, and uncover a deep algorithmic equivalence between the problems of inference and calculating a PDG’s numerical degree of inconsistency. We also develop powerful yet intuitive principles for reasoning with (and about) PDGs.
    .pdf
  • Mixture Languages
    Oliver Richardson and Jialu Bao
    POPL, 2024 Languages for Inference (LAFI) Workshop
  • The Local Inconsistency Resolution Algorithm
    Oliver Richardson
    ICML, 2023 Workshop on Structured Probabilistic Inference & Generative Modeling (SPIGM); Workshop on Localized Learning (LLW)

  • spotlight
    Inference for Probabilistic Dependency Graphs
    Oliver Richardson, Joseph Halpern, and Christopher De Sa
    UAI, 2023

  • oral
    Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, not Your Loss Function
    Oliver Richardson
    AISTATS, 2022
    Abstract. In a world blessed with a great diversity of loss functions, we argue that the choice between them is not a matter of taste or pragmatics, but of model. Probabilistic dependency graphs (PDGs) are probabilistic models that come equipped with a measure of "inconsistency". We prove that many standard loss functions arise as the inconsistency of a natural PDG describing the appropriate scenario, and use the same approach to justify a well-known connection between regularizers and priors. We also show that the PDG inconsistency captures a large class of statistical divergences, and detail benefits of thinking of them in this way, including an intuitive visual language for deriving inequalities between them. In variational inference, we find that the ELBO, a somewhat opaque objective for latent variable models, and variants of it arise for free out of uncontroversial modeling assumptions, as do simple graphical proofs of their corresponding bounds. Finally, we observe that inconsistency becomes the log partition function (free energy) in the setting where PDGs are factor graphs.
  • Probabilistic Dependency Graphs
    Oliver Richardson and Joseph Halpern
    AAAI, 2021
  • Complexity and Scale: Understanding the Creative
    Oliver Richardson
    International Association of Computing and Philosophy (IACAP), 2014
  • Capitalization in the St Petersburg Game: Why Statistical Distributions Matter
    Mariam Thalos and Oliver Richardson
    Politics, Philosophy & Economics, 2014
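The Local Inconsistency Resolution abstract above describes a loop: focus on part of the model (attention), then resolve inconsistency using only the parameters under control. What follows is a toy sketch of that loop, not the paper's implementation. It assumes PDG inconsistency is the smallest confidence-weighted sum of KL divergences from a single distribution μ to each belief, and that every belief is an unconditional distribution over one finite variable, parameterized by softmax logits. Attention (a random pair of arcs) and control (one arc of that pair) are chosen uniformly at random purely for illustration; the paper's point is that particular choices of these recover algorithms such as EM and belief propagation. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(t - m) for t in logits]
    s = sum(exps)
    return [e / s for e in exps]


def consensus(dists, betas):
    """Normalized weighted geometric mean: the mu that best reconciles dists."""
    total = sum(betas)
    geo = [math.prod(p[x] ** (b / total) for p, b in zip(dists, betas))
           for x in range(len(dists[0]))]
    z = sum(geo)
    return [g / z for g in geo]


def inconsistency(dists, betas):
    """min over mu of sum_i betas[i] * KL(mu || dists[i]) = -(sum of betas) * log Z."""
    total = sum(betas)
    z = sum(math.prod(p[x] ** (b / total) for p, b in zip(dists, betas))
            for x in range(len(dists[0])))
    return -total * math.log(z)


def lir_step(logits, betas, attention, control, lr=0.5):
    """Lower the inconsistency of the focused arcs by moving only the controlled one.

    The gradient of the focused inconsistency with respect to the controlled
    logits is betas[control] * (p_control - mu), so a descent step pulls
    p_control toward the focused consensus mu. Mutates logits in place.
    """
    assert control in attention
    mu = consensus([softmax(logits[i]) for i in attention],
                   [betas[i] for i in attention])
    p_c = softmax(logits[control])
    logits[control] = [t - lr * betas[control] * (pc - m)
                       for t, pc, m in zip(logits[control], p_c, mu)]


def run_lir(logits, betas, steps=200, seed=0):
    """Repeatedly attend to a random pair of arcs and control one of them."""
    rng = random.Random(seed)
    arcs = list(range(len(logits)))
    for _ in range(steps):
        attention = rng.sample(arcs, 2)
        control = rng.choice(attention)
        lir_step(logits, betas, attention, control)
    return [softmax(t) for t in logits]


if __name__ == "__main__":
    init = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    betas = [1.0, 2.0, 1.0]
    before = inconsistency([softmax(t) for t in init], betas)
    after = inconsistency(run_lir([row[:] for row in init], betas), betas)
    print(f"inconsistency before: {before:.4f}, after: {after:.4f}")
```

Because the gradient with respect to the controlled logits works out to β_c(p_c − μ), each step simply nudges the controlled belief toward the consensus of the focused beliefs; if the attention set contains only the controlled arc, μ equals p_c and nothing moves.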


Academic Talks

  • Handling internal conflict in probabilistic models.
    Invited Talk @ Center for Human-Compatible AI (CHAI) visiting speaker series,     30 Apr 2026.
  • The Inconsistency Quantification Problem.
    @ Brown University Theory Seminar,     2 Apr 2026.
  • Structural deficiency with respect to a directed hypergraph: information-theoretic constraints for generalized causal models.
    @ MIT CSAIL Theory Lunch,     27 Mar 2026.
  • The Pursuit of Epistemic Consistency as a "Universal" Objective.
    @ ILIAD Conference,     29 Jul 2024.
  • A Probabilistic Model of Belief, Dependence, and Inconsistency.
    PhD Defense / B-Exam @ Cornell CS Department,     16 Jul 2024.
  • How to Compute with PDGs: Inference, Inconsistency Measurement, and the Close Relationship Between the Two.
    Invited Talk @ Cornell CS Theory Seminar,     19 Feb 2024.
  • Learning, Inference, and the Pursuit of Consistency.
    Invited Talk @ Cornell CS Student Colloquium,     7 Dec 2023.
  • Probabilistic (In)consistency as a Basis for Learning and Inference.
    Invited Talk @ University of Tennessee CS Dept,     29 Nov 2023.
  • A Tutorial on Bialgebra.
    @ Cornell Seminar on Programming Languages,     17 Sep 2022.
  • Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, Not Your Loss Function.
    Invited Talk @ Cornell Seminar on Artificial Intelligence,     2 Sep 2022.
  • Probabilistic Dependency Graphs and Inconsistency: How to Model, Measure, and Mitigate Internal Conflict.
    PhD Proposal / A-Exam @ Cornell CS Department,     17 Sep 2021.
  • Probabilistic Dependency Graphs.
    @ Cornell CS Theory Tea,     17 Mar 2021.
  • Cat Thoughts: An Intro to Categorical Thinking.
    @ Cornell Graduate Student Seminar,     20 Feb 2019.