[This is a note, which is the seed of an idea and something I’ve written quickly, as opposed to articles that I’ve optimized for readability and transmission of ideas.]
This is a snapshot of my curriculum for exploring the following questions:
- Is probability theory all you need to develop AI?
- If not, what is missing?
- Should a theory of AI be expressed in the framework of probability theory at all?
- Do brains use probability?
This reflects my current estimate of the landscape, and summarizes where my interests and aspirations have taken me so far. It is not set in stone. I may follow through on it, or I may diverge as I learn more. I primarily follow the current of my curiosity.
Description of topics
Here are the topics from the graph above, with descriptions to the extent that I understand them, and links to reference material.
-
- Objective probability
- Is probability an objective property of physical systems in general (not just i.i.d.)? Objective, meaning independently arrived at by multiple parties, like a scientific experiment (just as mass and energy measurements can be independently verified) - i.e. not dependent on a particular brain with particular beliefs. If p(x) = θ, then this is true even if no humans are around at all to believe it. The main problem in making probability objective is figuring out how to uniquely determine the probability of something given observations. What needs to be measured in order to ascertain the objective probability of a system?
-
- Solomonoff induction
- A Bayesian inference setup general enough to encompass general intelligence. The predictive posterior converges to the true data-generating distribution in the limit of infinite observations (for any prior with support everywhere), possibly providing an objective notion of probability, at least for infinite sequences.
‣ Universal Artificial Intelligence
‣ On the Existence and Convergence of Computable Universal Priors
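- A minimal sketch in my own notation (not taken verbatim from the sources above): Solomonoff's prior weights a string by the total probability that a universal prefix machine outputs it,
$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$
where $U$ is a universal prefix machine, $\ell(p)$ is the length of program $p$, and $U(p) = x*$ means the program's output begins with $x$. Prediction then uses the conditional $M(x_{t+1} \mid x_{1:t}) = M(x_{1:t} x_{t+1}) / M(x_{1:t})$.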
-
- Approximations
- How can Solomonoff induction be implemented in practice? How would brains implement it?
‣ http://www.hutter1.net/ai/uaibook.htm#approx
-
- Posterior convergence
- The sense in which Solomonoff induction is objective. The predictive posterior converges to the true data-generating distribution as observations accumulate, for any prior with support over all hypotheses.
‣ Universal Artificial Intelligence, Theorem 3.19
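- Roughly, my paraphrase of the kind of statement the theorem makes: if the data are drawn from a computable measure $\mu$, then the mixture's predictions converge to $\mu$'s with $\mu$-probability 1,
$$\xi(x_t \mid x_{<t}) \;\to\; \mu(x_t \mid x_{<t}) \quad \text{as } t \to \infty,$$
for any mixture $\xi = \sum_\nu w_\nu\, \nu$ whose prior weights $w_\nu$ are positive on a hypothesis class containing $\mu$.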
-
- Posterior consistency
- Solomonoff induction may not be consistent, meaning it may fail to distinguish between two hypotheses even given infinite data. Implications for objective probability.
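- For contrast, the usual formal notion (my summary): an inference scheme is consistent if, whenever the data are sampled from a hypothesis $\nu$ in the class, the posterior mass on $\nu$ tends to 1,
$$\Pr(\nu \mid x_{1:n}) \;\to\; 1 \quad \text{as } n \to \infty,\ x \sim \nu,$$
which is a stronger requirement than convergence of predictions (the previous item).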
-
- Prior with universally optimal convergence
- Solomonoff’s universally optimal prior.
‣ Universal Artificial Intelligence, Theorem 3.70
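- The standard choice of weights, sketched in my notation: each (semi)computable hypothesis $\nu$ gets prior mass determined by its description length,
$$w_\nu \;=\; 2^{-K(\nu)},$$
where $K(\nu)$ is the prefix Kolmogorov complexity of (an index for) $\nu$, so simpler hypotheses receive exponentially more prior weight.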
-
- Convergence on individual sequences
- Convergence of Solomonoff induction is not guaranteed on a measure-0 set of sequences. Construction of such a sequence.
‣ Universal Convergence of Semimeasures on Individual Random Sequences, Theorem 6 and Proposition 12
-
- (Non-)Equivalence of Universal Priors
- A surprising equivalence between mixtures of deterministic programs and computable distributions.
‣ (Non-)Equivalence of Universal Priors, Theorem 14
-
- Martin-Löf randomness
- What it means for an infinite sequence to be drawn from a probability distribution. Algorithmic definition of randomness (see AIT).
‣ An Introduction to Kolmogorov Complexity and Its Applications
- Definition in terms of universal probability
‣ Universal Artifical Intelligence
‣ An Introduction to Kolmogorov Complexity and Its Applications
- Can sequences be Martin-Löf random w.r.t. multiple probability measures?
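- A sketch of the universal-probability characterization mentioned above (Levin–Schnorr-style, my paraphrase): an infinite sequence $\omega$ is Martin-Löf random with respect to a computable measure $\mu$ iff the universal semimeasure $M$ never beats $\mu$ by more than a constant factor on its prefixes,
$$\sup_n \frac{M(\omega_{1:n})}{\mu(\omega_{1:n})} \;<\; \infty;$$
for the uniform measure this is equivalent to $K(\omega_{1:n}) \ge n - c$ for all $n$ and some constant $c$.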
-
- Bayesian epistemology
- Are priors and posteriors all that is needed for a complete theory of knowledge, and are they a sufficient framework for building an intelligent system? Bayesian epistemology repurposes probability as a property of the intelligent agent doing the observing, rather than of the system being observed (or perhaps it characterizes their interaction), i.e. probability as belief.
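- The machinery in question is just Bayes' rule applied to beliefs (standard, stated here for concreteness): given prior belief $P(h)$ in hypothesis $h$ and observed evidence $e$,
$$P(h \mid e) \;=\; \frac{P(e \mid h)\, P(h)}{P(e)},$$
and the question is whether iterating this update is all an intelligent agent needs.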
-
- Bayesian brain hypothesis
- The hypothesis in neuroscience that the brain is largely an approximate Bayesian inference engine.
‣ The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation
‣ Bayesian Brain: Probabilistic Approaches to Neural Coding
‣ Object Perception as Bayesian Inference
-
- Friston’s free energy principle
- A unified theory of biological intelligence from which Bayesian epistemology can be derived.
‣ What does the free energy principle tell us about the brain?
‣ The free-energy principle: a unified brain theory?
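- A sketch of the central quantity, as I understand it from the reviews above: the variational free energy of a recognition density $q(s)$ over hidden states $s$, given observations $o$ and a generative model $p(s, o)$, is
$$F \;=\; \mathbb{E}_{q(s)}\big[\log q(s) - \log p(s, o)\big] \;=\; D_{\mathrm{KL}}\big(q(s)\,\|\,p(s \mid o)\big) \;-\; \log p(o),$$
so minimizing $F$ both drives $q$ toward the Bayesian posterior and bounds the surprise $-\log p(o)$ from above.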
-
- How brains approximate Bayesian inference
- To make the Bayesian brain hypothesis falsifiable, a characterization of what counts as an approximation to Bayesian inference needs to be given. What approximate Bayesian computations in the brain have neuroscientists found so far? See the same sources listed under “Bayesian brain hypothesis”.
-
- Causal inference
- If Bayesian epistemology is not sufficient, then what is missing? Judea Pearl proposes causal inference.
‣ Causality, chapters 3 and 7
‣ Introduction to Judea Pearl’s Do-Calculus
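- A minimal example of what probability alone misses (standard do-calculus material, my rendering): observing $X$ and intervening on $X$ are different. If $Z$ confounds $X$ and $Y$ and satisfies the backdoor criterion, the interventional distribution is
$$p(y \mid \mathrm{do}(x)) \;=\; \sum_z p(y \mid x, z)\, p(z) \;\ne\; \sum_z p(y \mid x, z)\, p(z \mid x) \;=\; p(y \mid x)$$
in general; conditioning alone cannot recover the left-hand side without causal assumptions.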
-
- Bounded Rationality
- What would Bayesian epistemology theoretically look like with bounded resources? Is Bayesian epistemology no longer optimal given bounded resources?
‣ Bayes, Bounds, and Rational Analysis
-
- Logical justifications
- Arguments from first principles that rationality requires Bayesian epistemology, i.e. that a rational agent is necessarily a Bayesian agent (and such an agent is plausibly performing Solomonoff induction, so that its predictions are sufficiently general).
- Dutch book argument
- Complete classes
- Cox’s theorem
- Von Neumann-Morgenstern utility theorem
-
- Motivation from decision theory
- Some say a theory is good because it is useful. Perhaps the question “what theory of uncertainty should I use?” is best answered by looking at what we want to do with it, namely decision making under uncertainty. Bayesian epistemology can be motivated by way of decision theory.
‣ The Foundations of Statistics, chapter 3
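- The flavor of the argument (my compressed paraphrase of the Savage-style result): if an agent's preferences over acts satisfy certain axioms, then there exist a subjective probability $p$ over states $s$ and a utility $u$ such that the agent behaves as if choosing
$$a^* \;=\; \arg\max_a \sum_s p(s)\, u\big(\mathrm{outcome}(a, s)\big),$$
i.e. degrees of belief fall out of coherent decision making rather than being assumed up front.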
-
- Unique priors
- How to choose a prior is one point of contention in Bayesian epistemology. There are some proposed methods for selecting a unique prior given what you already know, for example, the max-entropy principle.
‣ Objective Priors: An Introduction for Frequentists
‣ Lectures on Probability, Entropy, and Statistical Physics
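- A small worked instance of the max-entropy principle (standard material, not tied to a specific source above): among distributions $p$ on a finite set with no constraints beyond normalization, the entropy $H(p) = -\sum_i p_i \log p_i$ is maximized by the uniform distribution; adding a known mean constraint $\sum_i p_i x_i = m$ makes the maximizer an exponential (Gibbs) distribution
$$p_i \;\propto\; e^{-\lambda x_i},$$
with $\lambda$ chosen to satisfy the constraint.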
-
- Algorithmic information theory (AIT)
- An alternative to probability theory devised by Kolmogorov himself (and others) to address its shortcomings. Does AIT allow us to formalize the general learning problem of transferring knowledge out-of-distribution?
‣ An Introduction to Kolmogorov Complexity and Its Applications
‣ Kolmogorov Complexity and Algorithmic Randomness
-
- Types of Kolmogorov complexity
- There is a constellation of algorithmic complexity functions that makes up the foundation of AIT (the two most common are sketched below). See the same sources listed under “Algorithmic information theory”.
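- For concreteness, the standard definitions in my notation: plain complexity $C(x)$ and prefix complexity $K(x)$,
$$C(x) = \min\{\ell(p) : V(p) = x\}, \qquad K(x) = \min\{\ell(p) : U(p) = x\},$$
where $V$ is a universal Turing machine and $U$ a universal prefix machine. They agree up to additive logarithmic terms: $C(x) \le K(x) + O(1)$ and $K(x) \le C(x) + 2 \log \ell(x) + O(1)$.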
-
- Resource bounded complexities
- Kolmogorov complexity with bounded computation. Possible direction for computable-AIT.
‣ An Introduction to Kolmogorov Complexity and Its Applications, chapter 7
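- One concrete example (Levin's $Kt$, which I believe is covered in that chapter): penalize run time logarithmically,
$$Kt(x) \;=\; \min_p \{\ell(p) + \log t(p) : U(p) = x \text{ in } t(p) \text{ steps}\},$$
which, unlike $K(x)$, is computable (though expensive to evaluate) and underlies Levin's universal search.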
-
- Algorithmic transfer learning
- How can the information shared by two datasets be defined? What is the objective of transfer learning?
‣ On Universal Transfer Learning
‣ Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
‣ The Information Complexity of Learning Tasks, their Structure and their Distance
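- The basic quantity these papers build on (standard AIT, my phrasing): the algorithmic information that dataset $y$ carries about dataset $x$ is how much $y$ shortens $x$'s description,
$$I(x : y) \;=\; K(x) - K(x \mid y),$$
which is symmetric up to logarithmic terms; transfer learning is then, roughly, exploiting $K(x \mid y) \ll K(x)$.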
-
- No free lunch theorem
- Theorem stating there is no universally best algorithm for all training-test dataset pairs.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 5.1
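- Roughly what the theorem says (my paraphrase of the statement in that book, so treat the constants as approximate): for any learner $A$ for binary classification over a domain $X$ and any training-set size $m < |X|/2$, there exists a distribution $D$ and a labeling function $f$ with $L_D(f) = 0$ such that, with probability at least $1/7$ over the draw of the $m$ training examples $S$,
$$L_D\big(A(S)\big) \;\ge\; \tfrac{1}{8}.$$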
-
- AIXI
- A theory of optimal intelligence put forth by Marcus Hutter based on Solomonoff induction.
‣ Universal Artificial Intelligence
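- A sketch of the defining equation (my rendering of Hutter's agent, notation simplified): at cycle $k$, with planning horizon $m$, AIXI picks the action maximizing expected future reward under the Solomonoff-style mixture over environments,
$$a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_k + \cdots + r_m\big) \sum_{q \,:\, U(q,\, a_{1:m}) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},$$
i.e. expectimax planning over all computable environments weighted by simplicity.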
-
- Data compression
- Lossless compression from the perspectives of Shannon’s information theory and AIT. Can they be unified? Can compression make probability objective? What is the relationship between compression and intelligence?
‣ Elements of Information Theory
‣ Data Compression Explained
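- One concrete bridge between the two perspectives (a standard result, stated loosely; the exact additive term depends on the description length of $P$): for a computable distribution $P$, expected prefix complexity essentially equals Shannon entropy,
$$H(P) \;\le\; \sum_x P(x)\, K(x) \;\le\; H(P) + K(P) + O(1),$$
so compressing $P$-typical data achieves the same average length whether analyzed information-theoretically or algorithmically.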
-
- Decision theory under ignorance
- Decision theory without probability. Pros and cons.
‣ An Introduction to Decision Theory, chapter 3
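- An example of such a rule (standard textbook material): with no probabilities over states $s$, the maximin criterion picks
$$a^* \;=\; \arg\max_a \min_s u(a, s),$$
the act with the best worst case; among the cons are extreme pessimism and insensitivity to everything but the worst outcome.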
-
- The Fundamental Theorem of Statistical Learning (PAC)
- An introduction to PAC-learning theory. PAC is a probability-theory-based account of machine learning which AIT could replace.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 6.7
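- The shape of the theorem, from memory (so treat the exact rates as approximate): a hypothesis class $H$ is PAC learnable iff its VC dimension $d$ is finite, in which case (in the realizable setting) the sample complexity of learning to error $\epsilon$ with confidence $1 - \delta$ is
$$m(\epsilon, \delta) \;=\; \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon}\right),$$
achieved by empirical risk minimization.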
-
- PAC account of transfer learning
- PAC analysis of transfer learning. However, assumptions about relatedness of tasks need to be made.
‣ A Model of Inductive Bias Learning