Notes: Probability & AI Curriculum
June 17, 2020
[Disclaimer: I am not an expert on this topic, and I have likely just learned about this material for the first time. I don't guarantee the accuracy or correctness of anything. Consider this to be a page in my personal notebook. It is subject to change and revision over time. Equations in images are screen shots taken from study sources.]
This is a snapshot of my curriculum for exploring the following questions:
 Is probability theory all you need to develop AI?
 If not, what is missing?
 Should a theory of AI be expressed in the framework of probability theory at all?
 Do brains use probability?
This reflects my current estimate of the landscape, and summarizes where my interests and aspirations have taken me so far. It is not set in stone. I may follow through on it, or I may diverge as I learn more. I primarily follow the current of my curiosity.
Description of topics
Here are the topics from the graph above, with descriptions to the extent that I understand them, and links to reference material.

 Objective probability
 Is probability an objective property of physical systems in general (not just i.i.d.)? Objective, meaning independently arrived at by multiple parties, like a scientific experiment (just as mass and energy measurements can be independently verified), i.e. not dependent on a particular brain with particular beliefs. If p(x) = θ, then this is true even if no humans are around at all to believe it. The main problem in making probability objective is figuring out how to uniquely determine the probability of something given observations. What needs to be measured in order to ascertain the objective probability of a system?

 Solomonoff induction
 A Bayesian inference setup general enough to encompass general intelligence. Posterior converges to the true data posterior at the infinite limit (for any prior with support everywhere), possibly providing an objective notion of probability, at least for infinite sequences.
‣ Universal Artificial Intelligence
‣ On the Existence and Convergence of Computable Universal Priors

 Approximations
 How can SI be implemented in practice? How would brains implement it?
‣ http://www.hutter1.net/ai/uaibook.htm#approx

 Posterior convergence
 The sense in which Solomonoff induction is objective. The predicted posterior converges to the true data posterior with infinite observations, for any prior with support over all hypotheses.
‣ Universal Artificial Intelligence, Theorem 3.19
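
A minimal sketch of the convergence idea, for my own intuition. It is not Solomonoff induction: I swap the (uncomputable) class of all computable measures for a small finite grid of Bernoulli hypotheses with a uniform prior. The grid, the prior, and the function names are my own assumptions for illustration.

```python
import math
import random

def posterior_after(data, thetas, prior):
    """Bayesian posterior over Bernoulli parameters `thetas` after observing `data`."""
    # Accumulate log-likelihoods to avoid underflow on long sequences.
    log_post = [math.log(p) for p in prior]
    for x in data:
        for i, th in enumerate(thetas):
            log_post[i] += math.log(th if x == 1 else 1.0 - th)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]

def predictive(post, thetas):
    """Mixture probability that the next bit is 1."""
    return sum(w * th for w, th in zip(post, thetas))

def demo(n=2000, true_theta=0.7, seed=0):
    """Sample n bits from Bernoulli(true_theta) and return the mixture's prediction."""
    rng = random.Random(seed)
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]          # hypothetical hypothesis class
    prior = [1.0 / len(thetas)] * len(thetas)   # any prior with full support works here
    data = [1 if rng.random() < true_theta else 0 for _ in range(n)]
    post = posterior_after(data, thetas, prior)
    return predictive(post, thetas)
```

Because the true parameter is in the grid and the prior gives it nonzero weight, the mixture's prediction for the next bit approaches the true value 0.7 as n grows.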

 Posterior consistency
 Solomonoff induction may not be consistent, meaning that even with infinite data it may be unable to distinguish between two hypotheses. Implications for objective probability.

 Prior with universally optimal convergence
 Solomonoff’s universally optimal prior.
‣ Universal Artificial Intelligence, Theorem 3.70

 Convergence on individual sequences
 Convergence of Solomonoff induction is not guaranteed on a measure-0 set of sequences. Construction of such a sequence.
‣ Universal Convergence of Semimeasures on Individual Random Sequences, Theorem 6 and Proposition 12

 (Non-)Equivalence of Universal Priors
 A surprising equivalence between mixtures of deterministic programs and computable distributions.
‣ (Non-)Equivalence of Universal Priors, Theorem 14

 Martin-Löf randomness
 What it means for an infinite sequence to be drawn from a probability distribution. Algorithmic definition of randomness (see AIT).
‣ An Introduction to Kolmogorov Complexity and Its Applications

Definition in terms of universal probability
‣ Universal Artificial Intelligence
‣ An Introduction to Kolmogorov Complexity and Its Applications
Can sequences be Martin-Löf random w.r.t. multiple probability measures?

 Bayesian epistemology
 Are priors and posteriors all that is needed for a complete theory of knowledge, and are they a sufficient framework for building an intelligent system? Bayesian epistemology repurposes probability as a property of the intelligent agent doing the observing, rather than the system being observed (or perhaps it characterizes their interaction), i.e. probability as belief.

 Bayesian brain hypothesis
 Hypothesis in neuroscience that the brain is largely an approximate Bayesian inference engine.
‣ The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation
‣ Bayesian Brain: Probabilistic Approaches to Neural Coding
‣ Object Perception as Bayesian Inference

 Friston’s free energy principle
 A unified theory of biological intelligence from which Bayesian epistemology can be derived.
‣ What does the free energy principle tell us about the brain?
‣ The free-energy principle: a unified brain theory?

 How brains approximate Bayesian inference
 To make the Bayesian brain hypothesis falsifiable, a characterization of what counts as an approximation to Bayesian inference needs to be given. What approximate Bayesian computations in the brain have been found so far by neuroscientists? See the same sources listed under “Bayesian brain hypothesis”.

 Causal inference
 If Bayesian epistemology is not sufficient, then what is missing? Judea Pearl proposes causal inference.
‣ Causality, chapters 3 and 7
‣ Introduction to Judea Pearl’s Do-Calculus
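
A toy sketch of why conditioning alone may not be enough: with a confounder Z, the observational quantity P(Y=1 | X=x) differs from the interventional quantity P(Y=1 | do(X=x)), which the back-door adjustment computes as the sum over z of P(z) P(Y=1 | x, z). All the numbers and names below are made up by me for illustration.

```python
# Hypothetical model: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
P_Z = {0: 0.5, 1: 0.5}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}

def p_y1(x, z):
    """P(Y=1 | X=x, Z=z) in this made-up model."""
    return 0.1 + 0.2 * x + 0.5 * z

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1

def conditional(x):
    """Observational P(Y=1 | X=x): Z is weighted by P(z | x)."""
    num = sum(P_Z[z] * p_x_given_z(x, z) * p_y1(x, z) for z in P_Z)
    den = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return num / den

def interventional(x):
    """Back-door adjustment P(Y=1 | do(X=x)): Z is weighted by P(z)."""
    return sum(P_Z[z] * p_y1(x, z) for z in P_Z)

observed_gap = conditional(1) - conditional(0)        # about 0.5
causal_gap = interventional(1) - interventional(0)    # about 0.2
```

In this model the observed difference overstates the causal effect of X on Y because Z pushes both X and Y up.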

 Bounded Rationality
 What would Bayesian epistemology theoretically look like with bounded resources? Is Bayesian epistemology no longer optimal given bounded resources?
‣ Bayes, Bounds, and Rational Analysis

 Logical justifications
 Arguments from first principles that Bayesian epistemology is a necessary condition for rationality, and that a rational agent is necessarily a Bayesian agent (such an agent is likely performing Solomonoff induction, in order for it to be sufficiently general in its prediction ability).
 Dutch book argument
 Complete classes
 Cox’s theorem
 Von Neumann-Morgenstern utility theorem
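
A minimal sketch of the simplest Dutch book, as I understand the argument: an agent whose credences for A and not-A do not sum to 1, and who treats those credences as fair betting prices, can be made to lose money whichever way A turns out. The function name and the betting setup are my own illustration.

```python
def dutch_book(p_a, p_not_a, stake=1.0):
    """Agent's net payoff in each outcome when a bookie exploits its prices.

    The agent treats p * stake as a fair price for a ticket that pays `stake`
    if the event occurs, and is willing to buy or sell at that price.
    Exactly one of A / not-A occurs, so exactly one ticket pays out.
    """
    total = p_a + p_not_a
    if total > 1.0:
        # Bookie sells both tickets to the agent: agent pays total * stake, wins stake once.
        net = stake - total * stake
    elif total < 1.0:
        # Bookie buys both tickets from the agent: agent receives total * stake, pays stake once.
        net = total * stake - stake
    else:
        net = 0.0  # coherent on this pair of bets, no sure loss
    return {"A": net, "not A": net}
```

For example, with credences 0.6 and 0.6 the agent loses about 0.2 of the stake in both outcomes, while credences 0.5 and 0.5 give no sure loss.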

 Motivation from decision theory
 Some say a theory is good because it is useful. Perhaps the question “what theory of uncertainty should I use?” is best answered by looking at what we want to do with it, namely decision making under uncertainty. Bayesian epistemology can be motivated by decision theory.
‣ The Foundations of Statistics, chapter 3

 Unique priors
 How to choose a prior is one point of contention in Bayesian epistemology. There are some proposed methods for selecting a unique prior given what you already know, for example, the max-entropy principle.
‣ Objective Priors: An Introduction for Frequentists
‣ Lectures on Probability, Entropy, and Statistical Physics
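
A small sketch of the max-entropy principle on a die, assuming the only thing known is the mean. Under a mean constraint the max-entropy distribution has the exponential form p_i ∝ exp(λ·v_i), so the code just searches for λ by bisection. The function name and the bisection bounds are my own choices.

```python
import math

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Max-entropy distribution over `values` subject to a fixed mean.

    Uses the exponential-family form p_i proportional to exp(lam * v_i)
    and finds lam by bisection (the mean is increasing in lam).
    """
    def dist(lam):
        m = max(lam * v for v in values)          # subtract max for numerical stability
        w = [math.exp(lam * v - m) for v in values]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2.0)

faces = [1, 2, 3, 4, 5, 6]
fair = maxent_with_mean(faces, 3.5)     # recovers the uniform distribution
loaded = maxent_with_mean(faces, 4.5)   # tilts weight toward the high faces
```

With mean 3.5 the constraint carries no extra information and the result is uniform; with mean 4.5 the probabilities increase geometrically from face 1 to face 6.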

 Algorithmic information theory (AIT)
 An alternative to probability theory devised by Kolmogorov himself (and others) to address its shortcomings. Does AIT allow us to formalize the general learning problem of transferring knowledge out-of-distribution?
‣ An Introduction to Kolmogorov Complexity and Its Applications
‣ Kolmogorov Complexity and Algorithmic Randomness

 Types of Kolmogorov complexity
 There is a constellation of algorithmic complexity functions that make up the foundation of AIT. See the same sources listed under “Algorithmic information theory”.

 Resource bounded complexities
 Kolmogorov complexity with bounded computation. Possible direction for computable AIT.
‣ An Introduction to Kolmogorov Complexity and Its Applications, chapter 7

 Algorithmic transfer learning
 How can the information shared by two datasets be defined? What is the objective of transfer learning?
‣ On Universal Transfer Learning
‣ Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
‣ The Information Complexity of Learning Tasks, their Structure and their Distance
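
One computable stand-in for "information shared by two datasets" is the normalized compression distance (NCD), which replaces Kolmogorov complexity with the output length of a real compressor. This is a different, cruder technique than the ones in the papers above; I am only using it as a sketch. The helper names are mine, and zlib is an arbitrary choice of compressor.

```python
import random
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes, a rough proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 if x and y share most of their
    structure, near 1 if compressing them together saves nothing."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

text = b"the quick brown fox jumps over the lazy dog " * 50
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))  # incompressible filler

same = ncd(text, text)         # small: the second copy is almost free to encode
unrelated = ncd(text, noise)   # close to 1: nothing is shared
```

The result depends heavily on the compressor, so this only hints at what an idealized, complexity-based notion of shared information would measure.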

 No free lunch theorem
 Theorem stating there is no universally best algorithm for all training-test dataset pairs.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 5.1
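
A toy illustration of the averaging intuition behind no-free-lunch results (this is not the statement of Theorem 5.1, which is phrased in terms of a worst-case distribution). On a tiny domain, I enumerate every possible labeling as the target and measure each learner's error on the points it never saw. The learners and the domain size are my own choices.

```python
from itertools import product

def average_off_training_error(learner, domain_size=5, train_idx=(0, 1, 2)):
    """Average error on unseen points, averaged uniformly over all binary labelings."""
    test_idx = [i for i in range(domain_size) if i not in train_idx]
    total = 0.0
    count = 0
    for labels in product([0, 1], repeat=domain_size):
        train = [(i, labels[i]) for i in train_idx]
        predict = learner(train)
        errors = sum(predict(i) != labels[i] for i in test_idx)
        total += errors / len(test_idx)
        count += 1
    return total / count

def majority_learner(train):
    """Predict the majority training label everywhere (ties go to 1)."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda i: guess

def constant_zero_learner(train):
    """Ignore the data and always predict 0."""
    return lambda i: 0
```

Both learners average exactly 0.5 error on the unseen points, because for every training set the unseen labels are equally likely to be 0 or 1 under the uniform average.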

 AIXI
 A theory of optimal intelligence put forth by Marcus Hutter based on Solomonoff induction.
‣ Universal Artificial Intelligence

 Data compression
 Lossless compression from the perspectives of Shannon’s information theory and AIT. Can they be unified? Can compression make probability objective? What is the relationship between compression and intelligence?
‣ Elements of Information Theory
‣ Data Compression Explained
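
A small sketch of the gap between the two perspectives. An order-0 Shannon model only sees symbol frequencies, while a general-purpose compressor can also exploit repetition, which is closer in spirit to AIT (a compressed length upper-bounds Kolmogorov complexity up to a constant that depends on the decompressor). The function names and the choice of zlib are mine.

```python
import math
import zlib
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Total code length in bits under an i.i.d. model fitted to symbol frequencies."""
    n = len(data)
    counts = Counter(data)
    return -sum(k * math.log2(k / n) for k in counts.values())

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data."""
    return 8 * len(zlib.compress(data, 9))

data = b"abcd" * 1000
iid_bits = order0_entropy_bits(data)   # 4 equally frequent symbols: 2 bits per symbol
zlib_bits = compressed_bits(data)      # far smaller, since the pattern repeats
```

The i.i.d. model charges 2 bits for each of the 4000 symbols, while zlib needs only a small fraction of that because the sequence has a short description.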

 Decision theory under ignorance
 Decision theory without probability. Pros and cons.
‣ An Introduction to Decision Theory, chapter 3
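
A sketch of two standard rules for choosing without probabilities, maximin and minimax regret, on a made-up payoff table (rows are acts, columns are states of the world). The table and function names are my own illustration.

```python
def maximin(payoffs):
    """Pick the act whose worst-case payoff is best."""
    return max(payoffs, key=lambda act: min(payoffs[act]))

def minimax_regret(payoffs):
    """Pick the act whose worst-case regret is smallest.

    Regret in a state = best payoff achievable in that state minus the act's payoff.
    """
    acts = list(payoffs)
    n_states = len(payoffs[acts[0]])
    best = [max(payoffs[a][s] for a in acts) for s in range(n_states)]
    worst_regret = {
        a: max(best[s] - payoffs[a][s] for s in range(n_states)) for a in acts
    }
    return min(acts, key=lambda a: worst_regret[a])

# Hypothetical payoffs in two states of the world.
payoffs = {"safe": [5, 5], "risky": [0, 20]}
```

Here maximin picks "safe" (worst case 5 versus 0), while minimax regret picks "risky" (worst regret 5 versus 15), so the rules can disagree on the same table.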

 The Fundamental Theorem of Statistical Learning (PAC)
 An introduction to PAC-learning theory. PAC is a probability-theory-based account of machine learning, which AIT could replace.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 6.7
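
To get a feel for what a PAC guarantee looks like numerically, here is the simpler sample-complexity bound for a finite hypothesis class in the realizable case, m ≥ ln(|H|/δ)/ε. This is not Theorem 6.7 itself, which is stated in terms of VC dimension with unspecified constants. The function name is mine.

```python
import math

def finite_class_sample_bound(num_hypotheses, epsilon, delta):
    """Number of i.i.d. examples sufficient for an ERM learner over a finite class
    to reach error at most `epsilon` with probability at least 1 - `delta`,
    assuming some hypothesis in the class has zero error (realizable case)."""
    return math.ceil(math.log(num_hypotheses / delta) / epsilon)

m = finite_class_sample_bound(1000, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in the size of the class and in 1/δ, but linearly in 1/ε.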

 PAC account of transfer learning
 PAC analysis of transfer learning. However, assumptions about relatedness of tasks need to be made.
‣ A Model of Inductive Bias Learning