[This is a note, which is the seed of an idea and something I’ve written quickly, as opposed to articles that I’ve optimized for readability and transmission of ideas.]
This is a snapshot of my curriculum for exploring the following questions:
- Is probability theory all you need to develop AI?
- If not, what is missing?
- Should a theory of AI be expressed in the framework of probability theory at all?
- Do brains use probability?
This reflects my current estimate of the landscape, and summarizes where my interests and aspirations have taken me so far. It is not set in stone. I may follow through on it, or I may diverge as I learn more. I primarily follow the current of my curiosity.
Description of topics
Here are the topics from the graph above, with descriptions to the extent that I understand them, and links to reference material.
-
- Objective probability
- Is probability an objective property of physical systems in general (not just i.i.d.)? Objective, meaning independently arrived at by multiple parties, like a scientific experiment (just as mass and energy measurements can be independently verified) - i.e. not dependent on a particular brain with particular beliefs. If p(x) = θ, then this is true even if no humans are around at all to believe it. The main problem in making probability objective is figuring out how to uniquely determine the probability of something given observations. What needs to be measured in order to ascertain the objective probability of a system?
-
- Solomonoff induction
- A Bayesian inference setup general enough to encompass general intelligence. The predictive posterior converges to the true data-generating distribution in the limit of infinite observations (for any prior with support everywhere), possibly providing an objective notion of probability, at least for infinite sequences.
‣ Universal Artificial Intelligence
‣ On the Existence and Convergence of Computable Universal Priors
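- A minimal sketch in my own notation (not taken verbatim from the sources above): Solomonoff's prior weights a string by the total probability that a universal prefix machine outputs it,
$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$
where $U$ is a universal prefix machine, $\ell(p)$ is the length of program $p$, and $U(p) = x*$ means the program's output begins with $x$. Prediction then uses the conditional $M(x_{t+1} \mid x_{1:t}) = M(x_{1:t} x_{t+1}) / M(x_{1:t})$.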
-
- Approximations
- How can Solomonoff induction be implemented in practice? How would brains implement it?
‣ http://www.hutter1.net/ai/uaibook.htm#approx
-
- Posterior convergence
- The sense in which Solomonoff induction is objective. The predictive posterior converges to the true data-generating distribution as observations accumulate, for any prior with support over all hypotheses.
‣ Universal Artificial Intelligence, Theorem 3.19
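- Roughly, my paraphrase of the kind of statement the theorem makes: if the data are drawn from a computable measure $\mu$, then the mixture's predictions converge to $\mu$'s with $\mu$-probability 1,
$$\xi(x_t \mid x_{<t}) \;\to\; \mu(x_t \mid x_{<t}) \quad \text{as } t \to \infty,$$
for any mixture $\xi = \sum_\nu w_\nu\, \nu$ whose prior weights $w_\nu$ are positive on a hypothesis class containing $\mu$.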
-
- Posterior consistency
- Solomonoff induction may not be consistent, meaning it may fail to distinguish between two hypotheses even given infinite data. Implications for objective probability.
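- For contrast, the usual formal notion (my summary): an inference scheme is consistent if, whenever the data are sampled from a hypothesis $\nu$ in the class, the posterior mass on $\nu$ tends to 1,
$$\Pr(\nu \mid x_{1:n}) \;\to\; 1 \quad \text{as } n \to \infty,\ x \sim \nu,$$
which is a stronger requirement than convergence of predictions (the previous item).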
-
- Prior with universally optimal convergence
- Solomonoff’s universally optimal prior.
‣ Universal Artificial Intelligence, Theorem 3.70
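- The standard choice of weights, sketched in my notation: each (semi)computable hypothesis $\nu$ gets prior mass determined by its description length,
$$w_\nu \;=\; 2^{-K(\nu)},$$
where $K(\nu)$ is the prefix Kolmogorov complexity of (an index for) $\nu$, so simpler hypotheses receive exponentially more prior weight.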
-
- Convergence on individual sequences
- Convergence of Solomonoff induction is not guaranteed on a measure-0 set of sequences. Construction of such a sequence.
‣ Universal Convergence of Semimeasures on Individual Random Sequences, Theorem 6 and Proposition 12
-
- (Non-)Equivalence of Universal Priors
- A surprising equivalence between mixtures of deterministic programs and computable distributions.
‣ (Non-)Equivalence of Universal Priors, Theorem 14
-
- Martin-Löf randomness
- What it means for an infinite sequence to be drawn from a probability distribution. Algorithmic definition of randomness (see AIT).
‣ An Introduction to Kolmogorov Complexity and Its Applications
- Definition in terms of universal probability
‣ Universal Artifical Intelligence
‣ An Introduction to Kolmogorov Complexity and Its Applications
- Can sequences be Martin-Löf random w.r.t. multiple probability measures?
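- A sketch of the universal-probability characterization mentioned above (Levin–Schnorr-style, my paraphrase): an infinite sequence $\omega$ is Martin-Löf random with respect to a computable measure $\mu$ iff the universal semimeasure $M$ never beats $\mu$ by more than a constant factor on its prefixes,
$$\sup_n \frac{M(\omega_{1:n})}{\mu(\omega_{1:n})} \;<\; \infty;$$
for the uniform measure this is equivalent to $K(\omega_{1:n}) \ge n - c$ for all $n$ and some constant $c$.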
-
- Bayesian epistemology
- Are priors and posteriors all that is needed for a complete theory of knowledge, and are they a sufficient framework for building an intelligent system? Bayesian epistemology repurposes probability as a property of the intelligent agent doing the observing, rather than of the system being observed (or perhaps it characterizes their interaction), i.e. probability as belief.
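- The machinery in question is just Bayes' rule applied to beliefs (standard, stated here for concreteness): given prior belief $P(h)$ in hypothesis $h$ and observed evidence $e$,
$$P(h \mid e) \;=\; \frac{P(e \mid h)\, P(h)}{P(e)},$$
and the question is whether iterating this update is all an intelligent agent needs.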
-
- Bayesian brain hypothesis
- The hypothesis in neuroscience that the brain is largely an approximate Bayesian inference engine.
‣ The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation
‣ Bayesian Brain: Probabilistic Approaches to Neural Coding
‣ Object Perception as Bayesian Inference
-
- Friston’s free energy principle
- A unified theory of biological intelligence from which Bayesian epistemology can be derived.
‣ What does the free energy principle tell us about the brain?
‣ The free-energy principle: a unified brain theory?
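- A sketch of the central quantity, as I understand it from the reviews above: the variational free energy of a recognition density $q(s)$ over hidden states $s$, given observations $o$ and a generative model $p(s, o)$, is
$$F \;=\; \mathbb{E}_{q(s)}\big[\log q(s) - \log p(s, o)\big] \;=\; D_{\mathrm{KL}}\big(q(s)\,\|\,p(s \mid o)\big) \;-\; \log p(o),$$
so minimizing $F$ both drives $q$ toward the Bayesian posterior and bounds the surprise $-\log p(o)$ from above.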
-
- How brains approximate Bayesian inference
- To make the Bayesian brain hypothesis falsifiable, a characterization of what counts as an approximation to Bayesian inference needs to be given. What approximate Bayesian computations in the brain have neuroscientists found so far? See the same sources listed under “Bayesian brain hypothesis”.
-
- Causal inference
- If Bayesian epistemology is not sufficient, then what is missing? Judea Pearl proposes causal inference.
‣ Causality, chapters 3 and 7
‣ Introduction to Judea Pearl’s Do-Calculus
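- A minimal example of what probability alone misses (standard do-calculus material, my rendering): observing $X$ and intervening on $X$ are different. If $Z$ confounds $X$ and $Y$ and satisfies the backdoor criterion, the interventional distribution is
$$p(y \mid \mathrm{do}(x)) \;=\; \sum_z p(y \mid x, z)\, p(z) \;\ne\; \sum_z p(y \mid x, z)\, p(z \mid x) \;=\; p(y \mid x)$$
in general; conditioning alone cannot recover the left-hand side without causal assumptions.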
-
- Bounded Rationality
- What would Bayesian epistemology theoretically look like with bounded resources? Is Bayesian epistemology no longer optimal given bounded resources?
‣ Bayes, Bounds, and Rational Analysis
-
- Logical justifications
- Arguments from first principles that rationality requires Bayesian epistemology, i.e. that a rational agent is necessarily a Bayesian agent (and such an agent is plausibly performing Solomonoff induction, so that its predictions are sufficiently general).
- Dutch book argument
- Complete classes
- Cox’s theorem
- Von Neumann-Morgenstern utility theorem
-
- Motivation from decision theory
- Some say a theory is good because it is useful. Perhaps the question “what theory of uncertainty should I use?” is best answered by looking at what we want to do with it, namely decision making under uncertainty. Bayesian epistemology can be motivated by way of decision theory.
‣ The Foundations of Statistics, chapter 3
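- The flavor of the argument (my compressed paraphrase of the Savage-style result): if an agent's preferences over acts satisfy certain axioms, then there exist a subjective probability $p$ over states $s$ and a utility $u$ such that the agent behaves as if choosing
$$a^* \;=\; \arg\max_a \sum_s p(s)\, u\big(\mathrm{outcome}(a, s)\big),$$
i.e. degrees of belief fall out of coherent decision making rather than being assumed up front.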
-
- Unique priors
- How to choose a prior is one point of contention in Bayesian epistemology. There are some proposed methods for selecting a unique prior given what you already know, for example, the max-entropy principle.
‣ Objective Priors: An Introduction for Frequentists
‣ Lectures on Probability, Entropy, and Statistical Physics
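- A small worked instance of the max-entropy principle (standard material, not tied to a specific source above): among distributions $p$ on a finite set with no constraints beyond normalization, the entropy $H(p) = -\sum_i p_i \log p_i$ is maximized by the uniform distribution; adding a known mean constraint $\sum_i p_i x_i = m$ makes the maximizer an exponential (Gibbs) distribution
$$p_i \;\propto\; e^{-\lambda x_i},$$
with $\lambda$ chosen to satisfy the constraint.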
-
- Algorithmic information theory (AIT)
- An alternative to probability theory devised by Kolmogorov himself (and others) to address its shortcomings. Does AIT allow us to formalize the general learning problem of transferring knowledge out-of-distribution?
‣ An Introduction to Kolmogorov Complexity and Its Applications
‣ Kolmogorov Complexity and Algorithmic Randomness
-
- Types of Kolmogorov complexity
- There is a constellation of algorithmic complexity functions that makes up the foundation of AIT (the two most common are sketched below). See the same sources listed under “Algorithmic information theory”.
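- For concreteness, the standard definitions in my notation: plain complexity $C(x)$ and prefix complexity $K(x)$,
$$C(x) = \min\{\ell(p) : V(p) = x\}, \qquad K(x) = \min\{\ell(p) : U(p) = x\},$$
where $V$ is a universal Turing machine and $U$ a universal prefix machine. They agree up to additive logarithmic terms: $C(x) \le K(x) + O(1)$ and $K(x) \le C(x) + 2 \log \ell(x) + O(1)$.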
-
- Resource bounded complexities
- Kolmogorov complexity with bounded computation. Possible direction for computable-AIT.
‣ An Introduction to Kolmogorov Complexity and Its Applications, chapter 7
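- One concrete example (Levin's $Kt$, which I believe is covered in that chapter): penalize run time logarithmically,
$$Kt(x) \;=\; \min_p \{\ell(p) + \log t(p) : U(p) = x \text{ in } t(p) \text{ steps}\},$$
which, unlike $K(x)$, is computable (though expensive to evaluate) and underlies Levin's universal search.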
-
- Algorithmic transfer learning
- How can the information shared by two datasets be defined? What is the objective of transfer learning?
‣ On Universal Transfer Learning
‣ Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
‣ The Information Complexity of Learning Tasks, their Structure and their Distance
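- The basic quantity these papers build on (standard AIT, my phrasing): the algorithmic information that dataset $y$ carries about dataset $x$ is how much $y$ shortens $x$'s description,
$$I(x : y) \;=\; K(x) - K(x \mid y),$$
which is symmetric up to logarithmic terms; transfer learning is then, roughly, exploiting $K(x \mid y) \ll K(x)$.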
-
- No free lunch theorem
- Theorem stating there is no universally best algorithm for all training-test dataset pairs.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 5.1
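- Roughly what the theorem says (my paraphrase of the statement in that book, so treat the constants as approximate): for any learner $A$ for binary classification over a domain $X$ and any training-set size $m < |X|/2$, there exists a distribution $D$ and a labeling function $f$ with $L_D(f) = 0$ such that, with probability at least $1/7$ over the draw of the $m$ training examples $S$,
$$L_D\big(A(S)\big) \;\ge\; \tfrac{1}{8}.$$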
-
- AIXI
- A theory of optimal intelligence put forth by Marcus Hutter based on Solomonoff induction.
‣ Universal Artificial Intelligence
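- A sketch of the defining equation (my rendering of Hutter's agent, notation simplified): at cycle $k$, with planning horizon $m$, AIXI picks the action maximizing expected future reward under the Solomonoff-style mixture over environments,
$$a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_k + \cdots + r_m\big) \sum_{q \,:\, U(q,\, a_{1:m}) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},$$
i.e. expectimax planning over all computable environments weighted by simplicity.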
-
- Data compression
- Lossless compression from the perspectives of Shannon’s information theory and AIT. Can they be unified? Can compression make probability objective? What is the relationship between compression and intelligence?
‣ Elements of Information Theory
‣ Data Compression Explained
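- One concrete bridge between the two perspectives (a standard result, stated loosely; the exact additive term depends on the description length of $P$): for a computable distribution $P$, expected prefix complexity essentially equals Shannon entropy,
$$H(P) \;\le\; \sum_x P(x)\, K(x) \;\le\; H(P) + K(P) + O(1),$$
so compressing $P$-typical data achieves the same average length whether analyzed information-theoretically or algorithmically.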
-
- Decision theory under ignorance
- Decision theory without probability. Pros and cons.
‣ An Introduction to Decision Theory, chapter 3
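- An example of such a rule (standard textbook material): with no probabilities over states $s$, the maximin criterion picks
$$a^* \;=\; \arg\max_a \min_s u(a, s),$$
the act with the best worst case; among the cons are extreme pessimism and insensitivity to everything but the worst outcome.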
-
- The Fundamental Theorem of Statistical Learning (PAC)
- An introduction to PAC-learning theory. PAC is a probability-theory-based account of machine learning which AIT could replace.
‣ Understanding Machine Learning: From Theory to Algorithms, Theorem 6.7
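- The shape of the theorem, from memory (so treat the exact rates as approximate): a hypothesis class $H$ is PAC learnable iff its VC dimension $d$ is finite, in which case (in the realizable setting) the sample complexity of learning to error $\epsilon$ with confidence $1 - \delta$ is
$$m(\epsilon, \delta) \;=\; \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon}\right),$$
achieved by empirical risk minimization.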
-
- PAC account of transfer learning
- PAC analysis of transfer learning. However, assumptions about relatedness of tasks need to be made.
‣ A Model of Inductive Bias Learning