Articles

Bayesian Inference On 1st Order Logic
Category:Post
February 21, 2021
David Chapman's blog post titled Probability theory does not extend logic has stirred up some controversy. In it, Chapman argues that so-called Bayesian logic, as it is currently understood, is limited to propositional logic (0th-order logic) and cannot generalize to higher-order logics (e.g. predicate logic, a.k.a. 1st-order logic), and thus cannot serve as a general foundation for inference from data under uncertainty.
Chapman provides a few counterexamples that supposedly demonstrate that doing Bayesian inference on statements in 1st-order logic is incoherent. I think there is a lot of confusion surrounding this point because Chapman does not use proper probability notation. In the following article I show how Chapman's examples can be properly written and made sense of using random variables. Hopefully this clarifies some things.

Primer to Probability Theory and Its Philosophy
Category:Post
June 19, 2020
Probability is a measure defined on events, which are sets of primitive outcomes. Probability theory mostly comes down to constructing events and measuring them. A measure is a generalization of size that subsumes length, area, and volume (in contrast to the bijective-mapping definition of cardinality).
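A minimal sketch of this view in Python, assuming a fair six-sided die as the sample space (the example is mine, not from the primer): events are sets of outcomes, and their probability is the measure obtained by summing over their members.

```python
from fractions import Fraction

# Sample space of primitive outcomes: a fair six-sided die (illustrative).
outcomes = {1, 2, 3, 4, 5, 6}
p = {o: Fraction(1, 6) for o in outcomes}  # measure on primitive outcomes

def prob(event):
    """Measure of an event: sum the measures of its primitive outcomes."""
    return sum(p[o] for o in event)

even = {2, 4, 6}

assert prob(even) == Fraction(1, 2)
# Additivity of the measure for disjoint events:
assert prob(even | {1}) == prob(even) + prob({1})
```

Using `Fraction` keeps the measure exact, which makes the additivity property an exact equality rather than a floating-point approximation.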

Notes: Probability & AI Curriculum
Category:Notes
June 17, 2020
This is a snapshot of my curriculum for exploring the following questions:
 Is probability theory all you need to develop AI?
 If not, what is missing?
 Should a theory of AI be expressed in the framework of probability theory at all?
 Do brains use probability?

Primer to Shannon's Information Theory
Category:Post
June 9, 2020
Shannon's theory of information is usually just called information theory, but is it deserving of that title? Does Shannon's theory completely capture every possible meaning of the word information? In the grand quests of creating AI and understanding the rules of the universe (i.e. a grand unified theory), information may be key. Intelligent agents search for information and manipulate it. Particle interactions in physics may be viewed as information transfer. The physics of information may be key to interpreting quantum mechanics and resolving the measurement problem.
If you endeavor to answer these hard questions, it is prudent to understand existing so-called theories of information so you can evaluate whether they are powerful enough and take inspiration from them.
Shannon's information theory is a hard nut to crack. Hopefully this primer gets you far enough along to be able to read a textbook like Elements of Information Theory. At the end I start to explore the question of whether Shannon's theory is a complete theory of information, and where it might be lacking.
This post is long. That is because Shannon's information theory is a framework of thought. That framework has a vocabulary which is needed to appreciate the whole. I attempt to gradually build up this vocabulary, stopping along the way to build intuition. With this vocabulary in hand, you will be ready to explore the big questions at the end of this post.
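One core entry in that vocabulary, the Shannon entropy $H(X) = -\sum_x p(x) \log_2 p(x)$, can be sketched in a few lines of Python (the distributions below are illustrative examples, not taken from the post):

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as
    a list of probabilities; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

assert entropy([0.5, 0.5]) == 1.0   # a fair coin carries exactly 1 bit
assert entropy([1.0]) == 0.0        # a certain outcome carries no information
assert abs(entropy([0.25] * 4) - 2.0) < 1e-12  # four equal outcomes: 2 bits
```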

Quantum State
Category:Post
December 22, 2019
The two views of quantum state:
 Quantum states are $L^2$-normalized complex-valued functions over classical configuration space.
 Quantum states are unit vectors residing in a complex Hilbert space, $\mathcal{H}$.
$$ \newcommand{\bm}{\boldsymbol} \newcommand{\diff}[1]{\mathop{\mathrm{d}#1}} \newcommand{\bra}[1]{\langle#1\rvert} \newcommand{\ket}[1]{\lvert#1\rangle} \newcommand{\braket}[2]{\langle#1\vert#2\rangle} $$ 
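As a sketch of how the two views coincide (using the bra-ket macros defined above; the state $\psi$ here is illustrative), the normalization condition can be read both ways:

$$ \braket{\psi}{\psi} = \int \lvert \psi(x) \rvert^2 \diff{x} = 1, $$

with the left expression treating $\ket{\psi}$ as a unit vector in $\mathcal{H}$, and the integral treating $\psi(x)$ as an $L^2$-normalized function over classical configuration space.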
Bias-Variance Decomposition For Machine Learning
Category:Post
July 14, 2019
$$ \newcommand{\Real}{ {\mathbb{R}} } \newcommand{\E}{ {\mathbb{E}} } \newcommand{\V}{ {\mathbb{V}} } \newcommand{\D}{\mathcal{D}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Bias}{\mathrm{Bias}} \newcommand\Yh{ {\hat{Y}} } \newcommand{\ep}{ {\boldsymbol{\varepsilon}} } \newcommand{\s}{\mathbb{S}} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} $$All about the bias-variance decomposition as it pertains to machine learning. All you need to know:
$$ \begin{align*} & \E_D[(f(x; D) - y(x))^2] \qquad\quad\ \textrm{Avg. error}\\ & = (\E_D[f(x; D)] - y(x))^2 \qquad \textrm{Bias}_y(f)^2\\ &\phantom{=}\, + \V_D[f(x; D)] \qquad\qquad\quad\ \, \textrm{Variance}(f)\\ \end{align*} $$
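The decomposition can be checked numerically with a short simulation. The setup below is an illustrative assumption of mine (a sample-mean estimator of a fixed target under Gaussian noise), not a construction from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: estimate y(x) at a fixed point x from noisy samples,
# where f(x; D) is the sample mean of a dataset D of size n.
y_true = 1.0          # y(x)
n_datasets = 20000    # number of independent draws of D
n = 10                # size of each dataset D
noise_sd = 0.5

# Each row is one dataset D; the estimator f(x; D) is the row mean.
datasets = y_true + noise_sd * rng.standard_normal((n_datasets, n))
f = datasets.mean(axis=1)

avg_error = np.mean((f - y_true) ** 2)   # E_D[(f(x; D) - y(x))^2]
bias_sq = (f.mean() - y_true) ** 2       # (E_D[f(x; D)] - y(x))^2
variance = f.var()                       # V_D[f(x; D)]

# The decomposition is an algebraic identity, so the sample moments
# satisfy it to floating-point precision, not just approximately:
assert abs(avg_error - (bias_sq + variance)) < 1e-12
```

Note that the equality holds exactly for the empirical moments (with the default population variance, `ddof=0`), because the decomposition is an identity, not an asymptotic statement.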