Hari Kailad

Measure Theoretic Probability


In statistics, we typically deal with discrete or continuous distributions. However, what if we have a combination of both? What if we want to build a theory of random matrices or lattices?

Measure theoretic foundations of probability allow us to do this. Not only do they generalize the discrete and continuous cases of a probability distribution, they also allow us to construct more arcane distributions.

I first stumbled across this when considering some specific random variables: For example, what if we have a random variable $X$ such that half the time $X \sim \text{Binom}(n, p)$, and half the time $X \sim N(0, 1)$? What is $E(X)$, or $E(X^2)$?

What about $X = A + B$, where $A \sim \text{Binom}(n, p)$ and $B \sim N(0, 1)$?

To answer these questions, we can dive into measure theoretic probability.
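Before developing the theory, we can at least estimate answers numerically. Below is a minimal Monte Carlo sketch of both questions, using hypothetical parameters $n = 10$, $p = 0.5$ (any values would do) and only the standard library:

```python
import random

random.seed(0)

n, p = 10, 0.5   # hypothetical Binomial parameters for illustration
N = 100_000      # number of samples

def binom(n, p):
    """Draw from Binom(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

# Mixture: with probability 1/2 draw from Binom(n, p), else from N(0, 1).
mixture = [binom(n, p) if random.random() < 0.5 else random.gauss(0, 1)
           for _ in range(N)]

# Sum: A + B with A ~ Binom(n, p) and B ~ N(0, 1) drawn independently.
total = [binom(n, p) + random.gauss(0, 1) for _ in range(N)]

mean_mix = sum(mixture) / N   # should be near 0.5 * n * p + 0.5 * 0 = 2.5
mean_sum = sum(total) / N     # should be near n * p + 0 = 5.0
print(mean_mix, mean_sum)
```

The estimates agree with what linearity of expectation predicts, but note that the mixture variable is neither discrete nor continuous, which is exactly the kind of object the machinery below is built to handle.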

Measure Theory

Definition ($\sigma$-algebra). Given a set $S$ and a collection $\mathcal{F}$ of subsets of $S$, $\mathcal{F}$ is a $\sigma$-algebra if

  • If a subset $A \in \mathcal{F}$, then $S - A = A^C \in \mathcal{F}$.
  • For any countable collection of subsets $A_1, A_2, \cdots \in \mathcal{F}$, the union $A_1 \cup A_2 \cup A_3 \cup \cdots \in \mathcal{F}$.
  • For any countable collection of subsets $A_1, A_2, \cdots \in \mathcal{F}$, the intersection $A_1 \cap A_2 \cap A_3 \cap \cdots \in \mathcal{F}$.
  • $\varnothing \in \mathcal{F}$, and $S \in \mathcal{F}$.

Intuitively, this gives us a collection of "events", or subsets of the set $S$. It is important to note that the union and intersection conditions only require closure under countable collections.
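On a finite set the axioms are easy to check mechanically. Here is a small sketch that verifies the power set of a three-element set is a $\sigma$-algebra (on a finite set, countable unions reduce to finite ones), and that a collection missing a complement is not; the helper names are my own:

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3})

def powerset(s):
    """All subsets of s, as frozensets."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

F = powerset(S)  # the largest sigma-algebra on S

def is_sigma_algebra(S, F):
    # Closed under complements.
    if any(S - A not in F for A in F):
        return False
    # Closed under pairwise unions and intersections -- on a finite set
    # this is equivalent to closure under countable unions/intersections.
    for A in F:
        for B in F:
            if A | B not in F or A & B not in F:
                return False
    # Contains the empty set and S itself.
    return frozenset() in F and S in F

print(is_sigma_algebra(S, F))                                  # True
print(is_sigma_algebra(S, {frozenset(), S, frozenset({1})}))   # False
```

The second collection fails because it contains $\{1\}$ but not its complement $\{2, 3\}$.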

Consider the uniform distribution on $[0, 1]$. Notice that $P(X = x) = 0$ for any single point $x$, but $P(a \le X \le b) = b - a$. If additivity held over uncountable collections, we would get a contradiction: $\sum_{x \in [0, 1]} P(X = x) = 0 \neq 1 = P(0 \le X \le 1)$.

This is why we only require additivity over countable collections of events.

Definition (Measure). Given a set $S$ and a $\sigma$-algebra $\mathcal{F}$ on $S$, a function $\bm{P}: \mathcal{F} \to \mathbb{R}$ is a measure if

  • $\bm{P}(\varnothing) = 0$,
  • $\bm{P}(A) \ge 0$ if $A \in \mathcal{F}$,
  • $\bm{P}(A_1 \cup A_2 \cup \cdots) = \bm{P}(A_1) + \bm{P}(A_2) + \cdots$ for any countable collection of pairwise disjoint $A_1, A_2, \cdots \in \mathcal{F}$.

Intuitively, a measure adds a notion of "size" to each event in a $\sigma$-algebra: for a collection of pairwise disjoint events, the "size" of their union is the sum of the "sizes" of the individual events.
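For a finite sample space, a measure is determined by point masses, and the axioms can be verified exhaustively. A small sketch, with made-up weights summing to 1:

```python
from itertools import chain, combinations

S = frozenset({'a', 'b', 'c'})
weights = {'a': 0.2, 'b': 0.3, 'c': 0.5}  # hypothetical point masses

def P(A):
    """Measure of an event: the sum of the point masses it contains."""
    return sum(weights[x] for x in A)

events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, r) for r in range(len(S) + 1))]

# Check the measure axioms on every pair of disjoint events.
assert P(frozenset()) == 0
assert all(P(A) >= 0 for A in events)
for A in events:
    for B in events:
        if not (A & B):  # A and B disjoint
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-12

print("all measure axioms hold; P(S) =", P(S))
```

Since the weights sum to 1, this `P` is in fact a probability measure, which is exactly the next definition.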

Definition (Probability Space). A probability space is a triple $(\Omega, \mathcal{F}, \bm{P})$ where

  • $\Omega$ is the sample space, which is a nonempty set,
  • $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$,
  • and $\bm{P}$ is a probability measure mapping $\mathcal{F} \to [0, 1]$, with $\bm{P}(\Omega) = 1$.

Most probability theory done over discrete and continuous distributions can be interpreted in terms of these probability spaces. Note that the space is just a set, and so we can actually do probability theory with other objects, giving rise to stochastic models, Markov chains, random matrix theory, and random lattice theory.

Probability

Definition (Random Variable). Given a probability space $(\Omega, \mathcal{F}, \bm{P})$ and a measurable space $(\Sigma, \mathcal{S})$, a random variable is a function $X: \Omega \to \Sigma$ such that the following holds:

For all subsets $A \in \mathcal{S}$, $X^{-1}(A) \in \mathcal{F}$.

We can think of a random variable as a map from the probability space to a new outcome space. The measurability requirement allows us to define an induced measure on the outcome space.

Consider the induced measure $\bm{I}: A \mapsto \bm{P}(X^{-1}(A))$. By the measurability condition, $X^{-1}(A)$ lies in the $\sigma$-algebra $\mathcal{F}$ for every $A \in \mathcal{S}$, so $\bm{I}$ is well defined. Since preimages preserve complements and countable unions, $\bm{I}$ inherits the measure axioms from $\bm{P}$. Therefore, $(\Sigma, \mathcal{S}, \bm{I})$ is a probability space.
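This induced (pushforward) measure is concrete on a finite example. A sketch with a fair die and the random variable "the face value mod 2":

```python
from fractions import Fraction

# Probability space: a fair six-sided die.
Omega = range(1, 7)
P = {w: Fraction(1, 6) for w in Omega}

# Random variable X: the die value mod 2 (0 = even, 1 = odd).
def X(w):
    return w % 2

# Induced measure I(A) = P(X^{-1}(A)) on the outcome space {0, 1}.
def I(A):
    preimage = [w for w in Omega if X(w) in A]
    return sum(P[w] for w in preimage)

print(I({0}), I({1}), I({0, 1}))  # 1/2 1/2 1
```

Note that `I` never needs its own definition of probability; it simply pulls every event back to $\Omega$ and asks $\bm{P}$.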

However, we typically only consider random variables which map to the reals. Therefore, the standard definition of a random variable takes $\Sigma = \mathbb{R}$.

Expected Value

How do we compute the expected value of such a random variable? Well in standard probability, we either take a sum over all possible values XX can take on, or do an integral if XX is continuous.

Definition (Simple Random Variable). A simple random variable is a random variable where $\left\{X(\omega) \mid \omega \in \Omega\right\}$ is finite.

These correspond to random variables which can only take on finitely many different values. The expected value of a simple random variable is just $\sum_{i} x_i \bm{P}(A_i)$, where $A_i$ is the set $X^{-1}(x_i)$.
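The formula $\sum_i x_i \bm{P}(A_i)$ translates directly into code. A sketch with a fair die and a made-up payout per face:

```python
from fractions import Fraction

Omega = range(1, 7)                        # a fair die
P = {w: Fraction(1, 6) for w in Omega}
X = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 5}  # hypothetical payout per face

def expectation(X, P):
    """E(X) = sum over values x_i of x_i * P(A_i), where A_i = X^{-1}(x_i)."""
    values = set(X.values())
    return sum(x * sum(P[w] for w in P if X[w] == x) for x in values)

print(expectation(X, P))  # (0*2 + 1*3 + 5*1) / 6 = 4/3
```

The sum groups outcomes by value first, which is exactly what distinguishes this definition from summing $X(\omega)\bm{P}(\{\omega\})$ outcome by outcome.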

Definition (Expectation of a Positive Random Variable). Let $X$ be a positive random variable. Then $E(X) = \sup \left\{E(Y) \mid 0 \le Y \le X \text{ and } Y \text{ is simple}\right\}$.

Say we have a positive continuous distribution, such as an exponential distribution. Consider the set of simple random variables which lie below it.

[Figure: an arbitrary continuous distribution.]

[Figure: an example simple random variable $Y$.]

We see here that taking the supremum over all these expected values computes the integral by taking finer and finer "rectangles", i.e., better and better simple random variables.
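We can watch this supremum converge. The standard staircase approximations $Y_n = \min(\lfloor 2^n X \rfloor / 2^n,\, n)$ are simple, satisfy $0 \le Y_n \le X$, and their expectations increase toward $E(X)$. A sketch for $X \sim \text{Exp}(1)$, where $E(X) = 1$:

```python
import math

# X ~ Exp(1), a positive random variable with E(X) = 1.
# Staircase simple approximation from below:
#   Y_n = floor(2^n * X) / 2^n, truncated at height n.
def E_simple(n):
    step = 2.0 ** -n
    total = 0.0
    k = 0
    while k * step < n:          # levels of the staircase below height n
        x = k * step
        # P(x <= X < x + step) under Exp(1)
        p = math.exp(-x) - math.exp(-(x + step))
        total += x * p
        k += 1
    total += n * math.exp(-n)    # the top level: Y_n = n when X >= n
    return total

for n in (1, 2, 4, 8):
    print(n, E_simple(n))  # increases toward E(X) = 1, always staying below
```

Each `E_simple(n)` is the expectation of a genuine simple random variable dominated by $X$, so the printed values approach the supremum from below, never exceeding it.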

Definition (Expected Value). The expected value of a random variable $X$ is $E(X) = E(\max(X, 0)) - E(\max(-X, 0))$.

Here, we adjust for random variables that can be negative by subtracting off the expected value of the negative part of the random variable; both terms are expectations of positive random variables, which we just defined.

Equivalently, the expected value is sometimes called the Lebesgue integral, notated as
$$\int_{\Omega} X \, d\bm{P}$$

Note that this definition of an integral is much more general than the standard Riemann integral.

Conclusion

Recap

Now that we have some foundations in measure theory, we were able to define a broader version of probability and expectation than before. I left out quite a few fine details in defining these objects, most of which involve a lot of analysis. For example, even showing that the uniform distribution exists in this model takes quite a bit of work.

Interesting Things

One very interesting theorem (which I do not currently understand) is that any measure $\bm{P}$ can be decomposed into a discrete measure, an absolutely continuous measure, and a singular continuous measure. This would answer some of my initial questions about mixtures and sums of random variables.

This was mostly from me attempting to read A First Look at Rigorous Probability Theory by Rosenthal, after measure theoretic probability was mentioned in my statistics class. It also crops up quite a bit in random lattice theory, in estimating bounds on short vector lengths within these lattices.