Hari Kailad

Measure Theoretic Probability


In statistics, we typically deal with discrete or continuous distributions. However, what if we have a combination of both? What if we want to build a theory of random matrices or lattices?

Measure theoretic foundations of probability allow us to do this. Not only do they generalize the discrete and continuous cases of a probability distribution, they also allow us to construct more arcane distributions.

I first stumbled across this when considering some specific random variables: For example, what if we have a random variable $X$ such that half the time $X \sim \text{Binom}(n, p)$, and half the time $X \sim N(0, 1)$? What is $E(X)$, or $E(X^2)$?

What about $X = A + B$, where $A \sim \text{Binom}(n, p)$ and $B \sim N(0, 1)$?

To answer these questions, we can dive into measure theoretic probability.
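Before developing the theory, we can at least estimate answers numerically. Below is a minimal Monte Carlo sketch of both questions, using hypothetical parameters $n = 10$, $p = 0.5$ (any values would do) and only the standard library:

```python
import random

random.seed(0)

n, p = 10, 0.5   # hypothetical Binomial parameters for illustration
N = 100_000      # number of samples

def binom(n, p):
    """Draw from Binom(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

# Mixture: with probability 1/2 draw from Binom(n, p), else from N(0, 1).
mixture = [binom(n, p) if random.random() < 0.5 else random.gauss(0, 1)
           for _ in range(N)]

# Sum: A + B with A ~ Binom(n, p) and B ~ N(0, 1) drawn independently.
total = [binom(n, p) + random.gauss(0, 1) for _ in range(N)]

mean_mix = sum(mixture) / N   # should be near 0.5 * n * p + 0.5 * 0 = 2.5
mean_sum = sum(total) / N     # should be near n * p + 0 = 5.0
print(mean_mix, mean_sum)
```

The estimates agree with what linearity of expectation predicts, but note that the mixture variable is neither discrete nor continuous, which is exactly the kind of object the machinery below is built to handle.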

Measure Theory

Definition ($\sigma$-algebra). Given a set $S$ and a collection $\mathcal{F}$ of subsets of $S$, $\mathcal{F}$ is a $\sigma$-algebra if

  • If a subset $A \in \mathcal{F}$, then $S - A = A^C \in \mathcal{F}$.
  • For any countable collection of subsets $A_1, A_2, \cdots \in \mathcal{F}$, the union $A_1 \cup A_2 \cup A_3 \cup \cdots \in \mathcal{F}$.
  • For any countable collection of subsets $A_1, A_2, \cdots \in \mathcal{F}$, the intersection $A_1 \cap A_2 \cap A_3 \cap \cdots \in \mathcal{F}$.
  • $\varnothing \in \mathcal{F}$, and $S \in \mathcal{F}$.

Intuitively, this gives us a collection of "events", or subsets of the set $S$. It is important to note that the union and intersection conditions only require closure under countable collections.
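On a finite set the axioms are easy to check mechanically. Here is a small sketch that verifies the power set of a three-element set is a $\sigma$-algebra (on a finite set, countable unions reduce to finite ones), and that a collection missing a complement is not; the helper names are my own:

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3})

def powerset(s):
    """All subsets of s, as frozensets."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

F = powerset(S)  # the largest sigma-algebra on S

def is_sigma_algebra(S, F):
    # Closed under complements.
    if any(S - A not in F for A in F):
        return False
    # Closed under pairwise unions and intersections -- on a finite set
    # this is equivalent to closure under countable unions/intersections.
    for A in F:
        for B in F:
            if A | B not in F or A & B not in F:
                return False
    # Contains the empty set and S itself.
    return frozenset() in F and S in F

print(is_sigma_algebra(S, F))                                  # True
print(is_sigma_algebra(S, {frozenset(), S, frozenset({1})}))   # False
```

The second collection fails because it contains $\{1\}$ but not its complement $\{2, 3\}$.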

Consider the uniform distribution on $[0, 1]$. Notice that $P(X = x) = 0$ for any single point $x$, but $P(a \le X \le b) = b - a$. If additivity held over uncountable collections, we would get a contradiction: $\sum_{x \in [0, 1]} P(X = x) = 0 \neq 1 = P(0 \le X \le 1)$.

This is why we only require additivity over countable collections of events.

Definition (Measure). Given a set $S$ and a $\sigma$-algebra $\mathcal{F}$ on $S$, a function $\bm{P}: \mathcal{F} \to \mathbb{R}$ is a measure if

  • $\bm{P}(\varnothing) = 0$,
  • $\bm{P}(A) \ge 0$ if $A \in \mathcal{F}$,
  • $\bm{P}(A_1 \cup A_2 \cup \cdots) = \bm{P}(A_1) + \bm{P}(A_2) + \cdots$ for any countable collection of pairwise disjoint $A_1, A_2, \cdots \in \mathcal{F}$.

Intuitively, a measure adds a notion of "size" to each event in a $\sigma$-algebra: for a collection of pairwise disjoint events, the "size" of their union is the sum of the "sizes" of the individual events.
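For a finite sample space, a measure is determined by point masses, and the axioms can be verified exhaustively. A small sketch, with made-up weights summing to 1:

```python
from itertools import chain, combinations

S = frozenset({'a', 'b', 'c'})
weights = {'a': 0.2, 'b': 0.3, 'c': 0.5}  # hypothetical point masses

def P(A):
    """Measure of an event: the sum of the point masses it contains."""
    return sum(weights[x] for x in A)

events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, r) for r in range(len(S) + 1))]

# Check the measure axioms on every pair of disjoint events.
assert P(frozenset()) == 0
assert all(P(A) >= 0 for A in events)
for A in events:
    for B in events:
        if not (A & B):  # A and B disjoint
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-12

print("all measure axioms hold; P(S) =", P(S))
```

Since the weights sum to 1, this `P` is in fact a probability measure, which is exactly the next definition.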

Definition (Probability Space). A probability space is a triple $(\Omega, \mathcal{F}, \bm{P})$ where

  • $\Omega$ is the sample space, which is a nonempty set,
  • $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$,
  • and $\bm{P}$ is a probability measure mapping $\mathcal{F} \to [0, 1]$, with $\bm{P}(\Omega) = 1$.

Most probability theory done over discrete and continuous distributions can be interpreted in terms of these probability spaces. Note that the space is just a set, and so we can actually do probability theory with other objects, giving rise to stochastic models, Markov chains, random matrix theory, and random lattice theory.

Probability

Definition (Random Variable). Given a probability space $(\Omega, \mathcal{F}, \bm{P})$ and a measurable space $(\Sigma, \mathcal{S})$, a random variable is a function $X: \Omega \to \Sigma$ such that the following holds:

For all subsets $A \in \mathcal{S}$, $X^{-1}(A) \in \mathcal{F}$.

We can think of a random variable as a map from the probability space to a new outcome space. The measurability requirement allows us to define an induced measure on the outcome space.

Consider the induced measure $\bm{I}: A \mapsto \bm{P}(X^{-1}(A))$. By the measurability condition, $X^{-1}(A)$ lies in the $\sigma$-algebra $\mathcal{F}$ for every $A \in \mathcal{S}$, so $\bm{I}$ is well defined. Since preimages preserve complements and countable unions, $\bm{I}$ inherits the measure axioms from $\bm{P}$. Therefore, $(\Sigma, \mathcal{S}, \bm{I})$ is a probability space.
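This induced (pushforward) measure is concrete on a finite example. A sketch with a fair die and the random variable "the face value mod 2":

```python
from fractions import Fraction

# Probability space: a fair six-sided die.
Omega = range(1, 7)
P = {w: Fraction(1, 6) for w in Omega}

# Random variable X: the die value mod 2 (0 = even, 1 = odd).
def X(w):
    return w % 2

# Induced measure I(A) = P(X^{-1}(A)) on the outcome space {0, 1}.
def I(A):
    preimage = [w for w in Omega if X(w) in A]
    return sum(P[w] for w in preimage)

print(I({0}), I({1}), I({0, 1}))  # 1/2 1/2 1
```

Note that `I` never needs its own definition of probability; it simply pulls every event back to $\Omega$ and asks $\bm{P}$.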

However, we typically only consider random variables which map to the reals. Therefore, the standard definition of a random variable takes $\Sigma = \mathbb{R}$.

Expected Value

How do we compute the expected value of such a random variable? Well in standard probability, we either take a sum over all possible values XX can take on, or do an integral if XX is continuous.

Definition (Simple Random Variable). A simple random variable is a random variable where $\left\{X(\omega) \mid \omega \in \Omega\right\}$ is finite.

These correspond to random variables which can only take on finitely many different values. The expected value of a simple random variable is just $\sum_{i} x_i \bm{P}(A_i)$, where $A_i$ is the set $X^{-1}(x_i)$.
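The formula $\sum_i x_i \bm{P}(A_i)$ translates directly into code. A sketch with a fair die and a made-up payout per face:

```python
from fractions import Fraction

Omega = range(1, 7)                        # a fair die
P = {w: Fraction(1, 6) for w in Omega}
X = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 5}  # hypothetical payout per face

def expectation(X, P):
    """E(X) = sum over values x_i of x_i * P(A_i), where A_i = X^{-1}(x_i)."""
    values = set(X.values())
    return sum(x * sum(P[w] for w in P if X[w] == x) for x in values)

print(expectation(X, P))  # (0*2 + 1*3 + 5*1) / 6 = 4/3
```

The sum groups outcomes by value first, which is exactly what distinguishes this definition from summing $X(\omega)\bm{P}(\{\omega\})$ outcome by outcome.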

Definition (Expectation of a Positive Random Variable). Let $X$ be a positive random variable. Then $E(X) = \sup \left\{E(Y) \mid 0 \le Y \le X \text{ and } Y \text{ is simple}\right\}$.

Say we have a positive continuous distribution, such as an exponential distribution. Consider the set of simple random variables which lie below it.

[Figure: an arbitrary continuous distribution.]

[Figure: an example simple random variable $Y$.]

We see here that taking the supremum over all these expected values computes the integral by taking finer and finer "rectangles", i.e., better and better simple random variables.
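We can watch this supremum converge. The standard staircase approximations $Y_n = \min(\lfloor 2^n X \rfloor / 2^n,\, n)$ are simple, satisfy $0 \le Y_n \le X$, and their expectations increase toward $E(X)$. A sketch for $X \sim \text{Exp}(1)$, where $E(X) = 1$:

```python
import math

# X ~ Exp(1), a positive random variable with E(X) = 1.
# Staircase simple approximation from below:
#   Y_n = floor(2^n * X) / 2^n, truncated at height n.
def E_simple(n):
    step = 2.0 ** -n
    total = 0.0
    k = 0
    while k * step < n:          # levels of the staircase below height n
        x = k * step
        # P(x <= X < x + step) under Exp(1)
        p = math.exp(-x) - math.exp(-(x + step))
        total += x * p
        k += 1
    total += n * math.exp(-n)    # the top level: Y_n = n when X >= n
    return total

for n in (1, 2, 4, 8):
    print(n, E_simple(n))  # increases toward E(X) = 1, always staying below
```

Each `E_simple(n)` is the expectation of a genuine simple random variable dominated by $X$, so the printed values approach the supremum from below, never exceeding it.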

Definition (Expected Value). The expected value of a random variable $X$ is $E(X) = E(\max(X, 0)) - E(\max(-X, 0))$.

Here, we adjust for random variables that can be negative by subtracting off the expected value of the negative part of the random variable; both terms are expectations of positive random variables, which we just defined.

Equivalently, the expected value is sometimes called the Lebesgue integral, notated as
$$\int_{\Omega} X \, d\bm{P}$$

Note that this definition of an integral is much more general than the standard Riemann integral.

Conclusion

Recap

Now that we have some foundations in measure theory, we were able to define a broader version of probability and expectation than before. I left out quite a few fine details in defining these objects, most of which involve a lot of analysis. For example, even showing that the uniform distribution exists in this model takes quite a bit of work.

Interesting Things

One very interesting theorem (which I do not currently understand) is that any measure $\bm{P}$ can be decomposed into a discrete measure, an absolutely continuous measure, and a singular continuous measure. This would answer some of my initial questions about mixtures and sums of random variables.

This was mostly from me attempting to read A First Look at Rigorous Probability Theory by Rosenthal, after measure theoretic probability was mentioned in my statistics class. It also crops up quite a bit in random lattice theory, in estimating bounds on short vector lengths within these lattices.