We have come to a major juncture in the course where our point of view shifts from a single random variable to larger collections of random variables. We have seen some instances of this, for example in our discussions of sequences of trials. The remainder of this course (and really probability theory in general) is concerned with studying collections of random variables in various situations.
If X1, X2, …, Xn are random variables defined on a sample space Ω, we can regard them as coordinates of the random vector (X1, X2, …, Xn). This vector is again a “random variable” in the sense of being a function on Ω, but instead of being real valued it is ℝⁿ-valued. The concepts related to a single random variable can be adapted to a random vector. The probability distribution of (X1, X2, …, Xn) is now an assignment of probabilities P((X1, X2, …, Xn) ∈ B) to subsets B of ℝⁿ. The probability distribution of a random vector is called a joint distribution, and the probability distributions of the individual coordinates Xj are called marginal distributions. A joint distribution can again be described by a probability mass function in the discrete case, by a probability density function in the jointly continuous case, and in general one can define a multivariate cumulative distribution function.
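To make the vector point of view concrete, here is a minimal sketch in Python (the coin-flip example and all names in it are our own illustration, not part of the text): the random vector is literally a function on the sample space, and the probability of an event {(X1, …, Xn) ∈ B} is computed by counting the outcomes it contains.

```python
from itertools import product

# Illustrative example (not from the text): three fair coin flips, with the
# random vector (X1, X2) recording whether the first flip is heads and the
# total number of heads.
omega = list(product("HT", repeat=3))  # sample space: 8 equally likely outcomes

def random_vector(outcome):
    """The R^2-valued function on the sample space: outcome -> (X1, X2)."""
    x1 = 1 if outcome[0] == "H" else 0   # X1 = indicator of heads on flip 1
    x2 = outcome.count("H")              # X2 = total number of heads
    return (x1, x2)

def prob(B):
    """P((X1, X2) in B) for a set B of points in the plane."""
    return sum(1 for w in omega if random_vector(w) in B) / len(omega)

print(prob({(1, 2)}))                                      # P(X1 = 1, X2 = 2) = 2/8
print(prob({(x1, x2) for x1 in (0, 1) for x2 in (2, 3)}))  # P(X2 >= 2) = 4/8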
Joint distribution of discrete random variables
The following definition is the multivariate extension of Definition 1.35 of the probability mass function of a single discrete random variable.
Definition 6.1. Let X1, X2, …, Xn be discrete random variables, all defined on the same sample space. Their joint probability mass function is defined by
p(k1, k2, …, kn) = P(X1 = k1, X2 = k2, …, Xn = kn)
for all possible values k1, k2, …, kn of X1, X2, …, Xn.
As with our other notations, we can write pX1,X2,…,Xn(k1, k2, …, kn) if we wish the notation to include the random variables. The joint probability mass function serves precisely the same purpose as the probability mass function of a single random variable.
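As a quick illustration of Definition 6.1, the following sketch (the two-dice example is our own, not from the text) tabulates a joint probability mass function by enumerating a finite sample space, checks that it assigns total probability 1, and recovers a marginal mass function by summing the joint one over the other coordinate.

```python
from itertools import product
from collections import defaultdict

# Illustrative example (not from the text): roll two fair dice; let
# X1 = value of the first die and X2 = maximum of the two dice.
joint_pmf = defaultdict(float)
for a, b in product(range(1, 7), repeat=2):   # 36 equally likely outcomes
    joint_pmf[(a, max(a, b))] += 1 / 36       # p(k1, k2) = P(X1 = k1, X2 = k2)

# Like a single-variable pmf, the joint pmf assigns total probability 1.
assert abs(sum(joint_pmf.values()) - 1) < 1e-12

# Marginal pmf of X2: sum the joint pmf over all values of X1.
marginal_X2 = defaultdict(float)
for (k1, k2), p in joint_pmf.items():
    marginal_X2[k2] += p

print(joint_pmf[(3, 3)])   # P(X1 = 3, X2 = 3) = 3/36
print(marginal_X2[6])      # P(X2 = 6) = 11/36
```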