Chapter 1: Probability and Random Variables
Section 1.1: Sample Spaces and Events

Probability formalizes uncertainty using the set theory developed in the chapter on sets. Events live in a σ-algebra (defined there), and probability measures quantify how likely they are.

Definition 1.1: Sample Space
The sample space Ω is the set of all possible outcomes of an experiment.
Compare this with the notion of an event in Definition 1.2 below.
Definition 1.2: Event
An event is any subset E ⊆ Ω that belongs to the underlying σ-algebra 𝔽.
Definition 1.3: Probability Measure
A probability measure is a function P : 𝔽 → [0,1] such that P(Ω) = 1 and, for any countable collection of pairwise disjoint events Eᵢ, P(⋃ᵢ Eᵢ) = ∑ᵢ P(Eᵢ). Here 𝔽 is a σ-algebra on Ω (see the chapter on sets).
Example 1.1: Two Tosses of a Fair Coin
Let Ω = {HH, HT, TH, TT}. With the power set 𝒫(Ω) as 𝔽 (see the definition of the power set in the chapter on sets), assign P({ω}) = 1/4 to each outcome. Then P({HH, HT}) = 1/2.
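To make the example concrete, here is a small Python sketch (my own illustration, not part of the original text) that enumerates Ω and evaluates the uniform measure on events:

    from itertools import product
    from fractions import Fraction

    # Sample space for two tosses of a fair coin: Omega = {HH, HT, TH, TT}.
    omega = ["".join(t) for t in product("HT", repeat=2)]

    def prob(event):
        # Uniform measure: each outcome carries mass 1/|Omega|.
        return Fraction(len(set(event) & set(omega)), len(omega))

    print(prob({"HH", "HT"}))  # 1/2, matching Example 1.1
    print(prob(set(omega)))    # 1, the normalization axiom P(Omega) = 1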
Table 1.1. Kolmogorov axioms (informal)

#   Statement
1   Nonnegativity: P(E) ≥ 0.
2   Normalization: P(Ω) = 1.
3   Countable additivity: P(⋃ᵢ Eᵢ) = ∑ᵢ P(Eᵢ) for pairwise disjoint events Eᵢ.
Section 1.2: Conditional Probability and Bayes

Conditioning refines uncertainty by restricting attention to a given event; its algebra mirrors set intersection (see the chapter on sets).

Definition 1.4: Conditional Probability
For P(B) > 0, define P(A | B) = P(A ∩ B) / P(B).
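As a minimal sketch of the definition, continuing the two-coin setup of Example 1.1 (the particular events A and B below are my own choices):

    from fractions import Fraction

    omega = {"HH", "HT", "TH", "TT"}   # two tosses of a fair coin
    A = {"HH", "HT"}                   # first toss is heads
    B = {"HH", "HT", "TH"}             # at least one head

    def prob(event):
        # Uniform measure on omega; event is assumed to be a subset of omega.
        return Fraction(len(event), len(omega))

    # P(A | B) = P(A ∩ B) / P(B) = (2/4) / (3/4) = 2/3
    print(prob(A & B) / prob(B))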
Theorem 1.1: Law of Total Probability
If B₁,…,Bₙ partition Ω with P(Bᵢ) > 0 for each i, then for any event A, P(A) = ∑ᵢ P(A | Bᵢ) P(Bᵢ).
Proof 1.1:
Write A = ⋃ᵢ (A ∩ Bᵢ), notice the union is disjoint by the partition property, apply countable additivity, and rewrite P(A ∩ Bᵢ)=P(A | Bᵢ)P(Bᵢ).
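A quick numerical check of the theorem; the partition and the probabilities below are invented purely for illustration:

    # Hypothetical partition B1, B2, B3 of Omega with conditionals for an event A.
    p_B      = [0.5, 0.3, 0.2]   # P(B_i); these sum to 1, as a partition requires
    p_A_if_B = [0.1, 0.4, 0.9]   # P(A | B_i)

    # Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
    p_A = sum(pa * pb for pa, pb in zip(p_A_if_B, p_B))
    print(round(p_A, 2))  # 0.05 + 0.12 + 0.18 = 0.35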
Theorem 1.2: Bayes' Theorem
For P(A)>0 and P(B)>0, P(A | B) = P(B | A) P(A) / P(B).
Proof 1.2:
From P(A ∩ B)=P(A | B)P(B)=P(B | A)P(A), solve for P(A | B).
Example 1.2: Medical Testing
Let A be “has the disease” and B be “test is positive.” With sensitivity P(B|A) = 0.99 and false-positive rate P(B|Aᶜ) = 0.02, if the prevalence is P(A) = 0.01 then P(A|B) ≈ 0.33 by Theorem 1.2. Set operations such as the complement Aᶜ mirror set difference (see the chapter on sets).
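The ≈ 0.33 figure follows by direct arithmetic; this short Python sketch reproduces it from the numbers stated in the example:

    sensitivity = 0.99    # P(B | A): test positive given disease
    false_pos   = 0.02    # P(B | Aᶜ): test positive given no disease
    prevalence  = 0.01    # P(A)

    # Denominator via the law of total probability, then Bayes' theorem:
    p_B = sensitivity * prevalence + false_pos * (1 - prevalence)  # P(B) = 0.0297
    posterior = sensitivity * prevalence / p_B                     # P(A | B)
    print(round(posterior, 3))  # 0.333: a positive test leaves roughly 1-in-3 odds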
Figure 1.1. Bayes' theorem as proportions within a population.
Section 1.3: Random Variables and Expectation

Random variables map outcomes to numbers; expectations average those values with respect to their probabilities. Linearity of expectation (Theorem 1.3) makes many such averages easy to compute.

Definition 1.5: Random Variable
A random variable X is a measurable function X : (Ω, 𝔽) → (ℝ, 𝔅), where 𝔅 is the Borel σ-algebra on ℝ.
Definition 1.6: Expected Value
For a discrete X with pmf p(x), define 𝔼[X] = ∑ₓ x p(x). For an integrable continuous X with density f, define 𝔼[X] = ∫ x f(x) dx.
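As a concrete discrete case (a fair six-sided die, my own example rather than one from the text), the expectation is a probability-weighted sum:

    from fractions import Fraction

    # pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6
    pmf = {x: Fraction(1, 6) for x in range(1, 7)}

    # E[X] = sum_x x p(x)
    expectation = sum(x * p for x, p in pmf.items())
    print(expectation)  # 7/2, i.e., 3.5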
Theorem 1.3: Linearity of Expectation
For integrable X,Y and scalars a,b, 𝔼[aX + bY] = a𝔼[X] + b𝔼[Y].
Proof 1.3:
For discrete X,Y expand sums termwise; for continuous variables use linearity of the integral.
Example 1.3: Bernoulli Trial
If X ∈ {0, 1} with P(X = 1) = p, then 𝔼[X] = p. For n trials, S = ∑ᵢ₌₁ⁿ Xᵢ has 𝔼[S] = np by Theorem 1.3.
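A Monte Carlo sanity check of 𝔼[S] = np; the parameters below are arbitrary illustrative choices:

    import random

    random.seed(0)
    p, n, trials = 0.3, 100, 10_000

    # Simulate S = X_1 + ... + X_n for Bernoulli(p) variables, many times,
    # and compare the sample mean of S against n * p (here 30).
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if random.random() < p)
    print(total / trials)  # close to n * p = 30.0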

For historical foundations of probability and classical reasoning about inference, see [1] and the lucid, modern exposition of Bayesian ideas in [2]. For practical introductions used widely in undergraduate courses, consult [3] and [4].

Methodological developments in hypothesis testing and estimation are surveyed by [5] and summarized in standard texts such as [6, pp. 45–48] and [7, ch. 2]. Modern connections to machine learning appear in [8] and [9].

Applied topics such as density estimation and asymptotics are covered by [10] and [11]. For readers wanting a concise compendium, [12, p. 101] is useful.

For algorithmic perspectives applied to probability, see [15, ch. 3] and [16, ch. 5]; classical references on mathematical typesetting and on statistical methods include [14] and [17].

References

  [1] Andrey N. Kolmogorov. “Foundations of the Theory of Probability.” Chelsea Publishing, 1933.
  [2] E. T. Jaynes. “Probability Theory: The Logic of Science.” Cambridge University Press, 2003.
  [3] Sheldon M. Ross. “A First Course in Probability.” Prentice Hall, 2002.
  [4] Geoffrey Grimmett, David Stirzaker. “Probability and Random Processes.” Oxford University Press, 1992.
  [5] Jerzy Neyman, Egon S. Pearson. “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society A, vol. 231, pp. 289–337, 1933.
  [6] George Casella, Roger L. Berger. “Statistical Inference.” Duxbury, 2002.
  [7] E. L. Lehmann, Joseph P. Romano. “Testing Statistical Hypotheses.” Springer, 1998.
  [8] Christopher M. Bishop. “Pattern Recognition and Machine Learning.” Springer, 2006.
  [9] Trevor Hastie, Robert Tibshirani, Jerome Friedman. “The Elements of Statistical Learning: Data Mining, Inference, and Prediction.” Springer, 2009.
  [10] B. W. Silverman. “Density Estimation for Statistics and Data Analysis.” Chapman and Hall, 1986.
  [11] Aad van der Vaart. “Asymptotic Statistics.” Cambridge University Press, 1998.
  [12] Larry Wasserman. “All of Statistics: A Concise Course in Statistical Inference.” Springer, 2004.
  [14] Donald E. Knuth. “The TeXbook.” Addison-Wesley, 1984.
  [15] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. “Introduction to Algorithms.” MIT Press, 2009.
  [16] A. Papoulis, S. U. Pillai. “Probability, Random Variables, and Stochastic Processes.” McGraw-Hill, 2002.
  [17] R. A. Fisher. “Statistical Methods for Research Workers.” Oliver and Boyd, 1925.