Chapter 1: Probability and Random Variables
Section 1.1: Sample Spaces and Events

Probability formalizes uncertainty using the set theory developed in the chapter on sets. Events live in a σ-algebra (defined there), and probability measures quantify how likely they are.

Definition 1.1: Sample Space
The sample space Ω is the set of all possible outcomes of an experiment.
Compare this with the notion of an event in Definition 1.2 below.
Definition 1.2: Event
An event is any subset E ⊆ Ω that belongs to the underlying σ-algebra 𝔽.
Definition 1.3: Probability Measure
A probability measure is a function P : 𝔽 → [0,1] such that P(Ω) = 1 and, for any countable collection of pairwise disjoint events Eᵢ, P(⋃ᵢ Eᵢ) = ∑ᵢ P(Eᵢ). Here 𝔽 is a σ-algebra on Ω (see the chapter on sets).
Example 1.1: Two Tosses of a Fair Coin
Let Ω = {HH, HT, TH, TT}. With the power set 𝒫(Ω) as 𝔽 (see the definition of the power set in the chapter on sets), assign P({ω}) = 1/4 to each outcome. Then P({HH, HT}) = 1/2.
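To make the example concrete, here is a small Python sketch (my own illustration, not part of the original text) that enumerates Ω and evaluates the uniform measure on events:

    from itertools import product
    from fractions import Fraction

    # Sample space for two tosses of a fair coin: Omega = {HH, HT, TH, TT}.
    omega = ["".join(t) for t in product("HT", repeat=2)]

    def prob(event):
        # Uniform measure: each outcome carries mass 1/|Omega|.
        return Fraction(len(set(event) & set(omega)), len(omega))

    print(prob({"HH", "HT"}))  # 1/2, matching Example 1.1
    print(prob(set(omega)))    # 1, the normalization axiom P(Omega) = 1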
Table 1.1. Kolmogorov axioms (informal)

#   Statement
1   Nonnegativity: P(E) ≥ 0.
2   Normalization: P(Ω) = 1.
3   Countable additivity: P(⋃ᵢ Eᵢ) = ∑ᵢ P(Eᵢ) for pairwise disjoint events Eᵢ.
Section 1.2: Conditional Probability and Bayes

Conditioning refines uncertainty by restricting attention to a given event; its algebra mirrors set intersection (see the chapter on sets).

Definition 1.4: Conditional Probability
For P(B) > 0, define P(A | B) = P(A ∩ B) / P(B).
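As a minimal sketch of the definition, continuing the two-coin setup of Example 1.1 (the particular events A and B below are my own choices):

    from fractions import Fraction

    omega = {"HH", "HT", "TH", "TT"}   # two tosses of a fair coin
    A = {"HH", "HT"}                   # first toss is heads
    B = {"HH", "HT", "TH"}             # at least one head

    def prob(event):
        # Uniform measure on omega; event is assumed to be a subset of omega.
        return Fraction(len(event), len(omega))

    # P(A | B) = P(A ∩ B) / P(B) = (2/4) / (3/4) = 2/3
    print(prob(A & B) / prob(B))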
Theorem 1.1: Law of Total Probability
If B₁,…,Bₙ partition Ω with P(Bᵢ) > 0 for each i, then for any event A, P(A) = ∑ᵢ P(A | Bᵢ) P(Bᵢ).
Proof 1.1:
Write A = ⋃ᵢ (A ∩ Bᵢ), notice the union is disjoint by the partition property, apply countable additivity, and rewrite P(A ∩ Bᵢ)=P(A | Bᵢ)P(Bᵢ).
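A quick numerical check of the theorem; the partition and the probabilities below are invented purely for illustration:

    # Hypothetical partition B1, B2, B3 of Omega with conditionals for an event A.
    p_B      = [0.5, 0.3, 0.2]   # P(B_i); these sum to 1, as a partition requires
    p_A_if_B = [0.1, 0.4, 0.9]   # P(A | B_i)

    # Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
    p_A = sum(pa * pb for pa, pb in zip(p_A_if_B, p_B))
    print(round(p_A, 2))  # 0.05 + 0.12 + 0.18 = 0.35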
Theorem 1.2: Bayes' Theorem
For P(A)>0 and P(B)>0, P(A | B) = P(B | A) P(A) / P(B).
Proof 1.2:
From P(A ∩ B)=P(A | B)P(B)=P(B | A)P(A), solve for P(A | B).
Example 1.2: Medical Testing
Let A be “has the disease” and B be “test is positive.” With sensitivity P(B|A) = 0.99 and false-positive rate P(B|Aᶜ) = 0.02, if the prevalence is P(A) = 0.01 then P(A|B) ≈ 0.33 by Theorem 1.2. Set operations such as the complement Aᶜ mirror set difference (see the chapter on sets).
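The ≈ 0.33 figure follows by direct arithmetic; this short Python sketch reproduces it from the numbers stated in the example:

    sensitivity = 0.99    # P(B | A): test positive given disease
    false_pos   = 0.02    # P(B | Aᶜ): test positive given no disease
    prevalence  = 0.01    # P(A)

    # Denominator via the law of total probability, then Bayes' theorem:
    p_B = sensitivity * prevalence + false_pos * (1 - prevalence)  # P(B) = 0.0297
    posterior = sensitivity * prevalence / p_B                     # P(A | B)
    print(round(posterior, 3))  # 0.333: a positive test leaves roughly 1-in-3 odds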
Figure 1.1. Bayes' theorem as proportions within a population.
Section 1.3: Random Variables and Expectation

Random variables map outcomes to numbers; expectations average those values with respect to their probabilities. Linearity of expectation (Theorem 1.3) makes many such averages easy to compute.

Definition 1.5: Random Variable
A random variable X is a measurable function X : (Ω, 𝔽) → (ℝ, 𝔅), where 𝔅 is the Borel σ-algebra on ℝ.
Definition 1.6: Expected Value
For a discrete X with pmf p(x), define 𝔼[X] = ∑ₓ x p(x). For an integrable continuous X with density f, define 𝔼[X] = ∫ x f(x) dx.
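As a concrete discrete case (a fair six-sided die, my own example rather than one from the text), the expectation is a probability-weighted sum:

    from fractions import Fraction

    # pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6
    pmf = {x: Fraction(1, 6) for x in range(1, 7)}

    # E[X] = sum_x x p(x)
    expectation = sum(x * p for x, p in pmf.items())
    print(expectation)  # 7/2, i.e., 3.5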
Theorem 1.3: Linearity of Expectation
For integrable X,Y and scalars a,b, 𝔼[aX + bY] = a𝔼[X] + b𝔼[Y].
Proof 1.3:
For discrete X,Y expand sums termwise; for continuous variables use linearity of the integral.
Example 1.3: Bernoulli Trial
If X ∈ {0, 1} with P(X = 1) = p, then 𝔼[X] = p. For n trials, S = ∑ᵢ₌₁ⁿ Xᵢ has 𝔼[S] = np by Theorem 1.3.
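A Monte Carlo sanity check of 𝔼[S] = np; the parameters below are arbitrary illustrative choices:

    import random

    random.seed(0)
    p, n, trials = 0.3, 100, 10_000

    # Simulate S = X_1 + ... + X_n for Bernoulli(p) variables, many times,
    # and compare the sample mean of S against n * p (here 30).
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if random.random() < p)
    print(total / trials)  # close to n * p = 30.0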

For historical foundations of probability and classical reasoning about inference, see [1] and the lucid, modern exposition of Bayesian ideas in [2]. For practical introductions used widely in undergraduate courses, consult [3] and [4].

Methodological developments in hypothesis testing and estimation are surveyed by [5] and summarized in standard texts such as [6, pp. 45–48] and [7, ch. 2]. Modern connections to machine learning appear in [8] and [9].

Applied topics such as density estimation and asymptotics are covered by [10] and [11]. For readers wanting a concise compendium, [12, p. 101] is useful.

For algorithmic perspectives applied to probability, see [15, ch. 3] and [16, ch. 5]; classical references on mathematical typesetting and on statistical methods include [14] and [17].

References

  [1] Andrey N. Kolmogorov. “Foundations of the Theory of Probability.” Chelsea Publishing, 1933.
  [2] E. T. Jaynes. “Probability Theory: The Logic of Science.” Cambridge University Press, 2003.
  [3] Sheldon M. Ross. “A First Course in Probability.” Prentice Hall, 2002.
  [4] Geoffrey Grimmett, David Stirzaker. “Probability and Random Processes.” Oxford University Press, 1992.
  [5] Jerzy Neyman, Egon S. Pearson. “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society A, vol. 231, pp. 289–337, 1933.
  [6] George Casella, Roger L. Berger. “Statistical Inference.” Duxbury, 2002.
  [7] E. L. Lehmann, Joseph P. Romano. “Testing Statistical Hypotheses.” Springer, 1998.
  [8] Christopher M. Bishop. “Pattern Recognition and Machine Learning.” Springer, 2006.
  [9] Trevor Hastie, Robert Tibshirani, Jerome Friedman. “The Elements of Statistical Learning: Data Mining, Inference, and Prediction.” Springer, 2009.
  [10] B. W. Silverman. “Density Estimation for Statistics and Data Analysis.” Chapman and Hall, 1986.
  [11] Aad van der Vaart. “Asymptotic Statistics.” Cambridge University Press, 1998.
  [12] Larry Wasserman. “All of Statistics: A Concise Course in Statistical Inference.” Springer, 2004.
  [14] Donald E. Knuth. “The TeXbook.” Addison-Wesley, 1984.
  [15] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. “Introduction to Algorithms.” MIT Press, 2009.
  [16] A. Papoulis, S. U. Pillai. “Probability, Random Variables, and Stochastic Processes.” McGraw-Hill, 2002.
  [17] R. A. Fisher. “Statistical Methods for Research Workers.” Oliver and Boyd, 1925.