Probability Concepts
Events
What is an Event?
In probability theory, an event is a set of outcomes of an experiment, to which a probability is assigned.
The "set of outcomes" part is the one that tripped me up when I first learned probability. For example, an event can be five coin tosses turning up heads in a row, not just a single head or a single tail.
Types of events:
- Independent
- Dependent
- Mutually Exclusive (Disjoint)
- Not Mutually Exclusive
Independent vs Dependent
Independent
Events can be independent, meaning that each event is not affected by any other events.
For instance, coin tosses are independent. The probability of the event "head", given that the coin is fair, is always 1/2, regardless of the previous outcomes.
Probability of independent events happening together:

$P(A \cap B) = P(A) \times P(B)$
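As a quick sketch in plain Python, the multiplication rule applied to the five-heads-in-a-row example from earlier:

```python
# Five independent fair coin tosses all turning up heads:
# multiply the individual probabilities together.
p_head = 0.5
p_five_heads = p_head ** 5
print(p_five_heads)  # 0.03125
```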
Dependent
Events can be dependent, meaning that they can be affected by previous events.
For instance, after taking one card from the deck, there are less cards available, so the probabilities change.
The probability of both events happening, given that B depends on A, is the probability of A happening multiplied by the probability of B happening given that A has happened:

$P(A \cap B) = P(A) \times P(B \mid A)$
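A minimal sketch of dependent events, using the card-deck scenario above (the two-aces choice is my own example):

```python
# Drawing two aces in a row from a 52-card deck without replacement.
# The second draw depends on the first: one ace and one card are gone.
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51
p_both = p_first_ace * p_second_ace_given_first
print(round(p_both, 5))  # 0.00452
```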
Mutually Exclusive vs Not Mutually Exclusive
Mutually Exclusive Events
Events are mutually exclusive if they cannot happen at the same time.
For instance, turning left and turning right are mutually exclusive. For mutually exclusive events, $P(A \cap B) = 0$ and $P(A \cup B) = P(A) + P(B)$.
Not Mutually Exclusive
Events are not mutually exclusive if they can happen at the same time (e.g., a drawn card being a king and being a heart). In that case the overlap must not be double-counted:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional Probability, Bayesian Rules, and Bayesian Networks
Conditional Probability
Conditional probability is for handling dependent events.
The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred. In other words, it is the probability that both A and B occur divided by the probability that A occurs:

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Bayes Theorem
Practically, Bayes' Theorem is a way of finding a probability when we know certain other probabilities:

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$

Note that $P(A \cap B) = P(B \mid A) \, P(A)$. Moreover, $\frac{P(A \cap B)}{P(B)}$ is the definition of $P(A \mid B)$, which is how the formula above follows.
Special version of Bayes' formula involving complement events:

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B \mid A) \, P(A) + P(B \mid \neg A) \, P(\neg A)}$
Example
There is a test for an allergy, but it is not always right. For people who really have the allergy, the test says yes 80% of the time. For people who do not have the allergy, the test says yes 10% of the time ("false positive"). Given that only 1% of the population has the allergy, and the test result is yes, what is the chance that we actually have the allergy?
Think of it this way: the probability of a yes equals the probability of having the allergy and the test saying yes, plus the probability of not having the allergy but the test still saying yes:

$P(\text{yes}) = P(\text{yes} \mid \text{allergy}) \, P(\text{allergy}) + P(\text{yes} \mid \neg\text{allergy}) \, P(\neg\text{allergy})$
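The arithmetic for this example can be checked in plain Python:

```python
# Numbers from the allergy example above.
p_allergy = 0.01
p_yes_given_allergy = 0.80
p_yes_given_no_allergy = 0.10

# Total probability of a "yes": true positives plus false positives.
p_yes = (p_yes_given_allergy * p_allergy
         + p_yes_given_no_allergy * (1 - p_allergy))

# Bayes' Theorem: P(allergy | yes).
p_allergy_given_yes = p_yes_given_allergy * p_allergy / p_yes
print(round(p_allergy_given_yes, 4))  # 0.0748
```

So even with a positive test, the chance of actually having the allergy is only about 7.5%, because the disease is rare and false positives dominate.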
False Positives and False Negatives
TBA
Chain rule
The chain rule is the generalization of the product rule (i.e., $P(A \cap B) = P(B \mid A) \, P(A)$).
Let $A_1, A_2, \ldots, A_n$ be events. The joint probability of all the $n$ events is given by the following formula:

$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \ldots \cap A_{n-1})$
For instance, this rule can be applied recursively to calculate the full joint probability distribution as follows:

$P(X_1, \ldots, X_n) = P(X_n \mid X_1, \ldots, X_{n-1}) \, P(X_1, \ldots, X_{n-1}) = \ldots = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$
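A tiny numeric sketch of the chain rule (the three-aces-in-a-row scenario is my own example, extending the card-deck idea from earlier):

```python
import math

# Conditional probabilities P(A_i | A_1, ..., A_{i-1}) for drawing
# three aces in a row from a 52-card deck without replacement.
conditionals = [4 / 52, 3 / 51, 2 / 50]

# Chain rule: the joint probability is the product of the conditionals.
p_joint = math.prod(conditionals)
print(round(p_joint, 6))  # 0.000181
```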
Bayesian Networks
Bayesian networks, also known as "belief networks" or "causal networks", are graphical models for representing multivariate probability distributions. Each variable is represented as a vertex in a directed acyclic graph ("DAG"). The probability distribution is represented in the factorized form as follows:

$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$

The set $\mathrm{parents}(X_i)$ denotes the set of vertices that are parents of $X_i$ in the graph. A Bayesian network is fully specified by the combination of:
- The graph structure (i.e., which directed arcs exist in the graph)
- The conditional probability table for each variable
Usage
Bayesian networks are convenient to represent systems of probabilistic causal relationships. The fact that "X often causes Y" may easily be modelled in the network by adding a directed arc from X to Y and setting the probabilities appropriately.
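A minimal sketch of such a network in plain Python, assuming a made-up two-node example (rain causing wet grass; the names and numbers are mine, not from the notes). The joint factorizes as $P(\text{rain}, \text{wet}) = P(\text{rain}) \, P(\text{wet} \mid \text{rain})$:

```python
# CPTs for a two-node network rain -> wet (hypothetical numbers).
p_rain = {True: 0.2, False: 0.8}                     # P(rain)
p_wet_given_rain = {True: {True: 0.9, False: 0.1},   # P(wet | rain)
                    False: {True: 0.15, False: 0.85}}

def joint(rain, wet):
    """Joint probability from the factorized form of the network."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Marginalize over rain to get P(wet = True).
p_wet = sum(joint(r, True) for r in (True, False))
print(round(p_wet, 2))  # 0.3
```

The same factorized representation scales to larger graphs: each vertex only needs a table conditioned on its parents, rather than the full joint over all variables.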
Inference in Bayesian Networks
Random Variable
A random variable, usually written as $X$, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, namely discrete and continuous.
Discrete random variables
A discrete random variable takes on only a countable number of distinct values, such as 0, 1, 2, 3, etc. For example, it can be used to model the number of children in a family, the number of patients in a doctor's surgery, or the number of defective light bulbs in a box of ten.
Note that a set is countable if every member of the set can be associated with a unique natural number.
Probability Distribution
The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. This is sometimes called the probability mass function.
Notation
Suppose a random variable $X$ may take $k$ different values, with the probability that $X = x_i$ defined to be $P(X = x_i) = p_i$. Then the probabilities $p_i$ must satisfy the following conditions:
- $0 \le p_i \le 1$ for each $i$
- $p_1 + p_2 + \ldots + p_k = 1$
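As a quick sketch, the two standard conditions (each probability lies in [0, 1], and the probabilities sum to 1) can be checked for a fair six-sided die in plain Python:

```python
# PMF of a fair six-sided die: each face 1..6 has probability 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

# Condition 1: every probability lies in [0, 1].
assert all(0 <= p <= 1 for p in pmf.values())
# Condition 2: probabilities sum to 1 (up to floating-point error).
assert abs(sum(pmf.values()) - 1.0) < 1e-9
```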
Continuous Random Variables
A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements, such as height, weight, or the time required to run a mile.
Probability Distribution
A continuous random variable is not defined at specific values. Instead, it is defined over an interval of values, and is represented by the area under a curve (in other words, integral). The probability of observing any single value is equal to 0, since the number of values which may be assumed by the random variable is infinite.
Notation
Suppose a random variable $X$ may take all values over an interval of real numbers. Then, the probability that $X$ is in the set of outcomes $A$, denoted as $P(X \in A)$, is defined as the area above $A$ and under a curve. The curve, which represents the function $f(x)$, must satisfy the following conditions:
- $f(x) \ge 0$ for all $x$ (the curve has no negative values)
- The total area under the curve is equal to 1: $\int_{-\infty}^{\infty} f(x) \, dx = 1$
A curve meeting these requirements is known as a density curve.
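As a numeric sketch of a density curve meeting these requirements (the exponential density $f(x) = e^{-x}$ for $x \ge 0$ is my choice of example; the notes do not name a specific one):

```python
import math

# Exponential density f(x) = exp(-x) for x >= 0: a standard density curve.
def f(x):
    return math.exp(-x)

def area(a, b, n=100_000):
    """Midpoint Riemann sum approximating the area under f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The total area under the curve is (approximately) 1 ...
print(round(area(0, 50), 4))  # 1.0
# ... and P(1 < X < 2) is the area under the curve over that interval.
print(round(area(1, 2), 4))   # 0.2325
```

Note that asking for the area over a single point, say `area(2, 2)`, gives 0, which matches the statement above that any single value has probability 0.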