Revision


Bayes’ theorem

Definition

In probability theory and statistics, Bayes’ theorem (or Bayes’ law or Bayes’ rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.


Formula

Bayes’ theorem states:

\[P\left(A \vert B\right) = \frac{P\left(B \vert A\right) P\left(A\right)}{P\left(B\right)}\]

It follows directly from the definition of conditional probability:

\[P\left(A \vert B\right) P\left(B\right) = P\left(A \cap B\right) = P\left(B \vert A\right) P\left(A\right)\]

Another formulation is:

\[posterior = \frac{prior \times likelihood}{evidence}\]

Where:

- \(P(A \vert B)\) is the posterior: the probability of \(A\) given \(B\)
- \(P(A)\) is the prior probability of \(A\)
- \(P(B \vert A)\) is the likelihood of \(B\) given \(A\)
- \(P(B)\) is the evidence

In a test for a disease, \(A\) may be the event “the patient has the disease” and \(B\) the event “the test is positive”: the posterior \(P(A \vert B)\) is then the probability of having the disease given a positive test.
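The disease-test case can be sketched numerically with Bayes’ theorem; the prevalence, sensitivity, and specificity below are assumed illustrative values, not figures from the source:

```python
# Assumed illustrative values for a disease test.
prevalence = 0.01    # P(A): prior probability of having the disease
sensitivity = 0.95   # P(B|A): probability of a positive test given disease
specificity = 0.90   # P(not B | not A): probability of a negative test when healthy

# Evidence P(B): total probability of a positive test (law of total probability).
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: posterior P(A|B) = P(B|A) * P(A) / P(B).
p_disease_given_positive = sensitivity * prevalence / p_positive
```

Even with a fairly accurate test, the posterior stays low (under 9% here) because the disease is rare: most positive tests come from the large healthy population.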


Examples

See:


Resources

See:


MCMC Algorithms

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample from the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.


Metropolis–Hastings Algorithm

The Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution \(\pi\) from which direct sampling is difficult. This sequence can be used to approximate the distribution (e.g. to generate a histogram) or to compute an integral (e.g. an expected value).


Assumptions

Assume:

\[\pi(x) = \frac{f(x)}{\int f(y) \mathrm{d}y}\]

Here \(\frac{1}{\int f(y) \mathrm{d}y}\) is a normalization constant that is often difficult to compute. The Metropolis–Hastings algorithm makes it possible to sample from \(\pi\) without computing this normalization constant: it only uses the fact that \(f \propto \pi\).

Let \(q(\cdot \vert x)\) be any probability density that depends on \(x\), called the proposal distribution. For the algorithm to be practical, it should be easy to sample from \(q(\cdot \vert x)\).

\(q(\cdot \vert x)\) may for example be \(\mathcal{N}(x, \sigma^2)\), a Gaussian centered at the current state.


Pseudo code
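A minimal Python sketch of the algorithm, assuming a symmetric Gaussian proposal \(q(\cdot \vert x) = \mathcal{N}(x, \sigma^2)\) as suggested above (function and parameter names are illustrative):

```python
import math
import random

def metropolis_hastings(f, x0, sigma, n_steps, seed=0):
    """Sample from pi(x) proportional to f(x); the normalization
    constant of f is never needed, only ratios f(x') / f(x)."""
    rng = random.Random(seed)
    x = x0
    fx = f(x)
    samples = []
    for _ in range(n_steps):
        # Propose a move from the symmetric Gaussian proposal N(x, sigma^2).
        x_new = rng.gauss(x, sigma)
        f_new = f(x_new)
        # Symmetric proposal: the acceptance ratio reduces to f(x') / f(x).
        alpha = min(1.0, f_new / fx)
        # Accept the move with probability alpha, otherwise stay at x.
        if rng.random() < alpha:
            x, fx = x_new, f_new
        samples.append(x)
    return samples

# Unnormalized density of N(2, 1): the 1/sqrt(2*pi) factor is deliberately omitted.
f = lambda x: math.exp(-0.5 * (x - 2.0) ** 2)

samples = metropolis_hastings(f, x0=0.0, sigma=1.0, n_steps=20000)
burned = samples[5000:]                 # discard burn-in
mean = sum(burned) / len(burned)        # should be close to 2
```

Note that rejected moves still append the current state: repeating a state is what gives low-density regions their correct (low) weight in the sample.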


Interpretation

At every iteration, the algorithm tries to move in the space of possible states. This move may be accepted or rejected. The acceptance probability \(\alpha\) depends on the density of the proposed state relative to that of the current state under \(\pi\). If the algorithm tries to move to a more probable state (i.e. \(\alpha = 1\)), the move is always accepted. Otherwise, the probability of rejecting the move grows with the drop in probability density of the proposed point compared to the current point.
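In the general (possibly asymmetric) case, a move from the current state \(x\) to a proposed state \(x'\) is accepted with the standard Metropolis–Hastings probability:

\[\alpha = \min\left(1, \frac{f\left(x'\right) q\left(x \vert x'\right)}{f\left(x\right) q\left(x' \vert x\right)}\right)\]

For a symmetric proposal such as \(\mathcal{N}(x, \sigma^2)\), the \(q\) terms cancel and the ratio reduces to \(f(x') / f(x)\). Since only the ratio of \(f\) appears, the normalization constant of \(\pi\) cancels out, which is why it never needs to be computed.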


Resources

See:


Resources

See: