AdaBoost is a boosting algorithm for classification that iteratively forces the weak learners to correctly predict the samples that were misclassified in previous iterations.
It does this by assigning a weight to each sample. At initialisation every sample has the same weight, but after each iteration the weights of the misclassified samples are increased relative to those of the correctly classified samples.
Here are the steps of AdaBoost:
1. Initialise the sample weights \(\alpha_j=\frac{1}{n_{pop}}\) for \(j=1,\dots,n_{pop}\).
2. For \(k=1,\dots,n_{trees}\): fit a weak learner \(t_k\) on the training set weighted by the \(\alpha_j\), compute its weighted error \(\varepsilon_k\) and its weight \(w_k\), then update and normalise the \(\alpha_j\) (a training sketch follows the definitions below).
3. Combine the \(n_{trees}\) learners into the final weighted vote \(T_A\).
Where:
\(\varepsilon_k=\sum_{j=1}^{n_{pop}}\alpha_j[t_k(X_j) \ne Y_j]\),
\(w_k=\frac{1}{2}\log\left(\frac{1-\varepsilon_k}{\varepsilon_k}\right)\) (the negative logit of \(\varepsilon_k\) multiplied by 0.5),
\(\alpha_j\) is updated as follows: \(\alpha_j=\begin{cases} \alpha_j e^{-w_k} & \text{if } \;\; t_k(X_j)=Y_j\\ \alpha_j e^{w_k} & \text{if } \;\; t_k(X_j) \ne Y_j \end{cases}\),
the \(\alpha_j\) are then normalised: \(\alpha_j \leftarrow \frac{\alpha_j}{\sum_{l=1}^{n_{pop}}\alpha_l}\).
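Below is a minimal training sketch of these steps, assuming binary labels in \(\{0,1\}\) and scikit-learn decision stumps as the weak learners \(t_k\); the function name `fit_adaboost` and the variable names `alpha` and `w` are illustrative, not taken from the original text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_adaboost(X, Y, n_trees=50):
    """Fit AdaBoost with decision stumps; returns the learners t_k and their weights w_k."""
    n_pop = X.shape[0]
    alpha = np.full(n_pop, 1.0 / n_pop)            # equal sample weights at initialisation
    trees, w = [], []
    for k in range(n_trees):
        t_k = DecisionTreeClassifier(max_depth=1)  # weak learner (decision stump)
        t_k.fit(X, Y, sample_weight=alpha)
        miss = t_k.predict(X) != Y                 # indicator [t_k(X_j) != Y_j]
        eps_k = np.clip(np.sum(alpha * miss), 1e-10, 1 - 1e-10)  # weighted error eps_k
        w_k = 0.5 * np.log((1 - eps_k) / eps_k)    # half the negative logit of eps_k
        alpha *= np.exp(np.where(miss, w_k, -w_k)) # increase weights of misclassified samples
        alpha /= alpha.sum()                       # normalisation of alpha_j
        trees.append(t_k)
        w.append(w_k)
    return trees, np.array(w)
```

The clipping of \(\varepsilon_k\) is only there to avoid a division by zero when a stump classifies every sample correctly (or incorrectly).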
The final prediction for an unseen sample \(x\) is:
\[T_A(x)=\left[\;\sum_{k=1}^{n_{trees}} w_k\,[t_k(x) > 0.5] \;>\; \sum_{k=1}^{n_{trees}} w_k\,[t_k(x) \leq 0.5]\;\right],\]
i.e. \(T_A(x)=1\) when the weighted votes for class 1 outweigh those for class 0. Using the exponential loss \(L(h(X), Y)=e^{-h(X)Y}\) and \(\lambda=1\), it can be shown that Gradient Boosting is almost equivalent to AdaBoost.
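Continuing the sketch above, the weighted vote can be computed as follows (the helper name `predict_adaboost` is again illustrative):

```python
def predict_adaboost(trees, w, X):
    """Weighted majority vote of the stumps trained by fit_adaboost."""
    votes = np.array([t_k.predict(X) for t_k in trees])  # shape (n_trees, n_samples)
    score_1 = np.sum(w[:, None] * (votes == 1), axis=0)  # weighted votes for class 1
    score_0 = np.sum(w[:, None] * (votes == 0), axis=0)  # weighted votes for class 0
    return (score_1 > score_0).astype(int)               # T_A(x): class with the larger score

# Example usage:
# trees, w = fit_adaboost(X, Y, n_trees=20)
# Y_hat = predict_adaboost(trees, w, X)
```

As a brief sketch of that equivalence (not spelled out in the text above, and with the symbol \(\tilde\alpha_j\) introduced only for this sketch), assume labels \(Y_j\in\{-1,+1\}\) and let \(h\) be the current ensemble; adding a new learner \(t\) with weight \(w\) gives an exponential loss of
\[\sum_{j=1}^{n_{pop}} e^{-Y_j\left(h(X_j)+w\,t(X_j)\right)}=\sum_{j=1}^{n_{pop}} \tilde\alpha_j\, e^{-Y_j\,w\,t(X_j)}, \qquad \tilde\alpha_j = e^{-Y_j h(X_j)},\]
so the current exponential loss of each sample plays exactly the role of the AdaBoost weight \(\alpha_j\) (after normalisation), and minimising the right-hand side over \(w\) yields \(w=\frac{1}{2}\log\left(\frac{1-\varepsilon}{\varepsilon}\right)\), the AdaBoost learner weight.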