
Description

A random forest is a classification or regression method based on an ensemble of decision trees. Each tree of a random forest is trained on a randomly drawn sample of the original data.


Bagging

Bagging or bootstrap aggregating is a way of generating training subsets from an original dataset. The creation of the sub-samples of data for a random forest is done as follows: given a training set \(X\) with responses \(Y\) containing \(N\) samples, bagging generates \(n_{trees}\) subsets \(D_i = (X_i, Y_i)\), each obtained by drawing \(N\) samples from the original dataset uniformly at random with replacement.
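
A minimal sketch of this sampling step, assuming NumPy and a synthetic dataset (the variable names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Original dataset: N samples with features X and responses Y (synthetic here).
X = rng.normal(size=(100, 5))
Y = rng.integers(0, 2, size=100)

n_trees = 10
N = X.shape[0]

# Each D_i = (X_i, Y_i) is drawn by sampling N indices with replacement,
# so each bootstrap sample contains roughly 63% of the distinct original points.
bootstrap_samples = []
for i in range(n_trees):
    idx = rng.integers(0, N, size=N)  # sampling with replacement
    bootstrap_samples.append((X[idx], Y[idx]))
```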


Bagging model

The method of learning a decision tree on each bagged sample of data is called the bagging algorithm.

For each \(D_i\), the bagging algorithm trains a classification or regression tree \(T_i\) on \(X_i\) and \(Y_i\). The final prediction for an unseen sample \(x\) using the bagging algorithm is then:

\[T_B(x)=\frac{1}{n_{trees}}\sum_{k=1}^{n_{trees}}T_k(x)\] \[T_B(x)=\left[\frac{1}{n_{trees}}\sum_{k=1}^{n_{trees}}T_k(x) \geq 0.5\right]\]

For regression the model outputs the average of all the tree predictions (first formula), and for classification it outputs the majority vote (second formula, written for binary labels \(T_k(x) \in \{0, 1\}\)).

The bagging algorithm has a lower variance than a single decision tree without increasing the bias.
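
A minimal sketch of the bagging algorithm for binary classification, assuming scikit-learn's DecisionTreeClassifier as the base tree and a synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_trees, N = 25, X.shape[0]

# Train one tree T_k per bootstrap sample (X_k, Y_k).
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, N, size=N)
    trees.append(DecisionTreeClassifier().fit(X[idx], Y[idx]))

# Bagging prediction: average the tree outputs, then take the majority vote.
def predict_bagging(x):
    votes = np.mean([t.predict(x) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)

print(predict_bagging(X[:5]))
```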

A random forest is very similar to a bagging model but adds another type of bagging scheme: feature bagging.


Feature Bagging

Random forest uses a modified tree learning algorithm: for each tree, and at each split in the learning process, only a random subset of the features is considered.

The reason for this is to decrease the correlation among the trees compared to the bagging model. Indeed, in the bagging model, if one or a few features are very strong predictors of the response variable (target output), these features will be selected in many of the trees, causing the trees to become correlated.
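
In scikit-learn this per-split feature subsampling is controlled by the max_features parameter of the forest estimators; the sqrt(n_features) heuristic used below is a common default for classification, not something prescribed by the text:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, Y = make_classification(n_samples=500, n_features=16, random_state=0)

# max_features="sqrt": at each split only sqrt(16) = 4 randomly chosen
# features are considered, which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, Y)
print(forest.score(X, Y))
```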


Formula

The random forest predictor for an unseen sample \(x\) is thus:

\[T_B(x)=\frac{1}{n_{trees}}\sum_{k=1}^{n_{trees}}T_k(x)\] \[T_B(x)=\left[\frac{1}{n_{trees}}\sum_{k=1}^{n_{trees}}T_k(x) \geq 0.5\right]\]

Where:

  - \(T_B(x)\) is the random forest prediction for the sample \(x\),
  - \(n_{trees}\) is the number of trees in the forest,
  - \(T_k\) is the \(k\)-th tree, trained with feature bagging on the bootstrap sample \(D_k\).

The only difference with the bagging model is the way \(T_i\) is trained (the 'normal' tree learning algorithm for the bagging model, feature bagging for the random forest).

Compared to a bagging model it reduces the variance further (due to the lower correlation among trees) but it adds bias, as the trees do not consider all the features at each split.

Increasing the number of trees \(n_{trees}\) does not make the model overfit.
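
A small experiment sketch (synthetic data, scikit-learn assumed) illustrating that adding trees does not degrade test accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Test accuracy typically rises then plateaus as n_trees grows;
# it does not start decreasing the way an overfitting model would.
for n_trees in (1, 10, 50, 200):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, Y_tr)
    print(n_trees, forest.score(X_te, Y_te))
```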


ExtraTrees

ExtraTrees, or extremely randomized trees, is another ensemble method similar to random forest.

In ExtraTrees:

  1. Each tree is trained on the whole learning sample (rather than a bootstrap sample),
  2. At each split, the candidate 'cut-point' for each selected feature is drawn uniformly at random within the feature's empirical range,
  3. The best split among these randomly selected cut-points is used (according to a criterion such as Gini impurity or entropy).
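
scikit-learn exposes this method as ExtraTreesClassifier / ExtraTreesRegressor; a minimal sketch on synthetic data (note that bootstrap=False is the default, matching point 1 above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, Y = make_classification(n_samples=500, n_features=10, random_state=0)

# Whole learning sample per tree (bootstrap=False) and random cut-points
# per candidate feature; the best of these random splits is kept (Gini criterion).
extra = ExtraTreesClassifier(n_estimators=100, bootstrap=False, criterion="gini", random_state=0)
extra.fit(X, Y)
print(extra.score(X, Y))
```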


Bias Variance

Random Forest uses an ensemble of weak learners (decision trees) with low bias but high variance.

By averaging an ensemble of weakly correlated trees, it reduces the total variance. If feature bagging is used, the decorrelation of the trees is stronger but the bias is increased.
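
A rough way to see the variance reduction (synthetic data, scikit-learn assumed): refit both a single tree and a forest on several resampled training sets and compare the spread of their test scores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, Y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_pool, X_te, Y_pool, Y_te = train_test_split(X, Y, test_size=0.5, random_state=0)

tree_scores, forest_scores = [], []
rng = np.random.default_rng(0)
for _ in range(10):
    # Each run uses a different random subset of the pool as training data.
    idx = rng.choice(len(X_pool), size=500, replace=False)
    tree_scores.append(DecisionTreeClassifier().fit(X_pool[idx], Y_pool[idx]).score(X_te, Y_te))
    forest_scores.append(RandomForestClassifier(n_estimators=100).fit(X_pool[idx], Y_pool[idx]).score(X_te, Y_te))

# The forest's test scores typically vary less across runs than the single tree's.
print(np.std(tree_scores), np.std(forest_scores))
```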

