Value-based methods are algorithms that estimate state and action value functions (\(V_\pi\) and \(Q_\pi\)) and derive the best possible policy from these value functions.
Since these functions are only approximations of the true (unobservable) state and action value functions, the policy derived from them will not be optimal in the environment unless the approximations match the true value functions exactly.
The process to obtain an optimal policy alternates between estimating the state and action value functions \(V_\pi\) and \(Q_\pi\) of the current policy \(\pi\) (at initialisation, \(\pi\) is generally a random or equiprobable policy) and updating the policy based on these new estimates of the value functions, as in the sketch below.
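To make this alternation concrete, here is a minimal policy-iteration sketch in Python. It assumes a small tabular environment whose dynamics are given as `P[s][a] = [(prob, next_state, reward, done), ...]` (a hypothetical layout, mirroring the convention of Gym-style toy environments); the function name and parameters are illustrative and not part of the original text.

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Alternate policy evaluation and policy improvement until the policy is stable.

    P[s][a] is assumed to be a list of (prob, next_state, reward, done) tuples
    (a hypothetical tabular-MDP layout; adapt it to your own environment).
    """
    # Start from an equiprobable (uniform random) policy, as described above.
    policy = np.ones((n_states, n_actions)) / n_actions
    V = np.zeros(n_states)

    while True:
        # --- Policy evaluation: estimate V_pi for the current policy ---
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = 0.0
                for a in range(n_actions):
                    for prob, s_next, reward, done in P[s][a]:
                        v_new += policy[s, a] * prob * (reward + gamma * V[s_next] * (not done))
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break

        # --- Policy improvement: act greedily w.r.t. Q_pi derived from V ---
        policy_stable = True
        for s in range(n_states):
            q = np.zeros(n_actions)
            for a in range(n_actions):
                for prob, s_next, reward, done in P[s][a]:
                    q[a] += prob * (reward + gamma * V[s_next] * (not done))
            new_row = np.eye(n_actions)[np.argmax(q)]
            if not np.array_equal(new_row, policy[s]):
                policy_stable = False
            policy[s] = new_row

        if policy_stable:
            return policy, V
```

Note that this particular sketch is model-based, since it reads the transition probabilities `P` directly; the model-free methods mentioned below would instead estimate \(Q_\pi\) from sampled interactions.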
Given a policy \(\pi\), we can write down its associated state and action value functions.
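In the standard discounted-return formulation (an assumption here, since the exact convention is not stated in this section), they are:

\[
V_\pi(s) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;\middle|\; S_t = s\right],
\qquad
Q_\pi(s, a) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;\middle|\; S_t = s,\, A_t = a\right].
\]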
There exist two types of methods:
Model-based learning methods assume knowledge of the underlying model of the environment.
Model-free learning methods do not assume knowledge of the underlying model of the environment.