Optimality
Value functions define a partial ordering over policies.
Definition
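A policy $\pi$ is defined to be better than or equal to a policy $\pi'$ if its expected return is greater than or equal to that of $\pi'$ in every state.
In other words, $\pi \geq \pi'$ if and only if $v_\pi(s) \geq v_{\pi'}(s)$ for all $s \in \mathcal{S}$.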
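The ordering is only partial because two policies can each be better in a different state. The sketch below is a minimal illustration, not taken from the source: the toy MDP, the array layout `P[a, s, s2]` / `R[s, a]`, and the helper names `evaluate_policy` and `better_or_equal` are all assumptions made for the example.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma=0.9):
    """Return v_pi by solving the linear system v = r_pi + gamma * P_pi @ v.

    P[a, s, s2] is the probability of moving from s to s2 under action a,
    R[s, a] is the expected immediate reward, and policy[s] is the action
    a deterministic policy picks in state s. (Layout is an assumption.)
    """
    states = np.arange(R.shape[0])
    P_pi = P[policy, states]   # row s is P[policy[s], s, :]
    r_pi = R[states, policy]   # entry s is R[s, policy[s]]
    return np.linalg.solve(np.eye(len(states)) - gamma * P_pi, r_pi)

def better_or_equal(v_a, v_b, tol=1e-9):
    """True if v_a(s) >= v_b(s) in every state (up to a small tolerance)."""
    return bool(np.all(v_a >= v_b - tol))

# Hypothetical toy MDP: two states that never communicate; every action stays put.
P = np.stack([np.eye(2), np.eye(2)])
R = np.array([[1.0, 0.0],    # state 0: only action 0 pays a reward of 1
              [0.0, 1.0]])   # state 1: only action 1 pays a reward of 1

v_00 = evaluate_policy(P, R, np.array([0, 0]))  # good in state 0 only
v_11 = evaluate_policy(P, R, np.array([1, 1]))  # good in state 1 only
v_01 = evaluate_policy(P, R, np.array([0, 1]))  # good in both states

print(better_or_equal(v_01, v_00), better_or_equal(v_01, v_11))  # True True
print(better_or_equal(v_00, v_11), better_or_equal(v_11, v_00))  # False False
```

The last line shows two policies that are incomparable: neither is better than or equal to the other, while a third policy dominates both.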
An optimal policy is one that is better than or equal to all other policies. There may be more than one optimal policy; we denote all of them by $\pi_*$.
All optimal policies share the same optimal state-value function, $v_*(s) = \max_\pi v_\pi(s)$, and the same optimal action-value function, $q_*(s, a) = \max_\pi q_\pi(s, a)$.
For the state-action pair $(s, a)$, $q_*(s, a)$ gives the expected return for taking action $a$ in state $s$ and thereafter following an optimal policy.
In any finite MDP, there is always at least one deterministic optimal policy.
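One way to see this: given the optimal action-value function, picking a maximizing action in every state yields a deterministic policy that is optimal. A minimal sketch, assuming the optimal action-values are stored as a table with one row per state; the numbers and the name `greedy_policy` are hypothetical.

```python
import numpy as np

def greedy_policy(q):
    """Pick, in each state, an action with the largest action-value.

    np.argmax breaks ties by taking the lowest action index, so the
    result is always a single action per state (a deterministic policy).
    """
    return np.argmax(q, axis=1)

# Hypothetical optimal action-value table: 3 states (rows), 2 actions (columns).
q_star = np.array([[1.5, 3.0],
                   [4.0, 1.5],
                   [2.0, 2.0]])   # tie: both actions are optimal here

policy = greedy_policy(q_star)
print(policy)  # [1 0 0]
```

In the tied state any mixture of the two actions would also be optimal, which is why optimal policies need not be unique.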
Bellman optimality equation for $v_*$:
$$v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_*(s') \right]$$
Source: Practical RL, HSE University
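The Bellman optimality equation for the state-value function can be used as an update rule, which gives value iteration. The sketch below assumes a tabular MDP with known dynamics, written in expected-reward form: `P[a, s, s2]` holds transition probabilities and `R[s, a]` the expected immediate reward. The toy MDP and the function name `value_iteration` are assumptions made for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Repeat v(s) <- max_a [ R[s, a] + gamma * sum_s2 P[a, s, s2] * v(s2) ]."""
    v = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # q[s, a]: one-step lookahead value of action a in state s
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical 2-state MDP: action 0 stays in place, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1
R = np.array([[0.0, 1.0],    # state 0: switching pays 1
              [2.0, 0.0]])   # state 1: staying pays 2

v_star = value_iteration(P, R, gamma=0.5)
print(np.round(v_star, 6))
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.5) = 4, and switching from state 0 is worth 1 + 0.5 * 4 = 3.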
Equations
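Bellman optimality equation for $q_*$:
$$q_*(s, a) = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]$$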
Source: Practical RL, HSE University
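The same fixed-point idea applies to the action-value function: the maximum moves inside the expectation, over the next state's actions. A minimal sketch under the same assumptions as any tabular planning setup (known dynamics, `P[a, s, s2]` transition probabilities, `R[s, a]` expected rewards); the toy MDP and the name `q_value_iteration` are hypothetical.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Repeat q(s, a) <- R[s, a] + gamma * sum_s2 P[a, s, s2] * max_a2 q(s2, a2)."""
    q = np.zeros_like(R, dtype=float)
    for _ in range(max_iters):
        best_next = q.max(axis=1)   # max over actions in each next state
        q_new = R + gamma * np.einsum("ast,t->sa", P, best_next)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q

# Hypothetical 2-state MDP: action 0 stays in place, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1
R = np.array([[0.0, 1.0],    # state 0: switching pays 1
              [2.0, 0.0]])   # state 1: staying pays 2

q_opt = q_value_iteration(P, R, gamma=0.5)
print(np.round(q_opt, 6))
print(q_opt.argmax(axis=1))   # greedy action per state
```

Taking the row-wise maximum of the resulting table recovers the optimal state values, and the row-wise argmax gives a deterministic greedy policy.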