Markov Decision Process
We consider the finite case, where the number of states is finite and time steps are discrete.
Setup
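States: $s \in \mathcal{S}$
Actions: $a \in \mathcal{A}$
Rewards: $r \in \mathcal{R}$
$P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t+1$.
Policy $\pi$, which maps states to actions, defining the strategy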
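The setup above can be sketched as plain Python data. This is only an illustrative sketch: the two states, the two actions, the probabilities, the rewards, and the names `P`, `R`, `policy`, `step`, and `rollout` are all assumptions made up for the example, not part of the notes.

```python
import random

# Hypothetical two-state, two-action MDP; all names and numbers are illustrative.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# P[(s, a)] maps each next state s' to P(s' | s, a); each row sums to 1.
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.0, "s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R[(s, a)] is the reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): -1.0,
    ("s1", "stay"): 1.0,
    ("s1", "move"): 0.0,
}

# A deterministic policy: one action per state.
policy = {"s0": "move", "s1": "stay"}


def step(state, action, rng=random):
    """Sample s' ~ P(. | state, action) and return (s', reward)."""
    next_states = list(P[(state, action)].keys())
    weights = list(P[(state, action)].values())
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return next_state, R[(state, action)]


def rollout(start, horizon, rng=random):
    """Follow `policy` for `horizon` steps and return the total reward."""
    state, total = start, 0.0
    for _ in range(horizon):
        state, reward = step(state, policy[state], rng)
        total += reward
    return total
```

Calling `rollout("s0", 10)` simulates one ten-step trajectory under the policy. Because transitions are sampled, repeated calls generally return different totals.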
Reward and Objective
Total Reward: $R = \sum_{t} r_t$, the sum of the rewards $r_t$ received at each time step $t$.
Policy: $\pi(a \mid s)$ = probability of taking action $a$ in state $s$.
Goal: Maximize $\mathbb{E}_{\pi}\left[\sum_{t} r_t\right]$
- Find the policy $\pi^*$ that maximizes the expected total reward
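As a rough sketch of what this objective means in practice, the snippet below computes the expected total reward of a policy exactly and then searches for the best deterministic policy by brute force. It rests on several assumptions that are not in the notes: a finite horizon, a made-up two-state MDP, rewards that depend on the state and action, and a search restricted to deterministic policies. The names `expected_total_reward` and `best_deterministic_policy` are hypothetical.

```python
from itertools import product

# Hypothetical MDP with states {0, 1} and actions {0, 1}; numbers are illustrative.
STATES = [0, 1]
ACTIONS = [0, 1]
# P[s][a][s2] = P(s2 | s, a); R[s][a] = reward for taking action a in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.7, 0.3]],
]
R = [
    [0.0, -1.0],
    [1.0, 0.0],
]


def expected_total_reward(pi, horizon):
    """Return V[s] = E_pi[sum of rewards over `horizon` steps | s_0 = s].

    pi[s][a] is the probability of taking action a in state s.
    """
    V = [0.0 for _ in STATES]  # zero steps left: nothing more to collect
    for _ in range(horizon):
        # One backward step: immediate reward plus expected value of what follows.
        V = [
            sum(
                pi[s][a] * (R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in STATES))
                for a in ACTIONS
            )
            for s in STATES
        ]
    return V


def best_deterministic_policy(start, horizon):
    """Enumerate all |A|^|S| deterministic policies and keep the best one."""
    best_choice, best_value = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(STATES)):
        # Encode the deterministic policy as one-hot probabilities pi(a | s).
        pi = [[1.0 if a == choice[s] else 0.0 for a in ACTIONS] for s in STATES]
        value = expected_total_reward(pi, horizon)[start]
        if value > best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value
```

Enumeration is only workable for very small problems, since the number of deterministic policies grows as $|\mathcal{A}|^{|\mathcal{S}|}$. Value iteration and policy iteration are the usual alternatives for larger ones.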