
Markov Decision Process

We consider the finite case, where the number of states is finite and time steps are discrete.

Setup

  • States: $s \in \mathcal{S}$

  • Actions: $a \in \mathcal{A}$

  • Rewards: $r \in \mathbb{R}$

  • Transition probabilities: $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability that action $a$ taken in state $s$ at time $t$ leads to state $s'$ at time $t+1$.

  • Policy $\pi$, which maps states to actions, defining the strategy (a concrete toy example follows this list)
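
To make the setup concrete, here is a minimal sketch of a finite MDP in Python. The two-state MDP, its action names, transition probabilities, and reward values are all illustrative assumptions invented for this sketch, not part of any standard library.

```python
import random

# Hypothetical two-state, two-action MDP (all numbers are made up
# for illustration).
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# P[(s, a)][s'] = Pr(s_{t+1} = s' | s_t = s, a_t = a)
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R[(s, a)] = real-valued reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): 1.0,
    ("s1", "stay"): 2.0,
    ("s1", "move"): 0.0,
}

def step(s, a):
    """Sample the next state s' ~ P_a(s, .) and return (s', r)."""
    dist = P[(s, a)]
    s_next = random.choices(list(dist), weights=list(dist.values()))[0]
    return s_next, R[(s, a)]

# One sampled transition from s0 under action "move".
print(step("s0", "move"))
```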

Reward and Objective

  • Total Reward: $R = \sum\limits_{t} r_t$.

  • Policy: $\pi(a \mid s)$ = probability of taking action $a$ in state $s$.

  • Goal: $\max\limits_{\pi(a \mid s)} E_{\pi}[R]$

    • Find the policy $\pi$ that maximizes the expected total reward.
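
As a rough illustration of this objective, the sketch below estimates $E_{\pi}[R]$ by Monte Carlo: it samples trajectories under a fixed stochastic policy over a finite horizon and averages their total rewards. The toy MDP and the policy probabilities are the same illustrative assumptions used in the Setup example; the horizon and sample count are arbitrary.

```python
import random

# Monte Carlo estimate of the objective E_pi[R], with R = sum_t r_t,
# over a fixed finite horizon. The MDP (P, R) and policy pi below are
# illustrative assumptions matching the toy example in the Setup section.
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): 0.0}

# pi[s][a] = pi(a | s), the probability of taking action a in state s.
pi = {"s0": {"stay": 0.3, "move": 0.7},
      "s1": {"stay": 0.8, "move": 0.2}}

def rollout(s, horizon):
    """Sample one trajectory under pi and return its total reward."""
    total = 0.0
    for _ in range(horizon):
        acts = pi[s]
        a = random.choices(list(acts), weights=list(acts.values()))[0]
        total += R[(s, a)]
        nxt = P[(s, a)]
        s = random.choices(list(nxt), weights=list(nxt.values()))[0]
    return total

# Average over many sampled trajectories to approximate E_pi[R].
returns = [rollout("s0", horizon=20) for _ in range(10_000)]
print(sum(returns) / len(returns))
```

Searching over policies for the one with the highest such expected return is exactly the maximization stated in the goal above.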