Chapter 3: Finite Markov Decision Processes

  • At each time step t you take an action a_t in state s_t; in response, at the next time step you receive a reward r_t+1 and move into a new state s_t+1.

  • S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, ... : this sequence of states, actions, and rewards is the trajectory generated by the agent-environment interaction in an MDP (see the interaction-loop sketch after this list).

  • Markov property: the future is independent of the past given the present, i.e. the current state carries all the information needed to predict what happens next.

  • Finite MDP: the state, action, and reward sets S, A, and R each have a finite number of elements.
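
A minimal sketch of this interaction loop under stated assumptions: the tiny two-state dynamics table, the `step` helper, and the uniform random policy below are all hypothetical, chosen only to illustrate how the trajectory S_0, A_0, R_1, S_1, ... is generated.

```python
import random

# Hypothetical finite MDP with two states and two actions (illustration only).
# dynamics[(s, a)] is a list of (probability, next_state, reward) triples,
# i.e. a tabular representation of p(s', r | s, a).
dynamics = {
    ("s0", "left"):  [(1.0, "s0", 0.0)],
    ("s0", "right"): [(0.9, "s1", 1.0), (0.1, "s0", 0.0)],
    ("s1", "left"):  [(1.0, "s0", 0.0)],
    ("s1", "right"): [(1.0, "s1", 2.0)],
}
actions = ["left", "right"]

def step(state, action):
    """Sample (next_state, reward) from p(s', r | s, a)."""
    outcomes = dynamics[(state, action)]
    probs = [p for p, _, _ in outcomes]
    next_state, reward = random.choices([(s2, r) for _, s2, r in outcomes], weights=probs)[0]
    return next_state, reward

# Generate a short trajectory S_0, A_0, R_1, S_1, A_1, R_2, ...
state = "s0"
trajectory = [state]
for t in range(5):
    action = random.choice(actions)   # uniform random policy (assumption)
    state, reward = step(state, action)
    trajectory += [action, reward, state]
print(trajectory)
```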

  • The function p defines the dynamics of the MDP: p(s', r | s, a) = Pr{S_t = s', R_t = r | S_{t-1} = s, A_{t-1} = a}, the probability of landing in state s' with reward r after taking action a in state s.

  • State-transition probabilities are obtained by summing out the reward: p(s' | s, a) = Σ_r p(s', r | s, a).

  • Expected reward for a state-action pair: r(s, a) = E[R_t | S_{t-1} = s, A_{t-1} = a] = Σ_r r Σ_{s'} p(s', r | s, a).

  • Bellman recursive equation: v_π(s) = Σ_a π(a|s) Σ_{s',r} p(s', r | s, a) [r + γ v_π(s')], which expresses the value of a state in terms of the values of its possible successor states (see the sketch below).
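
A minimal sketch of how the Bellman equation turns into iterative policy evaluation, assuming a small hand-made dynamics table, a uniform random policy, and gamma = 0.9 (all illustrative assumptions, not from the chapter):

```python
# Iterative policy evaluation: repeatedly apply the Bellman equation as an
# update rule until the value estimates stop changing.
states = ["s0", "s1"]
actions = ["left", "right"]
gamma = 0.9                                   # discount factor (assumption)

# Tabular p(s', r | s, a): dynamics[(s, a)] is a list of (probability, next_state, reward).
dynamics = {
    ("s0", "left"):  [(1.0, "s0", 0.0)],
    ("s0", "right"): [(0.9, "s1", 1.0), (0.1, "s0", 0.0)],
    ("s1", "left"):  [(1.0, "s0", 0.0)],
    ("s1", "right"): [(1.0, "s1", 2.0)],
}

pi = {s: {a: 1.0 / len(actions) for a in actions} for s in states}   # uniform policy
v = {s: 0.0 for s in states}                  # initial value estimates

for sweep in range(1000):
    delta = 0.0
    for s in states:
        # v_pi(s) = sum_a pi(a|s) * sum_{s', r} p(s', r | s, a) * (r + gamma * v_pi(s'))
        new_v = sum(
            pi[s][a] * sum(p * (r + gamma * v[s2]) for p, s2, r in dynamics[(s, a)])
            for a in actions
        )
        delta = max(delta, abs(new_v - v[s]))
        v[s] = new_v
    if delta < 1e-8:                          # values have effectively converged
        break

print(v)                                      # approximate v_pi for each state
```

The in-place update (overwriting v[s] during the sweep) is the Gauss-Seidel-style variant; a full-backup version would compute all new values from the old ones before replacing them.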
