Chapter 3: Finite Markov Decision Processes
You take an action a_t at time step t while in state s_t; in response, the environment gives you a reward r_{t+1} at the next time step and you transition to a new state s_{t+1}.
Repeating this interaction generates a trajectory in the MDP: S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, . . .
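As a sketch, the interaction loop above can be written in a few lines of Python. The `env.reset()`/`env.step()` interface and the `policy` callable are illustrative assumptions (Gym-style), not something defined in this chapter:

```python
def run_episode(env, policy, max_steps=100):
    """Roll out one episode, returning the trajectory [(S_t, A_t, R_{t+1}), ...]."""
    trajectory = []
    s = env.reset()                    # S_0
    for _ in range(max_steps):
        a = policy(s)                  # choose A_t based on the current state S_t
        s_next, r, done = env.step(a)  # environment returns S_{t+1}, R_{t+1}
        trajectory.append((s, a, r))
        s = s_next
        if done:                       # reaching a terminal state ends the episode
            break
    return trajectory
```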
Markov Property: the future is independent of the past, given the present.
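Concretely, in the standard Sutton & Barto formulation (written here with the same indexing as above), the Markov property means the one-step dynamics are fully specified by a single function of the current state and action; the rest of the history can be ignored:

$$p(s', r \mid s, a) \doteq \Pr\{\, S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a \,\}$$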
Finite MDP: the state, action, and reward sets (S, A, and R) each have a finite number of elements.
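Because everything is finite, the whole MDP can be written down as an explicit table for p(s', r | s, a). A minimal sketch below, loosely inspired by the recycling-robot example from the book; the state/action names and probabilities are made up for illustration:

```python
import random

# dynamics[(s, a)] = list of (probability, next_state, reward) entries,
# i.e. an exhaustive table of p(s', r | s, a) for a tiny two-state MDP.
dynamics = {
    ("low", "recharge"): [(1.0, "high", 0.0)],
    ("low", "search"):   [(0.7, "low", 1.0), (0.3, "high", -3.0)],
    ("high", "search"):  [(0.8, "high", 1.0), (0.2, "low", 1.0)],
    ("high", "wait"):    [(1.0, "high", 0.5)],
}

def step(s, a):
    """Sample (s', r) from p(s', r | s, a); note it depends only on (s, a)."""
    entries = dynamics[(s, a)]
    probs = [p for p, _, _ in entries]
    outcomes = [(s_next, r) for _, s_next, r in entries]
    return random.choices(outcomes, weights=probs)[0]
```

The table view makes the Markov property visible: `step` needs only the current (s, a) pair, never the earlier trajectory.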