Algorithms

REINFORCE (Monte Carlo Policy Gradient)

  • This algorithm does not work very well.

  • It is said to have very high variance. Hence convergence issues.

  • It can work with POMDPs as it is, which is not the case with Actor-Critic or Value iteration.

Example: When we have Gaussian Policy

DDPG

PPO

TRPO

SAC

Last updated