Algorithms
REINFORCE (Monte Carlo Policy Gradient)

This algorithm does not work very well.
It is said to have very high variance. Hence convergence issues.
It can work with POMDPs as it is, which is not the case with Actor-Critic or Value iteration.
Example: When we have Gaussian Policy

DDPG
PPO
TRPO
SAC
Last updated