CtrlK

Algorithms

REINFORCE (Monte Carlo Policy Gradient)

This algorithm does not work very well.
It is said to have very high variance. Hence convergence issues.
It can work with POMDPs as it is, which is not the case with Actor-Critic or Value iteration.

Example: When we have Gaussian Policy

DDPG

PPO

TRPO

SAC

PreviousPolicy Gradient NextMulti Agent Reinforcement Learning

Last updated 1 year ago