This algorithm does not work very well in practice.
Its estimates are said to have very high variance, which causes convergence issues.
It can be applied to POMDPs as-is, which is not the case for Actor-Critic or value iteration.
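To make the variance claim concrete, here is a minimal sketch. It assumes the algorithm is a Monte Carlo policy gradient method such as REINFORCE, which the note does not actually name. The setup is hypothetical: a two-armed bandit with a softmax policy and Gaussian reward noise. The sketch compares many one-sample gradient estimates against the exact gradient. Shifting every reward by the same constant leaves the true gradient unchanged but inflates the variance of the estimator dramatically, which is one way such high variance shows up.

```python
import math
import random
import statistics


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def true_grad(theta, rewards):
    # Exact gradient of E[R] = sum_a pi(a) * r(a) w.r.t. the logits theta:
    # dE/dtheta_i = pi_i * (r_i - E[R]).
    probs = softmax(theta)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


def reinforce_grad_sample(theta, rewards, noise_std, rng):
    # One-sample score-function (REINFORCE-style) estimate of dE[R]/dtheta.
    # Assumption: a hypothetical bandit whose reward is r(a) plus Gaussian noise.
    probs = softmax(theta)
    a = rng.choices(range(len(theta)), weights=probs)[0]
    r = rewards[a] + rng.gauss(0.0, noise_std)
    # grad of log softmax(theta)[a] is one_hot(a) - probs; scale by the return.
    return [((1.0 if i == a else 0.0) - p) * r for i, p in enumerate(probs)]


def estimate_stats(theta, rewards, noise_std=1.0, n=20000, seed=0):
    # Mean and variance of the one-sample estimate for the gradient w.r.t. theta[1].
    rng = random.Random(seed)
    samples = [reinforce_grad_sample(theta, rewards, noise_std, rng)[1] for _ in range(n)]
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    theta = [0.0, 0.0]
    print("true gradient:", true_grad(theta, [1.0, 2.0]))
    for shift in (0.0, 100.0):
        rewards = [1.0 + shift, 2.0 + shift]
        mean, var = estimate_stats(theta, rewards)
        print(f"shift={shift:>5}: mean={mean:.3f}, variance={var:.1f}")
```

Both runs estimate the same true gradient component (0.25), but with rewards shifted by 100 the per-sample variance grows from roughly 0.8 to roughly 2600, so far more samples are needed for the same accuracy.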