This algorithm does not work very well in practice.
Its estimates are said to have very high variance, which leads to convergence issues.
It can be applied to POMDPs as is, which is not the case for Actor-Critic or value iteration.
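
To give a feel for the high-variance claim, here is a minimal sketch. It assumes the algorithm in question is a Monte Carlo score-function (REINFORCE-style) policy-gradient estimator, which these notes do not state explicitly. The two-armed bandit, its reward values, and the helper names `softmax` and `gradient_samples` are all hypothetical, chosen only for illustration. The sketch compares the spread of single-sample gradient estimates with and without subtracting a constant baseline from the reward.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_samples(n_samples, baseline, seed=0):
    """Single-sample score-function gradient estimates for logit 0.

    Hypothetical setup: a 2-armed bandit with a softmax policy.
    Each sample is grad log pi(action) * (reward - baseline).
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]          # assumed policy parameters (uniform policy)
    mean_reward = [10.0, 11.0]   # assumed mean reward of each arm
    probs = softmax(logits)
    samples = []
    for _ in range(n_samples):
        action = 0 if rng.random() < probs[0] else 1
        reward = mean_reward[action] + rng.gauss(0.0, 1.0)
        # Derivative of log pi(action) w.r.t. logit 0 for a softmax policy.
        score = (1.0 if action == 0 else 0.0) - probs[0]
        samples.append(score * (reward - baseline))
    return samples


# Same seed, so both runs see identical actions and reward noise.
plain = gradient_samples(20000, baseline=0.0)
centered = gradient_samples(20000, baseline=10.5)

var_plain = statistics.variance(plain)
var_centered = statistics.variance(centered)
print(f"variance without baseline: {var_plain:.2f}")
print(f"variance with baseline:    {var_centered:.2f}")
```

In this toy setting both estimators target the same expected gradient, but the raw-reward version is much noisier because every sample is scaled by the full reward rather than by its deviation from a typical value. This is one common explanation for slow or unstable convergence, not a statement about any particular implementation.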