Policy Gradient

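Policy gradient methods parametrise the policy π_θ(a|s) directly, instead of deriving it from a value or action-value function, and optimise the parameters θ by gradient ascent on the expected return.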
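A minimal sketch of the idea, assuming a softmax policy over a hypothetical two-armed bandit with Gaussian rewards. Episodes are a single step, so the return G is just the reward. It implements REINFORCE, the basic Monte Carlo policy gradient estimator ∇_θ J(θ) = E[∇_θ log π_θ(a) · G], without a baseline. The names `softmax` and `reinforce_bandit`, the arm means, and the hyperparameters are illustrative assumptions, not taken from any library.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (shifted by max for stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical bandit; returns the final policy probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters: one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy pi_theta.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        # Single-step episode: the return G is the sampled reward (unit-variance noise).
        reward = rng.gauss(arm_means[a], 1.0)
        # For a softmax policy, d/d theta_k of log pi(a) = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            # Gradient ascent step: theta += lr * G * grad log pi(a).
            theta[k] += lr * reward * grad_log
    return softmax(theta)
```

For example, `reinforce_bandit([0.0, 1.0])` should return probabilities that favour the second arm. Subtracting a baseline from the reward is a common variance-reduction tweak, left out here to keep the estimator in its plainest form.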
