Simple implementations of vanilla reinforce (policy gradient) and actor critic methods with numpy and different frameworks