A PLEN2 Robot learns to walk using Twin-Delayed Deep Deterministic Policy Gradient in Gazebo and PyBullet with a custom OpenAI Gym interface