Reinforcement learning policy networks