For artificial agents to be considered truly intelligent, they should excel at a wide variety of tasks that humans find challenging. Until this point, it had only been possible to create individual algorithms capable of mastering a single specific domain. With our algorithm, we leveraged recent breakthroughs in training deep neural networks to show that a novel end-to-end reinforcement learning agent, termed a deep Q-network (DQN), was able to surpass the overall performance of a professional human reference player and all previous agents across a diverse range of 49 game scenarios.
Positive punishment is effective in eliminating undesired behaviors, but it does have limitations. It has been found to be more effective when the stimulus is added immediately following the undesired behavior than when the stimulus is delayed. Consistency is another factor: applying a stimulus every time the undesired behavior occurs is more effective than applying it only occasionally (Cheney & Pierce, 2004). The greatest drawback is that positive punishment fails to teach desirable behaviors. Furthermore, it can produce undesirable emotional reactions such as passivity, fear, anxiety, or hostility (Skinner, 1974; as cited in Cheney & Pierce, 2004).