Update 2021-01-17: New exploration bonus and RND bug-fixes

  • PR#6 adds an autoencoder loss bonus wrapper and fixes significant problems in the RND wrapper. Previously, RND computed and accumulated gradients for the predictor network but never applied them, because the call to the optimizer was missing, so the predictor was never actually updated. I’ve replaced the animations and updated the text where relevant.
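To illustrate why the missing optimizer call matters, here is a minimal, dependency-free sketch of an RND-style bonus. It is not the code from the PR: `LinearNet`, `SGD`, `rnd_bonus_and_update` and `run` are hypothetical names, and both networks are assumed to be single linear layers to keep the gradient hand-computable. The bonus is the mean squared error between a fixed random target network and a trained predictor. Gradients are always accumulated, but they only change the predictor if `step()` is called; in a framework such as PyTorch the analogous calls are `loss.backward()`, `optimizer.step()` and `optimizer.zero_grad()`.

```python
import random


class LinearNet:
    """Hypothetical tiny network y = W x, used for both target and predictor."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        self.grad = [[0.0] * n_in for _ in range(n_out)]  # accumulated gradients

    def forward(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class SGD:
    """Hypothetical plain gradient-descent optimizer over a LinearNet."""

    def __init__(self, net, lr):
        self.net, self.lr = net, lr

    def step(self):
        # Apply the accumulated gradients to the weights.
        for w_row, g_row in zip(self.net.w, self.net.grad):
            for j in range(len(w_row)):
                w_row[j] -= self.lr * g_row[j]

    def zero_grad(self):
        for g_row in self.net.grad:
            for j in range(len(g_row)):
                g_row[j] = 0.0


def rnd_bonus_and_update(obs, target, predictor, opt, apply_update=True):
    """Return the RND bonus for obs, then train the predictor on that error."""
    t = target.forward(obs)  # target network is fixed and never trained
    p = predictor.forward(obs)
    err = [pi - ti for pi, ti in zip(p, t)]
    bonus = sum(e * e for e in err) / len(err)  # mean squared prediction error

    # "Backward pass": d(bonus)/dW[i][j] = 2 * err[i] * obs[j] / n, added into .grad
    for i, e in enumerate(err):
        for j, xj in enumerate(obs):
            predictor.grad[i][j] += 2.0 * e * xj / len(err)

    if apply_update:
        opt.step()       # without this call the gradients pile up but are never used
        opt.zero_grad()  # clear them so the next update starts from zero
    return bonus


def run(apply_update, steps=50):
    """Repeatedly visit one observation and record the bonus at each visit."""
    target = LinearNet(4, 2, random.Random(0))
    predictor = LinearNet(4, 2, random.Random(1))
    opt = SGD(predictor, lr=0.1)
    obs = [1.0, 0.5, -0.5, 0.25]
    return [rnd_bonus_and_update(obs, target, predictor, opt, apply_update)
            for _ in range(steps)]


if __name__ == "__main__":
    fixed, buggy = run(apply_update=True), run(apply_update=False)
    print("with optimizer step:   ", fixed[0], "->", fixed[-1])
    print("without optimizer step:", buggy[0], "->", buggy[-1])
```

With the optimizer step, the bonus for a repeatedly visited observation shrinks towards zero, which is exactly the novelty decay RND relies on. Without it, the bonus stays constant forever, so a familiar state looks just as novel as it did on the first visit.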