Update 2021-01-17: New exploration bonus and RND bug-fixes

Jan 17, 2021

PR#6 introduces an autoencoder loss bonus wrapper and fixes significant problems in the RND wrapper. Previously, RND accumulated gradients but never applied them to the predictor network, because the call to the optimizer was missing. I’ve replaced the animations and updated the text in the relevant places.
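
To illustrate the kind of bug described here, below is a minimal sketch in plain Python. It is not the wrapper code from PR#6. `TinyPredictor`, `train`, and the `call_step` flag are hypothetical names, and the one-weight linear model is an assumption chosen to keep the example dependency-free. The sketch only shows that computing gradients without an optimizer step leaves the predictor unchanged, so its prediction error never shrinks.

```python
import random


class TinyPredictor:
    """Hypothetical one-weight linear predictor with hand-written gradients."""

    def __init__(self, lr=0.1):
        self.w = 0.0      # trainable weight
        self.grad = 0.0   # accumulated gradient
        self.lr = lr

    def loss(self, x, target_w):
        # Squared error against a fixed target network (here: target_w * x).
        return (self.w * x - target_w * x) ** 2

    def backward(self, x, target_w):
        # Accumulate d(loss)/dw, mirroring how autograd accumulates gradients.
        self.grad += 2 * (self.w * x - target_w * x) * x

    def step(self):
        # The optimizer update: the call that was missing in the buggy version.
        self.w -= self.lr * self.grad

    def zero_grad(self):
        self.grad = 0.0


def train(call_step, steps=50, seed=0):
    """Train toward a fixed target; return the final loss at x = 1."""
    rng = random.Random(seed)
    target_w = 1.5  # assumed fixed "random" target, for illustration only
    pred = TinyPredictor()
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        pred.zero_grad()
        pred.backward(x, target_w)
        if call_step:
            pred.step()
    return pred.loss(1.0, target_w)


buggy_loss = train(call_step=False)  # gradients computed, never applied
fixed_loss = train(call_step=True)   # gradients applied every iteration
print(buggy_loss, fixed_loss)
```

Without the step, the weight stays at its initial value and the loss stays at its starting level. Since the RND bonus is the predictor's error, a bug of this kind would keep the bonus from decaying on states the agent has already visited.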