Update 2021-01-17: New exploration bonus and RND bug-fixes
- PR#6 introduces an autoencoder loss bonus wrapper and fixes significant problems in the RND wrapper. Previously, RND accumulated gradients but never applied them to the predictor network, because the call to the optimizer was missing. I’ve replaced the animations and updated the text in the relevant places.
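
To illustrate the kind of bug described above, here is a minimal, dependency-free sketch. It is not the code from PR#6 or from the RND wrapper: `TinyPredictor`, `SGD`, and `train` are hypothetical names made up for this example, and the one-weight model only stands in for a real predictor network. It shows that computing gradients without ever calling the optimizer leaves the weights unchanged, while the gradients simply pile up.

```python
# Hypothetical sketch of the bug class; NOT the repository's RND wrapper.


class TinyPredictor:
    """One-weight linear predictor y = w * x with a manually tracked gradient."""

    def __init__(self, w=0.0):
        self.w = w
        self.grad = 0.0

    def backward(self, x, target):
        """Accumulate d/dw of (w*x - target)**2 into self.grad; return the loss."""
        error = self.w * x - target
        self.grad += 2.0 * error * x
        return error ** 2


class SGD:
    """Minimal optimizer: step() applies the gradient, zero_grad() clears it."""

    def __init__(self, model, lr=0.1):
        self.model = model
        self.lr = lr

    def step(self):
        self.model.w -= self.lr * self.model.grad

    def zero_grad(self):
        self.model.grad = 0.0


def train(call_step, iterations=50):
    """Fit w so that w * 1.0 == 3.0; return (weight, last loss, leftover grad)."""
    model = TinyPredictor()
    optimizer = SGD(model)
    loss = None
    for _ in range(iterations):
        loss = model.backward(x=1.0, target=3.0)
        if call_step:
            optimizer.step()       # the call that was missing in the buggy case
            optimizer.zero_grad()  # clear the gradient once it has been applied
    return model.w, loss, model.grad


if __name__ == "__main__":
    print("without optimizer step:", train(call_step=False))
    print("with optimizer step:   ", train(call_step=True))
```

Without the optimizer call, the weight stays at its initial value of 0.0 and the loss never improves. With the call, the weight moves towards the target of 3.0.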