<h1>Carle’s Game</h1>
<p>Cellular Automata Reinforcement Learning Environment.</p>
<hr />
<h2 id="update-2021-04-05-alpha-release">Update 2021-04-05: Alpha Release</h2>
<p>CARLE is now in a state to permit useful experimentation and exploration. Recent additions rolled into the alpha release include:</p>
<ul>
<li>Stacking reward wrappers now works, so multiple reward metrics can be applied simultaneously during training (there are 4 included, but experimenters are encouraged to implement their own).</li>
<li>Reward wrappers intended to correlate with the development of puffers/guns and gliders have been added.</li>
<li>Evaluation functionality and a random baseline demonstrator have been added.</li>
<li>Support for run-length encoding makes it much easier to import and export cellular automaton universes for use with external tools like Golly.</li>
</ul>
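As a rough illustration of the run-length encoding format used by tools like Golly, here is a tiny decoder sketch. This is hypothetical and simplified, not CARLE's actual parser: it handles only a single-line pattern body and ignores the header (`x = ..., y = ..., rule = ...`) and comment lines a real RLE file contains.

```python
import re

def decode_rle(rle):
    """Decode a Golly-style RLE pattern body, e.g. 'bob$2bo$3o!' (a glider),
    into a list of 0/1 rows. 'b' is a dead cell, 'o' a live cell, '$' ends
    a row, '!' ends the pattern; a leading digit repeats the following tag."""
    rows, row = [], []
    for count, tag in re.findall(r"(\d*)([bo$!])", rle):
        n = int(count) if count else 1
        if tag == "b":
            row.extend([0] * n)
        elif tag == "o":
            row.extend([1] * n)
        else:  # '$' ends a row (a count inserts blank rows), '!' ends the pattern
            rows.append(row)
            rows.extend([[]] * (n - 1))
            row = []
            if tag == "!":
                break
    width = max(len(r) for r in rows)
    return [r + [0] * (width - len(r)) for r in rows]

decode_rle("bob$2bo$3o!")  # the glider: [[0,1,0], [0,0,1], [1,1,1]]
```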
<p>Also, I recently picked up a day job, which will make developing CARLE and running Carle’s Game for IEEE CoG 2021 significantly more challenging. However, I still think it’s doable, and my ideas about judging the contest have crystallized somewhat.</p>
<p>Currently, evaluation is accomplished by stacking the 4 included reward wrappers around CARLE with fixed reward component weightings and accumulating rewards over 1024 steps in each of 5 different Life-like rulesets (Life, Morley/Move, Day and Night, DotLife, and Live Free or Die)*. However, I’ve come to realize that evaluating machine creativity in an open-ended context like Carle’s Game will necessarily be a matter of subjective human judgement, with all that entails. It would be a mistake to rank an agent that scores higher on arbitrary proxy metrics over one that discovers an exciting new puffer pattern. Judging will therefore rest on human judgement; quantitative proxy metrics will serve as tools for human observers to explore agent activity, and to break ties in cases of conflict or ambiguity.</p>
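The wrapper-stacking scheme can be sketched with a gym-style step/reset interface. The class and method names below are hypothetical illustrations, not CARLE's actual API; the point is that each wrapper adds its own weighted bonus on top of whatever the environment (or inner wrapper) already returned.

```python
class RewardWrapper:
    """Hypothetical base class: adds a weighted bonus to the inner reward."""
    def __init__(self, env, weight=1.0):
        self.env = env
        self.weight = weight

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward + self.weight * self.bonus(obs), done, info

    def bonus(self, obs):
        raise NotImplementedError


class ConstantBonus(RewardWrapper):
    """Stand-in for a real metric (e.g. a growth or glider bonus)."""
    def bonus(self, obs):
        return 0.5


class DummyEnv:
    """Stand-in for CARLE itself: always returns a base reward of 1.0."""
    def reset(self):
        return None

    def step(self, action):
        return None, 1.0, False, {}


# Stack two weighted reward wrappers around the environment.
env = ConstantBonus(ConstantBonus(DummyEnv(), weight=1.0), weight=2.0)
obs, reward, done, info = env.step(None)
# reward = (1.0 + 1.0 * 0.5) + 2.0 * 0.5 = 2.5
```

The per-wrapper `weight` plays the role of the reward component weighting used in evaluation; accumulating such rewards over many steps and several rulesets gives a scalar score per run.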
<p>In order to facilitate human interaction with agents’ exploration in CARLE, the beta release deadline has been moved to April 30th, and the objective has been changed to center on building tools humans can use to judge agent activity. I plan to use <a href="https://bokeh.org/">Bokeh</a> for the first iteration of this functionality, but ideally I’ll be able to implement interactive evaluation tools that can be made publicly available on GitHub Pages, something that looks a little like the Life implementations by <a href="https://wangytangy.github.io/Conway-Game-of-Life/">wangytangy</a>, <a href="https://magicmart.github.io/Game-of-Life/">magicmart</a> or <a href="https://igorkonovalov.github.io/projects/2017/01/04/Game_of_life.html">Igor Konovalov</a>. The trick is to convert PyTorch models to a portable representation that can be called from JavaScript.</p>
<p>Stay tuned for further developments. Meanwhile, here are some of the random baseline evaluation runs, which should make clear the difficulty of relying on proxy reward metrics:</p>
<div align="center">
<img src="/carle/assets/random_baseline_1.png" />
<br />
Random baseline 1, mean reward per step over 5120 steps in 5 different rulesets: 2.684e-01
<br /><br />
<img src="/carle/assets/random_baseline_2.png" />
<br />
Random baseline 2, mean reward per step over 5120 steps in 5 different rulesets: 3.011e-01
<br /><br />
<img src="/carle/assets/random_baseline_3.png" />
<br />
Random baseline 3, mean reward per step over 5120 steps in 5 different rulesets: 2.690e-01
</div>
<p><br /><br /></p>
<p>* aka B3/S23, B368/S245, B3678/S34678, B3/S023, and B2/S0</p>
<hr />
<h2 id="update-2021-02-14-fix-cellular-birth-logic-blog-post-about-evaluation-in-open-endedness">Update 2021-02-14: Fix cellular birth logic, blog post about evaluation in open-endedness</h2>
<ul>
<li>
<p>Commit <a href="https://github.com/riveSunder/carle/commit/4e93a692860817e011e22baed6d96904b7460dcc">4e93a692</a> corrects an error in calculating cell births. Previously, no check was made as to whether a cell was dead before calculating births. This error affected rulesets whose survive rule list does not contain the birth neighborhood state(s). The issue was discovered by noticing ladders in the Coral ruleset (B3/S45678).</p>
</li>
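A correct Life-like update keeps birth and survival as disjoint cases, so the birth neighborhoods only ever apply to dead cells. Here is a minimal pure-Python sketch (with a dead boundary), not CARLE's actual vectorized implementation:

```python
def ca_step(grid, birth={3}, survive={2, 3}):
    """One step of a Life-like CA on a list-of-lists 0/1 grid.
    The key point of the fix: birth applies only to *dead* cells. Without
    that check, a live cell with a birth-neighborhood count would stay
    alive even when the count is not in the survive list (as in Coral,
    B3/S45678, where 3 is a birth count but not a survive count)."""
    h, w = len(grid), len(grid[0])
    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n = sum(grid[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)
                    and 0 <= y + dy < h and 0 <= x + dx < w)
            if grid[y][x] == 0:
                new[y][x] = 1 if n in birth else 0    # birth: dead cells only
            else:
                new[y][x] = 1 if n in survive else 0  # survival: live cells only
    return new
```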
<li>
<p>New <a href="https://rivesunder.github.io/cellular_automata/carle/2021/02/12/open_ended_eval.html">blog post</a> containing some of my thoughts on evaluating agent interactions with an open-ended environment with an eye toward mechanics and artistic beauty.</p>
</li>
</ul>
<hr />
<h2 id="update-2021-01-17-new-exploration-bonus-and-rnd-bug-fixes">Update 2021-01-17: New exploration bonus and RND bug-fixes</h2>
<ul>
<li><a href="https://github.com/riveSunder/carle/pull/6/">PR#6</a> introduces an autoencoder loss bonus wrapper and fixes significant problems in the RND wrapper. Previously, the RND wrapper accumulated gradients but never used them to update the predictor network, due to a missing call to the optimizer. I’ve replaced the animations and updated the text in relevant places.</li>
</ul>
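To see why the missing optimizer call mattered, here is a deliberately tiny, scalar illustration of the random network distillation (RND) idea. The names are illustrative, not CARLE's actual classes: a frozen random "target" function, a trained "predictor", and their squared error as the novelty bonus. The explicit weight update below is the step that was missing (in PyTorch terms, `optimizer.step()`).

```python
import random

class TinyRND:
    """Toy scalar sketch of an RND exploration bonus (illustrative only)."""
    def __init__(self, lr=0.1):
        self.target_w = random.uniform(-1.0, 1.0)  # frozen random target
        self.w = 0.0                               # learnable predictor
        self.lr = lr

    def bonus(self, obs):
        error = ((self.w - self.target_w) * obs) ** 2
        # The bug described above amounts to computing gradients but never
        # applying them; the fix is this explicit parameter update.
        grad = 2.0 * (self.w - self.target_w) * obs * obs
        self.w -= self.lr * grad
        return error

random.seed(0)  # for reproducibility in this sketch
rnd = TinyRND()
bonuses = [rnd.bonus(1.0) for _ in range(10)]
# bonuses shrink as the predictor catches up with the frozen target
```

Without the update step the bonus for a repeated observation never decreases, so the "exploration" signal degenerates into a constant.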