POC: Full Learned Sokoban experiment

This is a proof-of-concept MR (not intended to be merged) that replicates a full experiment with online learning of a Sokoban model.
