- remove org.javaTuple in favour of org.apache.common for tuples and circleQueue
- remove ViewListener from non-GUI Controller
- stateActionTable saves the last 10 states that changed; they are displayed in JTextAreas in the QTable frame
- saving Monte Carlo did not include returnSum and returnCount, so the state would be wrong after loading. The Learning, EpisodicLearning and MonteCarlo classes now each override the custom save and load methods, calling the super implementation each time and adding the fields that have to be restored at runtime.
- moved generic episodic behaviour from monteCarlo to abstract top level class
- using AtomicInteger for episodesToLearn
- moved learning thread handling from controller to model. Learning got one extra learning thread.
- add feature to use custom speed and distance for dino world obstacles
- no fake builder pattern anymore, moved needed fields into constructor
- add serialVersionUID
- action space extends the Iterable interface to simplify looping over all actions (without returning the actual list)
- add feature to save and load learning progress (Q-Table) and current episode count
- episode end is now decided purely by the environment instead of the Monte Carlo algorithm capping it at 10 actions
- using LinkedHashMap everywhere to ensure deterministic behaviour (stable iteration order)
- fixed major RNG issue so that algorithmic behaviour can be reproduced
- clearing rewardHistory so that only the last 10k rewards are kept
- added google dino jump environment
- Add metric to display episodes per second
- view no longer implements the learning listener; the controller does. The controller drives all view actions based on learning events and reacts to view events via viewListener
- add executor service for learning task
- using instanceof to distinguish between episodic learning and TD learning
- add feature to trigger more episodes
- add checkboxes for smoothing graph, displaying last 100 rewards only and drawing environment
- remove history panel from antworld gui
- repainting on every step with no time delay will certainly freeze the app, so "fast-learning" disables repainting and only refreshes the current episode label
- Added new abstract class "Episodic Learning" (maybe just use an interface instead?). Important because TD learning is not episodic and needs another way to represent the rewards received (maybe the mean of the last X rewards or something similar)
- Opening two JFrames, one with learning info and one with the environment
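
For the action space entry above, a minimal sketch of what "extends Iterable without returning the actual list" could look like. The names ActionSpace, ListActionSpace and getNumberOfActions are assumptions made for illustration and may not match the real classes in this repository:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface: extending Iterable lets callers use for-each
// without ever being handed the backing list.
interface ActionSpace<A> extends Iterable<A> {
    int getNumberOfActions();
}

// Hypothetical list-backed implementation.
final class ListActionSpace<A> implements ActionSpace<A> {
    private final List<A> actions;

    ListActionSpace(List<A> actions) {
        // Defensive copy so later changes to the caller's list do not leak in.
        this.actions = new ArrayList<>(actions);
    }

    @Override
    public int getNumberOfActions() {
        return actions.size();
    }

    @Override
    public Iterator<A> iterator() {
        // Read-only view: Iterator.remove() throws UnsupportedOperationException,
        // so iteration cannot modify the internal list.
        return Collections.unmodifiableList(actions).iterator();
    }
}
```

With this shape, a caller can write `for (A action : actionSpace) { ... }` and the internal list stays private to the action space.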