- repainting every step on no time delay will certainly freeze the app, so "fast-learning" will disable it, only refreshing current episode label
- Added new abstract class "Episodic Learning". Maybe just use an interface instead?! Important because TD learning is not episodic, needs another way to represent the rewards received (maybe mean of last X rewards or sth)
- Opening two JFrames, one with learning infos and one with environment
- only setting the seed of RNG once at the beginning and not reseeding it afterwards. Deep copying
the initial AntWorld to use as blueprint for resetting the world instead of reseeding and creating pesudo random again. Reseeding the RNG has influence action selecting to always
choose the same trajectory.
- instance of is used to determine if policy has epsilon or not and the view will adopt to this, only showing epsilon slider if policy has epsilon
- fixed bug regarding wrong generation of hashCode. hashCodes needs to be equal across equal objects. Compute hashCode on final states once and return this value instead of computing it every time .hashCode() gets called.
-