kono/refo - refo - Lukas Mahler

Commit Graph

Author	SHA1	Message	Date
Jan Löwenstrom	19d5a87ce0	add multiple food scenario	2020-04-19 20:55:42 +02:00
Jan Löwenstrom	5b82e7965d	rename MC class and improve specific analysis of antGame examples	2020-04-05 12:29:44 +02:00
Jan Löwenstrom	ee1d62842d	split Antworld into episodic and continuous tasks - add new simple state for jumping dino, to see if convergence is guaranteed with this state representation - changed reward structure for ant game	2020-03-15 16:58:53 +01:00
Jan Löwenstrom	0e4f52a48e	first epsilon decaying method	2020-02-27 15:29:15 +01:00
Jan Löwenstrom	77898f4e5a	add TD algorithms and start adapting to continuous tasks - add Q-Learning and SARSA - more config variables	2020-02-17 13:56:55 +01:00
Jan Löwenstrom	518683b676	split GUI parts from controller into sub class	2019-12-31 14:43:40 +01:00
Jan Löwenstrom	b2c3854b3a	change RL-Controller initialization process and action space iterable - no fake builder pattern anymore, moved needed fields into constructor - add serializeUID - action space extends iterable interface to simplify looping over all actions (and not returning the actual list)	2019-12-24 19:38:35 +01:00
Jan Löwenstrom	b1246f62cc	add features to gui to control learning and move learning listener interface to controller - add metric to display episodes per second - view no longer implements learning listener, controller does. Controller controls all view actions based on learning events and reacts to view events via viewListener - add executor service for learning task - use instanceof to distinguish between episodic learning and TD learning - add feature to trigger more episodes - add checkboxes for smoothing graph, displaying last 100 rewards only and drawing environment - remove history panel from antworld gui	2019-12-22 17:06:54 +01:00
Jan Löwenstrom	34e7e3fdd6	distinguish learning and episodic learning, enable fast-learning without drawing every step to reduce lag - repainting every step with no time delay will certainly freeze the app, so "fast-learning" disables it and only refreshes the current episode label - added new abstract class "Episodic Learning". Maybe just use an interface instead?! Important because TD learning is not episodic and needs another way to represent the rewards received (maybe mean of last X rewards or something) - open two JFrames, one with learning infos and one with environment	2019-12-21 00:23:09 +01:00
Jan Löwenstrom	7db5a2af3b	add fixed RNG, add extended interface EpsilonPolicy and move rewardHistory to model instead of view - only set the seed of the RNG once at the beginning and do not reseed it afterwards. Deep copy the initial AntWorld to use as a blueprint for resetting the world instead of reseeding and creating it pseudo-randomly again. Reseeding the RNG influenced action selection, causing it to always choose the same trajectory. - instanceof is used to determine whether the policy has epsilon, and the view adapts to this, only showing the epsilon slider if the policy has epsilon	2019-12-20 16:51:09 +01:00
Jan Löwenstrom	e0160ca1df	adopt MVC pattern and add real time graph interface	2019-12-18 16:48:24 +01:00

11 Commits