Commit Graph

19 Commits

Author SHA1 Message Date
Jan Löwenstrom b2c3854b3a change RL-Controller initialization process and make action space iterable
- no fake builder pattern anymore; moved the required fields into the constructor
- add serializeUID
- action space now extends the Iterable interface to simplify looping over all actions (instead of returning the actual list); a sketch follows this entry
2019-12-24 19:38:35 +01:00
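
A minimal sketch of such an Iterable action space, with an illustrative action enum (none of these names are taken from the repository):

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical action enum; the real environments define their own actions.
    enum Action { UP, DOWN, LEFT, RIGHT }

    // Implementing Iterable lets callers loop over all actions directly,
    // without the space ever handing out its internal list.
    class ActionSpace implements Iterable<Action> {
        private final List<Action> actions = Arrays.asList(Action.values());

        @Override
        public Iterator<Action> iterator() {
            return actions.iterator();
        }
    }

Callers can then write for (Action a : actionSpace) { ... } instead of fetching the backing list through a getter.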
Jan Löwenstrom 5a4e380faf add dino jumping environment, deterministic/reproducible behaviour and save-and-load feature
- add feature to save and load learning progress (Q-Table) and current episode count; a serialization sketch follows this entry
- episode end is now purely decided by the environment instead of the Monte Carlo algorithm capping it at 10 actions
- using LinkedHashMap in all places to ensure deterministic behaviour
- fixed a major RNG issue so that algorithmic behaviour can be reproduced
- clearing rewardHistory so that only the last 10k rewards are kept
- added google dino jump environment
2019-12-22 23:33:56 +01:00
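
One plausible way to implement the save-and-load feature is plain Java object serialization; the container and field names below are assumptions, not code from the repository:

    import java.io.*;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical snapshot of the learning progress.
    class Progress implements Serializable {
        private static final long serialVersionUID = 1L;
        // LinkedHashMap keeps iteration order deterministic; String keys are a
        // simplification, the project presumably keys on state-action pairs.
        Map<String, Double> qTable = new LinkedHashMap<>();
        int episodeCount;
    }

    class ProgressStore {
        static void save(Progress p, File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(p);
            }
        }

        static Progress load(File file) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (Progress) in.readObject();
            }
        }
    }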
Jan Löwenstrom b1246f62cc add features to GUI to control learning and move learning listener interface to controller
- Add metric to display episodes per second
- the view no longer implements the learning listener; the controller does. The controller drives all view actions based on learning events and reacts to view events via the viewListener
- add executor service for the learning task (sketch after this entry)
- using instanceof to distinguish between episodic learning and TD learning
- add feature to trigger more episodes
- add checkboxes for smoothing the graph, displaying only the last 100 rewards, and drawing the environment
- remove history panel from the AntWorld GUI
2019-12-22 17:06:54 +01:00
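
A sketch of how the controller could run the learning task off the UI thread and use instanceof for the episodic case; all type and method names are assumptions:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    interface Learning { void learn(); }
    interface EpisodicLearning extends Learning { void learnMoreEpisodes(int episodes); }

    class LearningController {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        // run learning in the background so the Swing event thread stays responsive
        void startLearning(Learning learning) {
            executor.submit(learning::learn);
        }

        // "trigger more episodes" only makes sense for episodic methods
        void triggerMoreEpisodes(Learning learning, int episodes) {
            if (learning instanceof EpisodicLearning) {
                executor.submit(() -> ((EpisodicLearning) learning).learnMoreEpisodes(episodes));
            }
        }
    }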
Jan Löwenstrom 34e7e3fdd6 distinguish between learning and episodic learning, enable fast-learning without drawing every step to reduce lag
- repainting every step with no time delay would freeze the app, so "fast-learning" disables it and only refreshes the current-episode label
- added new abstract class "EpisodicLearning" (maybe an interface would do instead?); important because TD learning is not episodic and needs another way to represent the received rewards (perhaps the mean of the last X rewards or something similar); a sketch follows this entry
- opening two JFrames, one with learning info and one with the environment
2019-12-21 00:23:09 +01:00
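
A sketch of the episodic abstraction together with the fast-learning switch described above; method and field names are assumed:

    // Episodic methods run whole episodes; TD learning would not extend this class.
    abstract class EpisodicLearning {
        protected volatile boolean fastLearning; // when true, skip expensive repaints
        protected int currentEpisode;

        // one full episode, ended by the environment rather than by a step cap
        protected abstract void nextEpisode();

        public void learn(int episodes) {
            for (int i = 0; i < episodes; i++) {
                nextEpisode();
                currentEpisode++;
                if (fastLearning) {
                    updateEpisodeLabel(currentEpisode); // cheap label refresh only
                } else {
                    repaintEnvironment(); // full redraw, too slow with no time delay
                }
            }
        }

        protected abstract void repaintEnvironment();
        protected abstract void updateEpisodeLabel(int episode);
    }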
Jan Löwenstrom 7db5a2af3b add fixed RNG, add extended interface EpsilonPolicy and move rewardHistory to model instead of view
- the RNG seed is now set only once at the beginning and never reseeded afterwards. The initial AntWorld is deep-copied and used as a blueprint for resetting the world, instead of reseeding and regenerating the pseudo-random world. Reseeding the RNG influenced action selection, causing the agent to always choose the same trajectory
- instanceof is used to determine whether a policy has an epsilon, and the view adapts to this, showing the epsilon slider only for policies that do (sketch after this entry)
2019-12-20 16:51:09 +01:00
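
A sketch of the extended interface and the instanceof check on the view side; the policy API is an assumption:

    import java.util.List;
    import java.util.Random;
    import javax.swing.JSlider;

    interface Policy<A> {
        A chooseAction(List<A> actions, Random rng);
    }

    // extended interface: only some policies expose an epsilon
    interface EpsilonPolicy<A> extends Policy<A> {
        float getEpsilon();
        void setEpsilon(float epsilon);
    }

    class PolicyView {
        // show the epsilon slider only for policies that actually have an epsilon
        void bind(Policy<?> policy, JSlider epsilonSlider) {
            boolean hasEpsilon = policy instanceof EpsilonPolicy;
            epsilonSlider.setVisible(hasEpsilon);
            if (hasEpsilon) {
                epsilonSlider.setValue(Math.round(((EpsilonPolicy<?>) policy).getEpsilon() * 100));
            }
        }
    }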
Jan Löwenstrom e0160ca1df adopt MVC pattern and add real-time graph interface 2019-12-18 16:48:24 +01:00
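
One plausible shape for the real-time graph interface mentioned in this commit; the method names are assumptions:

    // listener the model notifies so the view can update its graph live
    interface LearningListener {
        // called after each finished episode, e.g. to append a point to the reward graph
        void onEpisodeEnd(int episode, double sumOfRewards);

        // called on every learning step, e.g. to redraw the environment
        void onStepTaken();
    }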
Jan Löwenstrom 7f18a66e98 add random policy test 2019-12-12 00:20:38 +01:00
Jan Löwenstrom 584d6a1246 add JavaFX Gradle plugin, switch to Java 11 and add System.out prints for error detection (build script sketch below)
- The current implementation will not converge to the correct behaviour. See comment in MonteCarlo class for more details
2019-12-10 15:37:20 +01:00
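
The build change could look roughly like this in build.gradle (Groovy DSL); the plugin version and module list are assumptions:

    plugins {
        id 'java'
        id 'org.openjfx.javafxplugin' version '0.0.8' // version is an assumption
    }

    java {
        sourceCompatibility = JavaVersion.VERSION_11
    }

    javafx {
        version = '11'
        modules = ['javafx.controls'] // module list is an assumption
    }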
Jan Löwenstrom 55d8bbf5dc add Random-, Greedy- and EGreedy-Policy and first implementation of the Monte Carlo method
- fixed a bug regarding wrong hashCode generation: hashCodes need to be equal across equal objects. The hashCode of final states is now computed once and that value returned, instead of recomputing it every time .hashCode() gets called; a sketch follows this entry
2019-12-09 23:21:48 +01:00
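
A sketch of the hashCode fix on an immutable state class; the fields are illustrative:

    import java.util.Objects;

    final class State {
        private final int x;
        private final int y;
        private final int hash; // computed once, since all fields are final

        State(int x, int y) {
            this.x = x;
            this.y = y;
            this.hash = Objects.hash(x, y);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof State)) return false;
            State other = (State) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return hash; // equal states return equal values, with no recomputation per call
        }
    }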
Jan Löwenstrom 0100f2e82a remove the Action interface in favour of Enums 2019-12-09 17:30:14 +01:00
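
For illustration, actions as an enum could look like this (the concrete values are assumptions):

    // a fixed, iterable set of actions with well-defined equals/hashCode
    enum AntAction {
        MOVE_FORWARD, TURN_LEFT, TURN_RIGHT, PICK_UP, DROP
    }

Enums iterate via AntAction.values() and work directly as EnumMap keys, which suits tabular structures like a Q-Table.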
Jan Löwenstrom 8a533dda94 change ActionSpace interface temporarily to quickly fit the antWorld test and improve the GUI of the walking ant 2019-12-09 13:41:00 +01:00
Jan Löwenstrom 2fb218a129 add separate class for internal Ant representation and adapt GUI cell size to panel size 2019-12-09 12:08:53 +01:00
Jan Löwenstrom c11cc2c3f2 add two simple scroll panes to represent environment and ant brain 2019-12-09 01:09:34 +01:00
Jan Löwenstrom db9b62236c add logic to handle ant action and compute rewards
- the ant world handles the action received from the agent and computes the reward (sketch after this entry)
- first attempt at converting observations to Markov states
- improved .equals() methods
2019-12-08 16:03:00 +01:00
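
A sketch of the step logic described above; all names and the reward values are assumptions:

    class AntWorldSketch {
        enum AntAction { MOVE_UP, MOVE_DOWN, MOVE_LEFT, MOVE_RIGHT }

        static class StepResult {
            final String observation; // the agent converts this into a Markov state
            final double reward;

            StepResult(String observation, double reward) {
                this.observation = observation;
                this.reward = reward;
            }
        }

        StepResult step(AntAction action) {
            // 1. apply the action to the world (move the ant, collect food, ...)
            // 2. compute the reward for the resulting situation
            double reward = -0.1; // illustrative step penalty only
            return new StepResult("observation after " + action, reward);
        }
    }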
Jan Löwenstrom ec67ce60c9 add default structure for AntAgent 2019-12-08 13:15:20 +01:00
Jan Löwenstrom 581cf6b28b Merge remote-tracking branch 'origin/master' 2019-12-07 22:08:20 +01:00
Jan Löwenstrom 87f435c65a add basic core structure and first parts of antGame implementation 2019-12-07 22:05:11 +01:00
Jan Löwenstrom 431ae4d3df add basic core structure and first parts of antGame implementation 2019-12-07 17:31:30 +01:00
Jan Löwenstrom 66ee33b77f init the gradle project 2019-12-06 13:11:29 +01:00