Commit Graph

65 Commits

Author SHA1 Message Date
Jan Löwenstrom 149b8f4bd8 Merge branch 'master' of https://github.com/kono94/refo 2020-04-19 20:56:21 +02:00
Jan Löwenstrom 19d5a87ce0 add multiple food scenario 2020-04-19 20:55:42 +02:00
Jan Löwenstrom 4590562a4c add new every visit results, fix rngEnv NullPointer 2020-04-19 19:14:03 +02:00
Jan Löwenstrom 7d3d097599 add opening dialog to select all learning settings 2020-04-07 11:03:17 +02:00
Jan Löwenstrom 9d1f8dfd46 apply code improvements suggested by IntelliJ 2020-04-05 14:44:48 +02:00
Jan Löwenstrom 94ad976a1f make spawn start of antgame constant 2020-04-05 14:07:51 +02:00
Jan Löwenstrom bbccef1e71 removed unnecessary stuff from sampling branches 2020-04-05 13:37:38 +02:00
Jan Löwenstrom 0300f3b1fd Merge branch 'antWorldRewardAnalysis'
# Conflicts:
#	src/main/java/core/algo/EpisodicLearning.java
#	src/main/java/core/controller/RLController.java
#	src/main/java/evironment/jumpingDino/DinoWorld.java
#	src/main/java/evironment/jumpingDino/DinoWorldAdvanced.java
#	src/main/java/example/JumpingDino.java
2020-04-05 13:21:20 +02:00
Jan Löwenstrom ad07c1da8f remove DinoSampling stuff 2020-04-05 13:10:13 +02:00
Jan Löwenstrom 5b82e7965d rename MC class and improve specific analysis of antGame examples 2020-04-05 12:29:44 +02:00
Jan Löwenstrom 4402d70467 Merge remote-tracking branch 'origin/antWorldRewardAnalysis' into antWorldRewardAnalysis
# Conflicts:
#	OptimalityDifferentDiscountFactors.R
#	src/main/java/core/algo/td/QLearningOffPolicyTDControl.java
#	src/main/java/example/ContinuousAnt.java
2020-04-05 12:05:15 +02:00
Jan Löwenstrom b9be640284 add multiple folders to organize results 2020-04-05 12:00:16 +02:00
Jan Löwenstrom a08b8160a3 add new results of needed timesteps in total 2020-04-04 17:14:12 +02:00
Jan Löwenstrom 595451e88b add new results of needed timesteps in total 2020-04-04 17:07:43 +02:00
Jan Löwenstrom a40e279f48 change reward function for antgame to match BA 2020-04-04 14:41:58 +02:00
Jan Löwenstrom 9a3452ff9c add Every-Visit Monte-Carlo 2020-04-02 17:13:51 +02:00
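For context, a minimal sketch of what an every-visit Monte Carlo update looks like (class and field names here are illustrative, not refo's actual API). Unlike first-visit MC, the return is averaged into a state's estimate at every occurrence within the episode, not just the first:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an every-visit Monte Carlo value update.
public class EveryVisitMC {
    // (state, reward) pair observed at one timestep; illustrative only.
    record Step(String state, double reward) {}

    private final Map<String, Double> returnSum = new HashMap<>();
    private final Map<String, Integer> returnCount = new HashMap<>();
    private final Map<String, Double> value = new HashMap<>();

    void update(List<Step> episode, double gamma) {
        double g = 0.0;
        // Walk backwards so g always holds the discounted return from step t.
        for (int t = episode.size() - 1; t >= 0; t--) {
            Step s = episode.get(t);
            g = gamma * g + s.reward();
            // Every-visit: no "first occurrence in this episode" check here.
            returnSum.merge(s.state(), g, Double::sum);
            returnCount.merge(s.state(), 1, Integer::sum);
            value.put(s.state(), returnSum.get(s.state()) / returnCount.get(s.state()));
        }
    }
}
```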
Jan Löwenstrom 740289ee2b add constant for default reward 2020-04-02 14:01:37 +02:00
Jan Löwenstrom e7404a8d24 add improved result graphs 2020-03-31 17:49:15 +02:00
Jan Löwenstrom 0fde1bd962 Merge remote-tracking branch 'origin/antWorldRewardAnalysis' into antWorldRewardAnalysis 2020-03-29 17:22:56 +02:00
Jan Löwenstrom f4b50627d1 add antGame analysis data and R Scripts and images 2020-03-29 17:22:47 +02:00
Jan Löwenstrom 78955a9521 add antGame analysis data and R Scripts and images 2020-03-29 17:22:01 +02:00
Jan Löwenstrom 328fc85214 modify Q-Learning to sample results and update R script 2020-03-28 12:35:33 +01:00
Jan Löwenstrom eca0d8db4d create Dino Sampling state 2020-03-26 19:22:50 +01:00
Jan Löwenstrom 58f9900f3c Delete con.txt 2020-03-17 18:33:54 +01:00
Jan Löwenstrom ee1d62842d split Antworld into episodic and continuous task
- add new simple state for jumping dino, to see if convergence is guaranteed with this state representation
- changed reward structure for ant game
2020-03-15 16:58:53 +01:00
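A hedged sketch of how such an episodic/continuous split might look (interface and class names are assumptions, not refo's real types): the episodic variant signals termination, the continuous one never does.

```java
// Illustrative only: splitting one environment into an episodic and a
// continuous variant. Names are assumptions, not refo's actual classes.
interface Environment<A> {
    double step(A action);     // returns the reward for the chosen action
    boolean isDone();          // episodic variants can signal termination
}

class EpisodicAntWorld implements Environment<Integer> {
    private boolean foodCollected;
    public double step(Integer action) {
        // ... move the ant; set foodCollected once the goal is reached ...
        return foodCollected ? 1.0 : -0.01;
    }
    public boolean isDone() { return foodCollected; }  // episode ends here
}

class ContinuousAntWorld implements Environment<Integer> {
    public double step(Integer action) {
        // ... same dynamics, but food respawns instead of ending the run ...
        return 0.0;
    }
    public boolean isDone() { return false; }  // a continuing task never ends
}
```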
Jan Löwenstrom 4641f50b79 add results for convergence for advanced dino jumping 2020-03-05 13:17:54 +01:00
Jan Löwenstrom b1d06293fe add shadowJar 2020-03-05 12:25:42 +01:00
Jan Löwenstrom 1f743cf8f2 fix eps/sec stat 2020-03-05 12:09:36 +01:00
Jan Löwenstrom e67f40ad65 split DinoWorld between simple and advanced example
# Conflicts:
#	src/main/java/example/JumpingDino.java
2020-03-05 12:06:41 +01:00
Jan Löwenstrom 18d6e32f64 split DinoWorld between simple and advanced example 2020-03-05 11:58:57 +01:00
Jan Löwenstrom cffec63dc6 apply threading changes to master branch and clean up for tag version
- no testing or epsilon testing stuff
2020-03-05 11:49:51 +01:00
Jan Löwenstrom 9b54b72a25 add epsilon convergence test and remove unnecessary multithreaded learning 2020-03-03 02:52:39 +01:00
Jan Löwenstrom 6613e23c7c Fixed new method name for MC 2020-03-02 23:19:54 +01:00
Jan Löwenstrom 33f896ff40 Merge remote-tracking branch 'origin/epsilonTest' 2020-03-02 23:10:01 +01:00
Jan Löwenstrom 18a702ba62 add BlackJack environment and fix save and load
- method names were swapped
2020-03-01 13:51:47 +01:00
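The "method names were swapped" fix suggests plain Java serialization for the save/load feature; a minimal sketch under that assumption (field and file handling are illustrative):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Illustrative save/load of learning progress via Java serialization.
// The kind of bug described above: save() and load() bodies swapped.
class Progress implements Serializable {
    Map<String, Map<String, Double>> qTable = new HashMap<>();
    int currentEpisode;

    void save(String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(this);  // save must WRITE, not read
        }
    }

    static Progress load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (Progress) in.readObject();  // load must READ, not write
        }
    }
}
```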
Jan Löwenstrom 0e4f52a48e first epsilon decaying method 2020-02-27 15:29:15 +01:00
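One common shape for a first epsilon-decay method (the exact schedule here is an assumption, not necessarily the committed one): interpolate linearly from a start value down to a floor, then hold.

```java
// Illustrative epsilon decay: linear interpolation from epsilonStart down
// to epsilonMin over decaySteps episodes, then held constant.
class EpsilonDecay {
    private final double epsilonStart = 1.0;
    private final double epsilonMin = 0.05;
    private final int decaySteps = 10_000;

    double epsilon(int episode) {
        double fraction = Math.min(1.0, (double) episode / decaySteps);
        return epsilonStart + fraction * (epsilonMin - epsilonStart);
    }
}
```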
Jan Löwenstrom cff1a4e531 add isJumping info to dinoState 2020-02-26 17:14:28 +01:00
Jan Löwenstrom 77898f4e5a add TD algorithms and started adapting to continuous tasks
- add Q-Learning and SARSA
- more config variables
2020-02-17 13:56:55 +01:00
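The two TD control updates added here differ only in their bootstrap target; a compact sketch (the nested-map Q-table layout is an assumption):

```java
import java.util.Map;

// Illustrative TD updates; q is a nested table state -> action -> value.
class TdUpdates {
    static void qLearning(Map<String, Map<String, Double>> q,
                          String s, String a, double r, String sNext,
                          double alpha, double gamma) {
        // Off-policy: bootstrap from the BEST next action.
        double maxNext = q.get(sNext).values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);
        double old = q.get(s).get(a);
        q.get(s).put(a, old + alpha * (r + gamma * maxNext - old));
    }

    static void sarsa(Map<String, Map<String, Double>> q,
                      String s, String a, double r, String sNext, String aNext,
                      double alpha, double gamma) {
        // On-policy: bootstrap from the action the policy ACTUALLY takes next.
        double old = q.get(s).get(a);
        q.get(s).put(a, old + alpha * (r + gamma * q.get(sNext).get(aNext) - old));
    }
}
```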
Jan Löwenstrom f4f1f7bd37 add QTableFrame and clickable states that display a GUI
- remove org.javatuples in favour of org.apache.commons for tuples and a circular queue
- remove ViewListener from non-GUI Controller
- stateActionTable saves the last 10 states that changed; they get displayed in the QTable frame in JTextAreas
2020-01-01 23:54:18 +01:00
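Apache Commons Collections ships a fixed-size ring buffer that matches the "last 10 states that changed" behaviour described above; a sketch (the surrounding class and field names are assumptions):

```java
import org.apache.commons.collections4.queue.CircularFifoQueue;

// Illustrative: keep only the 10 most recently changed states for the
// QTable frame; the oldest entry is evicted automatically on overflow.
class RecentlyChangedStates {
    private final CircularFifoQueue<String> lastChanged = new CircularFifoQueue<>(10);

    void onStateUpdated(String state) {
        lastChanged.add(state);  // silently drops the head once 10 are stored
    }
}
```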
Jan Löwenstrom a8f8af1102 add gradle wrapper and jar building 2020-01-01 18:58:25 +01:00
Jan Löwenstrom 295a1f8af0 remove javaFX dependency in favour of org.javatuples
- Pair<K,V>, .getValue0(), .getValue1()
2020-01-01 18:25:22 +01:00
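For reference, the org.javatuples accessors named above in use:

```java
import org.javatuples.Pair;

class PairDemo {
    public static void main(String[] args) {
        // javatuples indexes its accessors from zero: getValue0(), getValue1().
        Pair<String, Integer> p = Pair.with("reward", 42);
        String key = p.getValue0();
        Integer value = p.getValue1();
        System.out.println(key + " = " + value);
    }
}
```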
Jan Löwenstrom b7d991cc92 render 5 frames for every RL step
- temp. repainting JComponent in env.step()
2020-01-01 18:05:59 +01:00
Jan Löwenstrom ec86006a07 enhance hashCode and equals methods
- IntelliJ-generated methods
2020-01-01 14:57:08 +01:00
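The IntelliJ-generated pattern referenced here typically delegates to java.util.Objects; a sketch on a hypothetical state class. Correct equals/hashCode matter in this codebase because states serve as hash-map keys in the Q-table, so two equal states must hash identically:

```java
import java.util.Objects;

// Illustrative state class with IntelliJ-style generated methods.
class GridState {
    private final int x;
    private final int y;

    GridState(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        GridState that = (GridState) o;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```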
Jan Löwenstrom 518683b676 split GUI parts from controller into sub class 2019-12-31 14:43:40 +01:00
Jan Löwenstrom 195722e98f enhance save/load feature and change thread handling
- saving Monte Carlo did not include returnSum and returnCount, so the state would be wrong after loading. Learning, EpisodicLearning and MonteCarlo all override custom save and load methods, calling super() each time but adding the fields that need to be replaced at runtime.
- moved generic episodic behaviour from MonteCarlo to an abstract top-level class
- using AtomicInteger for episodesToLearn
- moved learning-thread handling from controller to model. Learning got one extra learning thread.
- add feature to use custom speed and distance for dino world obstacles
2019-12-29 01:12:11 +01:00
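A sketch of the layered save/load pattern this commit describes (class shapes and the use of object streams are assumptions): each subclass persists only its own runtime fields after delegating to super, so MonteCarlo's returnSum and returnCount are no longer lost.

```java
import java.io.*;

// Illustrative layering: each class saves its own state, delegates the rest.
class Learning {
    protected int currentEpisode;
    void save(ObjectOutputStream out) throws IOException {
        out.writeInt(currentEpisode);
    }
    void load(ObjectInputStream in) throws IOException {
        currentEpisode = in.readInt();
    }
}

class MonteCarlo extends Learning {
    protected double returnSum;   // previously unsaved -> wrong state on load
    protected int returnCount;

    @Override
    void save(ObjectOutputStream out) throws IOException {
        super.save(out);          // parent fields first
        out.writeDouble(returnSum);
        out.writeInt(returnCount);
    }

    @Override
    void load(ObjectInputStream in) throws IOException {
        super.load(in);           // must mirror the save order exactly
        returnSum = in.readDouble();
        returnCount = in.readInt();
    }
}
```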
Jan Löwenstrom 64355e0b93 add javadoc 2019-12-27 00:50:59 +01:00
Jan Löwenstrom b2c3854b3a change RL-Controller initialization process and action space iterable
- no fake builder pattern anymore, moved needed fields into constructor
- add serialVersionUID
- action space extends the Iterable interface to simplify looping over all actions (without returning the actual list)
2019-12-24 19:38:35 +01:00
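A sketch of that Iterable action space (names assumed): implementing Iterable lets callers for-each over the actions without the class handing out its backing list.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative: expose iteration without exposing the underlying list.
class ActionSpace<A> implements Iterable<A> {
    private final List<A> actions;

    ActionSpace(List<A> actions) {
        this.actions = List.copyOf(actions);  // immutable defensive copy
    }

    @Override
    public Iterator<A> iterator() {
        return actions.iterator();  // callers can write: for (A a : space)
    }
}
```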
Jan Löwenstrom 5a4e380faf add dino jumping environment, deterministic/reproducible behaviour and save-and-load feature
- add feature to save and load learning progress (Q-Table) and current episode count
- episode end is now purely decided by the environment instead of the Monte Carlo algorithm capping it at 10 actions
- using LinkedHashMap in all locations to ensure deterministic behaviour
- fixed major RNG issue to reproduce algorithmic behaviour
- clearing rewardHistory, to only save the last 10k rewards
- added google dino jump environment
2019-12-22 23:33:56 +01:00
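The determinism fixes likely hinge on two standard tricks (a sketch under that assumption, not the repo's exact code): one central seeded Random, and LinkedHashMap wherever a map gets iterated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative determinism helpers: every component draws from ONE seeded
// Random, and iterated maps use LinkedHashMap so entry order is the
// insertion order on every run.
class DeterministicRun {
    static final Random RNG = new Random(42);  // single shared, seeded source

    // HashMap iteration order can differ between runs; LinkedHashMap cannot.
    static final Map<String, Map<String, Double>> Q_TABLE = new LinkedHashMap<>();
}
```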
Jan Löwenstrom b1246f62cc add features to gui to control learning and moving learning listener interface to controller
- Add metric to display episodes per second
- view no longer implements the learning listener; the controller does. The controller drives all view actions based upon learning events and reacts to view events via viewListener
- add executor service for learning task
- using instanceof to distinguish between episodic learning and TD learning
- add feature to trigger more episodes
- add checkboxes for smoothing graph, displaying last 100 rewards only and drawing environment
- remove history panel from antworld gui
2019-12-22 17:06:54 +01:00
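A sketch of the controller-side wiring this commit describes (class and method names are assumptions): the learning task runs on an executor so the Swing GUI stays responsive, and instanceof gates the episodic-only controls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative controller snippet, not refo's actual RLController.
class ControllerSketch {
    interface EpisodicLearningSketch { void learn(int episodes); }

    private final ExecutorService learningExecutor = Executors.newSingleThreadExecutor();

    void startLearning(Object learning) {
        learningExecutor.submit(() -> {
            // instanceof distinguishes the two learning families,
            // as the commit message describes.
            if (learning instanceof EpisodicLearningSketch episodic) {
                episodic.learn(1000);  // e.g. "trigger more episodes" button
            }
        });
    }
}
```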
Jan Löwenstrom 34e7e3fdd6 distinguish learning and episodic learning, enable fast-learning without drawing every step to reduce lag
- repainting every step with no time delay will certainly freeze the app, so "fast-learning" disables it, only refreshing the current episode label
- Added new abstract class "EpisodicLearning". Maybe just use an interface instead?! Important because TD learning is not episodic and needs another way to represent the rewards received (maybe the mean of the last X rewards or something)
- Opening two JFrames, one with learning infos and one with environment
2019-12-21 00:23:09 +01:00
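A sketch of that fast-learning toggle (field and method names assumed): per-step drawing is skipped entirely, and only a per-episode label refresh is scheduled on the event dispatch thread.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

// Illustrative fast-learning mode: with zero delay between steps, repainting
// the environment every step floods the EDT, so drawing is skipped and only
// the episode label is refreshed once per episode.
class FastLearningSketch {
    volatile boolean fastLearning = true;
    final JLabel episodeLabel = new JLabel();

    void onStep(Runnable repaintEnvironment) {
        if (!fastLearning) {
            SwingUtilities.invokeLater(repaintEnvironment);
        }
    }

    void onEpisodeEnd(int episode) {
        SwingUtilities.invokeLater(() -> episodeLabel.setText("Episode: " + episode));
    }
}
```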