Commit Graph

65 Commits

Author SHA1 Message Date
Jan Löwenstrom 149b8f4bd8 Merge branch 'master' of https://github.com/kono94/refo 2020-04-19 20:56:21 +02:00
Jan Löwenstrom 19d5a87ce0 add multiple food scenario 2020-04-19 20:55:42 +02:00
Jan Löwenstrom 4590562a4c add new every visit results, fix rngEnv NullPointer 2020-04-19 19:14:03 +02:00
Jan Löwenstrom 7d3d097599 add opening dialog to select all learning settings 2020-04-07 11:03:17 +02:00
Jan Löwenstrom 9d1f8dfd46 apply code improvements suggested by IntelliJ 2020-04-05 14:44:48 +02:00
Jan Löwenstrom 94ad976a1f make spawn start of antgame constant 2020-04-05 14:07:51 +02:00
Jan Löwenstrom bbccef1e71 removed unnecessary stuff from sampling branches 2020-04-05 13:37:38 +02:00
Jan Löwenstrom 0300f3b1fd Merge branch 'antWorldRewardAnalysis'
# Conflicts:
#	src/main/java/core/algo/EpisodicLearning.java
#	src/main/java/core/controller/RLController.java
#	src/main/java/evironment/jumpingDino/DinoWorld.java
#	src/main/java/evironment/jumpingDino/DinoWorldAdvanced.java
#	src/main/java/example/JumpingDino.java
2020-04-05 13:21:20 +02:00
Jan Löwenstrom ad07c1da8f remove DinoSampling stuff 2020-04-05 13:10:13 +02:00
Jan Löwenstrom 5b82e7965d rename MC class and improve specific analysis of antGame examples 2020-04-05 12:29:44 +02:00
Jan Löwenstrom 4402d70467 Merge remote-tracking branch 'origin/antWorldRewardAnalysis' into antWorldRewardAnalysis
# Conflicts:
#	OptimalityDifferentDiscountFactors.R
#	src/main/java/core/algo/td/QLearningOffPolicyTDControl.java
#	src/main/java/example/ContinuousAnt.java
2020-04-05 12:05:15 +02:00
Jan Löwenstrom b9be640284 add multiple folders to organize results 2020-04-05 12:00:16 +02:00
Jan Löwenstrom a08b8160a3 add new results of needed timesteps in total 2020-04-04 17:14:12 +02:00
Jan Löwenstrom 595451e88b add new results of needed timesteps in total 2020-04-04 17:07:43 +02:00
Jan Löwenstrom a40e279f48 change reward function for antgame to match BA 2020-04-04 14:41:58 +02:00
Jan Löwenstrom 9a3452ff9c add Every-Visit Monte-Carlo 2020-04-02 17:13:51 +02:00
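For context, a minimal sketch of what an every-visit Monte Carlo update looks like (class and field names here are illustrative, not refo's actual API). Unlike first-visit MC, the return is averaged into a state's estimate at every occurrence within the episode, not just the first:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an every-visit Monte Carlo value update.
public class EveryVisitMC {
    // (state, reward) pair observed at one timestep; illustrative only.
    record Step(String state, double reward) {}

    private final Map<String, Double> returnSum = new HashMap<>();
    private final Map<String, Integer> returnCount = new HashMap<>();
    private final Map<String, Double> value = new HashMap<>();

    void update(List<Step> episode, double gamma) {
        double g = 0.0;
        // Walk backwards so g always holds the discounted return from step t.
        for (int t = episode.size() - 1; t >= 0; t--) {
            Step s = episode.get(t);
            g = gamma * g + s.reward();
            // Every-visit: no "first occurrence in this episode" check here.
            returnSum.merge(s.state(), g, Double::sum);
            returnCount.merge(s.state(), 1, Integer::sum);
            value.put(s.state(), returnSum.get(s.state()) / returnCount.get(s.state()));
        }
    }
}
```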
Jan Löwenstrom 740289ee2b add constant for default reward 2020-04-02 14:01:37 +02:00
Jan Löwenstrom e7404a8d24 add improved result graphs 2020-03-31 17:49:15 +02:00
Jan Löwenstrom 0fde1bd962 Merge remote-tracking branch 'origin/antWorldRewardAnalysis' into antWorldRewardAnalysis 2020-03-29 17:22:56 +02:00
Jan Löwenstrom f4b50627d1 add antGame analysis data and R Scripts and images 2020-03-29 17:22:47 +02:00
Jan Löwenstrom 78955a9521 add antGame analysis data and R Scripts and images 2020-03-29 17:22:01 +02:00
Jan Löwenstrom 328fc85214 modify Q-Learning to sample results and update R script 2020-03-28 12:35:33 +01:00
Jan Löwenstrom eca0d8db4d create Dino Sampling state 2020-03-26 19:22:50 +01:00
Jan Löwenstrom 58f9900f3c Delete con.txt 2020-03-17 18:33:54 +01:00
Jan Löwenstrom ee1d62842d split Antworld into episodic and continuous task
- add new simple state for jumping dino, to see if convergence is guaranteed with this state representation
- changed reward structure for ant game
2020-03-15 16:58:53 +01:00
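A hedged sketch of how such an episodic/continuous split might look (interface and class names are assumptions, not refo's real types): the episodic variant signals termination, the continuous one never does.

```java
// Illustrative only: splitting one environment into an episodic and a
// continuous variant. Names are assumptions, not refo's actual classes.
interface Environment<A> {
    double step(A action);     // returns the reward for the chosen action
    boolean isDone();          // episodic variants can signal termination
}

class EpisodicAntWorld implements Environment<Integer> {
    private boolean foodCollected;
    public double step(Integer action) {
        // ... move the ant; set foodCollected once the goal is reached ...
        return foodCollected ? 1.0 : -0.01;
    }
    public boolean isDone() { return foodCollected; }  // episode ends here
}

class ContinuousAntWorld implements Environment<Integer> {
    public double step(Integer action) {
        // ... same dynamics, but food respawns instead of ending the run ...
        return 0.0;
    }
    public boolean isDone() { return false; }  // a continuing task never ends
}
```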
Jan Löwenstrom 4641f50b79 add results for convergence for advanced dino jumping 2020-03-05 13:17:54 +01:00
Jan Löwenstrom b1d06293fe add shadowJar 2020-03-05 12:25:42 +01:00
Jan Löwenstrom 1f743cf8f2 fix eps/sec stat 2020-03-05 12:09:36 +01:00
Jan Löwenstrom e67f40ad65 split DinoWorld between simple and advanced example
# Conflicts:
#	src/main/java/example/JumpingDino.java
2020-03-05 12:06:41 +01:00
Jan Löwenstrom 18d6e32f64 split DinoWorld between simple and advanced example 2020-03-05 11:58:57 +01:00
Jan Löwenstrom cffec63dc6 apply threading changes to master branch and clean up for tag version
- no testing or epsilon testing stuff
2020-03-05 11:49:51 +01:00
Jan Löwenstrom 9b54b72a25 add epsilon convergence test and remove unnecessary multithreaded learning 2020-03-03 02:52:39 +01:00
Jan Löwenstrom 6613e23c7c Fixed new method name for MC 2020-03-02 23:19:54 +01:00
Jan Löwenstrom 33f896ff40 Merge remote-tracking branch 'origin/epsilonTest' 2020-03-02 23:10:01 +01:00
Jan Löwenstrom 18a702ba62 add BlackJack environment and fix save and load
- method names were swapped
2020-03-01 13:51:47 +01:00
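The "method names were swapped" fix suggests plain Java serialization for the save/load feature; a minimal sketch under that assumption (field and file handling are illustrative):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Illustrative save/load of learning progress via Java serialization.
// The kind of bug described above: save() and load() bodies swapped.
class Progress implements Serializable {
    Map<String, Map<String, Double>> qTable = new HashMap<>();
    int currentEpisode;

    void save(String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(this);  // save must WRITE, not read
        }
    }

    static Progress load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (Progress) in.readObject();  // load must READ, not write
        }
    }
}
```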
Jan Löwenstrom 0e4f52a48e first epsilon decaying method 2020-02-27 15:29:15 +01:00
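One common shape for a first epsilon-decay method (the exact schedule here is an assumption, not necessarily the committed one): interpolate linearly from a start value down to a floor, then hold.

```java
// Illustrative epsilon decay: linear interpolation from epsilonStart down
// to epsilonMin over decaySteps episodes, then held constant.
class EpsilonDecay {
    private final double epsilonStart = 1.0;
    private final double epsilonMin = 0.05;
    private final int decaySteps = 10_000;

    double epsilon(int episode) {
        double fraction = Math.min(1.0, (double) episode / decaySteps);
        return epsilonStart + fraction * (epsilonMin - epsilonStart);
    }
}
```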
Jan Löwenstrom cff1a4e531 add isJumping info to dinoState 2020-02-26 17:14:28 +01:00
Jan Löwenstrom 77898f4e5a add TD algorithms and started adapting to continuous tasks
- add Q-Learning and SARSA
- more config variables
2020-02-17 13:56:55 +01:00
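The two TD control updates added here differ only in their bootstrap target; a compact sketch (the nested-map Q-table layout is an assumption):

```java
import java.util.Map;

// Illustrative TD updates; q is a nested table state -> action -> value.
class TdUpdates {
    static void qLearning(Map<String, Map<String, Double>> q,
                          String s, String a, double r, String sNext,
                          double alpha, double gamma) {
        // Off-policy: bootstrap from the BEST next action.
        double maxNext = q.get(sNext).values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);
        double old = q.get(s).get(a);
        q.get(s).put(a, old + alpha * (r + gamma * maxNext - old));
    }

    static void sarsa(Map<String, Map<String, Double>> q,
                      String s, String a, double r, String sNext, String aNext,
                      double alpha, double gamma) {
        // On-policy: bootstrap from the action the policy ACTUALLY takes next.
        double old = q.get(s).get(a);
        q.get(s).put(a, old + alpha * (r + gamma * q.get(sNext).get(aNext) - old));
    }
}
```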
Jan Löwenstrom f4f1f7bd37 add QTableFrame and clickable states that display a GUI
- remove org.javatuples in favour of org.apache.commons for tuples and a circular queue
- remove ViewListener from non-GUI Controller
- stateActionTable saves the last 10 states that changed; they get displayed in the QTable frame in JTextAreas
2020-01-01 23:54:18 +01:00
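Apache Commons Collections ships a fixed-size ring buffer that matches the "last 10 states that changed" behaviour described above; a sketch (the surrounding class and field names are assumptions):

```java
import org.apache.commons.collections4.queue.CircularFifoQueue;

// Illustrative: keep only the 10 most recently changed states for the
// QTable frame; the oldest entry is evicted automatically on overflow.
class RecentlyChangedStates {
    private final CircularFifoQueue<String> lastChanged = new CircularFifoQueue<>(10);

    void onStateUpdated(String state) {
        lastChanged.add(state);  // silently drops the head once 10 are stored
    }
}
```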
Jan Löwenstrom a8f8af1102 add gradle wrapper and jar building 2020-01-01 18:58:25 +01:00
Jan Löwenstrom 295a1f8af0 remove javaFX dependency in favour of org.javatuples
- Pair<K,V>, .getValue0(), .getValue1()
2020-01-01 18:25:22 +01:00
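For reference, the org.javatuples accessors named above in use:

```java
import org.javatuples.Pair;

class PairDemo {
    public static void main(String[] args) {
        // javatuples indexes its accessors from zero: getValue0(), getValue1().
        Pair<String, Integer> p = Pair.with("reward", 42);
        String key = p.getValue0();
        Integer value = p.getValue1();
        System.out.println(key + " = " + value);
    }
}
```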
Jan Löwenstrom b7d991cc92 render 5 frames for every RL step
- temp. repainting JComponent in env.step()
2020-01-01 18:05:59 +01:00
Jan Löwenstrom ec86006a07 enhance hashCode and equals methods
- IntelliJ-generated methods
2020-01-01 14:57:08 +01:00
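The IntelliJ-generated pattern referenced here typically delegates to java.util.Objects; a sketch on a hypothetical state class. Correct equals/hashCode matter in this codebase because states serve as hash-map keys in the Q-table, so two equal states must hash identically:

```java
import java.util.Objects;

// Illustrative state class with IntelliJ-style generated methods.
class GridState {
    private final int x;
    private final int y;

    GridState(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        GridState that = (GridState) o;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```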
Jan Löwenstrom 518683b676 split GUI parts from controller into sub class 2019-12-31 14:43:40 +01:00
Jan Löwenstrom 195722e98f enhance save/load feature and change thread handling
- saving Monte Carlo did not include returnSum and returnCount, so the state would be wrong after loading. Learning, EpisodicLearning and MonteCarlo all override custom save and load methods, calling super() each time but adding the fields that need to be replaced at runtime.
- moved generic episodic behaviour from MonteCarlo to an abstract top-level class
- using AtomicInteger for episodesToLearn
- moved learning-thread handling from controller to model. Learning got one extra learning thread.
- add feature to use custom speed and distance for dino world obstacles
2019-12-29 01:12:11 +01:00
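A sketch of the layered save/load pattern this commit describes (class shapes and the use of object streams are assumptions): each subclass persists only its own runtime fields after delegating to super, so MonteCarlo's returnSum and returnCount are no longer lost.

```java
import java.io.*;

// Illustrative layering: each class saves its own state, delegates the rest.
class Learning {
    protected int currentEpisode;
    void save(ObjectOutputStream out) throws IOException {
        out.writeInt(currentEpisode);
    }
    void load(ObjectInputStream in) throws IOException {
        currentEpisode = in.readInt();
    }
}

class MonteCarlo extends Learning {
    protected double returnSum;   // previously unsaved -> wrong state on load
    protected int returnCount;

    @Override
    void save(ObjectOutputStream out) throws IOException {
        super.save(out);          // parent fields first
        out.writeDouble(returnSum);
        out.writeInt(returnCount);
    }

    @Override
    void load(ObjectInputStream in) throws IOException {
        super.load(in);           // must mirror the save order exactly
        returnSum = in.readDouble();
        returnCount = in.readInt();
    }
}
```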
Jan Löwenstrom 64355e0b93 add javadoc 2019-12-27 00:50:59 +01:00
Jan Löwenstrom b2c3854b3a change RL-Controller initialization process and action space iterable
- no fake builder pattern anymore, moved needed fields into constructor
- add serialVersionUID
- action space extends the Iterable interface to simplify looping over all actions (without returning the actual list)
2019-12-24 19:38:35 +01:00
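A sketch of that Iterable action space (names assumed): implementing Iterable lets callers for-each over the actions without the class handing out its backing list.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative: expose iteration without exposing the underlying list.
class ActionSpace<A> implements Iterable<A> {
    private final List<A> actions;

    ActionSpace(List<A> actions) {
        this.actions = List.copyOf(actions);  // immutable defensive copy
    }

    @Override
    public Iterator<A> iterator() {
        return actions.iterator();  // callers can write: for (A a : space)
    }
}
```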
Jan Löwenstrom 5a4e380faf add dino jumping environment, deterministic/reproducible behaviour and save-and-load feature
- add feature to save and load learning progress (Q-Table) and current episode count
- episode end is now purely decided by the environment instead of the Monte Carlo algorithm capping it at 10 actions
- using LinkedHashMap in all locations to ensure deterministic behaviour
- fixed major RNG issue to reproduce algorithmic behaviour
- clearing rewardHistory, to only save the last 10k rewards
- added google dino jump environment
2019-12-22 23:33:56 +01:00
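The determinism fixes likely hinge on two standard tricks (a sketch under that assumption, not the repo's exact code): one central seeded Random, and LinkedHashMap wherever a map gets iterated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative determinism helpers: every component draws from ONE seeded
// Random, and iterated maps use LinkedHashMap so entry order is the
// insertion order on every run.
class DeterministicRun {
    static final Random RNG = new Random(42);  // single shared, seeded source

    // HashMap iteration order can differ between runs; LinkedHashMap cannot.
    static final Map<String, Map<String, Double>> Q_TABLE = new LinkedHashMap<>();
}
```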
Jan Löwenstrom b1246f62cc add features to gui to control learning and moving learning listener interface to controller
- Add metric to display episodes per second
- view no longer implements the learning listener; the controller does. The controller drives all view actions based upon learning events and reacts to view events via viewListener
- add executor service for learning task
- using instanceof to distinguish between episodic learning and TD learning
- add feature to trigger more episodes
- add checkboxes for smoothing graph, displaying last 100 rewards only and drawing environment
- remove history panel from antworld gui
2019-12-22 17:06:54 +01:00
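A sketch of the controller-side wiring this commit describes (class and method names are assumptions): the learning task runs on an executor so the Swing GUI stays responsive, and instanceof gates the episodic-only controls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative controller snippet, not refo's actual RLController.
class ControllerSketch {
    interface EpisodicLearningSketch { void learn(int episodes); }

    private final ExecutorService learningExecutor = Executors.newSingleThreadExecutor();

    void startLearning(Object learning) {
        learningExecutor.submit(() -> {
            // instanceof distinguishes the two learning families,
            // as the commit message describes.
            if (learning instanceof EpisodicLearningSketch episodic) {
                episodic.learn(1000);  // e.g. "trigger more episodes" button
            }
        });
    }
}
```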
Jan Löwenstrom 34e7e3fdd6 distinguish learning and episodic learning, enable fast-learning without drawing every step to reduce lag
- repainting every step with no time delay will certainly freeze the app, so "fast-learning" disables it, only refreshing the current episode label
- Added new abstract class "EpisodicLearning". Maybe just use an interface instead?! Important because TD learning is not episodic and needs another way to represent the rewards received (maybe the mean of the last X rewards or something)
- Opening two JFrames, one with learning infos and one with environment
2019-12-21 00:23:09 +01:00
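A sketch of that fast-learning toggle (field and method names assumed): per-step drawing is skipped entirely, and only a per-episode label refresh is scheduled on the event dispatch thread.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

// Illustrative fast-learning mode: with zero delay between steps, repainting
// the environment every step floods the EDT, so drawing is skipped and only
// the episode label is refreshed once per episode.
class FastLearningSketch {
    volatile boolean fastLearning = true;
    final JLabel episodeLabel = new JLabel();

    void onStep(Runnable repaintEnvironment) {
        if (!fastLearning) {
            SwingUtilities.invokeLater(repaintEnvironment);
        }
    }

    void onEpisodeEnd(int episode) {
        SwingUtilities.invokeLater(() -> episodeLabel.setText("Episode: " + episode));
    }
}
```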