├── ReadMe.md
└── src
    ├── CliffWalkingQLearning.m
    ├── CliffWalkingSARSA.m
    ├── DemoGUI.fig
    ├── DemoGUI.m
    ├── DrawActionOnCell.m
    ├── DrawCliffEpisodeState.m
    ├── DrawEpisodeState.m
    ├── DrawGrid.m
    ├── DrawTextOnCell.m
    ├── DrawWindyEpisodeState.m
    ├── FindCellCenter.m
    ├── FindColBaseCenter.m
    ├── GenerateRandomWalkSequence.m
    ├── GridWorldQLearning.m
    ├── GridWorldSARSA.m
    ├── PredictionRandomWalk.m
    ├── PredictionRandomWalkAlphaEffect.m
    ├── RLRandomWalk.m
    ├── WindyGridWorldQLearning.m
    └── WindyGridWorldSARSA.m


/ReadMe.md:
--------------------------------------------------------------------------------
# Temporal-Difference Learning Demos in MATLAB

In this package you will find MATLAB code that demonstrates selected examples of *temporal-difference learning* methods in *prediction problems* and in *reinforcement learning*.

To begin:

* Run `DemoGUI.m`
* Start with the set of predefined demos: select one and press *Go*
* Modify demos: select one of the predefined demos and modify the options

Feel free to distribute or use the package, especially for educational purposes. Personally, I learned a lot from cliff-walking.

The repository for the package is hosted on [GitHub](https://github.com/sinairv/Temporal-Difference-Learning).

## Why temporal difference learning is important

A quotation from *R. S. Sutton* and *A. G. Barto*, from their book *Reinforcement Learning: An Introduction* ([here](http://www.cs.ualberta.ca/~sutton/book/ebook/node60.html)):

> If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning.

Many basic reinforcement learning algorithms, such as *Q-Learning* and *SARSA*, are in essence *temporal-difference learning methods*.
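
As a minimal illustration (a sketch with made-up numbers, not code taken from the demos), the snippet below applies one TD update to a tabular action-value function `Q` for a single observed transition. It does this once with the SARSA target and once with the Q-learning target. The variable names and values (`alpha`, `gamma`, `Q`, `s`, `a`, `r`, `s_next`, `a_next`) are illustrative assumptions. The two rules differ only in how they estimate the value of the next state.

```matlab
% One TD update for a single transition (s, a) -> r, s_next.
% Toy values chosen for illustration only.
alpha = 0.5;            % step size
gamma = 0.9;            % discount factor
Q = [0 0; 1 3];         % Q(state, action): 2 states x 2 actions
s = 1; a = 1;           % current state and the action taken
r = -1; s_next = 2;     % observed reward and next state
a_next = 1;             % next action picked by the behaviour policy (e.g. epsilon-greedy)

% SARSA (on-policy): bootstrap from the action the agent will actually take
tdErrSarsa = r + gamma * Q(s_next, a_next) - Q(s, a);
Qsarsa = Q;
Qsarsa(s, a) = Q(s, a) + alpha * tdErrSarsa;

% Q-learning (off-policy): bootstrap from the greedy action in s_next
tdErrQ = r + gamma * max(Q(s_next, :)) - Q(s, a);
Qql = Q;
Qql(s, a) = Q(s, a) + alpha * tdErrQ;

fprintf('SARSA:      Q(1,1) = %.3f\n', Qsarsa(s, a));
fprintf('Q-learning: Q(1,1) = %.3f\n', Qql(s, a));
```

With these toy numbers, SARSA lowers `Q(1,1)` to -0.05. Q-learning raises it to 0.85 because it evaluates the greedy action in `s_next` (value 3), whichever action is taken next. The same on-policy/off-policy gap is what the cliff-walking demos make visible.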

## Demos

* *Prediction random walk*: see how precisely we can predict the probability of visiting nodes.

* *RL random walk*: see how the probabilities computed by the RL-generated random-walk policy converge.

* *Simple grid world (with and without king moves)*: see how the RL-generated policy helps the agent find the goal over time (*king moves* means moving along the four main directions and the diagonals, i.e., the way a king moves in chess).

* *Windy grid world*: the wind pushes the agent away from the destination its actions aim for. See how RL solves this problem.

* *Cliff walking*: the agent should reach its destination while avoiding the cliff. A truly instructive example that shows the differences between *on-policy* and *off-policy* learning algorithms.

## References

[1] Sutton, R. S., "Learning to predict by the methods of temporal differences," *Machine Learning*, Vol. 3, pp. 9-44, 1988 (available [online](http://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88.pdf))

[2] Sutton, R. S. and Barto, A. G., *Reinforcement Learning: An Introduction*, MIT Press, 1998 (available [online](http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html))

[3] Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement learning: A survey," *Journal of Artificial Intelligence Research*, Vol. 4, pp. 237-285, 1996 (available [online](http://www.jair.org/media/301/live-301-1562-jair.pdf))

## Contact

Copyright (c) 2011 Sina Iravanian - licensed under MIT.

Homepage: [sinairv.github.io](https://sinairv.github.io)

GitHub: [github.com/sinairv](https://github.com/sinairv)

Twitter: [@sinairv](http://www.twitter.com/sinairv)

## Screenshots

Prediction random walk demo:

![Prediction random walk demo](http://sinairv.github.io/temporal-difference-learning/images/PrdRandomWalk.png)

RL random walk demo:

![RL random walk demo](http://sinairv.github.io/temporal-difference-learning/images/RLRandomWalk.png)

Simple grid-world demo:

![Simple grid-world demo](http://sinairv.github.io/temporal-difference-learning/images/GridWorlds.png)

--------------------------------------------------------------------------------
/src/CliffWalkingQLearning.m:
--------------------------------------------------------------------------------
function CliffWalkingQLearning(options)
% CliffWalkingQLearning: implements the cliff-walking problem using the
% Q-Learning method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.
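
% Example call with an explicit options structure (a sketch; the values
% below simply mirror the built-in defaults). When options is supplied,
% every field is read unconditionally, so all of them must be present:
%   opts = struct('gamma', 0.9, 'alpha', 0.05, 'epsilon', 0.1, ...
%       'gridcols', 10, 'gridrows', 7, 'fontsize', 16, 'showTitle', 1, ...
%       'episodeCount', 2000, 'selectedEpisodes', [20 200 700 1000 2000], ...
%       'isKing', 0, 'canHold', 0, ...
%       'start', struct('row', 7, 'col', 1), ...
%       'goal', struct('row', 7, 'col', 10));
%   CliffWalkingQLearning(opts);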

if(nargin < 1),
    gamma = 0.9;
    alpha = 0.05;
    epsilon = 0.1;
    gridcols = 10;
    gridrows = 7;
    fontsize = 16;
    showTitle = 1;

    episodeCount = 2000;
    selectedEpisodes = [20 200 700 1000 2000];
    isKing = 0;
    canHold = 0;

    start.row = 7;
    start.col = 1;
    goal.row = 7;
    goal.col = 10;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;
    gridcols = options.gridcols;
    gridrows = options.gridrows;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;
    isKing = options.isKing;
    canHold = options.canHold;

    start = options.start;
    goal = options.goal;
end

selectedEpIndex = 1;
if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else % explore
        a = IntRand(1, actionCount);
    end

    episodeFinished = 0;
    while(episodeFinished == 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
        if(PosCmp(nextpos, goal) == 0),
            episodeFinished = 1;
            r = 0;
        elseif(nextpos.row == gridrows && 1 < nextpos.col && nextpos.col < gridcols)
            % fell off the cliff (bottom row, excluding the two corner cells)
            episodeFinished = 1;
            r = -100;
        else
            r = -1;
        end

        % choose a_next from nextpos (epsilon-greedy behaviour policy);
        % qmax keeps the greedy value of nextpos for the Q-learning target
        [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        if(rand <= epsilon) % explore
            a_next = IntRand(1, actionCount);
        end

        % update Q: Q-learning (off-policy) bootstraps from the greedy value
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = qmax; % SARSA would use Q(nextpos.row, nextpos.col, a_next)
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % steps along the greedy path

        %figure;
        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawCliffEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Cliff Walking Q-Learning - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
% keep the agent inside the grid
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% uniformly distributed integer in [lowerBound, upperBound]; the "+ 1"
% makes upperBound reachable (floor(k * rand) lies in 0..k-1)
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
/src/CliffWalkingSARSA.m:
--------------------------------------------------------------------------------
function CliffWalkingSARSA(options)
% CliffWalkingSARSA: implements the cliff-walking problem using the
% SARSA method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    gamma = 0.9;
    alpha = 0.05;
    epsilon = 0.1;
    gridcols = 10;
    gridrows = 7;
    fontsize = 16;
    showTitle = 1;

    episodeCount = 2000;
    selectedEpisodes = [20 200 700 1000 2000];
    isKing = 0;
    canHold = 0;

    start.row = 7;
    start.col = 1;
    goal.row = 7;
    goal.col = 10;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;

    gridcols = options.gridcols;
    gridrows = options.gridrows;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;
    isKing = options.isKing;
    canHold = options.canHold;
    start = options.start;
    goal = options.goal;
end

selectedEpIndex = 1;

if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else % explore
        a = IntRand(1, actionCount);
    end

    episodeFinished = 0;
    while(episodeFinished == 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
        if(PosCmp(nextpos, goal) == 0),
            episodeFinished = 1;
            r = 0;
        elseif(nextpos.row == gridrows && 1 < nextpos.col && nextpos.col < gridcols)
            % fell off the cliff (bottom row, excluding the two corner cells)
            episodeFinished = 1;
            r = -100;
        else
            r = -1;
        end

        % choose a_next from nextpos (epsilon-greedy)
        if(rand > epsilon) % greedy
            [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        else % explore
            a_next = IntRand(1, actionCount);
        end

        % update Q: SARSA (on-policy) bootstraps from the action actually chosen
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = Q(nextpos.row, nextpos.col, a_next);
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % steps along the greedy path

        %figure;
        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawCliffEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Cliff Walking SARSA - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
% keep the agent inside the grid
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% uniformly distributed integer in [lowerBound, upperBound]; the "+ 1"
% makes upperBound reachable (floor(k * rand) lies in 0..k-1)
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
/src/DemoGUI.fig:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sinairv/Temporal-Difference-Learning/1d94ce48d9eadae186e04ad09fdb4acc82c7da77/src/DemoGUI.fig
--------------------------------------------------------------------------------
/src/DemoGUI.m:
--------------------------------------------------------------------------------
function varargout = DemoGUI(varargin)
% DEMOGUI M-file for DemoGUI.fig, the main GUI to the TD Demo
%      DEMOGUI, by itself, creates a new DEMOGUI or raises the existing
%      singleton*.
%
%      H = DEMOGUI returns the handle to a new DEMOGUI or the handle to
%      the existing singleton*.
%
%      DEMOGUI('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in DEMOGUI.M with the given input arguments.
%
%      DEMOGUI('Property','Value',...) creates a new DEMOGUI or raises
%      the existing singleton*. Starting from the left, property value pairs are
%      applied to the GUI before DemoGUI_OpeningFcn gets called. An
%      unrecognized property name or invalid value makes property application
%      stop. All inputs are passed to DemoGUI_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @DemoGUI_OpeningFcn, ...
                   'gui_OutputFcn',  @DemoGUI_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before DemoGUI is made visible.
function DemoGUI_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to DemoGUI (see VARARGIN)

% Choose default command line output for DemoGUI
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

initialize_gui(hObject, handles, false);

% UIWAIT makes DemoGUI wait for user response (see UIRESUME)
% uiwait(handles.figure1);


% --- Outputs from this function are returned to the command line.
function varargout = DemoGUI_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;

% --------------------------------------------------------------------
function initialize_gui(fig_handle, handles, isreset)
% If the metricdata field is present and the reset flag is false, it means
% we are just re-initializing a GUI by calling it from the cmd line
% while it is up. So, bail out as we don't want to reset the data.
if isfield(handles, 'metricdata') && ~isreset
    return;
end

set(handles.panelPredRandomWalk, 'Visible', 'on');
set(handles.panelRLRandomWalk, 'Visible', 'off');
set(handles.panelGridWorlds, 'Visible', 'off');

pos = get(handles.panelGridWorlds, 'Position');
set(handles.panelPredRandomWalk, 'Position', pos);
set(handles.panelRLRandomWalk, 'Position', pos);


% Update handles structure
guidata(handles.figure1, handles);

% --- Executes during object creation, after setting all properties.
function gamma_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gamma (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function alpha_Callback(hObject, eventdata, handles)
% hObject    handle to alpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of alpha as text
%        str2double(get(hObject,'String')) returns contents of alpha as a double

% --- Executes during object creation, after setting all properties.
function alpha_CreateFcn(hObject, eventdata, handles)
% hObject    handle to alpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function nStates_Callback(hObject, eventdata, handles)
% hObject    handle to nStates (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of nStates as text
%        str2double(get(hObject,'String')) returns contents of nStates as a double

% --- Executes during object creation, after setting all properties.
function nStates_CreateFcn(hObject, eventdata, handles)
% hObject    handle to nStates (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function nEpisodes_Callback(hObject, eventdata, handles)
% hObject    handle to nEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of nEpisodes as text
%        str2double(get(hObject,'String')) returns contents of nEpisodes as a double

% --- Executes during object creation, after setting all properties.
function nEpisodes_CreateFcn(hObject, eventdata, handles)
% hObject    handle to nEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function selectedEpisodes_Callback(hObject, eventdata, handles)
% hObject    handle to selectedEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of selectedEpisodes as text
%        str2double(get(hObject,'String')) returns contents of selectedEpisodes as a double

% --- Executes during object creation, after setting all properties.
170 | function selectedEpisodes_CreateFcn(hObject, eventdata, handles) 171 | % hObject handle to selectedEpisodes (see GCBO) 172 | % eventdata reserved - to be defined in a future version of MATLAB 173 | % handles empty - handles not created until after all CreateFcns called 174 | % Hint: edit controls usually have a white background on Windows. 175 | % See ISPC and COMPUTER. 176 | if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor')) 177 | set(hObject,'BackgroundColor','white'); 178 | end 179 | 180 | % --- Executes on button press in btnGo. 181 | function btnGo_Callback(hObject, eventdata, handles) 182 | % hObject handle to btnGo (see GCBO) 183 | % eventdata reserved - to be defined in a future version of MATLAB 184 | % handles structure with handles and user data (see GUIDATA) 185 | 186 | showTitle = get(handles.cbShowPlotTitle, 'Value'); 187 | options.showTitle = showTitle; 188 | 189 | demoType = get(handles.cmbDemoType,'Value'); 190 | if(demoType == 1), % Prediction Random Walk 191 | predType = get(handles.cmbPredType ,'Value'); 192 | 193 | options.nStates = str2num(get(handles.prdNStates, 'String')); 194 | options.nTrainingSets = str2num(get(handles.prdNTSets, 'String')); 195 | options.nSequences = str2num(get(handles.prdNEpisodes, 'String')); 196 | options.lambdas = str2num(get(handles.prdLambdas, 'String')); 197 | 198 | if(predType == 1), % Effect of lambda 199 | options.alpha = str2num(get(handles.prdAlpha, 'String')); 200 | options.doPlot = 1; 201 | PredictionRandomWalk(options); 202 | elseif(predType == 2), % Effect of alpha and lambda 203 | options.alphas = str2num(get(handles.prdAlphas, 'String')); 204 | PredictionRandomWalkAlphaEffect(options); 205 | end 206 | elseif(demoType == 2), % RL Random Walk 207 | options.gamma = str2num(get(handles.gamma, 'String')); 208 | options.alpha = str2num(get(handles.alpha, 'String')); 209 | options.nStates = str2num(get(handles.nStates, 'String')); 210 | options.nEpisodes = 
str2num(get(handles.nEpisodes, 'String')); 211 | options.selectedEpisodes = str2num(get(handles.selectedEpisodes, 'String')); 212 | 213 | RLRandomWalk(options); 214 | elseif(demoType == 3), % Grid Worlds 215 | gridType = get(handles.cmbGridType,'Value'); 216 | agentMoves = get(handles.cmbAgentMoves,'Value'); 217 | algorithmType = get(handles.cmbAlgorithm,'Value'); 218 | 219 | if(agentMoves == 1), 220 | options.isKing = 0; 221 | options.canHold = 0; 222 | elseif(agentMoves == 2), 223 | options.isKing = 1; 224 | options.canHold = 0; 225 | elseif(agentMoves == 3) 226 | options.isKing = 1; 227 | options.canHold = 1; 228 | end 229 | 230 | options.gamma = str2num(get(handles.gwGamma, 'String')); 231 | options.alpha = str2num(get(handles.gwAlpha, 'String')); 232 | options.epsilon = str2num(get(handles.gwEpsilon, 'String')); 233 | options.episodeCount = str2num(get(handles.gwNEpisodes, 'String')); 234 | options.selectedEpisodes = str2num(get(handles.gwSelectedEpisodes, 'String')); 235 | 236 | griddims = str2num(get(handles.gwGridDims, 'String')); 237 | options.gridcols = griddims(2); 238 | options.gridrows = griddims(1); 239 | 240 | startPt = str2num(get(handles.gwStart, 'String')); 241 | start.row = startPt(1); 242 | start.col = startPt(2); 243 | options.start = start; 244 | 245 | goalPt = str2num(get(handles.gwGoal, 'String')); 246 | goal.row = goalPt(1); 247 | goal.col = goalPt(2); 248 | options.goal = goal; 249 | 250 | options.fontsize = str2num(get(handles.fontSize, 'String')); 251 | 252 | if(gridType == 1), 253 | if(algorithmType == 1), 254 | GridWorldSARSA(options); 255 | elseif(algorithmType == 2), 256 | GridWorldQLearning(options); 257 | end 258 | elseif(gridType == 2), 259 | options.windpowers = str2num(get(handles.gwWindPowers, 'String')); 260 | if(algorithmType == 1), 261 | WindyGridWorldSARSA(options); 262 | elseif(algorithmType == 2), 263 | WindyGridWorldQLearning(options); 264 | end 265 | elseif(gridType == 3), 266 | if(algorithmType == 1), 267 | 
CliffWalkingSARSA(options); 268 | elseif(algorithmType == 2), 269 | CliffWalkingQLearning(options); 270 | end 271 | end 272 | end 273 | 274 | % --- Executes on selection change in cmbPredefinedDemos. 275 | function cmbPredefinedDemos_Callback(hObject, eventdata, handles) 276 | % hObject handle to cmbPredefinedDemos (see GCBO) 277 | % eventdata reserved - to be defined in a future version of MATLAB 278 | % handles structure with handles and user data (see GUIDATA) 279 | % Hints: contents = get(hObject,'String') returns cmbPredefinedDemos contents as cell array 280 | % contents{get(hObject,'Value')} returns selected item from cmbPredefinedDemos 281 | sel = get(hObject,'Value'); 282 | if(sel == 1), % Pred RW Lambda 283 | set(handles.cmbDemoType, 'Value', 1); 284 | set(handles.cmbPredType, 'Value', 1); 285 | set(handles.prdAlpha, 'String', 0.1); 286 | set(handles.prdNStates, 'String', 7); 287 | set(handles.prdNTSets, 'String', 100); 288 | set(handles.prdNEpisodes, 'String', 40); 289 | set(handles.prdLambdas, 'String', '[0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1]'); 290 | elseif(sel == 2), 291 | set(handles.cmbDemoType, 'Value', 1); 292 | set(handles.cmbPredType, 'Value', 2); 293 | set(handles.prdNStates, 'String', 7); 294 | set(handles.prdNTSets, 'String', 500); 295 | set(handles.prdNEpisodes, 'String', 20); 296 | set(handles.prdLambdas, 'String', '[0 0.3 0.8 1]'); 297 | set(handles.prdAlphas, 'String', '[0.0 0.1 0.2 0.3 0.4]'); 298 | elseif(sel == 3), 299 | set(handles.cmbDemoType, 'Value', 2); 300 | set(handles.gamma, 'String', 1); 301 | set(handles.alpha, 'String', 0.01); 302 | set(handles.nStates, 'String', 7); 303 | set(handles.nEpisodes, 'String', 5000); 304 | set(handles.selectedEpisodes, 'String','[1 20 200 1000 5000]'); 305 | elseif(sel == 4), 306 | set(handles.cmbDemoType, 'Value', 2); 307 | set(handles.gamma, 'String', 1); 308 | set(handles.alpha, 'String', 0.01); 309 | set(handles.nStates, 'String', 51); 310 | set(handles.nEpisodes, 'String', 5000); 311 | 
set(handles.selectedEpisodes, 'String','[1 20 200 1000 5000]'); 312 | elseif(sel == 5), 313 | set(handles.cmbDemoType, 'Value', 3); 314 | set(handles.cmbGridType, 'Value', 1); 315 | set(handles.cmbAlgorithm, 'Value', 1); 316 | set(handles.cmbAgentMoves, 'Value', 1); 317 | set(handles.gwGamma, 'String', 0.9); 318 | set(handles.gwAlpha, 'String', 0.1); 319 | set(handles.gwEpsilon, 'String', 0.1); 320 | set(handles.gwNEpisodes, 'String', 1000); 321 | set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000]'); 322 | set(handles.gwGridDims, 'String', '[7 10]'); 323 | set(handles.gwStart, 'String', '[4 1]'); 324 | set(handles.gwGoal, 'String', '[4 8]'); 325 | elseif(sel == 6), 326 | set(handles.cmbDemoType, 'Value', 3); 327 | set(handles.cmbGridType, 'Value', 1); 328 | set(handles.cmbAlgorithm, 'Value', 1); 329 | set(handles.cmbAgentMoves, 'Value', 1); 330 | set(handles.gwGamma, 'String', 0.9); 331 | set(handles.gwAlpha, 'String', 0.3); 332 | set(handles.gwEpsilon, 'String', 0.1); 333 | set(handles.gwNEpisodes, 'String', 4000); 334 | set(handles.gwSelectedEpisodes, 'String', '[1000 2000 3000 4000]'); 335 | set(handles.gwGridDims, 'String', '[20 20]'); 336 | set(handles.gwStart, 'String', '[2 2]'); 337 | set(handles.gwGoal, 'String', '[19 19]'); 338 | elseif(sel == 7), 339 | set(handles.cmbDemoType, 'Value', 3); 340 | set(handles.cmbGridType, 'Value', 1); 341 | set(handles.cmbAlgorithm, 'Value', 1); 342 | set(handles.cmbAgentMoves, 'Value', 2); 343 | set(handles.gwGamma, 'String', 0.9); 344 | set(handles.gwAlpha, 'String', 0.1); 345 | set(handles.gwEpsilon, 'String', 0.1); 346 | set(handles.gwNEpisodes, 'String', 4000); 347 | set(handles.gwSelectedEpisodes, 'String', '[1000 2000 3000 4000]'); 348 | set(handles.gwGridDims, 'String', '[20 20]'); 349 | set(handles.gwStart, 'String', '[2 2]'); 350 | set(handles.gwGoal, 'String', '[19 19]'); 351 | elseif(sel == 8), 352 | set(handles.cmbDemoType, 'Value', 3); 353 | set(handles.cmbGridType, 'Value', 2); 354 | 
set(handles.cmbAlgorithm, 'Value', 1); 355 | set(handles.cmbAgentMoves, 'Value', 1); 356 | set(handles.gwGamma, 'String', 0.9); 357 | set(handles.gwAlpha, 'String', 0.1); 358 | set(handles.gwEpsilon, 'String', 0.1); 359 | set(handles.gwNEpisodes, 'String', 1000); 360 | set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000]'); 361 | set(handles.gwGridDims, 'String', '[7 10]'); 362 | set(handles.gwStart, 'String', '[4 1]'); 363 | set(handles.gwGoal, 'String', '[4 8]'); 364 | set(handles.gwWindPowers, 'String', '[0 0 0 1 1 1 2 2 1 0]'); 365 | elseif(sel == 9), 366 | set(handles.cmbDemoType, 'Value', 3); 367 | set(handles.cmbGridType, 'Value', 2); 368 | set(handles.cmbAlgorithm, 'Value', 1); 369 | set(handles.cmbAgentMoves, 'Value', 2); 370 | set(handles.gwGamma, 'String', 0.9); 371 | set(handles.gwAlpha, 'String', 0.1); 372 | set(handles.gwEpsilon, 'String', 0.1); 373 | set(handles.gwNEpisodes, 'String', 1000); 374 | set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000]'); 375 | set(handles.gwGridDims, 'String', '[7 10]'); 376 | set(handles.gwStart, 'String', '[4 1]'); 377 | set(handles.gwGoal, 'String', '[4 8]'); 378 | set(handles.gwWindPowers, 'String', '[0 0 0 1 1 1 2 2 1 0]'); 379 | elseif(sel == 10), 380 | set(handles.cmbDemoType, 'Value', 3); 381 | set(handles.cmbGridType, 'Value', 3); 382 | set(handles.cmbAlgorithm, 'Value', 1); 383 | set(handles.cmbAgentMoves, 'Value', 1); 384 | set(handles.gwGamma, 'String', 0.9); 385 | set(handles.gwAlpha, 'String', 0.05); 386 | set(handles.gwEpsilon, 'String', 0.1); 387 | set(handles.gwNEpisodes, 'String', 5000); 388 | set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000 2000 5000]'); 389 | set(handles.gwGridDims, 'String', '[7 10]'); 390 | set(handles.gwStart, 'String', '[7 1]'); 391 | set(handles.gwGoal, 'String', '[7 10]'); 392 | elseif(sel == 11), 393 | set(handles.cmbDemoType, 'Value', 3); 394 | set(handles.cmbGridType, 'Value', 3); 395 | set(handles.cmbAlgorithm, 'Value', 2); 396 | 
    set(handles.cmbAgentMoves, 'Value', 1);
    set(handles.gwGamma, 'String', 0.9);
    set(handles.gwAlpha, 'String', 0.05);
    set(handles.gwEpsilon, 'String', 0.1);
    set(handles.gwNEpisodes, 'String', 5000);
    set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000 2000 5000]');
    set(handles.gwGridDims, 'String', '[7 10]');
    set(handles.gwStart, 'String', '[7 1]');
    set(handles.gwGoal, 'String', '[7 10]');
elseif(sel == 12),
    set(handles.cmbDemoType, 'Value', 3);
    set(handles.cmbGridType, 'Value', 3);
    set(handles.cmbAlgorithm, 'Value', 1);
    set(handles.cmbAgentMoves, 'Value', 2);
    set(handles.gwGamma, 'String', 0.9);
    set(handles.gwAlpha, 'String', 0.05);
    set(handles.gwEpsilon, 'String', 0.1);
    set(handles.gwNEpisodes, 'String', 5000);
    set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000 2000 5000]');
    set(handles.gwGridDims, 'String', '[7 10]');
    set(handles.gwStart, 'String', '[7 1]');
    set(handles.gwGoal, 'String', '[7 10]');
elseif(sel == 13),
    set(handles.cmbDemoType, 'Value', 3);
    set(handles.cmbGridType, 'Value', 3);
    set(handles.cmbAlgorithm, 'Value', 2);
    set(handles.cmbAgentMoves, 'Value', 2);
    set(handles.gwGamma, 'String', 0.9);
    set(handles.gwAlpha, 'String', 0.05);
    set(handles.gwEpsilon, 'String', 0.1);
    set(handles.gwNEpisodes, 'String', 5000);
    set(handles.gwSelectedEpisodes, 'String', '[20 200 700 1000 2000 5000]');
    set(handles.gwGridDims, 'String', '[7 10]');
    set(handles.gwStart, 'String', '[7 1]');
    set(handles.gwGoal, 'String', '[7 10]');
end


RunAllComboCallBacks(hObject, eventdata, handles);

function RunAllComboCallBacks(hObject, eventdata, handles)
cmbGridType_Callback(hObject, eventdata, handles);
cmbPredType_Callback(hObject, eventdata, handles);
cmbDemoType_Callback(hObject, eventdata, handles);

% --- Executes on selection change in cmbDemoType.
function cmbDemoType_Callback(hObject, eventdata, handles)
% hObject    handle to cmbDemoType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

sel = get(handles.cmbDemoType, 'Value');
if(sel == 1), % Prediction Random Walk
    set(handles.panelPredRandomWalk, 'Visible', 'on');
    set(handles.panelRLRandomWalk, 'Visible', 'off');
    set(handles.panelGridWorlds, 'Visible', 'off');
elseif(sel == 2), % RL Random Walk
    set(handles.panelPredRandomWalk, 'Visible', 'off');
    set(handles.panelRLRandomWalk, 'Visible', 'on');
    set(handles.panelGridWorlds, 'Visible', 'off');
elseif(sel == 3), % Grid Worlds
    set(handles.panelPredRandomWalk, 'Visible', 'off');
    set(handles.panelRLRandomWalk, 'Visible', 'off');
    set(handles.panelGridWorlds, 'Visible', 'on');
end

% --- Executes during object creation, after setting all properties.
function cmbDemoType_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbDemoType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on selection change in cmbPredType.
function cmbPredType_Callback(hObject, eventdata, handles)
% hObject    handle to cmbPredType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

sel = get(handles.cmbPredType,'Value');
if(sel == 1), % Effect of lambda
    set(handles.prdAlpha, 'Enable', 'on');
    set(handles.prdAlphas, 'Enable', 'off');
elseif(sel == 2), % Effect of alpha and lambda
    set(handles.prdAlpha, 'Enable', 'off');
    set(handles.prdAlphas, 'Enable', 'on');
end

% --- Executes during object creation, after setting all properties.
function cmbPredType_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbPredType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdNStates_Callback(hObject, eventdata, handles)
% hObject    handle to prdNStates (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdNStates as text
%        str2double(get(hObject,'String')) returns contents of prdNStates as a double

% --- Executes during object creation, after setting all properties.
function prdNStates_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdNStates (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdAlpha_Callback(hObject, eventdata, handles)
% hObject    handle to prdAlpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdAlpha as text
%        str2double(get(hObject,'String')) returns contents of prdAlpha as a double

% --- Executes during object creation, after setting all properties.
function prdAlpha_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdAlpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdNTSets_Callback(hObject, eventdata, handles)
% hObject    handle to prdNTSets (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdNTSets as text
%        str2double(get(hObject,'String')) returns contents of prdNTSets as a double

% --- Executes during object creation, after setting all properties.
function prdNTSets_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdNTSets (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdNEpisodes_Callback(hObject, eventdata, handles)
% hObject    handle to prdNEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdNEpisodes as text
%        str2double(get(hObject,'String')) returns contents of prdNEpisodes as a double

% --- Executes during object creation, after setting all properties.
function prdNEpisodes_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdNEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdAlphas_Callback(hObject, eventdata, handles)
% hObject    handle to prdAlphas (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdAlphas as text
%        str2double(get(hObject,'String')) returns contents of prdAlphas as a double

% --- Executes during object creation, after setting all properties.
function prdAlphas_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdAlphas (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function prdLambdas_Callback(hObject, eventdata, handles)
% hObject    handle to prdLambdas (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of prdLambdas as text
%        str2double(get(hObject,'String')) returns contents of prdLambdas as a double

% --- Executes during object creation, after setting all properties.
function prdLambdas_CreateFcn(hObject, eventdata, handles)
% hObject    handle to prdLambdas (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on selection change in cmbGridType.
function cmbGridType_Callback(hObject, eventdata, handles)
% hObject    handle to cmbGridType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: contents = get(hObject,'String') returns cmbGridType contents as cell array
%        contents{get(hObject,'Value')} returns selected item from cmbGridType
sel = get(handles.cmbGridType,'Value');
if(sel == 2),
    set(handles.gwWindPowers, 'Enable', 'on');
else
    set(handles.gwWindPowers, 'Enable', 'off');
end

% --- Executes during object creation, after setting all properties.
function cmbGridType_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbGridType (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on selection change in cmbAlgorithm.
function cmbAlgorithm_Callback(hObject, eventdata, handles)
% hObject    handle to cmbAlgorithm (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: contents = get(hObject,'String') returns cmbAlgorithm contents as cell array
%        contents{get(hObject,'Value')} returns selected item from cmbAlgorithm

% --- Executes during object creation, after setting all properties.
function cmbAlgorithm_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbAlgorithm (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on selection change in cmbAgentMoves.
function cmbAgentMoves_Callback(hObject, eventdata, handles)
% hObject    handle to cmbAgentMoves (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: contents = get(hObject,'String') returns cmbAgentMoves contents as cell array
%        contents{get(hObject,'Value')} returns selected item from cmbAgentMoves

% --- Executes during object creation, after setting all properties.
function cmbAgentMoves_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbAgentMoves (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwGamma_Callback(hObject, eventdata, handles)
% hObject    handle to gwGamma (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwGamma as text
%        str2double(get(hObject,'String')) returns contents of gwGamma as a double

% --- Executes during object creation, after setting all properties.
function gwGamma_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwGamma (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwAlpha_Callback(hObject, eventdata, handles)
% hObject    handle to gwAlpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwAlpha as text
%        str2double(get(hObject,'String')) returns contents of gwAlpha as a double

% --- Executes during object creation, after setting all properties.
function gwAlpha_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwAlpha (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwNEpisodes_Callback(hObject, eventdata, handles)
% hObject    handle to gwNEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwNEpisodes as text
%        str2double(get(hObject,'String')) returns contents of gwNEpisodes as a double

% --- Executes during object creation, after setting all properties.
function gwNEpisodes_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwNEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwSelectedEpisodes_Callback(hObject, eventdata, handles)
% hObject    handle to gwSelectedEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwSelectedEpisodes as text
%        str2double(get(hObject,'String')) returns contents of gwSelectedEpisodes as a double

% --- Executes during object creation, after setting all properties.
function gwSelectedEpisodes_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwSelectedEpisodes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwEpsilon_Callback(hObject, eventdata, handles)
% hObject    handle to gwEpsilon (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwEpsilon as text
%        str2double(get(hObject,'String')) returns contents of gwEpsilon as a double

% --- Executes during object creation, after setting all properties.
function gwEpsilon_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwEpsilon (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwGridDims_Callback(hObject, eventdata, handles)
% hObject    handle to gwGridDims (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwGridDims as text
%        str2double(get(hObject,'String')) returns contents of gwGridDims as a double

% --- Executes during object creation, after setting all properties.
function gwGridDims_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwGridDims (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwStart_Callback(hObject, eventdata, handles)
% hObject    handle to gwStart (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwStart as text
%        str2double(get(hObject,'String')) returns contents of gwStart as a double

% --- Executes during object creation, after setting all properties.
function gwStart_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwStart (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwGoal_Callback(hObject, eventdata, handles)
% hObject    handle to gwGoal (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwGoal as text
%        str2double(get(hObject,'String')) returns contents of gwGoal as a double

% --- Executes during object creation, after setting all properties.
function gwGoal_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwGoal (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function gwWindPowers_Callback(hObject, eventdata, handles)
% hObject    handle to gwWindPowers (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of gwWindPowers as text
%        str2double(get(hObject,'String')) returns contents of gwWindPowers as a double

% --- Executes during object creation, after setting all properties.
function gwWindPowers_CreateFcn(hObject, eventdata, handles)
% hObject    handle to gwWindPowers (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes during object creation, after setting all properties.
function cmbPredefinedDemos_CreateFcn(hObject, eventdata, handles)
% hObject    handle to cmbPredefinedDemos (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: popupmenu controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on button press in cbShowPlotTitle.
function cbShowPlotTitle_Callback(hObject, eventdata, handles)
% hObject    handle to cbShowPlotTitle (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hint: get(hObject,'Value') returns toggle state of cbShowPlotTitle

function fontSize_Callback(hObject, eventdata, handles)
% hObject    handle to fontSize (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Hints: get(hObject,'String') returns contents of fontSize as text
%        str2double(get(hObject,'String')) returns contents of fontSize as a double

% --- Executes during object creation, after setting all properties.
function fontSize_CreateFcn(hObject, eventdata, handles)
% hObject    handle to fontSize (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end


--------------------------------------------------------------------------------
/src/DrawActionOnCell.m:
--------------------------------------------------------------------------------
function DrawActionOnCell(actionIndex, row, col, gridrows, gridcols, fontsize)
% DrawActionOnCell: draws the symbol of different grid-world actions on the
% cell specified

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.
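
% Usage sketch (illustrative only; the 7x10 grid and font size 16 are
% assumptions chosen to match the GridWorldQLearning defaults):
%   figure;
%   DrawGrid(7, 10);                        % empty grid: 7 rows, 10 columns
%   DrawActionOnCell(1, 4, 1, 7, 10, 16);   % action 1: east arrow at row 4, col 1
%   DrawActionOnCell(5, 4, 2, 7, 10, 16);   % action 5: '\rightarrow' rotated 45 degrees (northeast)
%   DrawActionOnCell(9, 4, 3, 7, 10, 16);   % action 9: 'o' (hold)
% Diagonal moves reuse the straight arrows rotated 45 degrees counter-clockwise,
% so east->northeast, south->southeast, west->southwest and north->northwest.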

rotation = 0;
textToDraw = 'o';

switch actionIndex
    case 1 % east
        textToDraw = '\rightarrow';
        rotation = 0;
    case 2 % south
        textToDraw = '\downarrow';
        rotation = 0;
    case 3 % west
        textToDraw = '\leftarrow';
        rotation = 0;
    case 4 % north
        textToDraw = '\uparrow';
        rotation = 0;
    case 5 % northeast
        textToDraw = '\rightarrow';
        rotation = 45;
    case 6 % southeast
        textToDraw = '\downarrow';
        rotation = 45;
    case 7 % southwest
        textToDraw = '\leftarrow';
        rotation = 45;
    case 8 % northwest
        textToDraw = '\uparrow';
        rotation = 45;
    case 9 % hold
        textToDraw = 'o';
        rotation = 0;
    otherwise
        fprintf('invalid action index: %d\n', actionIndex);
end

DrawTextOnCell(textToDraw, rotation, row, col, gridrows, gridcols, fontsize);

--------------------------------------------------------------------------------
/src/DrawCliffEpisodeState.m:
--------------------------------------------------------------------------------
function DrawCliffEpisodeState(rows, cols, acts, SRow, SCol, GRow, GCol, gridrows, gridcols, fontsize)
% DrawCliffEpisodeState: draws a snapshot of the cliff walking world,
% specifying the sequence of actions on the world.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

DrawGrid(gridrows, gridcols);
DrawTextOnCell('S', 0, SRow, SCol, gridrows, gridcols, fontsize);
DrawTextOnCell('G', 0, GRow, GCol, gridrows, gridcols, fontsize);

for i=1:length(rows),
    DrawActionOnCell(acts(i), rows(i), cols(i), gridrows, gridcols, fontsize);
end

% mark the cliff cells: the bottom row, excluding the first and last columns
for i=2:gridcols-1,
    DrawTextOnCell('C', 0, gridrows, i, gridrows, gridcols, fontsize);
end

--------------------------------------------------------------------------------
/src/DrawEpisodeState.m:
--------------------------------------------------------------------------------
function DrawEpisodeState(rows, cols, acts, SRow, SCol, GRow, GCol, gridrows, gridcols, fontsize)
% DrawEpisodeState: draws a snapshot of the simple grid world,
% specifying the sequence of actions on the world.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

DrawGrid(gridrows, gridcols);
DrawTextOnCell('S', 0, SRow, SCol, gridrows, gridcols, fontsize);
DrawTextOnCell('G', 0, GRow, GCol, gridrows, gridcols, fontsize);

for i=1:length(rows),
    DrawActionOnCell(acts(i), rows(i), cols(i), gridrows, gridcols, fontsize);
end


--------------------------------------------------------------------------------
/src/DrawGrid.m:
--------------------------------------------------------------------------------
function DrawGrid(gridrows, gridcols)
% DrawGrid: draws a grid with the dimensions specified

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.
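
% How the vertical lines are drawn (a worked example, assuming gridrows = 2
% and gridcols = 2, so xsp = ysp = 1/4): the loop over xi visits 0.25, 0.5
% and 0.75 and fills
%   x = [0.25 0.25 0.50 0.50 0.75 0.75]
%   y = [0.75 0.25 0.25 0.75 0.75 0.25]
% i.e. a single zig-zag polyline that runs down the first vertical line,
% along the bottom border, up the second line, along the top border and
% down the third. Alternating the direction lets one plot call draw every
% vertical line, and the connecting segments lie on the bottom and top
% borders, which the horizontal-line polyline draws anyway.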

xsp = 1 / (gridcols + 2);
ysp = 1 / (gridrows + 2);

% vertical grid lines, drawn as one zig-zag polyline
x = zeros(1, 2*(gridcols + 1));
y = zeros(1, 2*(gridcols + 1));
i = 1;
for xi = xsp:xsp:1 - xsp,
    x(2*i - 1) = xi; x(2*i) = xi;
    if(mod(i , 2) == 0)
        y(2*i - 1) = ysp; y(2*i) = 1 - ysp;
    else
        y(2*i - 1) = 1 - ysp; y(2*i) = ysp;
    end
    i = i + 1;
end

% horizontal grid lines, drawn the same way
x2 = zeros(1, 2*(gridrows + 1));
y2 = zeros(1, 2*(gridrows + 1));
i = 1;
for yi = ysp:ysp:1 - ysp,
    y2(2*i - 1) = yi; y2(2*i) = yi;
    if(mod(i , 2) == 0)
        x2(2*i - 1) = xsp; x2(2*i) = 1 - xsp;
    else
        x2(2*i - 1) = 1 - xsp; x2(2*i) = xsp;
    end
    i = i + 1;
end

plot(x, y, '-');
hold on
plot(x2, y2, '-');
axis([0 1 0 1]);
axis off
set(gcf, 'color', 'white');

--------------------------------------------------------------------------------
/src/DrawTextOnCell.m:
--------------------------------------------------------------------------------
function DrawTextOnCell(theText, rotation, row, col, gridrows, gridcols, fontsize)
% DrawTextOnCell: draws the specified text in a specific cell of
% a grid with specified dimensions, using the rotation and fontsize
% parameters given

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

[xc, yc] = FindCellCenter(row, col, gridrows, gridcols);
text(xc, yc, theText, 'FontSize', fontsize, 'Rotation', rotation);

--------------------------------------------------------------------------------
/src/DrawWindyEpisodeState.m:
--------------------------------------------------------------------------------
function DrawWindyEpisodeState(rows, cols, acts, SRow, SCol, GRow, GCol, windpowers, gridrows, gridcols, fontsize)
% DrawWindyEpisodeState: draws a snapshot of the windy grid world,
% specifying the sequence of actions on the world.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

DrawGrid(gridrows, gridcols);
DrawTextOnCell('S', 0, SRow, SCol, gridrows, gridcols, fontsize);
DrawTextOnCell('G', 0, GRow, GCol, gridrows, gridcols, fontsize);

for i=1:length(rows),
    DrawActionOnCell(acts(i), rows(i), cols(i), gridrows, gridcols, fontsize);
end

% write the wind power of each column below the grid
for i=1:gridcols,
    [xc, yc] = FindColBaseCenter(i, gridrows, gridcols);
    text(xc, yc, sprintf('%d',windpowers(i)), 'FontSize', fontsize, 'Rotation', 0);
end

--------------------------------------------------------------------------------
/src/FindCellCenter.m:
--------------------------------------------------------------------------------
function [x,y] = FindCellCenter(row, col, gridrows, gridcols)
% FindCellCenter: finds the coordinates of the center of the cell
% specified in a grid with the dimensions specified

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

xsp = 1 / (gridcols + 2);
ysp = 1 / (gridrows + 2);
x = ((2*col + 1) / 2) * xsp;
y = 1 - (((2*row + 1) / 2) * ysp);
x = x - xsp/5; % shift left so the left-aligned text label looks roughly centered

--------------------------------------------------------------------------------
/src/FindColBaseCenter.m:
--------------------------------------------------------------------------------
function [x,y] = FindColBaseCenter(col, gridrows, gridcols)
% FindColBaseCenter: finds the coordinates of the center of the column
% base point specified in a grid with the dimensions specified. This is the
% point where the wind powers are written.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.
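
% Worked example (assuming a 7x10 grid, i.e. gridrows = 7, gridcols = 10,
% and col = 4):
%   xsp = 1/12, ysp = 1/9, row = gridrows + 1 = 8
%   x = (2*4 + 1)/2 * xsp - xsp/5 = 4.5/12 - 1/60 = approx. 0.3583
%   y = 1 - (2*8 + 1)/2 * ysp     = 1 - 8.5/9     = approx. 0.0556
% so the label sits half a cell below the bottom grid line (y = ysp,
% approx. 0.1111), in the imaginary row just under the grid.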

row = gridrows + 1; % the (virtual) row just below the grid
xsp = 1 / (gridcols + 2);
ysp = 1 / (gridrows + 2);
x = ((2*col + 1) / 2) * xsp;
y = 1 - (((2*row + 1) / 2) * ysp);
x = x - xsp/5;

--------------------------------------------------------------------------------
/src/GenerateRandomWalkSequence.m:
--------------------------------------------------------------------------------
function [r, z] = GenerateRandomWalkSequence(NodesCount)
% GenerateRandomWalkSequence: generates a random walk sequence
% for the number of nodes specified, starting from the middle node.
% return values:
% r: the sequence of non-terminal nodes visited
% z: the outcome of the sequence; the value is 1 if the sequence ends in
% the rightmost node, and 0 if the sequence ends in the leftmost node

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

% the initial state: the middle node (nodes are numbered 0..NodesCount-1,
% so NodesCount must be odd for the walk to reach a terminal node)
pos = (NodesCount - 1) / 2;
r = [pos];
while(1)
    if(rand > 0.5)
        nextwalk = 1;
    else
        nextwalk = -1;
    end

    pos = pos + nextwalk;
    if(pos == NodesCount - 1)
        z = 1;
        break;
    elseif(pos == 0)
        z = 0;
        break;
    else
        r = [r pos];
    end
end

--------------------------------------------------------------------------------
/src/GridWorldQLearning.m:
--------------------------------------------------------------------------------
function GridWorldQLearning(options)
% GridWorldQLearning: implements the simple grid world problem using the
% Q-Learning method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    alpha = 0.1;
    gamma = 0.9;
    epsilon = 0.1;

    gridcols = 10;
    gridrows = 7;
    fontsize = 16;
    showTitle = 1;

    episodeCount = 1000;
    selectedEpisodes = [20 200 700 1000];

    isKing = 0;
    canHold = 0;

    start.row = 4;
    start.col = 1;
    goal.row = 4;
    goal.col = 8;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;

    gridcols = options.gridcols;
    gridrows = options.gridrows;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;

    isKing = options.isKing;
    canHold = options.canHold;

    start = options.start;
    goal = options.goal;
end

selectedEpIndex = 1;
if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else
        a = IntRand(1, actionCount);
    end

    while(PosCmp(curpos, goal) ~= 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
        if(PosCmp(nextpos, goal) ~= 0), r = -1; else r = 0; end

        % choose a_next from nextpos (epsilon-greedy)
        [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        if(rand <= epsilon) % explore
            a_next = IntRand(1, actionCount);
        end

        % update Q: Q-Learning
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = qmax; % greedy value of nextpos (SARSA would use Q(nextpos.row, nextpos.col, a_next))
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        % follow the greedy policy, capping the path length in case it loops
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % greedy path from the start state

        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Simple grid-world Q-Learning - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold (only available together with king moves)
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
% keep the agent inside the grid
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% returns an integer drawn uniformly from lowerBound..upperBound (inclusive);
% rand lies in the open interval (0, 1), so the "+ 1" is needed to reach upperBound
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
/src/GridWorldSARSA.m:
--------------------------------------------------------------------------------
function GridWorldSARSA(options)
% GridWorldSARSA: implements the simple grid world problem using the
% SARSA method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    alpha = 0.1;
    gamma = 0.9;
    epsilon = 0.1;

    gridcols = 10;
    gridrows = 7;
    fontsize = 16;
    showTitle = 1;

    episodeCount = 1000;
    selectedEpisodes = [20 200 700 1000];

    isKing = 0;
    canHold = 0;

    start.row = 4;
    start.col = 1;
    goal.row = 4;
    goal.col = 8;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;

    gridcols = options.gridcols;
    gridrows = options.gridrows;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;

    isKing = options.isKing;
    canHold = options.canHold;

    start = options.start;
    goal = options.goal;
end

if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end
selectedEpIndex = 1;

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else
        a = IntRand(1, actionCount);
    end

    while(PosCmp(curpos, goal) ~= 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
        if(PosCmp(nextpos, goal) ~= 0), r = -1; else r = 0; end

        % choose a_next from nextpos (epsilon-greedy)
        if(rand > epsilon) % greedy
            [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        else
            a_next = IntRand(1, actionCount);
        end

        % update Q: SARSA (bootstraps from the action actually chosen)
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = Q(nextpos.row, nextpos.col, a_next);
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        % follow the greedy policy, capping the path length in case it loops
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % greedy path from the start state

        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Simple grid-world SARSA - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold (only available together with king moves)
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
% keep the agent inside the grid
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% returns an integer drawn uniformly from lowerBound..upperBound (inclusive);
% rand lies in the open interval (0, 1), so the "+ 1" is needed to reach upperBound
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
/src/PredictionRandomWalk.m:
--------------------------------------------------------------------------------
function totalrms = PredictionRandomWalk(options)
% PredictionRandomWalk: analyzes the effect of the trace-decay parameter
% (lambda) of TD(lambda) on the prediction error of the random walk problem
% with a constant learning rate (alpha)

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1)
    alpha = 0.1;
    nStates = 7;
    nTrainingSets = 100;
    nSequences = 40;
    doPlot = 1;
    showTitle = 1;
    lambdas = [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1];
else
    alpha = options.alpha;                  % default: 0.1
    nStates = options.nStates;              % default: 7
    nTrainingSets = options.nTrainingSets;  % default: 100
    nSequences = options.nSequences;        % default: 40
    doPlot = options.doPlot;                % default: 1
    lambdas = options.lambdas;              % default: [0 0.1 0.2 ... 0.9 1]
    showTitle = options.showTitle;
end

% true values of the non-terminal nodes: the probability of ending on the right
optValues = (1:nStates - 2)/(nStates - 1);
totalrms = zeros(1, length(lambdas));

for ti = 1:nTrainingSets,
    trainingSet = {};
    % build the training set
    for si = 1:nSequences,
        [r, z] = GenerateRandomWalkSequence(nStates);
        trainingSet{end + 1} = [r, z];
    end

    for li = 1:length(lambdas),
        w = 0.5*ones(nStates, 1); w(1) = 0; w(end) = 1;
        for si = 1:nSequences,
            curSeq = trainingSet{si};
            z = curSeq(end);
            curSeq = curSeq(1:end-1);
            %w = PerformTDLambda(nStates, curSeq, z, w, alpha, lambdas(li));
            w = PerformTDLambdaFinalUpdate(nStates, curSeq, z, w, alpha, lambdas(li));
        end
        rmsErr = VectorRMS(w(2:nStates-1)' - optValues);
        totalrms(li) = totalrms(li) + rmsErr;
    end
end

totalrms = totalrms ./ nTrainingSets;

if(doPlot ~= 0),
    figure;
    plot(lambdas, totalrms, '.-');
    xlabel('\lambda');
    ylabel('RMS Error');
    if(showTitle),
        title(sprintf('Effect of \\lambda in Random Walk for %d nodes for %d training sets of %d sequences (\\alpha = %1.2f)', nStates - 2, nTrainingSets, nSequences, alpha));
    end
end

function rms = VectorRMS(x)
rms = norm(x) / sqrt(length(x));

function w = PerformTDLambdaFinalUpdate(NodesCount, seq, z, w0, alpha, lambda)
% TD(lambda) in which the weight changes are accumulated over the whole
% sequence and applied only at its end (predictions use the unchanged w)
w = w0;
w2 = w;
xt = GetObservationVector(NodesCount, seq(1) + 1);
Pt = w' * xt;
S = xt; % eligibility trace: sum of lambda-discounted past observations
for i=2:length(seq),
    xt = GetObservationVector(NodesCount, seq(i) + 1);
    Pt_1 = Pt;
    Pt = w' * xt;
    dw = alpha * (Pt - Pt_1)*S;
    S = xt + (lambda * S);
    w2 = w2 + dw;
end
dw = alpha * (z - Pt)*S; % the final prediction is compared with the outcome z
w2 = w2 + dw;
w = w2;

function w = PerformTDLambda(NodesCount, seq, z, w0, alpha, lambda)
% TD(lambda) in which the weights are updated after every step
w = w0;
xt = GetObservationVector(NodesCount, seq(1)+1);
Pt = w' * xt;
S = xt;
for i=2:length(seq),
    xt = GetObservationVector(NodesCount, seq(i)+1);
    Pt_1 = Pt;
    Pt = w' * xt;
    dw = alpha * (Pt - Pt_1)*S;
    S = xt + (lambda * S);
    w = w + dw;
end
dw = alpha * (z - Pt)*S;
w = w + dw;

function xt = GetObservationVector(NodesCount, i)
% one-hot observation vector for node i (1-based index)
xt = zeros(NodesCount, 1);
xt(i) = 1;

--------------------------------------------------------------------------------
/src/PredictionRandomWalkAlphaEffect.m:
--------------------------------------------------------------------------------
function PredictionRandomWalkAlphaEffect(options)
% PredictionRandomWalkAlphaEffect: analyzes the effect of the learning rate
% (alpha), together with the trace-decay parameter (lambda), on the
% prediction error of the random walk problem

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1)
    nStates = 7;
    nTrainingSets = 500;
    nSequences = 20;
    alphas = [0.0 0.1 0.2 0.3 0.4];
    lambdas = [0 0.3 0.8 1];
    showTitle = 1;
else
    nStates = options.nStates;              % default: 7
    nTrainingSets = options.nTrainingSets;  % default: 500
    nSequences = options.nSequences;        % default: 20
    alphas = options.alphas;                % default: [0.0 0.1 0.2 0.3 0.4]
    lambdas = options.lambdas;              % default: [0 0.3 0.8 1]
    showTitle = options.showTitle;
end

% options passed to PredictionRandomWalk for every alpha
loptions.nStates = nStates;
loptions.nSequences = nSequences;
loptions.nTrainingSets = nTrainingSets;
loptions.doPlot = 0;
loptions.alpha = -1; % placeholder, set for each alpha below
loptions.lambdas = lambdas;
loptions.showTitle = showTitle;

Graphs = [];
Legends = {};
for li = 1:length(loptions.lambdas),
    Legends{end + 1} = sprintf('%1.1f', loptions.lambdas(li));
end

for ai = 1:length(alphas),
    loptions.alpha = alphas(ai);
    rmsErr = PredictionRandomWalk(loptions);
    Graphs = [Graphs; rmsErr];
end

figure;
plot(alphas, Graphs', '.-');
legend(Legends,'Location', 'NorthEastOutside');
xlabel('\alpha');
ylabel('RMS Error');
if(showTitle == 1),
    title(sprintf('Effect of \\alpha in Random Walk for %d nodes for %d training sets of %d sequences for different values of \\lambda', nStates - 2, nTrainingSets, nSequences));
end

--------------------------------------------------------------------------------
/src/RLRandomWalk.m:
--------------------------------------------------------------------------------
function V = RLRandomWalk(options)
% RLRandomWalk: solves the random-walk prediction problem with a
% reinforcement learning method (namely TD(0))

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    nStates = 7;
    gamma = 1;
    alpha = 0.01;
    nEpisodes = 5000;
    selectedEpisodes = [1 20 200 1000 5000];
    showTitle = 1;
else
    nStates = options.nStates;
    gamma = options.gamma;
    alpha = options.alpha;
    nEpisodes = options.nEpisodes;
    selectedEpisodes = options.selectedEpisodes;
    showTitle = options.showTitle;
end

selectedEpIndex = 1;
% true values of the non-terminal states (for gamma = 1)
Graphs = (1:nStates - 2)/(nStates - 1);
Legends = {'Ideal'};

% value table: terminal states (first and last) have value 0
V = ones(1,nStates) * 0.5; V(1) = 0; V(end) = 0;
for ei = 1:nEpisodes,
    % find the initial state (the middle state)
    if(mod(nStates, 2) == 0)
        s = nStates / 2;
    else
        s = (nStates + 1) / 2;
    end

    while s ~= 1 && s ~= nStates,
        % take action according to the policy (random walk)
        if(rand > 0.5)
            s_next = s + 1;
        else
            s_next = s - 1;
        end

        % set reward: 1 only when the walk exits on the right
        if(s_next == nStates)
            r = 1;
        else
            r = 0;
        end

        % update the value table (V) with TD(0)
        V(s) = V(s) + alpha * ( r + gamma * V(s_next) - V(s));

        s = s_next;
    end
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        Graphs = [Graphs; V(2:nStates - 1)];
        Legends{end + 1} = sprintf('%d', ei);
        selectedEpIndex = selectedEpIndex + 1;
    end
end

V = V(2:nStates - 1);
figure;
plot(1:nStates - 2, Graphs', '.-');
legend(Legends,'Location', 'NorthEastOutside');
if(showTitle == 1),
    title(sprintf('Random Walk for %d nodes, (\\alpha = %3.4f), (\\gamma = %1.1f)', nStates - 2, alpha, gamma));
end
axis([0, nStates-1, 0,1]);
--------------------------------------------------------------------------------
/src/WindyGridWorldQLearning.m:
--------------------------------------------------------------------------------
function WindyGridWorldQLearning(options)
% WindyGridWorldQLearning: implements the windy grid world problem using the
% Q-Learning method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    gamma = 0.9;
    alpha = 0.1;
    epsilon = 0.1;

    gridcols = 10;
    gridrows = 7;
    windpowers = [0 0 0 1 1 1 2 2 1 0];
    fontsize = 16;
    showTitle = 1;

    episodeCount = 1000;
    selectedEpisodes = [20 200 700 1000];

    isKing = 0;
    canHold = 0;

    start.row = 4;
    start.col = 1;
    goal.row = 4;
    goal.col = 8;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;

    gridcols = options.gridcols;
    gridrows = options.gridrows;
    windpowers = options.windpowers;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;

    isKing = options.isKing;
    canHold = options.canHold;

    start = options.start;
    goal = options.goal;
end

selectedEpIndex = 1;
if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else
        a = IntRand(1, actionCount);
    end

    while(PosCmp(curpos, goal) ~= 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, windpowers, gridcols, gridrows);
        if(PosCmp(nextpos, goal) ~= 0), r = -1; else r = 0; end

        % choose a_next from nextpos (epsilon-greedy)
        [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        if(rand <= epsilon) % explore
            a_next = IntRand(1, actionCount);
        end

        % update Q: Q-Learning
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = qmax; % greedy value of nextpos (SARSA would use Q(nextpos.row, nextpos.col, a_next))
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        % follow the greedy policy, capping the path length in case it loops
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, windpowers, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % greedy path from the start state

        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawWindyEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, windpowers, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Windy grid-world Q-Learning - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, windpowers, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold (only available together with king moves)
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
% the wind of the destination column pushes the agent north (row index decreases)
nextPos.row = nextPos.row - windpowers(nextPos.col);
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% returns an integer drawn uniformly from lowerBound..upperBound (inclusive);
% rand lies in the open interval (0, 1), so the "+ 1" is needed to reach upperBound
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
/src/WindyGridWorldSARSA.m:
--------------------------------------------------------------------------------
function WindyGridWorldSARSA(options)
% WindyGridWorldSARSA: implements the windy grid world problem using the
% SARSA method

% You can pass the parameters of the problem through the options
% structure; otherwise, default settings are used for running the program.

% written by: Sina Iravanian - June 2009
% sina@sinairv.com
% Please send your comments or bug reports to the above email address.

if(nargin < 1),
    gamma = 0.9;
    alpha = 0.1;
    epsilon = 0.1;

    gridcols = 10;
    gridrows = 7;
    windpowers = [0 0 0 1 1 1 2 2 1 0];
    fontsize = 16;
    showTitle = 1;

    episodeCount = 1000;
    selectedEpisodes = [20 200 700 1000];

    isKing = 0;
    canHold = 0;

    start.row = 4;
    start.col = 1;
    goal.row = 4;
    goal.col = 8;
else
    gamma = options.gamma;
    alpha = options.alpha;
    epsilon = options.epsilon;

    gridcols = options.gridcols;
    gridrows = options.gridrows;
    windpowers = options.windpowers;
    fontsize = options.fontsize;
    showTitle = options.showTitle;

    episodeCount = options.episodeCount;
    selectedEpisodes = options.selectedEpisodes;

    isKing = options.isKing;
    canHold = options.canHold;

    start = options.start;
    goal = options.goal;
end

selectedEpIndex = 1;
if(isKing ~= 0), actionCount = 8; else actionCount = 4; end
if(canHold ~= 0 && isKing ~= 0), actionCount = actionCount + 1; end

% initialize Q with zeros
Q = zeros(gridrows, gridcols, actionCount);

a = 0; % an invalid action
% loop through episodes
for ei = 1:episodeCount,
    %disp(sprintf('Running episode %d', ei));
    curpos = start;
    nextpos = start;

    % epsilon-greedy choice of the first action
    if(rand > epsilon) % greedy
        [qmax, a] = max(Q(curpos.row,curpos.col,:));
    else
        a = IntRand(1, actionCount);
    end

    while(PosCmp(curpos, goal) ~= 0)
        % take action a, observe r, and nextpos
        nextpos = GiveNextPos(curpos, a, windpowers, gridcols, gridrows);
        if(PosCmp(nextpos, goal) ~= 0), r = -1; else r = 0; end

        % choose a_next from nextpos (epsilon-greedy)
        if(rand > epsilon) % greedy
            [qmax, a_next] = max(Q(nextpos.row,nextpos.col,:));
        else
            a_next = IntRand(1, actionCount);
        end

        % update Q: SARSA (bootstraps from the action actually chosen)
        curQ = Q(curpos.row, curpos.col, a);
        nextQ = Q(nextpos.row, nextpos.col, a_next);
        Q(curpos.row, curpos.col, a) = curQ + alpha*(r + gamma*nextQ - curQ);

        curpos = nextpos; a = a_next;
    end % states in each episode

    % if the current state of the world is going to be drawn ...
    if(selectedEpIndex <= length(selectedEpisodes) && ei == selectedEpisodes(selectedEpIndex))
        curpos = start;
        rows = []; cols = []; acts = [];
        % follow the greedy policy, capping the path length in case it loops
        for i = 1:(gridrows + gridcols) * 10,
            [qmax, a] = max(Q(curpos.row,curpos.col,:));
            nextpos = GiveNextPos(curpos, a, windpowers, gridcols, gridrows);
            rows = [rows curpos.row];
            cols = [cols curpos.col];
            acts = [acts a];

            if(PosCmp(nextpos, goal) == 0), break; end
            curpos = nextpos;
        end % greedy path from the start state

        figure('Name',sprintf('Episode: %d', ei), 'NumberTitle','off');
        DrawWindyEpisodeState(rows, cols, acts, start.row, start.col, goal.row, goal.col, windpowers, gridrows, gridcols, fontsize);
        if(showTitle == 1),
            title(sprintf('Windy grid-world SARSA - episode %d - (\\epsilon: %3.3f), (\\alpha = %3.4f), (\\gamma = %1.1f)', ei, epsilon, alpha, gamma));
        end

        selectedEpIndex = selectedEpIndex + 1;
    end

end % episodes loop

function c = PosCmp(pos1, pos2)
c = pos1.row - pos2.row;
if(c == 0)
    c = c + pos1.col - pos2.col;
end

function nextPos = GiveNextPos(curPos, actionIndex, windpowers, gridCols, gridRows)
nextPos = curPos;
switch actionIndex
    case 1 % east
        nextPos.col = curPos.col + 1;
    case 2 % south
        nextPos.row = curPos.row + 1;
    case 3 % west
        nextPos.col = curPos.col - 1;
    case 4 % north
        nextPos.row = curPos.row - 1;
    case 5 % northeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row - 1;
    case 6 % southeast
        nextPos.col = curPos.col + 1;
        nextPos.row = curPos.row + 1;
    case 7 % southwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row + 1;
    case 8 % northwest
        nextPos.col = curPos.col - 1;
        nextPos.row = curPos.row - 1;
    case 9 % hold (only available together with king moves)
        nextPos = curPos;
    otherwise
        disp(sprintf('invalid action index: %d', actionIndex))
end
if(nextPos.col <= 0), nextPos.col = 1; end
if(nextPos.col > gridCols), nextPos.col = gridCols; end
% the wind of the destination column pushes the agent north (row index decreases)
nextPos.row = nextPos.row - windpowers(nextPos.col);
if(nextPos.row <= 0), nextPos.row = 1; end
if(nextPos.row > gridRows), nextPos.row = gridRows; end

function n = IntRand(lowerBound, upperBound)
% returns an integer drawn uniformly from lowerBound..upperBound (inclusive);
% rand lies in the open interval (0, 1), so the "+ 1" is needed to reach upperBound
n = floor((upperBound - lowerBound + 1) * rand) + lowerBound;

--------------------------------------------------------------------------------
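The core of RLRandomWalk.m is the TD(0) update `V(s) = V(s) + alpha*(r + gamma*V(s_next) - V(s))`. The script below is a minimal, plot-free sketch of that update. It reuses RLRandomWalk.m's defaults: 7 states including the two terminals, alpha = 0.01, gamma = 1 and 5000 episodes. It then compares the learned values with the true values 1/6, 2/6, ..., 5/6. The variable names and the RMS check at the end are illustrative choices, not part of the package.

```matlab
% Minimal TD(0) prediction on the random walk (a sketch; no plotting).
% States 1 and nStates are terminal; every episode starts in the middle state.
nStates   = 7;      % 5 non-terminal nodes plus 2 terminals
alpha     = 0.01;   % constant step size (RLRandomWalk.m's default)
gamma     = 1;      % undiscounted
nEpisodes = 5000;

V = 0.5 * ones(1, nStates);   % initial guesses for the non-terminal states
V([1 nStates]) = 0;           % terminal states are worth 0

for ei = 1:nEpisodes
    s = (nStates + 1) / 2;               % middle state (nStates is odd here)
    while s ~= 1 && s ~= nStates
        if rand > 0.5                    % unbiased random-walk policy
            sNext = s + 1;
        else
            sNext = s - 1;
        end
        r = double(sNext == nStates);    % reward 1 only on the right exit
        % TD(0): move V(s) toward the one-step bootstrapped target
        V(s) = V(s) + alpha * (r + gamma * V(sNext) - V(s));
        s = sNext;
    end
end

trueV    = (1:nStates-2) / (nStates-1);  % 1/6, 2/6, ..., 5/6
learnedV = V(2:nStates-1);
rmsErr   = sqrt(mean((learnedV - trueV).^2));
fprintf('RMS error after %d episodes: %.4f\n', nEpisodes, rmsErr);
```

With a constant step size, the estimates keep fluctuating around the true values rather than settling exactly on them. A smaller alpha reduces that residual noise, but it also slows down learning in the early episodes.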
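The SARSA and Q-learning grid-world scripts share the update `Q(s,a) = curQ + alpha*(r + gamma*nextQ - curQ)`. They differ only in how `nextQ` is chosen:

* SARSA (on-policy) uses `Q(s', a')` for the action `a'` that the epsilon-greedy policy actually picked in the next state.
* Q-learning (off-policy) uses the maximum of `Q(s', :)`.

The sketch below evaluates both targets on made-up numbers, which are hypothetical and not taken from any run. It shows how far the two can diverge when `a'` is an exploratory action.

```matlab
% One TD control update computed two ways on made-up values (a sketch).
r     = -1;          % per-step reward, as in the grid worlds
gamma = 0.9;         % discount factor
alpha = 0.1;         % step size
Qsa   = 0;           % current estimate Q(s, a)
Qnext = [2 -3 5 1];  % hypothetical Q(s', :) for the 4 compass actions
aNext = 2;           % action chosen in s' (here an exploratory one)

% SARSA (on-policy): bootstrap from the action that will actually be taken
sarsaTarget = r + gamma * Qnext(aNext);
QsaSarsa    = Qsa + alpha * (sarsaTarget - Qsa);

% Q-learning (off-policy): bootstrap from the greedy action in s'
qTarget = r + gamma * max(Qnext);
QsaQ    = Qsa + alpha * (qTarget - Qsa);

fprintf('SARSA:      target %.2f, new Q(s,a) %.3f\n', sarsaTarget, QsaSarsa);
fprintf('Q-learning: target %.2f, new Q(s,a) %.3f\n', qTarget, QsaQ);
```

With these numbers, the SARSA target is -3.7 and the Q-learning target is 3.5. Because SARSA's values reflect the exploratory moves it will really make, it tends to prefer safer routes. Cliff walking shows this clearly: SARSA learns a path away from the edge, while Q-learning learns the shortest path along the cliff.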
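The `IntRand` helper in the grid-world scripts is meant to return a uniformly random action index from 1 to `actionCount`. Because `rand` returns values in the open interval (0, 1), the two candidate expressions behave differently:

* `floor((ub - lb)*rand + lb)` can never reach `ub`.
* `floor((ub - lb + 1)*rand) + lb` covers `lb..ub` inclusive.

The sketch below draws many samples from both forms and counts how often each value appears. The sample size is arbitrary. `randi([lb ub])` does the same job in MATLAB releases that provide `randi`.

```matlab
% Compare two ways of drawing a random integer in lb..ub (a sketch).
lb = 1; ub = 4;            % e.g. the four compass actions
nSamples = 10000;

u = rand(1, nSamples);                     % uniform on the open interval (0, 1)
exclusive = floor((ub - lb) * u + lb);     % values lb..ub-1 only
inclusive = floor((ub - lb + 1) * u) + lb; % each of lb..ub with probability 1/(ub-lb+1)

% count how often each value lb..ub was drawn
countsExclusive = arrayfun(@(k) sum(exclusive == k), lb:ub);
countsInclusive = arrayfun(@(k) sum(inclusive == k), lb:ub);
disp([lb:ub; countsExclusive; countsInclusive]);
```

The last column of the exclusive counts stays at zero. In the grid worlds, this means an exploratory step would never pick the highest-numbered action.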