├── README.md
├── conversion
│   ├── Contents.m
│   ├── HTMLequation.m
│   ├── gpmigrate2_2f.m
│   ├── gpmodel2func.m
│   ├── gpmodel2mfile.m
│   ├── gpmodel2struct.m
│   ├── gpmodel2sym.m
│   ├── gpmodelgenes2mfile.m
│   └── processOrgChartJS.m
├── core
│   ├── Contents.m
│   ├── assignadf.m
│   ├── bootsample.m
│   ├── comparemodelsREC.m
│   ├── crossover.m
│   ├── displaystats.m
│   ├── drawtrees.m
│   ├── evalfitness.m
│   ├── evalfitness_par.m
│   ├── extract.m
│   ├── genebrowser.m
│   ├── genefilter.m
│   ├── genes2gpmodel.m
│   ├── getcomplexity.m
│   ├── getdepth.m
│   ├── getnumnodes.m
│   ├── gp_2d_mesh.m
│   ├── gp_toolboxlist.m
│   ├── gpcheck.m
│   ├── gpdefaults.m
│   ├── gpfinalise.m
│   ├── gpinit.m
│   ├── gpinitparallel.m
│   ├── gpmodelfilter.m
│   ├── gpmodelreport.m
│   ├── gpmodelvars.m
│   ├── gppopvars.m
│   ├── gppretty.m
│   ├── gprandom.m
│   ├── gpreformat.m
│   ├── gpresids.m
│   ├── gpsimplify.m
│   ├── gpsym.m
│   ├── gpterminate.m
│   ├── gptic.m
│   ├── gptoc.m
│   ├── gptoolboxcheck.m
│   ├── gptreestructure.m
│   ├── initbuild.m
│   ├── kogene.m
│   ├── mergegp.m
│   ├── mutate.m
│   ├── ndfsort_rank1.m
│   ├── paretoreport.m
│   ├── parseadf.m
│   ├── picknode.m
│   ├── popbrowser.m
│   ├── popbuild.m
│   ├── pref2inf.m
│   ├── regressionErrorCharacteristic.m
│   ├── rungp.m
│   ├── runtree.m
│   ├── scangenes.m
│   ├── selection.m
│   ├── standaloneModelStats.m
│   ├── summary.m
│   ├── tree2evalstr.m
│   ├── treegen.m
│   ├── uniquegenes.m
│   └── updatestats.m
├── examples
│   ├── concrete.mat
│   ├── cubic_config.m
│   ├── demo2data.mat
│   ├── gpdemo1.m
│   ├── gpdemo1_config.m
│   ├── gpdemo2.m
│   ├── gpdemo2_config.m
│   ├── gpdemo3.m
│   ├── gpdemo3_config.m
│   ├── gpdemo4.m
│   ├── gpdemo4_config.m
│   ├── ph2data.mat
│   ├── quartic_fitfun.m
│   ├── ripple_config.m
│   ├── salustowicz1d_config.m
│   └── uball_config.m
├── fitness
│   ├── regressmulti_fitfun.m
│   └── regressmulti_fitfun_validate.m
├── gpl.txt
├── nodefcn
│   ├── add3.m
│   ├── cube.m
│   ├── gauss.m
│   ├── gpand.m
│   ├── gpnot.m
│   ├── gpor.m
│   ├── gth.m
│   ├── iflte.m
│   ├── lth.m
│   ├── maxx.m
│   ├── minx.m
│   ├── mult3.m
│   ├── neg.m
│   ├── negexp.m
│   ├── pdiv.m
│   ├── plog.m
│   ├── psqroot.m
│   ├── square.m
│   ├── step.m
│   └── thresh.m
├── rules
│   ├── gprule_no_nested_adfs.m
│   └── gprule_template.m
└── user
    └── gp_userfcn.m

/README.md:
-------------------------------------------------------------------------------- 1 | # GPTIPS2F Toolbox for MATLAB # 2 | 3 | GPTIPS2F is the evolution of the second version of the MATLAB toolbox developed by Dr. Dominic Searson. 4 | 5 | Since 2017, a fork of the toolbox (‘2F’) has been maintained by Dr. Aleksei Tepljakov, TalTech University, https://taltech.ee/ 6 | 7 | **Note that the version number of GPTIPS2F was reset to 1.0 on Dec 6, 2018.** 8 | 9 | ## New features ## 10 | 11 | * Preset Random Constants (PRCs): a subset of Ephemeral Random Constants (ERCs), the difference being that PRCs are randomly chosen from a predefined set. 12 | * Automatically Defined Functions (ADFs): templates that are seeded into the initial population and can also arise naturally during mutation. 13 | * Evolutionary rules: define rules for discarding individuals with certain undesirable traits or for significantly decreasing their chances of staying in the population (alpha testing feature). 14 | 15 | ## Installation ## 16 | 17 | Run the code `addpath(genpath(gptips2f))`, where `gptips2f` is a character vector holding the path to the toolbox folder, and then save the resulting path with `savepath`. 18 | 19 | ## Documentation ## 20 | Please refer to [the original docs](https://sites.google.com/site/gptips4matlab/). Updated docs will be published in due time. Meanwhile, read on to learn how to use the ADF feature. 21 | 22 | ## A brief on how to use ADFs ## 23 | 24 | Contrary to the original meaning of the term, ADFs (automatically defined functions) are defined manually by the user prior to the regression procedure. Their current purpose is to impose some structure on the otherwise unstructured modeling problem. Think of them as intelligent design elements: those that would be most useful in a certain context. 25 | 26 | Here is how to set up ADFs in MATLAB code in the gp config file: 27 | ``` 28 | % ADF expressions. Can also contain any custom function.
29 | gp.nodes.adf.expr = {'$1+?1*$2', '$1*(#1+$2)'}; 30 | 31 | % For terminals (the arguments of the ADF functions) use 32 | % $1, $2, ... to define normal terminals 33 | % ?1, ?2, ... to mark a terminal where an ERC should go 34 | % #1, #2, ... to mark a terminal where a PRC should go 35 | % Don't forget to enumerate these correctly! Counters are separate for each ADF 36 | % (see example above) 37 | 38 | % Assign names to ADFs 39 | gp.nodes.adf.name = {'simple_rnd_sum', 'simple_rnd_prod'}; 40 | 41 | % Predefined random constants (PRC) settings 42 | 43 | % If a PRC is generated (see above), only the entries in this set are used 44 | gp.nodes.pconst.set = [0.1 0.2 0.3 0.4 0.5]; 45 | 46 | % General probability of generating a PRC (does not affect "#nn" terminals) 47 | gp.nodes.pconst.p_PRC = 0.25; 48 | 49 | % Probability of generating an ADF 50 | gp.nodes.adf.p_gen = 0.25; 51 | 52 | % Enforce structure on mutation 53 | gp.nodes.adf.arg_force = true; 54 | ``` 55 | 56 | ## A brief on how to use Rules ## 57 | 58 | Evolutionary rules allow you to either discard individuals having certain traits or otherwise decrease their chances of survival. These traits must be extracted from symbolic gene expressions and tested in user-defined rule functions. An example of how to structure such a rule function can be found in the `rules` folder under the name `gprule_template.m`. 59 | 60 | There is also an implemented rule called `gprule_no_nested_adfs.m` available in the same folder. This rule removes from the population any individual that has genes containing nested ADFs. This is useful for a certain class of regression problems.
In the config function, this rule is set up as follows: 61 | ``` 62 | % Evolutionary rules 63 | gp.evolution.rules.use = true; 64 | gp.evolution.rules.sets = {{@gprule_no_nested_adfs, []}}; 65 | ``` 66 | 67 | **Note that this is an alpha feature.** -------------------------------------------------------------------------------- /conversion/Contents.m: -------------------------------------------------------------------------------- 1 | % GPTIPS2F 2 | % Symbolic Data Mining for MATLAB evolved 3 | % (c) Aleksei Tepljakov, 2017-... 4 | % (c) Dominic Searson 2009-2015 5 | % 6 | % Files 7 | % gpmodel2func - Converts a multigene symbolic regression model to an anonymous function and returns the function handle. 8 | % gpmodel2mfile - Converts a multigene regression model to a standalone M file. 9 | % gpmodel2struct - Create a struct describing a multigene regression model. 10 | % gpmodel2sym - Create a simplified Symbolic Math object for a multigene symbolic regression model. 11 | % gpmodelfilter - Object to filter a population of multigene symbolic regression models. 12 | % gpmodelgenes2mfile - Converts individual genes of a multigene symbolic regression model to a standalone M file. 13 | % HTMLequation - Returns an HTML formatted multigene regression model equation. 14 | % processOrgChartJS - Writes Google org chart JavaScript for genes to an existing HTML file. 15 | -------------------------------------------------------------------------------- /conversion/HTMLequation.m: -------------------------------------------------------------------------------- 1 | function strOut = HTMLequation(gp,eqn,numDigits) 2 | %HTMLEQUATION Returns an HTML formatted multigene regression model equation. 3 | % 4 | % HTMLEQN = HTMLEQUATION(GP,EQN) where EQN is a Symbolic Math object 5 | % representing the GPTIPS model and GP is the GPTIPS data structure 6 | % returns a string HTMLEQN containing the equation in HTML format. 
7 | % 8 | % HTMLEQN = HTMLEQUATION(GP,ID) where ID is a numeric model identifier in 9 | % the GP data structure. Returns a string HTMLEQN containing the overall 10 | % model equation in HTML format. 11 | % 12 | % HTMLEQN = HTMLEQUATION(GP,ID, NUMDIGITS) does the same but formats the 13 | % displayed precision to NUMDIGITS where NUMDIGITS >=2. 14 | % 15 | % HTMLEQN = HTMLEQUATION(GP,'best') returns a string HTMLEQN containing 16 | % the 'best' model equation (on the training data) in HTML format. 17 | % 18 | % HTMLEQN = HTMLEQUATION(GP,'valbest') returns a string HTMLEQN 19 | % containing the 'best' model (on the validation data) equation in HTML 20 | % format. 21 | % 22 | % HTMLEQN = HTMLEQUATION(GP,'testbest') returns a string HTMLEQN 23 | % containing the 'best' model (on the test data) equation in HTML format. 24 | % 25 | % Note: 26 | % 27 | % The model equation is not by default returned with full numeric 28 | % precision. 29 | % 30 | % Copyright (c) 2009-2015 Dominic Searson 31 | % 32 | % GPTIPS 2 33 | % 34 | % See also SYM/VPA, GPMODEL2SYM, GPMODELREPORT, PARETOREPORT, GPPRETTY 35 | 36 | strOut = []; 37 | 38 | if nargin < 2 39 | disp('Usage is HTMLEQUATION(GP,EQN)'); 40 | return; 41 | end 42 | 43 | if ~gp.info.toolbox.symbolic() 44 | error('The Symbolic Math Toolbox is required to use this function.'); 45 | end 46 | 47 | if nargin < 3 || isempty(numDigits) 48 | numDigits = 4; 49 | end 50 | 51 | if numDigits < 2 52 | error('The number of digits must be > 1.'); 53 | end 54 | 55 | modelform = true; 56 | 57 | if isa(eqn,'sym') 58 | modelform = false; 59 | elseif isnumeric(eqn) 60 | model = gpmodel2struct(gp,eqn,false,true,false); 61 | elseif strcmpi(eqn,'best') 62 | model = gpmodel2struct(gp,'best',false,true,false); 63 | elseif strcmpi(eqn,'testbest') 64 | model = gpmodel2struct(gp,'testbest',false,true,false); 65 | elseif strcmpi(eqn,'valbest') 66 | model = gpmodel2struct(gp,'valbest',false,true,false); 67 | else %otherwise assume that the 'eqn' is in raw form,
e.g. 'x1' 68 | modelform = false; 69 | end 70 | 71 | if modelform 72 | 73 | if model.valid 74 | eqn = formattedPrecisChar(model.sym,numDigits); 75 | else 76 | disp(['Invalid model specified. Reason: ' model.invalidReason]); 77 | return; 78 | end 79 | 80 | else 81 | eqn = formattedPrecisChar(eqn,numDigits); 82 | end 83 | 84 | pat1 = 'x(\d+)'; %subscript regex; 85 | pat2='\^([-]?\d+(\.\d*)?|\.\d+)'; %decimal numerical power superscript regex; 86 | pat3 = '\^((\()[-]?\d+(/)\d+(\)))'; %fractional numerical power superscript regex; 87 | 88 | strOut = regexprep(eqn,pat1,'x$1'); 89 | strOut = regexprep(strOut,pat2,'$1'); 90 | 91 | %specific replacements for sqrt being represented as '^(1/2)' by symbolic math 92 | %toolbox 93 | strOut = strrep(strOut,'^(1/2)','1/2'); 94 | strOut = strrep(strOut,'^(3/2)','3/2'); 95 | strOut = strrep(strOut,'^(1/4)','1/4'); 96 | strOut = strrep(strOut,'^(1/8)','1/8'); 97 | 98 | %any other numerical fractions 99 | strOut = regexprep(strOut,pat3,'$1'); 100 | 101 | %also, get rid of '*' for display purposes 102 | strOut = strrep(strOut,'*',' '); 103 | 104 | %replace plain versions of user defined variable names with marked up 105 | %versions 106 | for i = 1:numel(gp.nodes.inputs.names) 107 | if ~isempty(gp.nodes.inputs.names{i}) 108 | strOut = strrep(strOut,gp.nodes.inputs.namesPlain{i},gp.nodes.inputs.names{i}); 109 | end 110 | end 111 | 112 | function eqn = formattedPrecisChar(symEq,numDigits) 113 | %gets the char form of the SYM equation with controlled numeric precision. 114 | %Uses MuPAD's Pref::outputDigits setting and may not work properly on old 115 | %Matlab versions 116 | 117 | %exit if a char eqn. 118 | if ~isa(symEq,'sym') 119 | eqn = symEq; 120 | return; 121 | end 122 | 123 | verReallyOld = verLessThan('matlab', '7.7.0'); 124 | 125 | if nargin < 2 || isempty(numDigits) 126 | numDigits = 4; 127 | end 128 | 129 | %get existing value of MuPAD's 'Pref::outputDigits' setting so that it can
131 | userDigits = char(feval(symengine,'Pref::outputDigits')); 132 | 133 | if ~verReallyOld 134 | evalin(symengine,['Pref::outputDigits(' int2str(numDigits) ')']); 135 | end 136 | 137 | %use VPA here (not sure why this is actually necessary having just set 138 | %Pref::outputDigits but it seems to be reqd in some cases) 139 | eqn = char(vpa(symEq,numDigits)); 140 | 141 | if ~verReallyOld 142 | evalin(symengine,['Pref::outputDigits(' userDigits ')']); 143 | end 144 | -------------------------------------------------------------------------------- /conversion/gpmigrate2_2f.m: -------------------------------------------------------------------------------- 1 | function gp = gpmigrate2_2f(gp) 2 | %GPMIGRATE2_2F Migrate parameters of GP structure from V2 to V2F of toolbox 3 | % Some parameters of the GP strucure have changed in the new version of 4 | % the toolbox. Therefore, this function is provided to update those 5 | % parameters. Simply do GP = GPMIGRATE2_2F(GP) to get those parameters 6 | % fixed. 7 | % 8 | % List of parameter changes: 9 | % * Toolboxes are now correctly detected on all computers 10 | % dynamically by means of anonymous function handles. This means that 11 | % when you run GPTIPS 2F on a computer that doesn't have MATLAB's 12 | % Symbolic Math toolbox installed, for instance, and then use the 13 | % resulting GP structure on a computer that does have it installed, 14 | % the toolbox is detected and you can proceed. 15 | 16 | % Correct toolbox check using handles 17 | [gp.info.toolbox.symbolic, gp.info.toolbox.parallel, gp.info.toolbox.stats] = gptoolboxcheck; 18 | 19 | end 20 | 21 | -------------------------------------------------------------------------------- /conversion/gpmodel2func.m: -------------------------------------------------------------------------------- 1 | function f = gpmodel2func(gp,ID) 2 | %GPMODEL2FUNC Converts a multigene symbolic regression model to an anonymous function and returns the function handle. 
3 | % 4 | % F = GPMODEL2FUNC(GP,ID) converts the model identified by numeric 5 | % identifier ID to a standard MATLAB anonymous function with function 6 | % handle F. 7 | % 8 | % F = GPMODEL2FUNC(GP,'best') converts the best model on the training 9 | % data to a MATLAB anonymous function with function handle F. 10 | % 11 | % F = GPMODEL2FUNC(GP,'valbest') converts the best model on the 12 | % validation data (if it exists) to a MATLAB anonymous function with 13 | % function handle F. 14 | % 15 | % F = GPMODEL2FUNC(GP,'testbest') converts the best model on the test 16 | % data (if it exists) to a MATLAB anonymous function with function handle 17 | % F. 18 | % 19 | % F = GPMODEL2FUNC(GP,GPMODEL) operates on the GPMODEL struct 20 | % representing a multigene regression model, i.e. the struct returned by 21 | % the functions GPMODEL2STRUCT or GENES2GPMODEL. 22 | % 23 | % The anonymous function F can then be used to evaluate the model on new 24 | % data. It can also be used in MATLAB visualisation tools such as the 25 | % EZSURF function. 
26 | % 27 | % Copyright (c) 2009-2015 Dominic Searson 28 | % 29 | % GPTIPS 2 30 | % 31 | % See also GPMODEL2SYM, GPMODEL2MFILE, FUNCTION_HANDLE, MATLABFUNCTION, 32 | % EZPLOT, EZCONTOUR, EZSURF 33 | 34 | if ~gp.info.toolbox.symbolic() 35 | error('The Symbolic Math Toolbox is required to use this function.'); 36 | end 37 | 38 | if nargin < 2 39 | disp('Basic usage is F = GPMODEL2FUNC(GP,ID)'); 40 | disp('or F = GPMODEL2FUNC(GP,''best'')'); 41 | disp('or F = GPMODEL2FUNC(GP,''valbest'')'); 42 | disp('or F = GPMODEL2FUNC(GP,''testbest'')'); 43 | return; 44 | end 45 | 46 | if isnumeric(ID) && ( ID < 1 || ID > numel(gp.pop) ) 47 | error('The supplied numerical model identifier ID is not valid.'); 48 | end 49 | 50 | if ~strncmpi('regressmulti', func2str(gp.fitness.fitfun),12) 51 | error('GPMODEL2FUNC may only be used for multigene symbolic regression problems.'); 52 | end 53 | 54 | % If ADFs are used, make them known to this function 55 | if gp.nodes.adf.use, assignadf(gp); end 56 | 57 | s = gpmodel2sym(gp,ID); 58 | 59 | if isempty(s) 60 | error('Not a valid model ID or model selector'); 61 | end 62 | 63 | f = matlabFunction(s); 64 | 65 | % If ADFs are used, rebuild the handle from its string form so that it captures the ADF handles assigned by ASSIGNADF 66 | if gp.nodes.adf.use 67 | assignadf(gp); 68 | f = func2str(f); 69 | f = eval(f); 70 | end 71 | -------------------------------------------------------------------------------- /conversion/gpmodel2mfile.m: -------------------------------------------------------------------------------- 1 | function gpmodel2mfile(gp,ID,filename,useAlias) 2 | %GPMODEL2MFILE Converts a multigene regression model to a standalone M file.
3 | % 4 | % GPMODEL2MFILE(GP,'best') converts the "best" model in the population to 5 | % a function in GPMODEL.M 6 | % 7 | % GPMODEL2MFILE(GP,'best','FILENAME') converts the "best" model in the 8 | % population to a function in FILENAME.M 9 | % 10 | % GPMODEL2MFILE(GP,'valbest','FILENAME') converts the "best" model, as 11 | % evaluated on the holdout validation data (if it exists), to a function 12 | % in FILENAME.M 13 | % 14 | % GPMODEL2MFILE(GP,'testbest','FILENAME') converts the "best" model, as 15 | % evaluated on the test data (if it exists), to a function in FILENAME.M 16 | % 17 | % GPMODEL2MFILE(GP,ID,'FILENAME') converts the model with numeric 18 | % identifier ID in the population to a function in FILENAME.M 19 | % 20 | % GPMODEL2MFILE(GP,ID,'FILENAME',USEALIAS) does the same but uses 21 | % variable aliases (if defined in gp.nodes.inputs.names) if USEALIAS is 22 | % TRUE. 23 | % 24 | % GPMODEL2MFILE(GP,GPMODEL) operates on the GPMODEL struct representing a 25 | % multigene regression model, i.e. the struct returned by the functions 26 | % GPMODEL2STRUCT or GENES2GPMODEL. 27 | % 28 | % Remarks: 29 | % 30 | % This function is designed for use with multigene symbolic regression 31 | % models created with the REGRESSMULTI_FITFUN fitness function (or 32 | % variant thereof having a file name beginning 'REGRESSMULTI'). 33 | % 34 | % Note: 35 | % 36 | % Requires Symbolic Math Toolbox. 
37 | % 38 | % Copyright (c) 2009-2015 Dominic Searson 39 | % 40 | % GPTIPS 2 41 | % 42 | % See also GPMODELGENES2MFILE, GPMODEL2SYM, GPMODELREPORT, 43 | % GPMODEL2STRUCT, GPMODEL2FUNC 44 | 45 | if nargin < 2 46 | disp('Usage is GPMODEL2MFILE(GP,ID) where ID is the numeric identifier of the desired model'); 47 | disp('or GPMODEL2MFILE(GP,''BEST'') to convert the best model of the run'); 48 | disp('or GPMODEL2MFILE(GP,''VALBEST'') to convert the best validation model of the run.'); 49 | disp('or GPMODEL2MFILE(GP,''TESTBEST'') to convert the best test model of the run.'); 50 | return; 51 | end 52 | 53 | if ~strncmpi('regressmulti',func2str(gp.fitness.fitfun),12) 54 | error('GPMODEL2MFILE may only be used for multigene symbolic regression problems.'); 55 | end 56 | 57 | if nargin < 3 || isempty(filename) 58 | filename = 'gpmodel'; 59 | end 60 | 61 | if nargin < 4 || isempty(useAlias) 62 | useAlias = false; 63 | end 64 | 65 | if ~ischar(filename) 66 | error('The filename parameter must be a string.'); 67 | end 68 | 69 | if gp.info.toolbox.symbolic() 70 | 71 | %simplify and get sym object for overall model 72 | exprSym = gpmodel2sym(gp,ID,false,useAlias); 73 | 74 | if isempty(exprSym) 75 | error('Not a valid model ID or model selector'); 76 | end 77 | 78 | %vectorise the whole multigene model and extract to string 79 | vectorisedModelStr = ['ypred=' vectorize(exprSym)]; 80 | 81 | %replace x1,x2,x3 etc with x(:,1), x(:,2), x(:,3) etc. 82 | pat='x(\d+)'; 83 | vectorisedModelStr = regexprep(vectorisedModelStr,pat,'x(:,$1)'); 84 | 85 | if numel(filename) > 2 && strcmp(filename(end-1:end),'.m') 86 | filename = filename(1:end-2); 87 | end 88 | 89 | %open file and write m file header etc 90 | fid = fopen([filename '.m'],'w+'); 91 | 92 | fprintf(fid,['function ypred=' filename '(x)']); 93 | fprintf(fid,'\n'); 94 | fprintf(fid,['%%' upper(filename) ...
95 | ' This model file was automatically generated by the GPTIPS 2 function gpmodel2mfile on ' datestr(now)]); 96 | fprintf(fid,'\n\n'); 97 | 98 | % If ADFs are used, write them as well 99 | if gp.nodes.adf.use 100 | for k=1:length(gp.nodes.adf.name) 101 | fprintf(fid,'%s=%s;',gp.nodes.adf.name{k},gp.nodes.adf.eval{k}); 102 | fprintf(fid,'\n'); 103 | end 104 | fprintf(fid,'\n'); 105 | end 106 | 107 | %write model 108 | fprintf(fid,'%s;',vectorisedModelStr); 109 | fprintf(fid,'\n'); 110 | fclose(fid); 111 | disp(['Model successfully written to ' filename '.m']); 112 | else 113 | error('The Symbolic Math Toolbox is required to use this function.'); 114 | end 115 | -------------------------------------------------------------------------------- /conversion/gpmodel2sym.m: -------------------------------------------------------------------------------- 1 | function [symObj,cellsymObj] = gpmodel2sym(gp,ID,fastMode,useAlias) 2 | %GPMODEL2SYM Create a simplified Symbolic Math object for a multigene symbolic regression model. 3 | % 4 | % SYMOBJ = GPMODEL2SYM(GP,ID) simplifies and gets the symbolic regression 5 | % model SYMOBJ with numeric identifier ID in the GPTIPS data structure GP. 6 | % 7 | % SYMOBJ = GPMODEL2SYM(GP,'best') gets the best model of the run (as 8 | % evaluated on the training data). 9 | % 10 | % SYMOBJ = GPMODEL2SYM(GP,'valbest') gets the model that performed best 11 | % on the validation data (if this data exists). 12 | % 13 | % SYMOBJ = GPMODEL2SYM(GP,'testbest') gets the model that performed best 14 | % on the test data (if this data exists). 15 | % 16 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID) gets the overall symbolic model 17 | % SYMOBJ as well as a cell row array of symbolic objects SYMGENES 18 | % containing the individual genes (and bias term) that comprise the 19 | % overall model. The 1st element of SYMGENES is the bias term. The 20 | % (n+1)th element of SYMGENES is the nth gene of the model.
21 | % 22 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID,FASTMODE) does the same but uses 23 | % a slightly faster simplification method (MuPAD symbolic:simplify 24 | % instead of symbolic:Simplify) 25 | % 26 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID,FASTMODE,USEALIAS) does the same 27 | % but uses aliased variable names (if they exist in 28 | % gp.nodes.inputs.names) in the returned SYM objects when USEALIAS is 29 | % TRUE (default = FALSE) 30 | % 31 | % SYMOBJ = GPMODEL2SYM(GP,GPMODEL) operates on the GPMODEL struct 32 | % representing a multigene regression model, i.e. the struct returned by 33 | % the functions GPMODEL2STRUCT or GENES2GPMODEL. 34 | % 35 | % Remarks: 36 | % 37 | % As of GPTIPS 2, SYM objects are created using the full default 38 | % precision of the Symbolic Math toolbox and numbers are represented 39 | % (except for ERCs) by a ratio of 2 large integers. E.g. consider the 40 | % following model SYM. 41 | % 42 | % (5589981434161623*x53)/(8796093022208*x30*(x35 + x46)) 43 | % 44 | % To display SYM objects to N significant digits, use the VPA and CHAR 45 | % commands, e.g. 46 | % 47 | % VPA(SYMOBJ,N) 48 | % 49 | % For instance, using VPA(SYMOBJ,4) on the above example gives: 50 | % 51 | % (635.5*x53)/(x30*(x35 + x46)) 52 | % 53 | % You can also use PRETTY(VPA(SYMOBJ,4)) to get a more nicely formatted 54 | % display of the model SYM object (this is similar to the command line 55 | % output of the GPPRETTY function). This seems glitchy in some versions 56 | % of MATLAB however. 57 | % 58 | % Further remarks: 59 | % 60 | % This assumes each population member is a multigene regression model, 61 | % i.e. created using the fitness function REGRESSMULTI_FITFUN - or a 62 | % variant thereof with filename beginning REGRESSMULTI. 
63 | % 64 | % (c) Dominic Searson 2009-2015 65 | % 66 | % GPTIPS 2 67 | % 68 | % See also GPPRETTY, GPMODEL2MFILE, GPMODELREPORT, GPMODEL2STRUCT, 69 | % GPMODEL2FUNC 70 | 71 | symObj = [];cellsymObj = []; 72 | 73 | if ~gp.info.toolbox.symbolic() 74 | error('The Symbolic Math Toolbox is required to use this function.'); 75 | end 76 | 77 | if nargin < 2 78 | disp('Basic usage is SYMOBJ = GPMODEL2SYM(GP,ID)'); 79 | disp('or SYMOBJ = GPMODEL2SYM(GP,''best'')'); 80 | disp('or SYMOBJ = GPMODEL2SYM(GP,''valbest'')'); 81 | return; 82 | end 83 | 84 | if nargin < 3 || isempty(fastMode) 85 | fastMode = false; 86 | end 87 | 88 | if nargin < 4 || isempty(useAlias) 89 | useAlias = false; 90 | end 91 | 92 | if isnumeric(ID) && ( ID < 1 || ID > numel(gp.pop) ) 93 | error('The supplied numerical model identifier ID is not valid.'); 94 | end 95 | 96 | if ~strncmpi('regressmulti', func2str(gp.fitness.fitfun),12) 97 | error('GPMODEL2SYM may only be used for multigene symbolic regression problems.'); 98 | end 99 | 100 | try 101 | [symObj, cellsymObj] = gppretty(gp,ID,[],true,fastMode,useAlias); 102 | catch err 103 | error(['Could not get symbolic object(s) for this model. ' err.message]); 104 | end -------------------------------------------------------------------------------- /conversion/gpmodelgenes2mfile.m: -------------------------------------------------------------------------------- 1 | function gpmodelgenes2mfile(gp,ID,filename,useAlias) 2 | %GPMODELGENES2MFILE Converts individual genes of a multigene symbolic regression model to a standalone M file. 3 | % 4 | % GPMODELGENES2MFILE(GP,'best') converts the "best" model, as evaluated 5 | % on the training data, to a standalone function in GPGENESMODEL.M 6 | % 7 | % GPMODELGENES2MFILE(GP,'best','FILENAME') converts the 'best' model, as 8 | % evaluated on the training data, to a standalone function in FILENAME.M.
9 | % 10 | % GPMODELGENES2MFILE(GP,'valbest','FILENAME') converts the 'best' model, 11 | % as evaluated on the validation data, to a function in FILENAME.M 12 | % 13 | % GPMODELGENES2MFILE(GP,'testbest','FILENAME') converts the 'best' model, 14 | % as evaluated on the test data, to a function in FILENAME.M 15 | % 16 | % GPMODELGENES2MFILE(GP,ID,'FILENAME') converts the model with numeric 17 | % identifier ID in the population to a function in FILENAME.M 18 | % 19 | % GPMODELGENES2MFILE(GP,ID,'FILENAME',USEALIAS) does the same but uses 20 | % variable aliases (if defined in gp.nodes.inputs.names) if USEALIAS is 21 | % TRUE. 22 | % 23 | % GPMODELGENES2MFILE(GP,GPMODEL) operates on the GPMODEL struct 24 | % representing a multigene regression model, i.e. the struct returned by 25 | % the functions GPMODEL2STRUCT or GENES2GPMODEL. 26 | % 27 | % Remarks: 28 | % 29 | % This function is designed for use with multigene symbolic models 30 | % created with the REGRESSMULTI_FITFUN fitness function (or variant 31 | % thereof with fitness function filename beginning REGRESSMULTI). The 32 | % difference between this and GPMODEL2MFILE is that the generated model 33 | % file also returns the individual outputs of the separate genes as 34 | % the variable YGENES. 35 | % 36 | % Note: 37 | % 38 | % Requires Symbolic Math Toolbox.
39 | % 40 | % Copyright (c) 2009-2015 Dominic Searson 41 | % 42 | % GPTIPS 2 43 | % 44 | % See also GPMODEL2MFILE, GPMODEL2SYM, GPMODEL2STRUCT, GPMODEL2FUNC 45 | 46 | if nargin < 2 47 | disp('Usage is GPMODELGENES2MFILE(GP,IND) where IND is the population index of the desired individual'); 48 | disp('or GPMODELGENES2MFILE(GP,''BEST'') to convert the best individual of the run'); 49 | disp('or GPMODELGENES2MFILE(GP,''VALBEST'') to convert the best validation individual of the run.'); 50 | return; 51 | end 52 | 53 | if nargin<3 || isempty(filename) 54 | filename = 'gpgenesmodel'; 55 | end 56 | 57 | if nargin<4 || isempty(useAlias) 58 | useAlias = false; 59 | end 60 | 61 | if ~ischar(filename) 62 | error('The filename parameter must be a string.'); 63 | end 64 | 65 | if gp.info.toolbox.symbolic() 66 | 67 | %get model data 68 | disp('Simplifying model...'); 69 | 70 | [~,genes_sym] = gppretty(gp,ID,false,true,false,useAlias); 71 | 72 | pat='x(\d+)'; 73 | numGenes = numel(genes_sym); 74 | vectorModelStr = cell(numGenes,1); 75 | 76 | %loop through the individual genes and vectorise each separately (in 77 | %contrast with gpmodel2mfile which combines genes before simplification 78 | %and vectorisation) 79 | for i=1:numGenes 80 | 81 | %vectorise the current gene and extract to string 82 | vectorModelStr{i}=['ygenes(:,' num2str(i) ')=' vectorize(genes_sym{i})]; 83 | 84 | %replace x1,x2,x3 etc with x(:,1), x(:,2), x(:,3) etc.
85 | vectorModelStr{i} = regexprep(vectorModelStr{i},pat,'x(:,$1)'); 86 | end 87 | 88 | if numel(filename) > 2 && strcmp(filename(end-1:end),'.m') 89 | filename = filename(1:end-2); 90 | end 91 | 92 | %now open file and write m file header etc 93 | fid = fopen([filename '.m'],'w+'); 94 | 95 | fprintf(fid,['function [ypred,ygenes]=' filename '(x)']); 96 | fprintf(fid,'\n'); 97 | fprintf(fid,['%%' upper(filename) ' This model file was automatically generated by the GPTIPS 2 function gpmodelgenes2mfile on ' datestr(now)]); 98 | fprintf(fid,'\n\n'); 99 | 100 | %set up ygenes variable 101 | fprintf(fid, ['\nygenes = zeros(size(x,1),' num2str(numGenes) ');\n']); 102 | 103 | %print line to evaluate each gene 104 | for i=1:numGenes 105 | fprintf(fid,'%s;',vectorModelStr{i}); 106 | fprintf(fid,'\n'); 107 | end 108 | 109 | %print line to get overall prediction ypred 110 | fprintf(fid, '\nypred = sum(ygenes,2);'); 111 | fclose(fid); 112 | 113 | disp(['Model successfully written to ' filename '.m']); 114 | else 115 | error('The Symbolic Math Toolbox is required to use this function.'); 116 | end -------------------------------------------------------------------------------- /conversion/processOrgChartJS.m: -------------------------------------------------------------------------------- 1 | function processOrgChartJS(fid,gp,gpmodel) 2 | %PROCESSORGCHARTJS Writes Google org chart JavaScript for genes to an existing HTML file.
8 | % 9 | % Copyright (c) 2009-2015 Dominic Searson 10 | % 11 | % GPTIPS 2 12 | % 13 | % See also GPTREESTRUCTURE, GPMODELREPORT, DRAWTREES 14 | 15 | %loop through genes and populate a data table for each 16 | for n=1:gpmodel.genes.num_genes 17 | treestr = gpmodel.genes.geneStrs{n}; 18 | treestruct = gptreestructure(gp,treestr); 19 | numNodes = length(treestruct); 20 | 21 | %set up data table to hold node info 22 | fprintf(fid,'\n'); 40 | end -------------------------------------------------------------------------------- /core/Contents.m: -------------------------------------------------------------------------------- 1 | % GPTIPS2 2 | % Symbolic Data Mining for MATLAB evolved 3 | % (c) Aleksei Tepljakov 2017-... 4 | % (c) Dominic Searson 2009-2015 5 | % 6 | % Files 7 | % bootsample - Get an index vector to sample with replacement a data matrix X. 8 | % comparemodelsREC - Graphical REC performance curves for between 1 and 5 multigene models. 9 | % crossover - Sub-tree crossover of encoded tree expressions to produce 2 new ones. 10 | % displaystats - Displays run stats periodically. 11 | % drawtrees - Draws the tree structure(s) of an individual in a web browser. 12 | % evalfitness - Calls the user specified fitness function. 13 | % evalfitness_par - Calls the user specified fitness function (parallel version). 14 | % extract - Extract a subtree from an encoded tree expression. 15 | % genebrowser - Visually analyse unique genes in a population and identify horizontal bloat. 16 | % genefilter - Removes highly correlated genes from a unique GENES struct. 17 | % genes2gpmodel - Create a data structure representing a multigene symbolic regression model from the specified gene list. 18 | % getcomplexity - Returns the expressional complexity of an encoded tree or a cell array of trees. 19 | % getdepth - Returns the tree depth of an encoded tree expression. 
20 | % getnumnodes - Returns the number of nodes in an encoded tree expression or the total node count for a cell array of expressions. 21 | % gp_2d_mesh - Creates new training matrix containing pairwise values of all x1 and x2. 22 | % gpcheck - Perform pre-run error checks. 23 | % gpdefaults - Initialises the GPTIPS struct by creating default parameter values. 24 | % gpfinalise - Finalises a run. 25 | % gpinit - Initialises a run. 26 | % gpinitparallel - Initialise the Parallel Computing Toolbox. 27 | % gpmodelfilter - Object to filter a population of multigene symbolic regression models. 28 | % gpmodelreport - Generate an HTML report on the specified multigene regression model. 29 | % gpmodelvars - Display the frequency of input variables present in the specified model. 30 | % gppopvars - Display frequency of the input variables present in models in the population. 31 | % gppretty - Simplify and prettify a multigene symbolic regression model. 32 | % gprandom - Sets random number generator seed according to system clock or user seed. 33 | % gpreformat - Reformats encoded trees so that the Symbolic Math toolbox can process them properly. 34 | % gpsimplify - Simplify SYM expressions in a less glitchy way than SIMPLIFY or SIMPLE. 35 | % gpterminate - Check for early termination of run. 36 | % gptic - Updates the running time of this run. 37 | % gptoc - Updates the running time of this run. 38 | % gptoolboxcheck - Checks if certain toolboxes are installed and licensed. 39 | % gptreestructure - Create cell array containing tree structure connectivity and label information for an encoded tree expression. 40 | % initbuild - Generate an initial population of GP individuals. 41 | % kogene - Knock out genes from a cell array of tree expressions. 42 | % mergegp - Merges two GP population structs into a new one. 43 | % mutate - Mutate an encoded symbolic tree expression. 
44 | % ndfsort_rank1 - Fast non dominated sorting algorithm for 2 objectives only - returns only rank 1 solutions. 45 | % paretoreport - Generate an HTML performance/complexity report on the Pareto front of the population. 46 | % picknode - Select a node (or nodes) of specified type from an encoded GP expression and return its position. 47 | % popbrowser - Visually browse complexity and performance characteristics of a population. 48 | % popbuild - Build next population of individuals. 49 | % pref2inf - Recursively extract arguments from a prefix expression and convert to infix where possible. 50 | % regressionErrorCharacteristic - Generates REC curve data using actual and predicted output vectors. 51 | % rungp - Runs GPTIPS 2 using the specified configuration file. 52 | % runtree - Run the fitness function on an individual in the current population. 53 | % scangenes - Scan a single multigene individual for all input variables and return a frequency vector. 54 | % selection - Selects an individual from the current population. 55 | % standaloneModelStats - Compute model performance stats for actual and predicted values. 56 | % summary - Plots basic summary information from a run. 57 | % tree2evalstr - Converts encoded tree expressions into math expressions that MATLAB can evaluate directly. 58 | % treegen - Generate a new encoded GP tree expression. 59 | % uniquegenes - Returns a GENES structure containing the unique genes in a population. 60 | % updatestats - Update run statistics. 
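The NDFSORT_RANK1 entry above extracts the rank 1 (non-dominated) models when each individual is scored on two objectives, such as fitness and complexity. Below is a minimal brute-force sketch of that idea. It is not the toolbox's own faster implementation. It assumes both objectives are minimised, and the variable names and example data are hypothetical.

```matlab
% Hypothetical objective values, one row per individual:
% column 1 = fitness (e.g. RMSE), column 2 = complexity.
% Both objectives are assumed to be minimised.
objs = [0.50 10;
        0.40 25;
        0.45 12;
        0.60  8;
        0.40 30;
        0.55 12];

numInds = size(objs,1);
isRank1 = true(numInds,1);

for i = 1:numInds
    % individuals that are no worse than individual i on both objectives
    noWorse = objs(:,1) <= objs(i,1) & objs(:,2) <= objs(i,2);
    % individuals that are strictly better than individual i on at least one objective
    better = objs(:,1) < objs(i,1) | objs(:,2) < objs(i,2);
    % individual i is dominated if any individual meets both conditions
    if any(noWorse & better)
        isRank1(i) = false;
    end
end

rank1Idx = find(isRank1)';  % row vector of non-dominated individuals
disp(rank1Idx);
```

This pairwise check needs O(N^2) comparisons for N individuals. With only two objectives, sorting on one objective first and then scanning the other is a common way to reduce the cost.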
61 | -------------------------------------------------------------------------------- /core/assignadf.m: -------------------------------------------------------------------------------- 1 | function gp = assignadf(gp) 2 | %ASSIGNADF Create ADF function handles in the caller workspace 3 | 4 | % Get both function names and the corresponding evaluation symbols 5 | funNames = gp.nodes.adf.name; 6 | funSymbols = gp.nodes.adf.eval; 7 | 8 | % Assign in caller's workspace 9 | for k=1:length(funNames) 10 | assignin('caller', funNames{k}, str2func(funSymbols{k})); 11 | end 12 | 13 | end 14 | 15 | -------------------------------------------------------------------------------- /core/bootsample.m: -------------------------------------------------------------------------------- 1 | function xind = bootsample(x,sampleSize) 2 | %BOOTSAMPLE Get an index vector to sample with replacement a data matrix X. 3 | % 4 | % XIND = BOOTSAMPLE(X,SAMPLESIZE) samples the data matrix X to return the 5 | % row indices XIND of X. 6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | 11 | xSize = size(x,1); 12 | 13 | %generate the required sample indices and sample 14 | xind = ceil(xSize*rand(sampleSize,1)); 15 | 16 | 17 | -------------------------------------------------------------------------------- /core/comparemodelsREC.m: -------------------------------------------------------------------------------- 1 | function comparemodelsREC(gp,modelList,dataset,showBest,showValBest,showTestBest) 2 | %COMPAREMODELSREC Graphical REC performance curves for between 1 and 5 multigene models. 3 | % 4 | % COMPAREMODELSREC(GP,MODELLIST,DATASET,SHOWBEST,SHOWVALBEST,SHOWTESTBEST) 5 | % generates a graphical REC (regression error characteristic) comparison 6 | % of multigene regression models in GP with the numeric IDs in the row 7 | % vector MODELLIST. 
DATASET is either 'train','test' or 'val' and setting 8 | % SHOWBEST to TRUE also shows the best model in GP (as evaluated on the 9 | % training data) and setting SHOWVALBEST to TRUE shows the best model of 10 | % the run as evaluated on the validation data (if it exists). Similarly, 11 | % setting SHOWTESTBEST to TRUE shows the best model as evaluated on the 12 | % test data (if it exists). 13 | % 14 | % Examples: 15 | % 16 | % COMPAREMODELSREC(GP,[10 19 12]) shows REC curves for models 10, 19 and 17 | % 12 on the training data. 18 | % 19 | % COMPAREMODELSREC(GP,[10 19 12],'test',TRUE) shows REC curves for 20 | % models 10, 19 and 12 on the test data and also shows the 'best' model 21 | % of the run (as evaluated on the training data). 22 | % 23 | % Remarks: 24 | % 25 | % REC curves are similar to receiver operating characteristic (ROC) 26 | % curves for classifiers. The REC curve for a model shows the proportion 27 | % of data (on the Y axis, as a fraction between 0 and 1) that was 28 | % predicted with an accuracy equal to or better than the absolute error 29 | % on the corresponding X axis. Better models lie above and to the left 30 | % of worse models. 31 | % 32 | % Based on the method outlined in "Regression Error Characteristic 33 | % Curves", Jinbo Bi & Kristin P. Bennett, Proceedings of the Twentieth 34 | % International Conference on Machine Learning (ICML-2003), Washington 35 | % DC, 2003.
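The REC definition in the remarks above can be illustrated with a short sketch. For each absolute-error tolerance on the X axis, the Y value is the fraction of points whose absolute error falls within that tolerance. The data below is made up. This is not the toolbox's REGRESSIONERRORCHARACTERISTIC function, and it does not reproduce that function's reference-curve outputs XREF and YREF.

```matlab
% Hypothetical actual and predicted outputs
y     = [1.0; 2.0; 3.0; 4.0];
ypred = [1.1; 1.5; 3.0; 4.4];

% absolute error of each prediction
absErr = abs(y - ypred);

% candidate tolerances (X axis): zero plus each distinct observed error, sorted
tol = unique([0; absErr]);

% accuracy (Y axis): fraction of points predicted within each tolerance
accuracy = zeros(size(tol));
for k = 1:numel(tol)
    accuracy(k) = mean(absErr <= tol(k));
end

% stairs(tol, accuracy) would draw the curve
disp([tol accuracy]);
```

A model whose curve reaches 1 at a smaller tolerance lies further up and to the left, so it is the better model in the sense described above.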
36 | % 37 | % Copyright (c) 2009-2015 Dominic Searson 38 | % 39 | % GPTIPS 2 40 | % 41 | % See also POPBROWSER, GPMODELREPORT, REGRESSIONERRORCHARACTERISTIC 42 | 43 | if nargin < 5 || isempty(showValBest) 44 | showValBest = false; 45 | end 46 | 47 | if nargin < 4 || isempty(showBest) 48 | showBest = false; 49 | end 50 | 51 | if nargin < 6 || isempty(showTestBest) 52 | showTestBest = false; 53 | end 54 | 55 | if nargin < 3 56 | dataset = 'train'; 57 | end 58 | 59 | if nargin < 2 || isempty(modelList) 60 | disp('Usage is COMPAREMODELSREC(GP,MODELLIST)'); 61 | return; 62 | end 63 | 64 | if ~strcmpi(func2str(gp.fitness.fitfun),'regressmulti_fitfun'); 65 | error('This function is only for use on multigene regression models.'); 66 | end 67 | 68 | if isempty(dataset) 69 | dataset = 'train'; 70 | end 71 | 72 | if strcmpi(dataset,'train') 73 | dset = 1; 74 | 75 | elseif strcmpi(dataset,'test') 76 | 77 | if ~isfield(gp.userdata,'xtest') 78 | error('Cannot find testing data.'); 79 | end 80 | dset = 2; 81 | 82 | elseif strcmpi(dataset,'val') 83 | 84 | if ~isfield(gp.userdata,'xval') 85 | error('There is no validation data.'); 86 | end 87 | dset = 3; 88 | 89 | else 90 | error('The dataset must be ''train'', ''test'' or ''val''.'); 91 | end 92 | 93 | if ~isnumeric(modelList) || size(modelList,1) > 1 94 | error('Supplied model list must be a row vector of model indices.'); 95 | end 96 | 97 | numModels = numel(modelList); 98 | 99 | if numModels > 5 100 | error('Maximum number of models to compare is 5.'); 101 | end 102 | 103 | plotdata = struct; 104 | 105 | for i=1:numModels 106 | model = gpmodel2struct(gp,modelList(i),false,false,false); 107 | if ~model.valid 108 | error(['Model ' num2str(modelList(i)) ' in the supplied list is invalid because: ' model.invalidReason]); 109 | end 110 | 111 | if dset == 1 112 | [xdata,ydata,xref,yref] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred); 113 | elseif dset == 2 114 | [xdata,ydata,xref,yref] =
regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred); 115 | else 116 | [xdata,ydata,xref,yref] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred); 117 | end 118 | 119 | plotdata.xref = xref; 120 | plotdata.yref = yref; 121 | plotdata.model(i).xdata = xdata; 122 | plotdata.model(i).ydata = ydata; 123 | end 124 | 125 | %plot 126 | pcol = 'bgrcy'; 127 | recFig = figure('visible','off','name','GPTIPS 2 - REC model comparison','numbertitle','off'); 128 | 129 | plot(plotdata.model(1).xdata,plotdata.model(1).ydata,'b','LineWidth',2); 130 | hold on; 131 | legstr = cell(0); 132 | legstr{1} = ['Model ID: ' num2str(modelList(1))]; 133 | 134 | pre2014b = verLessThan('matlab','8.4'); %R2014b (HG2) 135 | 136 | for i=2:numModels 137 | if pre2014b 138 | plot(plotdata.model(i).xdata,plotdata.model(i).ydata,'LineWidth',2,'Color',pcol(i)); 139 | else 140 | plot(plotdata.model(i).xdata,plotdata.model(i).ydata,'LineWidth',2); 141 | end 142 | legstr{i} = ['Model ID: ' num2str(modelList(i))]; 143 | end 144 | 145 | xlabel('Absolute error'); 146 | ylabel('Accuracy'); 147 | grid on; 148 | 149 | if ~isempty(gp.userdata.name) 150 | dstr = [' Data set: ' gp.userdata.name]; 151 | else 152 | dstr = ''; 153 | end 154 | 155 | if dset == 1 156 | title(['REC model comparison. Training data.' dstr],'interpreter','none','fontWeight','bold'); 157 | elseif dset == 2 158 | title(['REC model comparison. Test data.' dstr],'interpreter','none','fontWeight','bold'); 159 | else 160 | title(['REC model comparison. Validation data.' 
dstr],'interpreter','none','fontWeight','bold'); 161 | end 162 | 163 | %add best and valbest if specified 164 | if showBest 165 | model = gpmodel2struct(gp,'best'); 166 | 167 | if dset == 1 168 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred); 169 | elseif dset == 2 170 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred); 171 | else 172 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred); 173 | end 174 | 175 | plot(xdata,ydata,'k','LineWidth',2); 176 | legstr = horzcat(legstr,'''Best'' model'); 177 | end 178 | 179 | if showValBest && isfield(gp.userdata,'xval') && ~isempty(gp.userdata.xval) 180 | model = gpmodel2struct(gp,'valbest'); 181 | 182 | if dset == 1 183 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred); 184 | elseif dset == 2 185 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred); 186 | else 187 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred); 188 | end 189 | 190 | plot(xdata,ydata,'m','LineWidth',2); 191 | legstr = horzcat(legstr,'''Valbest'' model'); 192 | end 193 | 194 | if showTestBest && isfield(gp.userdata,'xtest') && ~isempty(gp.userdata.xtest) 195 | model = gpmodel2struct(gp,'testbest'); 196 | 197 | if dset == 1 198 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred); 199 | elseif dset == 2 200 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred); 201 | else 202 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred); 203 | end 204 | 205 | plot(xdata,ydata,'g','LineWidth',2); 206 | legstr = horzcat(legstr,'''Testbest'' model'); 207 | end 208 | 209 | legend(legstr);legend boxoff; 210 | hold off; 211 | set(recFig,'visible','on'); 212 | -------------------------------------------------------------------------------- /core/crossover.m: 
-------------------------------------------------------------------------------- 1 | function [son,daughter] = crossover(mum,dad,gp) 2 | %CROSSOVER Sub-tree crossover of encoded tree expressions to produce 2 new ones. 3 | % 4 | % [SON,DAUGHTER] = CROSSOVER(MUM,DAD,GP) uses standard subtree crossover 5 | % on the expressions MUM and DAD to produce the offspring expressions SON 6 | % and DAUGHTER. 7 | % 8 | % Copyright (c) 2009-2015 Dominic Searson 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also MUTATE 13 | 14 | %select random crossover nodes in mum and dad expressions 15 | m_position = picknode(mum,0,gp); 16 | d_position = picknode(dad,0,gp); 17 | 18 | %extract main and subtree expressions 19 | [m_main,m_sub] = extract(m_position,mum); 20 | [d_main,d_sub] = extract(d_position,dad); 21 | 22 | %combine to form 2 new GP trees 23 | daughter = strrep(m_main,'$',d_sub); 24 | son = strrep(d_main,'$',m_sub); -------------------------------------------------------------------------------- /core/displaystats.m: -------------------------------------------------------------------------------- 1 | function displaystats(gp) 2 | %DISPLAYSTATS Displays run stats periodically. 3 | % 4 | % DISPLAYSTATS(GP) updates the screen with run stats at the interval 5 | % specified in GP.RUNCONTROL.VERBOSE. 6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also UPDATESTATS 12 | 13 | %only display info if required 14 | if ~gp.runcontrol.verbose || gp.runcontrol.quiet || ...
15 | mod(gp.state.count-1,gp.runcontrol.verbose) 16 | return 17 | end 18 | 19 | gen = gp.state.count - 1; 20 | 21 | %multiple independent run count display 22 | if gp.runcontrol.runs > 1 && gen == 0 23 | disp(['Run ' int2str(gp.state.run) ' of ' int2str(gp.runcontrol.runs)]); 24 | end 25 | 26 | disp(['Generation ' num2str(gen)]); 27 | disp(['Best fitness: ' num2str(gp.results.best.fitness)]); 28 | disp(['Mean fitness: ' num2str(gp.state.meanfitness)]); 29 | 30 | if gp.fitness.complexityMeasure 31 | disp(['Best complexity: ' num2str(gp.results.best.complexity)]); 32 | else 33 | disp(['Best node count: ' num2str(gp.results.best.nodecount)]); 34 | end 35 | 36 | %display inputs in "best training" individual, if enabled. 37 | if gp.runcontrol.showBestInputs 38 | hitvec = gpmodelvars(gp,'best'); 39 | inputs = num2str(find(hitvec)); 40 | inputs = strtrim(cellstr(inputs)); 41 | pat = '(\d+)'; 42 | inputs = regexprep(inputs,pat,'x$1 '); %note: trailing space essential 43 | inputs = [inputs{:}]; % converts char cell array to a single string 44 | disp(['Inputs in best individual: ' inputs]); 45 | end 46 | 47 | %display inputs in "best validation" individual, if enabled. 48 | if gp.runcontrol.showValBestInputs && isfield(gp.userdata,'xval') ... 49 | && ~isempty(gp.userdata.xval) && isfield(gp.results,'valbest') 50 | hitvec = gpmodelvars(gp,'valbest'); 51 | inputs = num2str(find(hitvec)); 52 | inputs = strtrim(cellstr(inputs)); 53 | pat='(\d+)'; 54 | inputs = regexprep(inputs,pat,'x$1 '); %note: trailing space essential 55 | inputs = [inputs{:}]; % converts char cell array to a single string 56 | disp(['Inputs in best validation individual: ' inputs]); 57 | end 58 | 59 | disp(' '); -------------------------------------------------------------------------------- /core/evalfitness.m: -------------------------------------------------------------------------------- 1 | function gp = evalfitness(gp) 2 | %EVALFITNESS Calls the user specified fitness function. 
3 | % 4 | % GP = EVALFITNESS(GP) evaluates the fitnesses of individuals stored 5 | % in the GP structure and updates various other fields of GP accordingly. 6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also TREE2EVALSTR, EVALFITNESS_PAR 12 | 13 | %check parallel mode. 14 | if gp.runcontrol.parallel.enable && gp.runcontrol.parallel.ok 15 | gp = evalfitness_par(gp); 16 | return; 17 | 18 | %regular version 19 | else 20 | 21 | % If ADFs are used, make them known to this function 22 | if gp.nodes.adf.use, assignadf(gp); end 23 | 24 | for i = 1:gp.runcontrol.pop_size 25 | 26 | gp.state.current_individual = i; 27 | 28 | %retrieve values if cached 29 | if gp.runcontrol.usecache && gp.fitness.cache.isKey(i) 30 | cache = gp.fitness.cache(i); 31 | gp.fitness.complexity(i,1) = cache.complexity; 32 | gp.fitness.values(i,1) = cache.value; 33 | gp.fitness.returnvalues{i,1} = cache.returnvalues; 34 | 35 | else 36 | %preprocess cell array of string expressions into a form that 37 | %Matlab can evaluate 38 | evalstr = tree2evalstr(gp.pop{i},gp); 39 | 40 | %store complexity of individual (either number of nodes or tree 41 | %expressional complexity) 42 | if gp.fitness.complexityMeasure 43 | gp.fitness.complexity(i,1) = getcomplexity(gp.pop{i}); 44 | else 45 | gp.fitness.complexity(i,1) = getnumnodes(gp.pop{i}); 46 | end 47 | 48 | [fitness,gp] = feval(gp.fitness.fitfun,evalstr,gp); 49 | gp.fitness.values(i) = fitness; 50 | 51 | end 52 | end 53 | end -------------------------------------------------------------------------------- /core/evalfitness_par.m: -------------------------------------------------------------------------------- 1 | function gp = evalfitness_par(gp) 2 | %EVALFITNESS_PAR Calls the user specified fitness function (parallel version). 3 | % 4 | % GP = EVALFITNESS_PAR(GP) evaluates the fitnesses of individuals 5 | % stored in the GP structure and updates various other fields of GP 6 | % accordingly.
7 | % 8 | % This has the same functionality as EVALFITNESS but makes use of the 9 | % Mathworks Parallel Computing Toolbox to distribute the fitness 10 | % computations across multiple cores. 11 | % 12 | % Copyright (c) 2009-2015 Dominic Searson 13 | % 14 | % GPTIPS 2 15 | % 16 | % See also TREE2EVALSTR, EVALFITNESS 17 | 18 | popSize = gp.runcontrol.pop_size; 19 | evalstrs = cell(popSize,1); 20 | complexityMeasure = gp.fitness.complexityMeasure; 21 | complexities = zeros(popSize,1); 22 | fitfun = gp.fitness.fitfun; 23 | fitvals = zeros(popSize,1); 24 | returnvals = cell(popSize,1); 25 | 26 | if gp.runcontrol.usecache 27 | usecache = true; 28 | else 29 | usecache = false; 30 | end 31 | 32 | % If ADFs are used, make them known to this function 33 | if gp.nodes.adf.use, assignadf(gp); end 34 | 35 | parfor i = 1:popSize; 36 | 37 | %assign copy of gp inside parfor loop to a temp struct 38 | tempgp = gp; 39 | 40 | %update state of temp variable to index of the individual that is about to 41 | %be evaluated 42 | tempgp.state.current_individual = i; 43 | 44 | if usecache && tempgp.fitness.cache.isKey(i) 45 | 46 | cache = tempgp.fitness.cache(i); 47 | complexities(i) = cache.complexity; 48 | fitvals(i) = cache.value; 49 | returnvals{i} = cache.returnvalues; 50 | 51 | else 52 | 53 | %process coded trees into evaluable matlab expressions 54 | evalstrs{i} = tree2evalstr(tempgp.pop{i},gp); 55 | 56 | %store complexity of individual (either number of nodes or tree 57 | % "expressional complexity"). 
58 | if complexityMeasure 59 | complexities(i) = getcomplexity(tempgp.pop{i}); 60 | else 61 | complexities(i) = getnumnodes(tempgp.pop{i}); 62 | end 63 | 64 | %evaluate gp individual using fitness function 65 | [fitness,tempgp] = feval(fitfun,evalstrs{i},tempgp); 66 | returnvals{i} = tempgp.fitness.returnvalues{i}; 67 | fitvals(i) = fitness; 68 | end 69 | end %end of parfor loop 70 | 71 | %attach returned values to original GP structure 72 | gp.fitness.values = fitvals; 73 | gp.fitness.returnvalues = returnvals; 74 | gp.fitness.complexity = complexities; 75 | gp.state.current_individual = popSize; -------------------------------------------------------------------------------- /core/extract.m: -------------------------------------------------------------------------------- 1 | function [mainTree,subTree] = extract(index,parentExpr) 2 | %EXTRACT Extract a subtree from an encoded tree expression. 3 | % 4 | % [MAINTREE,SUBTREE] = EXTRACT(INDEX,PARENTEXPR) 5 | % 6 | % Input args: 7 | % PARENTEXPR (the parent string expression) 8 | % INDEX (the index in PARENTEXPR of the root node of the subtree to be 9 | % extracted) 10 | % 11 | % Output args: 12 | % MAINTREE (PARENTEXPR with the removed subtree replaced by '$') 13 | % SUBTREE (the extracted subtree) 14 | % 15 | % Copyright (c) 2009-2015 Dominic Searson 16 | % 17 | % GPTIPS 2 18 | % 19 | % See also PICKNODE, GETDEPTH, GETCOMPLEXITY 20 | 21 | cnode = parentExpr(index); 22 | iplus = index + 1; 23 | iminus = index - 1; 24 | endpos = numel(parentExpr); 25 | 26 | if cnode == 'x' %extracting an input terminal (x1, x2 etc.) 
27 | 28 | section = parentExpr(iplus:endpos); 29 | inp_comma_ind = strfind(section,','); 30 | inp_brack_ind = strfind(section,')'); 31 | 32 | %if none found then string must consist of single input 33 | if isempty(inp_brack_ind) && isempty(inp_comma_ind) 34 | mainTree = '$'; 35 | subTree = parentExpr; 36 | else 37 | inp_ind = sort([inp_brack_ind inp_comma_ind]); 38 | final_ind = inp_ind(1) + index; 39 | subTree = parentExpr(index:final_ind-1); 40 | mainTree = [parentExpr(1:iminus) '$' parentExpr(final_ind:endpos)]; 41 | end 42 | 43 | return 44 | 45 | elseif cnode == '[' %ERC 46 | 47 | cl_sbr = strfind(parentExpr(iplus:endpos),']'); 48 | final_ind = cl_sbr(1)+index; 49 | subTree = parentExpr(index:final_ind); 50 | mainTree = [parentExpr(1:iminus) '$' parentExpr(final_ind+1:endpos)]; 51 | return 52 | 53 | elseif cnode=='?' % ERC token 54 | subTree = cnode; 55 | mainTree = parentExpr; 56 | mainTree(index) = '$'; 57 | return 58 | 59 | elseif cnode=='#' % PRC token 60 | subTree = cnode; 61 | mainTree = parentExpr; 62 | mainTree(index) = '$'; 63 | return 64 | 65 | else %otherwise extract a tree with a function node as root 66 | 67 | %subtree defined when number open brackets=number of closed brackets 68 | search_seg = parentExpr(index:endpos); 69 | 70 | %get indices of open brackets 71 | op = strfind(search_seg,'('); 72 | cl = strfind(search_seg,')'); 73 | 74 | %compare indices to determine point where num_open=num_closed 75 | tr_op = op(2:end); 76 | l_tr_op = numel(tr_op); 77 | 78 | hibvec = tr_op-cl(1:l_tr_op); 79 | 80 | cl_ind = find(hibvec > 0); 81 | 82 | if isempty(cl_ind) 83 | j = cl(numel(op)); 84 | else 85 | cl_ind = cl_ind(1); 86 | j = cl(cl_ind); 87 | end 88 | 89 | subTree = search_seg(1:j); 90 | mainTree = [parentExpr(1:iminus) '$' parentExpr(j+index:endpos)]; 91 | return 92 | end -------------------------------------------------------------------------------- /core/genefilter.m: -------------------------------------------------------------------------------- 
1 | function genes = genefilter(genes,corrThresh) 2 | %GENEFILTER Removes highly correlated genes from a unique GENES struct. 3 | % 4 | % GENESFILT = GENEFILTER(GENES) returns a unique genes structure 5 | % GENESFILT with the more complex gene of any highly correlated pair of 6 | % genes removed from the GENES structure. Hence, the more complex of any 7 | % pair of genes whose outputs (as evaluated on the training data) have a 8 | % correlation of >= (1 - CORRTHRESH) is removed. 9 | % 10 | % This has the effect of reducing the size of the 'gene space' whilst 11 | % leaving the predictive ability of the remaining genes largely 12 | % unchanged. The default value of CORRTHRESH is 0.001. 13 | % 14 | % GENESFILT = GENEFILTER(GENES,CORRTHRESH) does the same but CORRTHRESH 15 | % is the user supplied correlation threshold. 16 | % 17 | % Remarks: 18 | % 19 | % Prior to using this function, the GENES structure should be generated 20 | % using the UNIQUEGENES function. 21 | % 22 | % Copyright (c) 2009-2015 Dominic Searson 23 | % 24 | % GPTIPS 2 25 | % 26 | % See also UNIQUEGENES, GENEBROWSER, GPMODELFILTER 27 | 28 | if nargin < 1 29 | error('Basic usage is GENEFILTER(GENES)'); 30 | end 31 | 32 | if ~gptoolboxcheck 33 | error('The Symbolic Math Toolbox is required to use this function.'); 34 | end 35 | 36 | %set the default correlation threshold 37 | %(i.e. when pairs of genes that have a correlation > (1-threshold) 38 | %then the most complex gene in the pair is removed. 39 | if nargin < 2 || isempty(corrThresh) 40 | corrThresh = 0.001; 41 | end 42 | 43 | disp(['Filtering genes with correlation threshold ' num2str(corrThresh)]); 44 | 45 | %get abs. correlation matrix of gene outputs 46 | c = abs(corrcoef(genes.geneOutputsTrain)); 47 | 48 | %loop pairwise through genes and flag correlates for removal. 49 | %when correlated then always remove the more complex of the genes.
50 | removals = false(genes.numUniqueGenes,1); 51 | 52 | for i=1:genes.numUniqueGenes 53 | 54 | %if already removed then skip 55 | if removals(i) 56 | continue; 57 | end 58 | 59 | %flag matches for ith gene 60 | matches = find( c(:,i) >= (1 - corrThresh) ); 61 | 62 | %remove the identity match 63 | ident = (matches == i); 64 | matches(ident) = []; 65 | 66 | %if no other matches then jump to next 67 | if isempty(matches) 68 | continue; 69 | end 70 | 71 | %loop through matches for ith gene and get complexities for each match 72 | identComplexity = genes.complexity(i); 73 | for j=1:numel(matches) 74 | 75 | complexity = genes.complexity(matches(j)); 76 | 77 | %flag for removal jth gene if complexity of jth is higher 78 | %flag for removal ith gene otherwise 79 | if identComplexity <= complexity 80 | removals(matches(j)) = true; 81 | else 82 | removals(i) = true; 83 | end 84 | 85 | end 86 | 87 | end 88 | 89 | %tidy up GENES structure 90 | genes.filtered = corrThresh; 91 | genes.numUniqueGenes = genes.numUniqueGenes-sum(removals); 92 | remain = ~removals; 93 | genes.rtrain = genes.rtrain(remain); 94 | genes.complexity = genes.complexity(remain); 95 | genes.uniqueGenesSym = genes.uniqueGenesSym(remain); 96 | genes.uniqueGenesCoded = genes.uniqueGenesCoded(remain); 97 | genes.uniqueGenesDecoded = genes.uniqueGenesDecoded(remain); 98 | genes.geneOutputsTrain = genes.geneOutputsTrain(:,remain); 99 | 100 | if ~isempty(genes.geneOutputsTest) 101 | genes.geneOutputsTest = genes.geneOutputsTest(:,remain); 102 | end 103 | 104 | if ~isempty(genes.geneOutputsVal) 105 | genes.geneOutputsVal = genes.geneOutputsVal(:,remain); 106 | end 107 | 108 | disp(['Number of genes removed : ' int2str(sum(removals))]); 109 | disp(['Number of genes retained: ' int2str(sum(remain))]); 110 | 111 | genes.source = 'genefilter'; -------------------------------------------------------------------------------- /core/genes2gpmodel.m: 
-------------------------------------------------------------------------------- 1 | function gpmodel = genes2gpmodel(gp,genes,uniqueGeneList,tbxStats,createSyms,modelStruc,fastSymMode) 2 | %GENES2GPMODEL Create a data structure representing a multigene symbolic regression model from the specified gene list. 3 | % 4 | % GPMODEL = GENES2GPMODEL(GP,GENES,UNIQUEGENELIST) creates a multigene 5 | % model GPMODEL (functionally identical to that obtained using the 6 | % GPMODEL2STRUCT function) using the list of genes in UNIQUEGENELIST 7 | % where the unique genes are contained within the GENES data structure. 8 | % The list consists of numeric unique gene IDs, e.g. [77 22 99 1]. 9 | % 10 | % The 'optimal' gene weighting coefficients are computed using a least 11 | % squares procedure on the training data. 12 | % 13 | % Note: 14 | % 15 | % The GENES data structure must be first obtained using the UNIQUEGENES 16 | % function. See GPMODEL2STRUCT for more details about the GPMODEL data 17 | % structure returned. 18 | % 19 | % Copyright (c) 2009-2015 Dominic Searson 20 | % 21 | % GPTIPS 2 22 | % 23 | % See also UNIQUEGENES, GPMODEL2STRUCT, GENEBROWSER, GENEFILTER 24 | 25 | if nargin < 3 26 | error('Usage is GENES2GPMODEL(GP,GENES,UNIQUEGENELIST)'); 27 | end 28 | 29 | if nargin < 7 || isempty(fastSymMode) 30 | fastSymMode = false; 31 | end 32 | 33 | %scan model genes for input frequency, depth, complexity? (default = yes). 34 | %(setting to false is a bit quicker but doesn't compute the structural info) 35 | if nargin < 6 || isempty(modelStruc) 36 | modelStruc = true; 37 | end 38 | 39 | %create symbolic objects of overall model and genes? (default = yes). 40 | %(setting to false is significantly quicker but you don't get the symbolic 41 | %math object for each gene) 42 | if nargin < 5 || isempty(createSyms) 43 | createSyms = true; 44 | end 45 | 46 | %compute statistics toolbox stats? (default = no).
47 | if nargin < 4 || isempty(tbxStats) 48 | tbxStats = false; 49 | end 50 | 51 | %remove duplicates in list 52 | uniqueGeneList = unique(uniqueGeneList); 53 | noGene = (uniqueGeneList == 0); 54 | uniqueGeneList(noGene) = []; 55 | 56 | %get gene encodings from genes structure 57 | geneStrs = genes.uniqueGenesCoded(uniqueGeneList)'; 58 | 59 | %create gpmodel struct 60 | gpmodel = gpmodel2struct(gp,geneStrs,tbxStats,createSyms,modelStruc,fastSymMode); 61 | gpmodel.source = 'genes2gpmodel'; -------------------------------------------------------------------------------- /core/getcomplexity.m: -------------------------------------------------------------------------------- 1 | function comp = getcomplexity(expr) 2 | %GETCOMPLEXITY Returns the expressional complexity of an encoded tree or a cell array of trees. 3 | % 4 | % COMP = GETCOMPLEXITY(EXPR) finds the expressional complexity COMP of 5 | % the encoded GPTIPS expression EXPR or cell array of expressions EXPR. 6 | % 7 | % Remarks: 8 | % 9 | % "Expressional Complexity" is defined as the sum of nodes of all 10 | % sub-trees within a tree (i.e. as defined by Guido F. Smits and Mark 11 | % Kotanchek) in "Pareto-front exploitation in symbolic regression", pp. 12 | % 283 - 299, Genetic Programming Theory and Practice II, 2004. 13 | % 14 | % Copyright (c) 2009-2015 Dominic Searson 15 | % 16 | % GPTIPS 2 17 | % 18 | % See also GETDEPTH, GETNUMNODES 19 | 20 | if isa(expr,'char') 21 | 22 | comp = getcomp(expr); 23 | return; 24 | 25 | elseif iscell(expr) 26 | 27 | numexpr = numel(expr); 28 | 29 | if numexpr < 1 30 | error('Cell array must contain at least one valid symbolic expression'); 31 | else 32 | comp = 0; 33 | for i=1:numexpr 34 | comp = comp + getcomp(expr{i}); 35 | end 36 | end 37 | else 38 | error('Illegal argument'); 39 | end 40 | 41 | function comp = getcomp(expr) 42 | %GETCOMP Get complexity from a single tree. 43 | 44 | %count leaf nodes (excluding zero arity functions). 45 | %these have complexity measure of 1. 
46 | leafcount = numel(strfind(expr,'x')) + numel(strfind(expr,'[')); 47 | 48 | %iterate across remaining function 49 | %nodes, getting the number of nodes of each subtree found 50 | ind = strfind(expr,'('); 51 | if isempty(ind) 52 | comp = leafcount; 53 | return; 54 | end 55 | 56 | ind = ind - 1; 57 | 58 | comp = 0; 59 | for i = 1:length(ind) 60 | [~,subtree] = extract(ind(i),expr); 61 | comp = comp + getnn(subtree); 62 | end 63 | 64 | comp = comp + leafcount; 65 | 66 | function numnodes = getnn(expr) 67 | %GETNN get number of nodes from a single tree 68 | 69 | %number of nodes = number of open brackets + number of inputs + number of 70 | %ERC constants 71 | open_br = strfind(expr,'('); 72 | open_sq_br = strfind(expr,'['); 73 | inps = strfind(expr,'x'); 74 | 75 | num_open = numel(open_br); 76 | num_const = numel(open_sq_br); 77 | num_inps = numel(inps); 78 | numnodes = num_open + num_const + num_inps; -------------------------------------------------------------------------------- /core/getdepth.m: -------------------------------------------------------------------------------- 1 | function depth=getdepth(expr) 2 | %GETDEPTH Returns the tree depth of an encoded tree expression. 3 | % 4 | % DEPTH = GETDEPTH(EXPR) returns the DEPTH of the tree expression EXPR. 5 | % 6 | % Remarks: 7 | % 8 | % A single node is a tree of depth 1. 9 | % 10 | % Copyright (c) 2009-2015 Dominic Searson 11 | % 12 | % GPTIPS 2 13 | % 14 | % See also GETNUMNODES, GETCOMPLEXITY 15 | 16 | %check to make sure expression is not actually a cell array of size 1 17 | if iscell(expr) 18 | expr = expr{1}; 19 | end 20 | 21 | %workaround for arity zero functions, which are represented by f() 22 | %string replace '()' with the empty string 23 | expr = strrep(expr,'()',''); 24 | open_br = strfind(expr,'('); 25 | close_br = strfind(expr,')'); 26 | num_open = numel(open_br); 27 | 28 | if ~num_open %i.e. 
a single node 29 | depth = 1; 30 | return 31 | elseif num_open == 1 32 | depth = 2; 33 | return 34 | else 35 | %depth = max consecutive number of open brackets+1 36 | br_vec = zeros(1,numel(expr)); 37 | br_vec(open_br) = 1; 38 | br_vec(close_br) = -1; 39 | depth = max(cumsum(br_vec)) + 1; 40 | end -------------------------------------------------------------------------------- /core/getnumnodes.m: -------------------------------------------------------------------------------- 1 | function numnodes = getnumnodes(expr) 2 | %GETNUMNODES Returns the number of nodes in an encoded tree expression or the total node count for a cell array of expressions. 3 | % 4 | % NUMNODES = GETNUMNODES(EXPR) finds the number of nodes in the GP 5 | % expression EXPR or cell array of expressions EXPR. 6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also GETDEPTH, GETCOMPLEXITY 12 | 13 | if isa(expr,'char') 14 | 15 | numnodes = getnn(expr); 16 | return; 17 | 18 | elseif iscell(expr) 19 | 20 | numexpr = numel(expr); 21 | 22 | if numexpr < 1 23 | error('Cell array must contain at least one valid symbolic expression'); 24 | else 25 | numnodes = 0; 26 | for i=1:numexpr 27 | numnodes = numnodes + getnn(expr{i}); 28 | end 29 | end 30 | else 31 | error('Illegal argument'); 32 | end 33 | 34 | function numnodes=getnn(expr) 35 | %GETNN get numnodes from a single symbolic string 36 | 37 | %number of nodes = number of open brackets +number of inputs +number of 38 | %constants 39 | open_br = strfind(expr,'('); 40 | open_sq_br = strfind(expr,'['); 41 | inps = strfind(expr,'x'); 42 | 43 | num_open = numel(open_br); 44 | num_const = numel(open_sq_br); 45 | num_inps = numel(inps); 46 | 47 | numnodes = num_open + num_const + num_inps; -------------------------------------------------------------------------------- /core/gp_2d_mesh.m: -------------------------------------------------------------------------------- 1 | function x = gp_2d_mesh(x1,x2) 2 | %GP_2D_MESH 
Creates new training matrix containing pairwise values of all x1 and x2. 3 | % 4 | % X = GP_2D_MESH(X1,X2) 5 | % 6 | % e.g. if X1 = [0 1]' and X2 = [5 6]' then 7 | % 8 | % X = [0 5 9 | % 0 6 10 | % 1 5 11 | % 1 6] 12 | % 13 | % Copyright (c) 2009-2015 Dominic Searson 14 | % 15 | % GPTIPS 2 16 | 17 | %create a training "mesh" over the range spanned by x1 and x2 18 | numx1 = length(x1); 19 | numx2 = length(x2); 20 | x=[]; 21 | 22 | for i = 1:numx1 23 | xtemp = ones(numx2,2); 24 | xtemp(:,1) = xtemp(:,1)*x1(i); 25 | xtemp(:,2) = x2; 26 | x = [x;xtemp]; 27 | end 28 | -------------------------------------------------------------------------------- /core/gp_toolboxlist.m: -------------------------------------------------------------------------------- 1 | function toolboxes = gp_toolboxlist() 2 | %GP_TOOLBOXLIST Generate a list of currently available MATLAB toolboxes 3 | 4 | v = ver; 5 | [toolboxes{1:length(v)}] = deal(v.Name); 6 | 7 | end 8 | 9 | -------------------------------------------------------------------------------- /core/gpinitparallel.m: -------------------------------------------------------------------------------- 1 | function gp = gpinitparallel(gp) 2 | %GPINITPARALLEL Initialise the Parallel Computing Toolbox. 3 | % 4 | % GP = GPINITPARALLEL(GP) initialises the Parallel Computing Toolbox if 5 | % it has been enabled and the Parallel Computing Toolbox license is 6 | % present. 7 | % 8 | % IMPORTANT: 9 | % 10 | % There is a known JVM bug in versions of MATLAB (all platforms) prior to 11 | % version R2013b (6.3). This causes a failure of the Parallel Computing 12 | % Toolbox in most cases. 13 | % 14 | % There is a fix/workaround for this here: 15 | % 16 | % http://www.mathworks.com/support/bugreports/919688 17 | % 18 | % Please apply this fix if you are using a version prior to R2013b. 
19 | % 20 | % Copyright (c) 2009-2015 Dominic Searson 21 | % 22 | % GPTIPS 2 23 | % 24 | % See also GPINIT, GPTOOLBOXCHECK 25 | 26 | gp.runcontrol.parallel.ok = false; 27 | parToolbox = gp.info.toolbox.parallel(); 28 | 29 | if parToolbox && (gp.runcontrol.parallel.auto || gp.runcontrol.parallel.enable) 30 | mps = getMaxPoolSize; 31 | end 32 | 33 | %in auto mode - try to start parallel mode with auto settings but exit and 34 | %proceed in standard mode if no license 35 | if gp.runcontrol.parallel.auto 36 | if parToolbox 37 | gp.runcontrol.parallel.enable = true; 38 | gp.runcontrol.parallel.autosize = true; 39 | gp.runcontrol.parallel.numWorkers = mps; 40 | else 41 | gp.runcontrol.parallel.enable = false; 42 | return; 43 | end 44 | end 45 | 46 | if gp.runcontrol.parallel.enable 47 | 48 | if ~parToolbox 49 | error('You do not have a license for the Parallel Computing Toolbox or it is not installed. Please set gp.runcontrol.parallel.enable = false.'); 50 | end 51 | 52 | ps = getCurrentPoolSize; 53 | 54 | %check if pool is already open and of right size 55 | if (ps > 0) && ( (gp.runcontrol.parallel.autosize && (ps == mps) ) || ... 
56 | (~gp.runcontrol.parallel.autosize && (ps == gp.runcontrol.parallel.numWorkers) ) ) 57 | gp.runcontrol.parallel.ok = true; 58 | gp.runcontrol.parallel.numWorkers = ps; 59 | 60 | if ~gp.runcontrol.quiet 61 | disp(' '); 62 | disp(['GPTIPS: proceeding in parallel mode with ' num2str(gp.runcontrol.parallel.numWorkers) ' workers.']); 63 | disp(' '); 64 | end 65 | 66 | else %otherwise stop then start correct size pool 67 | 68 | if ps > 0 69 | if ~gp.runcontrol.quiet 70 | disp(' '); 71 | disp('GPTIPS: stopping existing pool for parallel computation.'); 72 | disp(' '); 73 | end 74 | 75 | if verLessThan('matlab','8.3.0') 76 | matlabpool close; 77 | else 78 | delete(gcp('nocreate')); 79 | end 80 | end 81 | 82 | try %start new pool 83 | 84 | if ~gp.runcontrol.quiet 85 | disp(' '); 86 | disp('GPTIPS: attempting to start new pool for parallel computation.'); 87 | disp(' '); 88 | end 89 | 90 | if verLessThan('matlab','8.3.0') 91 | matlabpool close force local; 92 | eval(['matlabpool ' num2str(gp.runcontrol.parallel.numWorkers)]); 93 | else 94 | delete(gcp('nocreate')); 95 | 96 | if gp.runcontrol.parallel.autosize 97 | a = parpool; 98 | gp.runcontrol.parallel.numWorkers = a.NumWorkers; 99 | else 100 | parpool(gp.runcontrol.parallel.numWorkers); 101 | end 102 | 103 | end 104 | 105 | gp.runcontrol.parallel.ok = true; 106 | 107 | if ~gp.runcontrol.quiet 108 | disp(' '); 109 | disp(['GPTIPS: proceeding in parallel mode with ' num2str(gp.runcontrol.parallel.numWorkers) ' workers.']); 110 | disp(' '); 111 | end 112 | 113 | catch 114 | gp.runcontrol.parallel.ok = false; 115 | 116 | if ~gp.runcontrol.quiet 117 | warning('There was a problem starting GPTIPS in parallel computation mode.'); 118 | warning('To run GPTIPS in non-parallel mode set gp.runcontrol.parallel.enable = false in your config file.'); 119 | error(['GPTIPS: could not start pool for parallel computation because: ' lasterr]); 120 | end 121 | 122 | end 123 | end 124 | 125 | else 126 | gp.runcontrol.parallel.ok = false; 
127 | end 128 | 129 | function x = getCurrentPoolSize 130 | %GETCURRENTPOOLSIZE Get the current parallel pool size. Workaround 131 | %function: calls either matlabpool('size') or pool_size hack depending on 132 | %MATLAB version 133 | 134 | if verLessThan('matlab', '7.7.0') 135 | x = pool_size; 136 | elseif verLessThan('matlab','8.3.0') 137 | x = matlabpool('size'); 138 | else 139 | pool = gcp('nocreate'); 140 | 141 | if isempty(pool) 142 | x = 0; 143 | else 144 | if pool.Connected 145 | x = pool.NumWorkers; 146 | else 147 | x = 0; 148 | end 149 | end 150 | 151 | end 152 | 153 | function x = getMaxPoolSize 154 | %getMaxPoolSize - try to get the max number of workers in a pool for the 155 | %default profile on the current machine 156 | 157 | if verLessThan('matlab','8.3.0') 158 | c = findResource('scheduler', 'configuration', defaultParallelConfig); 159 | x = c.ClusterSize; 160 | else 161 | c = parcluster(); 162 | x = c.NumWorkers; 163 | end 164 | 165 | function x = pool_size 166 | %POOL_SIZE - Mathworks hack to return size of current MATLABPOOL. Used for 167 | %versions prior to MATLAB 7.7 (R2008b) 168 | % 169 | % See: 170 | % http://www.mathworks.co.uk/support/solutions/en/data/1-5UDHQP/index.html 171 | 172 | session = com.mathworks.toolbox.distcomp.pmode.SessionFactory.getCurrentSession; 173 | if ~isempty( session ) && session.isSessionRunning() && session.isPoolManagerSession() 174 | client = distcomp.getInteractiveObject(); 175 | if strcmp( client.CurrentInteractiveType, 'matlabpool' ) 176 | x = session.getLabs().getNumLabs(); 177 | else 178 | x = 0; 179 | end 180 | else 181 | x = 0; 182 | end -------------------------------------------------------------------------------- /core/gpmodelvars.m: -------------------------------------------------------------------------------- 1 | function hvec=gpmodelvars(gp,ID) 2 | %GPMODELVARS Display the frequency of input variables present in the specified model. 
3 | % 4 | % GPMODELVARS(GP,ID) displays input frequency for the model with numeric 5 | % identifier ID in the GPTIPS datastructure GP. 6 | % 7 | % GPMODELVARS(GP,'best') does the same for the 'best' model of the run 8 | % (as evaluated on the training data). 9 | % 10 | % GPMODELVARS(GP,'valbest') does the same for the model that performed 11 | % best on the validation data (if this data exists). 12 | % 13 | % GPMODELVARS(GP,'testbest') does the same for the model that performed 14 | % best on the test data (if this data exists). 15 | % 16 | % GPMODELVARS(GP,GPMODEL) operates on the GPMODEL struct representing a 17 | % multigene regression model, i.e. the struct returned by the functions 18 | % GPMODEL2STRUCT or GENES2GPMODEL. 19 | % 20 | % HITVEC = GPMODELVARS(GP,'valbest') returns a frequency vector and 21 | % suppresses graphical output. 22 | % 23 | % Copyright (c) 2009-2015 Dominic Searson 24 | % 25 | % GPTIPS 2 26 | % 27 | % See also GPPOPVARS 28 | 29 | hitvec = []; 30 | 31 | if nargin < 2 32 | disp('Usage is GPMODELVARS(GP,ID) where ID is the numeric identifier of the selected individual.'); 33 | disp('or GPMODELVARS(GP,''BEST'') to run the best individual of the run'); 34 | disp('or GPMODELVARS(GP,''VALBEST'') to run the best validation individual of the run.'); 35 | disp('or GPMODELVARS(GP,''TESTBEST'') to run the best test individual of the final population.'); 36 | return; 37 | end 38 | 39 | if nargout < 1 40 | graph = true; 41 | else 42 | graph = false; 43 | end 44 | 45 | if isfield(gp.userdata,'name') && ~isempty(gp.userdata.name) 46 | setname = ['Data: ' gp.userdata.name]; 47 | else 48 | setname = ''; 49 | end 50 | 51 | %get a specified individual from the population and scan it for input 52 | %variables 53 | if isnumeric(ID) 54 | 55 | if ID < 1 || ID > numel(gp.pop) 56 | disp('Invalid value of supplied numerical identifier ID.'); 57 | return; 58 | end 59 | 60 | model = gp.pop{ID}; 61 | title_str = ['Input frequency in individual with ID: ' 
int2str(ID)]; 62 | 63 | elseif ischar(ID) && strcmpi(ID,'best') 64 | 65 | model = gp.results.best.individual; 66 | title_str = 'Input frequency in best individual.'; 67 | 68 | elseif ischar(ID) && strcmpi(ID,'valbest') 69 | 70 | % check that validation data is present 71 | if ~isfield(gp.results,'valbest') 72 | error('No validation data was found. Try GPMODELVARS(GP,''BEST'') instead.'); 73 | end 74 | 75 | model = gp.results.valbest.individual; 76 | title_str = 'Input frequency in best validation individual.'; 77 | 78 | elseif ischar(ID) && strcmpi(ID,'testbest') 79 | 80 | % check that test data is present 81 | if ~isfield(gp.results,'testbest') 82 | error('No test data was found. Try GPMODELVARS(GP,''BEST'') instead.'); 83 | end 84 | 85 | model = gp.results.testbest.individual; 86 | title_str = 'Input frequency in best test individual.'; 87 | 88 | %for a supplied cell array of encoded tree expressions 89 | elseif iscell(ID) 90 | model = ID; 91 | title_str = ('Input frequency in user model.'); 92 | 93 | elseif isstruct(ID) && isfield(ID,'genes') 94 | model = ID.genes.geneStrs; 95 | title_str = ('Input frequency in user model.'); 96 | 97 | else 98 | error('Illegal argument'); 99 | end 100 | 101 | %perform scan 102 | numx = gp.nodes.inputs.num_inp; 103 | hitvec = scangenes(model,numx); 104 | 105 | % plot results as barchart 106 | if graph 107 | h = figure; 108 | set(h,'name','GPTIPS 2 single model input frequency','numbertitle','off'); 109 | a = gca; 110 | bar(a,hitvec);shading faceted; 111 | 112 | if ~verLessThan('matlab','8.4') %R2014b 113 | a.Children.FaceColor = [0 0.45 0.74]; 114 | a.Children.BaseLine.Visible = 'off'; 115 | else 116 | b = get(a,'Children'); 117 | set(b,'FaceColor',[0 0.45 0.74],'ShowBaseLine','off'); 118 | end 119 | axis tight; 120 | xlabel('Input'); 121 | ylabel('Input frequency'); 122 | title({setname, title_str},'Fontweight','bold'); 123 | grid on; 124 | end 125 | 126 | hitvec = hitvec'; 127 | 128 | if nargout > 0 129 | hvec = hitvec; 130 | 
end -------------------------------------------------------------------------------- /core/gppopvars.m: -------------------------------------------------------------------------------- 1 | function hvec = gppopvars(gp,R2thresh,complexityThresh) 2 | %GPPOPVARS Display frequency of the input variables present in models in the population. 3 | % 4 | % For multigene symbolic regression problems: 5 | % 6 | % GPPOPVARS(GP) displays an input frequency barchart for all models in GP 7 | % with R2 >= 0.6 where R2 is the coefficient of determination for the 8 | % models on the training data. 9 | % 10 | % GPPOPVARS(GP,R2THRESH) displays an input frequency barchart for all 11 | % models with R2 >= R2THRESH. R2THRESH must be > 0 and <= 1. 12 | % 13 | % GPPOPVARS(GP,R2THRESH,COMPLEXITYTHRESH) displays an input frequency 14 | % barchart for all models with R2 >= R2THRESH which have complexities 15 | % (node count or expressional complexity) less than or equal to 16 | % COMPLEXITYTHRESH. 17 | % 18 | % HITVEC = GPPOPVARS(GP,R2THRESH,COMPLEXITYTHRESH) returns a frequency 19 | % vector and suppresses graphical output. 20 | % 21 | % For other (non multigene regression) problems: 22 | % 23 | % GPPOPVARS(GP) displays an input frequency barchart for the best 5% of 24 | % the population. 25 | % 26 | % GPPOPVARS(GP,TOP_FRAC) does the same for the best fraction TOP_FRAC of 27 | % the population. TOP_FRAC must be > 0 and <= 1. Calling with an output 28 | % argument (HITVEC = ...) returns the frequency vector without plotting. 29 | % 30 | % Remarks: 31 | % 32 | % Assumes lower fitnesses are better.
33 | % 34 | % Copyright (c) 2009-2015 Dominic Searson 35 | % 36 | % GPTIPS 2 37 | % 38 | % See also GPMODELVARS 39 | 40 | if nargin < 1 41 | disp('Basic usage is GPPOPVARS(GP).'); 42 | 43 | if nargout > 0 44 | hvec = []; 45 | end 46 | 47 | return 48 | end 49 | 50 | if nargout < 1 51 | graph = true; 52 | else 53 | graph = false; 54 | end 55 | 56 | if nargin < 3 57 | complexityThresh = Inf; 58 | end 59 | 60 | %set threshold defaults (for non-regression problems the 2nd argument is TOP_FRAC) 61 | if nargin < 2 62 | R2thresh = 0.6; 63 | top_frac = 0.05; 64 | else, top_frac = R2thresh; 65 | end 66 | if R2thresh <= 0 || R2thresh > 1 67 | error('Supplied R^2 threshold (or TOP_FRAC) must be greater than zero and less than or equal to 1.'); 68 | end 69 | 70 | if complexityThresh < 1 71 | error('Complexity/nodecount threshold must be at least 1.'); 72 | end 73 | 74 | numx = gp.nodes.inputs.num_inp; 75 | hitvec = zeros(1,numx); 76 | 77 | if isfield(gp.userdata,'name') && ~isempty(gp.userdata.name) 78 | setname = ['Data: ' gp.userdata.name]; 79 | else 80 | setname = ''; 81 | end 82 | 83 | %for mg symbolic regression models 84 | if strncmpi('regressmulti',func2str(gp.fitness.fitfun),12) 85 | 86 | %get indices of models satisfying constraints 87 | inds = (gp.fitness.r2train >= R2thresh) & (gp.fitness.complexity <= complexityThresh); 88 | numModels = sum(inds); 89 | 90 | if numModels == 0 91 | disp(['No models matching the supplied criteria (R2 >= ' num2str(R2thresh)...
92 | ', complexity <= ' num2str(complexityThresh) ') were found.']); 93 | hitvec = []; 94 | return 95 | end 96 | 97 | inds2scan = find(inds); 98 | num2scan = numel(inds2scan); 99 | 100 | if num2scan > 200 101 | disp('Please wait, scanning models ...'); 102 | end 103 | 104 | for i=1:num2scan 105 | model = gp.pop{inds2scan(i)}; 106 | hitvec = hitvec + scangenes(model,numx); 107 | end 108 | 109 | if ~isinf(complexityThresh) 110 | if gp.fitness.complexityMeasure 111 | endstr = [' and complexity <= ' num2str(complexityThresh) '.']; 112 | else 113 | endstr = [' and node count <= ' num2str(complexityThresh) '.']; 114 | end 115 | else 116 | endstr = '.'; 117 | end 118 | titlestr = {setname,['Input frequency in ' num2str(numModels) ' models (from ' num2str(gp.runcontrol.pop_size) ') with R^2 >= ' num2str(R2thresh) endstr]}; 119 | 120 | else %non mg symbolic regression 121 | 122 | num2process = ceil(top_frac*gp.runcontrol.pop_size); 123 | %rank individuals so that the best come first 124 | if gp.fitness.minimisation 125 | [~,sort_ind] = sort(gp.fitness.values); 126 | else %negate so the highest fitness sorts first (NaNs still sort last) 127 | [~,sort_ind] = sort(-gp.fitness.values); 128 | end 129 | sort_ind = sort_ind(1:num2process); 130 | 131 | %loop through population 132 | for i=1:num2process 133 | model = gp.pop{sort_ind(i)}; 134 | hitvec = hitvec + scangenes(model,numx); 135 | end 136 | titlestr= ['Input frequency - best ' num2str(100*top_frac) '% of whole population.']; 137 | end 138 | 139 | % plot results as barchart 140 | if graph 141 | 142 | h = figure; 143 | set(h,'name','GPTIPS 2 Population input frequency','numbertitle','off'); 144 | a = gca; 145 | bar(a,hitvec); 146 | if ~verLessThan('matlab','8.4') %R2014b 147 | a.Children.FaceColor = [0 0.45 0.74]; 148 | a.Children.BaseLine.Visible = 'off'; 149 | else 150 | b = get(a,'Children'); 151 | set(b,'FaceColor',[0 0.45 0.74],'ShowBaseLine','off'); 152 | end 153 | grid on; 154 | xlabel('Input'); 155 | ylabel('Input frequency'); 156 | title(titlestr,'FontWeight','bold'); 157 | hitvec=hitvec'; 158 | else 159 | h = []; 160 | end 161 | 162 | if nargout > 0
163 | hvec = hitvec'; 164 | end 165 | -------------------------------------------------------------------------------- /core/gprandom.m: -------------------------------------------------------------------------------- 1 | function gp = gprandom(gp,seed) 2 | %GPRANDOM Sets random number generator seed according to system clock or user seed. 3 | % 4 | % (c) Dominic Searson 2009-2015 5 | % 6 | % GPTIPS 2 7 | 8 | if nargin < 2 || isempty(seed) 9 | seed = sum(100*clock); 10 | end 11 | 12 | gp.info.PRNGseed = seed; 13 | 14 | if verLessThan('matlab', '7.7.0') 15 | rand('twister', gp.info.PRNGseed); 16 | elseif verLessThan('matlab','8.1') 17 | s = RandStream.create('mt19937ar','seed',gp.info.PRNGseed); 18 | RandStream.setDefaultStream(s); 19 | else 20 | s = RandStream.create('mt19937ar','seed',gp.info.PRNGseed); 21 | RandStream.setGlobalStream(s); 22 | end -------------------------------------------------------------------------------- /core/gpsimplify.m: -------------------------------------------------------------------------------- 1 | function symOut = gpsimplify(symIn,numSteps,verOld,fastMode,seconds) 2 | %GPSIMPLIFY Simplify SYM expressions in a less glitchy way than SIMPLIFY or SIMPLE. 3 | % 4 | % SYMOUT = GPSIMPLIFY(SYMIN, NUMSTEPS, VEROLD, FASTMODE) simplifies SYMIN 5 | % using at most NUMSTEPS steps, calling the MuPAD engine directly if 6 | % VEROLD is FALSE and falling back to the standard MATLAB SIMPLIFY method 7 | % if VEROLD is TRUE. 8 | % 9 | % The slower (but 'better') MuPAD engine symbolic:Simplify method is used 10 | % if FASTMODE is FALSE and the possibly faster symbolic:simplify method 11 | % is used if FASTMODE = TRUE. The default is FASTMODE = FALSE. 12 | % 13 | % According to MathWorks, the MuPAD symbolic:simplify method is "faster, 14 | % but less reliable and controllable" than the algorithm used by MuPAD 15 | % symbolic:Simplify.
16 | % 17 | % It is recommended that you use the functions GPPRETTY and GPMODEL2SYM 18 | % to simplify GPTIPS expressions, rather than calling this function 19 | % directly. 20 | % 21 | % Copyright (c) 2009-2015 Dominic Searson 22 | % 23 | % GPTIPS 2 24 | % 25 | % See also GPPRETTY, GPMODEL2SYM, GPMODEL2STRUCT, SYM/VPA 26 | 27 | if nargin < 4 || isempty(fastMode) 28 | fastMode = false; 29 | end 30 | 31 | if nargin < 5 || isempty(seconds) 32 | seconds = '0.1'; 33 | end 34 | 35 | try 36 | if verOld %for 2007 MATLAB, default to standard simplify method 37 | symOut = simplify(symIn); 38 | else 39 | %otherwise call Mupad symengine directly 40 | 41 | if fastMode 42 | %Note that according to: 43 | %http://www.mathworks.co.uk/help/symbolic/mupad_ug/use-general-simplification-functions.html 44 | %The MuPAD symbolic:simplify method "searches for a simpler 45 | %form by rewriting the terms of an expression." and "generally, 46 | %this method is faster, but less reliable and controllable than 47 | %the algorithm used by symbolic:Simplify." 48 | symOut = evalin(symengine,['symbolic:simplify(' char(symIn) ')']); 49 | else %and similarly, symbolic:Simplify "performs more extensive search 50 | %for a simple form of an expression. For some expressions, this 51 | %function can be slower, but more flexible and more powerful 52 | %than simplify." 
53 | symOut = evalin(symengine,['symbolic:Simplify(' char(symIn) ',Seconds = ' num2str(seconds) ', Steps = ' num2str(numSteps) ')']); 54 | end 55 | end 56 | catch 57 | symOut = symIn; 58 | end -------------------------------------------------------------------------------- /core/gpsym.m: -------------------------------------------------------------------------------- 1 | function s = gpsym(gp, expr) 2 | %GPSYM Compatibility function for MATLAB pre-R2017b and post-R2018a 3 | %The GP structure must be known to GPSYM otherwise it will not be able 4 | %to differentiate between symbolic names found in the expressions 5 | 6 | % Known functions that should be excluded from becoming symbols 7 | known_fun = {'abs' 'acos' 'acosd' 'acosh' 'acot' 'acotd' 'acoth' 'acsc' ... 8 | 'acscd' 'acsch' 'adjoint' 'airy' 'and' 'angle' 'any' 'asec' 'asecd' ... 9 | 'asech' 'asin' 'asind' 'asinh' 'atan' 'atan2' 'atan2d' 'atand' ... 10 | 'atanh' 'bernoulli' 'bernstein' 'bernsteinMatrix' 'besselh' ... 11 | 'besseli' 'besselj' 'besselk' 'bessely' 'beta' 'cat' 'catalan' ... 12 | 'ceil' 'charpoly' 'chebyshevT' 'chebyshevU' 'checkUnits' 'chol' ... 13 | 'coeffs' 'collect' 'colspace' 'combine' 'compose' 'cond' 'conj' ... 14 | 'cos' 'cosd' 'cosh' 'coshint' 'cosint' 'cospi' 'cot' 'cotd' 'coth' ... 15 | 'csc' 'cscd' 'csch' 'ctranspose' 'cumprod' 'cumsum' 'curl' ... 16 | 'daeFunction' 'dawson' 'deg2rad' 'det' 'diag' 'diff' 'dilog' ... 17 | 'dirac' 'display' 'divergence' 'divisors' 'ei' 'eig' 'eliminate' ... 18 | 'ellipj' 'ellipke' 'ellipticCE' 'ellipticCK' 'ellipticCPi' ... 19 | 'ellipticE' 'ellipticF' 'ellipticK' 'ellipticNome' 'ellipticPi' ... 20 | 'eq' 'equationsToMatrix' 'erf' 'erfc' 'erfcinv' 'erfi' 'erfinv' ... 21 | 'euler' 'eulergamma' 'exp' 'expand' 'expint' 'expm' 'factor' ... 22 | 'factorial' 'factorIntegerPower' 'fibonacci' 'find' 'findSymType' ... 23 | 'finverse' 'fix' 'floor' 'fortran' 'fourier' 'frac' 'fresnelc' ... 
24 | 'fresnels' 'functionalDerivative' 'funm' 'gamma' 'gammaln' 'gbasis' ... 25 | 'gcd' 'ge' 'gegenbauerC' 'gradient' 'gt' 'harmonic' 'has' ... 26 | 'heaviside' 'hermiteForm' 'hermiteH' 'hessian' 'horner' 'htrans' ... 27 | 'hurwitzZeta' 'hypergeom' 'hypot' 'ifourier' 'igamma' 'ihtrans' ... 28 | 'ilaplace' 'imag' 'in' 'incidenceMatrix' 'int' 'inv' 'isAlways' ... 29 | 'isequal' 'isequaln' 'ismember' 'isnan' 'isolate' 'iztrans' ... 30 | 'jacobiAM' 'jacobian' 'jacobiCD' 'jacobiCN' 'jacobiCS' 'jacobiDC' ... 31 | 'jacobiDN' 'jacobiDS' 'jacobiNC' 'jacobiND' 'jacobiNS' 'jacobiP' ... 32 | 'jacobiSC' 'jacobiSD' 'jacobiSN' 'jacobiZeta' 'jordan' 'kron' ... 33 | 'kroneckerDelta' 'kummerU' 'laguerreL' 'lambertw' 'laplace' ... 34 | 'laplacian' 'lcm' 'ldivide' 'le' 'legendreP' 'lhs' 'limit' ... 35 | 'linsolve' 'log' 'log10' 'log2' 'logint' 'logm' 'lt' 'lu' 'max' ... 36 | 'meijerG' 'min' 'minpoly' 'minus' 'mixedUnits' 'mldivide' 'mod' ... 37 | 'mpower' 'mrdivide' 'mtimes' 'nchoosek' 'ndims' 'ne' 'nextprime' ... 38 | 'nnz' 'nonzeros' 'norm' 'not' 'nthprime' 'nthroot' 'null' 'numden' ... 39 | 'numel' 'odeToVectorField' 'or' 'orth' 'pade' 'partfrac' 'permute' ... 40 | 'piecewise' 'pinv' 'plus' 'pochhammer' 'poles' 'polylog' ... 41 | 'polynomialDegree' 'polynomialReduce' 'potential' 'power' ... 42 | 'powermod' 'prevprime' 'prod' 'psi' 'qr' 'quorem' 'rad2deg' 'rank' ... 43 | 'rdivide' 'real' 'rectangularPulse' 'reduceDAEIndex' ... 44 | 'reduceDAEToODE' 'reduceDifferentialOrder' 'reduceRedundancies' ... 45 | 'rem' 'reshape' 'resultant' 'rewrite' 'rhs' 'root' 'round' 'rref' ... 46 | 'rsums' 'sec' 'secd' 'sech' 'series' 'sign' 'signIm' 'simplify' ... 47 | 'simplifyFraction' 'sin' 'sinc' 'sind' 'single' 'sinh' 'sinhint' ... 48 | 'sinint' 'sinpi' 'smithForm' 'sort' 'sqrt' 'sqrtm' 'ssinint' ... 49 | 'subexpr' 'subs' 'sum' 'svd' 'symprod' 'symsum' 'tan' 'tand' 'tanh' ... 50 | 'taylor' 'times' 'toeplitz' 'transpose' 'triangularPulse' 'tril' ... 
51 | 'triu' 'uminus' 'uplus' 'vectorPotential' 'vertcat' 'vpa' ... 52 | 'vpaintegral' 'vpasolve' 'whittakerM' 'whittakerW' 'wrightOmega' ... 53 | 'xor' 'zeta' 'ztrans'}; 54 | 55 | % Here we need to take into account that expr may be numeric in which 56 | % case the str2sym function will fail. sym(), on the other hand, will 57 | % return a symbolic object while converting a double-precision number to a 58 | % fraction, if possible. 59 | % TODO: Possible bug hidden in here? Must test. 60 | if ~exist('str2sym', 'file') || isnumeric(expr) 61 | s = sym(expr); 62 | else 63 | 64 | funcs = [gp.nodes.functions.name gp.nodes.adf.syms]; % Combine all 65 | funcs = unique(funcs); % Remove repeating entries 66 | funcs = setdiff(funcs, known_fun); % Remove fundamental functions 67 | for k=1:numel(funcs) 68 | assignin('caller', funcs{k}, sym(funcs{k})); 69 | end 70 | 71 | % Finally, create the symbolic expression 72 | s = str2sym(expr); 73 | end 74 | 75 | end 76 | 77 | -------------------------------------------------------------------------------- /core/gpterminate.m: -------------------------------------------------------------------------------- 1 | function gp = gpterminate(gp) 2 | %GPTERMINATE Check for early termination of run. 3 | % 4 | % GP = GPTERMINATE(GP) checks if the GPTIPS run should be terminated at 5 | % the end of the current generation according to a fitness criterion or 6 | % run timeout. 7 | % 8 | % Copyright (c) 2009-2015 Dominic Searson 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also GPFINALISE, GPTIC, GPTOC 13 | 14 | %check if fitness termination criterion met 15 | if gp.fitness.terminate 16 | 17 | if gp.fitness.minimisation 18 | if gp.results.best.fitness <= gp.fitness.terminate_value 19 | gp.state.terminate = true; 20 | gp.state.terminationReason = ['Fitness criterion achieved <= ' ...
21 | num2str(gp.fitness.terminate_value)]; 22 | end 23 | else 24 | if gp.results.best.fitness >= gp.fitness.terminate_value 25 | gp.state.terminate = true; 26 | gp.state.terminationReason = ['Fitness criterion achieved >= ' ... 27 | num2str(gp.fitness.terminate_value)]; 28 | end 29 | end 30 | end 31 | 32 | %run timeout 33 | if ~gp.state.terminate && (gp.state.runTimeElapsed >= gp.runcontrol.timeout) 34 | gp.state.terminate = true; 35 | gp.state.terminationReason = ['Timeout >= ' num2str(gp.runcontrol.timeout) ' sec']; 36 | end -------------------------------------------------------------------------------- /core/gptic.m: -------------------------------------------------------------------------------- 1 | function gp = gptic(gp) 2 | %GPTIC Starts the generation timer used to track the running time of this run. 3 | % 4 | % GP = GPTIC(GP) starts the timer for the current generation and stores its handle in GP.STATE.TIC. 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also GPTOC 11 | 12 | gp.state.tic = tic; -------------------------------------------------------------------------------- /core/gptoc.m: -------------------------------------------------------------------------------- 1 | function gp = gptoc(gp) 2 | %GPTOC Updates the running time of this run. 3 | % 4 | % GP = GPTOC(GP) updates the running time for the current run. 5 | % 6 | % Copyright (c) 2014 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also GPTIC 11 | 12 | gp.state.runTimeElapsed = gp.state.runTimeElapsed + toc(gp.state.tic); 13 | 14 | -------------------------------------------------------------------------------- /core/gptoolboxcheck.m: -------------------------------------------------------------------------------- 1 | function [symbolicResult, parallelResult, statsResult] = gptoolboxcheck 2 | %GPTOOLBOXCHECK Checks if certain toolboxes are installed and licensed. 3 | % 4 | % Returns function handles that check for the Statistics, Parallel Computing and Symbolic Math toolboxes.
5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also GPINIT 11 | 12 | symbolicResult = @() (ismember('Symbolic Math Toolbox',gp_toolboxlist()) && license('test','symbolic_toolbox')); 13 | parallelResult = @() (ismember('Parallel Computing Toolbox',gp_toolboxlist()) && license('test','distrib_computing_toolbox')); 14 | statsResult = @() (( ismember('Statistics Toolbox',gp_toolboxlist()) || ismember('Statistics and Machine Learning Toolbox',gp_toolboxlist()) ) ... 15 | && license('test','statistics_toolbox')); 16 | 17 | -------------------------------------------------------------------------------- /core/gptreestructure.m: -------------------------------------------------------------------------------- 1 | function treestructure = gptreestructure(gp,expr) 2 | %GPTREESTRUCTURE Create cell array containing tree structure connectivity and label information for an encoded tree expression. 3 | % 4 | % TREESTRUCTURE = GPTREESTRUCTURE(GP,EXPR) on the encoded expression 5 | % string EXPR - e.g. g(a(m(x2),x1)) - returns the cell array 6 | % TREESTRUCTURE, each element of which contains a MATLAB structure 7 | % describing tree connectivity information for a node in EXPR. The ith 8 | % structure in TREESTRUCTURE describes the ith node in EXPR. Node 1 is 9 | % the root node (other nodes are numbered in an arbitrary but consistent 10 | % way). 
11 | % 12 | % Copyright (c) 2009-2015 Dominic Searson 13 | % 14 | % GPTIPS 2 15 | % 16 | % See also DRAWTREES, GPMODELREPORT, TREEGEN 17 | 18 | %get a list of all nodes in expression, left to right 19 | nodes = picknode(expr,6,gp); 20 | numnodes = length(nodes); 21 | 22 | %construct cell array for holding info for each node 23 | treestructure = cell(1,numnodes); 24 | 25 | %proceeding from left to right, get the name of the function node and the 26 | %id of its parent and its own node id 27 | for i=1:numnodes 28 | 29 | node = struct; 30 | shortName = expr(nodes(i)); 31 | node.id = i; 32 | 33 | %lookup node in function table 34 | floc = strfind(gp.nodes.functions.afid,shortName); 35 | 36 | %if not empty then it is a function node 37 | if ~isempty(floc) 38 | node.name = lower(gp.nodes.functions.active_name_UC{floc}); 39 | else %otherwise it is a terminal or constant so extract it 40 | 41 | [~,node.name] = extract(nodes(i),expr); 42 | 43 | %if terminal then replace with lookup name, if it exists 44 | if node.name(1) == 'x' 45 | 46 | newnameInd = str2double(strrep(node.name,'x','')); 47 | if newnameInd <= numel(gp.nodes.inputs.names) && ~isempty(gp.nodes.inputs.names{newnameInd}) 48 | node.name = gp.nodes.inputs.names{newnameInd}; 49 | end 50 | 51 | if gp.info.toolbox.symbolic() 52 | node.name = HTMLequation(gp,node.name); 53 | end 54 | 55 | %if ERC then strip brackets and reduce digits 56 | elseif node.name(1) == '[' 57 | newname = strrep(node.name,'[',''); 58 | newname = strrep(newname,']',''); 59 | 60 | if length(newname) > 6 %clip long ERCs 61 | newname = newname(1:6); 62 | end 63 | node.name = newname; 64 | end 65 | end 66 | 67 | %if the id is 1 then it is root node and has no parent 68 | if node.id == 1 69 | node.parentId = []; 70 | 71 | else %otherwise find the parent id of this node 72 | 73 | %search left across string for opening '(' not matched by 74 | % a closing ')'. The node to the left of this is the parent node. 
75 | substring = expr(1:nodes(i)-1); 76 | len = length(substring); 77 | 78 | numClosed = 0; 79 | for j=len:-1:1 80 | if substring(j) =='(' && numClosed == 0 81 | node.parentId = find(nodes==(j-1)); 82 | break 83 | elseif substring(j) == ')' 84 | numClosed = numClosed + 1; 85 | elseif substring(j) == '(' 86 | numClosed = numClosed - 1; 87 | end 88 | 89 | end 90 | end 91 | treestructure{1,i} = node; 92 | end -------------------------------------------------------------------------------- /core/initbuild.m: -------------------------------------------------------------------------------- 1 | function gp=initbuild(gp) 2 | %INITBUILD Generate an initial population of GP individuals. 3 | % 4 | % GP = INITBUILD(GP) creates an initial population using the parameters 5 | % in the structure GP. Various changes to fields of GP are made. 6 | % 7 | % Remarks: 8 | % 9 | % Each individual in the population can contain 1 or more separate trees 10 | % (if more than 1 then each tree is referred to as a gene of the 11 | % individual). This is set by the user in the config file in the field 12 | % GP.GENES.MULTIGENE. 13 | % 14 | % Each individual is a cell array. You can use cell addressing to 15 | % retrieve the individual genes. E.g. GP.POP{3}{2} will return the 2nd 16 | % gene in the third individual. 17 | % 18 | % Trees are (by default) constructed using a probabilistic version of 19 | % Koza's ramped 1/2 and 1/2 method. E.g. if maximum tree depth is 5 and 20 | % population is 100 then, on average, 20 will be generated at each depth 21 | % (1/2 using 'grow' and 1/2 using 'full'). 22 | % 23 | % Multiple copies of genes in an individual are disallowed when 24 | % individuals are created. There is, however, no such restriction imposed 25 | % on future generations. 
26 | % 27 | % (c) Dominic Searson 2009-2015 28 | % 29 | % GPTIPS 2 30 | % 31 | % See also POPBUILD, TREEGEN 32 | 33 | % Extract temp variables from the gp structure 34 | popSize = gp.runcontrol.pop_size; 35 | maxNodes = gp.treedef.max_nodes; 36 | maxGenes = gp.genes.max_genes; 37 | 38 | %override any gene settings if using single gene gp 39 | if ~gp.genes.multigene 40 | maxGenes = 1; 41 | end 42 | 43 | %initialise vars 44 | gp.pop = cell(popSize,1); 45 | numGenes = 1; 46 | 47 | %building process 48 | for i=1:popSize %loop through population 49 | 50 | %randomly pick num of genes in individual 51 | if maxGenes > 1 52 | numGenes = ceil(rand*maxGenes); 53 | end 54 | 55 | individ = cell(1,(numGenes)); %construct empty individual 56 | 57 | for z = 1:numGenes %loop through genes in each individual and generate a tree for each 58 | 59 | %generate gene z and check that genes 1...z-1 are different 60 | geneLoopCounter = 0; 61 | while true 62 | 63 | geneLoopCounter = geneLoopCounter + 1; 64 | 65 | %generate a trial tree for gene z 66 | temp = treegen(gp); 67 | numnodes = getnumnodes(temp); 68 | 69 | if numnodes <= maxNodes 70 | 71 | copyDetected = false; 72 | 73 | if z > 1 %check previous genes for copies 74 | 75 | for j = 1:z-1 76 | if strcmp(temp,individ{1,j}) 77 | copyDetected = true; 78 | break 79 | end 80 | end 81 | 82 | end 83 | 84 | if ~copyDetected 85 | break 86 | end 87 | 88 | end %max nodes check 89 | 90 | %display a warning if having difficulty building trees due to 91 | %constraints 92 | if ~gp.runcontrol.quiet && geneLoopCounter > 10 93 | disp('initbuild: iterating tree build loop because of uniqueness constraints.'); 94 | end 95 | 96 | end %while loop 97 | 98 | individ{1,z} = temp; 99 | 100 | end %gene loop 101 | 102 | %write new individual to population cell array 103 | gp.pop{i,1} = individ; 104 | end 105 | -------------------------------------------------------------------------------- /core/kogene.m: 
-------------------------------------------------------------------------------- 1 | function treestrs = kogene(treestrs, knockout) 2 | %KOGENE Knock out genes from a cell array of tree expressions. 3 | % 4 | % TREESTRS = KOGENE(TREESTRS, KNOCKOUT) removes genes from the cell array 5 | % TREESTRS with a boolean vector KNOCKOUT which must be the same length 6 | % as TREESTRS. Genes corresponding to an entry of 'true' are removed. 7 | % 8 | % Copyright (c) 2009-2015 Dominic Searson 9 | % 10 | % GPTIPS 2 11 | 12 | if numel(knockout) ~= numel(treestrs) 13 | error('Knockout must be a vector of the same length as the number of genes in the supplied individual.'); 14 | end 15 | 16 | knockout = logical(knockout); 17 | 18 | treestrs(knockout) = []; 19 | 20 | if isempty(treestrs) 21 | error('Cannot perform knockout. There are no genes left in the selected individual.'); 22 | end 23 | -------------------------------------------------------------------------------- /core/mergegp.m: -------------------------------------------------------------------------------- 1 | function gp1 = mergegp(gp1,gp2,multirun) 2 | %MERGEGP Merges two GP population structs into a new one. 3 | % 4 | % GPNEW = MERGEGP(GP1,GP2) takes the results of 2 GPTIPS runs in GP1 and 5 | % GP2 and merges the populations into a new population GPNEW. 6 | % 7 | % Remarks: 8 | % 9 | % The merged populations do not have to have exactly the same run 10 | % settings in terms of population size, number of generations, selection 11 | % method type etc. but they do need to have the same function sets (i.e. 12 | % mathematical building blocks), inputs and be based on the same data set 13 | % (i.e. same training, validation and test sets). If you randomise the 14 | % allocation of training, test and validation sets in your config file, 15 | % then the results (in terms of fitness, R^2 etc.) will not be 16 | % meaningful.
17 | % 18 | % Important: 19 | % 20 | % The merged structure GPNEW can be used like any other GP results 21 | % structure (e.g. in POPBROWSER, RUNTREE etc.). However, many of the 22 | % fields in GPNEW will not necessarily apply to all the data in the new 23 | % structure. For instance, gp.runcontrol.pop_size may differ between GP1 24 | % and GP2, and gp.results.history and gp.state will certainly differ. 25 | % Note that GPNEW will always inherit these fields from GP1. 26 | % 27 | % Copyright (c) 2009-2015 Dominic Searson 28 | % 29 | % GPTIPS 2 30 | % 31 | % See also GPMODELFILTER 32 | 33 | disp('Merging GP populations ...'); 34 | 35 | if nargin < 3 36 | multirun = false; 37 | end 38 | 39 | %do some tests to see if the GP structures are ostensibly OK to be merged. 40 | if ~isequal(gp1.nodes.inputs,gp2.nodes.inputs) 41 | error('GP structures cannot be merged because the input nodes are not configured identically.'); 42 | end 43 | 44 | if ~isequal(gp1.nodes.functions,gp2.nodes.functions) 45 | error('GP structures cannot be merged because the function nodes are not configured identically.'); 46 | end 47 | 48 | %update the appropriate fields of gp1.
49 | popsize1 = gp1.runcontrol.pop_size; 50 | popsize2 = gp2.runcontrol.pop_size; 51 | inds = popsize1+1:(popsize1+popsize2); 52 | 53 | gp1.pop(inds) = gp2.pop(:); 54 | gp1.fitness.returnvalues(inds) = gp2.fitness.returnvalues(:); 55 | gp1.fitness.complexity(inds) = gp2.fitness.complexity(:); 56 | gp1.fitness.nodecount(inds) = gp2.fitness.nodecount(:); 57 | gp1.fitness.values(inds) = gp2.fitness.values(:); 58 | gp1.runcontrol.pop_size = popsize1 + popsize2; 59 | 60 | %check for existence of r2train r2val and r2test fields before merging 61 | if isfield(gp1.fitness,'r2train') && isfield(gp2.fitness,'r2train') 62 | gp1.fitness.r2train(inds) = gp2.fitness.r2train(:); 63 | end 64 | 65 | if isfield(gp1.fitness,'r2val') && isfield(gp2.fitness,'r2val') 66 | gp1.fitness.r2val(inds) = gp2.fitness.r2val(:); 67 | end 68 | 69 | if isfield(gp1.fitness,'r2test') && isfield(gp2.fitness,'r2test') 70 | gp1.fitness.r2test(inds) = gp2.fitness.r2test(:); 71 | end 72 | 73 | %check to see if 'best' on validation data set is present 74 | if isfield(gp1.results,'valbest') && isfield(gp2.results,'valbest') 75 | valPresent = true; 76 | else 77 | valPresent = false; 78 | end 79 | 80 | %check to see if 'testbest' on test data set is present 81 | if isfield(gp1.results,'testbest') && isfield(gp2.results,'testbest') 82 | testPresent = true; 83 | else 84 | testPresent = false; 85 | end 86 | 87 | %find the best and valbest of gp1 and gp2 and get complexity values. 
88 | if gp1.fitness.complexityMeasure 89 | traincomp1 = gp1.results.best.complexity; 90 | traincomp2 = gp2.results.best.complexity; 91 | 92 | if valPresent 93 | valcomp1 = gp1.results.valbest.complexity; 94 | valcomp2 = gp2.results.valbest.complexity; 95 | end 96 | 97 | if testPresent 98 | testcomp1 = gp1.results.testbest.complexity; 99 | testcomp2 = gp2.results.testbest.complexity; 100 | end 101 | 102 | else 103 | traincomp1 = gp1.results.best.nodecount; 104 | traincomp2 = gp2.results.best.nodecount; 105 | 106 | if valPresent 107 | valcomp1 = gp1.results.valbest.nodecount; 108 | valcomp2 = gp2.results.valbest.nodecount; 109 | end 110 | 111 | if testPresent 112 | testcomp1 = gp1.results.testbest.nodecount; 113 | testcomp2 = gp2.results.testbest.nodecount; 114 | end 115 | end 116 | 117 | if gp1.fitness.minimisation 118 | if gp2.results.best.fitness < gp1.results.best.fitness || ... 119 | ((gp2.results.best.fitness == gp1.results.best.fitness) && traincomp2 < traincomp1) 120 | gp1.results.best = gp2.results.best; 121 | end 122 | 123 | if valPresent 124 | if gp2.results.valbest.valfitness < gp1.results.valbest.valfitness || ... 125 | ((gp2.results.valbest.valfitness == gp1.results.valbest.valfitness) && valcomp2 < valcomp1) 126 | gp1.results.valbest = gp2.results.valbest; 127 | end 128 | end 129 | 130 | if testPresent 131 | if gp2.results.testbest.testfitness < gp1.results.testbest.testfitness || ... 132 | ((gp2.results.testbest.testfitness == gp1.results.testbest.testfitness) && testcomp2 < testcomp1) 133 | gp1.results.testbest = gp2.results.testbest; 134 | end 135 | end 136 | 137 | else %maximisation 138 | if gp2.results.best.fitness > gp1.results.best.fitness || ... 139 | ((gp2.results.best.fitness == gp1.results.best.fitness) && traincomp2 < traincomp1) 140 | gp1.results.best = gp2.results.best; 141 | end 142 | 143 | if valPresent 144 | if gp2.results.valbest.valfitness > gp1.results.valbest.valfitness || ... 
145 | ((gp2.results.valbest.valfitness == gp1.results.valbest.valfitness) && valcomp2 < valcomp1) 146 | gp1.results.valbest = gp2.results.valbest; 147 | end 148 | end 149 | 150 | if testPresent 151 | if gp2.results.testbest.testfitness > gp1.results.testbest.testfitness || ... 152 | ((gp2.results.testbest.testfitness == gp1.results.testbest.testfitness) && testcomp2 < testcomp1) 153 | gp1.results.testbest = gp2.results.testbest; 154 | end 155 | end 156 | end 157 | 158 | %merge info 159 | if ~gp1.info.merged && ~gp2.info.merged 160 | gp1.info.mergedPopSizes = [popsize1 popsize2]; 161 | elseif gp1.info.merged && ~gp2.info.merged 162 | gp1.info.mergedPopSizes = [gp1.info.mergedPopSizes popsize2]; 163 | elseif ~gp1.info.merged && gp2.info.merged 164 | gp1.info.mergedPopSizes = [popsize1 gp2.info.mergedPopSizes]; 165 | elseif gp1.info.merged && gp2.info.merged 166 | gp1.info.mergedPopSizes = [gp1.info.mergedPopSizes gp2.info.mergedPopSizes]; 167 | end 168 | 169 | %counter for number of times gp1 has experienced a merge 170 | gp1.info.merged = gp1.info.merged + 1; 171 | gp1.info.duplicatesRemoved = false; 172 | 173 | %set stop time to time taken for all runs in multirun 174 | if multirun 175 | gp1.info.stopTime = gp2.info.stopTime; 176 | end -------------------------------------------------------------------------------- /core/mutate.m: -------------------------------------------------------------------------------- 1 | function exprMut = mutate(expr,gp) 2 | %MUTATE Mutate an encoded symbolic tree expression. 3 | % 4 | % EXPRMUT = MUTATE(EXPR,GP) mutates the encoded symbolic expression EXPR 5 | % using the parameters in GP to produce expression EXPRMUT. 6 | % 7 | % Remarks: 8 | % 9 | % This function uses the GP.OPERATORS.MUTATION.MUTATE_PAR field to 10 | % determine probabilistically which type of mutation event to use.
11 | % 12 | % The types are: 13 | % 14 | % mutate_type=1 (Default) Ordinary Koza style subtree mutation 15 | % mutate_type=2 Mutate a non-constant terminal to another non-constant 16 | % terminal. 17 | % mutate_type=3 Constant perturbation. Generates a small Gaussian random 18 | % number and adds it to the existing constant. 19 | % mutate_type=4 Zero constant. Sets the selected constant to zero. 20 | % mutate_type=5 Randomise constant. Substitutes a random new value for 21 | % the selected constant. 22 | % mutate_type=6 Unity constant. Sets the selected constant to 1. 23 | % 24 | % The probabilities of each mutate_type being selected are stored in the 25 | % GP.OPERATORS.MUTATION.MUTATE_PAR field which must be a row vector of 26 | % length 6. 27 | % 28 | % I.e. 29 | % 30 | % GP.OPERATORS.MUTATION.MUTATE_PAR = [p_1 p_2 p_3 p_4 p_5 p_6] 31 | % 32 | % where p_1 = probability of mutate_type 1 being used 33 | % p_2 = probability of mutate_type 2 being used 34 | % 35 | % and so on. 36 | % 37 | % Example: 38 | % 39 | % If GP.OPERATORS.MUTATION.MUTATE_PAR = [0.5 0 0.25 0.25 0 0] then approx. 40 | % 1/2 of mutation events will be ordinary subtree mutations, 1/4 will be 41 | % Gaussian perturbations of a constant and 1/4 will be setting a constant 42 | % to zero. 43 | % 44 | % Note: 45 | % 46 | % If a mutation of a certain type cannot be performed (e.g. there are no 47 | % constants in the tree) then a default subtree mutation is performed 48 | % instead.
49 | % 50 | % Copyright (c) 2009-2015 Dominic Searson 51 | % 52 | % GPTIPS 2 53 | % 54 | % See also CROSSOVER 55 | 56 | %pick a mutation event type based on the weights in mutate_par 57 | rnum = rand; 58 | 59 | while true %loop until a mutation occurs 60 | 61 | %ordinary subtree mutation 62 | if rnum <= gp.operators.mutation.cumsum_mutate_par(1) 63 | 64 | %first get the string position of the node 65 | position = picknode(expr,0,gp); 66 | 67 | %remove the logical subtree for the selected node 68 | %and return the main tree with '$' replacing the extracted subtree 69 | maintree = extract(position,expr); 70 | 71 | %generate a new tree to replace the old 72 | %pick a depth between 1 and gp.treedef.max_mutate_depth 73 | depth = ceil(rand*gp.treedef.max_mutate_depth); 74 | newtree = treegen(gp,depth); 75 | 76 | %replace the '$' with the new tree 77 | exprMut = strrep(maintree,'$',newtree); 78 | 79 | break; 80 | 81 | %substitute terminals 82 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(2) 83 | 84 | position = picknode(expr,4,gp); 85 | 86 | if position == -1 87 | rnum = 0; 88 | continue; 89 | else 90 | maintree = extract(position,expr); %extract the input terminal 91 | exprMut = strrep(maintree,'$',['x' ...
92 | sprintf('%d',ceil(rand*gp.nodes.inputs.num_inp))]); %replace it with random terminal 93 | break; 94 | end 95 | 96 | %constant Gaussian perturbation 97 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(3) 98 | 99 | position = picknode(expr,3,gp); 100 | 101 | if position == -1 102 | rnum = 0; 103 | continue; 104 | else 105 | 106 | [maintree,subtree] = extract(position,expr); 107 | 108 | %process constant 109 | oldconst = sscanf(subtree(2:end-1),'%f'); 110 | newconst = oldconst + (randn*gp.operators.mutation.gaussian.std_dev); 111 | 112 | %use appropriate string formatting for new constant 113 | if newconst == fix(newconst) 114 | newtree = ['[' sprintf('%.0f',newconst) ']']; 115 | else 116 | newtree = ['[' sprintf(gp.nodes.const.format,newconst) ']']; 117 | end 118 | 119 | %replace '$' with the new tree 120 | exprMut = strrep(maintree,'$',newtree); 121 | break 122 | 123 | end 124 | 125 | %make constant zero 126 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(4) 127 | 128 | position = picknode(expr,3,gp); 129 | 130 | if position == -1 131 | rnum = 0; 132 | continue; 133 | else 134 | maintree = extract(position,expr); %extract the constant 135 | exprMut = strrep(maintree,'$','[0]'); %replace it with zero 136 | break 137 | end 138 | 139 | %randomise constant 140 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(5) 141 | position = picknode(expr,3,gp); 142 | 143 | if position == -1 144 | rnum = 0; 145 | continue; 146 | else 147 | maintree = extract(position,expr); %extract the constant 148 | 149 | %generate a constant in the range 150 | const = rand*(gp.nodes.const.range(2) - gp.nodes.const.range(1)) + gp.nodes.const.range(1); 151 | 152 | %convert the constant to square bracketed string form: 153 | if const == fix(const) 154 | arg = ['[' sprintf('%.0f',const) ']']; 155 | else 156 | arg = ['[' sprintf(gp.nodes.const.format,const) ']']; 157 | end 158 | 159 | exprMut = strrep(maintree,'$',arg); %insert new constant 160 | break 161 | end 162 | 163 
| %set a constant to 1 164 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(6) 165 | 166 | position = picknode(expr,3,gp); 167 | 168 | if position == -1 169 | rnum = 0; 170 | continue; 171 | else 172 | maintree = extract(position,expr); %extract the constant 173 | exprMut = strrep(maintree,'$','[1]'); %replace it with one 174 | break 175 | end 176 | end 177 | end -------------------------------------------------------------------------------- /core/ndfsort_rank1.m: -------------------------------------------------------------------------------- 1 | function xrank = ndfsort_rank1(x) 2 | %NDFSORT_RANK1 Fast non-dominated sorting algorithm for 2 objectives only - returns only rank 1 solutions. 3 | % 4 | % XRANK = NDFSORT_RANK1(X) is performed on an (N x 2) matrix X, where N 5 | % is the number of solutions and each column holds one of the 2 6 | % objectives (both are minimised). The returned (N x 1) vector XRANK is 7 | % 1 for each rank 1 solution and 0 for every other solution. Rank 1 8 | % solutions form the non-dominated front of X. 9 | % 10 | % Remarks: 11 | % 12 | % Based on the sorting method described on page 184 of: "A fast and 13 | % elitist multiobjective genetic algorithm: NSGA-II" by Kalyanmoy Deb, 14 | % Amrit Pratap, Sameer Agarwal, T. Meyarivan. IEEE Transactions on 15 | % Evolutionary Computation Vol. 6, No. 2, April 2002, pp. 182-197. 16 | % 17 | % (c) Dominic Searson 2009-2015 18 | % 19 | % GPTIPS 2 20 | % 21 | % See also SELECTION, GPMODELFILTER, PARETOREPORT, POPBROWSER 22 | 23 | P = size(x,1); 24 | np = zeros(P,1); %number of solutions dominating each solution 25 | xrank = np; %rank vector 26 | 27 | for p=1:P 28 | 29 | for q=1:P 30 | 31 | if q ~= p 32 | if doesx1Domx2(x(q,:),x(p,:)) 33 | np(p) = np(p) + 1; 34 | end 35 | end 36 | 37 | end 38 | 39 | if np(p) == 0 40 | xrank(p) = 1; 41 | end 42 | 43 | end 44 | 45 | function result = doesx1Domx2(x1,x2) 46 | %Returns true if first solution dominates second, false otherwise. Here,
Here, 47 | %dominance means if one solution beats the other on 1 criterion and is no 48 | %worse on the other criterion (not strict dominance). 49 | 50 | result = false; 51 | 52 | if (x1(1) < x2(1) && x1(2) <= x2(2)) || (x1(1) <= x2(1) && x1(2) < x2(2)) 53 | result = true; 54 | end 55 | -------------------------------------------------------------------------------- /core/parseadf.m: -------------------------------------------------------------------------------- 1 | function [fcn, newexpr, args, seed] = parseadf(expr, opt) 2 | %PARSEADF Parse the ADF expression and return anonymous fcn, args, and seed 3 | % 4 | % Calling sequence: 5 | % [FCN, NEWEXPR, ARGS, SEED] = PARSEADF(EXPR, OPT) 6 | % 7 | % where FCN contains the string for the anonymous function or a handle, 8 | % depending on the option, NEWEXPR contains the new expression string with 9 | % generated arguments and ARGS contains a string containing these arguments 10 | % encolsed in ( ). SEED contains the gene seed for generating the initial 11 | % population. The input argument OPT is a string that contains either 's' 12 | % or 'f'. In case of 's' the function outputs FCN as string (default), and 13 | % in case of 'f' an anonymous function handle is returned. 14 | % 15 | % When designing your expressions, use any valid functions with the 16 | % following symbols representing input arguments: 17 | % 18 | % $n - Free connection point 19 | % #m - Preset random constant (PRC) point that later turns into an ERC 20 | % ?k - An ephemeral random constant (ERC) point 21 | % where n, m, and k are indices. You can thus have the same terminal 22 | % applied as several arguments in the function. The use of indices is 23 | % mandatory. All the input arguments are ultimately replaced by x1,x2,x3,.. 24 | % and the corresponding expression (or anonymous function handle) is 25 | % returned. 
26 | % 27 | % For example, input argument 'times($1,times(#1,plus($1,?1)))' 28 | % 29 | % Returns result '@(x1,x2,x3) times(x2,times(x1,plus(x2,x3)))' 30 | % with seed (#,$,?), the latter used during initial tree generation. 31 | % 32 | % Note that the function must accept at least one argument. Depending on 33 | % the context you can include constants in function calls. In the 34 | % algorithm, the insides of the ADF are never revealed, so it is treated as 35 | % just another function, with the big difference that you can specify 36 | % locations to which terminal nodes (PRCs and ERCs) are attached with 100% 37 | % probability during the initial creation of the population. 38 | % 39 | % (c) Aleksei Tepljakov 2017 40 | % 41 | % GPTIPS2F 42 | 43 | % Check input 44 | if ~isa(expr,'char') 45 | error('Expression must be a character array describing an ADF function.'); 46 | end 47 | 48 | if nargin < 2 49 | opt = 's'; 50 | end 51 | 52 | % Parse the arguments of the array 53 | argname = 'x'; % Default argument is "x" 54 | argind = 1; % The index of the argument 55 | seed = ''; % The seed is used when the initial population is built 56 | newexpr = expr; % Updated expression with proper arguments inserted 57 | 58 | % Parse the expression using regex 59 | regex = '[\$\#\?][0-9]+'; 60 | params = unique(regexp(expr, regex, 'match')); 61 | 62 | for k=1:length(params) 63 | % Assign x1, x2, x3, etc. to corresponding arguments 64 | oldarg = params{k}; 65 | newexpr = regexprep(newexpr, [regexptranslate('escape', oldarg) '(?![0-9])'], [argname num2str(argind)]); % match the whole token so that e.g. $1 does not also replace the prefix of $10 66 | argind = argind + 1; 67 | 68 | % Construct the seed 69 | seed = [seed oldarg(1) ',']; 70 | end 71 | 72 | % Are there any arguments?
73 | if argind == 1 74 | error('The ADF must have at least one input argument.'); 75 | end 76 | 77 | % Amend the seed 78 | seed = ['(' seed(1:end-1) ')']; 79 | 80 | % Create the anonymous function 81 | args = ''; 82 | for k=1:(argind-1) 83 | args = [args argname num2str(k) ',']; 84 | end 85 | 86 | % Amend the arguments and construct the expression 87 | args = ['(' args(1:end-1) ')']; 88 | myfun = ['@' args ' ' newexpr]; 89 | 90 | % Assign correct output 91 | switch lower(opt) 92 | case 's' 93 | fcn = myfun; 94 | case 'f' 95 | fcn = eval(myfun); 96 | otherwise 97 | error('Unknown option given in second argument.'); 98 | end 99 | 100 | end 101 | 102 | -------------------------------------------------------------------------------- /core/picknode.m: -------------------------------------------------------------------------------- 1 | function position = picknode(expr,nodetype,gp) 2 | %PICKNODE Select a node (or nodes) of specified type from an encoded GP expression and return its position. 3 | % 4 | % POSITION = PICKNODE(EXPR,NODETYPE,GP) returns the POSITION where EXPR 5 | % is the symbolic expression, NODETYPE is the specified node type and GP 6 | % is the GPTIPS data struct. 7 | % 8 | % Remarks: 9 | % 10 | % For NODETYPEs 0, 3 or 4 this function returns the string POSITION of 11 | % the node or -1 if no appropriate node can be found. If NODETYPE is 5 or 12 | % 6 then POSITION is a vector of sorted node indices. If NODETYPE = 7 13 | % then POSITION is an unsorted vector of node indices. 
14 | % 15 | % Set NODETYPE argument as follows: 16 | % 17 | % 0 = any node 18 | % 1 = unused 19 | % 2 = unused 20 | % 3 = constant (ERC) selection only 21 | % 4 = input selection only 22 | % 5 = indices of nodes with a builtin MATLAB infix representation, 23 | % sorted left to right (offline use) 24 | % 6 = indices of all nodes, sorted left to right (offline use) 25 | % 7 = gets indices of cube and square functions (offline use) 26 | % 27 | % Copyright (c) 2009-2015 Dominic Searson 28 | % 29 | % GPTIPS 2 30 | % 31 | % See also EXTRACT, MUTATE, CROSSOVER 32 | 33 | %mask negative constants 34 | x = strrep(expr,'[-','["'); 35 | 36 | if nodetype == 0 %pick any node 37 | 38 | %get indices of all function node, constant and input node locations 39 | %NB char(97) ='a' and char(122) = 'z' and double('[') = 91 40 | xd = double(x); 41 | ind = find((xd <= 122 & xd >= 97) | xd==91); 42 | 43 | elseif nodetype == 3 %just constants 44 | 45 | ind = strfind(x,'['); 46 | 47 | elseif nodetype == 4 %just inputs 48 | 49 | ind = strfind(x,'x'); 50 | 51 | elseif nodetype == 5 %special option, only selects plus, minus, times,rdivide,power (intended for offline use) 52 | 53 | idcount = 0; ind = []; 54 | for i=1:numel(gp.nodes.functions.name) 55 | 56 | if gp.nodes.functions.active(i) 57 | idcount = idcount+1; 58 | func = gp.nodes.functions.name{i}; 59 | 60 | if strcmpi(func,'times') || strcmpi(func,'plus') ... 
61 | || strcmpi(func,'minus') || strcmpi(func,'rdivide') || strcmpi(func,'power') 62 | id = gp.nodes.functions.afid(idcount); 63 | %look for id in string and concatenate to indices vector 64 | ind = [ind strfind(x,id)]; 65 | end 66 | 67 | end 68 | end 69 | position = sort(ind); 70 | return 71 | 72 | elseif nodetype == 6 %offline use (all nodes, sorted left to right) 73 | 74 | %get indices of function and input locations 75 | temp_ind = find(double(x)<=122 & double(x)>=97); 76 | 77 | %locations of '@','%' and '&' which may have been added by 78 | %gpreformat.m 79 | 80 | %special case of post-run added power node 81 | temp_ind = [temp_ind strfind(x,'@')]; 82 | 83 | %special case of post-run added times node 84 | temp_ind = [temp_ind strfind(x,'&')]; 85 | 86 | %special case of post-run added exp node 87 | temp_ind = [temp_ind strfind(x,'%')]; 88 | 89 | %add indices of constant locations 90 | ind = [temp_ind strfind(x,'[')]; 91 | 92 | position = sort(ind); 93 | return 94 | 95 | elseif nodetype == 7 %gets all square and cube node indices 96 | 97 | idcount = 0;ind = []; 98 | for i=1:numel(gp.nodes.functions.name) 99 | 100 | if gp.nodes.functions.active(i) 101 | idcount = idcount+1; 102 | func = gp.nodes.functions.name{i}; 103 | 104 | if strcmpi(func,'square') || strcmpi(func,'cube') 105 | id = gp.nodes.functions.afid(idcount); 106 | %look for id in string and concatenate to indices vector 107 | ind = [ind strfind(x,id)]; 108 | end 109 | 110 | end 111 | end 112 | if isempty(ind) 113 | position = -1; 114 | else 115 | position = ind; 116 | end 117 | return; 118 | 119 | end 120 | 121 | %count legal nodes 122 | num_nodes = numel(ind); 123 | 124 | %if none legal, return -1 125 | if ~num_nodes 126 | position = -1; 127 | return; 128 | else 129 | %select a random node from those indexed 130 | fun_choice = ceil(rand*num_nodes); 131 | position = ind(fun_choice); 132 | end -------------------------------------------------------------------------------- /core/pref2inf.m: 
-------------------------------------------------------------------------------- 1 | function [sendup,status] = pref2inf(expr,gp) 2 | %PREF2INF Recursively extract arguments from a prefix expression and convert to infix where possible. 3 | % 4 | % [SENDUP,STATUS] = PREF2INF(EXPR,GP) 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also PICKNODE, EXTRACT, GPREFORMAT, TREE2EVALSTR 11 | 12 | %get indices of all nodes in current expression 13 | ind = picknode(expr,6,gp); 14 | if isempty(ind) 15 | sendup = expr; %exits if current expression has no further extractable args 16 | status = 0; 17 | return 18 | end 19 | 20 | %get first node 21 | ind = ind(1); 22 | 23 | %get number of arguments of node 24 | try 25 | args = gp.nodes.functions.arity_argt0(strfind(gp.nodes.functions.afid,expr(ind))); 26 | if isempty(args) 27 | % if has zero arity then exit 28 | sendup = expr; 29 | status = 0; 30 | return; 31 | end 32 | catch 33 | %also exit if error 34 | sendup = expr; 35 | status = 0; 36 | return; 37 | end 38 | 39 | if args == 1 40 | 41 | %get subtree rooted in this node 42 | [~,subtree] = extract(ind,expr); 43 | node = subtree(1); 44 | 45 | %get index of 2nd node in 'sub_tree' (i.e. first logical argument) 46 | keynodeInd = picknode(subtree,6,gp); 47 | keynodeInd = keynodeInd(2); 48 | 49 | %extract the 1st argument expression 50 | [~,arg1] = extract(keynodeInd,subtree); 51 | 52 | [rec1,rec1_status] = pref2inf(arg1,gp); 53 | 54 | if rec1_status==1 55 | arg1 = rec1; 56 | end 57 | 58 | sendup = [node '(' arg1 ')']; 59 | status = 1; 60 | 61 | elseif args > 2 62 | 63 | %get subtree rooted in this node 64 | [~,subtree] = extract(ind,expr); 65 | node = subtree(1); 66 | 67 | %get index of 2nd node in 'sub_tree' (i.e. 
first logical argument) 68 | keynodeInd = picknode(subtree,6,gp); % 69 | keynodeInd = keynodeInd(2); 70 | 71 | %extract the 1st argument expression 72 | [remainder,arg1] = extract(keynodeInd,subtree); 73 | 74 | %extract the second argument expression from the remainder 75 | %find the 1st node after $ in the remainder as this will be the 76 | %keynode of the 2nd argument we wish to extract 77 | keynodeInd = picknode(remainder,6,gp); 78 | tokenInd = strfind(remainder,'$'); 79 | 80 | keynodeLoc = find(keynodeInd > tokenInd); 81 | keynodeInd = keynodeInd(keynodeLoc); 82 | keynode_ind_1 = keynodeInd(1); 83 | [remainder2,arg2] = extract(keynode_ind_1,remainder); 84 | 85 | %extract the third argument expression from the remainder 86 | %find the 1st node after $ in remainder2 as this will be the keynode of 87 | %the 3rd argument we wish to extract 88 | keynodeInd = picknode(remainder2,6,gp); 89 | tokenInd = strfind(remainder2,'$'); 90 | 91 | keynodeLoc = find(keynodeInd > max(tokenInd)); 92 | keynodeInd = keynodeInd(keynodeLoc); 93 | keynode_ind_1 = keynodeInd(1); 94 | [remainder3,arg3] = extract(keynode_ind_1,remainder2); 95 | 96 | [rec1,rec1_status] = pref2inf(arg1,gp); 97 | [rec2,rec2_status] = pref2inf(arg2,gp); 98 | [rec3,rec3_status] = pref2inf(arg3,gp); 99 | 100 | if rec1_status==1 101 | arg1 = rec1; 102 | end 103 | 104 | if rec2_status==1 105 | arg2 = rec2; 106 | end 107 | 108 | if rec3_status==1 109 | arg3 = rec3; 110 | end 111 | 112 | sendup = [node '(' arg1 ',' arg2 ',' arg3 ')']; 113 | status = 1; 114 | 115 | if args > 3 116 | 117 | %extract the fourth argument expression from the remainder 118 | %find the 1st node after $ in remainder3 as this will be the 119 | %keynode of the 4th argument we wish to extract 120 | keynodeInd = picknode(remainder3,6,gp); 121 | tokenInd = strfind(remainder3,'$'); 122 | 123 | keynodeLoc = find(keynodeInd > max(tokenInd)); 124 | keynodeInd = keynodeInd(keynodeLoc); 125 | keynode_ind_1 = keynodeInd(1); 126 | [~,arg4] =
extract(keynode_ind_1,remainder3); 127 | [rec4,rec4_status] = pref2inf(arg4,gp); 128 | 129 | if rec4_status==1 130 | arg4 = rec4; 131 | end 132 | 133 | sendup = [node '(' arg1 ',' arg2 ',' arg3 ',' arg4 ')']; 134 | status = 1; 135 | end 136 | 137 | else %must have exactly 2 args 138 | 139 | ind = picknode(expr,6,gp); 140 | %get subtree rooted in this node 141 | [maintree,subtree] = extract(ind,expr); 142 | node = subtree(1); 143 | 144 | %get index of 2nd node in 'sub_tree' (i.e. first logical argument) 145 | keynodeInd = picknode(subtree,6,gp); % 146 | keynodeInd = keynodeInd(2); 147 | 148 | %extract the 1st argument expression 149 | [remainder,arg1] = extract(keynodeInd,subtree); 150 | 151 | %extract the second argument expression from the remainder 152 | %find the 1st node after $ in the remainder as this will be the 153 | %keynode of the 2nd argument we wish to extract 154 | keynodeInd = picknode(remainder,6,gp); 155 | tokenInd = strfind(remainder,'$'); 156 | 157 | keynodeLoc = find(keynodeInd>tokenInd); 158 | keynodeInd = keynodeInd(keynodeLoc); 159 | keynode_ind_1 = keynodeInd(1); 160 | [~,arg2] = extract(keynode_ind_1,remainder); 161 | 162 | %process arguments of these arguments if any exist 163 | [rec1,rec1_status] = pref2inf(arg1,gp); 164 | [rec2,rec2_status] = pref2inf(arg2,gp); 165 | 166 | if rec1_status==1 167 | arg1 = rec1; 168 | end 169 | 170 | if rec2_status==1 171 | arg2 = rec2; 172 | end 173 | 174 | %If the node is of infix type (for Matlab symbolic purposes) 175 | % then send up the results differently 176 | afid_ind = strfind(gp.nodes.functions.afid, node); 177 | nodename = gp.nodes.functions.active_name_UC{afid_ind}; 178 | 179 | if strcmpi(nodename,'times') || strcmpi(nodename,'minus') || ... 
180 | strcmpi(nodename,'plus') || strcmpi(nodename,'rdivide') || strcmpi(nodename,'power') 181 | 182 | sendup = ['(' arg1 ')' node '(' arg2 ')']; 183 | else 184 | sendup = [node '(' arg1 ',' arg2 ')']; 185 | end 186 | sendup = strrep(maintree,'$',sendup); 187 | 188 | status = 1; %i.e. ok 189 | end -------------------------------------------------------------------------------- /core/regressionErrorCharacteristic.m: -------------------------------------------------------------------------------- 1 | function [xdata,ydata,xdataref,ydataref] = regressionErrorCharacteristic(y,ypred) 2 | %REGRESSIONERRORCHARACTERISTIC Generates REC curve data using actual and predicted output vectors. 3 | % 4 | % [XDATA,YDATA] = REGRESSIONERRORCHARACTERISTIC(Y,YPRED) generates REC 5 | % (regression error characteristic) curve data for the model prediction 6 | % YPRED of the actual response Y. 7 | % 8 | % [XDATA,YDATA,XDATAREF,YDATAREF] = REGRESSIONERRORCHARACTERISTIC(Y,YPRED) 9 | % generates REC curve data as well as data for a naive reference prediction 10 | % (in this case the mean of the Y data is used as a naive prediction of 11 | % any Y). This is similar to the ZeroR model in Weka, e.g. see 12 | % http://weka.wikispaces.com/ZeroR 13 | % 14 | % Remarks: 15 | % 16 | % Based on the method outlined in "Regression Error Characteristic 17 | % Curves", Jinbo Bi & Kristin P. Bennett, Proceedings of the Twentieth 18 | % International Conference on Machine Learning (ICML-2003), Washington 19 | % DC, 2003.
20 | % 21 | % Note: 22 | % 23 | % This function only generates REC data, to generate a graph use 24 | % COMPAREMODELSREC 25 | % 26 | % Copyright (c) 2009-2015 Dominic Searson 27 | % 28 | % GPTIPS 2 29 | % 30 | % See also COMPAREMODELSREC 31 | 32 | %first compute loss function abs(y-ypred) 33 | err = abs(y - ypred); 34 | m = numel(err); 35 | eprev = 0; 36 | correct = 0; 37 | 38 | %count of plottable points on curve 39 | datapoints = 1; 40 | 41 | %sort errors 42 | errs= sort(err); 43 | 44 | %generate plot data for model prediction 45 | %(x - abs. error, y - fraction of sample accurate within error) 46 | for i=1:m 47 | if errs(i) > eprev 48 | xdata(datapoints,1)= eprev; 49 | ydata(datapoints,1) = correct/m; 50 | datapoints = datapoints + 1; 51 | eprev = errs(i); 52 | end 53 | correct = correct + 1; 54 | end 55 | 56 | xdata(datapoints,1) = errs(m); 57 | ydata(datapoints,1) = correct/m; 58 | 59 | %next do the same for reference model 60 | err = abs(y-mean(y)); 61 | errs= sort(err); 62 | eprev = 0; 63 | correct = 0; 64 | datapoints = 1; 65 | 66 | %generate plot data for reference prediction 67 | %(x - abs. error, y - fraction of sample accurate within error) 68 | for i=1:m 69 | if errs(i) > eprev 70 | xdataref(datapoints,1) = eprev; 71 | ydataref(datapoints,1) = correct/m; 72 | datapoints = datapoints + 1; 73 | eprev = errs(i); 74 | end 75 | correct = correct + 1; 76 | end 77 | 78 | xdataref(datapoints,1) = errs(m); 79 | ydataref(datapoints,1) = correct/m; -------------------------------------------------------------------------------- /core/rungp.m: -------------------------------------------------------------------------------- 1 | function gpout = rungp(configFile) 2 | %RUNGP Runs GPTIPS 2 using the specified configuration file. 3 | % 4 | % GP = RUNGP(@CONFIGFILE) runs GPTIPS using the user parameters contained 5 | % in the configuration file CONFIGFILE.M and returns the results in the 6 | % GP data structure. 
Post run analysis commands may then be run on this 7 | % structure, e.g. POPBROWSER(GP), SUMMARY(GP), RUNTREE(GP,'best') etc. 8 | % 9 | % Demos: 10 | % 11 | % Use GPDEMO1 at the command line for a demonstration of naive symbolic 12 | % regression (not multigene). 13 | % 14 | % For demos involving multigene symbolic regression see GPDEMO2, GPDEMO3 15 | % and GPDEMO4. 16 | % 17 | % Additionally, example config files for multigene regression on data 18 | % generated by several non-linear mathematical functions are 19 | % CUBIC_CONFIG, RIPPLE_CONFIG, UBALL_CONFIG and SALUSTOWICZ1D_CONFIG. 20 | % 21 | % Remarks: 22 | % 23 | % For further information see the GPTIPS website at: 24 | % 25 | % https://sites.google.com/site/gptips4matlab/ 26 | % 27 | % Also, see the following paper for a summary of GPTIPS 2 capabilities: 28 | % 29 | % Searson DP, GPTIPS 2: an open-source software platform for symbolic 30 | % data mining, Chapter 22 in Gandomi, AH, Alavi, AH, Ryan, C (eds). 31 | % Springer Handbook of Genetic Programming Applications, Springer, 2015. 32 | % (author proof freely available at http://arxiv.org/abs/1412.4690) 33 | % 34 | % ---------------------------------------------------------------------- 35 | % 36 | % This program is free software: you can redistribute it and/or modify it 37 | % under the terms of the GNU General Public License as published by the 38 | % Free Software Foundation, either version 3 of the License, or (at your 39 | % option) any later version. 40 | % 41 | % This program is distributed in the hope that it will be useful, but 42 | % WITHOUT ANY WARRANTY; without even the implied warranty of 43 | % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 44 | % General Public License for more details. 45 | % 46 | % You should have received a copy of the GNU General Public License along 47 | % with this program. If not, see .
48 | % 49 | % ---------------------------------------------------------------------- 50 | % 51 | % Copyright (c) 2009-2015 Dominic Searson 52 | % 53 | % GPTIPS 2 54 | % 55 | % See also GPDEMO1, GPDEMO2, GPDEMO3, GPDEMO4, CUBIC_CONFIG, 56 | % UBALL_CONFIG, SALUSTOWICZ1D_CONFIG, RIPPLE_CONFIG 57 | 58 | %if no config supplied then display usage and return an empty output 59 | if nargin < 1 || isempty(configFile) 60 | disp('To run GPTIPS a configuration m file function handle must be specified'); 61 | disp('e.g. GP = RUNGP(@CUBIC_CONFIG)'); 62 | if nargout > 0, gpout = []; end; return; 63 | end 64 | 65 | %generate prepared data structure with some default run parameter values 66 | gp = gpdefaults(); 67 | 68 | %setup random number generator 69 | gp = gprandom(gp); 70 | 71 | %run user configuration file 72 | gp = feval(configFile,gp); 73 | 74 | %multiple runs (are merged after each run) 75 | for run=1:gp.runcontrol.runs 76 | 77 | gp.state.run = run; 78 | 79 | %run user configuration file each time for subsequent runs, unless 80 | %suppressed (default) 81 | if run > 1 && ~gp.runcontrol.suppressConfig 82 | gp = feval(configFile,gp); 83 | end 84 | 85 | %attach name of config file for reference 86 | gp.info.configFile = configFile; 87 | 88 | %perform error checks 89 | gp = gpcheck(gp); 90 | 91 | %perform initialisation 92 | gp = gpinit(gp); 93 | 94 | %main generation loop 95 | for count=1:gp.runcontrol.num_gen 96 | 97 | %start gen timer 98 | gp = gptic(gp); 99 | 100 | if count == 1 101 | 102 | %generate the initial population 103 | gp = initbuild(gp); 104 | 105 | else 106 | 107 | %use crossover, mutation etc.
to generate a new population 108 | gp = popbuild(gp); 109 | 110 | end 111 | 112 | %calculate fitnesses of population members 113 | gp = evalfitness(gp); 114 | 115 | %update run statistics 116 | gp = updatestats(gp); 117 | 118 | %call user defined function 119 | gp = gp_userfcn(gp); 120 | 121 | %display current stats on screen 122 | displaystats(gp); 123 | 124 | %end gen timer 125 | gp = gptoc(gp); 126 | 127 | %check termination criteria 128 | gp = gpterminate(gp); 129 | 130 | %save gp structure 131 | if gp.runcontrol.savefreq && ... 132 | ~mod(gp.state.count-1,gp.runcontrol.savefreq) 133 | save gptips_checkpoint gp 134 | end 135 | 136 | %break out of generation loop if termination required 137 | if gp.state.terminate 138 | if ~gp.runcontrol.quiet 139 | disp(['Termination criterion met (' ... 140 | gp.state.terminationReason '). Terminating run.']); 141 | end 142 | break; 143 | end 144 | 145 | end %end generation loop 146 | 147 | %finalise the run 148 | gp = gpfinalise(gp); 149 | 150 | %merge independent runs 151 | if gp.state.run > 1 152 | gpout = mergegp(gpout,gp,true); 153 | else 154 | gpout = gp; 155 | end 156 | 157 | end %end independent run loop -------------------------------------------------------------------------------- /core/runtree.m: -------------------------------------------------------------------------------- 1 | function fitness = runtree(gp,ID,knockout) 2 | %RUNTREE Run the fitness function on an individual in the current population. 3 | % 4 | % RUNTREE(GP,ID) runs the fitness function using a user selected 5 | % individual with numeric identifier ID from the current population. 6 | % Despite the name RUNTREE will run either a single or a multigene 7 | % individual. 8 | % 9 | % RUNTREE(GP,'best') runs the fitness function using the 'best' 10 | % individual in the current population. 
11 | % 12 | % RUNTREE(GP,'valbest') runs the fitness function using the best 13 | % individual on the validation data set (if it exists - for symbolic 14 | % regression problems that use the fitness function REGRESSMULTI_FITFUN). 15 | % 16 | % RUNTREE(GP,'testbest') runs the fitness function using the best 17 | % individual on the test data set (if it exists - for symbolic 18 | % regression problems that use the fitness function REGRESSMULTI_FITFUN). 19 | % 20 | % RUNTREE(GP,GPMODEL) runs the fitness function using the GPMODEL struct 21 | % representing a multigene regression model, i.e. the struct returned by 22 | % the functions GPMODEL2STRUCT or GENES2GPMODEL. 23 | % 24 | % FITNESS = RUNTREE(GP,ID) also returns the FITNESS value as returned by 25 | % the fitness function specified in gp.fitness.fitfun. 26 | % 27 | % Additional functionality: 28 | % 29 | % RUNTREE can also accept an optional third argument KNOCKOUT which 30 | % should be a boolean vector with the same number of entries as there are genes in 31 | % the individual to be run. This evaluates the individual with the 32 | % indicated genes removed ('knocked out'). 33 | % 34 | % E.g. RUNTREE(GP,'best',[1 0 0 1]) knocks out the 1st and 4th genes from 35 | % the best individual of the run. In the case of multigene symbolic 36 | % regression (i.e. if the fitness function is REGRESSMULTI_FITFUN) the 37 | % weights for the remaining genes are recomputed by least squares 38 | % regression on the training data.
39 | % 40 | % Copyright (c) 2009-2015 Dominic Searson 41 | % 42 | % GPTIPS 2 43 | % 44 | % See also EVALFITNESS, GPMODELREPORT 45 | 46 | if nargin < 2 || isempty(ID) 47 | disp('Basic usage is RUNTREE(GP,ID) where ID is the population identifier of the selected individual.'); 48 | disp('or RUNTREE(GP,''best'') to run the best individual in the population.'); 49 | 50 | if nargout > 0 51 | fitness = []; 52 | end 53 | 54 | return; 55 | 56 | elseif nargin < 3 57 | knockout = 0; 58 | end 59 | 60 | if isempty(knockout) || ~any(knockout) 61 | doknockout = false; 62 | else 63 | doknockout = true; 64 | end 65 | 66 | i = ID; 67 | 68 | if isnumeric(ID) 69 | 70 | if i > 0 && i <= gp.runcontrol.pop_size 71 | 72 | %set this in case the fitness function needs to retrieve 73 | %the right returnvalues 74 | gp.state.current_individual = i; 75 | treestrs = tree2evalstr(gp.pop{i},gp); 76 | 77 | %if genes are being knocked out then remove appropriate gene 78 | if doknockout 79 | treestrs = kogene(treestrs, knockout); 80 | gp.state.force_compute_theta = true; %need to recompute gene weights if doing symbolic regression 81 | gp.userdata.showgraphs = true; %if using symbolic regression, plot graphs 82 | end 83 | 84 | fitness = feval(gp.fitness.fitfun,treestrs,gp); 85 | 86 | else 87 | error('A valid population member ID must be entered, e.g. 1, 99 or ''best'''); 88 | end 89 | 90 | elseif ischar(ID) && strcmpi(ID,'best') 91 | 92 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.best.returnvalues; 93 | treestrs = gp.results.best.eval_individual; 94 | 95 | if doknockout 96 | treestrs = kogene(treestrs, knockout); 97 | gp.state.force_compute_theta = true; 98 | gp.userdata.showgraphs = true; 99 | end 100 | 101 | fitness = feval(gp.fitness.fitfun,treestrs,gp); 102 | 103 | elseif ischar(ID) && strcmpi(ID,'valbest') 104 | 105 | % check that validation results/data present 106 | if ~isfield(gp.results,'valbest') 107 | disp('No validation results/data were found. 
Try runtree(gp,''best'') instead.'); 108 | if nargout > 0, fitness = []; end; return; 109 | end 110 | 111 | %copy "valbest" return values to a slot in the "current" return values 112 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.valbest.returnvalues; 113 | 114 | treestrs = gp.results.valbest.eval_individual; 115 | 116 | if doknockout 117 | treestrs = kogene(treestrs, knockout); 118 | gp.state.force_compute_theta = true; 119 | gp.userdata.showgraphs = true; 120 | end 121 | 122 | fitness=feval(gp.fitness.fitfun,treestrs,gp); 123 | 124 | elseif ischar(ID) && strcmpi(ID,'testbest') 125 | 126 | % check that test data results are present 127 | if ~isfield(gp.results,'testbest') 128 | disp('No test results/data were found. Try runtree(gp,''best'') instead.'); 129 | if nargout > 0, fitness = []; end; return; 130 | end 131 | 132 | %copy "testbest" return values to a slot in the "current" return values 133 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.testbest.returnvalues; 134 | 135 | treestrs = gp.results.testbest.eval_individual; 136 | 137 | if doknockout 138 | treestrs = kogene(treestrs, knockout); 139 | gp.state.force_compute_theta = true; 140 | gp.userdata.showgraphs = true; 141 | end 142 | 143 | fitness=feval(gp.fitness.fitfun,treestrs,gp); 144 | 145 | %if the selected individual is a GPMODEL struct (NB knockout disabled 146 | %for this form) 147 | elseif isa(ID,'struct') && isfield(ID,'source') && ...
148 | (strcmpi(ID.source,'gpmodel2struct') || strcmpi(ID.source,'genes2gpmodel') ); 149 | 150 | treestrs = ID.genes.geneStrs; 151 | treestrs = tree2evalstr(treestrs,gp); 152 | gp.fitness.returnvalues{gp.state.current_individual} = ID.genes.geneWeights; 153 | fitness = feval(gp.fitness.fitfun,treestrs,gp); 154 | 155 | else 156 | error('Invalid argument.'); 157 | end -------------------------------------------------------------------------------- /core/scangenes.m: -------------------------------------------------------------------------------- 1 | function xhits = scangenes(genes,numx) 2 | %SCANGENES Scan a single multigene individual for all input variables and return a frequency vector. 3 | % 4 | % XHITS = SCANGENES(GENES,NUMX) scans the cell array GENES for NUMX 5 | % input variables and returns the frequency vector XHITS. 6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also GPMODELVARS 12 | 13 | numgenes = length(genes); 14 | xhits = zeros(1,numx); 15 | 16 | %loop through inputs 17 | for i=1:numx 18 | k1 = []; 19 | k2 = []; 20 | istr = ['x' int2str(i)]; 21 | 22 | %loop through genes, look for current input 23 | for j=1:numgenes 24 | 25 | k1 = strfind(genes{j}, [istr ',']); 26 | k2 = strfind(genes{j},[istr ')']); 27 | 28 | %workaround for special case (trees containing a single terminal node) 29 | k3 = strfind(genes{j},istr); 30 | if ~isempty(k3) && numel(k3) == 1 31 | if length(istr) == length(genes{j}); 32 | xhits(i) = xhits(i) + 1; 33 | end 34 | end 35 | 36 | if ~isempty(k1) 37 | xhits(i) = xhits(i) + length(k1); 38 | end 39 | 40 | if ~isempty(k2) 41 | xhits(i) = xhits(i) + length(k2); 42 | end 43 | 44 | end 45 | 46 | end -------------------------------------------------------------------------------- /core/selection.m: -------------------------------------------------------------------------------- 1 | function ID = selection(gp) 2 | %SELECTION Selects an individual from the current population. 
3 | % 4 | % Selection is based on fitness and/or complexity using either regular or 5 | % Pareto tournaments. 6 | % 7 | % ID = SELECTION(GP) returns the numeric identifier of the selected 8 | % individual. 9 | % 10 | % Remarks: 11 | % 12 | % Selection method is either: 13 | % 14 | % A) Standard fitness-based tournament (with the option of lexicographic 15 | % selection pressure). 16 | % 17 | % This is always used when the field GP.SELECTION.TOURNAMENT.P_PARETO 18 | % equals 0. 19 | % 20 | % OR 21 | % 22 | % B) Pareto tournament 23 | % 24 | % This is used with a probability equal to 25 | % GP.SELECTION.TOURNAMENT.P_PARETO. 26 | % 27 | % Remarks: 28 | % 29 | % Method A (standard tournament) 30 | % ------------------------------ 31 | % 32 | % For a tournament of size N 33 | % 34 | % 1) N individuals are randomly selected from the population with 35 | % reselection allowed. 36 | % 37 | % 2) The population index of the best individual in the tournament is 38 | % returned. 39 | % 40 | % 3) If two or more individuals in the tournament share the best 41 | % fitness then one of these is selected randomly, unless 42 | % GP.SELECTION.TOURNAMENT.LEX_PRESSURE is set to true. In this case the 43 | % one with the lowest complexity is selected (if there is more than one 44 | % individual with the best fitness AND the same complexity then one of 45 | % these individuals is randomly selected.) 46 | % 47 | % OR 48 | % 49 | % Method B (Pareto tournament) 50 | % ---------------------------- 51 | % 52 | % This is used probabilistically when GP.SELECTION.TOURNAMENT.P_PARETO > 0 53 | % 54 | % For a tournament of size N 55 | % 56 | % 1) N individuals are randomly selected from the population with 57 | % reselection allowed.
58 | % 59 | % 2) These N individuals are then Pareto sorted (sorted using fitness and 60 | % complexity as competing objectives) to obtain a (rank 1) Pareto set of 61 | % individuals that are non-dominated in terms of both fitness and 62 | % complexity (this set can be thought of as being the Pareto front of the 63 | % tournament). The Pareto sort is based on the sorting method 64 | % described on page 184 of: 65 | % 66 | % "A fast and elitist multiobjective genetic algorithm: NSGA-II" by 67 | % Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, T. Meyarivan, IEEE 68 | % Transactions on Evolutionary Computation VOL. 6, NO. 2, APRIL 2002, pp. 69 | % 182-197. 70 | % 71 | % 3) A random individual is then selected from the Pareto set and its 72 | % index ID in the population is returned. 73 | % 74 | % Copyright (c) 2009-2015 Dominic Searson 75 | % 76 | % GPTIPS 2 77 | % 78 | % See also POPBUILD, INITBUILD 79 | 80 | %Method A 81 | if rand >= gp.selection.tournament.p_pareto 82 | 83 | %pick N individuals at random 84 | tour_ind = ceil(rand(gp.selection.tournament.size,1)*gp.runcontrol.pop_size); 85 | 86 | %retrieve fitness values of tournament members 87 | tour_fitness = gp.fitness.values(tour_ind); 88 | 89 | %check if max or min problem 90 | if gp.fitness.minimisation 91 | bestfitness = min(tour_fitness); 92 | else 93 | bestfitness = max(tour_fitness); 94 | end 95 | 96 | %locate indices of best individuals in tournament 97 | bestfitness_tour_ind = find(tour_fitness == bestfitness); 98 | number_of_best = numel(bestfitness_tour_ind); 99 | 100 | %use plain lexicographic parsimony pressure a la Sean Luke 101 | %and Liviu Panait (Lexicographic Parsimony Pressure, GECCO 2002, pp. 829-836).
102 | if number_of_best > 1 103 | 104 | if gp.selection.tournament.lex_pressure 105 | 106 | tcomps = gp.fitness.complexity(tour_ind(bestfitness_tour_ind)); 107 | 108 | [~,min_ind] = min(tcomps); 109 | bestfitness_tour_ind = bestfitness_tour_ind(min_ind); 110 | 111 | else %otherwise randomly pick one 112 | 113 | bestfitness_tour_ind = bestfitness_tour_ind(ceil(rand*number_of_best)); 114 | 115 | end 116 | end 117 | 118 | ID = tour_ind(bestfitness_tour_ind); 119 | 120 | else %Method B: Pareto tournament 121 | 122 | %pick N individuals at random 123 | tour_ind = ceil(rand(gp.selection.tournament.size,1)*gp.runcontrol.pop_size); 124 | 125 | %retrieve fitness values of tournament members 126 | tour_fitness = gp.fitness.values(tour_ind); 127 | 128 | %retrieve complexities of tournament members 129 | tour_comp = gp.fitness.complexity(tour_ind); 130 | 131 | %perform fast pareto sort on tournament members 132 | %and select winner randomly from rank 1 solutions 133 | if gp.fitness.minimisation 134 | mo = [tour_fitness tour_comp]; 135 | else 136 | mo = [ (-1 *tour_fitness) tour_comp]; 137 | end 138 | 139 | %remove any 'infs' from consideration and recall if necessary 140 | infs= isinf(mo(:,1)); 141 | mo(infs,:)=[]; 142 | if isempty(mo) 143 | ID = selection(gp); 144 | return; 145 | end 146 | 147 | tour_ind(infs,:)=[]; 148 | 149 | rank = ndfsort_rank1(mo); 150 | rank1_tour_ind = tour_ind(rank == 1); 151 | num_rank1 = numel(rank1_tour_ind); 152 | ID = rank1_tour_ind(ceil(rand * num_rank1)); 153 | end -------------------------------------------------------------------------------- /core/standaloneModelStats.m: -------------------------------------------------------------------------------- 1 | function stats = standaloneModelStats(yactual,ypredicted) 2 | %STANDALONEMODELSTATS Compute model performance stats for actual and predicted values. 3 | % 4 | % STATS = STANDALONEMODELSTATS(YACTUAL,YPREDICTED) returns a structure 5 | % STATS containing model performance metrics. 
6 | % 7 | % Copyright (c) 2009-2015 Dominic Searson 8 | % 9 | % GPTIPS 2 10 | 11 | %residuals 12 | e = (yactual-ypredicted); 13 | stats.e = e; 14 | 15 | %rsquared 16 | stats.rsq = 1 - sum(e.^2)/sum( (yactual-mean(yactual)).^2 ); 17 | 18 | %corrcoef 19 | c = corrcoef(yactual, ypredicted); 20 | stats.r = c(1,2); 21 | 22 | %rmse 23 | stats.rmse = sqrt(mean(e.^2)); 24 | 25 | %mae 26 | stats.mae = mean(abs(e)); 27 | 28 | %mse 29 | stats.mse = mean(e.^2); 30 | 31 | 32 | -------------------------------------------------------------------------------- /core/summary.m: -------------------------------------------------------------------------------- 1 | function summary(gp,plotlog) 2 | %SUMMARY Plots basic summary information from a run. 3 | % 4 | % By default, log(fitness) is plotted rather than raw fitness when all 5 | % best fitness history values are > 0. Use SUMMARY(GP,FALSE) to plot raw 6 | % fitness values. 7 | % 8 | % Copyright (c) 2009-2015 Dominic Searson 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also RUNTREE, POPBROWSER, GPPOPVARS, GPMODELVARS 13 | 14 | if nargin < 1 15 | disp('Usage is SUMMARY(GP) to plot log fitness values or SUMMARY(GP,FALSE) to plot raw fitness values.'); 16 | return; 17 | end 18 | 19 | %plot log of fitness/rmse by default 20 | if nargin < 2 || isempty(plotlog) 21 | plotlog = true; 22 | end 23 | 24 | %check if this is a merged gp structure, because the run history data will 25 | %only apply to the first structure in the merge. 26 | if gp.info.merged 27 | mergeStr = '1st run in merged population'; 28 | else 29 | mergeStr = 'run'; 30 | end 31 | 32 | if ~isempty(gp.userdata.name) 33 | setname = ['. 
Data: ' gp.userdata.name]; 34 | else 35 | setname=''; 36 | end 37 | 38 | if strncmpi('regressmulti',func2str(gp.fitness.fitfun),12) 39 | ylab1 = 'Log RMSE'; 40 | ylab2 = 'RMSE'; 41 | else 42 | ylab1 = ['Log ' gp.fitness.label]; 43 | ylab2 = gp.fitness.label; 44 | end 45 | 46 | if any(gp.results.history.bestfitness <= 0) 47 | plotlog = false; 48 | end 49 | 50 | h = figure; 51 | set(h,'name','GPTIPS 2 run summary','numbertitle','off'); 52 | 53 | if ~plotlog && strncmpi('regressmulti',func2str(gp.fitness.fitfun),12) 54 | ylab1 = 'RMSE'; 55 | end 56 | 57 | 58 | if plotlog 59 | subplot(2,1,1); 60 | stairs(0:1:gp.state.count-1,log(gp.results.history.bestfitness),'LineWidth',2,'Color',[0 0.45 0.74]); 61 | ylabel(ylab1); 62 | else 63 | subplot(2,1,1); 64 | stairs(0:1:gp.state.count-1,gp.results.history.bestfitness,'LineWidth',2,'Color',[0 0.45 0.74]); 65 | ylabel(ylab2); 66 | end 67 | a=axis; 68 | a(2) = gp.state.count-1;axis(a); 69 | grid on; 70 | 71 | 72 | title({['Summary of ' mergeStr], ['Config: ' char(gp.info.configFile) setname '.']},'interpreter','none','fontWeight','bold'); 73 | xlabel('Generation'); 74 | legend('Best fitness'); 75 | legend boxoff; 76 | 77 | subplot(2,1,2); 78 | hold on; 79 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness,'-x','LineWidth',2,'Color',[0.85 0.33 0.1]); 80 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness+gp.results.history.std_devfitness,'r:');hold on; 81 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness-gp.results.history.std_devfitness,'r:'); 82 | a=axis; 83 | a(2) = gp.state.count-1;axis(a); 84 | set(gca,'box','on'); 85 | legend('Mean fitness (+ - 1 std. 
dev)'); 86 | legend boxoff; 87 | xlabel('Generation'); 88 | ylabel(ylab2); 89 | grid on; 90 | hold off; -------------------------------------------------------------------------------- /core/tree2evalstr.m: -------------------------------------------------------------------------------- 1 | function decodedArray = tree2evalstr(encodedArray,gp) 2 | %TREE2EVALSTR Converts encoded tree expressions into math expressions that MATLAB can evaluate directly. 3 | % 4 | % DECODEDARRAY = TREE2EVALSTR(ENCODEDARRAY,GP) processes the encoded 5 | % symbolic expressions in the cell array ENCODEDARRAY using the 6 | % information in the GP data structure and creates a cell array 7 | % DECODEDARRAY, each element of which is an evaluable MATLAB string 8 | % expression. 9 | % 10 | % Remarks: 11 | % 12 | % GPTIPS uses an encoded string representation of expressions which is not 13 | % very human-readable, but is fairly compact and makes events like 14 | % crossover and mutation easier to handle. An example of such an 15 | % expression is a(b(x1,a(x4,x3)),c(x2,x1)). Before an expression can be 16 | % evaluated it is converted using TREE2EVALSTR to produce an evaluable 17 | % math expression, e.g. times(minus(x1,times(x4,x3)),plus(x2,x1)). 18 | % 19 | % Copyright (c) 2009-2015 Dominic Searson 20 | % 21 | % GPTIPS 2 22 | % 23 | % See also EVALFITNESS, TREEGEN, GPREFORMAT 24 | 25 | %loop through active function list and replace with function names 26 | for j=1:gp.nodes.functions.num_active 27 | encodedArray = strrep(encodedArray,gp.nodes.functions.afid(j),gp.nodes.functions.active_name_UC{j}); 28 | end 29 | 30 | decodedArray = lower(encodedArray); -------------------------------------------------------------------------------- /core/uniquegenes.m: -------------------------------------------------------------------------------- 1 | function genes = uniquegenes(gp,modelList) 2 | %UNIQUEGENES Returns a GENES structure containing the unique genes in a population.
3 | % 4 | % GENES = UNIQUEGENES(GP) scans through the symbolic regression models 5 | % stored in the population contained in GP and extracts the unique set of 6 | % genes that makes up the population of 'valid' models. This unique set 7 | % is returned in a MATLAB data struct GENES. (See GPMODEL2STRUCT for what 8 | % constitutes a 'valid' model.) 9 | % 10 | % GENES = UNIQUEGENES(GP, MODELLIST) does the same but only scans the 11 | % population members with indices contained in the row vector MODELLIST. 12 | % 13 | % E.g. GENES = UNIQUEGENES(GP, [1 3 9 10]) returns a structure GENES 14 | % containing the unique genes in the models from GP with population 15 | % indices 1, 3, 9 and 10. 16 | % 17 | % Copyright (c) 2009-2015 Dominic Searson 18 | % 19 | % GPTIPS 2 20 | % 21 | % See also GENEFILTER, GENEBROWSER, GPMODELFILTER, GPMODEL2STRUCT 22 | 23 | if nargin < 1 24 | disp('Basic usage is UNIQUEGENES(GP).'); 25 | if nargout > 0 26 | genes = []; 27 | end 28 | return 29 | end 30 | 31 | if ~gp.info.toolbox.symbolic() 32 | error('The Symbolic Math Toolbox is required to use this function.'); 33 | end 34 | 35 | if nargin < 2 36 | modelList = 1:gp.runcontrol.pop_size; 37 | end 38 | 39 | if ~strncmpi(func2str(gp.fitness.fitfun),'regressmulti',12); 40 | error('This function is intended only for use on populations containing multigene regression models.'); 41 | end 42 | 43 | if ~isnumeric(modelList) || size(modelList,1) > 1 44 | error('Supplied model list must be a row vector of model indices.'); 45 | end 46 | 47 | verOld = verLessThan('matlab', '7.7.0'); 48 | 49 | numSuppliedModels = numel(modelList); 50 | 51 | %check model validity 52 | numGenes = 0; 53 | models = cell(numSuppliedModels,1); 54 | disp(['Scanning ' num2str(numSuppliedModels) ' models ...']); 55 | 56 | for i = 1:numSuppliedModels 57 | models{i} = gpmodel2struct(gp,modelList(i),false,false,false); 58 | if ~models{i}.valid 59 | %if model is 'invalid' do nothing and skip to next model 60 | else 61 | numGenes = 
numGenes + models{i}.genes.num_genes; 62 | end 63 | end 64 | 65 | if numGenes == 0; 66 | error('No valid models found in supplied population list.'); 67 | end 68 | 69 | %add in genes from 'best' model on training data and 'valbest' and 70 | %'testbest' 71 | if numSuppliedModels == gp.runcontrol.pop_size 72 | modelbest = gpmodel2struct(gp,'best', false, false, false); 73 | modelvalbest = gpmodel2struct(gp,'valbest', false, false, false); 74 | modeltestbest = gpmodel2struct(gp,'testbest', false, false, false); 75 | 76 | if modelbest.valid, numGenes = numGenes + modelbest.genes.num_genes; end 77 | 78 | if modelvalbest.valid 79 | numGenes = numGenes + modelvalbest.genes.num_genes; 80 | end 81 | 82 | if modeltestbest.valid 83 | numGenes = numGenes + modeltestbest.genes.num_genes; 84 | end 85 | else 86 | modelvalbest.valid = false; 87 | modeltestbest.valid = false; 88 | modelbest.valid = false; 89 | end 90 | 91 | %loop through valid models and get all genes 92 | allGenes = cell(numGenes,1); 93 | offset = 0; 94 | 95 | for i=1:numSuppliedModels 96 | if models{i}.valid 97 | numModelGenes = models{i}.genes.num_genes; 98 | allGenes(1+offset:numModelGenes+offset) = models{i}.genes.geneStrs; 99 | offset = offset + numModelGenes; 100 | end 101 | end 102 | 103 | if modelbest.valid 104 | numModelGenes = modelbest.genes.num_genes; 105 | allGenes(1+offset:numModelGenes+offset) = modelbest.genes.geneStrs; 106 | offset = offset + numModelGenes; 107 | end 108 | 109 | if modelvalbest.valid 110 | numModelGenes = modelvalbest.genes.num_genes; 111 | allGenes(1+offset:numModelGenes+offset) = modelvalbest.genes.geneStrs; 112 | offset = offset + numModelGenes; 113 | end 114 | 115 | if modeltestbest.valid 116 | numModelGenes = modeltestbest.genes.num_genes; 117 | allGenes(1+offset:numModelGenes+offset) = modeltestbest.genes.geneStrs; 118 | offset = offset + numModelGenes; 119 | end 120 | 121 | unique_genes = unique(allGenes); 122 | 
disp(['Total encoded genes found: ' num2str(numel(allGenes))]); 123 | disp(['Unique
encoded genes found: ' num2str(numel(unique_genes))]); 124 | disp('Decoding & simplifying genes...please wait.'); 125 | 126 | genes.uniqueGenesCoded = unique_genes; 127 | genes.uniqueGenesDecoded = gpreformat(gp,unique_genes)'; 128 | genes.numModelsScanned = numSuppliedModels; 129 | 130 | symgenes = cell(numel(genes.uniqueGenesDecoded),1); 131 | simpleGenes = cell(numel(genes.uniqueGenesDecoded),1); 132 | simpleGenesChar = cell(numel(genes.uniqueGenesDecoded),1); 133 | 134 | for i=1:numel(symgenes) 135 | 136 | disp(['Simplifying gene ' num2str(i)]); 137 | symgenes{i} = sym(genes.uniqueGenesDecoded{i}); 138 | 139 | try 140 | simpleGenes{i} = gpsimplify(symgenes{i},10,verOld, true); 141 | catch 142 | warning(['Could not simplify gene ' num2str(i)]); 143 | simpleGenes{i} = symgenes{i}; 144 | end 145 | 146 | simpleGenesChar{i} = char(simpleGenes{i}); 147 | end 148 | 149 | [~, uniqueInds] = unique(simpleGenesChar); 150 | genes.symUniqueInds = sort(uniqueInds); 151 | genes.uniqueGenesSym = simpleGenes(genes.symUniqueInds); 152 | genes.numUniqueGenes = numel(genes.uniqueGenesSym); 153 | 154 | %rectify non-sym gene lists 155 | genes.uniqueGenesCoded = genes.uniqueGenesCoded(genes.symUniqueInds); 156 | genes.uniqueGenesDecoded = genes.uniqueGenesDecoded(genes.symUniqueInds); 157 | 158 | genes.totalGenes = numel(allGenes); 159 | genes.numUniqueGenes = numel(genes.uniqueGenesSym); 160 | disp(['Unique decoded genes found: ' num2str(genes.numUniqueGenes)]); 161 | 162 | %modify the GP structure to just contain the unique genes 163 | gp.pop = genes.uniqueGenesCoded; 164 | gp.state.run_completed = true; 165 | gp.state.force_compute_theta = true; 166 | gp.runcontrol.pop_size = genes.numUniqueGenes; 167 | gp.userdata.showgraphs = false; 168 | gp.userdata.stats = false; 169 | 170 | codedGenes2Use = genes.uniqueGenesCoded; 171 | geneOutputs = zeros(length(gp.userdata.ytrain),genes.numUniqueGenes); 172 | 173 | if isfield(gp.userdata,'yval') && ~isempty(gp.userdata.yval) 174 | 
geneOutputsVal = zeros(length(gp.userdata.yval),genes.numUniqueGenes); 175 | else 176 | geneOutputsVal = []; 177 | end 178 | 179 | if isfield(gp.userdata,'ytest') && ~isempty(gp.userdata.ytest) 180 | geneOutputsTest = zeros(length(gp.userdata.ytest),genes.numUniqueGenes); 181 | else 182 | geneOutputsTest = []; 183 | end 184 | 185 | rtrainGenes = zeros(genes.numUniqueGenes,1); 186 | geneComplexity = zeros(genes.numUniqueGenes,1); 187 | 188 | %evaluate the genes against the output & get corrcoef 189 | for i=1:numel(codedGenes2Use) 190 | evalstr = tree2evalstr(codedGenes2Use{i},gp); 191 | gp.state.current_individual = i; 192 | [fitness,gp,~,ypredtrain,~,~,~,~,~,~,... 193 | gene_outputs,gene_outputs_test,gene_outputs_val] = feval(gp.fitness.fitfun,{evalstr},gp); 194 | 195 | %in rare cases genes in isolation give overflows etc, just set the 196 | %corrcoef to zero. 197 | if isinf(fitness) 198 | rtrainGenes(i,1) = 0; 199 | else 200 | c = abs(corrcoef(ypredtrain,gp.userdata.ytrain)); 201 | rtrainGenes(i,1) = c(1,2); 202 | geneComplexity(i,1) = getcomplexity(codedGenes2Use{i}); 203 | geneOutputs(:,i) = gene_outputs(:,2); 204 | if ~isempty(gene_outputs_test) 205 | geneOutputsTest(:,i) = gene_outputs_test(:,2); 206 | elseif i == 1 207 | geneOutputsTest = []; 208 | end 209 | 210 | if ~isempty(gene_outputs_val) 211 | geneOutputsVal(:,i) = gene_outputs_val(:,2); 212 | elseif i == 1 213 | geneOutputsVal = []; 214 | end 215 | end 216 | end 217 | 218 | %tidy up 219 | genes.about = 'A struct containing unique genes from a population.'; 220 | genes.geneOutputsTrain = geneOutputs; 221 | genes.geneOutputsTest = geneOutputsTest; 222 | genes.geneOutputsVal = geneOutputsVal; 223 | genes.rtrain = rtrainGenes; 224 | genes.complexity = geneComplexity; 225 | genes = rmfield(genes,'symUniqueInds'); 226 | genes.filtered = false; 227 | genes = orderfields(genes); 228 | genes.source = 'uniqueGenes'; -------------------------------------------------------------------------------- 
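The updatestats.m file that follows chooses the individual it reports as "best" in two steps: NaN fitness values are mapped to the worst possible value, and ties on the best fitness are broken by the lowest complexity, taking the first index if complexity also ties. The standalone sketch below illustrates that logic for a minimisation problem only. The fitness and complexity vectors are made-up stand-ins for `gp.fitness.values` and `gp.fitness.complexity`, and the variable names are illustrative, not part of the toolbox. In the maximisation branch, updatestats.m instead maps both NaN and Inf to -Inf and uses `max`.

```matlab
% Minimal sketch (minimisation only) of the "best individual" bookkeeping
% described in updatestats.m. The data below are hypothetical stand-ins
% for gp.fitness.values and gp.fitness.complexity.
fitness    = [0.52; NaN; 0.31; 0.31; 0.90; 0.31];
complexity = [12;   40;  25;   18;   7;    18];

% NaN fitness (e.g. a failed evaluation) is treated as the worst value
fitness(isnan(fitness)) = Inf;

% Best (lowest) fitness in the population
bestFitness = min(fitness);

% Every individual that attains the best fitness
bestInds = find(fitness == bestFitness);

% Tie-break on lowest complexity; MIN returns the first index on a tie
[~, k] = min(complexity(bestInds));
bestInd = bestInds(k);

% Population mean and std. dev. of fitness, ignoring Inf values
finiteInd   = ~isinf(fitness);
meanFitness = mean(fitness(finiteInd));
stdFitness  = std(fitness(finiteInd));

fprintf('best index %d, fitness %.2f, mean %.2f\n', ...
    bestInd, bestFitness, meanFitness);
```

With this data, individuals 3, 4 and 6 tie on a fitness of 0.31, and individuals 4 and 6 also tie on a complexity of 18, so the first of them (index 4) is reported. As the comments in updatestats.m point out, this tie-break only affects which individual is reported as "best"; it does not change the GP run itself.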
/core/updatestats.m: -------------------------------------------------------------------------------- 1 | function gp = updatestats(gp) 2 | %UPDATESTATS Update run statistics. 3 | % 4 | % GP = UPDATESTATS(GP) updates the stats of the GP struct. 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also DISPLAYSTATS, GPINIT, GPFINALISE 11 | 12 | %check for NaN fitness values 13 | nanInd = isnan(gp.fitness.values); 14 | 15 | %write best fitness of current generation to gp struct 16 | if gp.fitness.minimisation 17 | 18 | gp.fitness.values(nanInd) = Inf; 19 | bestTrainFitness = min(gp.fitness.values); 20 | 21 | else %maximisation of fitness function 22 | 23 | %replace Inf and NaN with -Inf 24 | infInd = isinf(gp.fitness.values); 25 | gp.fitness.values(infInd) = -Inf; 26 | gp.fitness.values(nanInd) = -Inf; 27 | bestTrainFitness = max(gp.fitness.values); 28 | 29 | end 30 | 31 | %There may be more than one individual with best fitness. If so designate 32 | %as "best" the one with the lowest complexity/node count. If there is more 33 | %than one with the same complexity then just pick the first one 34 | %(effectively a random selection). This doesn't affect the GP run, it only 35 | %affects the reporting of which individual is currently considered "best". 
36 | bestTrainInds = find(gp.fitness.values == bestTrainFitness); 37 | 38 | nodes = gp.fitness.complexity(bestTrainInds); 39 | 40 | [~,ind] = min(nodes); 41 | bestTrainInd = bestTrainInds(ind); 42 | 43 | %store best of current population (training) 44 | gp.state.best.fitness = bestTrainFitness; 45 | gp.state.best.individual = gp.pop{bestTrainInd}; 46 | gp.state.best.returnvalues = gp.fitness.returnvalues{bestTrainInd}; 47 | gp.state.best.complexity = getcomplexity(gp.state.best.individual); 48 | gp.state.best.nodecount = getnumnodes(gp.state.best.individual); 49 | 50 | gp.results.history.bestfitness(gp.state.count,1) = bestTrainFitness; 51 | gp.state.best.index = bestTrainInd; 52 | 53 | %calc. mean and std. dev. fitness (exc. inf values) 54 | notinfInd = ~isinf(gp.fitness.values); 55 | gp.state.meanfitness = mean(gp.fitness.values(notinfInd)); 56 | gp.state.std_devfitness = std(gp.fitness.values(notinfInd)); 57 | gp.results.history.meanfitness(gp.state.count,1) = gp.state.meanfitness; 58 | gp.results.history.std_devfitness(gp.state.count,1) = gp.state.std_devfitness; 59 | 60 | %update best of run so far on training data 61 | if gp.state.count == 1 %if first gen then "best of run" is best of current gen 62 | 63 | gp.results.best.fitness = gp.state.best.fitness; 64 | gp.results.best.individual = gp.state.best.individual; 65 | gp.results.best.returnvalues = gp.state.best.returnvalues; 66 | gp.results.best.complexity = gp.state.best.complexity; 67 | gp.results.best.nodecount = gp.state.best.nodecount; 68 | gp.results.best.foundatgen = 0; 69 | gp.results.best.eval_individual = tree2evalstr(gp.state.best.individual,gp); 70 | 71 | %update run "best" fitness if current gen best fitness is better (or 72 | %the same but less complex) 73 | else 74 | 75 | updateTrainBest = false; 76 | 77 | %update 'best' depending on chosen measure of complexity 78 | if gp.fitness.complexityMeasure 79 | stateComp = gp.state.best.complexity; 80 | bestComp = gp.results.best.complexity; 81 | else 
82 | stateComp = gp.state.best.nodecount; 83 | bestComp = gp.results.best.nodecount; 84 | end 85 | 86 | %if minimising fitness function 87 | if gp.fitness.minimisation 88 | 89 | if (gp.state.best.fitness < gp.results.best.fitness) ... 90 | || ( (stateComp < bestComp) && (gp.state.best.fitness == gp.results.best.fitness) ) 91 | updateTrainBest = true; 92 | end 93 | 94 | %if maximising fitness function 95 | else 96 | 97 | if (gp.state.best.fitness > gp.results.best.fitness) ... 98 | || ( (stateComp < bestComp) && (gp.state.best.fitness == gp.results.best.fitness) ) 99 | updateTrainBest = true; 100 | end 101 | end 102 | 103 | if updateTrainBest 104 | gp.results.best.fitness = gp.state.best.fitness; 105 | gp.results.best.individual = gp.state.best.individual; 106 | gp.results.best.returnvalues = gp.state.best.returnvalues; 107 | gp.results.best.complexity = gp.state.best.complexity; 108 | gp.results.best.nodecount = gp.state.best.nodecount; 109 | gp.results.best.foundatgen = gp.state.count - 1; 110 | gp.results.best.eval_individual = tree2evalstr(gp.state.best.individual,gp); 111 | end 112 | 113 | end 114 | -------------------------------------------------------------------------------- /examples/concrete.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/concrete.mat -------------------------------------------------------------------------------- /examples/cubic_config.m: -------------------------------------------------------------------------------- 1 | function gp=cubic_config(gp) 2 | %CUBIC_CONFIG Config file for multigene regression on a simple cubic polynomial. 
3 | % 4 | % GP = CUBIC_CONFIG(GP) generates a parameter structure GP that specifies 5 | % the GPTIPS run settings for multigene regression on data generated by 6 | % the cubic polynomial: 7 | % 8 | % 3 2 9 | % 3.4 x1 + 2.9 x1 + 6.2 x1 + 0.75 10 | % 11 | % Example: 12 | % 13 | % GP = RUNGP(@cubic_config) uses this configuration file to perform 14 | % symbolic regression on the cubic polynomial data and returns the 15 | % results in the data struct GP. 16 | % 17 | % Copyright (c) 2009-2015 Dominic Searson 18 | % 19 | % GPTIPS 2 20 | % 21 | % See also SALUSTOWICZ1D_CONFIG, UBALL_CONFIG, RIPPLE_CONFIG, 22 | % REGRESSMULTI_FITFUN, RUNGP 23 | 24 | %generate cubic polynomial data for train and test sets 25 | %and use GPTIPS defaults for everything else 26 | xtr = (-10: 0.2 : 10)'; 27 | ytr = 3.4 * xtr.^3 + 2.9*xtr.^2 + 6.2*xtr + 0.75; 28 | gp.userdata.xtrain = xtr; 29 | gp.userdata.ytrain = ytr; 30 | 31 | xte = (-9.5 : 0.333 : 9.5)'; 32 | yte = 3.4 * xte.^3 + 2.9*xte.^2 + 6.2*xte + 0.75; 33 | gp.userdata.xtest = xte; 34 | gp.userdata.ytest = yte; 35 | gp.userdata.name = 'Cubic poly'; 36 | 37 | %termination criterion 38 | gp.fitness.terminate = true; 39 | gp.fitness.terminate_value = 1e-6; 40 | 41 | %function nodes 42 | gp.nodes.functions.name = {'times','minus','plus','add3','mult3'}; 43 | 44 | -------------------------------------------------------------------------------- /examples/demo2data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/demo2data.mat -------------------------------------------------------------------------------- /examples/gpdemo1.m: -------------------------------------------------------------------------------- 1 | %GPDEMO1 GPTIPS 2 demo of simple symbolic regression on Koza's quartic polynomial.
2 | % 3 | % Demonstrates simple (naive) symbolic regression and some post run 4 | % analysis functions such as SUMMARY, RUNTREE and GPPRETTY to simplify 5 | % expressions if the Symbolic Math Toolbox is present. Also shows the use 6 | % of DRAWTREES to visualise the tree structure of GPTIPS individuals. 7 | % 8 | % (c) Dominic Searson 2009-2015 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also GPDEMO1_CONFIG, QUARTIC_FITFUN, GPDEMO2, GPDEMO3, GPDEMO4, 13 | % SUMMARY, RUNTREE, GPPRETTY 14 | 15 | clc; 16 | disp('GPTIPS 2 Demo 1: naive symbolic regression'); 17 | disp('------------------------------------------'); 18 | disp('Naive symbolic regression on 20 data points generated by the'); 19 | disp('quartic polynomial y=x+x^2+x^3+x^4 in the range -1 <= x <= 1'); 20 | disp('The GP run configuration file is gpdemo1_config.m'); 21 | disp(' '); 22 | disp('In this example, the direct output of a single evolved GP tree is used to'); 23 | disp('model the data generated by the quartic polynomial.'); 24 | disp('The function nodes used are TIMES, MINUS, PLUS and RDIVIDE.'); 25 | disp('The only terminal node used is the input x and trees are constrained'); 26 | disp('to have a maximum depth of 12.'); 27 | disp(' '); 28 | disp('A population size of 50 is run for 100 generations.'); 29 | disp('The ''fitness'' is the sum of absolute differences between the actual and'); 30 | disp('the predicted y values and GPTIPS tries to minimise this.'); 31 | disp(' '); 32 | 33 | disp('First, call GPTIPS using the configuration in gpdemo1_config.m'); 34 | disp('using:'); 35 | disp('>>gp=rungp(@gpdemo1_config);'); 36 | disp('Press a key to continue'); 37 | disp(' '); 38 | pause; 39 | gp=rungp(@gpdemo1_config); 40 | 41 | disp('Next, plot summary information of run using:'); 42 | disp('>>summary(gp)'); 43 | disp('Press a key to continue'); 44 | disp(' '); 45 | pause; 46 | summary(gp,false); 47 | 48 | disp('Run the best model of the run on the fitness function using:'); 49 | disp('>>runtree(gp,''best'');');
50 | disp('Press a key to continue'); 51 | disp(' '); 52 | pause; 53 | runtree(gp,'best'); 54 | 55 | disp(' '); 56 | disp('The best model of the run is stored in the field:'); 57 | disp('gp.results.best.eval_individual{1} :'); 58 | disp(' '); 59 | disp( gp.results.best.eval_individual{1}); 60 | disp(' '); 61 | 62 | disp(['This model has a tree depth of ' int2str( getdepth(gp.results.best.individual{1}))]); 63 | disp(['It was found at generation ' int2str(gp.results.best.foundatgen)]); 64 | disp(['and has fitness ' num2str(gp.results.best.fitness)]); 65 | 66 | %If Symbolic Math toolbox is present 67 | if gp.info.toolbox.symbolic() 68 | 69 | disp(' '); 70 | disp('Using the Symbolic Math Toolbox, simplified versions of this'); 71 | disp('expression can be found:'); 72 | disp('E.g. using the GPPRETTY command on the best model: '); 73 | disp('>>gppretty(gp,''best'') '); 74 | disp('Press a key to continue'); 75 | disp(' '); 76 | pause; 77 | gppretty(gp,'best'); 78 | disp(' '); 79 | disp('If you are lucky then this is the quartic polynomial used to'); 80 | disp('generate the data.'); 81 | disp('NOTE: In general, it is unusual for GP to evolve the exact form'); 82 | disp('of the generative function.'); 83 | end 84 | 85 | disp(' '); 86 | disp('Next, visualise the tree structure of the best model of the run:'); 87 | disp('>>drawtrees(gp,''best'');'); 88 | disp('Press a key to continue'); 89 | disp(' '); 90 | pause; 91 | drawtrees(gp,'best'); -------------------------------------------------------------------------------- /examples/gpdemo1_config.m: -------------------------------------------------------------------------------- 1 | function gp = gpdemo1_config(gp) 2 | %GPDEMO1_CONFIG Config file demonstrating simple (naive) symbolic regression. 3 | % 4 | % The simple quartic polynomial (y=x+x^2+x^3+x^4) from John Koza's 1992 5 | % Genetic Programming book is used. It is very easy to solve.
6 | % 7 | % GP = GPDEMO1_CONFIG(GP) returns the user-specified parameter structure 8 | % GP for the quartic polynomial problem. 9 | % 10 | % Example: 11 | % 12 | % GP = RUNGP(@GPDEMO1_CONFIG) performs a GPTIPS run using this 13 | % configuration file and returns the results in a structure called GP. 14 | % 15 | % Copyright (c) 2009-2015 Dominic Searson 16 | % 17 | % GPTIPS 2 18 | % 19 | % See also QUARTIC_FITFUN, GPDEMO1 20 | 21 | %run control 22 | gp.runcontrol.pop_size = 50; 23 | gp.runcontrol.num_gen = 100; 24 | gp.runcontrol.verbose = 25; 25 | 26 | %selection 27 | gp.selection.tournament.size = 2; 28 | 29 | %fitness function 30 | gp.fitness.fitfun = @quartic_fitfun; 31 | 32 | %quartic polynomial data 33 | x=linspace(-1,1,20)'; 34 | gp.userdata.x = x; 35 | gp.userdata.y = x + x.^2 + x.^3 + x.^4; 36 | gp.userdata.name = 'Quartic Polynomial'; 37 | 38 | %input configuration 39 | gp.nodes.inputs.num_inp = 1; 40 | 41 | %quartic example doesn't need constants 42 | gp.nodes.const.p_ERC = 0; 43 | 44 | %maximum depth of trees 45 | gp.treedef.max_depth = 12; 46 | 47 | %maximum depth of sub-trees created by mutation operator 48 | gp.treedef.max_mutate_depth = 7; 49 | 50 | %genes 51 | gp.genes.multigene = false; 52 | 53 | %define function nodes 54 | gp.nodes.functions.name = {'times','minus','plus','rdivide'}; -------------------------------------------------------------------------------- /examples/gpdemo2.m: -------------------------------------------------------------------------------- 1 | %GPDEMO2 GPTIPS 2 demo of multigene regression on a non-linear function. 2 | % 3 | % Demonstrates multigene symbolic regression and some post run analysis 4 | % functions such as SUMMARY, RUNTREE, POPBROWSER, DRAWTREES and the use 5 | % of the Symbolic Math Toolbox with GPPRETTY to simplify expressions.
6 | % 7 | % (c) Dominic Searson 2009-2015 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also GPDEMO2_CONFIG, REGRESSMULTI_FITFUN, GPDEMO1, GPDEMO3, 12 | % GPDEMO4, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER 13 | 14 | clc; 15 | disp('GPTIPS 2 Demo 2: multigene symbolic regression'); 16 | disp('----------------------------------------------'); 17 | disp('Multigene regression on 400 data points generated by the non-linear'); 18 | disp('function with 4 inputs y = exp(2*x1*sin(pi*x4)) + sin(x2*x3).'); 19 | disp(' '); 20 | disp('The configuration file is gpdemo2_config.m and the raw data is in demo2data.mat'); 21 | disp('At the end of the run the predictions of the evolved model will be plotted'); 22 | disp('for the training data as well as for an independent test data set generated'); 23 | disp('from the same function.'); 24 | disp(' '); 25 | disp('Here, 3 genes are used (plus a bias term) so the form of the model will be: '); 26 | disp('ypred = c0 + c1*tree1 + c2*tree2 + c3*tree3'); 27 | disp('where c0 = bias and c1, c2 and c3 are the gene weights.'); 28 | disp(' '); 29 | disp('The bias and weights (i.e.
regression coefficients) are automatically'); 30 | disp('determined by a least squares procedure for each multigene model.'); 31 | disp(' '); 32 | disp('In this run the following function nodes are used: '); 33 | disp('TIMES MINUS PLUS SQRT SQUARE SIN COS EXP ADD3 MULT3'); 34 | disp(' '); 35 | disp('The run is configured to proceed for 100 generations or to terminate'); 36 | disp('when a fitness (RMSE) of 0.003 is achieved.'); 37 | disp(' '); 38 | disp('First, run GPTIPS using the configuration in gpdemo2_config.m'); 39 | disp('>>gp = rungp(@gpdemo2_config);'); 40 | disp('Press a key to continue'); 41 | disp(' ');pause; 42 | gp = rungp(@gpdemo2_config); 43 | 44 | disp('Plot summary information of run using'); 45 | disp('>>summary(gp)'); 46 | 47 | disp('Press a key to continue'); 48 | disp(' ');pause;summary(gp); 49 | 50 | disp('Run the best model on the training data using:'); 51 | disp('>>runtree(gp,''best'');'); 52 | 53 | disp('Press a key to continue'); 54 | disp(' ');pause;runtree(gp,'best'); 55 | 56 | %If Symbolic Math toolbox is present 57 | if gp.info.toolbox.symbolic() 58 | disp('Using the Symbolic Math Toolbox, it is possible to combine the gene'); 59 | disp('expressions with the gene weights (regression coefficients) to display'); 60 | disp('a single overall model that predicts the output using the inputs x1,'); 61 | disp('x2, x3 and x4.'); 62 | disp('E.g. using the GPPRETTY function on the best individual on '); 63 | disp('the training data.'); 64 | disp('>>gppretty(gp,''best'')'); 65 | disp('Press a key to continue'); 66 | disp(' '); 67 | pause; 68 | gppretty(gp,'best'); 69 | disp(' '); 70 | disp('Additionally, the DRAWTREES function can be used to draw the genes in any'); 71 | disp('model to a browser window.'); 72 | disp('E.g.
to draw the genes in the ''best'' model on the training data use'); 73 | disp('>>drawtrees(gp,''best'');'); 74 | disp('Press a key to continue');disp(' ');pause; 75 | drawtrees(gp,'best'); 76 | end 77 | 78 | disp(' '); 79 | disp('The POPBROWSER function may be used to display the population of evolved'); 80 | disp('models in terms of their expressional complexity as well as their'); 81 | disp('performance (1 - R^2).'); 82 | disp('POPBROWSER can be used to identify models that perform relatively well'); 83 | disp('and are less complex than the ''best'' model in the population (which is'); 84 | disp('highlighted with a red circle). Clicking on a circle reveals the'); 85 | disp('model ID as well as its simplified symbolic equation (if the Symbolic'); 86 | disp('Math Toolbox is installed).'); 87 | disp('The POPBROWSER is launched using:'); 88 | disp('>>popbrowser(gp)'); 89 | disp('Press a key to continue.'); 90 | pause; 91 | popbrowser(gp); -------------------------------------------------------------------------------- /examples/gpdemo2_config.m: -------------------------------------------------------------------------------- 1 | function gp=gpdemo2_config(gp) 2 | %GPDEMO2_CONFIG Config for multigene symbolic regression on data (y) generated from a non-linear function of 4 inputs (x1, x2, x3, x4). 3 | % 4 | % GP = GPDEMO2_CONFIG(GP) returns a parameter structure GP containing the 5 | % settings for GPDEMO2. 6 | % 7 | % Remarks: 8 | % 9 | % There is one output y which is a non-linear function of the four inputs 10 | % y = exp(2*x1*sin(pi*x4)) + sin(x2*x3). The objective of the GP run is 11 | % to evolve a multiple gene symbolic function of x1, x2, x3 and x4 that 12 | % closely approximates y. 13 | % 14 | % This function was described by: 15 | % 16 | % Cherkassky, V., Gehring, D., Mulier, F., Comparison of adaptive methods 17 | % for function estimation from samples, IEEE Transactions on Neural 18 | % Networks, 7 (4), pp. 969-984, 1996.
(Function 10 in Appendix 1) 19 | % 20 | % Example: 21 | % 22 | % GP = rungp(@gpdemo2_config) uses this configuration file to perform 23 | % symbolic regression with multiple gene individuals on the data from the 24 | % above function. 25 | % 26 | % (C) Dominic Searson 2009-2015 27 | % 28 | % GPTIPS 2 29 | % 30 | % See also GPDEMO2, GPDEMO3_CONFIG, GPDEMO3, GPDEMO1, REGRESSMULTI_FITFUN 31 | 32 | %run control parameters 33 | gp.runcontrol.pop_size = 100; 34 | gp.runcontrol.num_gen = 100; 35 | 36 | %selection 37 | gp.selection.tournament.size = 6; 38 | 39 | %termination 40 | gp.fitness.terminate = true; 41 | gp.fitness.terminate_value = 0.003; 42 | 43 | %load in the raw x and y data 44 | load demo2data 45 | gp.userdata.xtrain = x(101:500,:); %training set (inputs) 46 | gp.userdata.ytrain = y(101:500,1); %training set (output) 47 | gp.userdata.xtest = x(1:100,:); %testing set (inputs) 48 | gp.userdata.ytest = y(1:100,1); %testing set (output) 49 | gp.userdata.name = 'Cherkassky function'; 50 | 51 | %genes 52 | gp.genes.max_genes = 3; 53 | 54 | %define function nodes 55 | gp.nodes.functions.name = {'times','minus','plus','sqrt','square','sin','cos','exp','add3','mult3'}; -------------------------------------------------------------------------------- /examples/gpdemo3.m: -------------------------------------------------------------------------------- 1 | % GPDEMO3 GPTIPS 2 demo of multigene symbolic regression on non-linear simulated pH data. 2 | % 3 | % Demonstrates multigene symbolic regression and some post run analysis 4 | % functions such as SUMMARY and RUNTREE and the use of the Symbolic Math 5 | % Toolbox to simplify expressions and create HTML reports using 6 | % PARETOREPORT, GPMODELREPORT and DRAWTREES to visualise the models.
7 | % 8 | % (c) Dominic Searson 2009-2015 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also GPDEMO3_CONFIG, GPDEMO1, GPDEMO2, GPDEMO4, PARETOREPORT, 13 | % GPMODELREPORT, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER 14 | 15 | clc; 16 | disp('GPTIPS 2 Demo 3: multigene pareto symbolic regression on pH data'); 17 | disp('------------------------------------------------------------------'); 18 | disp('In this example, the training data is 700 steady state data points'); 19 | disp('from a simulation of a pH neutralisation process.'); 20 | disp(' ' ); 21 | disp('Here we use pareto tournaments to bias the model discovery process'); 22 | disp('towards low complexity models.'); 23 | disp(' '); 24 | disp('GPTIPS is run 3 times for a maximum of 10 seconds per run or until an'); 25 | disp('RMSE of 0.2 is reached. The runs are merged into a single population'); 26 | disp('at the end.'); 27 | disp(' '); 28 | disp('The output y has an unknown non-linear dependence on the 4 inputs x1,'); 29 | disp('x2, x3 and x4.'); 30 | disp(' '); 31 | disp('300 data points are available as a test set to validate the evolved model(s).'); 32 | disp(' '); 33 | disp('The configuration file is gpdemo3_config.m and the raw data is in ph2data.mat'); 34 | disp(' '); 35 | disp('Here, 6 genes are used (plus a bias term) so the form of the model will be'); 36 | disp('ypred = c0 + c1*tree1 + ...
+ c6*tree6'); 37 | disp('where ypred = predicted output, c0 = bias and c1,...,c6 are the gene weights.'); 38 | disp(' '); 39 | disp('Genes are limited to a depth of 4.'); 40 | disp(' '); 41 | disp('The function nodes used are: TIMES MINUS PLUS TANH MULT3 ADD3'); 42 | disp(' '); 43 | disp('First, run GPTIPS using the configuration in gpdemo3_config.m :'); 44 | disp('>>gp=rungp(@gpdemo3_config);'); 45 | disp('Press a key to continue'); 46 | disp(' '); 47 | pause; 48 | 49 | %run GPTIPS using the configuration in gpdemo3_config.m 50 | gp = rungp(@gpdemo3_config); 51 | 52 | %run the best individual of the run on the fitness function 53 | disp(' '); 54 | disp('Evaluate the ''best'' individual of the run using:'); 55 | disp('>>runtree(gp,''best'');'); 56 | 57 | disp('Press a key to continue');disp(' ');pause; 58 | runtree(gp,'best'); 59 | 60 | %display the population in terms of performance and complexity 61 | disp(' '); 62 | disp('Next, display the population in terms of performance and complexity.'); 63 | disp('Because pareto tournaments have been enabled you should notice a'); 64 | disp('''well-defined'' pareto front (green circles).'); 65 | disp('This should indicate good models in terms of the performance/complexity'); 66 | disp('tradeoff rather than simply the ''best'' model on the training data (this'); 67 | disp('may be considerably more complex than very slightly ''lower'' performing'); 68 | disp('models).'); 69 | disp('>>popbrowser(gp);'); 70 | 71 | disp('Press a key to continue');disp(' ');pause; 72 | popbrowser(gp); 73 | 74 | %If Symbolic Math toolbox is present 75 | if gp.info.toolbox.symbolic() 76 | 77 | %pareto report 78 | disp(' '); 79 | disp('The PARETOREPORT function generates a standalone interactive HTML report'); 80 | disp('listing the multigene regression models on the Pareto front in terms'); 81 | disp('of their simplified equation structure, expressional complexity and'); 82 | disp('performance on the training data (R2).'); 83 | disp('These models
correspond to the green circles in the popbrowser'); 84 | disp('visualisation and can be sorted by performance or complexity by'); 85 | disp('clicking on the appropriate column header.'); 86 | disp('>>paretoreport(gp);'); 87 | disp('Press a key to continue');disp(' ');pause; 88 | paretoreport(gp); 89 | 90 | %gppretty 91 | disp(' '); 92 | disp('It is possible to display any multigene model at the command line.'); 93 | disp('E.g. to use the GPPRETTY command on the ''best'' model on the training data: '); 94 | disp('>>gppretty(gp,''best'')'); 95 | disp('Press a key to continue'); 96 | disp(' '); 97 | pause; 98 | gppretty(gp,'best'); 99 | end 100 | disp(' '); 101 | disp('Additionally, the DRAWTREES function can be used to draw the genes in any'); 102 | disp('model to a browser window.'); 103 | disp('E.g. to draw the genes in the ''best'' model on the training data use'); 104 | disp('>>drawtrees(gp,''best'');'); 105 | disp('Press a key to continue');disp(' ');pause; 106 | drawtrees(gp,'best'); 107 | 108 | %reports 109 | if gp.info.toolbox.symbolic() 110 | 111 | disp(' '); 112 | disp('Finally, for multigene models the GPMODELREPORT function can be '); 113 | disp('used to generate a comprehensive model performance report for '); 114 | disp('reference purposes. This is created in a browser window.'); 115 | disp('E.g. to generate a performance report for the ''best'' model'); 116 | disp('on the training data use'); 117 | disp('>>gpmodelreport(gp,''best'');'); 118 | disp('Press a key to continue');disp(' ');pause; 119 | gpmodelreport(gp,'best'); 120 | end -------------------------------------------------------------------------------- /examples/gpdemo3_config.m: -------------------------------------------------------------------------------- 1 | function gp = gpdemo3_config(gp) 2 | %GPDEMO3_CONFIG Config file demonstrating multigene symbolic regression on data from a simulated pH neutralisation process. 3 | % 4 | % This is the configuration file that GPDEMO3 calls. 
5 | % 6 | % GP = GPDEMO3_CONFIG(GP) generates a parameter structure GP that 7 | % specifies the GPTIPS run settings. 8 | % 9 | % In this example, a maximum run time of 10 seconds per run is allowed (3 runs). 10 | % 11 | % Remarks: 12 | % The data in this example is taken from a simulation of a pH neutralisation 13 | % process with one output (pH), which is a non-linear function of the 14 | % four inputs. 15 | % 16 | % Example: 17 | % GP = RUNGP(@GPDEMO3_CONFIG) uses this configuration file to perform 18 | % symbolic regression with multiple gene individuals on the pH data. The 19 | % results and parameters used are stored in fields of the returned GP 20 | % structure. 21 | % 22 | % Copyright (c) 2009-2015 Dominic Searson 23 | % 24 | % GPTIPS 2 25 | % 26 | % See also REGRESSMULTI_FITFUN, GPDEMO3, GPDEMO2, GPDEMO1, RUNGP 27 | 28 | %run control parameters 29 | gp.runcontrol.pop_size = 250; 30 | gp.runcontrol.timeout = 10; 31 | gp.runcontrol.runs = 3; 32 | 33 | %selection 34 | gp.selection.tournament.size = 25; 35 | gp.selection.tournament.p_pareto = 0.7; 36 | gp.selection.elite_fraction = 0.7; 37 | gp.nodes.const.p_int = 0.5; 38 | 39 | %fitness 40 | gp.fitness.terminate = true; 41 | gp.fitness.terminate_value = 0.2; 42 | 43 | %set up user data 44 | load ph2data 45 | 46 | gp.userdata.xtest = nx; %testing set (inputs) 47 | gp.userdata.ytest = ny; %testing set (output) 48 | gp.userdata.xtrain = x; %training set (inputs) 49 | gp.userdata.ytrain = y; %training set (output) 50 | gp.userdata.name = 'pH'; 51 | 52 | %genes 53 | gp.genes.max_genes = 6; 54 | 55 | %define building block function nodes 56 | gp.nodes.functions.name = {'times','minus','plus','tanh','mult3','add3'}; -------------------------------------------------------------------------------- /examples/gpdemo4.m: -------------------------------------------------------------------------------- 1 | % GPDEMO4 GPTIPS 2 demo of multigene symbolic regression on a concrete compressive strength data set.
2 | % 3 | % The output being modelled is concrete compressive strength (MPa) and 4 | % the input variables are: 5 | % 6 | % Cement (x1) - kg in a m3 mixture 7 | % Blast furnace slag (x2) - kg in a m3 mixture 8 | % Fly ash (x3) - kg in a m3 mixture 9 | % Water (x4) - kg in a m3 mixture 10 | % Superplasticiser (x5) - kg in a m3 mixture 11 | % Coarse aggregate (x6) - kg in a m3 mixture 12 | % Fine aggregate (x7) - kg in a m3 mixture 13 | % Age (x8) - range 1 - 365 days 14 | % 15 | % Demonstrates feature selection in multigene symbolic regression and 16 | % some post run analysis functions. 17 | % 18 | % (c) Dominic Searson 2009-2015 19 | % 20 | % GPTIPS 2 21 | % 22 | % See also GPDEMO4_CONFIG, GPDEMO1, GPDEMO2, GPDEMO3, PARETOREPORT, 23 | % GPMODELREPORT, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER 24 | 25 | clc; 26 | disp('GPTIPS 2 Demo 4: feature selection with concrete compressive strength data set'); 27 | disp('------------------------------------------------------------------------------'); 28 | disp('The output being modelled is concrete compressive strength (MPa) and'); 29 | disp('the input variables are:'); 30 | disp(' '); 31 | disp(' Cement (x1) - kg in a m3 mixture'); 32 | disp(' Blast furnace slag (x2) - kg in a m3 mixture'); 33 | disp(' Fly ash (x3) - kg in a m3 mixture'); 34 | disp(' Water (x4) - kg in a m3 mixture'); 35 | disp(' Superplasticiser (x5) - kg in a m3 mixture'); 36 | disp(' Coarse aggregate (x6) - kg in a m3 mixture'); 37 | disp(' Fine aggregate (x7) - kg in a m3 mixture'); 38 | disp(' Age (x8) - range 1 - 365 days'); 39 | disp(' '); 40 | disp('To demonstrate feature selection in GPTIPS another 50 variables '); 41 | disp('consisting of normally distributed noise have been added to form the'); 42 | disp('input variables x9 to x58.'); 43 | disp(' '); 44 | disp('The configuration file is gpdemo4_config.m and the raw data is in'); 45 | disp('concrete.mat'); 46 | disp(' '); 47 | disp('The data has been divided into a training set, a holdout
validation set'); 48 | disp('and a testing set.'); 49 | disp(' '); 50 | disp('GPTIPS is run twice for a maximum of 30 seconds per run or until an'); 51 | disp('RMSE of 7 is reached. The runs are merged into a single population'); 52 | disp('at the end.'); 53 | disp(' '); 54 | disp('6 genes are used (plus a bias term) so the form of the model will be'); 55 | disp('ypred = c0 + c1*tree1 + ... + c6*tree6'); 56 | disp('where ypred = predicted output, c0 = bias and c1,...,c6 are the gene weights.'); 57 | disp(' '); 58 | disp('Genes are limited to a depth of 4.'); 59 | disp(' '); 60 | disp('The function nodes used are:'); 61 | disp('TIMES MINUS PLUS RDIVIDE SQUARE TANH EXP LOG MULT3 ADD3 SQRT CUBE'); 62 | disp('NEGEXP NEG ABS'); 63 | disp(' '); 64 | disp('The input variables that appear in the best model on the training'); 65 | disp('and validation data sets can be displayed at run time by '); 66 | disp('including the following two settings in gpdemo4_config.m : '); 67 | disp(' '); 68 | disp('gp.runcontrol.showBestInputs = true;'); 69 | disp('gp.runcontrol.showValBestInputs = true;'); 70 | disp(' '); 71 | disp('GPTIPS is run with the configuration in gpdemo4_config.m using :'); 72 | disp('>>gp=rungp(@gpdemo4_config);'); 73 | disp('Press a key to continue'); 74 | disp(' '); 75 | pause; 76 | gp = rungp(@gpdemo4_config); 77 | 78 | %Run the best val individual of the run on the fitness function 79 | disp(' '); 80 | disp('Evaluate the best validation individual of'); 81 | disp('the runs on the fitness function using:'); 82 | disp('>>runtree(gp,''valbest'');'); 83 | 84 | disp('Press a key to continue'); 85 | disp(' '); 86 | pause; 87 | runtree(gp,'valbest'); 88 | 89 | %If Symbolic Math toolbox is present 90 | if gp.info.toolbox.symbolic() 91 | 92 | disp(' '); 93 | disp('Next, use the GPPRETTY command on the best validation individual: '); 94 | disp('>>gppretty(gp,''valbest'')'); 95 | disp('Press a key to continue'); 96 | disp(' '); 97 | pause; 98 | 99 |
gppretty(gp,'valbest'); 100 | disp(' '); 101 | disp('If the runs have been successful, the only variables present in '); 102 | disp('the best validation model should be the following:'); 103 | disp('Cement Slag Ash Water Plastic Course Fine Age'); 104 | disp('(these are defined as variable name aliases in gpdemo4_config.m'); 105 | disp('using the gp.nodes.inputs.names setting).'); 106 | disp(' '); 107 | disp('Less successful runs may contain the noise variables x9 - x58.'); 108 | disp('If the results seem poor, try running the demo again.'); 109 | 110 | end 111 | 112 | disp('Press a key to continue'); 113 | disp(' '); 114 | pause; 115 | disp('To visualise the frequency distribution of input variables in all'); 116 | disp('models with an R^2 >= 0.75 the GPPOPVARS function can be used.'); 117 | disp('This should show a high frequency of variables x1 - x8 and a low'); 118 | disp('frequency of the irrelevant noise inputs.'); 119 | disp('>>gppopvars(gp,0.75);'); 120 | disp(' '); 121 | disp('Press a key to continue'); 122 | disp(' '); 123 | pause; 124 | gppopvars(gp,0.75); 125 | disp(' '); 126 | 127 | if gp.info.toolbox.symbolic() 128 | disp('Finally, an HTML report listing the models on the Pareto optimal front'); 129 | disp('of model expressional complexity and performance can be generated using'); 130 | disp('the PARETOREPORT function.'); 131 | disp('>>paretoreport(gp)'); 132 | disp(' '); 133 | disp('Press a key to continue'); 134 | disp(' '); 135 | pause; 136 | paretoreport(gp); 137 | end -------------------------------------------------------------------------------- /examples/gpdemo4_config.m: -------------------------------------------------------------------------------- 1 | function gp = gpdemo4_config(gp) 2 | %GPDEMO4_CONFIG Config file demonstrating feature selection with multigene symbolic regression. 3 | % 4 | % This is the configuration file that GPDEMO4 calls.
5 | % 6 | % GP = GPDEMO4_CONFIG(GP) generates a parameter structure GP that 7 | % specifies the GPTIPS run settings. 8 | % 9 | % Remarks: 10 | % The data used in this example was retrieved from the UCI Machine 11 | % Learning Repository: 12 | % 13 | % http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength 14 | % 15 | % The output being modelled is concrete compressive strength (MPa) and the 16 | % independent variables are: 17 | % 18 | % Cement (x1) - kg in a m3 mixture 19 | % Blast furnace slag (x2) - kg in a m3 mixture 20 | % Fly ash (x3) - kg in a m3 mixture 21 | % Water (x4) - kg in a m3 mixture 22 | % Superplasticiser (x5) - kg in a m3 mixture 23 | % Coarse aggregate (x6) - kg in a m3 mixture 24 | % Fine aggregate (x7) - kg in a m3 mixture 25 | % Age (x8) - range 1 - 365 days 26 | % 27 | % Plus 50 irrelevant noise variables (x9 -> x58). 28 | % 29 | % Example: 30 | % 31 | % GP = RUNGP(@gpdemo4_config) uses this configuration file to perform 32 | % symbolic regression with multiple gene individuals on the concrete data. 33 | % The results and parameters used are stored in fields of the returned GP 34 | % structure. 35 | % 36 | % Further remarks: 37 | % 38 | % The demo configuration shows that variable selection is implicitly 39 | % performed by the GPTIPS algorithm. 
40 | % 41 | % Copyright (c) 2009-2015 Dominic Searson 42 | % 43 | % GPTIPS 2 44 | % 45 | % See also REGRESSMULTI_FITFUN, GPDEMO4, GPDEMO3, GPDEMO2, GPDEMO1, RUNGP 46 | 47 | %run control 48 | gp.runcontrol.pop_size = 300; 49 | gp.runcontrol.num_gen = 500; 50 | gp.runcontrol.showBestInputs = true; 51 | gp.runcontrol.showValBestInputs = true; 52 | gp.runcontrol.timeout = 30; 53 | gp.runcontrol.runs = 2; 54 | 55 | %selection 56 | gp.selection.tournament.size = 15; 57 | gp.selection.elite_fraction = 0.3; 58 | 59 | %fitness 60 | gp.fitness.terminate = true; 61 | gp.fitness.terminate_value = 7; 62 | 63 | %multigene 64 | gp.genes.max_genes = 6; 65 | 66 | %constants 67 | gp.nodes.const.p_ERC = 0.05; 68 | 69 | %data 70 | load concrete; 71 | 72 | %allocate to train, validation and test groups 73 | gp.userdata.xtrain = Concrete_Data(tr_ind,1:8); 74 | gp.userdata.ytrain = Concrete_Data(tr_ind,9); 75 | 76 | gp.userdata.ytest = Concrete_Data(te_ind,9); 77 | gp.userdata.xtest = Concrete_Data(te_ind,1:8); 78 | 79 | gp.userdata.xval = gp.userdata.xtrain(val_ind,1:8); 80 | gp.userdata.yval = gp.userdata.ytrain(val_ind); 81 | 82 | gp.userdata.xtrain = gp.userdata.xtrain(tr_ind2,:); 83 | gp.userdata.ytrain = gp.userdata.ytrain(tr_ind2); 84 | 85 | %add 50 noise variables (x9 - x58) to make the problem harder in terms of feature 86 | %selection 87 | gp.userdata.xtrain = [gp.userdata.xtrain 100*randn(size(gp.userdata.xtrain,1),50) ]; 88 | gp.userdata.xtest = [gp.userdata.xtest 100*randn(size(gp.userdata.xtest,1),50) ]; 89 | gp.userdata.xval = [gp.userdata.xval 100*randn(size(gp.userdata.xval,1),50) ]; 90 | 91 | %enables hold out validation set 92 | gp.userdata.user_fcn = @regressmulti_fitfun_validate; 93 | 94 | %give known variables aliases (this can include basic HTML markup) 95 | gp.nodes.inputs.names = {'Cement','Slag','Ash','Water','Plastic','Course','Fine','Age'}; 96 | 97 | %name for dataset 98 | gp.userdata.name = 'Concrete'; 99 | 100 | %define building block function nodes 101 | 
gp.nodes.functions.name = {'times','minus','plus','rdivide','square','tanh',... 102 | 'exp','log','mult3','add3','sqrt','cube','negexp','neg','abs'}; -------------------------------------------------------------------------------- /examples/ph2data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/ph2data.mat -------------------------------------------------------------------------------- /examples/quartic_fitfun.m: -------------------------------------------------------------------------------- 1 | function [fitness,gp] = quartic_fitfun(evalstr,gp) 2 | %QUARTIC_FITFUN Fitness function for simple ("naive") symbolic regression on the quartic polynomial y = x + x^2 + x^3 + x^4. 3 | % 4 | % FITNESS = QUARTIC_FITFUN(EVALSTR,GP) returns the FITNESS value of the 5 | % symbolic expression contained within the cell array EVALSTR using the 6 | % information in the GP struct. 
7 | % 8 | % Copyright (c) 2009-2015 Dominic Searson 9 | % 10 | % GPTIPS 2 11 | % 12 | % See also GPDEMO1_CONFIG, GPDEMO1 13 | 14 | %extract x and y data from GP struct 15 | x1 = gp.userdata.x; 16 | y = gp.userdata.y; 17 | 18 | %evaluate the tree (assuming only 1 gene is supplied in this case - if the 19 | %user specified a multigene config then only the first gene encountered will be used) 20 | eval(['out=' evalstr{1} ';']); 21 | 22 | %fitness is sum of absolute differences between actual and predicted y 23 | fitness = sum( abs(out-y) ); 24 | 25 | %if this is a post run call to this function then plot graphs 26 | if gp.state.run_completed 27 | figure; 28 | plot(x1,y,'o-','LineWidth',2,'color',[0 0.45 0.74]);grid on;hold on; 29 | plot(x1,out,'x-','LineWidth',2,'color',[0.85 0.33 0.1]); 30 | xlabel('x');ylabel('y'); 31 | legend('y','predicted y');legend boxoff; hold off; 32 | title('Prediction of quartic polynomial over range [-1 1]'); 33 | end 34 | 35 | -------------------------------------------------------------------------------- /examples/ripple_config.m: -------------------------------------------------------------------------------- 1 | function gp = ripple_config(gp) 2 | %RIPPLE_CONFIG Config file for multigene regression on the 2D Ripple function. 3 | % 4 | % GP = RIPPLE_CONFIG(GP) generates a parameter structure GP that 5 | % specifies the GPTIPS run settings for the 2D Ripple function. This is 6 | % a function f of two input variables as follows: 7 | % 8 | % f(x1,x2) = (x1 - 3)(x2 - 3) + 2 sin((x1 - 4) (x2 - 4)) 9 | % 10 | % Note: 11 | % 12 | % The settings in this file are not intended to be 'optimal'. Feel free 13 | % to experiment with them. 14 | % 15 | % Example: 16 | % 17 | % GP = RUNGP(@RIPPLE_CONFIG) uses this configuration file to perform 18 | % symbolic regression with multigene individuals.
19 | % 20 | % Copyright (c) 2009-2015 Dominic Searson 21 | % 22 | % GPTIPS 2 23 | % 24 | % See also SALUSTOWICZ1D_CONFIG, UBALL_CONFIG, CUBIC_CONFIG, 25 | % REGRESSMULTI_FITFUN, RUNGP 26 | 27 | %run control 28 | gp.runcontrol.pop_size = 200; 29 | gp.runcontrol.num_gen = 500; 30 | gp.runcontrol.verbose = 5; 31 | gp.runcontrol.timeout = 30; 32 | gp.runcontrol.runs = 3; 33 | gp.runcontrol.parallel.auto = true; 34 | 35 | %selection 36 | gp.selection.tournament.size = 20; 37 | gp.selection.tournament.p_pareto = 0.2; 38 | gp.selection.elite_fraction = 0.3; 39 | 40 | %genes 41 | gp.genes.max_genes = 8; %increase this for more 'accurate' but more unwieldy models 42 | 43 | %data 44 | %generate 300 random training data points in the range 0.05 6.05 45 | x = 0.05 + rand(300,2) * 6; 46 | y = (x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) ); 47 | 48 | gp.userdata.ytrain = y; 49 | gp.userdata.xtrain = x; 50 | 51 | %generate 1000 random test data 52 | x = 0.05 + rand(1000,2) * 6; 53 | y = (x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) ); 54 | 55 | gp.userdata.ytest = y; 56 | gp.userdata.xtest = x; 57 | 58 | %generate holdout validation in same region as training space 59 | x=0.05 + rand(200,2) * 6; 60 | y=(x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) ); 61 | 62 | gp.userdata.yval = y; 63 | gp.userdata.xval = x; 64 | gp.userdata.name = 'Ripple 2D'; 65 | gp.userdata.user_fcn = @regressmulti_fitfun_validate; 66 | 67 | %function nodes 68 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square',... 69 | 'sin','cos','exp','mult3','add3','sqrt','cube','power','negexp','neg','abs','log'}; -------------------------------------------------------------------------------- /examples/salustowicz1d_config.m: -------------------------------------------------------------------------------- 1 | function gp=salustowicz1d_config(gp) 2 | %SALUSTOWICZ1D_CONFIG Multigene regression config for one dimensional Salustowicz function. 
3 | % 4 | % GP = SALUSTOWICZ1D_CONFIG(GP) generates a parameter structure GP that 5 | % specifies the GPTIPS run settings for the one dimensional Salustowicz 6 | % function. This is a function f of a single input variable as follows: 7 | % 8 | % f(x) = exp(-x) x^3 cos(x) sin(x) (sin(x)^2 cos(x) - 1) 9 | % 10 | % Example: 11 | % 12 | % GP = RUNGP(@SALUSTOWICZ1D_CONFIG) uses this configuration file to 13 | % perform symbolic regression on the data with multigene individuals. 14 | % 15 | % Note: 16 | % 17 | % The settings in this file are not intended to be 'optimal'. Feel free 18 | % to experiment with them. 19 | % 20 | % Copyright (c) 2009-2015 Dominic Searson 21 | % 22 | % GPTIPS 2 23 | % 24 | % See also RIPPLE_CONFIG, UBALL_CONFIG, CUBIC_CONFIG, 25 | % REGRESSMULTI_FITFUN, RUNGP 26 | 27 | %run control 28 | gp.runcontrol.pop_size = 200; 29 | gp.runcontrol.timeout = 30; 30 | gp.runcontrol.num_gen = 500; 31 | gp.runcontrol.runs = 3; 32 | gp.runcontrol.parallel.auto = true; 33 | 34 | %selection 35 | gp.selection.tournament.size = 20; 36 | gp.selection.tournament.p_pareto = 0.3; 37 | gp.selection.elite_fraction = 0.3; 38 | 39 | %meta data 40 | gp.userdata.name = 'Salustowicz - 1D'; 41 | 42 | %genes 43 | gp.genes.max_genes = 6; 44 | 45 | %generate training data 46 | x = [0.05:0.2:10]'; %generate training points in range [0.05 10] 47 | gp.userdata.xtrain = x; 48 | sx = sin(x); 49 | cx = cos(x); 50 | gp.userdata.ytrain = exp(-x) .* (x.^3) .*sx .* cx .* ((sx.^2) .*cx -1); 51 | 52 | %generate random test data in same range 53 | x = rand(500,1) * 9.95 + 0.05; 54 | sx = sin(x); 55 | cx = cos(x); 56 | gp.userdata.ytest = exp(-x) .* (x.^3) .*sx .* cx .* ((sx.^2) .*cx -1); 57 | gp.userdata.xtest = x; 58 | 59 | %function nodes 60 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square','sin',...
61 | 'cos','exp','power','cube','sqrt','add3','mult3','negexp','neg','abs'}; 62 | -------------------------------------------------------------------------------- /examples/uball_config.m: -------------------------------------------------------------------------------- 1 | function gp = uball_config(gp) 2 | %UBALL_CONFIG Multigene regression config for the n dimensional Unwrapped Ball function. 3 | % 4 | % GP = UBALL_CONFIG(GP) generates a parameter structure GP that specifies 5 | % the GPTIPS run settings. 6 | % 7 | % Example: 8 | % 9 | % GP = RUNGP(@UBALL_CONFIG) uses this configuration file to perform 10 | % symbolic regression with multigene individuals. 11 | % 12 | % Remarks: 13 | % 14 | % The 5D version of this function was proposed as a 'standardised' GP 15 | % symbolic regression benchmark (Vladislavleva-4) in White et al., 16 | % 'Better GP benchmarks: community survey results and proposals', Genet. 17 | % Program Evolvable Mach. (2013) 14:3-29 (Table 4). 18 | % 19 | % It is unusual in that the test data in the above reference requires 20 | % extrapolation beyond the bounds of the training data.
21 | % 22 | % (C) Dominic Searson 2009-2015 23 | % 24 | % GPTIPS 2 25 | % 26 | % See also SALUSTOWICZ1D_CONFIG, RIPPLE_CONFIG, CUBIC_CONFIG, 27 | % REGRESSMULTI_FITFUN, RUNGP 28 | 29 | %run control 30 | gp.runcontrol.pop_size = 200; 31 | gp.runcontrol.num_gen = 1000; 32 | gp.runcontrol.verbose = 5; 33 | gp.runcontrol.timeout = 45; 34 | gp.runcontrol.runs = 3 ; 35 | gp.runcontrol.parallel.auto = true; 36 | gp.selection.elite_fraction = 0.25; 37 | 38 | %selection 39 | gp.selection.tournament.size = 20; 40 | gp.selection.tournament.p_pareto = 0.2; 41 | 42 | %genes 43 | gp.genes.max_genes = 8; 44 | gp.treedef.max_depth = 5; 45 | 46 | %set dimension of problem 47 | n = 5; 48 | 49 | %training data 50 | %generate 512 n dimensional data points in the range [0.05 6.05] 51 | x = 0.05 + rand(512,n)*6; 52 | 53 | dsum = 0; 54 | for i=1:n 55 | dsum = dsum + (( x(:,i)- 3 ).^2); 56 | end 57 | y = 10./(5+dsum); 58 | 59 | gp.userdata.ytrain = y; 60 | gp.userdata.xtrain = x; 61 | 62 | %testing data 63 | %generate 512 test data inside the training range 64 | x = 0.05 + rand(512,n)*6; 65 | 66 | dsum = 0; 67 | for i=1:n 68 | dsum = dsum + (( x(:,i)- 3 ).^2); 69 | end 70 | y = 10./(5+dsum); 71 | 72 | gp.userdata.ytest = y; 73 | gp.userdata.xtest = x; 74 | 75 | %validation data 76 | %generate holdout validation in same region as training space 77 | x = 0.05 + rand(512,n)*6; 78 | 79 | dsum = 0; 80 | for i=1:n 81 | dsum = dsum + (( x(:,i)- 3 ).^2); 82 | end 83 | y = 10./(5+dsum); 84 | 85 | gp.userdata.yval = y; 86 | gp.userdata.xval = x; 87 | 88 | %enables hold out validation set 89 | gp.userdata.user_fcn = @regressmulti_fitfun_validate; 90 | 91 | %Add a name to data set 92 | gp.userdata.name = ['Uball (n = ' num2str(n) ')']; 93 | 94 | %function nodes 95 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square','sin',... 
96 | 'cos','cube','exp','power','sqrt','add3','mult3','log','negexp','abs','neg'}; -------------------------------------------------------------------------------- /fitness/regressmulti_fitfun_validate.m: -------------------------------------------------------------------------------- 1 | function [valfitness,gp,ypredval] = regressmulti_fitfun_validate(gp) 2 | %REGRESSMULTI_FITFUN_VALIDATE Evaluate current 'best' multigene regression model on validation data set. 3 | % 4 | % [FITNESS,GP,YPREDVAL] = REGRESSMULTI_FITFUN_VALIDATE(GP) returns the 5 | % FITNESS of the current 'best' multigene model contained in the GP 6 | % data structure (i.e. the best model as evaluated on the training data). 7 | % FITNESS is the root mean squared prediction error (RMSE) on the validation data set. 8 | % 9 | % Remarks on the use of validation data: 10 | % 11 | % This is used in conjunction with the multigene regression fitness 12 | % function REGRESSMULTI_FITFUN. 13 | % 14 | % This function is called at the end of each generation by adding the 15 | % following line to the user's run configuration file: 16 | % 17 | % GP.USERDATA.USER_FCN = @REGRESSMULTI_FITFUN_VALIDATE 18 | % 19 | % In addition, the user's GPTIPS configuration file should populate the 20 | % following required fields for the validation data assuming Nval 21 | % observations on the input and output data: GP.USERDATA.XVAL should be a 22 | % (Nval x n) matrix where the ith column contains the Nval observations 23 | % of the ith input variable xi. GP.USERDATA.YVAL should be a (Nval x 1) 24 | % vector containing the corresponding observations of the response 25 | % variable y. 26 | % 27 | % Copyright (c) 2009-2015 Dominic Searson 28 | % 29 | % GPTIPS 2 30 | % 31 | % See also REGRESSMULTI_FITFUN, GP_USERFCN 32 | 33 | % check that validation data is present (|| short-circuits, so a missing field is never accessed) 34 | if ~isfield(gp.userdata,'xval') || ~isfield(gp.userdata,'yval') || ...
35 | isempty(gp.userdata.xval) || isempty(gp.userdata.yval) 36 | error('Cannot perform holdout validation because no validation data was found.'); 37 | end 38 | 39 | % If ADFs are used, make them known to this function 40 | if gp.nodes.adf.use, assignadf(gp); end 41 | 42 | %get evalstr for current 'best' individual on training data 43 | evalstr = gp.results.best.eval_individual; 44 | 45 | %process evalstr with regex to allow direct access to data 46 | pat = 'x(\d+)'; 47 | evalstr = regexprep(evalstr,pat,'gp.userdata.xval(:,$1)'); 48 | y = gp.userdata.yval; 49 | 50 | numDataPoints = size(y,1); 51 | numGenes = length(evalstr); 52 | 53 | %set up a matrix to store the tree outputs plus a bias column of ones 54 | gene_outputs = ones(numDataPoints,numGenes+1); 55 | 56 | %eval each gene in the current individual 57 | for i=1:numGenes 58 | ind = i + 1; 59 | eval(['gene_outputs(:,ind)=' evalstr{i} ';']); 60 | 61 | %check for nonsensical answers and break out early with an 'inf' if so 62 | if any(~isfinite(gene_outputs(:,ind))) || any(~isreal(gene_outputs(:,ind))) 63 | valfitness = Inf; 64 | return 65 | end 66 | end 67 | 68 | %retrieve regression weights 69 | theta = gp.results.best.returnvalues; 70 | 71 | %calc. 
prediction of validation data set using the retrieved weights 72 | ypredval = gene_outputs * theta; 73 | 74 | %calculate RMS prediction error on validation data 75 | valfitness = sqrt(mean((y-ypredval).^2)); 76 | 77 | %on 1st gen, initialise validation set info in the GP structure 78 | if ~gp.state.init_val 79 | gp.results.history.valfitness(1:gp.runcontrol.num_gen,1) = 0; 80 | gp.results.history.valfitness(1) = valfitness; 81 | gp.results.valbest = gp.results.best; 82 | gp.results.valbest.valfitness = valfitness; 83 | gp.state.init_val = true; 84 | end 85 | 86 | gp.results.best.valfitness = valfitness; 87 | gp.results.history.valfitness(gp.state.count,1) = valfitness; 88 | 89 | %update 'best' validation model if fitness is better, or if it is the same with fewer 90 | %nodes/lower complexity 91 | if gp.fitness.complexityMeasure 92 | bcomp = gp.results.best.complexity; 93 | vcomp = gp.results.valbest.complexity; 94 | else 95 | bcomp = gp.results.best.nodecount; 96 | vcomp = gp.results.valbest.nodecount; 97 | end 98 | 99 | if valfitness < gp.results.valbest.valfitness || ... 100 | (valfitness == gp.results.valbest.valfitness && bcomp < vcomp) 101 | gp.results.valbest = gp.results.best; 102 | gp.results.valbest.valfitness = valfitness; 103 | gp.results.valbest.foundatgen = gp.state.count - 1; 104 | end -------------------------------------------------------------------------------- /nodefcn/add3.m: -------------------------------------------------------------------------------- 1 | function y = add3(x1,x2,x3) 2 | %ADD3 Node. Ternary addition function. 3 | % 4 | % Y = ADD3(X1,X2,X3) returns as Y the result of X1 + X2 + X3 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also MULT3 11 | 12 | y=x1 + x2 + x3; -------------------------------------------------------------------------------- /nodefcn/cube.m: -------------------------------------------------------------------------------- 1 | function y = cube(x) 2 | %CUBE Node.
Calculates the element by element cube of a vector. 3 | % 4 | % Y = CUBE(X) performs the vector operation Y = X.^3 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also SQUARE 11 | y = x.^3; 12 | -------------------------------------------------------------------------------- /nodefcn/gauss.m: -------------------------------------------------------------------------------- 1 | function y = gauss(x) 2 | %GAUSS Node. Gaussian function of input. 3 | % 4 | % Y = GAUSS(X) computes y = exp(-x.^2) 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also NEGEXP 11 | y = exp(-x.^2); -------------------------------------------------------------------------------- /nodefcn/gpand.m: -------------------------------------------------------------------------------- 1 | function c = gpand(a,b) 2 | %GPAND Node. Wrapper for logical AND 3 | % 4 | % Copyright (c) 2009-2015 Dominic Searson 5 | % 6 | % GPTIPS 2 7 | % 8 | % See also GPNOT, GPOR, GTH, LTH, MAXX, MINX, STEP, THRESH, IFLTE 9 | 10 | if any(isnan(a)) || any(isnan(b)) 11 | c = nan; 12 | return; 13 | end 14 | c = double(and(real(a),real(b))); -------------------------------------------------------------------------------- /nodefcn/gpnot.m: -------------------------------------------------------------------------------- 1 | function b = gpnot(a) 2 | %GPNOT Node. Wrapper for logical NOT 3 | % 4 | % Copyright (c) 2009-2015 Dominic Searson 5 | % 6 | % GPTIPS 2 7 | % 8 | % See also GPOR, GPAND, LTH, GTH, MAXX, MINX, NEG, IFLTE, STEP, THRESH 9 | 10 | if any(isnan(a)) 11 | b = nan; 12 | return; 13 | end 14 | 15 | b = double(not(real(a))); -------------------------------------------------------------------------------- /nodefcn/gpor.m: -------------------------------------------------------------------------------- 1 | function c = gpor(a,b) 2 | %GPOR Node.
Wrapper for logical OR 3 | % 4 | % Copyright (c) 2009-2015 Dominic Searson 5 | % 6 | % GPTIPS 2 7 | % 8 | % See also GPNOT, GPAND, LTH, GTH, MAXX, MINX, NEG, IFLTE, STEP, THRESH 9 | 10 | if any(isnan(a)) || any(isnan(b)) 11 | c = nan; 12 | return; 13 | end 14 | c = double(or(real(a),real(b))); -------------------------------------------------------------------------------- /nodefcn/gth.m: -------------------------------------------------------------------------------- 1 | function y = gth(a,b) 2 | %GTH Node. Greater than operator 3 | % 4 | % Performs an element by element comparison of A and B and returns 5 | % 1 if A(i) > B(i) and 0 otherwise. 6 | % 7 | % (c) Dominic Searson 2009-2015 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also LTH, STEP, THRESH, IFLTE, MINX, MAXX, NEG, GPAND, GPNOT, GPOR 12 | 13 | y = double(gt(a,b)); -------------------------------------------------------------------------------- /nodefcn/iflte.m: -------------------------------------------------------------------------------- 1 | function x = iflte(a,b,c,d) 2 | % IFLTE Node. Performs an element wise IF THEN ELSE operation on vectors and scalars. 3 | % 4 | % X = IFLTE(A,B,C,D) computes X as follows on an element by element 5 | % basis: 6 | % 7 | % If A <= B then X = C else X = D. 8 | % 9 | % Remarks: 10 | % 11 | % If all of A,B,C,D are scalars then X will be scalar. If some, but not 12 | % all, of A,B,C,D are scalars then X will be a vector. 13 | % 14 | % Copyright (c) 2009-2015 Dominic Searson 15 | % 16 | % GPTIPS 2 17 | % 18 | % See also THRESH, STEP, MINX, MAXX, NEG, GTH, LTH, GPNOT, GPAND, GPOR 19 | 20 | x = ((a<=b) .* c) + ((a>b) .* d); 21 | 22 | -------------------------------------------------------------------------------- /nodefcn/lth.m: -------------------------------------------------------------------------------- 1 | function y = lth(a,b) 2 | %LTH Node.
Less than operator 3 | % 4 | % Performs an element by element comparison of A and B and returns 5 | % 1 if A(i) < B(i) and 0 otherwise. 6 | % 7 | % (c) Dominic Searson 2009-2015 8 | % 9 | % GPTIPS 2 10 | % 11 | % See also GTH, STEP, THRESH, IFLTE, MINX, MAXX, NEG, GPAND, GPNOT, GPOR 12 | 13 | y = double(lt(a,b)); -------------------------------------------------------------------------------- /nodefcn/maxx.m: -------------------------------------------------------------------------------- 1 | function m = maxx(x1,x2) 2 | %MAXX Node. Returns the maximum of x1 and x2. 3 | % 4 | % Copyright (c) 2009-2015 Dominic Searson 5 | % 6 | % GPTIPS 2 7 | % 8 | % See also MINX, LTH, GTH, STEP, THRESH, IFLTE, GPAND, GPNOT, GPOR 9 | m = max(x1,x2); -------------------------------------------------------------------------------- /nodefcn/minx.m: -------------------------------------------------------------------------------- 1 | function m = minx(x1,x2) 2 | %MINX Node. Returns the minimum of x1 and x2. 3 | % 4 | % Copyright (c) 2009-2015 Dominic Searson 5 | % 6 | % GPTIPS 2 7 | % 8 | % See also MAXX, GTH, LTH, STEP, THRESH, IFLTE, GPAND, GPNOT, GPOR 9 | m = min(x1,x2); -------------------------------------------------------------------------------- /nodefcn/mult3.m: -------------------------------------------------------------------------------- 1 | function y = mult3(x1,x2,x3) 2 | %MULT3 Node. Ternary multiplication function. 3 | % 4 | % Y = MULT3(X1,X2,X3) returns as Y the result of X1.*X2.*X3 5 | % 6 | % (c) Dominic Searson 2009-2015 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also ADD3 11 | 12 | y=x1.*x2.*x3; -------------------------------------------------------------------------------- /nodefcn/neg.m: -------------------------------------------------------------------------------- 1 | function y = neg(x) 2 | %NEG Node. Returns -1 times the argument. 
3 | % 4 | % Y = NEG(X) 5 | % 6 | % Remarks: 7 | % 8 | % Computed on an element by element basis, and if X is scalar then Y will 9 | % be scalar. 10 | % 11 | % Copyright (c) 2009-2015 Dominic Searson 12 | % 13 | % GPTIPS 2 14 | % 15 | % See also GPNOT, GPAND, GPOR, MAXX, MINX, STEP, THRESH, IFLTE 16 | 17 | y = x .* -1; 18 | 19 | -------------------------------------------------------------------------------- /nodefcn/negexp.m: -------------------------------------------------------------------------------- 1 | function y = negexp(x) 2 | %NEGEXP Node. Calculate exp(-x) on an element by element basis. 3 | % 4 | % Y = NEGEXP(X) 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also NEG, GAUSS 11 | 12 | y = exp(-x); -------------------------------------------------------------------------------- /nodefcn/pdiv.m: -------------------------------------------------------------------------------- 1 | function x = pdiv(arg1,arg2) 2 | %PDIV Node. Performs a protected element by element divide. 3 | % 4 | % X = PDIV(ARG1,ARG2) 5 | % 6 | % For each element in ARG1 and ARG2: 7 | % 8 | % X = ARG1/ARG2 9 | % 10 | % Unless the result is +/-Inf (e.g. ARG2 == 0 and ARG1 ~= 0), in which case X = 0. Note that 0/0 still gives NaN. 11 | % 12 | % Remarks: 13 | % Renamed from PDIVIDE due to Symbolic Math Toolbox bug. Also -inf bug 14 | % fixed since v1.0 15 | % 16 | % (c) Dominic Searson 2009-2015 17 | % 18 | % GPTIPS 2 19 | % 20 | % See also PLOG, RDIVIDE 21 | 22 | x = arg1./arg2; 23 | i = isinf(x); 24 | x(i) = 0; 25 | -------------------------------------------------------------------------------- /nodefcn/plog.m: -------------------------------------------------------------------------------- 1 | function out = plog(x) 2 | %PLOG Node. Calculate the element by element protected natural log of a vector.
3 | % 4 | % OUT = PLOG(X) performs LOG(ABS(X)) 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also PDIV 11 | 12 | out = log(abs(x)); 13 | out(isinf(out)) = 0; 14 | -------------------------------------------------------------------------------- /nodefcn/psqroot.m: -------------------------------------------------------------------------------- 1 | function y = psqroot(x) 2 | %PSQROOT Node. Computes element by element protected square root of a vector 3 | % 4 | % Y = PSQROOT(X) performs the vector operation Y = SQRT(ABS(X)) 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also SQUARE, CUBE 11 | y = sqrt(abs(x)); 12 | -------------------------------------------------------------------------------- /nodefcn/square.m: -------------------------------------------------------------------------------- 1 | function y = square(x) 2 | %SQUARE Node. Calculates the element by element square of a vector or matrix 3 | % 4 | % Y = SQUARE(X) performs the operation Y = X.^2 5 | % 6 | % Copyright (c) 2009-2015 Dominic Searson 7 | % 8 | % GPTIPS 2 9 | % 10 | % See also CUBE 11 | y = x.^2; 12 | -------------------------------------------------------------------------------- /nodefcn/step.m: -------------------------------------------------------------------------------- 1 | function y = step(x) 2 | %STEP Node. Threshold function that returns 1 if the argument is >= 0 and 0 otherwise. 3 | % 4 | % Y = STEP(X) 5 | % 6 | % Remarks: 7 | % 8 | % Computed on an element by element basis, and if X is scalar then Y will 9 | % be scalar. 
10 | % 11 | % Copyright (c) 2009-2015 Dominic Searson 12 | % 13 | % GPTIPS 2 14 | % 15 | % See also THRESH, IFLTE, MAXX, MINX, NEG, LTH, GTH, GPAND, GPNOT, GPOR 16 | 17 | y = ( (x>=0) .* 1) + ( (x<0) .* 0); 18 | 19 | -------------------------------------------------------------------------------- /nodefcn/thresh.m: -------------------------------------------------------------------------------- 1 | function x = thresh(a,b) 2 | %THRESH Node. Threshold function that returns 1 if the first argument is >= the second argument and returns 0 otherwise. 3 | % 4 | % X = THRESH(A,B) 5 | % 6 | % This performs: 7 | % 8 | % If A >= B then X = 1 else X = 0 on an element by element basis. 9 | % 10 | % Remarks: 11 | % 12 | % If both of A and B are scalars then X will be scalar. If one or more of 13 | % A and B are vectors then X will be a vector. 14 | % 15 | % Copyright (c) 2009-2015 Dominic Searson 16 | % 17 | % GPTIPS 2 18 | % 19 | % See also IFLTE, STEP, MAXX, MINX, GTH, LTH, GPAND, GPNOT, GPOR 20 | 21 | x=((a>=b) .* 1) + ((b>a) .* 0); 22 | 23 | -------------------------------------------------------------------------------- /rules/gprule_no_nested_adfs.m: -------------------------------------------------------------------------------- 1 | function [gp_out, fit] = gprule_no_nested_adfs(gp, expr, params) 2 | %GPRULE_NO_NESTED_ADFS Individuals with nested ADFs are discarded 3 | % Individuals having the trait of being expressed through nested ADFs 4 | % should not have any chances of survival.
5 | 6 | gp_out = gp; 7 | 8 | % Params are not used at the moment 9 | params; %#ok 10 | 11 | % In expressions, ADFs are always encoded as certain known symbols 12 | adfs = gp.nodes.adf.seed_str; 13 | 14 | adf_found = false; % Found ADF 15 | open_count = 0; % Open parentheses counter 16 | k = 1; % Character pointer 17 | fit = 1.0; % By default, the gene is fit for survival 18 | 19 | % Parser body 20 | % Here we assume that some rules for expression strings hold, 21 | % e.g., that they are always complete [TODO: potential bug] 22 | while k <= length(expr) 23 | 24 | if ~isempty(strfind(adfs, expr(k))) %#ok: backwards compatibility 25 | if adf_found % Nesting detected 26 | fit = 0.0; 27 | break; 28 | else 29 | adf_found = true; 30 | k = min(k + 1, length(expr)); % skip the ADF symbol, but never index past the end of EXPR 31 | end 32 | end 33 | 34 | if strcmpi(expr(k), '(') && adf_found 35 | open_count = open_count + 1; 36 | end 37 | 38 | if strcmpi(expr(k), ')') && adf_found 39 | open_count = open_count - 1; 40 | if open_count == 0 41 | adf_found = false; 42 | end 43 | end 44 | 45 | k = k + 1; % Increment character counter 46 | 47 | end 48 | 49 | end 50 | 51 | -------------------------------------------------------------------------------- /rules/gprule_template.m: -------------------------------------------------------------------------------- 1 | function [gp_out, fit] = gprule_template(gp, expr, params) 2 | %GPRULE_TEMPLATE Use this template to define a custom rule for fitness 3 | % These rules are applied to genes, not to complete expressions 4 | % 5 | % Input arguments: GP is the main structure, EXPR is the encoded gene 6 | % expression, PARAMS are additional parameters needed for running the rule 7 | % 8 | % Output arguments: GP_OUT is the GP structure (can be left unassigned if 9 | % unmodified) and FIT is the fitness of this particular gene as determined 10 | % by the rule logic. 11 | % 12 | % NB! FIT has a completely different meaning here compared to the overall 13 | % fitness function.
Here, FIT = 0.0 means the individual is unfit for 14 | % survival, while a FIT = 1.0 means the opposite. 15 | 16 | % Params are left unused in this case (no params) 17 | params; %#ok 18 | 19 | % The structure is here if it is needed; 20 | % NB! Do not forget to pass it as output! 21 | gp_out = gp; 22 | 23 | % This is custom decision logic, but the idea is that 24 | % the output of decision must be constrained to [0,1]. 25 | 26 | % As implemented in this particular example, FIT is always "true" or "pass" 27 | % depending on the context, unless the input expression is empty. 28 | % 29 | % However, the logic of the rule may be quite complicated. 30 | 31 | fit = 1.0; 32 | if isempty(expr) 33 | fit = 0.0; 34 | end 35 | 36 | end 37 | 38 | -------------------------------------------------------------------------------- /user/gp_userfcn.m: -------------------------------------------------------------------------------- 1 | function gp = gp_userfcn(gp) 2 | %GP_USERFCN Calls a user defined function once per generation if one has been specified in the field GP.USERDATA.USER_FCN. 3 | % 4 | % Remarks: 5 | % 6 | % The user defined function must accept (as 1st) and return (as 2nd) the 7 | % GP structure as arguments. 8 | % 9 | % Example: 10 | % 11 | % In multigene symbolic regression the function 12 | % regressmulti_fitfun_validate can be called once per generation to 13 | % evaluate the best individual so far on a validation ('holdout') data 14 | % set. 15 | % 16 | % Copyright (c) 2009-2015 Dominic Searson 17 | % 18 | % GPTIPS 2 19 | 20 | if ~isempty(gp.userdata.user_fcn) 21 | [~,gp] = feval(gp.userdata.user_fcn,gp); 22 | end 23 | --------------------------------------------------------------------------------
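GP_USERFCN only requires that the function named in GP.USERDATA.USER_FCN take the GP structure as its single input and return it as its second output; the first output is discarded. The following is a minimal sketch of such a function and is not part of GPTIPS2F. The name LOG_BEST_FITNESS and the field GP.USERDATA.BEST_FITNESS_LOG are hypothetical, and the sketch assumes that GP.RESULTS.BEST.FITNESS and GP.STATE.COUNT (counting generations from 1, as REGRESSMULTI_FITFUN_VALIDATE does) are populated by the toolbox before the call.

```matlab
function [bestFitness, gp] = log_best_fitness(gp)
%LOG_BEST_FITNESS Hypothetical per-generation user function (sketch).
%
%   [BESTFITNESS,GP] = LOG_BEST_FITNESS(GP) reads the fitness of the
%   current best individual, appends it to a log stored in the GP
%   structure and returns the updated GP structure as the 2nd output,
%   which is the only output that GP_USERFCN keeps.
%
%   Assumptions: GP.RESULTS.BEST.FITNESS and GP.STATE.COUNT are filled in
%   by GPTIPS before this function is called. GP.USERDATA.BEST_FITNESS_LOG
%   is a hypothetical field used only by this example.

bestFitness = gp.results.best.fitness;

% create the log on the first call
if ~isfield(gp.userdata, 'best_fitness_log')
    gp.userdata.best_fitness_log = [];
end

% store one entry per generation (MATLAB grows the column vector as needed)
gp.userdata.best_fitness_log(gp.state.count, 1) = bestFitness;

fprintf('Generation %d: best fitness = %g\n', gp.state.count, bestFitness);
end
```

It could be enabled with `gp.userdata.user_fcn = @log_best_fitness;` in a config file. Because there is only one USER_FCN slot, this would replace REGRESSMULTI_FITFUN_VALIDATE in configs such as GPDEMO4_CONFIG; keeping holdout validation as well would need a wrapper function that calls both.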