├── README.md
├── conversion
│   ├── Contents.m
│   ├── HTMLequation.m
│   ├── gpmigrate2_2f.m
│   ├── gpmodel2func.m
│   ├── gpmodel2mfile.m
│   ├── gpmodel2struct.m
│   ├── gpmodel2sym.m
│   ├── gpmodelgenes2mfile.m
│   └── processOrgChartJS.m
├── core
│   ├── Contents.m
│   ├── assignadf.m
│   ├── bootsample.m
│   ├── comparemodelsREC.m
│   ├── crossover.m
│   ├── displaystats.m
│   ├── drawtrees.m
│   ├── evalfitness.m
│   ├── evalfitness_par.m
│   ├── extract.m
│   ├── genebrowser.m
│   ├── genefilter.m
│   ├── genes2gpmodel.m
│   ├── getcomplexity.m
│   ├── getdepth.m
│   ├── getnumnodes.m
│   ├── gp_2d_mesh.m
│   ├── gp_toolboxlist.m
│   ├── gpcheck.m
│   ├── gpdefaults.m
│   ├── gpfinalise.m
│   ├── gpinit.m
│   ├── gpinitparallel.m
│   ├── gpmodelfilter.m
│   ├── gpmodelreport.m
│   ├── gpmodelvars.m
│   ├── gppopvars.m
│   ├── gppretty.m
│   ├── gprandom.m
│   ├── gpreformat.m
│   ├── gpresids.m
│   ├── gpsimplify.m
│   ├── gpsym.m
│   ├── gpterminate.m
│   ├── gptic.m
│   ├── gptoc.m
│   ├── gptoolboxcheck.m
│   ├── gptreestructure.m
│   ├── initbuild.m
│   ├── kogene.m
│   ├── mergegp.m
│   ├── mutate.m
│   ├── ndfsort_rank1.m
│   ├── paretoreport.m
│   ├── parseadf.m
│   ├── picknode.m
│   ├── popbrowser.m
│   ├── popbuild.m
│   ├── pref2inf.m
│   ├── regressionErrorCharacteristic.m
│   ├── rungp.m
│   ├── runtree.m
│   ├── scangenes.m
│   ├── selection.m
│   ├── standaloneModelStats.m
│   ├── summary.m
│   ├── tree2evalstr.m
│   ├── treegen.m
│   ├── uniquegenes.m
│   └── updatestats.m
├── examples
│   ├── concrete.mat
│   ├── cubic_config.m
│   ├── demo2data.mat
│   ├── gpdemo1.m
│   ├── gpdemo1_config.m
│   ├── gpdemo2.m
│   ├── gpdemo2_config.m
│   ├── gpdemo3.m
│   ├── gpdemo3_config.m
│   ├── gpdemo4.m
│   ├── gpdemo4_config.m
│   ├── ph2data.mat
│   ├── quartic_fitfun.m
│   ├── ripple_config.m
│   ├── salustowicz1d_config.m
│   └── uball_config.m
├── fitness
│   ├── regressmulti_fitfun.m
│   └── regressmulti_fitfun_validate.m
├── gpl.txt
├── nodefcn
│   ├── add3.m
│   ├── cube.m
│   ├── gauss.m
│   ├── gpand.m
│   ├── gpnot.m
│   ├── gpor.m
│   ├── gth.m
│   ├── iflte.m
│   ├── lth.m
│   ├── maxx.m
│   ├── minx.m
│   ├── mult3.m
│   ├── neg.m
│   ├── negexp.m
│   ├── pdiv.m
│   ├── plog.m
│   ├── psqroot.m
│   ├── square.m
│   ├── step.m
│   └── thresh.m
├── rules
│   ├── gprule_no_nested_adfs.m
│   └── gprule_template.m
└── user
    └── gp_userfcn.m
/README.md:
--------------------------------------------------------------------------------
1 | # GPTIPS2F Toolbox for MATLAB #
2 |
3 | GPTIPS2F is the evolution of the second version of the MATLAB toolbox developed by Dr. Dominic Searson.
4 |
5 | Since 2017, a fork of the toolbox (‘2F’) has been maintained by Dr. Aleksei Tepljakov, TalTech University, https://taltech.ee/
6 |
7 | **Note that the version number of GPTIPS2F has been reset to 1.0 since Dec 6, 2018.**
8 |
9 | ## New features ##
10 |
11 | * Preset Random Constants (PRCs): a subset of Ephemeral Random Constants (ERCs), the difference being that PRCs are randomly chosen from a predefined set.
12 | * Automatically Defined Functions (ADFs): basically templates that are seeded into the initial population and can also arise naturally during mutation.
13 | * Evolutionary rules: define rules that discard individuals with certain undesirable traits or significantly decrease their chances of staying in the population (alpha testing feature).
14 |
15 | ## Installation ##
16 |
17 | Run `addpath(genpath('gptips2f'))`, where `'gptips2f'` is the path to the toolbox folder, and save the resulting path afterwards (e.g. with `savepath`).
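
The installation step can be sketched as a short script. This is a minimal sketch, not an official installer: the variable `toolboxDir` and the assumption that the toolbox was unpacked into a `gptips2f` folder under the current directory are illustrative only. `rungp` is used for the check because `rungp.m` is listed in the `core` folder.

```matlab
% Sketch: add the toolbox folder and all of its subfolders to the MATLAB path.
% ASSUMPTION: the toolbox lives in ./gptips2f; adjust toolboxDir as needed.
toolboxDir = fullfile(pwd, 'gptips2f');

if exist(toolboxDir, 'dir') == 7
    addpath(genpath(toolboxDir));   % genpath includes conversion, core, etc.
    % savepath;                     % uncomment to keep the path across sessions
end

% Check that the main entry point is now visible on the path.
onPath = (exist('rungp', 'file') == 2);
if ~onPath
    disp('rungp.m was not found on the path; check toolboxDir.');
end
```

`savepath` is left commented out so that the sketch does not modify a saved path unless that is intended.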
18 |
19 | ## Documentation ##
20 | Please consult [the original docs](https://sites.google.com/site/gptips4matlab/). Updated docs will be published in due time. Meanwhile, read on to learn how to use the ADF feature.
21 |
22 | ## A brief on how to use ADFs ##
23 |
24 | Contrary to the original meaning of the term, ADFs---automatically defined functions---are defined manually by the user prior to the regression procedure. The purpose of ADFs at the moment is to impose some structure on the otherwise unstructured modeling problem. Think of them as intelligent design elements---those that would be most useful in a certain context.
25 |
26 | Here's how to set up ADFs in MATLAB code in the gp config file:
27 | ```
28 | % ADF expressions. Can contain any custom function also.
29 | gp.nodes.adf.expr = {'$1+?1*$2', '$1*(#1+$2)'};
30 |
31 | % For terminals (the arguments of the ADF functions) use
32 | % $1, $2, ... to define normal terminals
33 | % ?1, ?2, ... to mark a terminal where ERC should go
34 | % #1, #2, ... to mark a terminal where PRC should go
35 | % Don't forget to enumerate these correctly! Counters are unique to each ADF
36 | % (see example above)
37 |
38 | % Assign names to ADFs
39 | gp.nodes.adf.name = {'simple_rnd_sum', 'simple_rnd_prod'};
40 |
41 | % Predefined random constants (PRC) settings
42 |
43 | % If a PRC is generated (see above), only the entries in this set are used
44 | gp.nodes.pconst.set = [0.1 0.2 0.3 0.4 0.5];
45 |
46 | % General probability of generating a PRC (does not affect "#nn" terminals)
47 | gp.nodes.pconst.p_PRC = 0.25;
48 |
49 | % Probability of generating an ADF
50 | gp.nodes.adf.p_gen = 0.25;
51 |
52 | % Enforce structure on mutation
53 | gp.nodes.adf.arg_force = true;
54 | ```
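
To make the placeholder notation concrete, the following sketch expands the first ADF expression by hand. It only illustrates what `$1`, `$2` and `?1` stand for; it is not the toolbox's own expansion code, and the names `args` and `ercValue` are hypothetical.

```matlab
% Illustration only: hand-expand the ADF template '$1+?1*$2'.
expr     = '$1+?1*$2';     % first ADF expression from the config above
args     = {'x1', 'x2'};   % hypothetical terminals bound to $1 and $2
ercValue = 0.7312;         % hypothetical ERC value bound to ?1

expanded = expr;
for k = 1:numel(args)
    % replace the k-th normal terminal placeholder with its argument
    expanded = strrep(expanded, sprintf('$%d', k), args{k});
end
% replace the ERC placeholder with a numeric constant
expanded = strrep(expanded, '?1', num2str(ercValue));

disp(expanded)
```

With these inputs the template becomes `x1+0.7312*x2`. A `#1` placeholder would be filled the same way, except that the value would be drawn from `gp.nodes.pconst.set`.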
55 |
56 | ## A brief on how to use Rules ##
57 |
58 | Evolutionary rules allow you to either discard individuals having certain traits or otherwise decrease their chances of survival. These traits must be extracted from symbolic gene expressions and tested in user-defined rule functions. An example of how to structure the latter can be found in the `rules` folder under the name `gprule_template.m`.
59 |
60 | There is also an implemented rule called `gprule_no_nested_adfs.m` available in the same folder. This rule removes from the population any individual that has genes in which nested ADFs appear. This is useful for a certain class of regression problems. In the config function, this rule is set up as follows:
61 | ```
62 | % Evolutionary rules
63 | gp.evolution.rules.use = true;
64 | gp.evolution.rules.sets = {{@gprule_no_nested_adfs, []}};
65 | ```
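
The kind of trait test such a rule performs can be sketched on a plain string. The sketch below is not the implementation of `gprule_no_nested_adfs.m`, and it does not follow the rule function signature that GPTIPS2F expects (see `gprule_template.m` for that). It assumes, purely for illustration, that a gene is available as a string in which ADFs appear by name; `geneStr` is a hypothetical example.

```matlab
% Illustration only: detect an ADF call nested inside another ADF call.
adfNames = {'simple_rnd_sum', 'simple_rnd_prod'};  % names from the ADF example
geneStr  = 'plus(simple_rnd_sum(x1,simple_rnd_prod(x2,x3)),x4)'; % hypothetical

isNested = false;
for k = 1:numel(adfNames)
    starts = strfind(geneStr, [adfNames{k} '(']);
    for s = starts
        p = s + numel(adfNames{k});   % index of the opening parenthesis
        depth = 0;
        % walk forward to the matching closing parenthesis of this call
        for q = p:numel(geneStr)
            if geneStr(q) == '(', depth = depth + 1; end
            if geneStr(q) == ')', depth = depth - 1; end
            if depth == 0, break; end
        end
        inner = geneStr(p+1:q-1);     % argument list of the ADF call
        % any ADF name followed by '(' inside the arguments means nesting
        for m = 1:numel(adfNames)
            if ~isempty(strfind(inner, [adfNames{m} '(']))
                isNested = true;
            end
        end
    end
end
```

Plain substring matching keeps the sketch short, but it would also match a longer function name that merely ends in an ADF name, so a real rule would need a stricter tokenisation.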
66 |
67 | **Note that this is an alpha feature.**
--------------------------------------------------------------------------------
/conversion/Contents.m:
--------------------------------------------------------------------------------
1 | % GPTIPS2F
2 | % Symbolic Data Mining for MATLAB evolved
3 | % (c) Aleksei Tepljakov, 2017-...
4 | % (c) Dominic Searson 2009-2015
5 | %
6 | % Files
7 | % gpmodel2func - Converts a multigene symbolic regression model to an anonymous function and returns the function handle.
8 | % gpmodel2mfile - Converts a multigene regression model to a standalone M file.
9 | % gpmodel2struct - Create a struct describing a multigene regression model.
10 | % gpmodel2sym - Create a simplified Symbolic Math object for a multigene symbolic regression model.
11 | % gpmodelfilter - Object to filter a population of multigene symbolic regression models.
12 | % gpmodelgenes2mfile - Converts individual genes of a multigene symbolic regression model to a standalone M file.
13 | % HTMLequation - Returns an HTML formatted multigene regression model equation.
14 | % processOrgChartJS - Writes Google org chart JavaScript for genes to an existing HTML file.
15 |
--------------------------------------------------------------------------------
/conversion/HTMLequation.m:
--------------------------------------------------------------------------------
1 | function strOut = HTMLequation(gp,eqn,numDigits)
2 | %HTMLEQUATION Returns an HTML formatted multigene regression model equation.
3 | %
4 | % HTMLEQN = HTMLEQUATION(GP,EQN) where EQN is a Symbolic Math object
5 | % representing the GPTIPS model and GP is the GPTIPS data structure
6 | % returns a string HTMLEQN containing the equation in HTML format.
7 | %
8 | % HTMLEQN = HTMLEQUATION(GP,ID) where ID is a numeric model identifier in
9 | % the GP data structure. Returns a string HTMLEQN containing the overall
10 | % model equation in HTML format.
11 | %
12 | % HTMLEQN = HTMLEQUATION(GP,ID, NUMDIGITS) does the same but formats the
13 | % displayed precision to NUMDIGITS where NUMDIGITS >=2.
14 | %
15 | % HTMLEQN = HTMLEQUATION(GP,'best') returns a string HTMLEQN containing
16 | % the 'best' model equation (on the training data) in HTML format.
17 | %
18 | % HTMLEQN = HTMLEQUATION(GP,'valbest') returns a string HTMLEQN
19 | % containing the 'best' model (on the validation data) equation in HTML
20 | % format.
21 | %
22 | % HTMLEQN = HTMLEQUATION(GP,'testbest') returns a string HTMLEQN
23 | % containing the 'best' model (on the test data) equation in HTML format.
24 | %
25 | % Note:
26 | %
27 | % The model equation is not by default returned with full numeric
28 | % precision.
29 | %
30 | % Copyright (c) 2009-2015 Dominic Searson
31 | %
32 | % GPTIPS 2
33 | %
34 | % See also SYM/VPA, GPMODEL2SYM, GPMODELREPORT, PARETOREPORT, GPPRETTY
35 |
36 | strOut = [];
37 |
38 | if nargin < 2
39 | disp('Usage is HTMLEQUATION(GP,EQN)');
40 | return;
41 | end
42 |
43 | if ~gp.info.toolbox.symbolic()
44 | error('The Symbolic Math Toolbox is required to use this function.');
45 | end
46 |
47 | if nargin < 3 || isempty(numDigits)
48 | numDigits = 4;
49 | end
50 |
51 | if numDigits < 2
52 | error('The number of digits must be > 1.');
53 | end
54 |
55 | modelform = true;
56 |
57 | if isa(eqn,'sym')
58 | modelform = false;
59 | elseif isnumeric(eqn)
60 | model = gpmodel2struct(gp,eqn,false,true,false);
61 | elseif strcmpi(eqn,'best')
62 | model = gpmodel2struct(gp,'best',false,true,false);
63 | elseif strcmpi(eqn,'testbest')
64 | model = gpmodel2struct(gp,'testbest',false,true,false);
65 | elseif strcmpi(eqn,'valbest')
66 | model = gpmodel2struct(gp,'valbest',false,true,false);
67 | else %otherwise assume that the 'eqn' is in raw form, e.g.'x1'
68 | modelform = false;
69 | end
70 |
71 | if modelform
72 |
73 | if model.valid
74 | eqn = formattedPrecisChar(model.sym,numDigits);
75 | else
76 | disp(['Invalid model specified. Reason: ' model.invalidReason]);
77 | return;
78 | end
79 |
80 | else
81 | eqn = formattedPrecisChar(eqn,numDigits);
82 | end
83 |
84 | pat1 = 'x(\d+)'; %subscript regex;
85 | pat2='\^([-]?\d+(\.\d*)?|\.\d+)'; %decimal numerical power superscript regex;
86 | pat3 = '\^((\()[-]?\d+(/)\d+(\)))'; %fractional numerical power superscript
87 |
88 | strOut = regexprep(eqn,pat1,'x<sub>$1</sub>');
89 | strOut = regexprep(strOut,pat2,'<sup>$1</sup>');
90 |
91 | %specific replacements for sqrt being represented as '^(1/2)' by symbolic math
92 | %toolbox
93 | strOut = strrep(strOut,'^(1/2)','<sup>1/2</sup>');
94 | strOut = strrep(strOut,'^(3/2)','<sup>3/2</sup>');
95 | strOut = strrep(strOut,'^(1/4)','<sup>1/4</sup>');
96 | strOut = strrep(strOut,'^(1/8)','<sup>1/8</sup>');
97 |
98 | %any other numerical fractions
99 | strOut = regexprep(strOut,pat3,'<sup>$1</sup>');
100 |
101 | %also, get rid of '*' for display purposes
102 | strOut = strrep(strOut,'*',' ');
103 |
104 | %replace plain versions of user defined variable names with marked up
105 | %versions
106 | for i = 1:numel(gp.nodes.inputs.names)
107 | if ~isempty(gp.nodes.inputs.names{i})
108 | strOut = strrep(strOut,gp.nodes.inputs.namesPlain{i},gp.nodes.inputs.names{i});
109 | end
110 | end
111 |
112 | function eqn = formattedPrecisChar(symEq,numDigits)
113 | %gets the char form of the SYM equation with controlled numeric precision.
114 | %Uses MuPAD's Pref::outputDigits setting and may not work properly on old
115 | %Matlab versions
116 |
117 | %exit if a char eqn.
118 | if ~isa(symEq,'sym')
119 | eqn = symEq;
120 | return;
121 | end
122 |
123 | verReallyOld = verLessThan('matlab', '7.7.0');
124 |
125 | if nargin < 2 || isempty(numDigits)
126 | numDigits = 4;
127 | end
128 |
129 | %get existing value of MuPAD's 'Pref::outputDigits' setting so this can
130 | %be restored after this function call.
131 | userDigits = char(feval(symengine,'Pref::outputDigits'));
132 |
133 | if ~verReallyOld
134 | evalin(symengine,['Pref::outputDigits(' int2str(numDigits) ')']);
135 | end
136 |
137 | %use VPA here (not sure why this is actually necessary having just set
138 | %Pref::outputDigits but it seems to be reqd in some cases)
139 | eqn = char(vpa(symEq,numDigits));
140 |
141 | if ~verReallyOld
142 | evalin(symengine,['Pref::outputDigits(' userDigits ')']);
143 | end
144 |
--------------------------------------------------------------------------------
/conversion/gpmigrate2_2f.m:
--------------------------------------------------------------------------------
1 | function gp = gpmigrate2_2f(gp)
2 | %GPMIGRATE2_2F Migrate parameters of GP structure from V2 to V2F of toolbox
3 | % Some parameters of the GP structure have changed in the new version of
4 | % the toolbox. Therefore, this function is provided to update those
5 | % parameters. Simply do GP = GPMIGRATE2_2F(GP) to get those parameters
6 | % fixed.
7 | %
8 | % List of parameter changes:
9 | % * Toolboxes are now correctly detected on all computers
10 | % dynamically by means of anonymous function handles. This means that
11 | % when you run GPTIPS 2F on a computer that doesn't have MATLAB's
12 | % Symbolic Math toolbox installed, for instance, and then use the
13 | % resulting GP structure on a computer that does have it installed,
14 | % the toolbox is detected and you can proceed.
15 |
16 | % Correct toolbox check using handles
17 | [gp.info.toolbox.symbolic, gp.info.toolbox.parallel, gp.info.toolbox.stats] = gptoolboxcheck;
18 |
19 | end
20 |
21 |
--------------------------------------------------------------------------------
/conversion/gpmodel2func.m:
--------------------------------------------------------------------------------
1 | function f = gpmodel2func(gp,ID)
2 | %GPMODEL2FUNC Converts a multigene symbolic regression model to an anonymous function and returns the function handle.
3 | %
4 | % F = GPMODEL2FUNC(GP,ID) converts the model identified by numeric
5 | % identifier ID to a standard MATLAB anonymous function with function
6 | % handle F.
7 | %
8 | % F = GPMODEL2FUNC(GP,'best') converts the best model on the training
9 | % data to a MATLAB anonymous function with function handle F.
10 | %
11 | % F = GPMODEL2FUNC(GP,'valbest') converts the best model on the
12 | % validation data (if it exists) to a MATLAB anonymous function with
13 | % function handle F.
14 | %
15 | % F = GPMODEL2FUNC(GP,'testbest') converts the best model on the test
16 | % data (if it exists) to a MATLAB anonymous function with function handle
17 | % F.
18 | %
19 | % F = GPMODEL2FUNC(GP,GPMODEL) operates on the GPMODEL struct
20 | % representing a multigene regression model, i.e. the struct returned by
21 | % the functions GPMODEL2STRUCT or GENES2GPMODEL.
22 | %
23 | % The anonymous function F can then be used to evaluate the model on new
24 | % data. It can also be used in MATLAB visualisation tools such as the
25 | % EZSURF function.
26 | %
27 | % Copyright (c) 2009-2015 Dominic Searson
28 | %
29 | % GPTIPS 2
30 | %
31 | % See also GPMODEL2SYM, GPMODEL2MFILE, FUNCTION_HANDLE, MATLABFUNCTION,
32 | % EZPLOT, EZCONTOUR, EZSURF
33 |
34 | if ~gp.info.toolbox.symbolic()
35 | error('The Symbolic Math Toolbox is required to use this function.');
36 | end
37 |
38 | if nargin < 2
39 | disp('Basic usage is F = GPMODEL2FUNC(GP,ID)');
40 | disp('or F = GPMODEL2FUNC(GP,''best'')');
41 | disp('or F = GPMODEL2FUNC(GP,''valbest'')');
42 | disp('or F = GPMODEL2FUNC(GP,''testbest'')');
43 | return;
44 | end
45 |
46 | if isnumeric(ID) && ( ID < 1 || ID > numel(gp.pop) )
47 | error('The supplied numerical model identifier ID is not valid.');
48 | end
49 |
50 | if ~strncmpi('regressmulti', func2str(gp.fitness.fitfun),12)
51 | error('GPMODEL2FUNC may only be used for multigene symbolic regression problems.');
52 | end
53 |
54 | % If ADFs are used, make them known to this function
55 | if gp.nodes.adf.use, assignadf(gp); end
56 |
57 | s = gpmodel2sym(gp,ID);
58 |
59 | if isempty(s)
60 | error('Not a valid model ID or model selector');
61 | end
62 |
63 | f = matlabFunction(s);
64 |
65 | % If ADFs are used, make them known to this function, and re-evaluate
66 | if gp.nodes.adf.use
67 | assignadf(gp);
68 | f = func2str(f);
69 | f = eval(f);
70 | end
71 |
--------------------------------------------------------------------------------
/conversion/gpmodel2mfile.m:
--------------------------------------------------------------------------------
1 | function gpmodel2mfile(gp,ID,filename,useAlias)
2 | %GPMODEL2MFILE Converts a multigene regression model to a standalone M file.
3 | %
4 | % GPMODEL2MFILE(GP,'best') converts the "best" model in the population to
5 | % a function in GPMODEL.M
6 | %
7 | % GPMODEL2MFILE(GP,'best','FILENAME') converts the "best" model in the
8 | % population to a function in FILENAME.M
9 | %
10 | % GPMODEL2MFILE(GP,'valbest','FILENAME') converts the "best" model, as
11 | % evaluated on the holdout validation data (if it exists), to a function
12 | % in FILENAME.M
13 | %
14 | % GPMODEL2MFILE(GP,'testbest','FILENAME') converts the "best" model, as
15 | % evaluated on the test data (if it exists), to a function in FILENAME.M
16 | %
17 | % GPMODEL2MFILE(GP,ID,'FILENAME') converts the model with numeric
18 | % identifier ID in the population to a function in FILENAME.M
19 | %
20 | % GPMODEL2MFILE(GP,ID,'FILENAME',USEALIAS) does the same but uses
21 | % variable aliases (if defined in gp.nodes.inputs.names) if USEALIAS is
22 | % TRUE.
23 | %
24 | % GPMODEL2MFILE(GP,GPMODEL) operates on the GPMODEL struct representing a
25 | % multigene regression model, i.e. the struct returned by the functions
26 | % GPMODEL2STRUCT or GENES2GPMODEL.
27 | %
28 | % Remarks:
29 | %
30 | % This function is designed for use with multigene symbolic regression
31 | % models created with the REGRESSMULTI_FITFUN fitness function (or
32 | % variant thereof having a file name beginning 'REGRESSMULTI').
33 | %
34 | % Note:
35 | %
36 | % Requires Symbolic Math Toolbox.
37 | %
38 | % Copyright (c) 2009-2015 Dominic Searson
39 | %
40 | % GPTIPS 2
41 | %
42 | % See also GPMODELGENES2MFILE, GPMODEL2SYM, GPMODELREPORT,
43 | % GPMODEL2STRUCT, GPMODEL2FUNC
44 |
45 | if nargin < 2
46 | disp('Usage is GPMODEL2MFILE(GP,ID) where ID is the numeric identifier of the desired model');
47 | disp('or GPMODEL2MFILE(GP,''BEST'') to convert the best model of run');
48 | disp('or GPMODEL2MFILE(GP,''VALBEST'') to convert the best validation model of run.');
49 | disp('or GPMODEL2MFILE(GP,''TESTBEST'') to convert the best test model of run.');
50 | return;
51 | end
52 |
53 | if ~strncmpi('regressmulti',func2str(gp.fitness.fitfun),12)
54 | error('GPMODEL2MFILE may only be used for multigene symbolic regression problems.');
55 | end
56 |
57 | if nargin < 3 || isempty(filename)
58 | filename = 'gpmodel';
59 | end
60 |
61 | if nargin < 4 || isempty(useAlias)
62 | useAlias = false;
63 | end
64 |
65 | if ~ischar(filename)
66 | error('The filename parameter must be a string.');
67 | end
68 |
69 | if gp.info.toolbox.symbolic()
70 |
71 | %simplify and get sym object for overall model
72 | exprSym = gpmodel2sym(gp,ID,false,useAlias);
73 |
74 | if isempty(exprSym)
75 | error('Not a valid model ID or model selector');
76 | end
77 |
78 | %vectorise the whole multigene model and extract to string
79 | vectorisedModelStr = ['ypred=' vectorize(exprSym)];
80 |
81 | %replace x1,x2,x3 etc with x(:,1), x(:,2), x(:,3) etc.
82 | pat='x(\d+)';
83 | vectorisedModelStr = regexprep(vectorisedModelStr,pat,'x(:,$1)');
84 |
85 | if strcmp(filename(end-1:end),'.m')
86 | filename = filename(1:end-2);
87 | end
88 |
89 | %open file and write m file header etc
90 | fid = fopen([filename '.m'],'w+');
91 |
92 | fprintf(fid,['function ypred=' filename '(x)']);
93 | fprintf(fid,'\n');
94 | fprintf(fid,['%%' upper(filename) ...
95 | ' This model file was automatically generated by the GPTIPS 2 function gpmodel2mfile on ' datestr(now)]);
96 | fprintf(fid,'\n\n');
97 |
98 | % If ADFs are used, write them as well
99 | if gp.nodes.adf.use
100 | for k=1:length(gp.nodes.adf.name)
101 | fprintf(fid,[gp.nodes.adf.name{k} '=' gp.nodes.adf.eval{k} ';']);
102 | fprintf(fid,'\n');
103 | end
104 | fprintf(fid,'\n');
105 | end
106 |
107 | %write model
108 | fprintf(fid,[vectorisedModelStr ';']);
109 | fprintf(fid,'\n');
110 | fclose(fid);
111 | disp(['Model successfully written to ' filename '.m']);
112 | else
113 | error('The Symbolic Math Toolbox is required to use this function.');
114 | end
115 |
--------------------------------------------------------------------------------
/conversion/gpmodel2sym.m:
--------------------------------------------------------------------------------
1 | function [symObj,cellsymObj] = gpmodel2sym(gp,ID,fastMode,useAlias)
2 | %GPMODEL2SYM Create a simplified Symbolic Math object for a multigene symbolic regression model.
3 | %
4 | % SYMOBJ = GPMODEL2SYM(GP,ID) simplifies and gets the symbolic regression
5 | % model SYMOBJ with numeric identifier ID in the GPTIPS data structure GP.
6 | %
7 | % SYMOBJ = GPMODEL2SYM(GP,'best') gets the best model of the run (as
8 | % evaluated on the training data).
9 | %
10 | % SYMOBJ = GPMODEL2SYM(GP,'valbest') gets the model that performed best
11 | % on the validation data (if this data exists).
12 | %
13 | % SYMOBJ = GPMODEL2SYM(GP,'testbest') gets the model that performed best
14 | % on the test data (if this data exists).
15 | %
16 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID) gets the overall symbolic model
17 | % SYMOBJ as well as a cell row array of symbolic objects SYMGENES
18 | % containing the individual genes (and bias term) that comprise the
19 | % overall model. The 1st element of SYMGENES is the bias term. The
20 | % (n+1)th element of SYMGENES is the nth gene of the model.
21 | %
22 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID,FASTMODE) does the same but uses
23 | % a slightly faster simplification method (MuPAD symbolic:simplify
24 | % instead of symbolic:Simplify)
25 | %
26 | % [SYMOBJ,SYMGENES] = GPMODEL2SYM(GP,ID,FASTMODE,USEALIAS) does the same
27 | % but uses aliased variable names (if they exist in
28 | % gp.nodes.inputs.names) in the returned SYM objects when USEALIAS is
29 | % TRUE (default = FALSE)
30 | %
31 | % SYMOBJ = GPMODEL2SYM(GP,GPMODEL) operates on the GPMODEL struct
32 | % representing a multigene regression model, i.e. the struct returned by
33 | % the functions GPMODEL2STRUCT or GENES2GPMODEL.
34 | %
35 | % Remarks:
36 | %
37 | % As of GPTIPS 2, SYM objects are created using the full default
38 | % precision of the Symbolic Math toolbox and numbers are represented
39 | % (except for ERCs) by a ratio of 2 large integers. E.g. consider the
40 | % following model SYM.
41 | %
42 | % (5589981434161623*x53)/(8796093022208*x30*(x35 + x46))
43 | %
44 | % To display SYM objects to N significant digits, use the VPA and CHAR
45 | % commands, e.g.
46 | %
47 | % VPA(SYMOBJ,N)
48 | %
49 | % For instance, using VPA(SYMOBJ,4) on the above example gives:
50 | %
51 | % (635.5*x53)/(x30*(x35 + x46))
52 | %
53 | % You can also use PRETTY(VPA(SYMOBJ,4)) to get a more nicely formatted
54 | % display of the model SYM object (this is similar to the command line
55 | % output of the GPPRETTY function). This seems glitchy in some versions
56 | % of MATLAB however.
57 | %
58 | % Further remarks:
59 | %
60 | % This assumes each population member is a multigene regression model,
61 | % i.e. created using the fitness function REGRESSMULTI_FITFUN - or a
62 | % variant thereof with filename beginning REGRESSMULTI.
63 | %
64 | % (c) Dominic Searson 2009-2015
65 | %
66 | % GPTIPS 2
67 | %
68 | % See also GPPRETTY, GPMODEL2MFILE, GPMODELREPORT, GPMODEL2STRUCT,
69 | % GPMODEL2FUNC
70 |
71 | symObj = [];cellsymObj = [];
72 |
73 | if ~gp.info.toolbox.symbolic()
74 | error('The Symbolic Math Toolbox is required to use this function.');
75 | end
76 |
77 | if nargin < 2
78 | disp('Basic usage is SYMOBJ = GPMODEL2SYM(GP,ID)');
79 | disp('or SYMOBJ = GPMODEL2SYM(GP,''best'')');
80 | disp('or SYMOBJ = GPMODEL2SYM(GP,''valbest'')');
81 | return;
82 | end
83 |
84 | if nargin < 3|| isempty(fastMode)
85 | fastMode = false;
86 | end
87 |
88 | if nargin < 4|| isempty(useAlias)
89 | useAlias = false;
90 | end
91 |
92 | if isnumeric(ID) && ( ID < 1 || ID > numel(gp.pop) )
93 | error('The supplied numerical model identifier ID is not valid.');
94 | end
95 |
96 | if ~strncmpi('regressmulti', func2str(gp.fitness.fitfun),12)
97 | error('GPMODEL2SYM may only be used for multigene symbolic regression problems.');
98 | end
99 |
100 | try
101 | [symObj, cellsymObj] = gppretty(gp,ID,[],true,fastMode,useAlias);
102 | catch ME
103 | error(['Could not get symbolic object(s) for this model. ' ME.message]);
104 | end
--------------------------------------------------------------------------------
/conversion/gpmodelgenes2mfile.m:
--------------------------------------------------------------------------------
1 | function gpmodelgenes2mfile(gp,ID,filename,useAlias)
2 | %GPMODELGENES2MFILE Converts individual genes of a multigene symbolic regression model to a standalone M file.
3 | %
4 | % GPMODELGENES2MFILE(GP,'best') converts the "best" model, as evaluated
5 | % on the training data, to a standalone function in GPGENESMODEL.M
6 | %
7 | % GPMODELGENES2MFILE(GP,'best','FILENAME') converts the 'best' model, as
8 | % evaluated on the training data, to a standalone function in FILENAME.M.
9 | %
10 | % GPMODELGENES2MFILE(GP,'valbest','FILENAME') converts the 'best' model,
11 | % as evaluated on the validation data, to a function in FILENAME.M
12 | %
13 | % GPMODELGENES2MFILE(GP,'testbest','FILENAME') converts the 'best' model,
14 | % as evaluated on the test data, to a function in FILENAME.M
15 | %
16 | % GPMODELGENES2MFILE(GP,ID,'FILENAME') converts the model with numeric
17 | % identifier ID in the population to a function in FILENAME.M
18 | %
19 | % GPMODELGENES2MFILE(GP,ID,'FILENAME',USEALIAS) does the same but uses
20 | % variable aliases (if defined in gp.nodes.inputs.names) if USEALIAS is
21 | % TRUE.
22 | %
23 | % GPMODELGENES2MFILE(GP,GPMODEL) operates on the GPMODEL struct
24 | % representing a multigene regression model, i.e. the struct returned by
25 | % the functions GPMODEL2STRUCT or GENES2GPMODEL.
26 | %
27 | % Remarks:
28 | %
29 | % This function is designed for use with multigene symbolic models
30 | % created with the REGRESSMULTI_FITFUN fitness function (or variant
31 | % thereof with fitness function filename beginning REGRESSMULTI). The
32 | % difference between this and GPMODEL2MFILE is that within the model file
33 | % generated the individual outputs of the separate genes are exported as
34 | % the variable YGENES.
35 | %
36 | % Note:
37 | %
38 | % Requires Symbolic Math Toolbox.
39 | %
40 | % Copyright (c) 2009-2015 Dominic Searson
41 | %
42 | % GPTIPS 2
43 | %
44 | % See also GPMODEL2MFILE, GPMODEL2SYM, GPMODEL2STRUCT, GPMODEL2FUNC
45 |
46 | if nargin < 2
47 | disp('Usage is GPMODELGENES2MFILE(GP,IND) where IND is the population index of the desired individual');
48 | disp('or GPMODELGENES2MFILE(GP,''BEST'') to convert the best individual of run');
49 | disp('or GPMODELGENES2MFILE(GP,''VALBEST'') to convert the best validation individual of run.');
50 | return;
51 | end
52 |
53 | if nargin<3 || isempty(filename)
54 | filename = 'gpgenesmodel';
55 | end
56 |
57 | if nargin<4 || isempty(useAlias)
58 | useAlias = false;
59 | end
60 |
61 | if ~ischar(filename)
62 | error('The filename parameter must be a string.');
63 | end
64 |
65 | if gp.info.toolbox.symbolic()
66 |
67 | %get model data
68 | disp('Simplifying model...');
69 |
70 | [~,genes_sym] = gppretty(gp,ID,false,true,false,useAlias);
71 |
72 | pat='x(\d+)';
73 | numGenes = numel(genes_sym);
74 | vectorModelStr = cell(numGenes,1);
75 |
76 | %loop through the individual genes and vectorise each separately (in
77 | %contrast with gpmodel2mfile which combines genes before simplication
78 | %and vectorisation)
79 | for i=1:numGenes
80 |
81 | %vectorise the whole multigene model and extract to string
82 | vectorModelStr{i}=['ygenes(:,' num2str(i) ')=' vectorize(genes_sym{i})];
83 |
84 | %replace x1,x2,x3 etc with x(:,1), x(:,2), x(:,3) etc.
85 | vectorModelStr{i} = regexprep(vectorModelStr{i},pat,'x(:,$1)');
86 | end
87 |
88 | if strcmp(filename(end-1:end),'.m')
89 | filename = filename(1:end-2);
90 | end
91 |
92 | %now open file and write m file header etc
93 | fid = fopen([filename '.m'],'w+');
94 |
95 | fprintf(fid,['function [ypred,ygenes]=' filename '(x)']);
96 | fprintf(fid,'\n');
97 | fprintf(fid,['%%' upper(filename) ' This model file was automatically generated by the GPTIPS 2 function gpmodelgenes2mfile on ' datestr(now)]);
98 | fprintf(fid,'\n\n');
99 |
100 | %set up ygenes variable
101 | fprintf(fid, ['\nygenes = zeros(size(x,1),' num2str(numGenes) ');\n']);
102 |
103 | %print line to evaluate each gene
104 | for i=1:numGenes
105 | fprintf(fid,[vectorModelStr{i} ';']);
106 | fprintf(fid,'\n');
107 | end
108 |
109 | %print line to get overall prediction ypred
110 | fprintf(fid, '\nypred = sum(ygenes,2);');
111 | fclose(fid);
112 |
113 | disp(['Model successfully written to ' filename '.m']);
114 | else
115 | error('The Symbolic Math Toolbox is required to use this function.');
116 | end
--------------------------------------------------------------------------------
/conversion/processOrgChartJS.m:
--------------------------------------------------------------------------------
1 | function processOrgChartJS(fid,gp,gpmodel)
2 | %PROCESSORGCHARTJS Writes Google org chart JavaScript for genes to an existing HTML file.
3 | %
4 | % PROCESSORGCHARTJS(FID,GP,GPMODEL) writes the JavaScript for the
5 | % multigene regression model structure GPMODEL to the file handle FID
6 | % (which should be open for writing to and should be closed manually
7 | % after a call to this function).
8 | %
9 | % Copyright (c) 2009-2015 Dominic Searson
10 | %
11 | % GPTIPS 2
12 | %
13 | % See also GPTREESTRUCTURE, GPMODELREPORT, DRAWTREES
14 |
15 | %loop through genes and populate a data table for each
16 | for n=1:gpmodel.genes.num_genes
17 | treestr = gpmodel.genes.geneStrs{n};
18 | treestruct = gptreestructure(gp,treestr);
19 | numNodes = length(treestruct);
20 |
21 | %set up data table to hold node info
22 | fprintf(fid,'\n');
40 | end
--------------------------------------------------------------------------------
/core/Contents.m:
--------------------------------------------------------------------------------
1 | % GPTIPS2
2 | % Symbolic Data Mining for MATLAB evolved
3 | % (c) Aleksei Tepljakov 2017-...
4 | % (c) Dominic Searson 2009-2015
5 | %
6 | % Files
7 | % bootsample - Get an index vector to sample with replacement a data matrix X.
8 | % comparemodelsREC - Graphical REC performance curves for between 1 and 5 multigene models.
9 | % crossover - Sub-tree crossover of encoded tree expressions to produce 2 new ones.
10 | % displaystats - Displays run stats periodically.
11 | % drawtrees - Draws the tree structure(s) of an individual in a web browser.
12 | % evalfitness - Calls the user specified fitness function.
13 | % evalfitness_par - Calls the user specified fitness function (parallel version).
14 | % extract - Extract a subtree from an encoded tree expression.
15 | % genebrowser - Visually analyse unique genes in a population and identify horizontal bloat.
16 | % genefilter - Removes highly correlated genes from a unique GENES struct.
17 | % genes2gpmodel - Create a data structure representing a multigene symbolic regression model from the specified gene list.
18 | % getcomplexity - Returns the expressional complexity of an encoded tree or a cell array of trees.
19 | % getdepth - Returns the tree depth of an encoded tree expression.
20 | % getnumnodes - Returns the number of nodes in an encoded tree expression or the total node count for a cell array of expressions.
21 | % gp_2d_mesh - Creates new training matrix containing pairwise values of all x1 and x2.
22 | % gpcheck - Perform pre-run error checks.
23 | % gpdefaults - Initialises the GPTIPS struct by creating default parameter values.
24 | % gpfinalise - Finalises a run.
25 | % gpinit - Initialises a run.
26 | % gpinitparallel - Initialise the Parallel Computing Toolbox.
27 | % gpmodelfilter - Object to filter a population of multigene symbolic regression models.
28 | % gpmodelreport - Generate an HTML report on the specified multigene regression model.
29 | % gpmodelvars - Display the frequency of input variables present in the specified model.
30 | % gppopvars - Display frequency of the input variables present in models in the population.
31 | % gppretty - Simplify and prettify a multigene symbolic regression model.
32 | % gprandom - Sets random number generator seed according to system clock or user seed.
33 | % gpreformat - Reformats encoded trees so that the Symbolic Math toolbox can process them properly.
34 | % gpsimplify - Simplify SYM expressions in a less glitchy way than SIMPLIFY or SIMPLE.
35 | % gpterminate - Check for early termination of run.
36 | % gptic - Updates the running time of this run.
37 | % gptoc - Updates the running time of this run.
38 | % gptoolboxcheck - Checks if certain toolboxes are installed and licensed.
39 | % gptreestructure - Create cell array containing tree structure connectivity and label information for an encoded tree expression.
40 | % initbuild - Generate an initial population of GP individuals.
41 | % kogene - Knock out genes from a cell array of tree expressions.
42 | % mergegp - Merges two GP population structs into a new one.
43 | % mutate - Mutate an encoded symbolic tree expression.
44 | % ndfsort_rank1 - Fast non-dominated sorting algorithm for 2 objectives only - returns only rank 1 solutions.
45 | % paretoreport - Generate an HTML performance/complexity report on the Pareto front of the population.
46 | % picknode - Select a node (or nodes) of specified type from an encoded GP expression and return its position.
47 | % popbrowser - Visually browse complexity and performance characteristics of a population.
48 | % popbuild - Build next population of individuals.
49 | % pref2inf - Recursively extract arguments from a prefix expression and convert to infix where possible.
50 | % regressionErrorCharacteristic - Generates REC curve data using actual and predicted output vectors.
51 | % rungp - Runs GPTIPS 2 using the specified configuration file.
52 | % runtree - Run the fitness function on an individual in the current population.
53 | % scangenes - Scan a single multigene individual for all input variables and return a frequency vector.
54 | % selection - Selects an individual from the current population.
55 | % standaloneModelStats - Compute model performance stats for actual and predicted values.
56 | % summary - Plots basic summary information from a run.
57 | % tree2evalstr - Converts encoded tree expressions into math expressions that MATLAB can evaluate directly.
58 | % treegen - Generate a new encoded GP tree expression.
59 | % uniquegenes - Returns a GENES structure containing the unique genes in a population.
60 | % updatestats - Update run statistics.
61 |
--------------------------------------------------------------------------------
/core/assignadf.m:
--------------------------------------------------------------------------------
1 | function gp = assignadf(gp)
2 | %ASSIGNADF Create ADF function handles in the caller workspace
3 |
4 | % Get both function names and the corresponding evaluation symbols
5 | funNames = gp.nodes.adf.name;
6 | funSymbols = gp.nodes.adf.eval;
7 |
8 | % Assign in caller's workspace
9 | for k=1:length(funNames)
10 | assignin('caller', funNames{k}, str2func(funSymbols{k}));
11 | end
12 |
13 | end
14 |
15 |
--------------------------------------------------------------------------------
/core/bootsample.m:
--------------------------------------------------------------------------------
1 | function xind = bootsample(x,sampleSize)
2 | %BOOTSAMPLE Get an index vector to sample with replacement a data matrix X.
3 | %
4 | % XIND = BOOTSAMPLE(X,SAMPLESIZE) samples the data matrix X to return the
5 | % row indices XIND of X.
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 |
11 | xSize = size(x,1);
12 |
13 | %generate the required sample indices and sample
14 | xind = ceil(xSize*rand(sampleSize,1));
15 |
16 |
17 |
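A minimal sketch of how the index vector can be used to build a bootstrap sample. It uses only base MATLAB and repeats the one-line index computation from the file above; the data matrix `X`, the sample size and the name `Xboot` are illustrative assumptions, not part of the toolbox.

```matlab
% Hypothetical data: 5 observations (rows), 5 variables (columns)
X = magic(5);
sampleSize = 8;                 % may exceed the number of rows

% Same computation as in BOOTSAMPLE: rand is in (0,1), so ceil gives 1..5
xind = ceil(size(X,1) * rand(sampleSize,1));

% Rows drawn with replacement, so some rows can repeat
Xboot = X(xind,:);
```

Because the draw is with replacement, `Xboot` can contain duplicate rows and may omit some rows of `X` entirely.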
--------------------------------------------------------------------------------
/core/comparemodelsREC.m:
--------------------------------------------------------------------------------
1 | function comparemodelsREC(gp,modelList,dataset,showBest,showValBest,showTestBest)
2 | %COMPAREMODELSREC Graphical REC performance curves for between 1 and 5 multigene models.
3 | %
4 | % COMPAREMODELSREC(GP,MODELLIST,DATASET,SHOWBEST,SHOWVALBEST,SHOWTESTBEST)
5 | % generates a graphical REC (regression error characteristic) comparison
6 | % of multigene regression models in GP with the numeric IDs in the row
7 | % vector MODELLIST. DATASET is either 'train','test' or 'val' and setting
8 | % SHOWBEST to TRUE also shows the best model in GP (as evaluated on the
9 | % training data) and setting SHOWVALBEST to TRUE shows the best model of
10 | % the run as evaluated on the validation data (if it exists). Similarly,
11 | % setting SHOWTESTBEST to TRUE shows the best model as evaluated on the
12 | % test data (if it exists).
13 | %
14 | % Examples:
15 | %
16 | % COMPAREMODELSREC(GP,[10 19 12]) shows REC curves for models 10, 19 and
17 | % 12 on the training data.
18 | %
19 | % COMPAREMODELSREC(GP,[10 19 12],'test',TRUE) shows REC curves for
20 | % models 10, 19 and 12 on the test data and also shows the 'best' model
21 | % of the run (as evaluated on the training data).
22 | %
23 | % Remarks:
24 | %
25 | % REC curves are similar to receiver operating characteristic (ROC)
26 | % curves for classifiers. The REC curve for a model shows the proportion
27 | % of data (on the Y axis, as a fraction between 0 and 1) that was
28 | % predicted with an accuracy equal to or better than the absolute error
29 | % on the corresponding X axis. Better models lie above and to the left
30 | % of worse models.
31 | %
32 | % Based on the method outlined in "Regression Error Characteristic
33 | % Curves", Jinbo Bi & Kristin P. Bennett, Proceedings of the Twentieth
34 | % International Conference on Machine Learning (ICML-2003), Washington
35 | % DC, 2003.
36 | %
37 | % Copyright (c) 2009-2015 Dominic Searson
38 | %
39 | % GPTIPS 2
40 | %
41 | % See also POPBROWSER, GPMODELREPORT, REGRESSIONERRORCHARACTERISTIC
42 |
43 | if nargin < 5 || isempty(showValBest)
44 | showValBest = false;
45 | end
46 |
47 | if nargin < 4 || isempty(showBest)
48 | showBest = false;
49 | end
50 |
51 | if nargin < 6 || isempty(showTestBest)
52 | showTestBest = false;
53 | end
54 |
55 | if nargin < 3
56 | dataset = 'train';
57 | end
58 |
59 | if nargin < 2 || isempty(modelList)
60 | disp('Usage is COMPAREMODELSREC(GP,MODELLIST)');
61 | return;
62 | end
63 |
64 | if ~strcmpi(func2str(gp.fitness.fitfun),'regressmulti_fitfun')
65 | error('This function is only for use on multigene regression models.');
66 | end
67 |
68 | if isempty(dataset)
69 | dataset = 'train';
70 | end
71 |
72 | if strcmpi(dataset,'train')
73 | dset = 1;
74 |
75 | elseif strcmpi(dataset,'test')
76 |
77 | if ~isfield(gp.userdata,'xtest')
78 | error('Cannot find testing data.');
79 | end
80 | dset = 2;
81 |
82 | elseif strcmpi(dataset,'val')
83 |
84 | if ~isfield(gp.userdata,'xval')
85 | error('There is no validation data.');
86 | end
87 | dset = 3;
88 |
89 | else
90 | error('The dataset must be ''train'', ''test'' or ''val''.');
91 | end
92 |
93 | if ~isnumeric(modelList) || size(modelList,1) > 1
94 | error('Supplied model list must be a row vector of model indices.');
95 | end
96 |
97 | numModels = numel(modelList);
98 |
99 | if numModels > 5
100 | error('Maximum number of models to compare is 5.');
101 | end
102 |
103 | plotdata = struct;
104 |
105 | for i=1:numModels
106 | model = gpmodel2struct(gp,modelList(i),false,false,false);
107 | if ~model.valid
108 | error(['Model ' num2str(modelList(i)) ' in the supplied list is invalid because: ' model.invalidReason]);
109 | end
110 |
111 | if dset == 1
112 | [xdata,ydata,xref,yref] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred);
113 | elseif dset == 2
114 | [xdata,ydata,xref,yref] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred);
115 | else
116 | [xdata,ydata,xref,yref] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred);
117 | end
118 |
119 | plotdata.xref = xref;
120 | plotdata.yref = yref;
121 | plotdata.model(i).xdata = xdata;
122 | plotdata.model(i).ydata = ydata;
123 | end
124 |
125 | %plot
126 | pcol = 'bgrcy';
127 | recFig = figure('visible','off','name','GPTIPS 2 - REC model comparison','numbertitle','off');
128 |
129 | plot(plotdata.model(1).xdata,plotdata.model(1).ydata,'b','LineWidth',2);
130 | hold on;
131 | legstr = cell(0);
132 | legstr{1} = ['Model ID: ' num2str(modelList(1))];
133 |
134 | pre2014b = verLessThan('matlab','8.4'); %R2014b (HG2)
135 |
136 | for i=2:numModels
137 | if pre2014b
138 | plot(plotdata.model(i).xdata,plotdata.model(i).ydata,'LineWidth',2,'Color',pcol(i));
139 | else
140 | plot(plotdata.model(i).xdata,plotdata.model(i).ydata,'LineWidth',2);
141 | end
142 | legstr{i} = ['Model ID: ' num2str(modelList(i))];
143 | end
144 |
145 | xlabel('Absolute error');
146 | ylabel('Accuracy');
147 | grid on;
148 |
149 | if ~isempty(gp.userdata.name)
150 | dstr = [' Data set: ' gp.userdata.name];
151 | else
152 | dstr = '';
153 | end
154 |
155 | if dset == 1
156 | title(['REC model comparison. Training data.' dstr],'interpreter','none','fontWeight','bold');
157 | elseif dset == 2
158 | title(['REC model comparison. Test data.' dstr],'interpreter','none','fontWeight','bold');
159 | else
160 | title(['REC model comparison. Validation data.' dstr],'interpreter','none','fontWeight','bold');
161 | end
162 |
163 | %add best, valbest and testbest if specified
164 | if showBest
165 | model = gpmodel2struct(gp,'best');
166 |
167 | if dset == 1
168 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred);
169 | elseif dset == 2
170 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred);
171 | else
172 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred);
173 | end
174 |
175 | plot(xdata,ydata,'k','LineWidth',2);
176 | legstr = horzcat(legstr,'''Best'' model');
177 | end
178 |
179 | if showValBest && isfield(gp.userdata,'xval') && ~isempty(gp.userdata.xval)
180 | model = gpmodel2struct(gp,'valbest');
181 |
182 | if dset == 1
183 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred);
184 | elseif dset == 2
185 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred);
186 | else
187 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred);
188 | end
189 |
190 | plot(xdata,ydata,'m','LineWidth',2);
191 | legstr = horzcat(legstr,'''Valbest'' model');
192 | end
193 |
194 | if showTestBest && isfield(gp.userdata,'xtest') && ~isempty(gp.userdata.xtest)
195 | model = gpmodel2struct(gp,'testbest');
196 |
197 | if dset == 1
198 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytrain,model.train.ypred);
199 | elseif dset == 2
200 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.ytest,model.test.ypred);
201 | else
202 | [xdata,ydata] = regressionErrorCharacteristic(gp.userdata.yval,model.val.ypred);
203 | end
204 |
205 | plot(xdata,ydata,'g','LineWidth',2);
206 | legstr = horzcat(legstr,'''Testbest'' model');
207 | end
208 |
209 | legend(legstr);legend boxoff;
210 | hold off;
211 | set(recFig,'visible','on');
212 |
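The REC curve described in the remarks above can be sketched in a few lines of base MATLAB. This is only an illustration of the idea (fraction of data predicted within a given absolute error); it is not necessarily how REGRESSIONERRORCHARACTERISTIC computes its outputs, and the data values and variable names are made up.

```matlab
% Hypothetical actual and predicted outputs
yactual = [1.0; 2.0; 3.0; 4.0];
ypred   = [1.1; 1.8; 3.0; 4.5];

% X axis: absolute error tolerances, sorted ascending
absErr = sort(abs(yactual - ypred));

% Y axis: fraction of points whose error is within each tolerance
accuracy = (1:numel(absErr))' / numel(absErr);

% A better model reaches high accuracy at smaller error, so its curve
% lies above and to the left of a worse model's curve
plot(absErr, accuracy, 'LineWidth', 2);
xlabel('Absolute error'); ylabel('Accuracy');
```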
--------------------------------------------------------------------------------
/core/crossover.m:
--------------------------------------------------------------------------------
1 | function [son,daughter] = crossover(mum,dad,gp)
2 | %CROSSOVER Sub-tree crossover of encoded tree expressions to produce 2 new ones.
3 | %
4 | % [SON,DAUGHTER] = CROSSOVER(MUM,DAD,GP) uses standard subtree crossover
5 | % on the expressions MUM and DAD to produce the offspring expressions SON
6 | % and DAUGHTER.
7 | %
8 | % Copyright (c) 2009-2015 Dominic Searson
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also MUTATE
13 |
14 | %select random crossover nodes in mum and dad expressions
15 | m_position = picknode(mum,0,gp);
16 | d_position = picknode(dad,0,gp);
17 |
18 | %extract main and subtree expressions
19 | [m_main,m_sub] = extract(m_position,mum);
20 | [d_main,d_sub] = extract(d_position,dad);
21 |
22 | %combine to form 2 new GP trees
23 | daughter = strrep(m_main,'$',d_sub);
24 | son = strrep(d_main,'$',m_sub);
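The final splice step above can be illustrated on its own, assuming the main trees and subtrees have already been extracted. The expressions and node names below are hypothetical and chosen for readability; in the file above they come from PICKNODE and EXTRACT.

```matlab
% Hypothetical results of extracting one subtree from each parent;
% '$' marks the position the subtree was removed from
m_main = 'plus(x1,$)';     m_sub = 'times(x2,x3)';
d_main = 'minus($,x4)';    d_sub = 'sin(x5)';

% Swap the subtrees by substituting them for the '$' placeholders
daughter = strrep(m_main,'$',d_sub);   % mum's main tree + dad's subtree
son      = strrep(d_main,'$',m_sub);   % dad's main tree + mum's subtree

disp(daughter);   % plus(x1,sin(x5))
disp(son);        % minus(times(x2,x3),x4)
```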
--------------------------------------------------------------------------------
/core/displaystats.m:
--------------------------------------------------------------------------------
1 | function displaystats(gp)
2 | %DISPLAYSTATS Displays run stats periodically.
3 | %
4 | % DISPLAYSTATS(GP) updates the screen with run stats at the interval
5 | % specified in GP.RUNCONTROL.VERBOSE.
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also UPDATESTATS
12 |
13 | %only display info if required
14 | if ~gp.runcontrol.verbose || gp.runcontrol.quiet || ...
15 | mod(gp.state.count-1,gp.runcontrol.verbose)
16 | return
17 | end
18 |
19 | gen = gp.state.count - 1;
20 |
21 | %multiple independent run count display
22 | if gp.runcontrol.runs > 1 && gen == 0
23 | disp(['Run ' int2str(gp.state.run) ' of ' int2str(gp.runcontrol.runs)]);
24 | end
25 |
26 | disp(['Generation ' num2str(gen)]);
27 | disp(['Best fitness: ' num2str(gp.results.best.fitness)]);
28 | disp(['Mean fitness: ' num2str(gp.state.meanfitness)]);
29 |
30 | if gp.fitness.complexityMeasure
31 | disp(['Best complexity: ' num2str(gp.results.best.complexity)]);
32 | else
33 | disp(['Best node count: ' num2str(gp.results.best.nodecount)]);
34 | end
35 |
36 | %display inputs in "best training" individual, if enabled.
37 | if gp.runcontrol.showBestInputs
38 | hitvec = gpmodelvars(gp,'best');
39 | inputs = num2str(find(hitvec));
40 | inputs = strtrim(cellstr(inputs));
41 | pat = '(\d+)';
42 | inputs = regexprep(inputs,pat,'x$1 '); %note: trailing space essential
43 | inputs = [inputs{:}]; % converts char cell array to a single string
44 | disp(['Inputs in best individual: ' inputs]);
45 | end
46 |
47 | %display inputs in "best validation" individual, if enabled.
48 | if gp.runcontrol.showValBestInputs && isfield(gp.userdata,'xval') ...
49 | && ~isempty(gp.userdata.xval) && isfield(gp.results,'valbest')
50 | hitvec = gpmodelvars(gp,'valbest');
51 | inputs = num2str(find(hitvec));
52 | inputs = strtrim(cellstr(inputs));
53 | pat='(\d+)';
54 | inputs = regexprep(inputs,pat,'x$1 '); %note: trailing space essential
55 | inputs = [inputs{:}]; % converts char cell array to a single string
56 | disp(['Inputs in best validation individual: ' inputs]);
57 | end
58 |
59 | disp(' ');
--------------------------------------------------------------------------------
/core/evalfitness.m:
--------------------------------------------------------------------------------
1 | function gp = evalfitness(gp)
2 | %EVALFITNESS Calls the user specified fitness function.
3 | %
4 | % GP = EVALFITNESS(GP) evaluates the fitnesses of individuals stored
5 | % in the GP structure and updates various other fields of GP accordingly.
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also TREE2EVALSTR, EVALFITNESS_PAR
12 |
13 | %check parallel mode.
14 | if gp.runcontrol.parallel.enable && gp.runcontrol.parallel.ok
15 | gp = evalfitness_par(gp);
16 | return;
17 |
18 | %regular version
19 | else
20 |
21 | % If ADFs are used, make them known to this function
22 | if gp.nodes.adf.use, assignadf(gp); end
23 |
24 | for i = 1:gp.runcontrol.pop_size
25 |
26 | gp.state.current_individual = i;
27 |
28 | %retrieve values if cached
29 | if gp.runcontrol.usecache && gp.fitness.cache.isKey(i)
30 | cache = gp.fitness.cache(i);
31 | gp.fitness.complexity(i,1) = cache.complexity;
32 | gp.fitness.values(i,1) = cache.value;
33 | gp.fitness.returnvalues{i,1} = cache.returnvalues;
34 |
35 | else
36 | %preprocess cell array of string expressions into a form that
37 | %MATLAB can evaluate
38 | evalstr = tree2evalstr(gp.pop{i},gp);
39 |
40 | %store complexity of individual (either number of nodes or tree
41 | %expressional complexity)
42 | if gp.fitness.complexityMeasure
43 | gp.fitness.complexity(i,1) = getcomplexity(gp.pop{i});
44 | else
45 | gp.fitness.complexity(i,1) = getnumnodes(gp.pop{i});
46 | end
47 |
48 | [fitness,gp] = feval(gp.fitness.fitfun,evalstr,gp);
49 | gp.fitness.values(i) = fitness;
50 |
51 | end
52 | end
53 | end
--------------------------------------------------------------------------------
/core/evalfitness_par.m:
--------------------------------------------------------------------------------
1 | function gp = evalfitness_par(gp)
2 | %EVALFITNESS_PAR Calls the user specified fitness function (parallel version).
3 | %
4 | % GP = EVALFITNESS_PAR(GP) evaluates the fitnesses of individuals
5 | % stored in the GP structure and updates various other fields of GP
6 | % accordingly.
7 | %
8 | % This has the same functionality as EVALFITNESS but makes use of the
9 | % Mathworks Parallel Computing Toolbox to distribute the fitness
10 | % computations across multiple cores.
11 | %
12 | % Copyright (c) 2009-2015 Dominic Searson
13 | %
14 | % GPTIPS 2
15 | %
16 | % See also TREE2EVALSTR, EVALFITNESS
17 |
18 | popSize = gp.runcontrol.pop_size;
19 | evalstrs = cell(popSize,1);
20 | complexityMeasure = gp.fitness.complexityMeasure;
21 | complexities = zeros(popSize,1);
22 | fitfun = gp.fitness.fitfun;
23 | fitvals = zeros(popSize,1);
24 | returnvals = cell(popSize,1);
25 |
26 | if gp.runcontrol.usecache
27 | usecache = true;
28 | else
29 | usecache = false;
30 | end
31 |
32 | % If ADFs are used, make them known to this function
33 | if gp.nodes.adf.use, assignadf(gp); end
34 |
35 | parfor i = 1:popSize;
36 |
37 | %assign copy of gp inside parfor loop to a temp struct
38 | tempgp = gp;
39 |
40 | %update state of temp variable to index of the individual that is about to
41 | %be evaluated
42 | tempgp.state.current_individual = i;
43 |
44 | if usecache && tempgp.fitness.cache.isKey(i)
45 |
46 | cache = tempgp.fitness.cache(i);
47 | complexities(i) = cache.complexity;
48 | fitvals(i) = cache.value;
49 | returnvals{i} = cache.returnvalues;
50 |
51 | else
52 |
53 | %process coded trees into evaluable matlab expressions
54 | evalstrs{i} = tree2evalstr(tempgp.pop{i},gp);
55 |
56 | %store complexity of individual (either number of nodes or tree
57 | % "expressional complexity").
58 | if complexityMeasure
59 | complexities(i) = getcomplexity(tempgp.pop{i});
60 | else
61 | complexities(i) = getnumnodes(tempgp.pop{i});
62 | end
63 |
64 | %evaluate gp individual using fitness function
65 | [fitness,tempgp] = feval(fitfun,evalstrs{i},tempgp);
66 | returnvals{i} = tempgp.fitness.returnvalues{i};
67 | fitvals(i) = fitness;
68 | end
69 | end %end of parfor loop
70 |
71 | %attach returned values to original GP structure
72 | gp.fitness.values = fitvals;
73 | gp.fitness.returnvalues = returnvals;
74 | gp.fitness.complexity = complexities;
75 | gp.state.current_individual = popSize;
--------------------------------------------------------------------------------
/core/extract.m:
--------------------------------------------------------------------------------
1 | function [mainTree,subTree] = extract(index,parentExpr)
2 | %EXTRACT Extract a subtree from an encoded tree expression.
3 | %
4 | % [MAINTREE,SUBTREE] = EXTRACT(INDEX,PARENTEXPR)
5 | %
6 | % Input args:
7 | % PARENTEXPR (the parent string expression)
8 | % INDEX (the index in PARENTEXPR of the root node of the subtree to be
9 | % extracted)
10 | %
11 | % Output args:
12 | % MAINTREE (PARENTEXPR with the removed subtree replaced by '$')
13 | % SUBTREE (the extracted subtree)
14 | %
15 | % Copyright (c) 2009-2015 Dominic Searson
16 | %
17 | % GPTIPS 2
18 | %
19 | % See also PICKNODE, GETDEPTH, GETCOMPLEXITY
20 |
21 | cnode = parentExpr(index);
22 | iplus = index + 1;
23 | iminus = index - 1;
24 | endpos = numel(parentExpr);
25 |
26 | if cnode == 'x' %extracting an input terminal (x1, x2 etc.)
27 |
28 | section = parentExpr(iplus:endpos);
29 | inp_comma_ind = strfind(section,',');
30 | inp_brack_ind = strfind(section,')');
31 |
32 | %if none found then string must consist of single input
33 | if isempty(inp_brack_ind) && isempty(inp_comma_ind)
34 | mainTree = '$';
35 | subTree = parentExpr;
36 | else
37 | inp_ind = sort([inp_brack_ind inp_comma_ind]);
38 | final_ind = inp_ind(1) + index;
39 | subTree = parentExpr(index:final_ind-1);
40 | mainTree = [parentExpr(1:iminus) '$' parentExpr(final_ind:endpos)];
41 | end
42 |
43 | return
44 |
45 | elseif cnode == '[' %ERC
46 |
47 | cl_sbr = strfind(parentExpr(iplus:endpos),']');
48 | final_ind = cl_sbr(1)+index;
49 | subTree = parentExpr(index:final_ind);
50 | mainTree = [parentExpr(1:iminus) '$' parentExpr(final_ind+1:endpos)];
51 | return
52 |
53 | elseif cnode=='?' % ERC token
54 | subTree = cnode;
55 | mainTree = parentExpr;
56 | mainTree(index) = '$';
57 | return
58 |
59 | elseif cnode=='#' % PRC token
60 | subTree = cnode;
61 | mainTree = parentExpr;
62 | mainTree(index) = '$';
63 | return
64 |
65 | else %otherwise extract a tree with a function node as root
66 |
67 | %subtree defined when number open brackets=number of closed brackets
68 | search_seg = parentExpr(index:endpos);
69 |
70 | %get indices of open brackets
71 | op = strfind(search_seg,'(');
72 | cl = strfind(search_seg,')');
73 |
74 | %compare indices to determine point where num_open=num_closed
75 | tr_op = op(2:end);
76 | l_tr_op = numel(tr_op);
77 |
78 | hibvec = tr_op-cl(1:l_tr_op);
79 |
80 | cl_ind = find(hibvec > 0);
81 |
82 | if isempty(cl_ind)
83 | j = cl(numel(op));
84 | else
85 | cl_ind = cl_ind(1);
86 | j = cl(cl_ind);
87 | end
88 |
89 | subTree = search_seg(1:j);
90 | mainTree = [parentExpr(1:iminus) '$' parentExpr(j+index:endpos)];
91 | return
92 | end
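As an illustration of what the function-node branch above produces, the sketch below finds the matching close bracket by counting bracket depth. This is a different (simpler to read) technique from the index comparison used in the file, and the expression and node names are hypothetical.

```matlab
parentExpr = 'plus(x1,times(x2,x3))';
index = 9;                                  % position of the 't' in 'times'

% Bracket depth along the segment starting at the subtree root
seg = parentExpr(index:end);
depth = cumsum((seg == '(') - (seg == ')'));

% The subtree ends where depth first returns to zero after its open bracket
firstOpen = find(seg == '(', 1);
j = firstOpen - 1 + find(depth(firstOpen:end) == 0, 1);

subTree  = seg(1:j);
mainTree = [parentExpr(1:index-1) '$' parentExpr(index+j:end)];

disp(subTree);    % times(x2,x3)
disp(mainTree);   % plus(x1,$)
```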
--------------------------------------------------------------------------------
/core/genefilter.m:
--------------------------------------------------------------------------------
1 | function genes = genefilter(genes,corrThresh)
2 | %GENEFILTER Removes highly correlated genes from a unique GENES struct.
3 | %
4 | % GENESFILT = GENEFILTER(GENES) returns a unique genes structure
5 | % GENESFILT with the more complex gene of any highly correlated pair of
6 | % genes removed from the GENES structure. Hence, the more complex of any
7 | % pair of genes whose outputs (as evaluated on the training data) have an
8 | % absolute correlation of >= (1 - CORRTHRESH) is removed.
9 | %
10 | % This has the effect of reducing the size of the 'gene space' whilst
11 | % leaving the predictive ability of the remaining genes largely
12 | % unchanged. The default correlation threshold of CORRTHRESH is 0.001.
13 | %
14 | % GENESFILT = GENEFILTER(GENES,CORRTHRESH) does the same but CORRTHRESH
15 | % is the user supplied correlation threshold.
16 | %
17 | % Remarks:
18 | %
19 | % Prior to using this function, the GENES structure should be generated
20 | % using the UNIQUEGENES function.
21 | %
22 | % Copyright (c) 2009-2015 Dominic Searson
23 | %
24 | % GPTIPS 2
25 | %
26 | % See also UNIQUEGENES, GENEBROWSER, GPMODELFILTER
27 |
28 | if nargin < 1
29 | error('Basic usage is GENEFILTER(GENES)');
30 | end
31 |
32 | if ~gptoolboxcheck
33 | error('The Symbolic Math Toolbox is required to use this function.');
34 | end
35 |
36 | %set the default correlation threshold
37 | %(i.e. when pairs of genes have an absolute correlation >= (1-threshold)
38 | %then the more complex gene in the pair is removed).
39 | if nargin < 2 || isempty(corrThresh)
40 | corrThresh = 0.001;
41 | end
42 |
43 | disp(['Filtering genes with correlation threshold ' num2str(corrThresh)]);
44 |
45 | %get abs. correlation matrix of gene outputs
46 | c = abs(corrcoef(genes.geneOutputsTrain));
47 |
48 | %loop pairwise through genes and flag correlates for removal.
49 | %when correlated then always remove the more complex of the genes.
50 | removals = false(genes.numUniqueGenes,1);
51 |
52 | for i=1:genes.numUniqueGenes
53 |
54 | %if already removed then skip
55 | if removals(i)
56 | continue;
57 | end
58 |
59 | %flag matches for ith gene
60 | matches = find( c(:,i) >= (1 - corrThresh) );
61 |
62 | %remove the identity match
63 | ident = (matches == i);
64 | matches(ident) = [];
65 |
66 | %if no other matches then jump to next
67 | if isempty(matches)
68 | continue;
69 | end
70 |
71 | %loop through matches for ith gene and get complexities for each match
72 | identComplexity = genes.complexity(i);
73 | for j=1:numel(matches)
74 |
75 | complexity = genes.complexity(matches(j));
76 |
77 | %flag for removal jth gene if complexity of jth is higher
78 | %flag for removal ith gene otherwise
79 | if identComplexity <= complexity
80 | removals(matches(j)) = true;
81 | else
82 | removals(i) = true;
83 | end
84 |
85 | end
86 |
87 | end
88 |
89 | %tidy up GENES structure
90 | genes.filtered = corrThresh;
91 | genes.numUniqueGenes = genes.numUniqueGenes-sum(removals);
92 | remain = ~removals;
93 | genes.rtrain = genes.rtrain(remain);
94 | genes.complexity = genes.complexity(remain);
95 | genes.uniqueGenesSym = genes.uniqueGenesSym(remain);
96 | genes.uniqueGenesCoded = genes.uniqueGenesCoded(remain);
97 | genes.uniqueGenesDecoded = genes.uniqueGenesDecoded(remain);
98 | genes.geneOutputsTrain = genes.geneOutputsTrain(:,remain);
99 |
100 | if ~isempty(genes.geneOutputsTest)
101 | genes.geneOutputsTest = genes.geneOutputsTest(:,remain);
102 | end
103 |
104 | if ~isempty(genes.geneOutputsVal)
105 | genes.geneOutputsVal = genes.geneOutputsVal(:,remain);
106 | end
107 |
108 | disp(['Number of genes removed : ' int2str(sum(removals))]);
109 | disp(['Number of genes retained: ' int2str(sum(remain))]);
110 |
111 | genes.source = 'genefilter';
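The pairwise filtering loop above can be tried on a toy example without a GENES structure. The sketch below follows the same logic using base MATLAB only; the output matrix, complexity values and variable names are invented for illustration.

```matlab
% Hypothetical gene outputs on 10 training points (one column per gene).
% Genes 1 and 2 are perfectly correlated; gene 3 is not.
t = (1:10)';
outputs    = [t, 2*t + 1, t.^2];
complexity = [1; 5; 3];
corrThresh = 0.001;

c = abs(corrcoef(outputs));
removals = false(size(outputs,2),1);

for i = 1:size(outputs,2)
    if removals(i), continue; end
    matches = find(c(:,i) >= 1 - corrThresh);
    matches(matches == i) = [];            % drop the identity match
    for j = matches'
        if complexity(i) <= complexity(j)
            removals(j) = true;            % keep the simpler gene i
        else
            removals(i) = true;            % gene j is simpler
        end
    end
end

disp(removals');   % gene 2 is flagged: same signal as gene 1, more complex
```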
--------------------------------------------------------------------------------
/core/genes2gpmodel.m:
--------------------------------------------------------------------------------
1 | function gpmodel = genes2gpmodel(gp,genes,uniqueGeneList,tbxStats,createSyms,modelStruc,fastSymMode)
2 | %GENES2GPMODEL Create a data structure representing a multigene symbolic regression model from the specified gene list.
3 | %
4 | % GPMODEL = GENES2GPMODEL(GP,GENES,UNIQUEGENELIST) creates a multigene
5 | % model GPMODEL (functionally identical to that obtained using the
6 | % GPMODEL2STRUCT function) using the list of genes in UNIQUEGENELIST
7 | % where the unique genes are contained within the GENES data structure.
8 | % The list consists of numeric gene IDs, e.g. [77 22 99 1].
9 | %
10 | % The 'optimal' gene weighting coefficients are computed using a least
11 | % squares procedure on the training data.
12 | %
13 | % Note:
14 | %
15 | % The GENES data structure must be first obtained using the UNIQUEGENES
16 | % function. See GPMODEL2STRUCT for more details about the GPMODEL data
17 | % structure returned.
18 | %
19 | % Copyright (c) 2009-2015 Dominic Searson
20 | %
21 | % GPTIPS 2
22 | %
23 | % See also UNIQUEGENES, GPMODEL2STRUCT, GENEBROWSER, GENEFILTER
24 |
25 | if nargin < 3
26 | error('Usage is GENES2GPMODEL(GP,GENES,UNIQUEGENELIST)');
27 | end
28 |
29 | if nargin < 7 || isempty(fastSymMode)
30 | fastSymMode = false;
31 | end
32 |
33 | %scan model genes for input frequency, depth, complexity? (default = yes).
34 | %(setting to false is a bit quicker but doesn't compute the structural info)
35 | if nargin < 6 || isempty(modelStruc)
36 | modelStruc = true;
37 | end
38 |
39 | %create symbolic objects of overall model and genes? (default = yes).
40 | %(setting to false is significantly quicker but you don't get the symbolic
41 | %math object for each gene)
42 | if nargin < 5 || isempty(createSyms)
43 | createSyms = true;
44 | end
45 |
46 | %compute statistics toolbox stats? (default = no).
47 | if nargin < 4 || isempty(tbxStats)
48 | tbxStats = false;
49 | end
50 |
51 | %remove duplicates in list
52 | uniqueGeneList = unique(uniqueGeneList);
53 | noGene = (uniqueGeneList == 0);
54 | uniqueGeneList(noGene) = [];
55 |
56 | %get gene encodings from genes structure
57 | geneStrs = genes.uniqueGenesCoded(uniqueGeneList)';
58 |
59 | %create gpmodel struct
60 | gpmodel = gpmodel2struct(gp,geneStrs,tbxStats,createSyms,modelStruc,fastSymMode);
61 | gpmodel.source = 'genes2gpmodel';
--------------------------------------------------------------------------------
/core/getcomplexity.m:
--------------------------------------------------------------------------------
1 | function comp = getcomplexity(expr)
2 | %GETCOMPLEXITY Returns the expressional complexity of an encoded tree or a cell array of trees.
3 | %
4 | % COMP = GETCOMPLEXITY(EXPR) finds the expressional complexity COMP of
5 | % the encoded GPTIPS expression EXPR or cell array of expressions EXPR.
6 | %
7 | % Remarks:
8 | %
9 | % "Expressional Complexity" is defined as the sum of nodes of all
10 | % sub-trees within a tree (i.e. as defined by Guido F. Smits and Mark
11 | % Kotanchek in "Pareto-front exploitation in symbolic regression", pp.
12 | % 283 - 299, Genetic Programming Theory and Practice II, 2004).
13 | %
14 | % Copyright (c) 2009-2015 Dominic Searson
15 | %
16 | % GPTIPS 2
17 | %
18 | % See also GETDEPTH, GETNUMNODES
19 |
20 | if isa(expr,'char')
21 |
22 | comp = getcomp(expr);
23 | return;
24 |
25 | elseif iscell(expr)
26 |
27 | numexpr = numel(expr);
28 |
29 | if numexpr < 1
30 | error('Cell array must contain at least one valid symbolic expression');
31 | else
32 | comp = 0;
33 | for i=1:numexpr
34 | comp = comp + getcomp(expr{i});
35 | end
36 | end
37 | else
38 | error('Illegal argument');
39 | end
40 |
41 | function comp = getcomp(expr)
42 | %GETCOMP Get complexity from a single tree.
43 |
44 | %count leaf nodes (excluding zero arity functions).
45 | %these have complexity measure of 1.
46 | leafcount = numel(strfind(expr,'x')) + numel(strfind(expr,'['));
47 |
48 | %iterate across remaining function
49 | %nodes, getting the number of nodes of each subtree found
50 | ind = strfind(expr,'(');
51 | if isempty(ind)
52 | comp = leafcount;
53 | return;
54 | end
55 |
56 | ind = ind - 1;
57 |
58 | comp = 0;
59 | for i = 1:length(ind)
60 | [~,subtree] = extract(ind(i),expr);
61 | comp = comp + getnn(subtree);
62 | end
63 |
64 | comp = comp + leafcount;
65 |
66 | function numnodes = getnn(expr)
67 | %GETNN get number of nodes from a single tree
68 |
69 | %number of nodes = number of open brackets + number of inputs + number of
70 | %ERC constants
71 | open_br = strfind(expr,'(');
72 | open_sq_br = strfind(expr,'[');
73 | inps = strfind(expr,'x');
74 |
75 | num_open = numel(open_br);
76 | num_const = numel(open_sq_br);
77 | num_inps = numel(inps);
78 | numnodes = num_open + num_const + num_inps;
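A worked example of the definition in the remarks above: every node roots a subtree, and the complexity is the sum of the node counts of all those subtrees. The sketch is self-contained base MATLAB and uses bracket-depth counting rather than calls to EXTRACT; the expression `a(x1,b(x2,x3))` with single-letter function names is a hypothetical encoded tree.

```matlab
expr = 'a(x1,b(x2,x3))';

% Each leaf (input or ERC constant) is a subtree of exactly one node
comp = sum(expr == 'x') + sum(expr == '[');

% Bracket depth after each character
depthVec = cumsum((expr == '(') - (expr == ')'));

% Each open bracket belongs to one function node; count that subtree's nodes
for p = find(expr == '(')
    closePos = p - 1 + find(depthVec(p:end) == depthVec(p) - 1, 1);
    sub = expr(p:closePos);
    comp = comp + sum(sub == '(') + sum(sub == 'x') + sum(sub == '[');
end

disp(comp);   % 11 = 5 (root a) + 3 (subtree b) + 1 + 1 + 1 (leaves)
```

For comparison, the plain node count of the same tree is 5, so expressional complexity penalises deep nesting more heavily than node count does.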
--------------------------------------------------------------------------------
/core/getdepth.m:
--------------------------------------------------------------------------------
1 | function depth=getdepth(expr)
2 | %GETDEPTH Returns the tree depth of an encoded tree expression.
3 | %
4 | % DEPTH = GETDEPTH(EXPR) returns the DEPTH of the tree expression EXPR.
5 | %
6 | % Remarks:
7 | %
8 | % A single node is a tree of depth 1.
9 | %
10 | % Copyright (c) 2009-2015 Dominic Searson
11 | %
12 | % GPTIPS 2
13 | %
14 | % See also GETNUMNODES, GETCOMPLEXITY
15 |
16 | %check to make sure expression is not actually a cell array of size 1
17 | if iscell(expr)
18 | expr = expr{1};
19 | end
20 |
21 | %workaround for arity zero functions, which are represented by f()
22 | %string replace '()' with the empty string
23 | expr = strrep(expr,'()','');
24 | open_br = strfind(expr,'(');
25 | close_br = strfind(expr,')');
26 | num_open = numel(open_br);
27 |
28 | if ~num_open %i.e. a single node
29 | depth = 1;
30 | return
31 | elseif num_open == 1
32 | depth = 2;
33 | return
34 | else
35 | %depth = max consecutive number of open brackets+1
36 | br_vec = zeros(1,numel(expr));
37 | br_vec(open_br) = 1;
38 | br_vec(close_br) = -1;
39 | depth = max(cumsum(br_vec)) + 1;
40 | end
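The bracket-counting idea above can be checked on a small case. The sketch below repeats the same cumulative-sum computation in base MATLAB on a hypothetical encoded expression.

```matlab
expr = 'a(x1,b(x2,x3))';

% +1 at each open bracket, -1 at each close bracket
br = zeros(1,numel(expr));
br(expr == '(') = 1;
br(expr == ')') = -1;

% The running sum is the nesting level; the deepest level plus the root
% level gives the tree depth
depth = max(cumsum(br)) + 1;

disp(depth);   % 3: a -> b -> x2
```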
--------------------------------------------------------------------------------
/core/getnumnodes.m:
--------------------------------------------------------------------------------
1 | function numnodes = getnumnodes(expr)
2 | %GETNUMNODES Returns the number of nodes in an encoded tree expression or the total node count for a cell array of expressions.
3 | %
4 | % NUMNODES = GETNUMNODES(EXPR) finds the number of nodes in the GP
5 | % expression EXPR or cell array of expressions EXPR.
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also GETDEPTH, GETCOMPLEXITY
12 |
13 | if isa(expr,'char')
14 |
15 | numnodes = getnn(expr);
16 | return;
17 |
18 | elseif iscell(expr)
19 |
20 | numexpr = numel(expr);
21 |
22 | if numexpr < 1
23 | error('Cell array must contain at least one valid symbolic expression');
24 | else
25 | numnodes = 0;
26 | for i=1:numexpr
27 | numnodes = numnodes + getnn(expr{i});
28 | end
29 | end
30 | else
31 | error('Illegal argument');
32 | end
33 |
34 | function numnodes=getnn(expr)
35 | %GETNN get numnodes from a single symbolic string
36 |
37 | %number of nodes = number of open brackets + number of inputs + number of
38 | %constants
39 | open_br = strfind(expr,'(');
40 | open_sq_br = strfind(expr,'[');
41 | inps = strfind(expr,'x');
42 |
43 | num_open = numel(open_br);
44 | num_const = numel(open_sq_br);
45 | num_inps = numel(inps);
46 |
47 | numnodes = num_open + num_const + num_inps;
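48 |
49 | %Usage sketch (illustrative only; the expression strings below are
50 | %hypothetical examples in the encoded prefix form, with [ ] marking a
51 | %constant):
52 | %
53 | % getnumnodes('g(a(m(x2),x1))') %3 brackets + 2 inputs = 5
54 | % getnumnodes('a(x1,[0.5])') %1 bracket + 1 input + 1 constant = 3
55 | % getnumnodes({'a(x1,x2)','x3'}) %cell array: 3 + 1 = 4 nodes in total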
--------------------------------------------------------------------------------
/core/gp_2d_mesh.m:
--------------------------------------------------------------------------------
1 | function x = gp_2d_mesh(x1,x2)
2 | %GP_2D_MESH Creates new training matrix containing pairwise values of all x1 and x2.
3 | %
4 | % X = GP_2D_MESH(X1,X2)
5 | %
6 | % e.g. if X1 = [0 1]' and X2 = [5 6]' then
7 | %
8 | % X = [0 5
9 | % 0 6
10 | % 1 5
11 | % 1 6]
12 | %
13 | % Copyright (c) 2009-2015 Dominic Searson
14 | %
15 | % GPTIPS 2
16 |
17 | %create a training "mesh" over the range spanned by x1 and x2
18 | numx1 = length(x1);
19 | numx2 = length(x2);
20 | x=[];
21 |
22 | for i = 1:numx1
23 | xtemp = ones(numx2,2);
24 | xtemp(:,1) = xtemp(:,1)*x1(i);
25 | xtemp(:,2) = x2;
26 | x = [x;xtemp];
27 | end
28 |
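29 | %Usage sketch (illustrative only; the NDGRID form is an assumed
30 | %vectorised equivalent for column vector inputs and is not part of
31 | %GPTIPS):
32 | %
33 | % x1 = [0 1]'; x2 = [5 6]';
34 | % x = gp_2d_mesh(x1,x2); %the 4 x 2 matrix shown in the help above
35 | % [b,a] = ndgrid(x2,x1); %b varies fastest, as x2 does in the loop
36 | % isequal(x,[a(:) b(:)]) %true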
--------------------------------------------------------------------------------
/core/gp_toolboxlist.m:
--------------------------------------------------------------------------------
1 | function toolboxes = gp_toolboxlist()
2 | %GP_TOOLBOXLIST Generate a list of currently available MATLAB toolboxes
3 |
4 | v = ver;
5 | [toolboxes{1:length(v)}] = deal(v.Name);
6 |
7 | end
8 |
9 |
--------------------------------------------------------------------------------
/core/gpinitparallel.m:
--------------------------------------------------------------------------------
1 | function gp = gpinitparallel(gp)
2 | %GPINITPARALLEL Initialise the Parallel Computing Toolbox.
3 | %
4 | % GP = GPINITPARALLEL(GP) initialises the Parallel Computing Toolbox if
5 | % it has been enabled and the Parallel Computing Toolbox license is
6 | % present.
7 | %
8 | % IMPORTANT:
9 | %
10 | % There is a known JVM bug in versions of MATLAB (all platforms) prior to
11 | % version R2013b (6.3). This causes a failure of the Parallel Computing
12 | % Toolbox in most cases.
13 | %
14 | % There is a fix/workaround for this here:
15 | %
16 | % http://www.mathworks.com/support/bugreports/919688
17 | %
18 | % Please apply this fix if you are using a version prior to R2013b.
19 | %
20 | % Copyright (c) 2009-2015 Dominic Searson
21 | %
22 | % GPTIPS 2
23 | %
24 | % See also GPINIT, GPTOOLBOXCHECK
25 |
26 | gp.runcontrol.parallel.ok = false;
27 | parToolbox = gp.info.toolbox.parallel();
28 |
29 | if parToolbox && (gp.runcontrol.parallel.auto || gp.runcontrol.parallel.enable)
30 | mps = getMaxPoolSize;
31 | end
32 |
33 | %in auto mode - try to start parallel mode with auto settings but exit and
34 | %proceed in standard mode if no license
35 | if gp.runcontrol.parallel.auto
36 | if parToolbox
37 | gp.runcontrol.parallel.enable = true;
38 | gp.runcontrol.parallel.autosize = true;
39 | gp.runcontrol.parallel.numWorkers = mps;
40 | else
41 | gp.runcontrol.parallel.enable = false;
42 | return;
43 | end
44 | end
45 |
46 | if gp.runcontrol.parallel.enable
47 |
48 | if ~parToolbox
49 | error('You do not have a license for the Parallel Computing Toolbox or it is not installed. Please set gp.runcontrol.parallel.enable = false.');
50 | end
51 |
52 | ps = getCurrentPoolSize;
53 |
54 | %check if pool is already open and of right size
55 | if (ps > 0) && ( (gp.runcontrol.parallel.autosize && (ps == mps) ) || ...
56 | (~gp.runcontrol.parallel.autosize && (ps == gp.runcontrol.parallel.numWorkers) ) )
57 | gp.runcontrol.parallel.ok = true;
58 | gp.runcontrol.parallel.numWorkers = ps;
59 |
60 | if ~gp.runcontrol.quiet
61 | disp(' ');
62 | disp(['GPTIPS: proceeding in parallel mode with ' num2str(gp.runcontrol.parallel.numWorkers) ' workers.']);
63 | disp(' ');
64 | end
65 |
66 | else %otherwise stop then start correct size pool
67 |
68 | if ps > 0
69 | if ~gp.runcontrol.quiet
70 | disp(' ');
71 | disp('GPTIPS: stopping existing pool for parallel computation.');
72 | disp(' ');
73 | end
74 |
75 | if verLessThan('matlab','8.3.0')
76 | matlabpool close;
77 | else
78 | delete(gcp('nocreate'));
79 | end
80 | end
81 |
82 | try %start new pool
83 |
84 | if ~gp.runcontrol.quiet
85 | disp(' ');
86 | disp('GPTIPS: attempting to start new pool for parallel computation.');
87 | disp(' ');
88 | end
89 |
90 | if verLessThan('matlab','8.3.0')
91 | matlabpool close force local;
92 | eval(['matlabpool ' num2str(gp.runcontrol.parallel.numWorkers)]);
93 | else
94 | delete(gcp('nocreate'));
95 |
96 | if gp.runcontrol.parallel.autosize
97 | a = parpool;
98 | gp.runcontrol.parallel.numWorkers = a.NumWorkers;
99 | else
100 | parpool(gp.runcontrol.parallel.numWorkers);
101 | end
102 |
103 | end
104 |
105 | gp.runcontrol.parallel.ok = true;
106 |
107 | if ~gp.runcontrol.quiet
108 | disp(' ');
109 | disp(['GPTIPS: proceeding in parallel mode with ' num2str(gp.runcontrol.parallel.numWorkers) ' workers.']);
110 | disp(' ');
111 | end
112 |
113 | catch
114 | gp.runcontrol.parallel.ok = false;
115 |
116 | if ~gp.runcontrol.quiet
117 | warning('There was a problem starting GPTIPS in parallel computation mode.');
118 | warning('To run GPTIPS in non-parallel mode set gp.runcontrol.parallel.enable = false in your config file.');
119 | error(['GPTIPS: could not start pool for parallel computation because: ' lasterr]);
120 | end
121 |
122 | end
123 | end
124 |
125 | else
126 | gp.runcontrol.parallel.ok = false;
127 | end
128 |
129 | function x = getCurrentPoolSize
130 | %GETCURRENTPOOLSIZE Get the current parallel pool size. Workaround
131 | %function: calls either matlabpool('size') or pool_size hack depending on
132 | %MATLAB version
133 |
134 | if verLessThan('matlab', '7.7.0')
135 | x = pool_size;
136 | elseif verLessThan('matlab','8.3.0')
137 | x = matlabpool('size');
138 | else
139 | pool = gcp('nocreate');
140 |
141 | if isempty(pool)
142 | x = 0;
143 | else
144 | if pool.Connected
145 | x = pool.NumWorkers;
146 | else
147 | x = 0;
148 | end
149 | end
150 |
151 | end
152 |
153 | function x = getMaxPoolSize
154 | %getMaxPoolSize - try to get the max number of workers in a pool for the
155 | %default profile on the current machine
156 |
157 | if verLessThan('matlab','8.3.0')
158 | c = findResource('scheduler', 'configuration', defaultParallelConfig);
159 | x = c.ClusterSize;
160 | else
161 | c = parcluster();
162 | x = c.NumWorkers;
163 | end
164 |
165 | function x = pool_size
166 | %POOL_SIZE - Mathworks hack to return size of current MATLABPOOL. Used for
167 | %versions prior to MATLAB 7.7 (R2008b)
168 | %
169 | % See:
170 | % http://www.mathworks.co.uk/support/solutions/en/data/1-5UDHQP/index.html
171 |
172 | session = com.mathworks.toolbox.distcomp.pmode.SessionFactory.getCurrentSession;
173 | if ~isempty( session ) && session.isSessionRunning() && session.isPoolManagerSession()
174 | client = distcomp.getInteractiveObject();
175 | if strcmp( client.CurrentInteractiveType, 'matlabpool' )
176 | x = session.getLabs().getNumLabs();
177 | else
178 | x = 0;
179 | end
180 | else
181 | x = 0;
182 | end
--------------------------------------------------------------------------------
/core/gpmodelvars.m:
--------------------------------------------------------------------------------
1 | function hvec=gpmodelvars(gp,ID)
2 | %GPMODELVARS Display the frequency of input variables present in the specified model.
3 | %
4 | % GPMODELVARS(GP,ID) displays input frequency for the model with numeric
5 | % identifier ID in the GPTIPS datastructure GP.
6 | %
7 | % GPMODELVARS(GP,'best') does the same for the 'best' model of the run
8 | % (as evaluated on the training data).
9 | %
10 | % GPMODELVARS(GP,'valbest') does the same for the model that performed
11 | % best on the validation data (if this data exists).
12 | %
13 | % GPMODELVARS(GP,'testbest') does the same for the model that performed
14 | % best on the test data (if this data exists).
15 | %
16 | % GPMODELVARS(GP,GPMODEL) operates on the GPMODEL struct representing a
17 | % multigene regression model, i.e. the struct returned by the functions
18 | % GPMODEL2STRUCT or GENES2GPMODEL.
19 | %
20 | % HITVEC = GPMODELVARS(GP,'valbest') returns a frequency vector and
21 | % suppresses graphical output.
22 | %
23 | % Copyright (c) 2009-2015 Dominic Searson
24 | %
25 | % GPTIPS 2
26 | %
27 | % See also GPPOPVARS
28 |
29 | hitvec = [];
30 |
31 | if nargin < 2
32 | disp('Usage is GPMODELVARS(GP,ID) where ID is the numeric identifier of the selected individual.');
33 | disp('or GPMODELVARS(GP,''BEST'') to scan the best individual of the run');
34 | disp('or GPMODELVARS(GP,''VALBEST'') to scan the best validation individual of the run.');
35 | disp('or GPMODELVARS(GP,''TESTBEST'') to scan the best test individual of the final population.');
36 | return;
37 | end
38 |
39 | if nargout < 1
40 | graph = true;
41 | else
42 | graph = false;
43 | end
44 |
45 | if isfield(gp.userdata,'name') && ~isempty(gp.userdata.name)
46 | setname = ['Data: ' gp.userdata.name];
47 | else
48 | setname = '';
49 | end
50 |
51 | %get a specified individual from the population and scan it for input
52 | %variables
53 | if isnumeric(ID)
54 |
55 | if ID < 1 || ID > numel(gp.pop)
56 | disp('Invalid value of supplied numerical identifier ID.');
57 | return;
58 | end
59 |
60 | model = gp.pop{ID};
61 | title_str = ['Input frequency in individual with ID: ' int2str(ID)];
62 |
63 | elseif ischar(ID) && strcmpi(ID,'best')
64 |
65 | model = gp.results.best.individual;
66 | title_str = 'Input frequency in best individual.';
67 |
68 | elseif ischar(ID) && strcmpi(ID,'valbest')
69 |
70 | % check that validation data is present
71 | if ~isfield(gp.results,'valbest')
72 | error('No validation data was found. Try GPMODELVARS(GP,''BEST'') instead.');
73 | end
74 |
75 | model = gp.results.valbest.individual;
76 | title_str = 'Input frequency in best validation individual.';
77 |
78 | elseif ischar(ID) && strcmpi(ID,'testbest')
79 |
80 | % check that test data is present
81 | if ~isfield(gp.results,'testbest')
82 | error('No test data was found. Try GPMODELVARS(GP,''BEST'') instead.');
83 | end
84 |
85 | model = gp.results.testbest.individual;
86 | title_str = 'Input frequency in best test individual.';
87 |
88 | %for a supplied cell array of encoded tree expressions
89 | elseif iscell(ID)
90 | model = ID;
91 | title_str = ('Input frequency in user model.');
92 |
93 | elseif isstruct(ID) && isfield(ID,'genes')
94 | model = ID.genes.geneStrs;
95 | title_str = ('Input frequency in user model.');
96 |
97 | else
98 | error('Illegal argument');
99 | end
100 |
101 | %perform scan
102 | numx = gp.nodes.inputs.num_inp;
103 | hitvec = scangenes(model,numx);
104 |
105 | % plot results as barchart
106 | if graph
107 | h = figure;
108 | set(h,'name','GPTIPS 2 single model input frequency','numbertitle','off');
109 | a = gca;
110 | bar(a,hitvec);shading faceted;
111 |
112 | if ~verLessThan('matlab','8.4') %R2014b
113 | a.Children.FaceColor = [0 0.45 0.74];
114 | a.Children.BaseLine.Visible = 'off';
115 | else
116 | b = get(a,'Children');
117 | set(b,'FaceColor',[0 0.45 0.74],'ShowBaseLine','off');
118 | end
119 | axis tight;
120 | xlabel('Input');
121 | ylabel('Input frequency');
122 | title({setname, title_str},'Fontweight','bold');
123 | grid on;
124 | end
125 |
126 | hitvec = hitvec';
127 |
128 | if nargout > 0
129 | hvec = hitvec;
130 | end
--------------------------------------------------------------------------------
/core/gppopvars.m:
--------------------------------------------------------------------------------
1 | function hvec = gppopvars(gp,R2thresh,complexityThresh)
2 | %GPPOPVARS Display frequency of the input variables present in models in the population.
3 | %
4 | % For multigene symbolic regression problems:
5 | %
6 | % GPPOPVARS(GP) displays an input frequency barchart for all models in GP
7 | % with R2 >= 0.6 where R2 is the coefficient of determination for the
8 | % models on the training data.
9 | %
10 | % GPPOPVARS(GP,R2THRESH) displays an input frequency barchart for all
11 | % models with R2 >= R2THRESH. R2THRESH must be > 0 and <= 1.
12 | %
13 | % GPPOPVARS(GP,R2THRESH,COMPLEXITYTHRESH) displays an input frequency
14 | % barchart for all models with R2 >= R2THRESH which have complexities
15 | % (node count or expressional complexity) less than or equal to
16 | % COMPLEXITYTHRESH.
17 | %
18 | % HITVEC = GPPOPVARS(GP,R2THRESH,COMPLEXITYTHRESH) returns a frequency
19 | % vector and suppresses graphical output.
20 | %
21 | % For other (non multigene regression) problems:
22 | %
23 | % HITVEC = GPPOPVARS(GP) displays and returns an input frequency vector
24 | % for the best 5% of the population.
25 | %
26 | % HITVEC = GPPOPVARS(GP,TOP_FRAC) displays and returns an input frequency
27 | % vector for the best fraction TOP_FRAC of the population. TOP_FRAC must
28 | % be > 0 and <= 1.
29 | %
30 | % Remarks:
31 | %
32 | % Assumes lower fitnesses are better.
33 | %
34 | % Copyright (c) 2009-2015 Dominic Searson
35 | %
36 | % GPTIPS 2
37 | %
38 | % See also GPMODELVARS
39 |
40 | if nargin < 1
41 | disp('Basic usage is GPPOPVARS(GP).');
42 |
43 | if nargout > 0
44 | hvec = [];
45 | end
46 |
47 | return
48 | end
49 |
50 | if nargout < 1
51 | graph = true;
52 | else
53 | graph = false;
54 | end
55 |
56 | if nargin < 3
57 | complexityThresh = Inf;
58 | end
59 |
60 | %set threshold defaults
61 | if nargin < 2
62 | R2thresh = 0.6;
63 | top_frac = 0.05;
64 | end
65 |
66 | if R2thresh <= 0 || R2thresh > 1
67 | error('Supplied R^2 threshold must be greater than zero and less than or equal to 1.');
68 | end
69 |
70 | if complexityThresh < 1
71 | error('Complexity/nodecount threshold must be at least 1.');
72 | end
73 |
74 | numx = gp.nodes.inputs.num_inp;
75 | hitvec = zeros(1,numx);
76 |
77 | if isfield(gp.userdata,'name') && ~isempty(gp.userdata.name)
78 | setname = ['Data: ' gp.userdata.name];
79 | else
80 | setname = '';
81 | end
82 |
83 | %for mg symbolic regression models
84 | if strncmpi('regressmulti',func2str(gp.fitness.fitfun),12)
85 |
86 | %get indices of models satisfying constraints
87 | inds = (gp.fitness.r2train >= R2thresh) & (gp.fitness.complexity <= complexityThresh);
88 | numModels = sum(inds);
89 |
90 | if numModels == 0
91 | disp(['No models matching the supplied criteria (R2 >= ' num2str(R2thresh)...
92 | ', complexity <= ' num2str(complexityThresh) ') were found.']);
93 | hitvec = [];
94 | return
95 | end
96 |
97 | inds2scan = find(inds);
98 | num2scan = numel(inds2scan);
99 |
100 | if num2scan > 200
101 | disp('Please wait, scanning models ...');
102 | end
103 |
104 | for i=1:num2scan
105 | model = gp.pop{inds2scan(i)};
106 | hitvec = hitvec + scangenes(model,numx);
107 | end
108 |
109 | if ~isinf(complexityThresh)
110 | if gp.fitness.complexityMeasure
111 | endstr = [' and complexity <= ' num2str(complexityThresh) '.'];
112 | else
113 | endstr = [' and node count <= ' num2str(complexityThresh) '.'];
114 | end
115 | else
116 | endstr = '.';
117 | end
118 | titlestr = {setname,['Input frequency in ' num2str(numModels) ' models (from ' num2str(gp.runcontrol.pop_size) ') with R^2 >= ' num2str(R2thresh) endstr]};
119 |
120 | else %non mg symbolic regression
121 | if nargin > 1, top_frac = R2thresh; end %2nd argument is TOP_FRAC for these problems
122 | num2process = ceil(top_frac*gp.runcontrol.pop_size);
123 |
124 | [~,sort_ind] = sort(gp.fitness.values);
125 | sort_ind = sort_ind(1:num2process);
126 |
127 | if ~gp.fitness.minimisation
128 | sort_ind = flipud(sort_ind);
129 | end
130 |
131 | %loop through population
132 | for i=1:num2process
133 | model = gp.pop{sort_ind(i)};
134 | hitvec = hitvec + scangenes(model,numx);
135 | end
136 | titlestr= ['Input frequency - best ' num2str(100*top_frac) '% of whole population.'];
137 | end
138 |
139 | % plot results as barchart
140 | if graph
141 |
142 | h = figure;
143 | set(h,'name','GPTIPS 2 Population input frequency','numbertitle','off');
144 | a = gca;
145 | bar(a,hitvec);
146 | if ~verLessThan('matlab','8.4') %R2014b
147 | a.Children.FaceColor = [0 0.45 0.74];
148 | a.Children.BaseLine.Visible = 'off';
149 | else
150 | b = get(a,'Children');
151 | set(b,'FaceColor',[0 0.45 0.74],'ShowBaseLine','off');
152 | end
153 | grid on;
154 | xlabel('Input');
155 | ylabel('Input frequency');
156 | title(titlestr,'FontWeight','bold');
157 | hitvec=hitvec';
158 | else
159 | h = [];
160 | end
161 |
162 | if nargout > 0
163 | hvec = hitvec';
164 | end
165 |
--------------------------------------------------------------------------------
/core/gprandom.m:
--------------------------------------------------------------------------------
1 | function gp = gprandom(gp,seed)
2 | %GPRANDOM Sets random number generator seed according to system clock or user seed.
3 | %
4 | % (c) Dominic Searson 2009-2015
5 | %
6 | % GPTIPS 2
7 |
8 | if nargin < 2 || isempty(seed)
9 | seed = sum(100*clock);
10 | end
11 |
12 | gp.info.PRNGseed = seed;
13 |
14 | if verLessThan('matlab', '7.7.0')
15 | rand('twister', gp.info.PRNGseed);
16 | elseif verLessThan('matlab','8.1')
17 | s = RandStream.create('mt19937ar','seed',gp.info.PRNGseed);
18 | RandStream.setDefaultStream(s);
19 | else
20 | s = RandStream.create('mt19937ar','seed',gp.info.PRNGseed);
21 | RandStream.setGlobalStream(s);
22 | end
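23 |
24 | %Usage sketch (illustrative only; an empty struct stands in for a real
25 | %GP structure and 42 is an arbitrary seed):
26 | %
27 | % gp = gprandom(struct,42);
28 | % r1 = rand(1,3);
29 | % gp = gprandom(gp,42); %reseed with the same value
30 | % r2 = rand(1,3);
31 | % isequal(r1,r2) %true: the same seed gives the same sequence
32 | % gp.info.PRNGseed %42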
--------------------------------------------------------------------------------
/core/gpsimplify.m:
--------------------------------------------------------------------------------
1 | function symOut = gpsimplify(symIn,numSteps,verOld,fastMode,seconds)
2 | %GPSIMPLIFY Simplify SYM expressions in a less glitchy way than SIMPLIFY or SIMPLE.
3 | %
4 | % SYMOUT = GPSIMPLIFY(SYMIN, NUMSTEPS, VEROLD, FASTMODE) simplifies SYMIN
5 | % in at most NUMSTEPS steps, calling the MuPAD engine directly if VEROLD
6 | % is FALSE and falling back to the standard MATLAB SIMPLIFY method if
7 | % VEROLD is TRUE.
8 | %
9 | % The slower (but 'better') MuPAD Engine symbolic:Simplify method is used
10 | % if FASTMODE is FALSE and the possibly faster symbolic:simplify method
11 | % is used if FASTMODE = TRUE. The default is FASTMODE = FALSE.
12 | %
13 | % According to Mathworks, the MuPAD symbolic:simplify method is "faster,
14 | % but less reliable and controllable" than the algorithm used by MuPAD
15 | % symbolic:Simplify.
16 | %
17 | % It is recommended that you use the functions GPPRETTY and GPMODEL2SYM
18 | % to simplify GPTIPS expressions, rather than calling this function
19 | % directly.
20 | %
21 | % Copyright (c) 2009-2015 Dominic Searson
22 | %
23 | % GPTIPS 2
24 | %
25 | % See also GPPRETTY, GPMODEL2SYM, GPMODEL2STRUCT, SYM/VPA
26 |
27 | if nargin < 4 || isempty(fastMode)
28 | fastMode = false;
29 | end
30 |
31 | if nargin < 5 || isempty(seconds)
32 | seconds = '0.1';
33 | end
34 |
35 | try
36 | if verOld %for 2007 MATLAB, default to standard simplify method
37 | symOut = simplify(symIn);
38 | else
39 | %otherwise call Mupad symengine directly
40 |
41 | if fastMode
42 | %Note that according to:
43 | %http://www.mathworks.co.uk/help/symbolic/mupad_ug/use-general-simplification-functions.html
44 | %The MuPAD symbolic:simplify method "searches for a simpler
45 | %form by rewriting the terms of an expression." and "generally,
46 | %this method is faster, but less reliable and controllable than
47 | %the algorithm used by symbolic:Simplify."
48 | symOut = evalin(symengine,['symbolic:simplify(' char(symIn) ')']);
49 | else %and similarly, symbolic:Simplify "performs more extensive search
50 | %for a simple form of an expression. For some expressions, this
51 | %function can be slower, but more flexible and more powerful
52 | %than simplify."
53 | symOut = evalin(symengine,['symbolic:Simplify(' char(symIn) ',Seconds = ' num2str(seconds) ', Steps = ' num2str(numSteps) ')']);
54 | end
55 | end
56 | catch
57 | symOut = symIn;
58 | end
--------------------------------------------------------------------------------
/core/gpsym.m:
--------------------------------------------------------------------------------
1 | function s = gpsym(gp, expr)
2 | %GPSYM Compatibility function for MATLAB pre-R2017b and post-R2018a
3 | %The GP structure must be known to GPSYM otherwise it will not be able
4 | %to differentiate between symbolic names found in the expressions
5 |
6 | % Known functions that should be excluded from becoming symbols
7 | known_fun = {'abs' 'acos' 'acosd' 'acosh' 'acot' 'acotd' 'acoth' 'acsc' ...
8 | 'acscd' 'acsch' 'adjoint' 'airy' 'and' 'angle' 'any' 'asec' 'asecd' ...
9 | 'asech' 'asin' 'asind' 'asinh' 'atan' 'atan2' 'atan2d' 'atand' ...
10 | 'atanh' 'bernoulli' 'bernstein' 'bernsteinMatrix' 'besselh' ...
11 | 'besseli' 'besselj' 'besselk' 'bessely' 'beta' 'cat' 'catalan' ...
12 | 'ceil' 'charpoly' 'chebyshevT' 'chebyshevU' 'checkUnits' 'chol' ...
13 | 'coeffs' 'collect' 'colspace' 'combine' 'compose' 'cond' 'conj' ...
14 | 'cos' 'cosd' 'cosh' 'coshint' 'cosint' 'cospi' 'cot' 'cotd' 'coth' ...
15 | 'csc' 'cscd' 'csch' 'ctranspose' 'cumprod' 'cumsum' 'curl' ...
16 | 'daeFunction' 'dawson' 'deg2rad' 'det' 'diag' 'diff' 'dilog' ...
17 | 'dirac' 'display' 'divergence' 'divisors' 'ei' 'eig' 'eliminate' ...
18 | 'ellipj' 'ellipke' 'ellipticCE' 'ellipticCK' 'ellipticCPi' ...
19 | 'ellipticE' 'ellipticF' 'ellipticK' 'ellipticNome' 'ellipticPi' ...
20 | 'eq' 'equationsToMatrix' 'erf' 'erfc' 'erfcinv' 'erfi' 'erfinv' ...
21 | 'euler' 'eulergamma' 'exp' 'expand' 'expint' 'expm' 'factor' ...
22 | 'factorial' 'factorIntegerPower' 'fibonacci' 'find' 'findSymType' ...
23 | 'finverse' 'fix' 'floor' 'fortran' 'fourier' 'frac' 'fresnelc' ...
24 | 'fresnels' 'functionalDerivative' 'funm' 'gamma' 'gammaln' 'gbasis' ...
25 | 'gcd' 'ge' 'gegenbauerC' 'gradient' 'gt' 'harmonic' 'has' ...
26 | 'heaviside' 'hermiteForm' 'hermiteH' 'hessian' 'horner' 'htrans' ...
27 | 'hurwitzZeta' 'hypergeom' 'hypot' 'ifourier' 'igamma' 'ihtrans' ...
28 | 'ilaplace' 'imag' 'in' 'incidenceMatrix' 'int' 'inv' 'isAlways' ...
29 | 'isequal' 'isequaln' 'ismember' 'isnan' 'isolate' 'iztrans' ...
30 | 'jacobiAM' 'jacobian' 'jacobiCD' 'jacobiCN' 'jacobiCS' 'jacobiDC' ...
31 | 'jacobiDN' 'jacobiDS' 'jacobiNC' 'jacobiND' 'jacobiNS' 'jacobiP' ...
32 | 'jacobiSC' 'jacobiSD' 'jacobiSN' 'jacobiZeta' 'jordan' 'kron' ...
33 | 'kroneckerDelta' 'kummerU' 'laguerreL' 'lambertw' 'laplace' ...
34 | 'laplacian' 'lcm' 'ldivide' 'le' 'legendreP' 'lhs' 'limit' ...
35 | 'linsolve' 'log' 'log10' 'log2' 'logint' 'logm' 'lt' 'lu' 'max' ...
36 | 'meijerG' 'min' 'minpoly' 'minus' 'mixedUnits' 'mldivide' 'mod' ...
37 | 'mpower' 'mrdivide' 'mtimes' 'nchoosek' 'ndims' 'ne' 'nextprime' ...
38 | 'nnz' 'nonzeros' 'norm' 'not' 'nthprime' 'nthroot' 'null' 'numden' ...
39 | 'numel' 'odeToVectorField' 'or' 'orth' 'pade' 'partfrac' 'permute' ...
40 | 'piecewise' 'pinv' 'plus' 'pochhammer' 'poles' 'polylog' ...
41 | 'polynomialDegree' 'polynomialReduce' 'potential' 'power' ...
42 | 'powermod' 'prevprime' 'prod' 'psi' 'qr' 'quorem' 'rad2deg' 'rank' ...
43 | 'rdivide' 'real' 'rectangularPulse' 'reduceDAEIndex' ...
44 | 'reduceDAEToODE' 'reduceDifferentialOrder' 'reduceRedundancies' ...
45 | 'rem' 'reshape' 'resultant' 'rewrite' 'rhs' 'root' 'round' 'rref' ...
46 | 'rsums' 'sec' 'secd' 'sech' 'series' 'sign' 'signIm' 'simplify' ...
47 | 'simplifyFraction' 'sin' 'sinc' 'sind' 'single' 'sinh' 'sinhint' ...
48 | 'sinint' 'sinpi' 'smithForm' 'sort' 'sqrt' 'sqrtm' 'ssinint' ...
49 | 'subexpr' 'subs' 'sum' 'svd' 'symprod' 'symsum' 'tan' 'tand' 'tanh' ...
50 | 'taylor' 'times' 'toeplitz' 'transpose' 'triangularPulse' 'tril' ...
51 | 'triu' 'uminus' 'uplus' 'vectorPotential' 'vertcat' 'vpa' ...
52 | 'vpaintegral' 'vpasolve' 'whittakerM' 'whittakerW' 'wrightOmega' ...
53 | 'xor' 'zeta' 'ztrans'};
54 |
55 | % Here we need to take into account that expr may be numeric in which
56 | % case the str2sym function will fail. sym(), on the other hand, will
57 | % return a symbolic object while converting a double prec. number to a
58 | % fraction, if possible.
59 | % TODO: Possible bug hidden in here? Must test.
60 | if ~exist('str2sym', 'file') || isnumeric(expr)
61 | s = sym(expr);
62 | else
63 |
64 | funcs = [gp.nodes.functions.name gp.nodes.adf.syms]; % Combine all
65 | funcs = unique(funcs); % Remove repeating entries
66 | funcs = setdiff(funcs, known_fun); % Remove fundamental functions
67 | for k=1:numel(funcs)
68 | assignin('caller', funcs{k}, sym(funcs{k}));
69 | end
70 |
71 | % Finally, create the symbolic expression
72 | s = str2sym(expr);
73 | end
74 |
75 | end
76 |
77 |
--------------------------------------------------------------------------------
/core/gpterminate.m:
--------------------------------------------------------------------------------
1 | function gp = gpterminate(gp)
2 | %GPTERMINATE Check for early termination of run.
3 | %
4 | % GP = GPTERMINATE(GP) checks if the GPTIPS run should be terminated at
5 | % the end of the current generation according to a fitness criterion or
6 | % run timeout.
7 | %
8 | % Copyright (c) 2009-2015 Dominic Searson
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also GPFINALISE, GPTIC, GPTOC
13 |
14 | %check if fitness termination criterion met
15 | if gp.fitness.terminate
16 |
17 | if gp.fitness.minimisation
18 | if gp.results.best.fitness <= gp.fitness.terminate_value
19 | gp.state.terminate = true;
20 | gp.state.terminationReason = ['Fitness criterion achieved <= ' ...
21 | num2str(gp.fitness.terminate_value)];
22 | end
23 | else
24 | if gp.results.best.fitness >= gp.fitness.terminate_value
25 | gp.state.terminate = true;
26 | gp.state.terminationReason = ['Fitness criterion achieved >= ' ...
27 | num2str(gp.fitness.terminate_value)];
28 | end
29 | end
30 | end
31 |
32 | %run timeout
33 | if ~gp.state.terminate && (gp.state.runTimeElapsed >= gp.runcontrol.timeout)
34 | gp.state.terminate = true;
35 | gp.state.terminationReason = ['Timeout >= ' num2str(gp.runcontrol.timeout) ' sec'];
36 | end
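37 |
38 | %Usage sketch (illustrative only; the field values below are assumed
39 | %for the example and do not come from a real run):
40 | %
41 | % gp.fitness.terminate = true; %enable the fitness criterion
42 | % gp.fitness.minimisation = true; %lower fitness is better
43 | % gp.fitness.terminate_value = 0.01;
44 | % gp.results.best.fitness = 0.005; %already below the target
45 | % gp.state.terminate = false;
46 | % gp.state.runTimeElapsed = 12; %seconds used so far
47 | % gp.runcontrol.timeout = 60; %seconds allowed
48 | % gp = gpterminate(gp);
49 | % gp.state.terminate %true, because 0.005 <= 0.01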
--------------------------------------------------------------------------------
/core/gptic.m:
--------------------------------------------------------------------------------
1 | function gp = gptic(gp)
2 | %GPTIC Starts the timer for the current generation.
3 | %
4 | % GP = GPTIC(GP) stores a TIC value for the current generation in GP.
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also GPTOC
11 |
12 | gp.state.tic = tic;
--------------------------------------------------------------------------------
/core/gptoc.m:
--------------------------------------------------------------------------------
1 | function gp = gptoc(gp)
2 | %GPTOC Updates the running time of this run.
3 | %
4 | % GP = GPTOC(GP) updates the running time for the current run.
5 | %
6 | % Copyright (c) 2014 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also GPTIC
11 |
12 | gp.state.runTimeElapsed = gp.state.runTimeElapsed + toc(gp.state.tic);
13 |
14 |
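15 | %Usage sketch (illustrative only; the running total is assumed to have
16 | %been initialised to 0 before the first call, and PAUSE stands in for
17 | %the work of one generation):
18 | %
19 | % gp.state.runTimeElapsed = 0;
20 | % gp = gptic(gp); %store a TIC value in gp.state.tic
21 | % pause(0.5);
22 | % gp = gptoc(gp); %add the elapsed seconds to the running total
23 | % gp.state.runTimeElapsed %roughly 0.5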
--------------------------------------------------------------------------------
/core/gptoolboxcheck.m:
--------------------------------------------------------------------------------
1 | function [symbolicResult, parallelResult, statsResult] = gptoolboxcheck
2 | %GPTOOLBOXCHECK Checks if certain toolboxes are installed and licensed.
3 | %
4 | % Checks for the Stats, Parallel Computing and Symbolic Math toolboxes.
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also GPINIT
11 |
12 | symbolicResult = @() (ismember('Symbolic Math Toolbox',gp_toolboxlist()) && license('test','symbolic_toolbox'));
13 | parallelResult = @() (ismember('Parallel Computing Toolbox',gp_toolboxlist()) && license('test','distrib_computing_toolbox'));
14 | statsResult = @() (( ismember('Statistics Toolbox',gp_toolboxlist()) || ismember('Statistics and Machine Learning Toolbox',gp_toolboxlist()) ) ...
15 | && license('test','statistics_toolbox'));
16 |
17 |
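18 | %Usage sketch (illustrative only; the variable names are hypothetical).
19 | %The three outputs are function handles, not logical values, so each
20 | %check runs only when the handle is called with ():
21 | %
22 | % [hasSym, hasPar, hasStats] = gptoolboxcheck;
23 | % if hasPar()
24 | % disp('Parallel Computing Toolbox is installed and licensed.');
25 | % end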
--------------------------------------------------------------------------------
/core/gptreestructure.m:
--------------------------------------------------------------------------------
1 | function treestructure = gptreestructure(gp,expr)
2 | %GPTREESTRUCTURE Create cell array containing tree structure connectivity and label information for an encoded tree expression.
3 | %
4 | % TREESTRUCTURE = GPTREESTRUCTURE(GP,EXPR) on the encoded expression
5 | % string EXPR - e.g. g(a(m(x2),x1)) - returns the cell array
6 | % TREESTRUCTURE, each element of which contains a MATLAB structure
7 | % describing tree connectivity information for a node in EXPR. The ith
8 | % structure in TREESTRUCTURE describes the ith node in EXPR. Node 1 is
9 | % the root node (other nodes are numbered in an arbitrary but consistent
10 | % way).
11 | %
12 | % Copyright (c) 2009-2015 Dominic Searson
13 | %
14 | % GPTIPS 2
15 | %
16 | % See also DRAWTREES, GPMODELREPORT, TREEGEN
17 |
18 | %get a list of all nodes in expression, left to right
19 | nodes = picknode(expr,6,gp);
20 | numnodes = length(nodes);
21 |
22 | %construct cell array for holding info for each node
23 | treestructure = cell(1,numnodes);
24 |
25 | %proceeding from left to right, get the name of the function node and the
26 | %id of its parent and its own node id
27 | for i=1:numnodes
28 |
29 | node = struct;
30 | shortName = expr(nodes(i));
31 | node.id = i;
32 |
33 | %lookup node in function table
34 | floc = strfind(gp.nodes.functions.afid,shortName);
35 |
36 | %if not empty then it is a function node
37 | if ~isempty(floc)
38 | node.name = lower(gp.nodes.functions.active_name_UC{floc});
39 | else %otherwise it is a terminal or constant so extract it
40 |
41 | [~,node.name] = extract(nodes(i),expr);
42 |
43 | %if terminal then replace with lookup name, if it exists
44 | if node.name(1) == 'x'
45 |
46 | newnameInd = str2double(strrep(node.name,'x',''));
47 | if newnameInd <= numel(gp.nodes.inputs.names) && ~isempty(gp.nodes.inputs.names{newnameInd})
48 | node.name = gp.nodes.inputs.names{newnameInd};
49 | end
50 |
51 | if gp.info.toolbox.symbolic()
52 | node.name = HTMLequation(gp,node.name);
53 | end
54 |
55 | %if ERC then strip brackets and reduce digits
56 | elseif node.name(1) == '['
57 | newname = strrep(node.name,'[','');
58 | newname = strrep(newname,']','');
59 |
60 | if length(newname) > 6 %clip long ERCs
61 | newname = newname(1:6);
62 | end
63 | node.name = newname;
64 | end
65 | end
66 |
67 | %if the id is 1 then it is root node and has no parent
68 | if node.id == 1
69 | node.parentId = [];
70 |
71 | else %otherwise find the parent id of this node
72 |
73 | %search left across string for opening '(' not matched by
74 | % a closing ')'. The node to the left of this is the parent node.
75 | substring = expr(1:nodes(i)-1);
76 | len = length(substring);
77 |
78 | numClosed = 0;
79 | for j=len:-1:1
80 | if substring(j) =='(' && numClosed == 0
81 | node.parentId = find(nodes==(j-1));
82 | break
83 | elseif substring(j) == ')'
84 | numClosed = numClosed + 1;
85 | elseif substring(j) == '('
86 | numClosed = numClosed - 1;
87 | end
88 |
89 | end
90 | end
91 | treestructure{1,i} = node;
92 | end
--------------------------------------------------------------------------------
/core/initbuild.m:
--------------------------------------------------------------------------------
1 | function gp=initbuild(gp)
2 | %INITBUILD Generate an initial population of GP individuals.
3 | %
4 | % GP = INITBUILD(GP) creates an initial population using the parameters
5 | % in the structure GP. Various changes to fields of GP are made.
6 | %
7 | % Remarks:
8 | %
9 | % Each individual in the population can contain 1 or more separate trees
10 | % (if more than 1 then each tree is referred to as a gene of the
11 | % individual). This is set by the user in the config file in the field
12 | % GP.GENES.MULTIGENE.
13 | %
14 | % Each individual is a cell array. You can use cell addressing to
15 | % retrieve the individual genes. E.g. GP.POP{3}{2} will return the 2nd
16 | % gene in the third individual.
17 | %
18 | % Trees are (by default) constructed using a probabilistic version of
19 | % Koza's ramped 1/2 and 1/2 method. E.g. if maximum tree depth is 5 and
20 | % population is 100 then, on average, 20 will be generated at each depth
21 | % (1/2 using 'grow' and 1/2 using 'full').
22 | %
23 | % Multiple copies of genes in an individual are disallowed when
24 | % individuals are created. There is, however, no such restriction imposed
25 | % on future generations.
26 | %
27 | % (c) Dominic Searson 2009-2015
28 | %
29 | % GPTIPS 2
30 | %
31 | % See also POPBUILD, TREEGEN
32 |
33 | % Extract temp variables from the gp structure
34 | popSize = gp.runcontrol.pop_size;
35 | maxNodes = gp.treedef.max_nodes;
36 | maxGenes = gp.genes.max_genes;
37 |
38 | %override any gene settings if using single gene gp
39 | if ~gp.genes.multigene
40 | maxGenes = 1;
41 | end
42 |
43 | %initialise vars
44 | gp.pop = cell(popSize,1);
45 | numGenes = 1;
46 |
47 | %building process
48 | for i=1:popSize %loop through population
49 |
50 | %randomly pick num of genes in individual
51 | if maxGenes > 1
52 | numGenes = ceil(rand*maxGenes);
53 | end
54 |
55 | individ = cell(1,(numGenes)); %construct empty individual
56 |
57 | for z = 1:numGenes %loop through genes in each individual and generate a tree for each
58 |
59 | %generate gene z and check that genes 1...z-1 are different
60 | geneLoopCounter = 0;
61 | while true
62 |
63 | geneLoopCounter = geneLoopCounter + 1;
64 |
65 | %generate a trial tree for gene z
66 | temp = treegen(gp);
67 | numnodes = getnumnodes(temp);
68 |
69 | if numnodes <= maxNodes
70 |
71 | copyDetected = false;
72 |
73 | if z > 1 %check previous genes for copies
74 |
75 | for j = 1:z-1
76 | if strcmp(temp,individ{1,j})
77 | copyDetected = true;
78 | break
79 | end
80 | end
81 |
82 | end
83 |
84 | if ~copyDetected
85 | break
86 | end
87 |
88 | end %max nodes check
89 |
90 | %display a warning if having difficulty building trees due to
91 | %constraints
92 | if ~gp.runcontrol.quiet && geneLoopCounter > 10
93 | disp('initbuild: iterating tree build loop because of uniqueness constraints.');
94 | end
95 |
96 | end %while loop
97 |
98 | individ{1,z} = temp;
99 |
100 | end %gene loop
101 |
102 | %write new individual to population cell array
103 | gp.pop{i,1} = individ;
104 | end
105 |
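The gene-count sampling and duplicate rejection described in the remarks can be sketched without the rest of the toolbox. This is an illustration only: `pool` and `drawTree` are hypothetical stand-ins for `treegen`, and the node-count check is left out.

```matlab
% Hypothetical stand-in for treegen: draws from a small pool of tree strings
pool = {'a(x1,x2)','b(x1)','x2','a(x2,[1.5])'};
drawTree = @() pool{ceil(rand*numel(pool))};

maxGenes = 3;
numGenes = ceil(rand*maxGenes);       % uniform on 1..maxGenes
individ = cell(1,numGenes);

for z = 1:numGenes
    while true
        temp = drawTree();
        % accept only if no earlier gene is an identical string
        if ~any(strcmp(temp,individ(1:z-1)))
            break
        end
    end
    individ{z} = temp;
end
disp(individ)
```

Because the pool holds more distinct strings than `maxGenes`, the rejection loop always terminates here; with a very restrictive function set the real loop may iterate many times, which is what the warning message in the source reports.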
--------------------------------------------------------------------------------
/core/kogene.m:
--------------------------------------------------------------------------------
1 | function treestrs = kogene(treestrs, knockout)
2 | %KOGENE Knock out genes from a cell array of tree expressions.
3 | %
4 | % TREESTRS = KOGENE(TREESTRS, KNOCKOUT) removes genes from the cell array
5 | % TREESTRS with a boolean vector KNOCKOUT which must be the same length
6 | % as TREESTRS. Genes corresponding to an entry of 'true' are removed.
7 | %
8 | % Copyright (c) 2009-2015 Dominic Searson
9 | %
10 | % GPTIPS 2
11 |
12 | if numel(knockout) ~= numel(treestrs)
13 | error('Knockout must be a vector the same length as the number of genes in the supplied individual');
14 | end
15 |
16 | knockout = logical(knockout);
17 |
18 | treestrs(knockout) = [];
19 |
20 | if isempty(treestrs)
21 | error('Cannot perform knockout. There are no genes left in the selected individual.');
22 | end
23 |
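The core of the knockout is a logical-index deletion on a cell array. A minimal sketch with hypothetical gene strings, reproducing the indexing step rather than calling `kogene` itself:

```matlab
treestrs = {'a(x1,x2)','b(x3)','x4'};   % hypothetical encoded genes
knockout = [0 1 0];                     % remove the 2nd gene only

remaining = treestrs;
remaining(logical(knockout)) = [];      % same deletion kogene performs
disp(remaining)
```

Passing a knockout vector of all ones would leave an empty cell array, which is the case `kogene` rejects with an error.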
--------------------------------------------------------------------------------
/core/mergegp.m:
--------------------------------------------------------------------------------
1 | function gp1 = mergegp(gp1,gp2,multirun)
2 | %MERGEGP Merges two GP population structs into a new one.
3 | %
4 | % GPNEW = MERGEGP(GP1,GP2) takes the results of 2 GPTIPS runs in GP1 and
5 | % GP2 and merges the populations into a new population GPNEW.
6 | %
7 | % Remarks:
8 | %
9 | % The merged populations do not have to have exactly the same run
10 | % settings in terms of population size, number of generations, selection
11 | % method type etc. but they do need to have the same function sets (i.e.
12 | % mathematical building blocks), inputs and be based on the same data set
13 | % (i.e. same training, validation and test sets). If you randomise the
14 | % allocation of training, test and validation sets in your config file,
15 | % then the results (in terms of fitness, R^2 etc.) will not be
16 | % meaningful.
17 | %
18 | % Important:
19 | %
20 | % The merged structure GPNEW can be used like any other GP results
21 | % structure (e.g. in POPBROWSER, RUNTREE etc.). However, many of the
22 | % fields in GPNEW will not necessarily apply to all the data in the new
23 | % structure. For instance, gp.runcontrol.pop_size may be different in GP2
24 | % to GP1 and gp.results.history and gp.state will certainly be different.
25 | % Note that GPNEW will always inherit these fields from GP1.
26 | %
27 | % Copyright (c) 2009-2015 Dominic Searson
28 | %
29 | % GPTIPS 2
30 | %
31 | % See also GPMODELFILTER
32 |
33 | disp('Merging GP populations ...');
34 |
35 | if nargin < 3
36 | multirun = false;
37 | end
38 |
39 | %do some tests to see if the GP structures are ostensibly OK to be merged.
40 | if ~isequal(gp1.nodes.inputs,gp2.nodes.inputs)
41 | error('GP structures cannot be merged because the input nodes are not configured identically.');
42 | end
43 |
44 | if ~isequal(gp1.nodes.functions,gp2.nodes.functions)
45 | error('GP structures cannot be merged because the function nodes are not configured identically.');
46 | end
47 |
48 | %update the appropriate fields of gp1.
49 | popsize1 = gp1.runcontrol.pop_size;
50 | popsize2 = gp2.runcontrol.pop_size;
51 | inds = popsize1+1:(popsize1+popsize2);
52 |
53 | gp1.pop(inds) = gp2.pop(:);
54 | gp1.fitness.returnvalues(inds) = gp2.fitness.returnvalues(:);
55 | gp1.fitness.complexity(inds) = gp2.fitness.complexity(:);
56 | gp1.fitness.nodecount(inds) = gp2.fitness.nodecount(:);
57 | gp1.fitness.values(inds) = gp2.fitness.values(:);
58 | gp1.runcontrol.pop_size = popsize1 + popsize2;
59 |
60 | %check for existence of r2train r2val and r2test fields before merging
61 | if isfield(gp1.fitness,'r2train') && isfield(gp2.fitness,'r2train')
62 | gp1.fitness.r2train(inds) = gp2.fitness.r2train(:);
63 | end
64 |
65 | if isfield(gp1.fitness,'r2val') && isfield(gp2.fitness,'r2val')
66 | gp1.fitness.r2val(inds) = gp2.fitness.r2val(:);
67 | end
68 |
69 | if isfield(gp1.fitness,'r2test') && isfield(gp2.fitness,'r2test')
70 | gp1.fitness.r2test(inds) = gp2.fitness.r2test(:);
71 | end
72 |
73 | %check to see if 'best' on validation data set is present
74 | if isfield(gp1.results,'valbest') && isfield(gp2.results,'valbest')
75 | valPresent = true;
76 | else
77 | valPresent = false;
78 | end
79 |
80 | %check to see if 'testbest' on test data set is present
81 | if isfield(gp1.results,'testbest') && isfield(gp2.results,'testbest')
82 | testPresent = true;
83 | else
84 | testPresent = false;
85 | end
86 |
87 | %find the best and valbest of gp1 and gp2 and get complexity values.
88 | if gp1.fitness.complexityMeasure
89 | traincomp1 = gp1.results.best.complexity;
90 | traincomp2 = gp2.results.best.complexity;
91 |
92 | if valPresent
93 | valcomp1 = gp1.results.valbest.complexity;
94 | valcomp2 = gp2.results.valbest.complexity;
95 | end
96 |
97 | if testPresent
98 | testcomp1 = gp1.results.testbest.complexity;
99 | testcomp2 = gp2.results.testbest.complexity;
100 | end
101 |
102 | else
103 | traincomp1 = gp1.results.best.nodecount;
104 | traincomp2 = gp2.results.best.nodecount;
105 |
106 | if valPresent
107 | valcomp1 = gp1.results.valbest.nodecount;
108 | valcomp2 = gp2.results.valbest.nodecount;
109 | end
110 |
111 | if testPresent
112 | testcomp1 = gp1.results.testbest.nodecount;
113 | testcomp2 = gp2.results.testbest.nodecount;
114 | end
115 | end
116 |
117 | if gp1.fitness.minimisation
118 | if gp2.results.best.fitness < gp1.results.best.fitness || ...
119 | ((gp2.results.best.fitness == gp1.results.best.fitness) && traincomp2 < traincomp1)
120 | gp1.results.best = gp2.results.best;
121 | end
122 |
123 | if valPresent
124 | if gp2.results.valbest.valfitness < gp1.results.valbest.valfitness || ...
125 | ((gp2.results.valbest.valfitness == gp1.results.valbest.valfitness) && valcomp2 < valcomp1)
126 | gp1.results.valbest = gp2.results.valbest;
127 | end
128 | end
129 |
130 | if testPresent
131 | if gp2.results.testbest.testfitness < gp1.results.testbest.testfitness || ...
132 | ((gp2.results.testbest.testfitness == gp1.results.testbest.testfitness) && testcomp2 < testcomp1)
133 | gp1.results.testbest = gp2.results.testbest;
134 | end
135 | end
136 |
137 | else %maximisation
138 | if gp2.results.best.fitness > gp1.results.best.fitness || ...
139 | ((gp2.results.best.fitness == gp1.results.best.fitness) && traincomp2 < traincomp1)
140 | gp1.results.best = gp2.results.best;
141 | end
142 |
143 | if valPresent
144 | if gp2.results.valbest.valfitness > gp1.results.valbest.valfitness || ...
145 | ((gp2.results.valbest.valfitness == gp1.results.valbest.valfitness) && valcomp2 < valcomp1)
146 | gp1.results.valbest = gp2.results.valbest;
147 | end
148 | end
149 |
150 | if testPresent && (gp2.results.testbest.testfitness > gp1.results.testbest.testfitness || ...
151 | ((gp2.results.testbest.testfitness == gp1.results.testbest.testfitness) && testcomp2 < testcomp1))
152 | gp1.results.testbest = gp2.results.testbest;
153 | end
154 | end
155 |
156 | %merge info
157 | if ~gp1.info.merged && ~gp2.info.merged
158 | gp1.info.mergedPopSizes = [popsize1 popsize2];
159 | elseif gp1.info.merged && ~gp2.info.merged
160 | gp1.info.mergedPopSizes = [gp1.info.mergedPopSizes popsize2];
161 | elseif ~gp1.info.merged && gp2.info.merged
162 | gp1.info.mergedPopSizes = [popsize1 gp2.info.mergedPopSizes];
163 | elseif gp1.info.merged && gp2.info.merged
164 | gp1.info.mergedPopSizes = [gp1.info.mergedPopSizes gp2.info.mergedPopSizes];
165 | end
166 |
167 | %counter for number of times gp1 has experienced a merge
168 | gp1.info.merged = gp1.info.merged + 1;
169 | gp1.info.duplicatesRemoved = false;
170 |
171 | %set stop time to time taken for all runs in multirun
172 | if multirun
173 | gp1.info.stopTime = gp2.info.stopTime;
174 | end
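The rule used above to decide whether the 'best' individual of the second run replaces that of the first (better fitness wins, equal fitness falls back to lower complexity) can be sketched on its own. The structs and the `replaces` helper below are hypothetical and cover the minimisation case only.

```matlab
% true if candidate b should replace incumbent a (minimisation of fitness)
replaces = @(a,b) b.fitness < a.fitness || ...
    (b.fitness == a.fitness && b.complexity < a.complexity);

best1 = struct('fitness',0.12,'complexity',40);
best2 = struct('fitness',0.12,'complexity',25);   % ties on fitness, simpler
best3 = struct('fitness',0.30,'complexity',5);    % simpler but worse fitness

if replaces(best1,best2)
    best1 = best2;
end
disp(best1)
```

Complexity only acts as a tie-break: `best3` is far simpler but does not replace the incumbent because its fitness is worse.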
--------------------------------------------------------------------------------
/core/mutate.m:
--------------------------------------------------------------------------------
1 | function exprMut = mutate(expr,gp)
2 | %MUTATE Mutate an encoded symbolic tree expression.
3 | %
4 | % EXPRMUT = MUTATE(EXPR,GP) mutates the encoded symbolic expression EXPR
5 | % using the parameters in GP to produce expression EXPRMUT.
6 | %
7 | % Remarks:
8 | %
9 | % This function uses the GP.OPERATORS.MUTATION.MUTATE_PAR field to
10 | % determine probabilistically which type of mutation event to use.
11 | %
12 | % The types are:
13 | %
14 | % mutate_type=1 (Default) Ordinary Koza style subtree mutation
15 | % mutate_type=2 Mutate a non-constant terminal to another non-constant
16 | % terminal.
17 | % mutate_type=3 Constant perturbation. Generate small number using
18 | % Gaussian and add to existing constant.
19 | % mutate_type=4 Zero constant. Sets the selected constant to zero.
20 | % mutate_type=5 Randomise constant. Substitutes a random new value for
21 | % the selected constant.
22 | % mutate_type=6 Unity constant. Sets the selected constant to 1.
23 | %
24 | % The probabilities of each mutate_type being selected are stored in the
25 | % GP.OPERATORS.MUTATION.MUTATE_PAR field which must be a row vector of
26 | % length 6.
27 | %
28 | % I.e.
29 | %
30 | % GP.OPERATORS.MUTATION.MUTATE_PAR = [p_1 p_2 p_3 p_4 p_5 p_6]
31 | %
32 | % where p_1 = probability of mutate_type 1 being used
33 | % p_2= " " " 2 " "
34 | %
35 | % and so on.
36 | %
37 | % Example:
38 | %
39 | % If GP.OPERATORS.MUTATION.MUTATE_PAR = [0.5 0 0.25 0.25 0 0] then approx.
40 | % 1/2 of mutation events will be ordinary subtree mutations, 1/4 will be
41 | % Gaussian perturbations of a constant and 1/4 will be setting a constant
42 | % to zero.
43 | %
44 | % Note:
45 | %
46 | % If a mutation of a certain type cannot be performed (e.g. there are no
47 | % constants in the tree) then a default subtree mutation is performed
48 | % instead.
49 | %
50 | % Copyright (c) 2009-2015 Dominic Searson
51 | %
52 | % GPTIPS 2
53 | %
54 | % See also CROSSOVER
55 |
56 | %pick a mutation event type based on the weights in mutate_par
57 | rnum = rand;
58 |
59 | while true %loop until a mutation occurs
60 |
61 | %ordinary subtree mutation
62 | if rnum <= gp.operators.mutation.cumsum_mutate_par(1)
63 |
64 | %first get the string position of the node
65 | position = picknode(expr,0,gp);
66 |
67 | %remove the logical subtree for the selected node
68 | %and return the main tree with '$' replacing the extracted subtree
69 | maintree = extract(position,expr);
70 |
71 | %generate a new tree to replace the old
72 | %Pick a depth between 1 and gp.treedef.max_mutate_depth;
73 | depth = ceil(rand*gp.treedef.max_mutate_depth);
74 | newtree = treegen(gp,depth);
75 |
76 | %replace the '$' with the new tree
77 | exprMut = strrep(maintree,'$',newtree);
78 |
79 | break;
80 |
81 | %substitute terminals
82 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(2)
83 |
84 | position = picknode(expr,4,gp);
85 |
86 | if position == -1
87 | rnum = 0;
88 | continue;
89 | else
90 | maintree = extract(position,expr); %extract the input terminal
91 | exprMut = strrep(maintree,'$',['x' ...
92 | sprintf('%d',ceil(rand*gp.nodes.inputs.num_inp))]); %replace it with a randomly chosen input
93 | break;
94 | end
95 |
96 | %constant Gaussian perturbation
97 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(3)
98 |
99 | position = picknode(expr,3,gp);
100 |
101 | if position == -1
102 | rnum = 0;
103 | continue;
104 | else
105 |
106 | [maintree,subtree] = extract(position,expr);
107 |
108 | %process constant
109 | oldconst = sscanf(subtree(2:end-1),'%f');
110 | newconst = oldconst + (randn*gp.operators.mutation.gaussian.std_dev);
111 |
112 | %use appropriate string formatting for new constant
113 | if newconst == fix(newconst)
114 | newtree = ['[' sprintf('%.0f',newconst) ']'];
115 | else
116 | newtree = ['[' sprintf(gp.nodes.const.format,newconst) ']'];
117 | end
118 |
119 | %replace '$' with the new tree
120 | exprMut = strrep(maintree,'$',newtree);
121 | break
122 |
123 | end
124 |
125 | %make constant zero
126 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(4)
127 |
128 | position = picknode(expr,3,gp);
129 |
130 | if position == -1
131 | rnum = 0;
132 | continue;
133 | else
134 | maintree = extract(position,expr); %extract the constant
135 | exprMut = strrep(maintree,'$','[0]'); %replace it with zero
136 | break
137 | end
138 |
139 | %randomise constant
140 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(5)
141 | position = picknode(expr,3,gp);
142 |
143 | if position == -1
144 | rnum = 0;
145 | continue;
146 | else
147 | maintree = extract(position,expr); %extract the constant
148 |
149 | %generate a constant in the range
150 | const = rand*(gp.nodes.const.range(2) - gp.nodes.const.range(1)) + gp.nodes.const.range(1);
151 |
152 | %convert the constant to square bracketed string form:
153 | if const == fix(const)
154 | arg = ['[' sprintf('%.0f',const) ']'];
155 | else
156 | arg = ['[' sprintf(gp.nodes.const.format,const) ']'];
157 | end
158 |
159 | exprMut = strrep(maintree,'$',arg); %insert new constant
160 | break
161 | end
162 |
163 | %set a constant to 1
164 | elseif rnum <= gp.operators.mutation.cumsum_mutate_par(6)
165 |
166 | position = picknode(expr,3,gp);
167 |
168 | if position == -1
169 | rnum = 0;
170 | continue;
171 | else
172 | maintree = extract(position,expr); %extract the constant
173 | exprMut = strrep(maintree,'$','[1]'); %replace it with one
174 | break
175 | end
176 | end
177 | end
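The if/elseif chain above amounts to a roulette-wheel choice over the six mutation types. A minimal sketch, assuming that `cumsum_mutate_par` is the cumulative sum of `mutate_par` (it is set elsewhere in the toolbox; the name suggests this but the computation is not shown here), with a fixed `rnum` in place of `rand`:

```matlab
mutate_par = [0.5 0 0.25 0.25 0 0];       % example weights, summing to 1
cumsum_mutate_par = cumsum(mutate_par);   % [0.5 0.5 0.75 1 1 1]

rnum = 0.6;                               % fixed here; the source draws rand
% first type whose cumulative weight reaches rnum, as in the elseif chain
mutate_type = find(rnum <= cumsum_mutate_par,1);
disp(mutate_type)
```

A type with zero weight can never be selected directly, because its cumulative value equals that of the preceding type. When the chosen type cannot be applied, the source sets `rnum = 0` so that the next pass through the loop falls into the first branch (subtree mutation).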
--------------------------------------------------------------------------------
/core/ndfsort_rank1.m:
--------------------------------------------------------------------------------
1 | function xrank = ndfsort_rank1(x)
2 | %NDFSORT_RANK1 Fast non dominated sorting algorithm for 2 objectives only - returns only rank 1 solutions.
3 | %
4 | % XRANK = NDFSORT_RANK1(X) performs non-dominated sorting on the (N x 2)
5 | % matrix X, where N is the number of solutions and each column holds one
6 | % objective (lower is better). The returned (N x 1) vector XRANK is 1 for
7 | % solutions on the non-dominated front of X (rank 1) and 0 for all other
8 | % solutions.
9 | %
10 | % Remarks:
11 | %
12 | % Based on the sorting method described on page 184 of: "A fast and
13 | % elitist multiobjective genetic algorithm: NSGA-II" by Kalyanmoy Deb,
14 | % Amrit Pratap, Sameer Agarwal, T. Meyarivan. IEEE Transactions on
15 | % Evolutionary Computation Vol. 6, No. 2, April 2002, pp. 182-197.
16 | %
17 | % (c) Dominic Searson 2009-2015
18 | %
19 | % GPTIPS 2
20 | %
21 | % See also SELECTION, GPMODELFILTER, PARETOREPORT, POPBROWSER
22 |
23 | P = size(x,1);
24 | np = zeros(P,1); %current domination level
25 | xrank = np; %rank vector
26 |
27 | for p=1:P
28 |
29 | for q=1:P
30 |
31 | if q ~= p
32 | if doesx1Domx2(x(q,:),x(p,:))
33 | np(p) = np(p) + 1;
34 | end
35 | end
36 |
37 | end
38 |
39 | if np(p) == 0
40 | xrank(p) = 1;
41 | end
42 |
43 | end
44 |
45 | function result = doesx1Domx2(x1,x2)
46 | %Returns true if first solution dominates second, false otherwise. Here,
47 | %dominance means if one solution beats the other on 1 criterion and is no
48 | %worse on the other criterion (not strict dominance).
49 |
50 | result = false;
51 |
52 | if (x1(1) < x2(1) && x1(2) <= x2(2)) || (x1(1) <= x2(1) && x1(2) < x2(2))
53 | result = true;
54 | end
55 |
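A small worked case may help when checking the dominance test. The sketch below does not call `ndfsort_rank1`; it re-implements the same rule (no worse on both objectives, strictly better on at least one, lower is better) in vectorised form on hypothetical data.

```matlab
% rows are solutions, columns are two objectives to be minimised
x = [1 5; 2 2; 3 3; 4 1; 2 6];

P = size(x,1);
xrank = zeros(P,1);
for p = 1:P
    % q dominates p if it is no worse on both objectives and better on one
    noWorse = all(bsxfun(@le,x,x(p,:)),2);
    better  = any(bsxfun(@lt,x,x(p,:)),2);
    if ~any(noWorse & better)
        xrank(p) = 1;                 % nothing dominates p: rank 1
    end
end
disp(xrank')
```

Rows 1, 2 and 4 form the non-dominated front; row 3 is dominated by row 2 and row 5 by row 1. A solution never dominates itself because the strict inequality fails on both objectives.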
--------------------------------------------------------------------------------
/core/parseadf.m:
--------------------------------------------------------------------------------
1 | function [fcn, newexpr, args, seed] = parseadf(expr, opt)
2 | %PARSEADF Parse the ADF expression and return anonymous fcn, args, and seed
3 | %
4 | % Calling sequence:
5 | % [FCN, NEWEXPR, ARGS, SEED] = PARSEADF(EXPR, OPT)
6 | %
7 | % where FCN contains the string for the anonymous function or a handle,
8 | % depending on the option, NEWEXPR contains the new expression string with
9 | % generated arguments and ARGS contains a string containing these arguments
10 | % enclosed in ( ). SEED contains the gene seed for generating the initial
11 | % population. The input argument OPT is a string that contains either 's'
12 | % or 'f'. In case of 's' the function outputs FCN as string (default), and
13 | % in case of 'f' an anonymous function handle is returned.
14 | %
15 | % When designing your expressions, use any valid functions with the
16 | % following symbols representing input arguments:
17 | %
18 | % $n - Free connection point
19 | % #m - Preset random constant (PRC) point that later turns into an ERC
20 | % ?k - An ephemeral random constant (ERC) point
21 | % where n, m, and k are indices. You can thus have the same terminal
22 | % applied as several arguments in the function. The use of indices is
23 | % mandatory. All the input arguments are ultimately replaced by x1,x2,x3,..
24 | % and the corresponding expression (or anonymous function handle) is
25 | % returned.
26 | %
27 | % For example, input argument 'times($1,times(#1,plus($1,?1)))'
28 | %
29 | % Returns result '@(x1,x2,x3) times(x2,times(x1,plus(x2,x3)))'
30 | % with seed (#,$,?), the latter used during initial tree generation.
31 | %
32 | % Note that the function must accept at least one argument. Depending on
33 | % the context you can include constants into function calls. In the
34 | % algorithm, the insides of the ADF are never revealed, so it is treated as
35 | % just another function, with the big difference that you can specify
36 | % locations to which terminal nodes (PRCs and ERCs) are attached with 100%
37 | % probability during the initial creation of the population.
38 | %
39 | % (c) Aleksei Tepljakov 2017
40 | %
41 | % GPTIPS2F
42 |
43 | % Check input
44 | if ~isa(expr,'char')
45 | error('Expression must be a character array describing an ADF function.');
46 | end
47 |
48 | if nargin < 2
49 | opt = 's';
50 | end
51 |
52 | % Parse the arguments of the array
53 | argname = 'x'; % Default argument is "x"
54 | argind = 1; % The index of the argument
55 | seed = ''; % The seed is used when the initial population is built
56 | newexpr = expr; % Updated expression with proper arguments inserted
57 |
58 | % Parse the expression using regex
59 | regex = '[\$\#\?][0-9]+';
60 | params = unique(regexp(expr, regex, 'match'));
61 |
62 | for k=1:length(params)
63 | % Assign x1, x2, x3, etc. to corresponding arguments
64 | oldarg = params{k};
65 | newexpr = regexprep(newexpr, [regexptranslate('escape',oldarg) '(?![0-9])'], [argname num2str(argind)]); % lookahead stops '$1' matching inside '$10'
66 | argind = argind + 1;
67 |
68 | % Construct the seed
69 | seed = [seed oldarg(1) ','];
70 | end
71 |
72 | % Are there any arguments?
73 | if argind == 1
74 | error('The ADF must have at least one input argument.');
75 | end
76 |
77 | % Amend the seed
78 | seed = ['(' seed(1:end-1) ')'];
79 |
80 | % Create the anonymous function
81 | args = '';
82 | for k=1:(argind-1)
83 | args = [args argname num2str(k) ','];
84 | end
85 |
86 | % Amend the arguments and construct the expression
87 | args = ['(' args(1:end-1) ')'];
88 | myfun = ['@' args ' ' newexpr];
89 |
90 | % Assign correct output
91 | switch lower(opt)
92 | case 's'
93 | fcn = myfun;
94 | case 'f'
95 | fcn = eval(myfun);
96 | otherwise
97 | error('Unknown option given in second argument.');
98 | end
99 |
100 | end
101 |
102 |
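The documented example can be reproduced step by step. The sketch below repeats the parsing logic outside the function so the intermediate values are visible; variable names other than those in the source (`fcnstr`) are illustrative.

```matlab
expr = 'times($1,times(#1,plus($1,?1)))';

% unique also sorts, so the placeholders come out as {'#1','$1','?1'}
params = unique(regexp(expr,'[\$\#\?][0-9]+','match'));

newexpr = expr; seed = ''; args = '';
for k = 1:numel(params)
    newexpr = strrep(newexpr,params{k},sprintf('x%d',k));
    seed = [seed params{k}(1) ','];        % keep only the symbol type
    args = [args sprintf('x%d',k) ','];
end
seed = ['(' seed(1:end-1) ')'];
fcnstr = ['@(' args(1:end-1) ') ' newexpr];
disp(fcnstr)
disp(seed)
```

The argument order follows the sorted placeholder strings, not their order of appearance, which is why `#1` becomes `x1` although `$1` occurs first in the expression.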
--------------------------------------------------------------------------------
/core/picknode.m:
--------------------------------------------------------------------------------
1 | function position = picknode(expr,nodetype,gp)
2 | %PICKNODE Select a node (or nodes) of specified type from an encoded GP expression and return its position.
3 | %
4 | % POSITION = PICKNODE(EXPR,NODETYPE,GP) returns the POSITION where EXPR
5 | % is the symbolic expression, NODETYPE is the specified node type and GP
6 | % is the GPTIPS data struct.
7 | %
8 | % Remarks:
9 | %
10 | % For NODETYPEs 0, 3 or 4 this function returns the string POSITION of
11 | % the node or -1 if no appropriate node can be found. If NODETYPE is 5 or
12 | % 6 then POSITION is a vector of sorted node indices. If NODETYPE = 7
13 | % then POSITION is an unsorted vector of node indices, or -1 if none exist.
14 | %
15 | % Set NODETYPE argument as follows:
16 | %
17 | % 0 = any node
18 | % 1 = unused
19 | % 2 = unused
20 | % 3 = constant (ERC) selection only
21 | % 4 = input selection only
22 | % 5 = indices of nodes with a builtin MATLAB infix representation,
23 | % sorted left to right (offline use)
24 | % 6 = indices of all nodes, sorted left to right (offline use)
25 | % 7 = gets indices of cube and square functions (offline use)
26 | %
27 | % Copyright (c) 2009-2015 Dominic Searson
28 | %
29 | % GPTIPS 2
30 | %
31 | % See also EXTRACT, MUTATE, CROSSOVER
32 |
33 | %mask negative constants
34 | x = strrep(expr,'[-','["');
35 |
36 | if nodetype == 0 %pick any node
37 |
38 | %get indices of all function, constant and input node locations
39 | %NB char(97) ='a' and char(122) = 'z' and double('[') = 91
40 | xd = double(x);
41 | ind = find((xd <= 122 & xd >= 97) | xd==91);
42 |
43 | elseif nodetype == 3 %just constants
44 |
45 | ind = strfind(x,'[');
46 |
47 | elseif nodetype == 4 %just inputs
48 |
49 | ind = strfind(x,'x');
50 |
51 | elseif nodetype == 5 %special option, only selects plus, minus, times,rdivide,power (intended for offline use)
52 |
53 | idcount = 0; ind = [];
54 | for i=1:numel(gp.nodes.functions.name)
55 |
56 | if gp.nodes.functions.active(i)
57 | idcount = idcount+1;
58 | func = gp.nodes.functions.name{i};
59 |
60 | if strcmpi(func,'times') || strcmpi(func,'plus') ...
61 | || strcmpi(func,'minus') || strcmpi(func,'rdivide') || strcmpi(func,'power')
62 | id = gp.nodes.functions.afid(idcount);
63 | %look for id in string and concatenate to indices vector
64 | ind = [ind strfind(x,id)];
65 | end
66 |
67 | end
68 | end
69 | position = sort(ind);
70 | return
71 |
72 | elseif nodetype == 6 %offline use (all nodes, sorted left to right)
73 |
74 | %get indices of function and input locations
75 | temp_ind = find(double(x)<=122 & double(x)>=97);
76 |
77 | %locations of '@','%' and '&' which may have been added by
78 | %gpreformat.m
79 |
80 | %special case of post-run added power node
81 | temp_ind = [temp_ind strfind(x,'@')];
82 |
83 | %special case of post-run added times node
84 | temp_ind = [temp_ind strfind(x,'&')];
85 |
86 | %special case of post-run added exp node
87 | temp_ind = [temp_ind strfind(x,'%')];
88 |
89 | %add indices of constant locations
90 | ind = [temp_ind strfind(x,'[')];
91 |
92 | position = sort(ind);
93 | return
94 |
95 | elseif nodetype == 7 %gets all square and cube node indices
96 |
97 | idcount = 0;ind = [];
98 | for i=1:numel(gp.nodes.functions.name)
99 |
100 | if gp.nodes.functions.active(i)
101 | idcount = idcount+1;
102 | func = gp.nodes.functions.name{i};
103 |
104 | if strcmpi(func,'square') || strcmpi(func,'cube')
105 | id = gp.nodes.functions.afid(idcount);
106 | %look for id in string and concatenate to indices vector
107 | ind = [ind strfind(x,id)];
108 | end
109 |
110 | end
111 | end
112 | if isempty(ind)
113 | position = -1;
114 | else
115 | position = ind;
116 | end
117 | return;
118 |
119 | end
120 |
121 | %count legal nodes
122 | num_nodes = numel(ind);
123 |
124 | %if none legal, return -1
125 | if ~num_nodes
126 | position = -1;
127 | return;
128 | else
129 | %select a random node from those indexed
130 | fun_choice = ceil(rand*num_nodes);
131 | position = ind(fun_choice);
132 | end
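The index searches for node types 0, 3 and 4 rely only on character codes, so they can be tried on a plain string. A minimal sketch with a hypothetical encoded expression, in which `a` and `b` stand for function nodes:

```matlab
expr = 'a(x1,b([-2.5],x2))';          % hypothetical encoded expression
x = strrep(expr,'[-','["');           % mask the sign of negative constants
xd = double(x);

anyNode   = find((xd >= 97 & xd <= 122) | xd == 91);  % letters a-z or '['
constants = strfind(x,'[');                           % node type 3
inputs    = strfind(x,'x');                           % node type 4
disp(anyNode)
```

The mask replaces two characters with two characters, so every index found in `x` is also a valid position in the original `expr`.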
--------------------------------------------------------------------------------
/core/pref2inf.m:
--------------------------------------------------------------------------------
1 | function [sendup,status] = pref2inf(expr,gp)
2 | %PREF2INF Recursively extract arguments from a prefix expression and convert to infix where possible.
3 | %
4 | % [SENDUP,STATUS] = PREF2INF(EXPR,GP)
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also PICKNODE, EXTRACT, GPREFORMAT, TREE2EVALSTR
11 |
12 | %get indices of all nodes in current expression
13 | ind = picknode(expr,6,gp);
14 | if isempty(ind)
15 | sendup = expr; %exits if current expression has no further extractable args
16 | status = 0;
17 | return
18 | end
19 |
20 | %get first node
21 | ind = ind(1);
22 |
23 | %get number of arguments of node
24 | try
25 | args = gp.nodes.functions.arity_argt0(strfind(gp.nodes.functions.afid,expr(ind)));
26 | if isempty(args)
27 | % if has zero arity then exit
28 | sendup = expr;
29 | status = 0;
30 | return;
31 | end
32 | catch
33 | %also exit if error
34 | sendup = expr;
35 | status = 0;
36 | return;
37 | end
38 |
39 | if args == 1
40 |
41 | %get subtree rooted in this node
42 | [~,subtree] = extract(ind,expr);
43 | node = subtree(1);
44 |
45 | %get index of 2nd node in 'sub_tree' (i.e. first logical argument)
46 | keynodeInd = picknode(subtree,6,gp);
47 | keynodeInd = keynodeInd(2);
48 |
49 | %extract the 1st argument expression
50 | [~,arg1] = extract(keynodeInd,subtree);
51 |
52 | [rec1,rec1_status] = pref2inf(arg1,gp);
53 |
54 | if rec1_status==1
55 | arg1 = rec1;
56 | end
57 |
58 | sendup = [node '(' arg1 ')'];
59 | status = 1;
60 |
61 | elseif args > 2
62 |
63 | %get subtree rooted in this node
64 | [~,subtree] = extract(ind,expr);
65 | node = subtree(1);
66 |
67 | %get index of 2nd node in 'sub_tree' (i.e. first logical argument)
68 | keynodeInd = picknode(subtree,6,gp); %
69 | keynodeInd = keynodeInd(2);
70 |
71 | %extract the 1st argument expression
72 | [remainder,arg1] = extract(keynodeInd,subtree);
73 |
74 | %extract the second argument expression from the remainder
75 | %find the 1st node after $ in the remainder as this will be the
76 | %keynode of the 2nd argument we wish to extract
77 | keynodeInd = picknode(remainder,6,gp);
78 | tokenInd = strfind(remainder,'$');
79 |
80 | keynodeLoc = find(keynodeInd > tokenInd);
81 | keynodeInd = keynodeInd(keynodeLoc);
82 | keynode_ind_1 = keynodeInd(1);
83 | [remainder2,arg2] = extract(keynode_ind_1,remainder);
84 |
85 | %extract the third argument expression from the remainder
86 | %find the 1st node after $ in remainder2 as this will be the keynode of
87 | %the 3rd argument we wish to extract
88 | keynodeInd = picknode(remainder2,6,gp);
89 | tokenInd = strfind(remainder2,'$');
90 |
91 | keynodeLoc = find(keynodeInd > max(tokenInd));
92 | keynodeInd = keynodeInd(keynodeLoc);
93 | keynode_ind_1 = keynodeInd(1);
94 | [remainder3,arg3] = extract(keynode_ind_1,remainder2);
95 |
96 | [rec1,rec1_status] = pref2inf(arg1,gp);
97 | [rec2,rec2_status] = pref2inf(arg2,gp);
98 | [rec3,rec3_status] = pref2inf(arg3,gp);
99 |
100 | if rec1_status==1
101 | arg1 = rec1;
102 | end
103 |
104 | if rec2_status==1
105 | arg2 = rec2;
106 | end
107 |
108 | if rec3_status==1
109 | arg3 = rec3;
110 | end
111 |
112 | sendup = [node '(' arg1 ',' arg2 ',' arg3 ')'];
113 | status = 1;
114 |
115 | if args > 3
116 |
117 | %extract the fourth argument expression from the remainder
118 | %find the 1st node after $ in remainder3 as this will be the
119 | %keynode of the 4th argument we wish to extract
120 | keynodeInd = picknode(remainder3,6,gp);
121 | tokenInd = strfind(remainder3,'$');
122 |
123 | keynodeLoc = find(keynodeInd > max(tokenInd));
124 | keynodeInd = keynodeInd(keynodeLoc);
125 | keynode_ind_1 = keynodeInd(1);
126 | [~,arg4] = extract(keynode_ind_1,remainder3);
127 | [rec4,rec4_status] = pref2inf(arg4,gp);
128 |
129 | if rec4_status==1
130 | arg4 = rec4;
131 | end
132 |
133 | sendup = [node '(' arg1 ',' arg2 ',' arg3 ',' arg4 ')'];
134 | status = 1;
135 | end
136 |
137 | else %must have exactly 2 args
138 |
139 | ind = picknode(expr,6,gp);
140 | %get subtree rooted in this node
141 | [maintree,subtree] = extract(ind,expr);
142 | node = subtree(1);
143 |
144 | %get index of 2nd node in 'sub_tree' (i.e. first logical argument)
145 | keynodeInd = picknode(subtree,6,gp); %
146 | keynodeInd = keynodeInd(2);
147 |
148 | %extract the 1st argument expression
149 | [remainder,arg1] = extract(keynodeInd,subtree);
150 |
151 | %extract the second argument expression from the remainder
152 | %find the 1st node after $ in the remainder as this will be the
153 | %keynode of the 2nd argument we wish to extract
154 | keynodeInd = picknode(remainder,6,gp);
155 | tokenInd = strfind(remainder,'$');
156 |
157 | keynodeLoc = find(keynodeInd>tokenInd);
158 | keynodeInd = keynodeInd(keynodeLoc);
159 | keynode_ind_1 = keynodeInd(1);
160 | [~,arg2] = extract(keynode_ind_1,remainder);
161 |
162 | %process arguments of these arguments if any exist
163 | [rec1,rec1_status] = pref2inf(arg1,gp);
164 | [rec2,rec2_status] = pref2inf(arg2,gp);
165 |
166 | if rec1_status==1
167 | arg1 = rec1;
168 | end
169 |
170 | if rec2_status==1
171 | arg2 = rec2;
172 | end
173 |
174 | %If the node is of infix type (for Matlab symbolic purposes)
175 | % then send up the results differently
176 | afid_ind = strfind(gp.nodes.functions.afid, node);
177 | nodename = gp.nodes.functions.active_name_UC{afid_ind};
178 |
179 | if strcmpi(nodename,'times') || strcmpi(nodename,'minus') || ...
180 | strcmpi(nodename,'plus') || strcmpi(nodename,'rdivide') || strcmpi(nodename,'power')
181 |
182 | sendup = ['(' arg1 ')' node '(' arg2 ')'];
183 | else
184 | sendup = [node '(' arg1 ',' arg2 ')'];
185 | end
186 | sendup = strrep(maintree,'$',sendup);
187 |
188 | status = 1; %i.e. ok
189 | end
--------------------------------------------------------------------------------
/core/regressionErrorCharacteristic.m:
--------------------------------------------------------------------------------
1 | function [xdata,ydata,xdataref,ydataref] = regressionErrorCharacteristic(y,ypred)
2 | %REGRESSIONERRORCHARACTERISTIC Generates REC curve data using actual and predicted output vectors.
3 | %
4 | % [XDATA,YDATA] = REGRESSIONERRORCHARACTERISTIC(Y,YPRED) generates REC
5 | % (regression error characteristic) curve data for the model prediction
6 | % YPRED of the actual response Y.
7 | %
8 | % [XDATA,YDATA,XDATAREF,YDATAREF] = REGRESSIONERRORCHARACTERISTIC(Y,YPRED)
9 | % generates REC curve data as well as data for naive reference prediction
10 | % (in this case the mean of the Y data is used as a naive prediction of
11 | % any Y). This is similar to the ZeroR model in Weka, e.g. see
12 | % http://weka.wikispaces.com/ZeroR
13 | %
14 | % Remarks:
15 | %
16 | % Based on the method outlined in "Regression Error Characteristic
17 | % Curves", Jinbo Bi & Kristin P. Bennett, Proceedings of the Twentieth
18 | % International Conference on Machine Learning (ICML-2003), Washington
19 | % DC, 2003.
20 | %
21 | % Note:
22 | %
23 | % This function only generates REC data, to generate a graph use
24 | % COMPAREMODELSREC
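%
% Example (an illustrative sketch; the data values are made up and are
% not part of the GPTIPS distribution):
%
%    y     = [1; 2; 3; 4];           %actual response
%    ypred = [1.1; 1.8; 3.0; 4.5];   %model prediction
%    [xdata,ydata] = regressionErrorCharacteristic(y,ypred);
%    stairs(xdata,ydata);            %quick look at the curve
%    xlabel('Absolute error'); ylabel('Fraction of sample within error');
%
% Here the four absolute errors are all distinct, so YDATA steps through
% [0.25; 0.5; 0.75; 1].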
25 | %
26 | % Copyright (c) 2009-2015 Dominic Searson
27 | %
28 | % GPTIPS 2
29 | %
30 | % See also COMPAREMODELSREC
31 |
32 | %first compute loss function abs(y-ypred)
33 | err = abs(y - ypred);
34 | m = numel(err);
35 | eprev = 0;
36 | correct = 0;
37 |
38 | %count of plottable points on curve
39 | datapoints = 1;
40 |
41 | %sort errors
42 | errs= sort(err);
43 |
44 | %generate plot data for model prediction
45 | %(x - abs. error, y - fraction of sample accurate within error)
46 | for i=1:m
47 | if errs(i) > eprev
48 | xdata(datapoints,1)= eprev;
49 | ydata(datapoints,1) = correct/m;
50 | datapoints = datapoints + 1;
51 | eprev = errs(i);
52 | end
53 | correct = correct + 1;
54 | end
55 |
56 | xdata(datapoints,1) = errs(m);
57 | ydata(datapoints,1) = correct/m;
58 |
59 | %next do the same for reference model
60 | err = abs(y-mean(y));
61 | errs= sort(err);
62 | eprev = 0;
63 | correct = 0;
64 | datapoints = 1;
65 |
66 | %generate plot data for reference prediction
67 | %(x - abs. error, y - fraction of sample accurate within error)
68 | for i=1:m
69 | if errs(i) > eprev
70 | xdataref(datapoints,1) = eprev;
71 | ydataref(datapoints,1) = correct/m;
72 | datapoints = datapoints + 1;
73 | eprev = errs(i);
74 | end
75 | correct = correct + 1;
76 | end
77 |
78 | xdataref(datapoints,1) = errs(m);
79 | ydataref(datapoints,1) = correct/m;
--------------------------------------------------------------------------------
/core/rungp.m:
--------------------------------------------------------------------------------
1 | function gpout = rungp(configFile)
2 | %RUNGP Runs GPTIPS 2 using the specified configuration file.
3 | %
4 | % GP = RUNGP(@CONFIGFILE) runs GPTIPS using the user parameters contained
5 | % in the configuration file CONFIGFILE.M and returns the results in the
6 | % GP data structure. Post run analysis commands may then be run on this
7 | % structure, e.g. POPBROWSER(GP), SUMMARY(GP), RUNTREE(GP,'best') etc.
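%
% Example (a sketch; assumes the GPTIPS folders, including the examples
% folder containing CUBIC_CONFIG, are on the MATLAB path):
%
%    gp = rungp(@cubic_config);   %run using the cubic example config
%    summary(gp);                 %plot fitness history of the run
%    runtree(gp,'best');          %evaluate the best individual found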
8 | %
9 | % Demos:
10 | %
11 | % Use GPDEMO1 at the command line for a demonstration of naive symbolic
12 | % regression (not multigene).
13 | %
14 | % For demos involving multigene symbolic regression see GPDEMO2, GPDEMO3
15 | % and GPDEMO4.
16 | %
17 | % Additionally, example config files for multigene regression on data
18 | % generated by several non-linear mathematical functions are
19 | % CUBIC_CONFIG, RIPPLE_CONFIG, UBALL_CONFIG and SALUSTOWICZ1D_CONFIG.
20 | %
21 | % Remarks:
22 | %
23 | % For further information see the GPTIPS website at:
24 | %
25 | % https://sites.google.com/site/gptips4matlab/
26 | %
27 | % Also, see the following paper for a summary of GPTIPS 2 capabilities:
28 | %
29 | % Searson DP, GPTIPS 2: an open-source software platform for symbolic
30 | % data mining, Chapter 22 in Gandomi, AH, Alavi, AH, Ryan, C (eds).
31 | % Springer Handbook of Genetic Programming Applications, Springer, 2015.
32 | % (author proof freely available at http://arxiv.org/abs/1412.4690)
33 | %
34 | % ----------------------------------------------------------------------
35 | %
36 | % This program is free software: you can redistribute it and/or modify it
37 | % under the terms of the GNU General Public License as published by the
38 | % Free Software Foundation, either version 3 of the License, or (at your
39 | % option) any later version.
40 | %
41 | % This program is distributed in the hope that it will be useful, but
42 | % WITHOUT ANY WARRANTY; without even the implied warranty of
43 | % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
44 | % General Public License for more details.
45 | %
46 | % You should have received a copy of the GNU General Public License along
47 | % with this program. If not, see <http://www.gnu.org/licenses/>.
48 | %
49 | % ----------------------------------------------------------------------
50 | %
51 | % Copyright (c) 2009-2015 Dominic Searson
52 | %
53 | % GPTIPS 2
54 | %
55 | % See also GPDEMO1, GPDEMO2, GPDEMO3, GPDEMO4, CUBIC_CONFIG,
56 | % UBALL_CONFIG, SALUSTOWICZ1D_CONFIG, RIPPLE_CONFIG
57 |
58 | %if no config supplied then report error
59 | if nargin < 1 || isempty(configFile)
60 | disp('To run GPTIPS a configuration m file function handle must be specified');
61 | disp('e.g. GP = RUNGP(@CUBIC_CONFIG)');
62 | return;
63 | end
64 |
65 | %generate prepared data structure with some default run parameter values
66 | gp = gpdefaults();
67 |
68 | %setup random number generator
69 | gp = gprandom(gp);
70 |
71 | %run user configuration file
72 | gp = feval(configFile,gp);
73 |
74 | %multiple runs (are merged after each run)
75 | for run=1:gp.runcontrol.runs
76 |
77 | gp.state.run = run;
78 |
79 | %run user configuration file each time for subsequent runs, unless
80 | %suppressed (default)
81 | if run > 1 && ~gp.runcontrol.suppressConfig
82 | gp = feval(configFile,gp);
83 | end
84 |
85 | %attach name of config file for reference
86 | gp.info.configFile = configFile;
87 |
88 | %perform error checks
89 | gp = gpcheck(gp);
90 |
91 | %perform initialisation
92 | gp = gpinit(gp);
93 |
94 | %main generation loop
95 | for count=1:gp.runcontrol.num_gen
96 |
97 | %start gen timer
98 | gp = gptic(gp);
99 |
100 | if count == 1
101 |
102 | %generate the initial population
103 | gp = initbuild(gp);
104 |
105 | else
106 |
107 | %use crossover, mutation etc. to generate a new population
108 | gp = popbuild(gp);
109 |
110 | end
111 |
112 | %calculate fitnesses of population members
113 | gp = evalfitness(gp);
114 |
115 | %update run statistics
116 | gp = updatestats(gp);
117 |
118 | %call user defined function
119 | gp = gp_userfcn(gp);
120 |
121 | %display current stats on screen
122 | displaystats(gp);
123 |
124 | %end gen timer
125 | gp = gptoc(gp);
126 |
127 | %check termination criteria
128 | gp = gpterminate(gp);
129 |
130 | %save gp structure
131 | if gp.runcontrol.savefreq && ...
132 | ~mod(gp.state.count-1,gp.runcontrol.savefreq)
133 | save gptips_checkpoint gp
134 | end
135 |
136 | %break out of generation loop if termination required
137 | if gp.state.terminate
138 | if ~gp.runcontrol.quiet
139 | disp(['Termination criterion met (' ...
140 | gp.state.terminationReason '). Terminating run.']);
141 | end
142 | break;
143 | end
144 |
145 | end %end generation loop
146 |
147 | %finalise the run
148 | gp = gpfinalise(gp);
149 |
150 | %merge independent runs
151 | if gp.state.run > 1
152 | gpout = mergegp(gpout,gp,true);
153 | else
154 | gpout = gp;
155 | end
156 |
157 | end %end independent run loop
--------------------------------------------------------------------------------
/core/runtree.m:
--------------------------------------------------------------------------------
1 | function fitness = runtree(gp,ID,knockout)
2 | %RUNTREE Run the fitness function on an individual in the current population.
3 | %
4 | % RUNTREE(GP,ID) runs the fitness function using a user selected
5 | % individual with numeric identifier ID from the current population.
6 | % Despite the name RUNTREE will run either a single or a multigene
7 | % individual.
8 | %
9 | % RUNTREE(GP,'best') runs the fitness function using the 'best'
10 | % individual in the current population.
11 | %
12 | % RUNTREE(GP,'valbest') runs the fitness function using the best
13 | % individual on the validation data set (if it exists - for symbolic
14 | % regression problems that use the fitness function REGRESSMULTI_FITFUN).
15 | %
16 | % RUNTREE(GP,'testbest') runs the fitness function using the best
17 | % individual on the test data set (if it exists - for symbolic
18 | % regression problems that use the fitness function REGRESSMULTI_FITFUN).
19 | %
20 | % RUNTREE(GP,GPMODEL) runs the fitness function using the GPMODEL struct
21 | % representing a multigene regression model, i.e. the struct returned by
22 | % the functions GPMODEL2STRUCT or GENES2GPMODEL.
23 | %
24 | % FITNESS = RUNTREE(GP,ID) also returns the FITNESS value as returned by
25 | % the fitness function specified in gp.fitness.fitfun
26 | %
27 | % Additional functionality:
28 | %
29 | % RUNTREE can also accept an optional third argument KNOCKOUT which
30 | % should be a boolean vector with the same number of entries as genes in
31 | % the individual to be run. This evaluates the individual with the
32 | % indicated genes removed ('knocked out').
33 | %
34 | % E.g. RUNTREE(GP,'best',[1 0 0 1]) knocks out the 1st and 4th genes from
35 | % the best individual of the run. In the case of multigene symbolic
36 | % regression (i.e. if the fitness function is REGRESSMULTI_FITFUN) the
37 | % weights for the remaining genes are recomputed by least squares
38 | % regression on the training data.
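%
% Example (a sketch; assumes GP was returned by RUNGP for a multigene
% regression run and that the best individual happens to have 4 genes):
%
%    fitFull = runtree(gp,'best');             %all genes present
%    fitKO   = runtree(gp,'best',[0 1 0 0]);   %2nd gene knocked out
%
% Comparing FITFULL with FITKO gives a rough indication of how much the
% knocked out gene contributes to the model.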
39 | %
40 | % Copyright (c) 2009-2015 Dominic Searson
41 | %
42 | % GPTIPS 2
43 | %
44 | % See also EVALFITNESS, GPMODELREPORT
45 |
46 | if nargin < 2 || isempty(ID)
47 | disp('Basic usage is RUNTREE(GP,ID) where ID is the population identifier of the selected individual.');
48 | disp('or RUNTREE(GP,''best'') to run the best individual in the population.');
49 |
50 | if nargout > 0
51 | fitness = [];
52 | end
53 |
54 | return;
55 |
56 | elseif nargin < 3
57 | knockout = 0;
58 | end
59 |
60 | if isempty(knockout) || ~any(knockout)
61 | doknockout = false;
62 | else
63 | doknockout = true;
64 | end
65 |
66 | i = ID;
67 |
68 | if isnumeric(ID)
69 |
70 | if i > 0 && i <= gp.runcontrol.pop_size
71 |
72 | %set this in case the fitness function needs to retrieve
73 | %the right returnvalues
74 | gp.state.current_individual = i;
75 | treestrs = tree2evalstr(gp.pop{i},gp);
76 |
77 | %if genes are being knocked out then remove appropriate gene
78 | if doknockout
79 | treestrs = kogene(treestrs, knockout);
80 | gp.state.force_compute_theta = true; %need to recompute gene weights if doing symbolic regression
81 | gp.userdata.showgraphs = true; %if using symbolic regression, plot graphs
82 | end
83 |
84 | fitness = feval(gp.fitness.fitfun,treestrs,gp);
85 |
86 | else
87 | error('A valid population member ID must be entered, e.g. 1, 99 or ''best''');
88 | end
89 |
90 | elseif ischar(ID) && strcmpi(ID,'best')
91 |
92 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.best.returnvalues;
93 | treestrs = gp.results.best.eval_individual;
94 |
95 | if doknockout
96 | treestrs = kogene(treestrs, knockout);
97 | gp.state.force_compute_theta = true;
98 | gp.userdata.showgraphs = true;
99 | end
100 |
101 | fitness = feval(gp.fitness.fitfun,treestrs,gp);
102 |
103 | elseif ischar(ID) && strcmpi(ID,'valbest')
104 |
105 | % check that validation results/data present
106 | if ~isfield(gp.results,'valbest')
107 | disp('No validation results/data were found. Try runtree(gp,''best'') instead.');
108 | return;
109 | end
110 |
111 | %copy "valbest" return values to a slot in the "current" return values
112 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.valbest.returnvalues;
113 |
114 | treestrs = gp.results.valbest.eval_individual;
115 |
116 | if doknockout
117 | treestrs = kogene(treestrs, knockout);
118 | gp.state.force_compute_theta = true;
119 | gp.userdata.showgraphs = true;
120 | end
121 |
122 | fitness=feval(gp.fitness.fitfun,treestrs,gp);
123 |
124 | elseif ischar(ID) && strcmpi(ID,'testbest')
125 |
126 | % check that test data results are present
127 | if ~isfield(gp.results,'testbest')
128 | disp('No test results/data were found. Try runtree(gp,''best'') instead.');
129 | return;
130 | end
131 |
132 | %copy "testbest" return values to a slot in the "current" return values
133 | gp.fitness.returnvalues{gp.state.current_individual} = gp.results.testbest.returnvalues;
134 |
135 | treestrs = gp.results.testbest.eval_individual;
136 |
137 | if doknockout
138 | treestrs = kogene(treestrs, knockout);
139 | gp.state.force_compute_theta = true;
140 | gp.userdata.showgraphs = true;
141 | end
142 |
143 | fitness=feval(gp.fitness.fitfun,treestrs,gp);
144 |
145 | %if the selected individual is a GPMODEL struct (NB knockout disabled
146 | %for this form)
147 | elseif isa(ID,'struct') && isfield(ID,'source') && ...
148 | (strcmpi(ID.source,'gpmodel2struct') || strcmpi(ID.source,'genes2gpmodel') )
149 |
150 | treestrs = ID.genes.geneStrs;
151 | treestrs = tree2evalstr(treestrs,gp);
152 | gp.fitness.returnvalues{gp.state.current_individual} = ID.genes.geneWeights;
153 | fitness = feval(gp.fitness.fitfun,treestrs,gp);
154 |
155 | else
156 | error('Invalid argument.');
157 | end
--------------------------------------------------------------------------------
/core/scangenes.m:
--------------------------------------------------------------------------------
1 | function xhits = scangenes(genes,numx)
2 | %SCANGENES Scan a single multigene individual for all input variables and return a frequency vector.
3 | %
4 | % XHITS = SCANGENES(GENES,NUMX) scans the cell array GENES for NUMX
5 | % input variables and returns the frequency vector XHITS.
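%
% Example (an illustrative sketch; the gene strings are made up):
%
%    genes = {'plus(x1,times(x2,x1))','x3'};
%    xhits = scangenes(genes,3);
%
% Here XHITS is [2 1 1]: x1 appears twice in the first gene, x2 once, and
% the second gene is the single terminal x3.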
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also GPMODELVARS
12 |
13 | numgenes = length(genes);
14 | xhits = zeros(1,numx);
15 |
16 | %loop through inputs
17 | for i=1:numx
18 | k1 = [];
19 | k2 = [];
20 | istr = ['x' int2str(i)];
21 |
22 | %loop through genes, look for current input
23 | for j=1:numgenes
24 |
25 | k1 = strfind(genes{j}, [istr ',']);
26 | k2 = strfind(genes{j},[istr ')']);
27 |
28 | %workaround for special case (trees containing a single terminal node)
29 | k3 = strfind(genes{j},istr);
30 | if ~isempty(k3) && numel(k3) == 1
31 | if length(istr) == length(genes{j})
32 | xhits(i) = xhits(i) + 1;
33 | end
34 | end
35 |
36 | if ~isempty(k1)
37 | xhits(i) = xhits(i) + length(k1);
38 | end
39 |
40 | if ~isempty(k2)
41 | xhits(i) = xhits(i) + length(k2);
42 | end
43 |
44 | end
45 |
46 | end
--------------------------------------------------------------------------------
/core/selection.m:
--------------------------------------------------------------------------------
1 | function ID = selection(gp)
2 | %SELECTION Selects an individual from the current population.
3 | %
4 | % Selection is based on fitness and/or complexity using either regular or
5 | % Pareto tournaments.
6 | %
7 | % ID = SELECTION(GP) returns the numeric identifier of the selected
8 | % individual.
9 | %
10 | % Remarks:
11 | %
12 | % Selection method is either:
13 | %
14 | % A) Standard fitness based tournament (with the option of lexicographic
15 | % selection pressure).
16 | %
17 | % This is always used when the field GP.SELECTION.TOURNAMENT.P_PARETO
18 | % equals 0.
19 | %
20 | % OR
21 | %
22 | % B) Pareto tournament
23 | %
24 | % This is used with a probability equal to
25 | % GP.SELECTION.TOURNAMENT.P_PARETO
26 | %
27 | % Remarks:
28 | %
29 | % Method A (standard tournament)
30 | % ------------------------------
31 | %
32 | % For a tournament of size N
33 | %
34 | % 1) N individuals are randomly selected from the population with
35 | % reselection allowed.
36 | %
37 | % 2) The population index of the best individual in the tournament is
38 | % returned.
39 | %
40 | % 3) If two or more individuals in the tournament have the same best
41 | % fitness then one of these is selected randomly, unless
42 | % GP.SELECTION.TOURNAMENT.LEX_PRESSURE is set to true. In this case the
43 | % one with the lowest complexity is selected (if there is more than one
44 | % individual with the best fitness AND the same complexity then one of
45 | % these individuals is randomly selected.)
46 | %
47 | % OR
48 | %
49 | % Method B (Pareto tournament)
50 | % ----------------------------
51 | %
52 | % This is used probabilistically when GP.SELECTION.TOURNAMENT.P_PARETO > 0
53 | %
54 | % For a tournament of size N
55 | %
56 | % 1) N individuals are randomly selected from the population with
57 | % reselection allowed.
58 | %
59 | % 2) These N individuals are then Pareto sorted (sorted using fitness and
60 | % complexity as competing objectives) to obtain a (rank 1) Pareto set of
61 | % individuals that are non-dominated in terms of both fitness and
62 | % complexity (this set can be thought of as being the Pareto front of the
63 | % tournament). The Pareto sort is based on the sorting method
64 | % described on page 184 of:
65 | %
66 | % "A fast and elitist multiobjective genetic algorithm: NSGA-II" by
67 | % Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, T. Meyarivan IEEE
68 | % Transactions on Evolutionary Computation VOL. 6, NO. 2, APRIL 2002, pp.
69 | % 182-197.
70 | %
71 | % 3) A random individual is then selected from the Pareto set and its
72 | % index ID in the population is returned.
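%
% Illustration of the rank 1 set in step 2 of Method B (a naive sketch
% with made-up values; this is NOT the NDFSORT_RANK1 implementation). Each
% row of MO is [fitness complexity] with both objectives to be minimised.
% A row is dominated if another row is at least as good on both
% objectives and strictly better on at least one.
%
%    mo = [0.5 10; 0.4 12; 0.6 8; 0.5 12];
%    n = size(mo,1);
%    rank1 = true(n,1);
%    for i = 1:n
%        for j = 1:n
%            if all(mo(j,:) <= mo(i,:)) && any(mo(j,:) < mo(i,:))
%                rank1(i) = false;   %row i is dominated by row j
%            end
%        end
%    end
%
% Rows 1 to 3 are non-dominated; row 4 is dominated by rows 1 and 2, so
% RANK1 is [true; true; true; false].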
73 | %
74 | % Copyright (c) 2009-2015 Dominic Searson
75 | %
76 | % GPTIPS 2
77 | %
78 | % See also POPBUILD, INITBUILD
79 |
80 | %Method A
81 | if rand >= gp.selection.tournament.p_pareto
82 |
83 | %pick N individuals at random
84 | tour_ind = ceil(rand(gp.selection.tournament.size,1)*gp.runcontrol.pop_size);
85 |
86 | %retrieve fitness values of tournament members
87 | tour_fitness = gp.fitness.values(tour_ind);
88 |
89 | %check if max or min problem
90 | if gp.fitness.minimisation
91 | bestfitness = min(tour_fitness);
92 | else
93 | bestfitness = max(tour_fitness);
94 | end
95 |
96 | %locate indices of best individuals in tournament
97 | bestfitness_tour_ind = find(tour_fitness == bestfitness);
98 | number_of_best = numel(bestfitness_tour_ind);
99 |
100 | %use plain lexicographic parsimony pressure a la Sean Luke
101 | %and Liviu Panait (Lexicographic Parsimony Pressure, GECCO 2002, pp. 829-836).
102 | if number_of_best > 1
103 |
104 | if gp.selection.tournament.lex_pressure
105 |
106 | tcomps = gp.fitness.complexity(tour_ind(bestfitness_tour_ind));
107 |
108 | [~,min_ind] = min(tcomps);
109 | bestfitness_tour_ind = bestfitness_tour_ind(min_ind);
110 |
111 | else %otherwise randomly pick one
112 |
113 | bestfitness_tour_ind = bestfitness_tour_ind(ceil(rand*number_of_best));
114 |
115 | end
116 | end
117 |
118 | ID = tour_ind(bestfitness_tour_ind);
119 |
120 | else %Method B: Pareto tournament
121 |
122 | %pick N individuals at random
123 | tour_ind = ceil(rand(gp.selection.tournament.size,1)*gp.runcontrol.pop_size);
124 |
125 | %retrieve fitness values of tournament members
126 | tour_fitness = gp.fitness.values(tour_ind);
127 |
128 | %retrieve complexities of tournament members
129 | tour_comp = gp.fitness.complexity(tour_ind);
130 |
131 | %perform fast pareto sort on tournament members
132 | %and select winner randomly from rank 1 solutions
133 | if gp.fitness.minimisation
134 | mo = [tour_fitness tour_comp];
135 | else
136 | mo = [ (-1 *tour_fitness) tour_comp];
137 | end
138 |
139 | %remove any 'infs' from consideration and recall if necessary
140 | infs= isinf(mo(:,1));
141 | mo(infs,:)=[];
142 | if isempty(mo)
143 | ID = selection(gp);
144 | return;
145 | end
146 |
147 | tour_ind(infs,:)=[];
148 |
149 | rank = ndfsort_rank1(mo);
150 | rank1_tour_ind = tour_ind(rank == 1);
151 | num_rank1 = numel(rank1_tour_ind);
152 | ID = rank1_tour_ind(ceil(rand * num_rank1));
153 | end
--------------------------------------------------------------------------------
/core/standaloneModelStats.m:
--------------------------------------------------------------------------------
1 | function stats = standaloneModelStats(yactual,ypredicted)
2 | %STANDALONEMODELSTATS Compute model performance stats for actual and predicted values.
3 | %
4 | % STATS = STANDALONEMODELSTATS(YACTUAL,YPREDICTED) returns a structure
5 | % STATS containing model performance metrics.
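%
% Example (an illustrative sketch with made-up values):
%
%    yactual    = [1; 2; 3; 4];
%    ypredicted = [1; 2; 3; 5];
%    stats = standaloneModelStats(yactual,ypredicted);
%
% The only non-zero residual is -1, so STATS.MAE and STATS.MSE are both
% 0.25 and STATS.RMSE is 0.5.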
6 | %
7 | % Copyright (c) 2009-2015 Dominic Searson
8 | %
9 | % GPTIPS 2
10 |
11 | %residuals
12 | e = (yactual-ypredicted);
13 | stats.e = e;
14 |
15 | %rsquared
16 | stats.rsq = 1 - sum(e.^2)/sum( (yactual-mean(yactual)).^2 );
17 |
18 | %corrcoef
19 | c = corrcoef(yactual, ypredicted);
20 | stats.r = c(1,2);
21 |
22 | %rmse
23 | stats.rmse = sqrt(mean(e.^2));
24 |
25 | %mae
26 | stats.mae = mean(abs(e));
27 |
28 | %mse
29 | stats.mse = mean(e.^2);
30 |
31 |
32 |
--------------------------------------------------------------------------------
/core/summary.m:
--------------------------------------------------------------------------------
1 | function summary(gp,plotlog)
2 | %SUMMARY Plots basic summary information from a run.
3 | %
4 | % By default, log(fitness) is plotted rather than raw fitness when all
5 | % best fitness history values are > 0. Use SUMMARY(GP,FALSE) to plot raw
6 | % fitness values.
7 | %
8 | % Copyright (c) 2009-2015 Dominic Searson
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also RUNTREE, POPBROWSER, GPPOPVARS, GPMODELVARS
13 |
14 | if nargin < 1
15 | disp('Usage is SUMMARY(GP) to plot log fitness values or SUMMARY(GP,FALSE) to plot raw fitness values.');
16 | return;
17 | end
18 |
19 | %plot log of fitness/rmse by default
20 | if nargin < 2 || isempty(plotlog)
21 | plotlog = true;
22 | end
23 |
24 | %check if this is a merged gp structure, because the run history data will
25 | %only apply to the first structure in the merge.
26 | if gp.info.merged
27 | mergeStr = '1st run in merged population';
28 | else
29 | mergeStr = 'run';
30 | end
31 |
32 | if ~isempty(gp.userdata.name)
33 | setname = ['. Data: ' gp.userdata.name];
34 | else
35 | setname='';
36 | end
37 |
38 | if strncmpi('regressmulti',func2str(gp.fitness.fitfun),12)
39 | ylab1 = 'Log RMSE';
40 | ylab2 = 'RMSE';
41 | else
42 | ylab1 = ['Log ' gp.fitness.label];
43 | ylab2 = gp.fitness.label;
44 | end
45 |
46 | if any(gp.results.history.bestfitness <= 0)
47 | plotlog = false;
48 | end
49 |
50 | h = figure;
51 | set(h,'name','GPTIPS 2 run summary','numbertitle','off');
52 |
53 | if ~plotlog && strncmpi('regressmulti',func2str(gp.fitness.fitfun),12)
54 | ylab1 = 'RMSE';
55 | end
56 |
57 |
58 | if plotlog
59 | subplot(2,1,1);
60 | stairs(0:1:gp.state.count-1,log(gp.results.history.bestfitness),'LineWidth',2,'Color',[0 0.45 0.74]);
61 | ylabel(ylab1);
62 | else
63 | subplot(2,1,1);
64 | stairs(0:1:gp.state.count-1,gp.results.history.bestfitness,'LineWidth',2,'Color',[0 0.45 0.74]);
65 | ylabel(ylab2);
66 | end
67 | a=axis;
68 | a(2) = gp.state.count-1;axis(a);
69 | grid on;
70 |
71 |
72 | title({['Summary of ' mergeStr], ['Config: ' char(gp.info.configFile) setname '.']},'interpreter','none','fontWeight','bold');
73 | xlabel('Generation');
74 | legend('Best fitness');
75 | legend boxoff;
76 |
77 | subplot(2,1,2);
78 | hold on;
79 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness,'-x','LineWidth',2,'Color',[0.85 0.33 0.1]);
80 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness+gp.results.history.std_devfitness,'r:');hold on;
81 | plot([0:1:gp.state.count-1],gp.results.history.meanfitness-gp.results.history.std_devfitness,'r:');
82 | a=axis;
83 | a(2) = gp.state.count-1;axis(a);
84 | set(gca,'box','on');
85 | legend('Mean fitness (+ - 1 std. dev)');
86 | legend boxoff;
87 | xlabel('Generation');
88 | ylabel(ylab2);
89 | grid on;
90 | hold off;
--------------------------------------------------------------------------------
/core/tree2evalstr.m:
--------------------------------------------------------------------------------
1 | function decodedArray = tree2evalstr(encodedArray,gp)
2 | %TREE2EVALSTR Converts encoded tree expressions into math expressions that MATLAB can evaluate directly.
3 | %
4 | % DECODEDARRAY = TREE2EVALSTR(ENCODEDARRAY,GP) processes the encoded
5 | % symbolic expressions in the cell array ENCODEDARRAY using the
6 | % information in the GP data structure and creates a cell array
7 | % DECODEDARRAY, each element of which is an evaluable Matlab string
8 | % expression.
9 | %
10 | % Remarks:
11 | %
12 | % GPTIPS uses an encoded string representation of expressions which is not
13 | % very human readable, but is fairly compact and makes events like
14 | % crossover and mutation easier to handle. An example of such an
15 | % expression is a(b(x1,a(x4,x3)),c(x2,x1)). Before an expression can be
16 | % evaluated it is converted using TREE2EVALSTR to produce an evaluable
17 | % math expression, e.g. times(minus(x1,times(x4,x3)),plus(x2,x1)).
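%
% The substitution can be sketched in isolation as follows (the node
% identifiers and names are those of the example above and are hard coded
% here for illustration; in a real run they come from the fields
% GP.NODES.FUNCTIONS.AFID and GP.NODES.FUNCTIONS.ACTIVE_NAME_UC):
%
%    afid  = 'abc';                       %one character per function node
%    names = {'TIMES','MINUS','PLUS'};    %upper case function names
%    expr  = 'a(b(x1,a(x4,x3)),c(x2,x1))';
%    for j = 1:numel(afid)
%        expr = strrep(expr,afid(j),names{j});
%    end
%    expr = lower(expr);
%
% EXPR is then 'times(minus(x1,times(x4,x3)),plus(x2,x1))'. Using upper
% case names during substitution appears to be what stops the lower case
% identifiers matching characters in names already inserted.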
18 | %
19 | % Copyright (c) 2009-2015 Dominic Searson
20 | %
21 | % GPTIPS 2
22 | %
23 | % See also EVALFITNESS, TREEGEN, GPREFORMAT
24 |
25 | %loop through active function list and replace with function names
26 | for j=1:gp.nodes.functions.num_active
27 | encodedArray = strrep(encodedArray,gp.nodes.functions.afid(j),gp.nodes.functions.active_name_UC{j});
28 | end
29 |
30 | decodedArray = lower(encodedArray);
--------------------------------------------------------------------------------
/core/uniquegenes.m:
--------------------------------------------------------------------------------
1 | function genes = uniquegenes(gp,modelList)
2 | %UNIQUEGENES Returns a GENES structure containing the unique genes in a population.
3 | %
4 | % GENES = UNIQUEGENES(GP) scans through the symbolic regression models
5 | % stored in the population contained in GP and extracts the unique set of
6 | % genes that makes up the population of 'valid' models. This unique set
7 | % is returned in a MATLAB data struct GENES. (See GPMODEL2STRUCT for what
8 | % constitutes a 'valid' model.)
9 | %
10 | % GENES = UNIQUEGENES(GP, MODELLIST) does the same but only scans the
11 | % population members with indices contained in the row vector MODELLIST.
12 | %
13 | % E.g. GENES = UNIQUEGENES(GP, [1 3 9 10]) returns a structure GENES
14 | % containing the unique genes in the models from GP with population
15 | % indices 1, 3, 9 and 10.
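%
% Example (a sketch; assumes GP comes from a multigene regression run and
% that the Symbolic Math Toolbox is installed):
%
%    genes = uniquegenes(gp,[1 3 9 10]);
%    disp(genes.numUniqueGenes);       %number of unique simplified genes
%    disp(genes.uniqueGenesDecoded);   %their decoded string forms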
16 | %
17 | % Copyright (c) 2009-2015 Dominic Searson
18 | %
19 | % GPTIPS 2
20 | %
21 | % See also GENEFILTER, GENEBROWSER, GPMODELFILTER, GPMODEL2STRUCT
22 |
23 | if nargin < 1
24 | disp('Basic usage is UNIQUEGENES(GP).');
25 | if nargout > 0
26 | genes = [];
27 | end
28 | return
29 | end
30 |
31 | if ~gp.info.toolbox.symbolic()
32 | error('The Symbolic Math Toolbox is required to use this function.');
33 | end
34 |
35 | if nargin < 2
36 | modelList = 1:gp.runcontrol.pop_size;
37 | end
38 |
39 | if ~strncmpi(func2str(gp.fitness.fitfun),'regressmulti',12)
40 | error('This function is intended only for use on populations containing multigene regression models.');
41 | end
42 |
43 | if ~isnumeric(modelList) || size(modelList,1) > 1
44 | error('Supplied model list must be a row vector of model indices.');
45 | end
46 |
47 | verOld = verLessThan('matlab', '7.7.0');
48 |
49 | numSuppliedModels = numel(modelList);
50 |
51 | %check model validity
52 | numGenes = 0;
53 | models = cell(numSuppliedModels,1);
54 | disp(['Scanning ' num2str(numSuppliedModels) ' models ...']);
55 |
56 | for i = 1:numSuppliedModels
57 | models{i} = gpmodel2struct(gp,modelList(i),false,false,false);
58 | if ~models{i}.valid
59 | %if model is 'invalid' do nothing and skip to next model
60 | else
61 | numGenes = numGenes + models{i}.genes.num_genes;
62 | end
63 | end
64 |
65 | if numGenes == 0
66 | error('No valid models found in supplied population list.');
67 | end
68 |
69 | %add in genes from 'best' model on training data and 'valbest' and
70 | %'testbest'
71 | if numSuppliedModels == gp.runcontrol.pop_size
72 | modelbest = gpmodel2struct(gp,'best', false, false, false);
73 | modelvalbest = gpmodel2struct(gp,'valbest', false, false, false);
74 | modeltestbest = gpmodel2struct(gp,'testbest', false, false, false);
75 |
76 | numGenes = numGenes + modelbest.genes.num_genes;
77 |
78 | if modelvalbest.valid
79 | numGenes = numGenes + modelvalbest.genes.num_genes;
80 | end
81 |
82 | if modeltestbest.valid
83 | numGenes = numGenes + modeltestbest.genes.num_genes;
84 | end
85 | else
86 | modelvalbest.valid = false;
87 | modeltestbest.valid = false;
88 | modelbest.valid = false;
89 | end
90 |
91 | %loop through valid models and get all genes
92 | allGenes = cell(numGenes,1);
93 | offset = 0;
94 |
95 | for i=1:numSuppliedModels
96 | if models{i}.valid
97 | numModelGenes = models{i}.genes.num_genes;
98 | allGenes(1+offset:numModelGenes+offset) = models{i}.genes.geneStrs;
99 | offset = offset + numModelGenes;
100 | end
101 | end
102 |
103 | if modelbest.valid
104 | numModelGenes = modelbest.genes.num_genes;
105 | allGenes(1+offset:numModelGenes+offset) = modelbest.genes.geneStrs;
106 | offset = offset + numModelGenes;
107 | end
108 |
109 | if modelvalbest.valid
110 | numModelGenes = modelvalbest.genes.num_genes;
111 | allGenes(1+offset:numModelGenes+offset) = modelvalbest.genes.geneStrs;
112 | offset = offset + numModelGenes;
113 | end
114 |
115 | if modeltestbest.valid
116 | numModelGenes = modeltestbest.genes.num_genes;
117 | allGenes(1+offset:numModelGenes+offset) = modeltestbest.genes.geneStrs;
118 | offset = offset + numModelGenes;
119 | end
120 |
121 | unique_genes = unique(allGenes);
122 | disp(['Total encoded genes found: ' num2str(numel(allGenes))]);
123 | disp(['Unique encoded genes found: ' num2str(numel(unique_genes))]);
124 | disp('Decoding & simplifying genes...please wait.');
125 |
126 | genes.uniqueGenesCoded = unique_genes;
127 | genes.uniqueGenesDecoded = gpreformat(gp,unique_genes)';
128 | genes.numModelsScanned = numSuppliedModels;
129 |
130 | symgenes = cell(numel(genes.uniqueGenesDecoded),1);
131 | simpleGenes = cell(numel(genes.uniqueGenesDecoded),1);
132 | simpleGenesChar = cell(numel(genes.uniqueGenesDecoded),1);
133 |
134 | for i=1:numel(symgenes)
135 |
136 | disp(['Simplifying gene ' num2str(i)]);
137 | symgenes{i} = sym(genes.uniqueGenesDecoded{i});
138 |
139 | try
140 | simpleGenes{i} = gpsimplify(symgenes{i},10,verOld, true);
141 | catch
142 | warning(['Could not simplify gene ' num2str(i)]);
143 | simpleGenes{i} = symgenes{i};
144 | end
145 |
146 | simpleGenesChar{i} = char(simpleGenes{i});
147 | end
148 |
149 | [~, uniqueInds] = unique(simpleGenesChar);
150 | genes.symUniqueInds = sort(uniqueInds);
151 | genes.uniqueGenesSym = simpleGenes(genes.symUniqueInds);
152 | genes.numUniqueGenes = numel(genes.uniqueGenesSym);
153 |
154 | %rectify non-sym gene lists
155 | genes.uniqueGenesCoded = genes.uniqueGenesCoded(genes.symUniqueInds);
156 | genes.uniqueGenesDecoded = genes.uniqueGenesDecoded(genes.symUniqueInds);
157 |
158 | genes.totalGenes = numel(allGenes);
159 | genes.numUniqueGenes = numel(genes.uniqueGenesSym);
160 | disp(['Unique decoded genes found: ' num2str(genes.numUniqueGenes)]);
161 |
162 | %modify the GP structure to just contain the unique genes
163 | gp.pop = genes.uniqueGenesCoded;
164 | gp.state.run_completed = true;
165 | gp.state.force_compute_theta = true;
166 | gp.runcontrol.pop_size = genes.numUniqueGenes;
167 | gp.userdata.showgraphs = false;
168 | gp.userdata.stats = false;
169 |
170 | codedGenes2Use = genes.uniqueGenesCoded;
171 | geneOutputs = zeros(length(gp.userdata.ytrain),genes.numUniqueGenes);
172 |
173 | if isfield(gp.userdata,'yval') && ~isempty(gp.userdata.yval)
174 | geneOutputsVal = zeros(length(gp.userdata.yval),genes.numUniqueGenes);
175 | else
176 | geneOutputsVal = [];
177 | end
178 |
179 | if isfield(gp.userdata,'ytest') && ~isempty(gp.userdata.ytest)
180 | geneOutputsTest = zeros(length(gp.userdata.ytest),genes.numUniqueGenes);
181 | else
182 | geneOutputsTest = [];
183 | end
184 |
185 | rtrainGenes = zeros(genes.numUniqueGenes,1);
186 | geneComplexity = zeros(genes.numUniqueGenes,1);
187 |
188 | %evaluate the genes against the output & get corrcoef
189 | for i=1:numel(codedGenes2Use)
190 | evalstr = tree2evalstr(codedGenes2Use{i},gp);
191 | gp.state.current_individual = i;
192 | [fitness,gp,~,ypredtrain,~,~,~,~,~,~,...
193 | gene_outputs,gene_outputs_test,gene_outputs_val] = feval(gp.fitness.fitfun,{evalstr},gp);
194 |
195 | %in rare cases genes in isolation give overflows etc, just set the
196 | %corrcoef to zero.
197 | if isinf(fitness)
198 | rtrainGenes(i,1) = 0;
199 | else
200 | c = abs(corrcoef(ypredtrain,gp.userdata.ytrain));
201 | rtrainGenes(i,1) = c(1,2);
202 | geneComplexity(i,1) = getcomplexity(codedGenes2Use{i});
203 | geneOutputs(:,i) = gene_outputs(:,2);
204 | if ~isempty(gene_outputs_test)
205 | geneOutputsTest(:,i) = gene_outputs_test(:,2);
206 | elseif i == 1
207 | geneOutputsTest = [];
208 | end
209 |
210 | if ~isempty(gene_outputs_val)
211 | geneOutputsVal(:,i) = gene_outputs_val(:,2);
212 | elseif i == 1
213 | geneOutputsVal = [];
214 | end
215 | end
216 | end
217 |
218 | %tidy up
219 | genes.about = 'A struct containing unique genes from a population.';
220 | genes.geneOutputsTrain = geneOutputs;
221 | genes.geneOutputsTest = geneOutputsTest;
222 | genes.geneOutputsVal = geneOutputsVal;
223 | genes.rtrain = rtrainGenes;
224 | genes.complexity = geneComplexity;
225 | genes = rmfield(genes,'symUniqueInds');
226 | genes.filtered = false;
227 | genes = orderfields(genes);
228 | genes.source = 'uniqueGenes';
--------------------------------------------------------------------------------
/core/updatestats.m:
--------------------------------------------------------------------------------
1 | function gp = updatestats(gp)
2 | %UPDATESTATS Update run statistics.
3 | %
4 | % GP = UPDATESTATS(GP) updates the stats of the GP struct.
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also DISPLAYSTATS, GPINIT, GPFINALISE
11 |
12 | %check for NaN fitness values
13 | nanInd = isnan(gp.fitness.values);
14 |
15 | %write best fitness of current generation to gp struct
16 | if gp.fitness.minimisation
17 |
18 | gp.fitness.values(nanInd) = Inf;
19 | bestTrainFitness = min(gp.fitness.values);
20 |
21 | else %maximisation of fitness function
22 |
23 | %replace Inf and NaN with -Inf
24 | infInd = isinf(gp.fitness.values);
25 | gp.fitness.values(infInd) = -Inf;
26 | gp.fitness.values(nanInd) = -Inf;
27 | bestTrainFitness = max(gp.fitness.values);
28 |
29 | end
30 |
31 | %There may be more than one individual with best fitness. If so designate
32 | %as "best" the one with the lowest complexity/node count. If there is more
33 | %than one with the same complexity then just pick the first one
34 | %(effectively a random selection). This doesn't affect the GP run, it only
35 | %affects the reporting of which individual is currently considered "best".
36 | bestTrainInds = find(gp.fitness.values == bestTrainFitness);
37 |
38 | nodes = gp.fitness.complexity(bestTrainInds);
39 |
40 | [~,ind] = min(nodes);
41 | bestTrainInd = bestTrainInds(ind);
42 |
43 | %store best of current population (training)
44 | gp.state.best.fitness = bestTrainFitness;
45 | gp.state.best.individual = gp.pop{bestTrainInd};
46 | gp.state.best.returnvalues = gp.fitness.returnvalues{bestTrainInd};
47 | gp.state.best.complexity = getcomplexity(gp.state.best.individual);
48 | gp.state.best.nodecount = getnumnodes(gp.state.best.individual);
49 |
50 | gp.results.history.bestfitness(gp.state.count,1) = bestTrainFitness;
51 | gp.state.best.index = bestTrainInd;
52 |
53 | %calc. mean and std. dev. fitness (exc. inf values)
54 | notinfInd = ~isinf(gp.fitness.values);
55 | gp.state.meanfitness = mean(gp.fitness.values(notinfInd));
56 | gp.state.std_devfitness = std(gp.fitness.values(notinfInd));
57 | gp.results.history.meanfitness(gp.state.count,1) = gp.state.meanfitness;
58 | gp.results.history.std_devfitness(gp.state.count,1) = gp.state.std_devfitness;
59 |
60 | %update best of run so far on training data
61 | if gp.state.count == 1 %if first gen then "best of run" is best of current gen
62 |
63 | gp.results.best.fitness = gp.state.best.fitness;
64 | gp.results.best.individual = gp.state.best.individual;
65 | gp.results.best.returnvalues = gp.state.best.returnvalues;
66 | gp.results.best.complexity = gp.state.best.complexity;
67 | gp.results.best.nodecount = gp.state.best.nodecount;
68 | gp.results.best.foundatgen = 0;
69 | gp.results.best.eval_individual = tree2evalstr(gp.state.best.individual,gp);
70 |
71 | %update run "best" fitness if current gen best fitness is better (or
72 | %the same but less complex)
73 | else
74 |
75 | updateTrainBest = false;
76 |
77 | %update 'best' depending on chosen measure of complexity
78 | if gp.fitness.complexityMeasure
79 | stateComp = gp.state.best.complexity;
80 | bestComp = gp.results.best.complexity;
81 | else
82 | stateComp = gp.state.best.nodecount;
83 | bestComp = gp.results.best.nodecount;
84 | end
85 |
86 | %if minimising fitness function
87 | if gp.fitness.minimisation
88 |
89 | if (gp.state.best.fitness < gp.results.best.fitness) ...
90 | || ( (stateComp < bestComp) && (gp.state.best.fitness == gp.results.best.fitness) )
91 | updateTrainBest = true;
92 | end
93 |
94 | %if maximising fitness function
95 | else
96 |
97 | if (gp.state.best.fitness > gp.results.best.fitness) ...
98 | || ( (stateComp < bestComp) && (gp.state.best.fitness == gp.results.best.fitness) )
99 | updateTrainBest = true;
100 | end
101 | end
102 |
103 | if updateTrainBest
104 | gp.results.best.fitness = gp.state.best.fitness;
105 | gp.results.best.individual = gp.state.best.individual;
106 | gp.results.best.returnvalues = gp.state.best.returnvalues;
107 | gp.results.best.complexity = gp.state.best.complexity;
108 | gp.results.best.nodecount = gp.state.best.nodecount;
109 | gp.results.best.foundatgen = gp.state.count - 1;
110 | gp.results.best.eval_individual = tree2evalstr(gp.state.best.individual,gp);
111 | end
112 |
113 | end
114 |
--------------------------------------------------------------------------------
/examples/concrete.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/concrete.mat
--------------------------------------------------------------------------------
/examples/cubic_config.m:
--------------------------------------------------------------------------------
1 | function gp=cubic_config(gp)
2 | %CUBIC_CONFIG Config file for multigene regression on a simple cubic polynomial.
3 | %
4 | % GP = CUBIC_CONFIG(GP) generates a parameter structure GP that specifies
5 | % the GPTIPS run settings for multigene regression on data generated by
6 | % the cubic polynomial:
7 | %
8 | % 3 2
9 | % 3.4 x1 + 2.9 x1 + 6.2 x1 + 0.75
10 | %
11 | % Example:
12 | %
13 | % GP = RUNGP(@cubic_config) uses this configuration file to perform
14 | % symbolic regression on the cubic polynonial data and returns the
15 | % results in the data struct GP.
16 | %
17 | % Copyright (c) 2009-2015 Dominic Searson
18 | %
19 | % GPTIPS 2
20 | %
21 | % See also SALUSTOWICZ1D_CONFIG, UBALL_CONFIG, RIPPLE_CONFIG,
22 | % REGRESSMULTI_FITFUN, RUNGP
23 |
24 | %generate cubic polynomial data for train and test sets
25 | %and use GPTIPS defaults for everything else
26 | xtr = (-10: 0.2 : 10)';
27 | ytr = 3.4 * xtr.^3 + 2.9*xtr.^2 + 6.2*xtr + 0.75;
28 | gp.userdata.xtrain = xtr;
29 | gp.userdata.ytrain = ytr;
30 |
31 | xte = (-9.5 : 0.333 : 9.5)';
32 | yte = 3.4 * xte.^3 + 2.9*xte.^2 + 6.2*xte + 0.75;
33 | gp.userdata.xtest = xte;
34 | gp.userdata.ytest = yte;
35 | gp.userdata.name = 'Cubic poly';
36 |
37 | %termination criterion
38 | gp.fitness.terminate = true;
39 | gp.fitness.terminate_value = 1e-6;
40 |
41 | %function nodes
42 | gp.nodes.functions.name = {'times','minus','plus','add3','mult3'};
43 |
44 |
--------------------------------------------------------------------------------
/examples/demo2data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/demo2data.mat
--------------------------------------------------------------------------------
/examples/gpdemo1.m:
--------------------------------------------------------------------------------
1 | %GPDEMO1 GPTIPS 2 demo of simple symbolic regression on Koza's quartic polynomial.
2 | %
3 | % Demonstrates simple (naive) symbolic regression and some post run
4 | % analysis functions such as SUMMARY, RUNTREE and GPPRETTY to simplify
5 | % expressions if the Symbolic Math Toolbox is present. Also shows the use
6 | % of DRAWTREES to visualise the tree structure of GPTIPS individuals.
7 | %
8 | % (c) Dominic Searson 2009-2015
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also GPDEMO1_CONFIG, QUARTIC_FITFUN, GPDEMO2, GPDEMO3, GPDEMO4,
13 | % SUMMARY, RUNTREE, GPPRETTY
14 |
15 | clc;
16 | disp('GPTIPS 2 Demo 1: naive symbolic regression');
17 | disp('------------------------------------------');
18 | disp('Naive symbolic regression on 20 data points generated by the');
19 | disp('quartic polynomial y=x+x^2+x^3+x^4 in the range -1 < x < 1');
20 | disp('The GP run configuration file is gpdemo1_config.m');
21 | disp(' ');
22 | disp('In this example, the direct output of a single evolved GP tree is used to');
23 | disp('model the data generated by the quartic polynomial.');
24 | disp('The function nodes used are TIMES, MINUS, PLUS and RDIVIDE.');
25 | disp('The only terminal node used is the input x and trees are constrained');
26 | disp('to have a maximum depth of 12.');
27 | disp(' ');
28 | disp('A population size of 50 is run for 100 generations.');
29 | disp('The ''fitness'' is the sum of absolute differences between the actual and');
30 | disp('the predicted y values and GPTIPS tries to minimise this.');
31 | disp(' ');
32 |
33 | disp('First, call GPTIPS using the configuration in gpdemo1_config.m');
34 | disp('using:');
35 | disp('>>gp=rungp(@gpdemo1_config);');
36 | disp('Press a key to continue');
37 | disp(' ');
38 | pause;
39 | gp=rungp(@gpdemo1_config);
40 |
41 | disp('Next, plot summary information of run using:');
42 | disp('>>summary(gp)');
43 | disp('Press a key to continue');
44 | disp(' ');
45 | pause;
46 | summary(gp,false);
47 |
48 | disp('Run the best model of the run on the fitness function using:');
49 | disp('>>runtree(gp,''best'');');
50 | disp('Press a key to continue');
51 | disp(' ');
52 | pause;
53 | runtree(gp,'best');
54 |
55 | disp(' ');
56 | disp('The best model of the run is stored in the field:');
57 | disp('gp.results.best.eval_individual{1} :');
58 | disp(' ');
59 | disp( gp.results.best.eval_individual{1});
60 | disp(' ');
61 |
62 | disp(['This model has a tree depth of ' int2str( getdepth(gp.results.best.individual{1}))]);
63 | disp(['It was found at generation ' int2str(gp.results.best.foundatgen)]);
64 | disp(['and has fitness ' num2str(gp.results.best.fitness)]);
65 |
66 | %If Symbolic Math toolbox is present
67 | if gp.info.toolbox.symbolic()
68 |
69 | disp(' ');
70 | disp('Using the symbolic math toolbox simplified versions of this');
71 | disp('expression can be found: ')
72 | disp('E.g. using the GPPRETTY command on the best model: ');
73 | disp('>>gppretty(gp,''best'') ');
74 | disp('Press a key to continue');
75 | disp(' ');
76 | pause;
77 | gppretty(gp,'best');
78 | disp(' ');
79 | disp('If you are lucky then this is the quartic polynomial used to');
80 | disp('generate the data.');
81 | disp('NOTE: In general, it is unusual for GP to evolve the exact form');
82 | disp('of the generative function.');
83 | end
84 |
85 | disp(' ');
86 | disp('Next, visualise the tree structure of the best model of the run:');
87 | disp('>>drawtrees(gp,''best'');');
88 | disp('Press a key to continue');
89 | disp(' ');
90 | pause;
91 | drawtrees(gp,'best');
--------------------------------------------------------------------------------
/examples/gpdemo1_config.m:
--------------------------------------------------------------------------------
1 | function gp = gpdemo1_config(gp)
2 | %GPDEMO1_CONFIG Config file demonstrating simple (naive) symbolic regression.
3 | %
4 | % The simple quartic polynomial (y=x+x^2+x^3+x^4) from John Koza's 1992
5 | % Genetic Programming book is used. It is very easy to solve.
6 | %
7 | % GP = GPDEMO1_CONFIG(GP) returns the user specified parameter structure
8 | % GP for the quartic polynomial problem.
9 | %
10 | % Example:
11 | %
12 | % GP = RUNGP(@GPDEMO1_CONFIG) performs a GPTIPS run using this
13 | % configuration file and returns the results in a structure called GP.
14 | %
15 | % Copyright (c) 2009-2015 Dominic Searson
16 | %
17 | % GPTIPS 2
18 | %
19 | % See also QUARTIC_FITFUN, GPDEMO1
20 |
21 | %run control
22 | gp.runcontrol.pop_size = 50;
23 | gp.runcontrol.num_gen = 100;
24 | gp.runcontrol.verbose = 25;
25 |
26 | %selection
27 | gp.selection.tournament.size = 2;
28 |
29 | %fitness function
30 | gp.fitness.fitfun = @quartic_fitfun;
31 |
32 | %quartic polynomial data
33 | x=linspace(-1,1,20)';
34 | gp.userdata.x = x;
35 | gp.userdata.y = x + x.^2 +x.^3 + x.^4;
36 | gp.userdata.name = 'Quartic Polynomial';
37 |
38 | %input configuration
39 | gp.nodes.inputs.num_inp = 1;
40 |
41 | %quartic example doesn't need constants
42 | gp.nodes.const.p_ERC = 0;
43 |
44 | %maximum depth of trees
45 | gp.treedef.max_depth = 12;
46 |
47 | %maximum depth of sub-trees created by mutation operator
48 | gp.treedef.max_mutate_depth = 7;
49 |
50 | %genes
51 | gp.genes.multigene = false;
52 |
53 | %define function nodes
54 | gp.nodes.functions.name = {'times','minus','plus','rdivide'};
--------------------------------------------------------------------------------
/examples/gpdemo2.m:
--------------------------------------------------------------------------------
1 | %GPDEMO2 GPTIPS 2 demo of multigene regression on a non-linear function.
2 | %
3 | % Demonstrates multigene symbolic regression and some post run analysis
4 | % functions such as SUMMARY, RUNTREE, POPBROWSER, DRAWTREES and the use
5 | % of the Symbolic Math Toolbox with GPPRETTY to simplify expressions.
6 | %
7 | % (c) Dominic Searson 2009-2015
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also GPDEMO2_CONFIG, REGRESSMULTI_FITFUN, GPDEMO1, GPDEMO3,
12 | % GPDEMO4, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER
13 |
14 | clc;
15 | disp('GPTIPS 2 Demo 2: multigene symbolic regression');
16 | disp('----------------------------------------------');
17 | disp('Multigene regression on 400 data points generated by the non-linear');
18 | disp('function with 4 inputs y = exp(2*x1*sin(pi*x4)) + sin(x2*x3).');
19 | disp(' ');
20 | disp('The configuration file is gpdemo2_config.m and the raw data is in demo2data.mat');
21 | disp('At the end of the run the predictions of the evolved model will be plotted');
22 | disp('for the training data as well as for an independent test data set generated');
23 | disp('from the same function.');
24 | disp(' ');
25 | disp('Here, 3 genes are used (plus a bias term) so the form of the model will be: ');
26 | disp('ypred = c0 + c1*tree1 + c2*tree2 + c3*tree3');
27 | disp('where c0 = bias and c1, c2 and c3 are the gene weights.');
28 | disp(' ');
29 | disp('The bias and weights (i.e. regression coefficients) are automatically');
30 | disp('determined by a least squares procedure for each multigene model.');
31 | disp(' ');
32 | disp('In this run the following function nodes are used: ');
33 | disp('TIMES MINUS PLUS SQRT SQUARE SIN COS EXP ADD3 MULT3');
34 | disp(' ');
35 | disp('The run is configured to proceed for 100 generations or to terminate');
36 | disp('when a fitness (RMSE) of 0.003 is achieved.');
37 | disp(' ');
38 | disp('First, run GPTIPS using the configuration in gpdemo2_config.m');
39 | disp('>>gp = rungp(@gpdemo2_config);');
40 | disp('Press a key to continue');
41 | disp(' ');pause;
42 | gp = rungp(@gpdemo2_config);
43 |
44 | disp('Plot summary information of run using');
45 | disp('>>summary(gp)');
46 |
47 | disp('Press a key to continue');
48 | disp(' ');pause;summary(gp);
49 |
50 | disp('Run the best model on the training data using:');
51 | disp('>>runtree(gp,''best'');');
52 |
53 | disp('Press a key to continue');
54 | disp(' ');pause;runtree(gp,'best');
55 |
56 | %If Symbolic Math toolbox is present
57 | if gp.info.toolbox.symbolic()
58 | disp('Using the symbolic math toolbox it is possible to combine the gene');
59 | disp('expressions with the gene weights (regression coefficients) to display');
60 | disp('a single overall model that predicts the output using the inputs x1,');
61 | disp('x2, x3 and x4.');
62 | disp('E.g. using the GPPRETTY function on the best individual on ');
63 | disp('the training data.');
64 | disp('>>gppretty(gp,''best'')');
65 | disp('Press a key to continue');
66 | disp(' ');
67 | pause;
68 | gppretty(gp,'best');
69 | disp(' ');
70 | disp('Additionally, the DRAWTREES function can be used to draw the genes in any');
71 | disp('model to a browser window.');
72 | disp('E.g. to draw the genes in the ''best'' model on the training data use');
73 | disp('>>drawtrees(gp,''best'');');
74 | disp('Press a key to continue');disp(' ');pause;
75 | drawtrees(gp,'best');
76 | end
77 |
78 | disp(' ');
79 | disp('The POPBROWSER function may be used to display the population of evolved');
80 | disp('models in terms of their expressional complexity as well as their');
81 | disp('performance (1 - R^2).');
82 | disp('POPBROWSER can be used to identify models that perform relatively well');
83 | disp('and are less complex than the ''best'' model in the population (which is');
84 | disp('highlighted with a red circle). Clicking on a circle reveals the');
85 | disp('model ID as well as its simplified symbolic equation (if the Symbolic');
86 | disp('Math Toolbox is installed).');
87 | disp('The POPBROWSER is launched using:');
88 | disp('>>popbrowser(gp)');
89 | disp('Press a key to continue.');
90 | pause;
91 | popbrowser(gp);
--------------------------------------------------------------------------------
/examples/gpdemo2_config.m:
--------------------------------------------------------------------------------
1 | function gp=gpdemo2_config(gp)
2 | %GPDEMO2_CONFIG Config for multigene symbolic regression on data (y) generated from a non-linear function of 4 inputs (x1, x2, x3, x4).
3 | %
4 | % GP = GPDEMO2_CONFIG(GP) returns a parameter structure GP containing the
5 | % settings for GPDEMO2.
6 | %
7 | % Remarks:
8 | %
9 | % There is one output y which is a non-linear function of the four inputs
10 | % y = exp(2*x1*sin(pi*x4)) + sin(x2*x3). The objective of the GP run is
11 | % to evolve a multiple gene symbolic function of x1, x2, x3 and x4 that
12 | % closely approximates y.
13 | %
14 | % This function was described by:
15 | %
16 | % Cherkassky, V., Gehring, D., Mulier, F., Comparison of adaptive methods
17 | % for function estimation from samples, IEEE Transactions on Neural
18 | % Networks, 7 (4), pp. 969-984, 1996. (Function 10 in Appendix 1)
19 | %
20 | % Example:
21 | %
22 | % GP = rungp(@gpdemo2_config) uses this configuration file to perform
23 | % symbolic regression with multiple gene individuals on the data from the
24 | % above function.
25 | %
26 | % (C) Dominic Searson 2009-2015
27 | %
28 | % GPTIPS 2
29 | %
30 | % See also GPDEMO2, GPDEMO3_CONFIG, GPDEMO3, GPDEMO1, REGRESSMULTI_FITFUN
31 |
32 | %run control parameters
33 | gp.runcontrol.pop_size = 100;
34 | gp.runcontrol.num_gen = 100;
35 |
36 | %selection
37 | gp.selection.tournament.size = 6;
38 |
39 | %termination
40 | gp.fitness.terminate = true;
41 | gp.fitness.terminate_value = 0.003;
42 |
43 | %load in the raw x and y data
44 | load demo2data
45 | gp.userdata.xtrain = x(101:500,:); %training set (inputs)
46 | gp.userdata.ytrain = y(101:500,1); %training set (output)
47 | gp.userdata.xtest = x(1:100,:); %testing set (inputs)
48 | gp.userdata.ytest = y(1:100,1); %testing set (output)
49 | gp.userdata.name = 'Cherkassky function';
50 |
51 | %genes
52 | gp.genes.max_genes = 3;
53 |
54 | %define function nodes
55 | gp.nodes.functions.name = {'times','minus','plus','sqrt','square','sin','cos','exp','add3','mult3'};
--------------------------------------------------------------------------------
/examples/gpdemo3.m:
--------------------------------------------------------------------------------
1 | % GPDEMO3 GPTIPS 2 demo of multigene symbolic regression on non-linear simulated pH data.
2 | %
3 | % Demonstrates multigene symbolic regression and some post run analysis
4 | % functions such as SUMMARY and RUNTREE and the use of the Symbolic Math
5 | % Toolbox to simplify expressions and create HTML reports using
6 | % PARETOREPORT, GPMODELREPORT and DRAWTREES to visualise the models.
7 | %
8 | % (c) Dominic Searson 2009-2015
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also GPDEMO3_CONFIG, GPDEMO1, GPDEMO2, GPDEMO4, PARETOREPORT,
13 | % GPMODELREPORT, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER
14 |
15 | clc;
16 | disp('GPTIPS 2 Demo 3: multigene pareto symbolic regression on pH data');
17 | disp('------------------------------------------------------------------');
18 | disp('In this example, the training data is 700 steady state data points');
19 | disp('from a simulation of a pH neutralisation process.');
20 | disp(' ' );
21 | disp('Here we use pareto tournaments to bias the model discovery process');
22 | disp('towards low complexity models.');
23 | disp(' ');
24 | disp('GPTIPS is run 3 times for a maximum of 10 seconds per run or until a');
25 | disp('RMSE of 0.2 is reached. The runs are merged into a single population');
26 | disp('at the end.');
27 | disp(' ');
28 | disp('The output y has an unknown non-linear dependence on the 4 inputs x1,');
29 | disp('x2, x3 and x4.');
30 | disp(' ');
31 | disp('300 data points are available as a test set to validate the evolved model(s).');
32 | disp(' ');
33 | disp('The configuration file is gpdemo3_config.m and the raw data is in ph2data.mat');
34 | disp(' ');
35 | disp('Here, 6 genes are used (plus a bias term) so the form of the model will be');
36 | disp('ypred = c0 + c1*tree1 + ... + c6*tree6');
37 | disp('where ypred = predicted output, c0 = bias and c1,...,c6 are the gene weights.')
38 | disp(' ');
39 | disp('Genes are limited to a depth of 4.');
40 | disp(' ');
41 | disp('The function nodes used are: TIMES MINUS PLUS TANH MULT3 ADD3');
42 | disp(' ');
43 | disp('First, run GPTIPS using the configuration in gpdemo3_config.m :');
44 | disp('>>gp=rungp(@gpdemo3_config);');
45 | disp('Press a key to continue');
46 | disp(' ');
47 | pause;
48 |
49 | %run GPTIPS using the configuration in gpdemo3_config.m
50 | gp = rungp(@gpdemo3_config);
51 |
52 | %run the best individual of the run on the fitness function
53 | disp(' ');
54 | disp('Evaluate the ''best'' individual of the run using:');
55 | disp('>>runtree(gp,''best'');');
56 |
57 | disp('Press a key to continue');disp(' ');pause;
58 | runtree(gp,'best');
59 |
60 | %display the population in terms of performance and complexity
61 | disp(' ');
62 | disp('Next, display the population in terms of performance and complexity.');
63 | disp('Because pareto tournaments have been enabled you should notice a');
64 | disp('''well-defined'' pareto front (green circles).');
65 | disp('This should indicate good models in terms of the performance/complexity');
66 | disp('tradeoff rather than simply the ''best'' model on the training data (this');
67 | disp('may be considerably more complex than very slightly ''lower'' performing');
68 | disp('models).');
69 | disp('>>popbrowser(gp);');
70 |
71 | disp('Press a key to continue');disp(' ');pause;
72 | popbrowser(gp);
73 |
74 | %If Symbolic Math toolbox is present
75 | if gp.info.toolbox.symbolic()
76 |
77 | %pareto report
78 | disp(' ');
79 | disp('The PARETOREPORT function generates a standalone interactive HTML report');
80 | disp('listing the multigene regression models on the Pareto front in terms');
81 | disp('of their simplified equation structure, expressional complexity and');
82 | disp('performance on the training data (R2).');
83 | disp('These models correspond to the green circles in the popbrowser');
84 | disp('visualisation and can be sorted by performance or complexity by');
85 | disp('clicking on the appropriate column header.');
86 | disp('>>paretoreport(gp);');
87 | disp('Press a key to continue');disp(' ');pause;
88 | paretoreport(gp);
89 |
90 | %gppretty
91 | disp(' ');
92 | disp('It is possible to display any multigene model at the command line.');
93 | disp('E.g. to use the GPPRETTY command on the ''best'' model on the training data: ');
94 | disp('>>gppretty(gp,''best'')');
95 | disp('Press a key to continue');
96 | disp(' ');
97 | pause;
98 | gppretty(gp,'best');
99 | end
100 | disp(' ');
101 | disp('Additionally, the DRAWTREES function can be used to draw the genes in any');
102 | disp('model to a browser window.');
103 | disp('E.g. to draw the genes in the ''best'' model on the training data use');
104 | disp('>>drawtrees(gp,''best'');');
105 | disp('Press a key to continue');disp(' ');pause;
106 | drawtrees(gp,'best');
107 |
108 | %reports
109 | if gp.info.toolbox.symbolic()
110 |
111 | disp(' ');
112 | disp('Finally, for multigene models the GPMODELREPORT function can be ');
113 | disp('used to generate a comprehensive model performance report for ');
114 | disp('reference purposes. This is created in a browser window.');
115 | disp('E.g. to generate a performance report for the ''best'' model');
116 | disp('on the training data use');
117 | disp('>>gpmodelreport(gp,''best'');');
118 | disp('Press a key to continue');disp(' ');pause;
119 | gpmodelreport(gp,'best');
120 | end
--------------------------------------------------------------------------------
/examples/gpdemo3_config.m:
--------------------------------------------------------------------------------
1 | function gp = gpdemo3_config(gp)
2 | %GPDEMO3_CONFIG Config file demonstrating multigene symbolic regression on data from a simulated pH neutralisation process.
3 | %
4 | % This is the configuration file that GPDEMO3 calls.
5 | %
6 | % GP = GPDEMO3_CONFIG(GP) generates a parameter structure GP that
7 | % specifies the GPTIPS run settings.
8 | %
9 | % In this example, a maximum run time of 10 seconds is allowed (3 runs).
10 | %
11 | % Remarks:
12 | % The data in this example is taken from a simulation of a pH neutralisation
13 | % process with one output (pH), which is a non-linear function of the
14 | % four inputs.
15 | %
16 | % Example:
17 | % GP = RUNGP(@GPDEMO3_CONFIG) uses this configuration file to perform
18 | % symbolic regression with multiple gene individuals on the pH data. The
19 | % results and parameters used are stored in fields of the returned GP
20 | % structure.
21 | %
22 | % Copyright (c) 2009-2015 Dominic Searson
23 | %
24 | % GPTIPS 2
25 | %
26 | % See also REGRESSMULTI_FITFUN, GPDEMO3, GPDEMO2, GPDEMO1, RUNGP
27 |
28 | %run control parameters
29 | gp.runcontrol.pop_size = 250;
30 | gp.runcontrol.timeout = 10;
31 | gp.runcontrol.runs = 3;
32 |
33 | %selection
34 | gp.selection.tournament.size = 25;
35 | gp.selection.tournament.p_pareto = 0.7;
36 | gp.selection.elite_fraction = 0.7;
37 | gp.nodes.const.p_int= 0.5;
38 |
39 | %fitness
40 | gp.fitness.terminate = true;
41 | gp.fitness.terminate_value = 0.2;
42 |
43 | %set up user data
44 | load ph2data
45 |
46 | gp.userdata.xtest = nx; %testing set (inputs)
47 | gp.userdata.ytest = ny; %testing set (output)
48 | gp.userdata.xtrain = x; %training set (inputs)
49 | gp.userdata.ytrain = y; %training set (output)
50 | gp.userdata.name = 'pH';
51 |
52 | %genes
53 | gp.genes.max_genes = 6;
54 |
55 | %define building block function nodes
56 | gp.nodes.functions.name = {'times','minus','plus','tanh','mult3','add3'};
--------------------------------------------------------------------------------
/examples/gpdemo4.m:
--------------------------------------------------------------------------------
1 | % GPDEMO4 GPTIPS 2 demo of multigene symbolic regression on a concrete compressive strength data set.
2 | %
3 | % The output being modelled is concrete compressive strength (MPa) and
4 | % the input variables are:
5 | %
6 | % Cement (x1) - kg in a m3 mixture
7 | % Blast furnace slag (x2) - kg in a m3 mixture
8 | % Fly ash (x3) - kg in a m3 mixture
9 | % Water (x4) - kg in a m3 mixture
10 | % Superplasticiser (x5) - kg in a m3 mixture
11 | % Coarse aggregate (x6) - kg in a m3 mixture
12 | % Fine aggregate (x7) - kg in a m3 mixture
13 | % Age (x8) - range 1 - 365 days
14 | %
15 | % Demonstrates feature selection in multigene symbolic regression and
16 | % some post run analysis functions.
17 | %
18 | % (c) Dominic Searson 2009-2015
19 | %
20 | % GPTIPS 2
21 | %
22 | % See also GPDEMO4_CONFIG, GPDEMO1, GPDEMO2, GPDEMO3, PARETOREPORT,
23 | % GPMODELREPORT, DRAWTREES, SUMMARY, RUNTREE, GPPRETTY, POPBROWSER
24 |
25 | clc;
26 | disp('GPTIPS 2 Demo 4: feature selection with concrete compressive strength data set');
27 | disp('------------------------------------------------------------------------------');
28 | disp('The output being modelled is concrete compressive strength (MPa) and');
29 | disp('the input variables are:');
30 | disp(' ');
31 | disp(' Cement (x1) - kg in a m3 mixture');
32 | disp(' Blast furnace slag (x2) - kg in a m3 mixture');
33 | disp(' Fly ash (x3) - kg in a m3 mixture');
34 | disp(' Water (x4) - kg in a m3 mixture');
35 | disp(' Superplasticiser (x5) - kg in a m3 mixture');
36 | disp(' Coarse aggregate (x6) - kg in a m3 mixture');
37 | disp(' Fine aggregate (x7) - kg in a m3 mixture');
38 | disp(' Age (x8) - range 1 - 365 days');
39 | disp(' ');
40 | disp('To demonstrate feature selection in GPTIPS another 50 variables ');
41 | disp('consisting of normally distributed noise have been added to form the');
42 | disp('input variables x9 to x58.');
43 | disp(' ');
44 | disp('The configuration file is gpdemo4_config.m and the raw data is in');
45 | disp('concrete.mat');
46 | disp(' ');
47 | disp('The data has been divided into a training set, a holdout validation set');
48 | disp('and a testing set.');
49 | disp(' ');
50 | disp('GPTIPS is run twice for a maximum of 30 seconds per run or until a');
51 | disp('RMSE of 6.5 is reached. The runs are merged into a single population');
52 | disp('at the end.');
53 | disp(' ');
54 | disp('6 genes are used (plus a bias term) so the form of the model will be');
55 | disp('ypred = c0 + c1*tree1 + ... + c6*tree6');
56 | disp('where ypred = predicted output, c0 = bias and c1,...,c6 are the gene weights.')
57 | disp(' ');
58 | disp('Genes are limited to a depth of 4.');
59 | disp(' ');
60 | disp('The function nodes used are:');
61 | disp('TIMES MINUS PLUS RDIVIDE SQUARE TANH EXP LOG MULT3 ADD3 SQRT CUBE');
62 | disp('POWER NEGEXP NEG ABS');
63 | disp(' ');
64 | disp('The input variables that appear in the best model on the training');
65 | disp('and validation data sets can be displayed at run time by ');
66 | disp('including the following two settings in gpdemo4_config.m : ');
67 | disp(' ');
68 | disp('gp.runcontrol.showBestInputs = true;');
69 | disp('gp.runcontrol.showValBestInputs = true;');
70 | disp(' ');
71 | disp('GPTIPS is run with the configuration in gpdemo4_config.m using :');
72 | disp('>>gp=rungp(@gpdemo4_config);');
73 | disp('Press a key to continue');
74 | disp(' ');
75 | pause;
76 | gp = rungp(@gpdemo4_config);
77 |
78 | %Run the best val individual of the run on the fitness function
79 | disp(' ');
80 | disp('Evaluate the best validation individual of');
81 | disp('the runs on the fitness function using:');
82 | disp('>>runtree(gp,''valbest'');');
83 |
84 | disp('Press a key to continue');
85 | disp(' ');
86 | pause;
87 | runtree(gp,'valbest');
88 |
89 | %If Symbolic Math toolbox is present
90 | if gp.info.toolbox.symbolic()
91 |
92 | disp(' ');
93 | disp('Next, use the GPPRETTY command on the best validation individual: ');
94 | disp('>>gppretty(gp,''valbest'')');
95 | disp('Press a key to continue');
96 | disp(' ');
97 | pause;
98 |
99 | gppretty(gp,'valbest');
100 | disp(' ');
101 | disp('If the runs have been successful, the only variables present in ');
102 | disp('the best validation model should be the following:');
103 | disp('Cement Slag Ash Water Plastic Course Fine Age');
104 | disp('(these are defined as variable name aliases in gpdemo4_config.m');
105 | disp('using the gp.nodes.inputs.names setting.)');
106 | disp(' ');
107 | disp('Less successful runs may contain the noise variables x9 - x58.');
108 | disp('If the results seem poor, try running the demo again.');
109 |
110 | end
111 |
112 | disp('Press a key to continue');
113 | disp(' ');
114 | pause;
115 | disp('To visualise the frequency distribution of input variables in all');
116 | disp('models with an R^2 >= 0.75 the GPPOPVARS function can be used.');
117 | disp('This should show a high frequency of variables x1 - x8 and a low');
118 | disp('frequency of the irrelevant noise inputs.');
119 | disp('>>gppopvars(gp,0.75);');
120 | disp(' ');
121 | disp('Press a key to continue');
122 | disp(' ');
123 | pause
124 | gppopvars(gp,0.75);
125 | disp(' ');
126 |
127 | if gp.info.toolbox.symbolic()
128 | disp('Finally, an HTML report listing the models on the Pareto optimal front');
129 | disp('of model expressional complexity and performance can be generated using');
130 | disp('the PARETOREPORT function.');
131 | disp('>>paretoreport(gp)');
132 | disp(' ');
133 | disp('Press a key to continue');
134 | disp(' ');
135 | pause;
136 | paretoreport(gp);
137 | end
--------------------------------------------------------------------------------
/examples/gpdemo4_config.m:
--------------------------------------------------------------------------------
1 | function gp = gpdemo4_config(gp)
2 | %GPDEMO4_CONFIG Config file demonstrating feature selection with multigene symbolic regression.
3 | %
4 | % This is the configuration file that GPDEMO4 calls.
5 | %
6 | % GP = GPDEMO4_CONFIG(GP) generates a parameter structure GP that
7 | % specifies the GPTIPS run settings.
8 | %
9 | % Remarks:
10 | % The data used in this example was retrieved from the UCI Machine
11 | % Learning Repository:
12 | %
13 | % http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
14 | %
15 | % The output being modelled is concrete compressive strength (MPa) and the
16 | % independent variables are:
17 | %
18 | % Cement (x1) - kg in a m3 mixture
19 | % Blast furnace slag (x2) - kg in a m3 mixture
20 | % Fly ash (x3) - kg in a m3 mixture
21 | % Water (x4) - kg in a m3 mixture
22 | % Superplasticiser (x5) - kg in a m3 mixture
23 | % Coarse aggregate (x6) - kg in a m3 mixture
24 | % Fine aggregate (x7) - kg in a m3 mixture
25 | % Age (x8) - range 1 - 365 days
26 | %
27 | % Plus 50 irrelevant noise variables (x9 -> x58).
28 | %
29 | % Example:
30 | %
31 | % GP = RUNGP(@gpdemo4_config) uses this configuration file to perform
32 | % symbolic regression with multiple gene individuals on the concrete data.
33 | % The results and parameters used are stored in fields of the returned GP
34 | % structure.
35 | %
36 | % Further remarks:
37 | %
38 | % The demo configuration shows that variable selection is implicitly
39 | % performed by the GPTIPS algorithm.
40 | %
41 | % Copyright (c) 2009-2015 Dominic Searson
42 | %
43 | % GPTIPS 2
44 | %
45 | % See also REGRESSMULTI_FITFUN, GPDEMO4, GPDEMO3, GPDEMO2, GPDEMO1, RUNGP
46 |
47 | %run control
48 | gp.runcontrol.pop_size = 300;
49 | gp.runcontrol.num_gen = 500;
50 | gp.runcontrol.showBestInputs = true;
51 | gp.runcontrol.showValBestInputs = true;
52 | gp.runcontrol.timeout = 30;
53 | gp.runcontrol.runs = 2;
54 |
55 | %selection
56 | gp.selection.tournament.size = 15;
57 | gp.selection.elite_fraction = 0.3;
58 |
59 | %fitness
60 | gp.fitness.terminate = true;
61 | gp.fitness.terminate_value = 7;
62 |
63 | %multigene
64 | gp.genes.max_genes = 6;
65 |
66 | %constants
67 | gp.nodes.const.p_ERC = 0.05;
68 |
69 | %data
70 | load concrete;
71 |
72 | %allocate to train, validation and test groups
73 | gp.userdata.xtrain = Concrete_Data(tr_ind,1:8);
74 | gp.userdata.ytrain = Concrete_Data(tr_ind,9);
75 |
76 | gp.userdata.ytest = Concrete_Data(te_ind,9);
77 | gp.userdata.xtest = Concrete_Data(te_ind,1:8);
78 |
79 | gp.userdata.xval = gp.userdata.xtrain(val_ind,1:8);
80 | gp.userdata.yval = gp.userdata.ytrain(val_ind);
81 |
82 | gp.userdata.xtrain = gp.userdata.xtrain(tr_ind2,:);
83 | gp.userdata.ytrain = gp.userdata.ytrain(tr_ind2);
84 |
85 | %add 50 noise variables to make problem harder in terms of feature
86 | %selection
87 | gp.userdata.xtrain = [gp.userdata.xtrain 100*randn(size(gp.userdata.xtrain,1),50) ];
88 | gp.userdata.xtest = [gp.userdata.xtest 100*randn(size(gp.userdata.xtest,1),50) ];
89 | gp.userdata.xval = [gp.userdata.xval 100*randn(size(gp.userdata.xval,1),50) ];
90 |
91 | %enables hold out validation set
92 | gp.userdata.user_fcn = @regressmulti_fitfun_validate;
93 |
94 | %give known variables aliases (this can include basic HTML markup)
95 | gp.nodes.inputs.names = {'Cement','Slag','Ash','Water','Plastic','Course','Fine','Age'};
96 |
97 | %name for dataset
98 | gp.userdata.name = 'Concrete';
99 |
100 | %define building block function nodes
101 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square','tanh',...
102 | 'exp','log','mult3','add3','sqrt','cube','negexp','neg','abs'};
--------------------------------------------------------------------------------
/examples/ph2data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/is-centre/gptips2f-matlab/4672d69cdfc457f1d45a33403569447f736fa654/examples/ph2data.mat
--------------------------------------------------------------------------------
/examples/quartic_fitfun.m:
--------------------------------------------------------------------------------
1 | function [fitness,gp] = quartic_fitfun(evalstr,gp)
2 | %QUARTIC_FITFUN Fitness function for simple ("naive") symbolic regression on the quartic polynomial y = x + x^2 + x^3 + x^4.
3 | %
4 | % FITNESS = QUARTIC_FITFUN(EVALSTR,GP) returns the FITNESS value of the
5 | % symbolic expression contained within the cell array EVALSTR using the
6 | % information in the GP struct.
7 | %
8 | % Copyright (c) 2009-2015 Dominic Searson
9 | %
10 | % GPTIPS 2
11 | %
12 | % See also GPDEMO1_CONFIG, GPDEMO1
13 |
14 | %extract x and y data from GP struct
15 | x1 = gp.userdata.x;
16 | y = gp.userdata.y;
17 |
18 | %evaluate the tree (assuming only 1 gene is supplied in this case - if the
19 | %user specified multigene config then only the first gene encountered will be used)
20 | eval(['out=' evalstr{1} ';']);
21 |
22 | %fitness is sum of absolute differences between actual and predicted y
23 | fitness = sum( abs(out-y) );
24 |
25 | %if this is a post run call to this function then plot graphs
26 | if gp.state.run_completed
27 | figure;
28 | plot(x1,y,'o-','LineWidth',2,'color',[0 0.45 0.74]);grid on;hold on;
29 | plot(x1,out,'x-','LineWidth',2,'color',[0.85 0.33 0.1]);
30 | xlabel('x');ylabel('y');
31 | legend('y','predicted y');legend boxoff; hold off;
32 | title('Prediction of quartic polynomial over range [-1 1]');
33 | end
34 |
35 |
--------------------------------------------------------------------------------
/examples/ripple_config.m:
--------------------------------------------------------------------------------
1 | function gp = ripple_config(gp)
2 | %RIPPLE_CONFIG Config file for multigene regression on the 2D Ripple function.
3 | %
4 | % GP = RIPPLE_CONFIG(GP) generates a parameter structure GP that
5 | % specifies the GPTIPS run settings for the 2D Ripple function. This is
6 | % a function f of two input variables as follows:
7 | %
8 | % f(x1,x2) = (x1 - 3)(x2 - 3) + 2 sin((x1 - 4) (x2 - 4))
9 | %
10 | % Note:
11 | %
12 | % The settings in this file are not intended to be 'optimal'. Feel free
13 | % to experiment with them.
14 | %
15 | % Example:
16 | %
17 | % GP = RUNGP(@RIPPLE_CONFIG) uses this configuration file to perform
18 | % symbolic regression with multigene individuals.
19 | %
20 | % Copyright (c) 2009-2015 Dominic Searson
21 | %
22 | % GPTIPS 2
23 | %
24 | % See also SALUSTOWICZ1D_CONFIG, UBALL_CONFIG, CUBIC_CONFIG,
25 | % REGRESSMULTI_FITFUN, RUNGP
26 |
27 | %run control
28 | gp.runcontrol.pop_size = 200;
29 | gp.runcontrol.num_gen = 500;
30 | gp.runcontrol.verbose = 5;
31 | gp.runcontrol.timeout = 30;
32 | gp.runcontrol.runs = 3;
33 | gp.runcontrol.parallel.auto = true;
34 |
35 | %selection
36 | gp.selection.tournament.size = 20;
37 | gp.selection.tournament.p_pareto = 0.2;
38 | gp.selection.elite_fraction = 0.3;
39 |
40 | %genes
41 | gp.genes.max_genes = 8; %increase this for more 'accurate' but more unwieldy models
42 |
43 | %data
44 | %generate 300 random training data points in the range [0.05 6.05]
45 | x = 0.05 + rand(300,2) * 6;
46 | y = (x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) );
47 |
48 | gp.userdata.ytrain = y;
49 | gp.userdata.xtrain = x;
50 |
51 | %generate 1000 random test data points
52 | x = 0.05 + rand(1000,2) * 6;
53 | y = (x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) );
54 |
55 | gp.userdata.ytest = y;
56 | gp.userdata.xtest = x;
57 |
58 | %generate holdout validation in same region as training space
59 | x=0.05 + rand(200,2) * 6;
60 | y=(x(:,1)-3).*(x(:,2)-3) + 2*sin( (x(:,1)-4).*(x(:,2)-4) );
61 |
62 | gp.userdata.yval = y;
63 | gp.userdata.xval = x;
64 | gp.userdata.name = 'Ripple 2D';
65 | gp.userdata.user_fcn = @regressmulti_fitfun_validate;
66 |
67 | %function nodes
68 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square',...
69 | 'sin','cos','exp','mult3','add3','sqrt','cube','power','negexp','neg','abs','log'};
--------------------------------------------------------------------------------
/examples/salustowicz1d_config.m:
--------------------------------------------------------------------------------
1 | function gp=salustowicz1d_config(gp)
2 | %SALUSTOWICZ1D_CONFIG Multigene regression config for one dimensional Salustowicz function.
3 | %
4 | % GP = SALUSTOWICZ1D_CONFIG(GP) generates a parameter structure GP that
5 | % specifies the GPTIPS run settings for the one dimensional Salustowicz
6 | % function. This is a function f of a single input variable as follows:
7 | %
8 | % f(x) = exp(-x) x^3 cos(x) sin(x) (sin(x)^2 cos(x) - 1)
9 | %
10 | % Example:
11 | %
12 | % GP = RUNGP(@SALUSTOWICZ1D_CONFIG) uses this configuration file to
13 | % perform symbolic regression on the data with multigene individuals.
14 | %
15 | % Note:
16 | %
17 | % The settings in this file are not intended to be 'optimal'. Feel free
18 | % to experiment with them.
19 | %
20 | % Copyright (c) 2009-2015 Dominic Searson
21 | %
22 | % GPTIPS 2
23 | %
24 | % See also RIPPLE_CONFIG, UBALL_CONFIG, CUBIC_CONFIG,
25 | % REGRESSMULTI_FITFUN, RUNGP
26 |
27 | %run control
28 | gp.runcontrol.pop_size = 200;
29 | gp.runcontrol.timeout = 30;
30 | gp.runcontrol.num_gen = 500;
31 | gp.runcontrol.runs = 3;
32 | gp.runcontrol.parallel.auto = true;
33 |
34 | %selection
35 | gp.selection.tournament.size = 20;
36 | gp.selection.tournament.p_pareto = 0.3;
37 | gp.selection.elite_fraction = 0.3;
38 |
39 | %meta data
40 | gp.userdata.name = 'Salustowicz - 1D';
41 |
42 | %genes
43 | gp.genes.max_genes = 6;
44 |
45 | %generate training data
46 | x = [0.05:0.2:10]'; %generate training points in range [0.05 10]
47 | gp.userdata.xtrain = x;
48 | sx = sin(x);
49 | cx = cos(x);
50 | gp.userdata.ytrain = exp(-x) .* (x.^3) .*sx .* cx .* ((sx.^2) .*cx -1);
51 |
52 | %generate random test data in same range
53 | x = rand(500,1) * 9.95 + 0.05;
54 | sx = sin(x);
55 | cx = cos(x);
56 | gp.userdata.ytest = exp(-x) .* (x.^3) .*sx .* cx .* ((sx.^2) .*cx -1);
57 | gp.userdata.xtest = x;
58 |
59 | %function nodes
60 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square','sin',...
61 | 'cos','exp','power','cube','sqrt','add3','mult3','negexp','neg','abs'};
62 |
--------------------------------------------------------------------------------
/examples/uball_config.m:
--------------------------------------------------------------------------------
1 | function gp = uball_config(gp)
2 | %UBALL_CONFIG Multigene regression config for the n dimensional Unwrapped Ball function.
3 | %
4 | % GP = UBALL_CONFIG(GP) generates a parameter structure GP that specifies
5 | % the GPTIPS run settings.
6 | %
7 | % Example:
8 | %
9 | % GP = RUNGP(@UBALL_CONFIG) uses this configuration file to perform
10 | % symbolic regression with multigene individuals.
11 | %
12 | % Remarks:
13 | %
14 | % The 5D version of this function was proposed as a 'standardised' GP
15 | % symbolic regression benchmark (Vladislavleva-4) in White et al.,
16 | % 'Better GP benchmarks: community survey results and proposals', Genet.
17 | % Program Evolvable Mach. (2013) 14:3-29 (Table 4).
18 | %
19 | % It is unusual in that the test data in the above reference requires
20 | % extrapolation beyond the bounds of the training data.
21 | %
22 | % (C) Dominic Searson 2009-2015
23 | %
24 | % GPTIPS 2
25 | %
26 | % See also SALUSTOWICZ1D_CONFIG, RIPPLE_CONFIG, CUBIC_CONFIG,
27 | % REGRESSMULTI_FITFUN, RUNGP
28 |
29 | %run control
30 | gp.runcontrol.pop_size = 200;
31 | gp.runcontrol.num_gen = 1000;
32 | gp.runcontrol.verbose = 5;
33 | gp.runcontrol.timeout = 45;
34 | gp.runcontrol.runs = 3 ;
35 | gp.runcontrol.parallel.auto = true;
36 | gp.selection.elite_fraction = 0.25;
37 |
38 | %selection
39 | gp.selection.tournament.size = 20;
40 | gp.selection.tournament.p_pareto = 0.2;
41 |
42 | %genes
43 | gp.genes.max_genes = 8;
44 | gp.treedef.max_depth = 5;
45 |
46 | %set dimension of problem
47 | n = 5;
48 |
49 | %training data
50 | %generate 512 n dimensional data points in the range [0.05 6.05]
51 | x = 0.05 + rand(512,n)*6;
52 |
53 | dsum = 0;
54 | for i=1:n
55 | dsum = dsum + (( x(:,i)- 3 ).^2);
56 | end
57 | y = 10./(5+dsum);
58 |
59 | gp.userdata.ytrain = y;
60 | gp.userdata.xtrain = x;
61 |
62 | %testing data
63 | %generate 512 test data inside the training range
64 | x = 0.05 + rand(512,n)*6;
65 |
66 | dsum = 0;
67 | for i=1:n
68 | dsum = dsum + (( x(:,i)- 3 ).^2);
69 | end
70 | y = 10./(5+dsum);
71 |
72 | gp.userdata.ytest = y;
73 | gp.userdata.xtest = x;
74 |
75 | %validation data
76 | %generate holdout validation in same region as training space
77 | x = 0.05 + rand(512,n)*6;
78 |
79 | dsum = 0;
80 | for i=1:n
81 | dsum = dsum + (( x(:,i)- 3 ).^2);
82 | end
83 | y = 10./(5+dsum);
84 |
85 | gp.userdata.yval = y;
86 | gp.userdata.xval = x;
87 |
88 | %enables hold out validation set
89 | gp.userdata.user_fcn = @regressmulti_fitfun_validate;
90 |
91 | %Add a name to data set
92 | gp.userdata.name = ['Uball (n = ' num2str(n) ')'];
93 |
94 | %function nodes
95 | gp.nodes.functions.name = {'times','minus','plus','rdivide','square','sin',...
96 | 'cos','cube','exp','power','sqrt','add3','mult3','log','negexp','abs','neg'};
--------------------------------------------------------------------------------
/fitness/regressmulti_fitfun_validate.m:
--------------------------------------------------------------------------------
1 | function [valfitness,gp,ypredval] = regressmulti_fitfun_validate(gp)
2 | %REGRESSMULTI_FITFUN_VALIDATE Evaluate current 'best' multigene regression model on validation data set.
3 | %
4 | % [FITNESS,GP,YPREDVAL] = REGRESSMULTI_FITFUN_VALIDATE(GP) returns the
5 | % FITNESS of the current 'best' multigene model contained in the GP
6 | % data structure (as evaluated on the training data). FITNESS is the root
7 | % mean squared prediction (RMSE) error on the validation data set.
8 | %
9 | % Remarks on the use of validation data:
10 | %
11 | % This is used in conjunction with the multigene regression fitness
12 | % function REGRESSMULTI_FITFUN.
13 | %
14 | % This function is called at the end of each generation by adding the
15 | % following line to the user's run configuration file:
16 | %
17 | % GP.USERDATA.USER_FCN = @REGRESSMULTI_FITFUN_VALIDATE
18 | %
19 | % In addition, the user's GPTIPS configuration file should populate the
20 | % following required fields for the validation data assuming Nval
21 | % observations on the input and output data: GP.USERDATA.XVAL should be a
22 | % (Nval x n) matrix where the ith column contains the Nval observations
23 | % of the ith input variable xi. GP.USERDATA.YVAL should be a (Nval x 1)
24 | % vector containing the corresponding observations of the response
25 | % variable y.
26 | %
27 | % Copyright (c) 2009-2015 Dominic Searson
28 | %
29 | % GPTIPS 2
30 | %
31 | % See also REGRESSMULTI_FITFUN, GP_USERFCN
32 |
33 | % on first call, check that validation data is present
34 | if (gp.state.count == 1) && (~isfield(gp.userdata,'xval') || ~isfield(gp.userdata,'yval') || ...
35 | isempty(gp.userdata.xval) || isempty(gp.userdata.yval))
36 | error('Cannot perform holdout validation because no validation data was found.');
37 | end
38 |
39 | % If ADFs are used, make them known to this function
40 | if gp.nodes.adf.use, assignadf(gp); end
41 |
42 | %get evalstr for current 'best' individual on training data
43 | evalstr = gp.results.best.eval_individual;
44 |
45 | %process evalstr with regex to allow direct access to data
46 | pat = 'x(\d+)';
47 | evalstr = regexprep(evalstr,pat,'gp.userdata.xval(:,$1)');
48 | y = gp.userdata.yval;
49 |
50 | numDataPoints = size(y,1);
51 | numGenes = length(evalstr);
52 |
53 | %set up a matrix to store the tree outputs plus a bias column of ones
54 | gene_outputs = ones(numDataPoints,numGenes+1);
55 |
56 | %eval each gene in the current individual
57 | for i=1:numGenes
58 | ind = i + 1;
59 | eval(['gene_outputs(:,ind)=' evalstr{i} ';']);
60 |
61 | %check for nonsensical answers and break out early with an 'inf' if so
62 | if any(~isfinite(gene_outputs(:,ind))) || any(~isreal(gene_outputs(:,ind)))
63 | valfitness = Inf;
64 | return
65 | end
66 | end
67 |
68 | %retrieve regression weights
69 | theta = gp.results.best.returnvalues;
70 |
71 | %calc. prediction of validation data set using the retrieved weights
72 | ypredval = gene_outputs * theta;
73 |
74 | %calculate RMS prediction error on validation data
75 | valfitness = sqrt(mean((y-ypredval).^2));
76 |
77 | %on 1st gen, initialise validation set info in the GP structure
78 | if ~gp.state.init_val
79 | gp.results.history.valfitness(1:gp.runcontrol.num_gen,1) = 0;
80 | gp.results.history.valfitness(1) = valfitness;
81 | gp.results.valbest = gp.results.best;
82 | gp.results.valbest.valfitness = valfitness;
83 | gp.state.init_val = true;
84 | end
85 |
86 | gp.results.best.valfitness = valfitness;
87 | gp.results.history.valfitness(gp.state.count,1) = valfitness;
88 |
89 | %update 'best' validation if fitness is better or it's the same with fewer
90 | %nodes/lower complexity
91 | if gp.fitness.complexityMeasure
92 | bcomp = gp.results.best.complexity;
93 | vcomp = gp.results.valbest.complexity;
94 | else
95 | bcomp = gp.results.best.nodecount;
96 | vcomp = gp.results.valbest.nodecount;
97 | end
98 |
99 | if valfitness < gp.results.valbest.valfitness || ...
100 | (valfitness == gp.results.valbest.valfitness && bcomp < vcomp)
101 | gp.results.valbest = gp.results.best;
102 | gp.results.valbest.valfitness = valfitness;
103 | gp.results.valbest.foundatgen = gp.state.count - 1;
104 | end
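The regex rewrite and weighted sum used in this function can be tried in isolation. The following is a minimal sketch, assuming a made-up two-gene individual, validation matrix and weight vector (`evalstr`, `xval` and `theta` here are illustrative values, not GPTIPS output). It uses only built-in functions so that it runs without the toolbox on the path.

```matlab
% Hypothetical two-gene individual, validation inputs and weights.
% These values are assumptions for illustration only.
evalstr = {'times(x1,x2)', 'minus(x3,x1)'};
xval    = [1 2 3; 4 5 6];      % 2 observations of 3 input variables
theta   = [0.5; 2; -1];        % [bias; weight of gene 1; weight of gene 2]

% Rewrite each xN token so that it indexes column N of the validation matrix
evalstr = regexprep(evalstr, 'x(\d+)', 'xval(:,$1)');

% Evaluate each gene next to a leading bias column of ones
gene_outputs = ones(size(xval,1), numel(evalstr) + 1);
for i = 1:numel(evalstr)
    gene_outputs(:, i+1) = eval(evalstr{i});
end

% Prediction is the weighted sum of the bias column and the gene outputs
ypredval = gene_outputs * theta;
```

With these inputs the first gene evaluates to `[2; 20]` and the second to `[2; 2]`, so `ypredval` is `[2.5; 38.5]`.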
--------------------------------------------------------------------------------
/nodefcn/add3.m:
--------------------------------------------------------------------------------
1 | function y = add3(x1,x2,x3)
2 | %ADD3 Node. Ternary addition function.
3 | %
4 | % Y = ADD3(X1,X2,X3) returns as Y the result of X1 + X2 + X3
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also MULT3
11 |
12 | y=x1 + x2 + x3;
--------------------------------------------------------------------------------
/nodefcn/cube.m:
--------------------------------------------------------------------------------
1 | function y = cube(x)
2 | %CUBE Node. Calculates the element by element cube of a vector.
3 | %
4 | % Y = CUBE(X) performs the vector operation Y = X.^3
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also SQUARE
11 | y = x.^3;
12 |
--------------------------------------------------------------------------------
/nodefcn/gauss.m:
--------------------------------------------------------------------------------
1 | function y = gauss(x)
2 | %GAUSS Node. Gaussian function of input.
3 | %
4 | % Y = GAUSS(X) computes y = exp(-x.^2)
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also NEGEXP
11 | y = exp(-(x.^2));
--------------------------------------------------------------------------------
/nodefcn/gpand.m:
--------------------------------------------------------------------------------
1 | function c = gpand(a,b)
2 | %GPAND Node. Wrapper for logical AND
3 | %
4 | % Copyright (c) 2009-2015 Dominic Searson
5 | %
6 | % GPTIPS 2
7 | %
8 | % See also GPNOT, GPOR, GTH, LTH, MAXX, MINX, STEP, THRESH, IFLTE
9 |
10 | if any(isnan(a)) || any(isnan(b))
11 | c = nan;
12 | return;
13 | end
14 | c = double(and(real(a),real(b)));
--------------------------------------------------------------------------------
/nodefcn/gpnot.m:
--------------------------------------------------------------------------------
1 | function b = gpnot(a)
2 | %GPNOT Node. Wrapper for logical NOT
3 | %
4 | % Copyright (c) 2009-2015 Dominic Searson
5 | %
6 | % GPTIPS 2
7 | %
8 | % See also GPOR, GPAND, LTH, GTH, MAXX, MINX, NEG, IFLTE, STEP, THRESH
9 |
10 | if any(isnan(a))
11 | b = nan;
12 | return;
13 | end
14 |
15 | b = double(not(real(a)));
--------------------------------------------------------------------------------
/nodefcn/gpor.m:
--------------------------------------------------------------------------------
1 | function c = gpor(a,b)
2 | %GPOR Node. Wrapper for logical OR
3 | %
4 | % Copyright (c) 2009-2015 Dominic Searson
5 | %
6 | % GPTIPS 2
7 | %
8 | % See also GPNOT, GPAND, LTH, GTH, MAXX, MINX, NEG, IFLTE, STEP, THRESH
9 |
10 | if any(isnan(a)) || any(isnan(b))
11 | c = nan;
12 | return;
13 | end
14 | c = double(or(real(a),real(b)));
--------------------------------------------------------------------------------
/nodefcn/gth.m:
--------------------------------------------------------------------------------
1 | function y = gth(a,b)
2 | %GTH Node. Greater than operator
3 | %
4 | % Performs an element by element comparison of A and B and returns
5 | % 1 if A(i) > B(i) and 0 otherwise.
6 | %
7 | % (c) Dominic Searson 2009-2015
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also LTH, STEP, THRESH, IFLTE, MINX, MAXX, NEG, GPAND, GPNOT, GPOR
12 |
13 | y = double(gt(a,b));
--------------------------------------------------------------------------------
/nodefcn/iflte.m:
--------------------------------------------------------------------------------
1 | function x = iflte(a,b,c,d)
2 | % IFLTE Node. Performs an element wise IF THEN ELSE operation on vectors and scalars.
3 | %
4 | % X = IFLTE(A,B,C,D) computes X as follows on an element by element
5 | % basis:
6 | %
7 | % If A <= B then X = C else X = D.
8 | %
9 | % Remarks:
10 | %
11 | % If all of A,B,C,D are scalars then X will be scalar. If some, but not
12 | % all, of A,B,C,D are scalars then X will be a vector.
13 | %
14 | % Copyright (c) 2009-2015 Dominic Searson
15 | %
16 | % GPTIPS 2
17 | %
18 | % See also THRESH, STEP, MINX, MAXX, NEG, GTH, LTH, GPNOT, GPAND, GPOR
19 |
20 | x = ((a<=b) .* c) + ((a>b) .* d);
21 |
22 |
--------------------------------------------------------------------------------
/nodefcn/lth.m:
--------------------------------------------------------------------------------
1 | function y = lth(a,b)
2 | %LTH Node. Less than operator
3 | %
4 | % Performs an element by element comparison of A and B and returns
5 | % 1 if A(i) < B(i) and 0 otherwise.
6 | %
7 | % (c) Dominic Searson 2009-2015
8 | %
9 | % GPTIPS 2
10 | %
11 | % See also GTH, STEP, THRESH, IFLTE, MINX, MAXX, NEG, GPAND, GPNOT, GPOR
12 |
13 | y = double(lt(a,b));
--------------------------------------------------------------------------------
/nodefcn/maxx.m:
--------------------------------------------------------------------------------
1 | function m = maxx(x1,x2)
2 | %MAXX Node. Returns the maximum of x1 and x2.
3 | %
4 | % Copyright (c) 2009-2015 Dominic Searson
5 | %
6 | % GPTIPS 2
7 | %
8 | % See also MINX, LTH, GTH, STEP, THRESH, IFLTE, GPAND, GPNOT, GPOR
9 | m = max(x1,x2);
--------------------------------------------------------------------------------
/nodefcn/minx.m:
--------------------------------------------------------------------------------
1 | function m = minx(x1,x2)
2 | %MINX Node. Returns the minimum of x1 and x2.
3 | %
4 | % Copyright (c) 2009-2015 Dominic Searson
5 | %
6 | % GPTIPS 2
7 | %
8 | % See also MAXX, GTH, LTH, STEP, THRESH, IFLTE, GPAND, GPNOT, GPOR
9 | m = min(x1,x2);
--------------------------------------------------------------------------------
/nodefcn/mult3.m:
--------------------------------------------------------------------------------
1 | function y = mult3(x1,x2,x3)
2 | %MULT3 Node. Ternary multiplication function.
3 | %
4 | % Y = MULT3(X1,X2,X3) returns as Y the result of X1.*X2.*X3
5 | %
6 | % (c) Dominic Searson 2009-2015
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also ADD3
11 |
12 | y=x1.*x2.*x3;
--------------------------------------------------------------------------------
/nodefcn/neg.m:
--------------------------------------------------------------------------------
1 | function y = neg(x)
2 | %NEG Node. Returns -1 times the argument.
3 | %
4 | % Y = NEG(X)
5 | %
6 | % Remarks:
7 | %
8 | % Computed on an element by element basis, and if X is scalar then Y will
9 | % be scalar.
10 | %
11 | % Copyright (c) 2009-2015 Dominic Searson
12 | %
13 | % GPTIPS 2
14 | %
15 | % See also GPNOT, GPAND, GPOR, MAXX, MINX, STEP, THRESH, IFLTE
16 |
17 | y = x .* -1;
18 |
19 |
--------------------------------------------------------------------------------
/nodefcn/negexp.m:
--------------------------------------------------------------------------------
1 | function y = negexp(x)
2 | %NEGEXP Node. Calculate exp(-x) on an element by element basis.
3 | %
4 | % Y = NEGEXP(X)
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also NEG, GAUSS
11 |
12 | y = exp(-x);
--------------------------------------------------------------------------------
/nodefcn/pdiv.m:
--------------------------------------------------------------------------------
1 | function x = pdiv(arg1,arg2)
2 | %PDIV Node. Performs a protected element by element divide.
3 | %
4 | % X = PDIV(ARG1,ARG2)
5 | %
6 | % For each element in ARG1 and ARG2:
7 | %
8 | % X = ARG1/ARG2
9 | %
10 | % Unless ARG2 == 0 then X = 0.
11 | %
12 | % Remarks:
13 | % Renamed from PDIVIDE due to Symbolic Math Toolbox bug. Also -inf bug
14 | % fixed since v1.0
15 | %
16 | % (c) Dominic Searson 2009-2015
17 | %
18 | % GPTIPS 2
19 | %
20 | % See also PLOG, RDIVIDE
21 |
22 | x = arg1./arg2;
23 | i = isinf(x);
24 | x(i) = 0;
25 |
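As a quick check of the protection logic, the same three statements can be run on a small hand-made input. This is a sketch only: the input vectors are assumptions chosen to cover a nonzero numerator over zero, zero over zero, and an ordinary division.

```matlab
% Illustrative inputs (not from the toolbox)
arg1 = [1 0 -2];
arg2 = [0 0  4];

% Same statements as the body of PDIV
x = arg1 ./ arg2;      % [Inf NaN -0.5]
i = isinf(x);          % flags only the Inf entry
x(i) = 0;              % x is now [0 NaN -0.5]
```

Note that `isinf` does not flag NaN, so in this sketch the 0/0 entry remains NaN rather than being set to 0.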
--------------------------------------------------------------------------------
/nodefcn/plog.m:
--------------------------------------------------------------------------------
1 | function out = plog(x)
2 | %PLOG Node. Calculate the element by element protected natural log of a vector.
3 | %
4 | % OUT = PLOG(X) performs LOG(ABS(X))
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also PDIV
11 |
12 | out = log(abs(x));
13 | out(isinf(out)) = 0;
14 |
--------------------------------------------------------------------------------
/nodefcn/psqroot.m:
--------------------------------------------------------------------------------
1 | function y = psqroot(x)
2 | %PSQROOT Node. Computes element by element protected square root of a vector
3 | %
4 | % Y = PSQROOT(X) performs the vector operation Y = SQRT(ABS(X))
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also SQUARE, CUBE
11 | y = sqrt(abs(x));
12 |
--------------------------------------------------------------------------------
/nodefcn/square.m:
--------------------------------------------------------------------------------
1 | function y = square(x)
2 | %SQUARE Node. Calculates the element by element square of a vector or matrix
3 | %
4 | % Y = SQUARE(X) performs the operation Y = X.^2
5 | %
6 | % Copyright (c) 2009-2015 Dominic Searson
7 | %
8 | % GPTIPS 2
9 | %
10 | % See also CUBE
11 | y = x.^2;
12 |
--------------------------------------------------------------------------------
/nodefcn/step.m:
--------------------------------------------------------------------------------
1 | function y = step(x)
2 | %STEP Node. Threshold function that returns 1 if the argument is >= 0 and 0 otherwise.
3 | %
4 | % Y = STEP(X)
5 | %
6 | % Remarks:
7 | %
8 | % Computed on an element by element basis, and if X is scalar then Y will
9 | % be scalar.
10 | %
11 | % Copyright (c) 2009-2015 Dominic Searson
12 | %
13 | % GPTIPS 2
14 | %
15 | % See also THRESH, IFLTE, MAXX, MINX, NEG, LTH, GTH, GPAND, GPNOT, GPOR
16 |
17 | y = ( (x>=0) .* 1) + ( (x<0) .* 0);
18 |
19 |
--------------------------------------------------------------------------------
/nodefcn/thresh.m:
--------------------------------------------------------------------------------
1 | function x = thresh(a,b)
2 | %THRESH Node. Threshold function that returns 1 if the first argument is >= the second argument and returns 0 otherwise.
3 | %
4 | % X = THRESH(A,B)
5 | %
6 | % This performs:
7 | %
8 | % If A >= B then X = 1 else X = 0 on an element by element basis.
9 | %
10 | % Remarks:
11 | %
12 | % If both of A and B are scalars then X will be scalar. If one or more of
13 | % A and B are vectors then X will be a vector.
14 | %
15 | % Copyright (c) 2009-2015 Dominic Searson
16 | %
17 | % GPTIPS 2
18 | %
19 | % See also IFLTE, STEP, MAXX, MINX, GTH, LTH, GPAND, GPNOT, GPOR
20 |
21 | x=((a>=b) .* 1) + ((b>a) .* 0);
22 |
23 |
--------------------------------------------------------------------------------
/rules/gprule_no_nested_adfs.m:
--------------------------------------------------------------------------------
function [gp_out, fit] = gprule_no_nested_adfs(gp, expr, params)
%GPRULE_NO_NESTED_ADFS Individuals with nested ADFs are discarded
% Individuals that are expressed through nested ADFs
% should have no chance of survival.

gp_out = gp;

% Params are not used at the moment
params; %#ok

% In expressions, ADFs are always encoded as certain known symbols
adfs = gp.nodes.adf.seed_str;

adf_found = false;  % Found ADF
open_count = 0;     % Open parentheses counter
k = 1;              % Character pointer
fit = 1.0;          % By default, the gene is fit for survival

% Parser body
% Here we assume that some rules for expression strings hold,
% e.g., that they are always complete [TODO: potential bug]
while k <= length(expr)

    if ~isempty(strfind(adfs, expr(k))) %#ok: backwards compatibility
        if adf_found % Nesting detected
            fit = 0.0;
            break;
        else
            adf_found = true;
            k = k + 1; % Skip the ADF symbol itself
            if k > length(expr) % ADF symbol was the last character
                break;
            end
        end
    end

    % Track parentheses only while inside an ADF call
    if strcmpi(expr(k), '(') && adf_found
        open_count = open_count + 1;
    end

    if strcmpi(expr(k), ')') && adf_found
        open_count = open_count - 1;
        if open_count == 0 % The ADF call is closed
            adf_found = false;
        end
    end

    k = k + 1; % Increment character counter

end

end
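The scan above can be exercised with a hand-built structure. This is a sketch only: it assumes the `rules` folder is on the MATLAB path, and the ADF symbols `'AB'` and both expression strings are hypothetical stand-ins for whatever encoding the toolbox actually produces.

```matlab
% Hypothetical setup: pretend 'A' and 'B' are the ADF seed symbols
gp = struct();
gp.nodes.adf.seed_str = 'AB';

% Two ADF calls side by side: the first one closes before the second opens
[~, fit_flat] = gprule_no_nested_adfs(gp, 'A(x1)+B(x2)', []);

% 'B' appears while the parentheses of 'A' are still open: nesting
[~, fit_nested] = gprule_no_nested_adfs(gp, 'A(x1,B(x2))', []);

fprintf('flat: %g, nested: %g\n', fit_flat, fit_nested);
```

Under these assumptions the first call should leave the gene fit for survival (1.0) and the second should reject it (0.0).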
--------------------------------------------------------------------------------
/rules/gprule_template.m:
--------------------------------------------------------------------------------
function [gp_out, fit] = gprule_template(gp, expr, params)
%GPRULE_TEMPLATE Use this template to define a custom rule for fitness
% These rules are applied to genes, not complete expressions
%
% Input arguments: GP is the main structure, EXPR is the encoded gene
% expression, PARAMS are additional parameters needed for running the rule
%
% Output arguments: GP_OUT is the GP structure (can be left unassigned if
% unmodified) and FIT is the fitness of this particular gene as determined
% by the rule logic.
%
% NB! FIT has a completely different meaning here compared to the overall
% fitness function. Here, FIT = 0.0 means the individual is unfit for
% survival, while FIT = 1.0 means the opposite.

% Params are left unused in this case (no params)
params; %#ok

% The structure is here if it is needed;
% NB! Do not forget to pass it as output!
gp_out = gp;

% This is custom decision logic, but the idea is that
% the output of the decision must be constrained to [0,1].

% As implemented in this particular example, FIT is always "true" or "pass"
% depending on the context, unless the input expression is empty.
%
% However, the logic of the rule may be quite complicated.

fit = 1.0;
if isempty(expr)
    fit = 0.0;
end

end
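As an illustration of how the template might be filled in, the sketch below rejects genes whose encoded string exceeds a length limit. The function name `gprule_max_length` and the field `params.max_len` are hypothetical; the template does not prescribe what `PARAMS` contains, so a struct is assumed here.

```matlab
function [gp_out, fit] = gprule_max_length(gp, expr, params)
%GPRULE_MAX_LENGTH Hypothetical rule: reject overly long encoded genes
% Assumes PARAMS is a struct with a numeric field MAX_LEN.

% The structure is passed through unmodified
gp_out = gp;

% Same convention as the template: 1.0 = fit for survival, 0.0 = unfit
fit = 1.0;
if isempty(expr) || length(expr) > params.max_len
    fit = 0.0;
end

end
```

Saved as `gprule_max_length.m`, it could be tried with `[~, fit] = gprule_max_length(struct(), 'abc', struct('max_len', 2))`, which returns `fit = 0` because the three-character string exceeds the limit of two.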
--------------------------------------------------------------------------------
/user/gp_userfcn.m:
--------------------------------------------------------------------------------
function gp = gp_userfcn(gp)
%GP_USERFCN Calls a user-defined function once per generation if one has been specified in the field GP.USERDATA.USER_FCN.
%
% Remarks:
%
% The user-defined function must accept the GP structure as its first
% input argument and return it as its second output argument.
%
% Example:
%
% In multigene symbolic regression the function
% regressmulti_fitfun_validate can be called once per generation to
% evaluate the best individual so far on a validation ('holdout') data
% set.
%
% Copyright (c) 2009-2015 Dominic Searson
%
% GPTIPS 2

if ~isempty(gp.userdata.user_fcn)
    % The first output of the user function is discarded
    [~,gp] = feval(gp.userdata.user_fcn,gp);
end
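A sketch of a user function with the calling convention that `gp_userfcn` expects: the GP structure comes in as the first input and goes back out as the second output. The function name `count_calls_userfcn` and the field `gp.userdata.calls` are hypothetical and serve only to show the signature.

```matlab
function [status, gp] = count_calls_userfcn(gp)
%COUNT_CALLS_USERFCN Hypothetical user function: counts how often it is called

% gp_userfcn ignores the first output, so a placeholder is enough
status = [];

% Create the counter on first use, then increment it
if ~isfield(gp.userdata, 'calls')
    gp.userdata.calls = 0;
end
gp.userdata.calls = gp.userdata.calls + 1;

end
```

With this saved as `count_calls_userfcn.m`, setting `gp.userdata.user_fcn = @count_calls_userfcn;` in a config file should cause the counter to increase by one each time `gp_userfcn` runs, since `feval` accepts a function handle as well as a function name.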
--------------------------------------------------------------------------------