├── MATLAB_RUNTIME.z01
├── MATLAB_RUNTIME.zip
├── MATLAB_version.zip
├── Readme.md
├── WIN_10_64_bit.z01
├── WIN_10_64_bit.zip
├── datasets.zip
├── java_package.z01
├── java_package.zip
├── pythonpackage.z01
└── pythonpackage.zip

/MATLAB_RUNTIME.z01:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/MATLAB_RUNTIME.z01
--------------------------------------------------------------------------------
/MATLAB_RUNTIME.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/MATLAB_RUNTIME.zip
--------------------------------------------------------------------------------
/MATLAB_version.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/MATLAB_version.zip
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
**Tutorial for FEATURESELECT**

**Yosef Masoudi-Sobhanzadeh, Habib Motieghader, Ali Masoudi-Nejad\***

Laboratory of Systems Biology and Bioinformatics, Institute of Biochemistry and
Biophysics, University of Tehran, Tehran, Iran.

*FEATURESELECT*, an application for feature selection based on machine-learning
methods, has been developed in the Laboratory of Systems Biology and Bioinformatics
(LBB). FEATURESELECT can be applied to any problem that requires selecting a subset
of features from a given feature set. In what follows, we describe the main aspects
of *FEATURESELECT*.
**Versions**

Four versions of FeatureSelect, based on the MATLAB Runtime, are available:

1- A MATLAB version, which can be opened in MATLAB.

2- A Java package.

To install and run the Java package, follow the steps below:

2-1 Copy FeatureSelect.jar to mcroot/toolbox/javabuilder/jar, where mcroot is the root of the MATLAB Runtime.

2-2 Write a Java program like the one below and save it as FeatureS.java:

```java
import FeatureSelect.*;
import com.mathworks.toolbox.javabuilder.*;

public class FeatureS
{
    public static void main(String[] args)
    {
        Class1 FS = null;
        try
        {
            FS = new Class1();
            FS.Call_FS();
        }
        catch (Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }
    }
}
```

2-3 Run cmd as administrator.

2-4 Execute the following command:

```
javac -classpath "javabuilder.jar";.\FeatureSelect.jar FeatureS.java
```

2-5 Execute the following command:

```
java -classpath .;"javabuilder.jar";.\FeatureSelect.jar FeatureS
```

3- A Python package.

To install and run the Python package, follow the steps below (a consolidated, runnable sketch of steps a–d appears just before the "Using FEATURESELECT" section):

a) Execute the install command:

```
python setup.py install
```

After installation, run FeatureSelect as follows:

b) Import the package:

```python
import FeatureSelect
```

c) Prepare FeatureSelect:

```python
FS = FeatureSelect.initialize()
```

d) Run the program:

```python
FS.Selection()
```

4- A stand-alone version, which is a 64-bit EXE file.

**Implemented language**

*FEATURESELECT* is implemented in the MATLAB programming language, for several reasons:

1. MATLAB is a common programming language across different sciences, so
   *FEATURESELECT* can be applied in various areas such as biological data analysis,
   image processing, handwriting recognition, computer science and many other fields.

2. MATLAB is supported by various operating systems, such as Windows, Linux and macOS.

3. MATLAB code is easy to read and extend, so anyone can add new capabilities to
   *FEATURESELECT*. After validating new capabilities, we will publish new versions
   of *FEATURESELECT* at https://github.com/LBBSoft/FeatureSelect.

**To use in MATLAB**

To install *FEATURESELECT*, you must first provide some requirements:

1. Install WINSDK.1 on Windows, or MinGW on Linux; these include a C++ compiler.

2. Install MATLAB.

After installing the requirements, follow these stages:

1. Copy all files placed in the *FEATURESELECT* folder to one of the available
   directories.

2. Go to \\FEATURESELECT\\matlab\\ in the intended directory.

3. Click on one of the MATLAB files available in the entered directory. Note that
   the MATLAB path and the current directory path must be the same.

4. If your application is not working for SVM, type "make" in MATLAB's command
   window and press Enter. Make sure the command completes successfully. For more
   information about LIBSVM, see https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
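Putting the Python-package steps a)–d) together: the sketch below simply wraps the calls shown above in a try/finally block. The `terminate()` call is our assumption, based on the usual API of MATLAB Compiler SDK Python packages; FeatureSelect's generated package may differ.

```python
# Consolidated sketch of the Python-package steps above (a-d).
# Assumes the package was installed with `python setup.py install` and that
# a MATLAB Runtime matching the package is installed and on the library path.
import FeatureSelect

FS = None
try:
    FS = FeatureSelect.initialize()   # step c: start the runtime session
    FS.Selection()                    # step d: launch FeatureSelect
except Exception as exc:
    print("FeatureSelect failed:", exc)
finally:
    if FS is not None:
        FS.terminate()  # assumed cleanup call (standard in MATLAB Compiler SDK packages)
```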
**Using FEATURESELECT**

After installing the software, type "LBBFS" in the MATLAB command window to run
*FEATURESELECT*. Consider fig.1 and fig.2.

![1](https://user-images.githubusercontent.com/42937478/51424353-aa48cc80-1be1-11e9-85d2-6149c81d8b4f.jpg)

Fig.1: running *FEATURESELECT*

![2](https://user-images.githubusercontent.com/42937478/51424379-0e6b9080-1be2-11e9-9373-1a1eee1eb072.jpg)

Fig.2: *FEATURESELECT*

Fig.2 shows the *FEATURESELECT* software. The application has several sections:

1. **LBB**: LBB is the name of our laboratory, founded by Prof. Ali Masoudi-Nejad
   in 2006 at the University of Tehran, Iran.

2. **Input**: Text, xls and MATLAB files are acceptable input formats. You must
   convert an xls file to a txt file, or to an m file if it has a *struct*
   structure. TCN is an abbreviation for training column number; your data and
   labels must be in the same file. Suppose the file name is *input.txt* and the
   training column number (the samples' labels) is 222. You can type your input
   file's location in the specified box or select it using the *select input file*
   button. For some applications, the input file needs to be normalized or
   fuzzified; the *data normalization* and *data fuzzification* buttons are
   designed for this purpose. After clicking either of these buttons, a new file
   is created and its name is placed in the specified box (data.txt). When you
   select an input file, the rows of the input file are rearranged randomly. If
   the first row or the first column is not part of the input data, click *first
   row is not data* or *first column is not data*, respectively. *FEATURESELECT*
   has three main goals: 1- easy use of LIBSVM, ANN and DT; 2- feature selection
   for regression problems; and 3- feature selection for classification problems.
   The default option is regression; disable the "*this is a regression problem*"
   option if your problem is classification.

3. **Selecting learner type:** Three types of learners are available in
   *FEATURESELECT*. The first is SVM; as mentioned before, it is based on the
   parameters of LIBSVM. The second is ANN, which has only one parameter (training
   iterations). We examined several types of artificial neural networks, and the
   results showed that optimization algorithms can improve the training phase of
   the ANN; however, they also increase the training time, so this learner is best
   applied to small datasets. Alternatively, you can select your features with the
   SVM or DT and then use the ANN to obtain an efficient model. The third learner
   is the decision tree (DT), which needs no parameter setting.

4. **Selecting parameters of LIBSVM:** If your learner type is SVM, you can set
   its parameters in this section. The learning parameters, selectable via the
   dotted button (fig.3), are LIBSVM's parameters. The training data percentage
   and the maximum number of features desirable for your application can be
   entered in the related boxes. Also, if you want to apply LIBSVM to all of the
   features (in other words, if you do not want feature selection), click the
   *only apply SVM* button. A sketch illustrating these settings follows fig.3.

![3](https://user-images.githubusercontent.com/42937478/51424395-3b1fa800-1be2-11e9-847b-5d21fda363df.jpg)

Fig.3: learning parameters of LIBSVM
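As a rough illustration only (FeatureSelect itself drives LIBSVM from MATLAB), the Python sketch below mirrors the Input and LIBSVM settings described above: labels stored in column TCN of the data table, a normalization step, a training-data percentage, and the usual LIBSVM knobs via scikit-learn's libsvm-backed `SVC`. The synthetic data stands in for *input.txt*, and the z-score normalization is an assumption, not FeatureSelect's exact formula.

```python
# Illustration of the Input/LIBSVM settings above; not FeatureSelect's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for np.loadtxt("input.txt"): a table of features plus a label column.
features, labels = make_classification(n_samples=200, n_features=20, random_state=0)
data = np.column_stack([features, labels])

TCN = data.shape[1]                      # training column number (222 in the text's example)
y = data[:, TCN - 1]                     # labels live in column TCN
X = np.delete(data, TCN - 1, axis=1)     # everything else is features

X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score stand-in for "data normalization"

# "Training data percentage" box, e.g. 70% train / 30% test:
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

# Typical LIBSVM parameters exposed by the dotted button: kernel type, cost C, gamma.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```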
5. **Feature selection method:** Three types of feature selection methods are
   available in *FEATURESELECT*: 1- Wrapper methods (optimization algorithms).
   2- Filter methods: this type of feature selection consists of five popular
   methods. Our experimental results show that every learner and every method has
   its own view of a dataset, but overall, wrapper methods lead to better results
   than filter methods. 3- Hybrid method: the user can perform two-step feature
   selection by combining the filter and wrapper methods.

6. **Algorithms**: Eleven algorithms have been developed for selecting features
   in the wrapper-method section. Because of their stochastic nature, it is
   advised to run the optimization algorithms more than 30 times; you can set the
   number of iterations in the relevant box. Clicking on an algorithm, e.g. WCC,
   opens a new form such as fig.4, where you can set the algorithm's parameters.
   (A toy sketch of the wrapper idea follows the algorithm list below.)

![4](https://user-images.githubusercontent.com/42937478/51424409-5ab6d080-1be2-11e9-8aca-672c7f253be0.jpg)

Fig.4: WCC's parameters

The developed algorithms and their references are:

7. WCC (world competitive contest): this algorithm is inspired by the rules of
   human sports. For all of the algorithms, the default parameter values are
   determined fairly, based on the number of LIBSVM calls. You can get more
   information about WCC in .

8. LCA (league championship algorithm): LCA is an algorithm inspired by sport
   championships. A link to download the original LCA paper: .

9. GA (genetic algorithm): GA is the first optimization algorithm that mimics
   natural evolutionary processes. *Crate* and *mrate* are abbreviations for
   crossover rate and mutation rate in FEATURESELECT. More information about
   genetic algorithms is available in .

10. PSO (particle swarm optimization): PSO is inspired by the social behavior of
    birds: groups of birds fly toward a destination. Unlike GA, PSO has no
    evolutionary operations such as crossover and mutation. Useful information
    about PSO is available in .

11. ACO (ant colony optimization): this algorithm, proposed by Marco Dorigo in
    1992, is inspired by the social behavior of ants. Some aspects of ACO can be
    found in .

12. ICA (imperialist competitive algorithm): Atashpaz-Gargari proposed ICA, an
    algorithm inspired by imperialistic competition. You can download the related
    paper from .

13. LA (learning automata): an automaton is an abstract concept. Each learning
    automaton selects an action from an action set and applies it to the
    environment; the selected action is then rewarded or penalized. Meybodi
    published an application of LA in .

14. HTS (heat transfer optimization algorithm): a recently introduced
    meta-heuristic based on the laws of thermodynamics. HTS is available in .
    In FEATURESELECT, we denote the conduction factor as CDF, the convection
    factor as COF and the radiation factor as ROF.

15. FOA (forest optimization algorithm): FOA was proposed by Manizheh Ghaemi and
    shows interesting results. The algorithm begins with some randomly created
    trees as potential solutions. The original research article can be accessed
    at .

16. DSOS (discrete symbiotic organisms search): DSOS was published in 2017. It
    has been shown that DSOS is comparable with other optimization algorithms,
    so we implemented it in FEATURESELECT. The original DSOS paper can be found
    at .

17. CUK (cuckoo optimization): CUK is suitable for continuous nonlinear
    optimization problems and is inspired by the life of the cuckoo bird family.
    The original paper can be found at .
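As promised in item 6, here is a toy illustration of the wrapper idea: a population of candidate feature subsets is evolved, and a learner scores each subset. This is a minimal genetic-algorithm sketch, not FeatureSelect's implementation; only the `crate`/`mrate` names come from the text above, and scikit-learn's libsvm-backed `SVC` stands in for the learner.

```python
# Toy wrapper-method sketch: a tiny GA searches over boolean feature masks,
# and cross-validated SVC accuracy is the fitness. Illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=150, n_features=25, n_informative=5, random_state=0)

def fitness(mask):
    if not mask.any():                 # empty subset: worst possible score
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

pop = rng.random((12, X.shape[1])) < 0.5   # population of boolean feature masks
crate, mrate = 0.8, 0.05                   # crossover and mutation rates (crate/mrate)
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]          # keep the fitter half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(6, size=2)]
        cut = rng.integers(1, X.shape[1]) if rng.random() < crate else 0
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        child ^= rng.random(X.shape[1]) < mrate     # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("selected feature indices:", np.flatnonzero(best))
```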
18. **Notifications**

> After running the selected algorithms, the status of the program is shown in
> the notification section.

> **Outputs**

> The *results* folder is placed in the directory containing *FEATURESELECT's*
> files. For regression problems, two files named *description* and *tbls* are
> created. For classification problems, three files named *description*,
> *evaluation* and *tbls* are created. The date and time are appended to each
> created file name, and the contents of the created files are also printed in
> the MATLAB command window. The *description* file (for both regression and
> classification problems) includes information such as the number of selected
> features and their indices. The *evaluation* file, specific to classification
> problems, includes the statistical measures essential for classification. For
> both classification and regression problems, the *tbls* file includes further
> statistical information such as the p-value, confidence interval and standard
> deviation. Figs. 5 through 7 are output instances acquired by batch-running all
> of the algorithms on the sample input file located in the *FEATURESELECT*
> directory.

![5](https://user-images.githubusercontent.com/42937478/51424416-7326eb00-1be2-11e9-90a7-0330ba7b3e2c.jpg)

> Fig.5: part of the *description* file

![6](https://user-images.githubusercontent.com/42937478/51424428-8c2f9c00-1be2-11e9-969b-3a869218eb55.jpg)

> Fig.6: part of the *tbls* file

![7](https://user-images.githubusercontent.com/42937478/51424439-b41eff80-1be2-11e9-8154-43d2db0ec063.jpg)

> Fig.7: part of the *evaluation* file (only for classification)
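The measures in the *evaluation* file are standard classification statistics. As a hedged illustration (the definitions below are the textbook ones; we have not verified them against FeatureSelect's code, and the counts are hypothetical), here is how the main abbreviations of Table 1 are computed from a confusion matrix:

```python
# Standard classification measures from a confusion matrix.
# The counts below are hypothetical, for illustration only.
TP, FP, TN, FN = 42, 8, 37, 13

ACC = (TP + TN) / (TP + TN + FP + FN)  # accuracy
SEN = TP / (TP + FN)                   # sensitivity (true positive rate)
SPC = TN / (TN + FP)                   # specificity
FPR = FP / (FP + TN)                   # false positive rate (= 1 - SPC)
PRE = TP / (TP + FP)                   # precision

print(f"ACC={ACC:.3f} SEN={SEN:.3f} SPC={SPC:.3f} FPR={FPR:.3f} PRE={PRE:.3f}")
```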
> Table.1 shows the abbreviations used in *FEATURESELECT* and their complete
> forms.

> Table.1: abbreviations

| Abbreviation | Complete form             |
|--------------|---------------------------|
| ACC          | Accuracy                  |
| SEN          | Sensitivity               |
| SPC          | Specificity               |
| FPR          | False positive rate       |
| AL_NAME      | Algorithm name            |
| PRE          | Precision                 |
| NOF          | Number of features        |
| ET           | Elapsed time              |
| ER           | Error                     |
| CR           | Correlation               |
| STD          | Standard deviation        |
| CI           | Confidence interval       |
| P            | p-value                   |
| DF           | Degrees of freedom        |
| ANN          | Artificial neural network |
| DT           | Decision tree             |

> Accuracy convergence, accuracy average convergence (the average accuracy over
> the whole population in a given generation), accuracy stability, error
> convergence, error average convergence (over all potential solutions in a
> given generation) and error stability are plotted for classification problems
> (fig.8). Error convergence, error average convergence, error stability,
> correlation convergence, correlation average convergence and correlation
> stability are plotted for regression problems (fig.9). The ROC plot, a
> statistical measure of a classifier's diagnostic ability, and the ROC space
> are shown in fig.10. You can modify these plots using the *view/property
> editor* menu.

![8](https://user-images.githubusercontent.com/42937478/51424446-d57feb80-1be2-11e9-93e8-de34adc13481.jpg)
> Fig.8: algorithm outputs for a classification problem

![9](https://user-images.githubusercontent.com/42937478/51424450-f2b4ba00-1be2-11e9-9d71-b606865b7146.jpg)
> Fig.9: algorithm outputs for a regression problem

![10](https://user-images.githubusercontent.com/42937478/51424452-07914d80-1be3-11e9-9b14-ec08994037c7.jpg)
> Fig.10: ROC plot and ROC space

> To use the hybrid method, follow the steps below, which are depicted in
> Figures 11 and 12 (a toy two-step sketch follows fig.12):

1. Selecting the ensemble method:

![11](https://user-images.githubusercontent.com/42937478/51424457-360f2880-1be3-11e9-9e4b-d68c2902227c.jpg)
Fig.11: Selecting the feature selection method

2. Setting the parameters:

![12](https://user-images.githubusercontent.com/42937478/51424460-52ab6080-1be3-11e9-9a3d-e008663e694a.jpg)
Fig.12: Setting the parameters of the hybrid method
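To make the two-step idea concrete, here is a toy hybrid sketch, our assumption of the general scheme rather than FeatureSelect's code: a filter step pre-ranks the features, and a wrapper step searches only among the survivors. The ANOVA F-score filter and the exhaustive small-subset wrapper are illustrative choices only.

```python
# Toy two-step hybrid sketch: filter first, then wrapper on the survivors.
import numpy as np
from itertools import combinations
from sklearn.feature_selection import f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=120, n_features=30, n_informative=4, random_state=1)

# Step 1 (filter): keep the 8 features with the highest ANOVA F-score.
F, _ = f_classif(X, y)
survivors = np.argsort(F)[-8:]

# Step 2 (wrapper): exhaustively score small subsets of the survivors with the
# learner (feasible only because the filter step shrank the search space).
best_score, best_subset = 0.0, ()
for k in (2, 3):
    for subset in combinations(survivors, k):
        score = cross_val_score(SVC(), X[:, list(subset)], y, cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, subset
print("best subset:", best_subset, "cv accuracy: %.3f" % best_score)
```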
--------------------------------------------------------------------------------
/WIN_10_64_bit.z01:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/WIN_10_64_bit.z01
--------------------------------------------------------------------------------
/WIN_10_64_bit.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/WIN_10_64_bit.zip
--------------------------------------------------------------------------------
/datasets.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/datasets.zip
--------------------------------------------------------------------------------
/java_package.z01:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/java_package.z01
--------------------------------------------------------------------------------
/java_package.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/java_package.zip
--------------------------------------------------------------------------------
/pythonpackage.z01:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/pythonpackage.z01
--------------------------------------------------------------------------------
/pythonpackage.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LBBSoft/FeatureSelect/0fc614407f7d35380a4d896b5ab4d3f4011ee868/pythonpackage.zip
--------------------------------------------------------------------------------