├── LICENSE ├── README.md ├── dataset └── BIOLOGICAL │ ├── ALLAML │ └── ALLAML.mat │ ├── COLON │ └── COLON.mat │ ├── LUNG_DISCRETE │ └── LUNG_DISCRETE.mat │ ├── LYMPHOMA │ └── LYMPHOMA.mat │ └── WISCONSIN │ └── WISCONSIN.mat ├── img ├── GS-CSFS_vs_OM-CSFS.png └── TFS_vs_OM.png └── src ├── CSFS_SMBA.py ├── Classifier.py ├── Dataset.py ├── FeatureSelector.py ├── Loader.py ├── SMBA.py ├── __init__.py ├── grid_search.py └── skfeature ├── __init__.py ├── function ├── __init__.py ├── information_theoretical_based │ ├── CIFE.py │ ├── CMIM.py │ ├── DISR.py │ ├── FCBF.py │ ├── ICAP.py │ ├── JMI.py │ ├── LCSI.py │ ├── MIFS.py │ ├── MIM.py │ ├── MRMR.py │ └── __init__.py ├── similarity_based │ ├── SPEC.py │ ├── __init__.py │ ├── fisher_score.py │ ├── lap_score.py │ ├── reliefF.py │ └── trace_ratio.py ├── sparse_learning_based │ ├── MCFS.py │ ├── NDFS.py │ ├── RFS.py │ ├── UDFS.py │ ├── __init__.py │ ├── ll_l21.py │ └── ls_l21.py ├── statistical_based │ ├── CFS.py │ ├── __init__.py │ ├── chi_square.py │ ├── f_score.py │ ├── gini_index.py │ ├── low_variance.py │ └── t_score.py ├── streaming │ ├── __init__.py │ └── alpha_investing.py ├── structure │ ├── __init__.py │ ├── graph_fs.py │ ├── group_fs.py │ └── tree_fs.py └── wrapper │ ├── __init__.py │ ├── decision_tree_backward.py │ ├── decision_tree_forward.py │ ├── svm_backward.py │ └── svm_forward.py └── utility ├── __init__.py ├── construct_W.py ├── data_discretization.py ├── entropy_estimators.py ├── mutual_information.py ├── sparse_learning.py └── unsupervised_evaluation.py /README.md: -------------------------------------------------------------------------------- 1 | # Background 2 | 3 | Feature selection (FS) plays a key role in several scientific fields, and in particular in computational biology, since it makes it possible to work with models that have fewer variables, which are in turn easier to explain and may speed up experimental validation by providing valuable insight into which features matter and what role they play. We propose a novel FS procedure organized as a two-step approach. First, a sparse-coding-based learning technique is used to find the best subset of features for each class of the training set. In doing so, it is assumed that a class can be represented by a subset of features, called **_representatives_**, such that each sample of that class can be described as a linear combination of them. Second, the discovered feature subsets are fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in a classification task. To this end, an ensemble of classifiers is built by training one classifier per class on its own feature subset, i.e., the subset discovered in the previous step, and a proper decision rule is adopted to combine the ensemble responses. 4 | 5 | # A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection (SMBA-CSFS) 6 | 7 | Feature selection has been widely used for eliminating redundant or irrelevant features, and it can be carried out in two ways: Traditional Feature Selection (TFS), which selects a single feature set for all classes, and Class-Specific Feature Selection (CSFS), which finds a different set of features for each class. Several methods have been proposed for the latter. In contrast to a TFS algorithm, where a single feature subset is selected for discriminating among all the classes of a supervised classification problem, a CSFS algorithm selects one feature subset per class.
A general framework for CSFS can use any traditional **_feature selector_** to choose a possibly different subset for each class of a supervised classification problem. Depending on the type of feature selector, the overall process may slightly change. 8 | 9 | The Sparse-Coding Based Approach for Class-Specific Feature Selection (SMBA-CSFS) is grounded in the concept of **_Compressed Sensing_**. In essence, it solves a joint sparse optimization problem that looks for a subset of features, called **_representatives_**, which best reconstructs/represents the entire dataset when its features are linearly combined: the method seeks a row-sparse coefficient matrix `C` such that the data matrix is approximately reconstructed as `X ≈ XC`, and the features whose rows of `C` have the largest norms are ranked as representatives. In the class-specific setting, each class-sample set of the training set is represented using only a few representative features. 10 | 11 | # Prerequisites and requirements 12 | 13 | ## Pre-requisites 14 | 1. Python 2.7 or greater
15 | 2. CUDA 5.0 or greater (optional). For installation instructions, please refer to the [official CUDA documentation](http://docs.nvidia.com/cuda/#axzz4al7PKeAs). 16 | 17 | # Requirements 18 | The software is written in Python. In order to work correctly, it requires the following packages: 19 | 20 | - numpy 21 | - scipy 22 | - sklearn 23 | - hdf5storage 24 | - imblearn 25 | - optional: pycuda 25 | - optional: skcuda 26 | 27 | **NB**: The SMBA class can run faster by exploiting the CUDA environment. If you cannot install the two optional dependencies above, you must manually remove the code that depends on them. 28 | 29 | # Usage 30 | 31 | To use this algorithm, go into the project folder `/src` and run the file `CSFS_SMBA.py`. A minimal programmatic sketch is also shown below, right after the first results figure. 32 | 33 | # Results 34 | 35 | ![alt text](img/TFS_vs_OM.png "") 36 | `Comparison of several TFS accuracies against SMBA and SMBA-CSFS on nine data sets: 37 | (a) ALLAML(2), (b) LEUKEMIA(2), (c) CLL_SUB_111(3), (d) GLIOMA(4), (e) LUNG_C(5), (f) LUNG_D(7), (g) DLBCL(9), (h) CARCINOM(11), (i) GCM(14), when a varying number of features is selected. SVM classifier with 5-fold CV was used.` 38 | 39 |
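For readers who prefer to drive the pipeline from their own scripts rather than running `CSFS_SMBA.py` directly (see the Usage section above), the snippet below is a minimal sketch of how the modules under `/src` fit together: it loads one of the bundled `.mat` datasets, standardizes it, splits the samples by class, and ranks the representative features of each class with SMBA. The dataset choice (`WISCONSIN`), the path convention, and the SMBA parameter values are illustrative and simply mirror what `CSFS_SMBA.py` does; adapt them to your setup.

```python
# Minimal sketch; assumes the Python 2.7 code base and that the script is
# launched from inside /src, as CSFS_SMBA.py expects.
import Loader as lr
import Dataset as ds
import FeatureSelector as fs

variables = ['X', 'Y']
# same path convention as CSFS_SMBA.py; adjust to where the dataset folder lives
path_data_file = '/dataset/BIOLOGICAL/WISCONSIN/WISCONSIN'

# use format='matlab_v73' instead if you get the 'HDF reader for matlab v7.3 files' error
D = lr.Loader(file_path=path_data_file, format='matlab',
              variables=variables, name='WISCONSIN').getVariables(variables=variables)

dataset = ds.Dataset(D['X'], D['Y'])
dataset.standardizeDataset()

# SMBA hyper-parameters: the same keys used in CSFS_SMBA.py, with illustrative values
params = {
    'alpha': 10, 'norm_type': 1, 'max_iter': 3000, 'thr': [10 ** -8],
    'type_indices': 'nrmInd', 'normalize': False, 'GPU': False, 'device': 0,
    'PCA': False, 'verbose': False, 'step': 1, 'affine': False,
}
fs_model = fs.FeatureSelector(name='SMBA', tp='SLB', params=params)

# rank the representative features of each class separately
dataset.separateSampleClass()
data, target = dataset.getSampleClass()
for i in range(len(dataset.classes)):
    idx = fs_model.fit(data[i], target[i])  # ranked feature indices for class i
    print('class %s -> first 10 representatives: %s' % (dataset.classes[i], idx[:10]))
```

The returned `idx` is a ranking of feature indices (most representative first); in `CSFS_SMBA.py` the first `n` indices of each class are used to train one classifier per class, and the per-class predictions are then merged by the decision rule implemented in `classificationDecisionRule`.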
40 | 41 | ![alt text](img/GS-CSFS_vs_OM-CSFS.png "") 42 | `Comparison of several CSFS accuracies against SMBA-CSFS on nine data sets: 43 | (a) ALLAML(2), (b) LEUKEMIA(2), (c) CLL_SUB_111(3), (d) GLIOMA(4), (e) LUNG_C(5), (f) LUNG_D(7), (g) DLBCL(9), (h) CARCINOM(11), (i) GCM(14), when a varying number of features is selected. SVM classifier with 5-fold CV was used.` 44 | 45 | # Authors 46 | 47 | Davide Nardone, University of Naples Parthenope, Science and Technology Department, M.Sc. in Applied Computer Science 48 | https://www.linkedin.com/in/davide-nardone-127428102/ 49 | 50 | # Contacts 51 | 52 | For any kind of problem, question, idea or suggestion, please don't hesitate to contact me at: 53 | - **davide.nardone@live.it** 54 | 55 | # Citation 56 | 57 | If you use this software in a scientific publication, please consider citing the following manuscript: 58 | 59 | ``` 60 | @article{nardone2019, 61 | author = {Nardone, Davide and Ciaramella, Angelo and Staiano, Antonino}, 62 | year = {2019}, 63 | month = {11}, 64 | pages = {25}, 65 | title = {A Sparse-Modeling Based Approach for Class Specific Feature Selection}, 66 | volume = {5}, 67 | journal = {PeerJ Computer Science}, 68 | doi = {10.7717/peerj-cs.237} 69 | } 70 | ``` 71 | -------------------------------------------------------------------------------- /dataset/BIOLOGICAL/ALLAML/ALLAML.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/ALLAML/ALLAML.mat -------------------------------------------------------------------------------- /dataset/BIOLOGICAL/COLON/COLON.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/COLON/COLON.mat -------------------------------------------------------------------------------- /dataset/BIOLOGICAL/LUNG_DISCRETE/LUNG_DISCRETE.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/LUNG_DISCRETE/LUNG_DISCRETE.mat -------------------------------------------------------------------------------- /dataset/BIOLOGICAL/LYMPHOMA/LYMPHOMA.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/LYMPHOMA/LYMPHOMA.mat -------------------------------------------------------------------------------- /dataset/BIOLOGICAL/WISCONSIN/WISCONSIN.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/WISCONSIN/WISCONSIN.mat -------------------------------------------------------------------------------- /img/GS-CSFS_vs_OM-CSFS.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/img/GS-CSFS_vs_OM-CSFS.png -------------------------------------------------------------------------------- /img/TFS_vs_OM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/img/TFS_vs_OM.png -------------------------------------------------------------------------------- /src/CSFS_SMBA.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from sklearn.svm import SVC 4 | from sklearn.metrics import accuracy_score 5 | from sklearn.model_selection import KFold 6 | from imblearn.over_sampling import SMOTE #dependency 7 | 8 | import numpy as np 9 | np.set_printoptions(threshold=np.inf) 10 | import random as rnd 11 | import time 12 | import os 13 | import errno 14 | import pickle 15 | import sys 16 | 17 | # sys.path.insert(0, './src') 18 | import Loader as lr 19 | import Dataset as ds 20 | import Classifier as i_clf 21 | import FeatureSelector as fs 22 | 23 | 24 | def checkFolder(root, path_output): 25 | 26 | #folders to generate recursively 27 | path = root+'/'+path_output 28 | 29 | try: 30 | os.makedirs(path) 31 | except OSError as exc: # Python >2.5 32 | if exc.errno == errno.EEXIST and os.path.isdir(path): 33 | pass 34 | else: 35 | raise 36 | 37 | 38 | 39 | def classificationDecisionRule(clf_score, cls, clf_name, target): 40 | 41 | n_classes = len(cls) 42 | DTS = {} 43 | 44 | for ccn in clf_name: 45 | hits = [] 46 | res = [] 47 | preds = [] 48 | 49 | for i in xrange(0,n_classes): 50 | 51 | #ensemble scores on class 'C' for the testing set 52 | e_th = clf_score['C'+str(cls[i])]['accuracy'][ccn] 53 | res.append(e_th) 54 | 55 | hits.append((e_th == cls[i]).astype('int').flatten()) 56 | 57 | # ensemble scores and hits for the testing set 58 | ensemble_res = np.vstack(res) 59 | ensemble_hits = np.vstack(hits) 60 | 61 | # Applying decision rules 62 | for i in xrange(0, ensemble_hits.shape[1]): # number of sample 63 | hits = ensemble_hits[:,i] #it has a 1 in a position whether the classifier e_i has predicted the class w_i for the i-th pattern 64 | ens_preds = ensemble_res[:,i] #it's simply the predictions of all the trained classifier for the i-th pattern 65 | cond = np.sum(hits) #count the number of true positive for the i-th pattern 66 | 67 | if cond == 1: #rule 1 68 | pred = cls[np.where(hits==1)[0].squeeze()] #retrieve the cls for the 'only' true positive 69 | preds.append(pred) 70 | 71 | elif cond == 0 or cond > 1: # rule 1-2 (tie) 72 | 73 | # we find the majority votes (frequency) among all classifier (e.g., ) [[4 2][5 1][6 2][7 2]] 74 | unique, counts = np.unique(ens_preds, return_counts=True) 75 | maj_rule = np.asarray((unique, counts)).T 76 | 77 | # we find the 'majority' index, then its class 78 | ind_max = np.argmax(maj_rule[:, 1]) 79 | pred = maj_rule[ind_max, 0] 80 | max = maj_rule[ind_max, 1] 81 | 82 | # we look for a 'tie of the tie', then we look for the majority class among all the tied classes 83 | tied_cls = np.where(maj_rule[:, 1] == max)[0] 84 | if ( len(np.where(maj_rule[:, 1] == max)[0]) ) > 1: #tie of the tie 85 | pred = maj_rule[tied_cls,0] 86 | 87 | # pick one tied cls randomly 88 | pred = pred[rnd.randint(0,len(pred)-1)] 89 | preds.append(pred) 90 | 91 | 
else: 92 | preds.append(pred) 93 | 94 | #compute accuracy 95 | test_score = accuracy_score(target, preds) 96 | 97 | dic_test_score = { 98 | ccn: test_score 99 | } 100 | 101 | DTS.update(dic_test_score) 102 | 103 | return DTS 104 | 105 | def main(): 106 | ''' LOADING ANY DATASET ''' 107 | dataset_dir = '/dataset' 108 | dataset_type = '/BIOLOGICAL' 109 | dataset_name = '/WISCONSIN' 110 | 111 | #this variable decide whether to balance or not the dataset 112 | resample = True 113 | p_step = 1 114 | 115 | # defining directory paths for saving partial and complete result 116 | path_data_folder = dataset_dir + dataset_type + dataset_name 117 | path_data_file = path_data_folder + dataset_name 118 | variables = ['X', 'Y'] 119 | 120 | print ('%d.Loading and pre-processing the data...\n' % p_step) 121 | p_step += 1 122 | # NB: If you get an error such as: 'Please use HDF reader for matlab v7.3 files',please change the 'format variable' to 'matlab_v73' 123 | D = lr.Loader(file_path=path_data_file, 124 | format='matlab', 125 | variables=variables, 126 | name=dataset_name[1:] 127 | ).getVariables(variables=variables) 128 | 129 | dataset = ds.Dataset(D['X'], D['Y']) 130 | 131 | n_classes = dataset.classes.shape[0] 132 | cls = np.unique(dataset.classes) 133 | 134 | # check if the data are already standardized, if not standardize it 135 | dataset.standardizeDataset() 136 | 137 | # re-sampling dataset 138 | num_min_cls = 9999999 139 | print ('%d.Class-sample separation...\n' % p_step) 140 | p_step += 1 141 | if resample == True: 142 | 143 | print ('\tDataset %s before resampling w/ size: %s and number of classes: %s---> %s' % ( 144 | dataset_name[1:], dataset.data.shape, n_classes, cls)) 145 | 146 | # discriminating classes of the whole dataset 147 | dataset_train = ds.Dataset(dataset.data, dataset.target) 148 | dataset_train.separateSampleClass() 149 | data, target = dataset_train.getSampleClass() 150 | 151 | for i in xrange(0, n_classes): 152 | print ('\t\t#sample for class C%s: %s' % (i + 1, data[i].shape)) 153 | if data[i].shape[0] < num_min_cls: 154 | num_min_cls = data[i].shape[0] 155 | 156 | resample = '/BALANCED' 157 | print ('%d.Class balancing...' 
% p_step) 158 | dataset.data, dataset.target = SMOTE(kind='regular', k_neighbors=num_min_cls - 1).fit_sample(dataset.data, 159 | dataset.target) 160 | p_step += 1 161 | else: 162 | resample = '/UNBALANCED' 163 | 164 | # shuffling data 165 | print ('\tShuffling data...') 166 | dataset.shufflingDataset() 167 | 168 | print ('\tDataset %s w/ size: %s and number of classes: %s---> %s' % ( 169 | dataset_name[1:], dataset.data.shape, n_classes, cls)) 170 | 171 | # discriminating classes the whole dataset 172 | dataset_train = ds.Dataset(dataset.data, dataset.target) 173 | dataset_train.separateSampleClass() 174 | data, target = dataset_train.getSampleClass() 175 | 176 | for i in xrange(0, n_classes): 177 | print ('\t\t#sample for class C%s: %s' % (i + 1, data[i].shape)) 178 | 179 | # Max number of features to use 180 | max_num_feat = 300 181 | step = 1 182 | # max_num_feat = dataset.data.shape[1] 183 | 184 | if max_num_feat > dataset.data.shape[1]: 185 | max_num_feat = dataset.data.shape[1] 186 | 187 | alpha = 10 #regularizatio parameter (typically alpha in [2,50]) 188 | 189 | params = { 190 | 191 | 'SMBA': 192 | # the smaller is alpha the sparser is the C matrix (fewer representatives) 193 | { 194 | 'alpha': alpha, 195 | 'norm_type': 1, 196 | 'max_iter': 3000, 197 | 'thr': [10 ** -8], 198 | 'type_indices': 'nrmInd', 199 | 'normalize': False, 200 | 'GPU': False, 201 | 'device': 0, 202 | 'PCA': False, 203 | 'verbose': False, 204 | 'step': 1, 205 | 'affine': False, 206 | } 207 | # it's possible to add other FS methods by modifying the correct file 208 | } 209 | 210 | fs_model = fs.FeatureSelector(name='SMBA', tp='SLB', params=params['SMBA']) 211 | fs_name = 'SMBA' 212 | 213 | # CLASSIFIERS (it's possible to add other classifier methods by adding entries into this list) 214 | clf_name = [ 215 | "SVM" 216 | # "Decision Tree", 217 | # "KNN" 218 | ] 219 | model = [ 220 | SVC(kernel="linear") 221 | # DecisionTreeClassifier(max_depth=5), 222 | # KNeighborsClassifier(n_neighbors=1) 223 | ] 224 | 225 | '''Perform K-fold Cross Validation...''' 226 | k_fold = 10 227 | 228 | #defining result folders 229 | fs_path_output = '/CSFS/FS/K_FOLD' 230 | checkFolder(path_data_folder, fs_path_output) 231 | 232 | res_path_output = '/CSFS/RESULTS/K_FOLD' 233 | checkFolder(path_data_folder, fs_path_output) 234 | 235 | all_scores = {} 236 | all_scores.update({fs_name: []}) 237 | 238 | cc_fold = 0 239 | conf_dataset = {} 240 | 241 | X = dataset.data 242 | y = dataset.target 243 | kf = KFold(n_splits=k_fold) 244 | 245 | print ('%d.Running the Intra-Class-Specific Feature Selection and building the ensemble classifier...\n' % p_step) 246 | p_step += 1 247 | for train_index, test_index in kf.split(X): 248 | 249 | X_train_kth, X_test_kth = X[train_index], X[test_index] 250 | y_train, y_test = y[train_index], y[test_index] 251 | 252 | print ('\tDOING %s-CROSS VALIDATION W/ TRAINING SET SIZE: %s' % (cc_fold + 1, X_train_kth.shape)) 253 | 254 | ''' For the training data in each class we find the representative features and use them as a best subset feature 255 | (in representing each class sample) to perform classification 256 | ''' 257 | 258 | csfs_res = {} 259 | 260 | for i in xrange(0, n_classes): 261 | cls_res = { 262 | 'C' + str(cls[i]): {} 263 | } 264 | csfs_res.update(cls_res) 265 | 266 | kth_scores = {} 267 | for i in xrange(0, len(clf_name)): 268 | kth_scores.update({clf_name[i]: []}) 269 | 270 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 271 | curr_res_fs_fold = path_data_folder + '/' + 
fs_path_output + '/' + fs_name + resample 272 | checkFolder(path_data_folder, fs_path_output + '/' + fs_name + resample) 273 | 274 | # discriminating classes for the k-th fold of the training set 275 | data_train = ds.Dataset(X_train_kth, y_train) 276 | data_train.separateSampleClass() 277 | ktrain_data, ktrain_target = data_train.getSampleClass() 278 | K_cls_ind_train = data_train.ind_class 279 | 280 | for i in xrange(0, n_classes): 281 | # print ('Train set size C' + str(i + 1) + ':', ktrain_data[i].shape) 282 | 283 | print ('\tPerforming feature selection on class %d with shape %s' % (cls[i] + 1, ktrain_data[i].shape)) 284 | 285 | start_time = time.time() 286 | idx = fs_model.fit(ktrain_data[i], ktrain_target[i]) 287 | 288 | # print idx 289 | 290 | print('\tTotal Time = %s seconds\n' % (time.time() - start_time)) 291 | 292 | csfs_res['C' + str(cls[i])]['idx'] = idx 293 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name] 294 | 295 | # with open(curr_res_fs_fold + '/' + str(cc_fold + 1) + '-fold' + '.pickle', 'wb') as handle: 296 | # pickle.dump(csfs_res, handle, protocol=pickle.HIGHEST_PROTOCOL) 297 | 298 | ens_class = {} 299 | # learning a classifier (ccn) for each subset of 'n_rep' feature 300 | for j in xrange(0, max_num_feat): 301 | n_rep = j + 1 # first n_rep indices 302 | 303 | for i in xrange(0, n_classes): 304 | # get subset of feature from the i-th class 305 | idx = csfs_res['C' + str(cls[i])]['idx'] 306 | 307 | # print idx[0:n_rep] 308 | 309 | X_train_fs = X_train_kth[:, idx[0:n_rep]] 310 | 311 | _clf = i_clf.Classifier(names=clf_name, classifiers=model) 312 | _clf.train(X_train_fs, y_train) 313 | 314 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test) 315 | 316 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test) 317 | 318 | for i in xrange(0, len(clf_name)): 319 | _score = DTS[clf_name[i]] 320 | # print ('Accuracy w/ %d feature: %f' % (n_rep, _score)) 321 | kth_scores[clf_name[i]].append(_score) 322 | 323 | x = np.arange(1, max_num_feat + 1) 324 | 325 | kth_results = { 326 | 'clf_name': clf_name, 327 | 'x': x, 328 | 'scores': kth_scores, 329 | } 330 | 331 | all_scores[fs_name].append(kth_results) 332 | 333 | # saving k-th dataset configuration 334 | # with open(path_data_folder + fs_path_output + '/' + str(cc_fold + 1) + '-fold_conf_dataset.pickle', 335 | # 'wb') as handle: # TODO: customize output name for recognizing FS parameters' method 336 | # pickle.dump(conf_dataset, handle, protocol=pickle.HIGHEST_PROTOCOL) 337 | 338 | cc_fold += 1 339 | 340 | # print all_scores 341 | 342 | print('%s.Averaging results...\n' % p_step) 343 | p_step += 1 344 | # Averaging results on k-fold 345 | 346 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 347 | curr_res_output_fold = path_data_folder + '/' + res_path_output + '/' + fs_name + resample 348 | checkFolder(path_data_folder, res_path_output + '/' + fs_name + resample) 349 | 350 | M = {} 351 | for i in xrange(0, len(clf_name)): 352 | M.update({clf_name[i]: np.ones([k_fold, max_num_feat]) * 0}) 353 | 354 | avg_scores = {} 355 | std_scores = {} 356 | for i in xrange(0, len(clf_name)): 357 | avg_scores.update({clf_name[i]: []}) 358 | std_scores.update({clf_name[i]: []}) 359 | 360 | # k-fold results for each classifier 361 | for k in xrange(0, k_fold): 362 | for clf in clf_name: 363 | M[clf][k, :] = all_scores[fs_name][k]['scores'][clf][:max_num_feat] 364 | 365 | for clf in clf_name: 366 | avg_scores[clf] = np.mean(M[clf], axis=0) 367 | 
std_scores[clf] = np.std(M[clf], axis=0) 368 | 369 | x = np.arange(1, max_num_feat + 1) 370 | results = { 371 | 'clf_name': clf_name, 372 | 'x': x, 373 | 'M': M, 374 | 'scores': avg_scores, 375 | 'std': std_scores 376 | } 377 | 378 | # print avg_scores 379 | 380 | with open(curr_res_output_fold + '/clf_results.pickle', 'wb') as handle: 381 | pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL) 382 | print ('Done with %s, [%d-cross validation] ' % (dataset_name[1:], k_fold)) 383 | 384 | 385 | if __name__ == '__main__': 386 | main() 387 | -------------------------------------------------------------------------------- /src/Classifier.py: -------------------------------------------------------------------------------- 1 | from sklearn.neighbors import KNeighborsClassifier 2 | from sklearn.svm import SVC 3 | from sklearn.tree import DecisionTreeClassifier 4 | from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier 5 | from sklearn.linear_model import LogisticRegression 6 | #add the need classifiers when using this class 7 | 8 | 9 | 10 | class Classifier: 11 | 12 | def __init__(self, names=None, classifiers=None): 13 | 14 | self.cv_scores = {} 15 | 16 | #Default classifiers and parameters 17 | if names == None: 18 | 19 | self.names = [ 20 | "KNN", "Logistic Regression", "SVM", 21 | "Decision Tree", "Random Forest", "AdaBoost" 22 | ] 23 | 24 | self.classifiers = [ 25 | 26 | KNeighborsClassifier(n_neighbors=1), 27 | LogisticRegression(C=1e5), 28 | SVC(kernel="linear"), 29 | DecisionTreeClassifier(max_depth=5), 30 | RandomForestClassifier(max_depth=5, n_estimators=10), 31 | AdaBoostClassifier() 32 | ] 33 | 34 | else: 35 | self.names = names 36 | self.classifiers = classifiers 37 | 38 | for name in self.names: 39 | self.cv_scores[name] = [] 40 | 41 | 42 | 43 | def train(self, X_train, y_train): 44 | 45 | for name, clf in zip(self.names, self.classifiers): 46 | 47 | # Training the algorithm using the selected predictors and target. 
48 | clf.fit(X_train, y_train) 49 | 50 | def classify(self, X_test, y_test): 51 | 52 | # Record error for training and testing 53 | DTS = {} 54 | 55 | for name, clf in zip(self.names, self.classifiers): 56 | 57 | preds = clf.predict(X_test) 58 | 59 | dic_label = { 60 | name: preds 61 | } 62 | 63 | DTS.update(dic_label) 64 | 65 | return DTS -------------------------------------------------------------------------------- /src/Dataset.py: -------------------------------------------------------------------------------- 1 | from sklearn import preprocessing 2 | from sklearn.preprocessing import StandardScaler 3 | 4 | import hdf5storage #dependency 5 | import numpy as np 6 | np.set_printoptions(threshold=np.inf) 7 | 8 | 9 | class Dataset: 10 | def __init__(self, X, y): 11 | 12 | self.data = X 13 | self.target = y.flatten() 14 | 15 | # removing any row with at least one NaN value 16 | # TODO: remove also the corresponding target value 17 | self.data = self.data[~np.isnan(self.data).any(axis=1)] 18 | 19 | self.num_sample, self.num_features = self.data.shape[0], self.data.shape[1] 20 | 21 | # retrieving unique label for Dataset 22 | self.classes = np.unique(self.target) 23 | 24 | def standardizeDataset(self): 25 | 26 | # it simply standardize the data [mean 0 and std 1] 27 | if np.sum(np.std(self.data, axis=0)).astype('int32') == self.num_features and np.sum( 28 | np.mean(self.data, axis=0)) < 1 ** -7: 29 | print ('\tThe data were already standardized!') 30 | else: 31 | print ('Standardizing data....') 32 | self.data = StandardScaler().fit_transform(self.data) 33 | 34 | def normalizeDataset(self, norm): 35 | 36 | normalizer = preprocessing.Normalizer(norm=norm) 37 | self.data = normalizer.fit_transform(self.data) 38 | 39 | def scalingDataset(self): 40 | 41 | min_max_scaler = preprocessing.MinMaxScaler() 42 | self.data = min_max_scaler.fit_transform(self.data) 43 | 44 | def shufflingDataset(self): 45 | 46 | idx = np.random.permutation(self.data.shape[0]) 47 | self.data = self.data[idx] 48 | self.target = self.target[idx] 49 | 50 | 51 | def split(self, split_ratio=0.8): 52 | 53 | # shuffling data 54 | indices = np.random.permutation(self.num_sample) 55 | 56 | start = int(split_ratio * self.num_sample) 57 | training_idx, test_idx = indices[:start], indices[start:] 58 | X_train, X_test = self.data[training_idx, :], self.data[test_idx, :] 59 | y_train, y_test = self.target[training_idx], self.target[test_idx] 60 | 61 | return X_train, y_train, X_test, y_test, training_idx, test_idx 62 | 63 | def separateSampleClass(self): 64 | 65 | # Discriminating the classes sample 66 | self.ind_class = [] 67 | for i in xrange(0, len(self.classes)): 68 | self.ind_class.append(np.where(self.target == self.classes[i])) 69 | 70 | def getSampleClass(self): 71 | 72 | data = [] 73 | target = [] 74 | # Selecting the 'train sample' on the basis of the previously retrieved indices 75 | for i in xrange(0, len(self.classes)): 76 | data.append(self.data[self.ind_class[i]]) 77 | target.append(self.target[self.ind_class[i]]) 78 | 79 | return data, target 80 | 81 | def getIndClass(self): 82 | 83 | return self.ind_class -------------------------------------------------------------------------------- /src/FeatureSelector.py: -------------------------------------------------------------------------------- 1 | from sklearn.feature_selection import SelectFromModel 2 | from sklearn.linear_model import ElasticNet, Lasso 3 | from sklearn.feature_selection import mutual_info_classif 4 | 5 | from skfeature.utility.sparse_learning import 
construct_label_matrix, feature_ranking 6 | from skfeature.function.sparse_learning_based import RFS, ls_l21, ll_l21, MCFS, NDFS, UDFS 7 | 8 | from skfeature.function.similarity_based import reliefF, fisher_score 9 | from skfeature.function.information_theoretical_based import MRMR 10 | 11 | import sys 12 | sys.path.insert(0, './src') 13 | import numpy as np 14 | np.set_printoptions(threshold=np.inf) 15 | import SMBA as fs 16 | 17 | 18 | class FeatureSelector: 19 | 20 | def __init__(self, model=None, name=None, tp=None, params=None): 21 | 22 | self.name = name 23 | self.model = model 24 | self.tp = tp 25 | self.params = params 26 | 27 | def setParams(self, comb_par, params_name, params): 28 | 29 | for par_name, par in zip(params_name, comb_par): 30 | params[par_name] = par 31 | 32 | self.params = params 33 | 34 | def fit(self, X, y): 35 | 36 | idx = [] 37 | 38 | if self.tp == 'ITB': 39 | 40 | if self.name == 'MRMR': 41 | idx = MRMR.mrmr(X, y, n_selected_features=self.params['num_feats']) 42 | 43 | elif self.tp == 'filter': 44 | 45 | if self.name == 'Relief': 46 | score = reliefF.reliefF(X, y, k=self.params['k']) 47 | idx = reliefF.feature_ranking(score) 48 | 49 | if self.name == 'Fisher': 50 | # obtain the score of each feature on the training set 51 | score = fisher_score.fisher_score(X, y) 52 | 53 | # rank features in descending order according to score 54 | idx = fisher_score.feature_ranking(score) 55 | 56 | if self.name == 'MI': 57 | idx = np.argsort(mutual_info_classif(X, y, n_neighbors=self.params['n_neighbors']))[::-1] 58 | 59 | elif self.tp == 'wrapper': 60 | 61 | model_fit = self.model.fit(X, y) 62 | model = SelectFromModel(model_fit, prefit=True) 63 | idx = model.get_support(indices=True) 64 | elif self.tp == 'SLB': 65 | 66 | # one-hot-encode on target 67 | y = construct_label_matrix(y) 68 | 69 | if self.name == 'SMBA': 70 | scba = fs.SCBA(data=X, alpha=self.params['alpha'], norm_type=self.params['norm_type'], 71 | verbose=self.params['verbose'], thr=self.params['thr'], max_iter=self.params['max_iter'], 72 | affine=self.params['affine'], 73 | normalize=self.params['normalize'], 74 | step=self.params['step'], 75 | PCA=self.params['PCA'], 76 | GPU=self.params['GPU'], 77 | device = self.params['device']) 78 | 79 | nrmInd, sInd, repInd, _ = scba.admm() 80 | if self.params['type_indices'] == 'nrmInd': 81 | idx = nrmInd 82 | elif self.params['type_indices'] == 'repInd': 83 | idx = repInd 84 | else: 85 | idx = sInd 86 | 87 | if self.name == 'RFS': 88 | W = RFS.rfs(X, y, gamma=self.params['gamma']) 89 | idx = feature_ranking(W) 90 | 91 | if self.name == 'll_l21': 92 | # obtain the feature weight matrix 93 | W, _, _ = ll_l21.proximal_gradient_descent(X, y, z=self.params['z'], verbose=False) 94 | # sort the feature scores in an ascending order according to the feature scores 95 | idx = feature_ranking(W) 96 | if self.name == 'ls_l21': 97 | # obtain the feature weight matrix 98 | W, _, _ = ls_l21.proximal_gradient_descent(X, y, z=self.params['z'], verbose=False) 99 | 100 | # sort the feature scores in an ascending order according to the feature scores 101 | idx = feature_ranking(W) 102 | 103 | if self.name == 'LASSO': 104 | 105 | LASSO = Lasso(alpha=self.params['alpha'], positive=True) 106 | 107 | y_pred_lasso = LASSO.fit(X, y) 108 | 109 | if y_pred_lasso.coef_.ndim == 1: 110 | coeff = y_pred_lasso.coef_ 111 | else: 112 | coeff = np.asarray(y_pred_lasso.coef_[0, :]) 113 | 114 | idx = np.argsort(-coeff) 115 | 116 | if self.name == 'EN': # elastic net L1 117 | 118 | enet = 
ElasticNet(alpha=self.params['alpha'], l1_ratio=1, positive=True) 119 | y_pred_enet = enet.fit(X, y) 120 | 121 | if y_pred_enet.coef_.ndim == 1: 122 | coeff = y_pred_enet.coef_ 123 | else: 124 | coeff = np.asarray(y_pred_enet.coef_[0, :]) 125 | 126 | idx = np.argsort(-coeff) 127 | 128 | return idx -------------------------------------------------------------------------------- /src/Loader.py: -------------------------------------------------------------------------------- 1 | import hdf5storage #dependency 2 | import numpy as np 3 | 4 | np.set_printoptions(threshold=np.inf) 5 | import scipy.io as sio 6 | 7 | class Loader: 8 | def __init__(self, file_path, name, variables, format, k_fold=None): 9 | 10 | 11 | # This Class provides several method for loading many type of dataset (matlab, csv, txt, etc) 12 | 13 | if format == 'matlab': # classic workspace 14 | 15 | mc = sio.loadmat(file_path) 16 | 17 | for variable in variables: 18 | setattr(self, variable, mc[variable]) 19 | 20 | elif format == 'matlab_struct': # struct one level 21 | print ('Loading data...') 22 | 23 | mc = sio.loadmat(file_path) 24 | mc = mc[name][0, 0] 25 | 26 | for variable in variables: 27 | setattr(self, variable, mc[variable]) 28 | 29 | elif format == 'custom_matlab': 30 | print ('Loading data...') 31 | 32 | mc = sio.loadmat(file_path) 33 | mc = mc[name][0, 0] 34 | 35 | for variable in variables: 36 | setattr(self, variable, mc[variable][0, 0]) 37 | 38 | elif format == 'matlab_v73': 39 | mc = hdf5storage.loadmat(file_path) 40 | 41 | for variable in variables: 42 | setattr(self, variable, mc[variable]) 43 | 44 | def getVariables(self, variables): 45 | 46 | D = {} 47 | 48 | for variable in variables: 49 | D[variable] = getattr(self, variable) 50 | 51 | return D -------------------------------------------------------------------------------- /src/SMBA.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from sklearn.decomposition import PCA 4 | 5 | import numpy as np 6 | import numpy.matlib 7 | np.set_printoptions(threshold=np.inf) 8 | import pycuda.autoinit 9 | import pycuda.gpuarray as gpuarray 10 | import skcuda.linalg as linalg 11 | import skcuda.misc as misc 12 | import time 13 | 14 | 15 | class SMBA(): 16 | 17 | def __init__(self, data, alpha=10, norm_type=1, 18 | verbose=False, step=5, thr=[10**-8,-1], max_iter=5000, 19 | affine=False, 20 | normalize=True, 21 | PCA=False, npc=10, GPU=False, device=0): 22 | 23 | self.data = data 24 | self.alpha = alpha 25 | self.norm_type=norm_type 26 | self.verbose = verbose 27 | self.step = step 28 | self.thr = thr 29 | self.max_iter = max_iter 30 | self.affine = affine 31 | self.normalize = normalize 32 | self.device = device 33 | self.PCA = PCA 34 | self.npc = npc 35 | self.GPU = GPU 36 | 37 | self.num_rows = data.shape[0] 38 | self.num_columns = data.shape[1] 39 | 40 | if(self.GPU==True): 41 | # self.data = self.data.astype('float32') 42 | linalg.init() 43 | # dev = misc.get_current_device() 44 | # dev = misc.init_device(n=self.device) 45 | # print misc.get_dev_attrs(dev) 46 | 47 | 48 | def computeLambda(self): 49 | print ('\t\tComputing lambda...') 50 | 51 | T = np.zeros(self.num_columns) 52 | 53 | if (self.GPU == True): 54 | 55 | if not self.affine: 56 | 57 | gpu_data = gpuarray.to_gpu(self.data) 58 | C_gpu = linalg.dot(gpu_data, gpu_data, transa='T') 59 | 60 | for i in xrange(self.num_columns): 61 | T[i] = linalg.norm(C_gpu[i,:]) 62 | 63 | else: 64 | 65 | gpu_data = gpuarray.to_gpu(self.data) 66 | 67 
| # affine transformation 68 | y_mean_gpu = misc.mean(gpu_data,axis=1) 69 | 70 | # creating affine matrix to subtract to the data (may encounter problem with strides) 71 | aff_mat = np.zeros([self.num_rows,self.num_columns]).astype('f') 72 | for i in xrange(0,self.num_columns): 73 | aff_mat[:,i] = y_mean_gpu.get() 74 | 75 | 76 | aff_mat_gpu = gpuarray.to_gpu(aff_mat) 77 | gpu_data_aff = misc.subtract(aff_mat_gpu,gpu_data) 78 | 79 | C_gpu = linalg.dot(gpu_data, gpu_data_aff, transa='T') 80 | 81 | #computing euclidean norm (rows) 82 | for i in xrange(self.num_columns): 83 | T[i] = linalg.norm(C_gpu[i,:]) 84 | else: 85 | 86 | if not self.affine: 87 | 88 | T = np.linalg.norm(np.dot(self.data.T, self.data), axis=1) 89 | 90 | else: 91 | #affine transformation 92 | y_mean = np.mean(self.data, axis=1) 93 | 94 | tmp_mat = np.outer(y_mean, np.ones(self.num_columns)) - self.data 95 | 96 | T = np.linalg.norm(np.dot(self.data.T, tmp_mat),axis=1) 97 | 98 | _lambda = np.amax(T) 99 | 100 | return _lambda 101 | 102 | def shrinkL1Lq(self, C1, _lambda): 103 | 104 | D,N = C1.shape 105 | C2 = [] 106 | if self.norm_type == 1: 107 | 108 | #TODO: incapsulate into one function 109 | # soft thresholding 110 | C2 = np.abs(C1) - _lambda 111 | ind = C2 < 0 112 | C2[ind] = 0 113 | C2 = np.multiply(C2, np.sign(C1)) 114 | elif self.norm_type == 2: 115 | r = np.zeros([D,1]) 116 | for j in xrange(0,D): 117 | th = np.linalg.norm(C1[j,:]) - _lambda 118 | r[j] = 0 if th < 0 else th 119 | C2 = np.multiply(np.matlib.repmat(np.divide(r, (r + _lambda )), 1, N), C1) 120 | elif self.norm_type == 'inf': 121 | # TODO: write it 122 | print '' 123 | 124 | return C2 125 | 126 | def errorCoef(self, Z, C): 127 | 128 | err = np.sum(np.abs(Z-C)) / (np.shape(C)[0] * np.shape(C)[1]) 129 | 130 | return err 131 | # err = sum(sum(abs(Z - C))) / (size(C, 1) * size(C, 2)); 132 | 133 | def almLasso_mat_fun(self): 134 | 135 | ''' 136 | This function represents the Augumented Lagrangian Multipliers method for Lasso problem. 
137 | The lagrangian form of the Lasso can be expressed as following: 138 | 139 | MIN{ 1/2||Y-XBHETA||_2^2 + lambda||THETA||_1} s.t B-T=0 140 | 141 | When applied to this problem, the ADMM updates take the form 142 | 143 | BHETA^t+1 = (XtX + rhoI)^-1(Xty + rho^t - mu^t) 144 | THETA^t+1 = Shrinkage_lambda/rho(BHETA(t+1) + mu(t)/rho) 145 | mu(t+1) = mu(t) + rho(BHETA(t+1) - BHETA(t+1)) 146 | 147 | The algorithm involves a 'ridge regression' update for BHETA, a soft-thresholding (shrinkage) step for THETA and 148 | then a simple linear update for mu 149 | 150 | NB: Actually, this ADMM version contains several variations such as the using of two penalty parameters instead 151 | of just one of them (mu1, mu2) 152 | ''' 153 | 154 | print ('\tADMM processing...') 155 | 156 | alpha1 = alpha2 = 0 157 | if (len(self.reg_params) == 1): 158 | alpha1 = self.reg_params[0] 159 | alpha2 = self.reg_params[0] 160 | elif (len(self.reg_params) == 2): 161 | alpha1 = self.reg_params[0] 162 | alpha2 = self.reg_params[1] 163 | 164 | #thresholds parameters for stopping criteria 165 | if (len(self.thr) == 1): 166 | thr1 = self.thr[0] 167 | thr2 = self.thr[0] 168 | elif (len(self.thr) == 2): 169 | thr1 = self.thr[0] 170 | thr2 = self.thr[1] 171 | 172 | # entry condition 173 | err1 = 10 * thr1 174 | err2 = 10 * thr2 175 | 176 | start_time = time.time() 177 | 178 | # setting penalty parameters for the ALM 179 | mu1p = alpha1 * 1/self.computeLambda() 180 | print("\t\t-Compute Lambda- Time = %s seconds" % (time.time() - start_time)) 181 | mu2p = alpha2 * 1 182 | 183 | mu1 = mu1p 184 | mu2 = mu2p 185 | 186 | i = 1 187 | start_time = time.time() 188 | if self.GPU == True: 189 | 190 | # defining penalty parameters e constraint to minimize, lambda and C matrix respectively 191 | THETA = misc.zeros((self.num_columns,self.num_columns),dtype='float64') 192 | lambda2 = misc.zeros((self.num_columns,self.num_columns),dtype='float64') 193 | 194 | gpu_data = gpuarray.to_gpu(self.data) 195 | P_GPU = linalg.dot(gpu_data,gpu_data,transa='T') 196 | 197 | OP1 = P_GPU 198 | linalg.scale(np.float32(mu1), OP1) 199 | 200 | OP2 = linalg.eye(self.num_columns) 201 | linalg.scale(mu2,OP2) 202 | 203 | 204 | if self.affine == True: 205 | 206 | print ('\t\tGPU affine...') 207 | 208 | OP3 = misc.ones((self.num_columns, self.num_columns), dtype='float64') 209 | linalg.scale(mu2, OP3) 210 | lambda3 = misc.zeros((1, self.num_columns), dtype='float64') 211 | 212 | # TODO: Because of some problem with linalg.inv version of scikit-cuda we fix it using np.linalg.inv of numpy 213 | A = np.linalg.inv(misc.add(misc.add(OP1.get(), OP2.get()), OP3.get())) 214 | 215 | A_GPU = gpuarray.to_gpu(A) 216 | 217 | while ( (err1 > thr1 or err2 > thr1) and i < self.max_iter): 218 | 219 | _lambda2 = gpuarray.to_gpu(lambda2) 220 | _lambda3 = gpuarray.to_gpu(lambda3) 221 | 222 | linalg.scale(1/mu2, _lambda2) 223 | term_OP2 = gpuarray.to_gpu(_lambda2.get()) 224 | 225 | OP2 = gpuarray.to_gpu(misc.subtract(THETA, term_OP2)) 226 | linalg.scale(mu2,OP2) 227 | 228 | OP4 = gpuarray.to_gpu(np.matlib.repmat(_lambda3.get(), self.num_columns, 1)) 229 | 230 | # updating Z 231 | BHETA = linalg.dot(A_GPU,misc.add(misc.add(misc.add(OP1,OP2),OP3),OP4)) 232 | 233 | # deallocating unnecessary GPU variables 234 | OP2.gpudata.free() 235 | OP4.gpudata.free() 236 | _lambda2.gpudata.free() 237 | _lambda3.gpudata.free() 238 | 239 | # updating C 240 | THETA = misc.add(BHETA,term_OP2) 241 | THETA = self.shrinkL1Lq(THETA.get(),1/mu2) 242 | THETA = THETA.astype('float64') 243 | 244 | # updating 
Lagrange multipliers 245 | term_lambda2 = misc.subtract(BHETA, gpuarray.to_gpu(THETA)) 246 | 247 | linalg.scale(mu2,term_lambda2) 248 | term_lambda2 = gpuarray.to_gpu(term_lambda2.get()) 249 | lambda2 = misc.add(lambda2, term_lambda2) # on GPU 250 | 251 | term_lambda3 = misc.subtract(misc.ones((1, self.num_columns), dtype='float64'), misc.sum(BHETA,axis=0)) 252 | linalg.scale(mu2,term_lambda3) 253 | term_lambda3 = gpuarray.to_gpu(term_lambda3.get()) 254 | lambda3 = misc.add(lambda3, term_lambda3) # on GPU 255 | 256 | # deallocating unnecessary GPU variables 257 | term_OP2.gpudata.free() 258 | term_lambda2.gpudata.free() 259 | term_lambda3.gpudata.free() 260 | 261 | err1 = self.errorCoef(BHETA.get(), THETA) 262 | err2 = self.errorCoef(np.sum(BHETA.get(), axis=0), np.ones([1, self.num_columns])) 263 | 264 | # deallocating unnecessary GPU variables 265 | BHETA.gpudata.free() 266 | 267 | THETA = gpuarray.to_gpu((THETA)) 268 | 269 | # reporting errors 270 | if (self.verbose and (i % self.step == 0)): 271 | print('\t\tIteration = %d, ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e' % (i, err1, err2)) 272 | i += 1 273 | 274 | THETA = THETA.get() 275 | 276 | Err = [err1, err2] 277 | if(self.verbose): 278 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e. \n' % (i, err1, err2)) 279 | 280 | else: 281 | print '\t\tGPU not affine' 282 | 283 | # TODO: Because of some problem with linalg.inv version of scikit-cuda we fix it using np.linalg.inv of numpy 284 | A = np.linalg.inv(misc.add(OP1.get(), OP2.get())) 285 | A_GPU = gpuarray.to_gpu(A) 286 | 287 | while ( err1 > thr1 and i < self.max_iter): 288 | 289 | _lambda2 = gpuarray.to_gpu(lambda2) 290 | 291 | term_OP2 = THETA 292 | linalg.scale(mu2, term_OP2) 293 | 294 | term_OP2 = misc.subtract(term_OP2, _lambda2) 295 | 296 | OP2 = gpuarray.to_gpu(term_OP2.get()) 297 | 298 | 299 | BHETA = linalg.dot(A_GPU, misc.add(OP1 , OP2)) 300 | 301 | linalg.scale(1 / mu2, _lambda2) 302 | term_THETA = gpuarray.to_gpu(_lambda2.get()) 303 | 304 | THETA = misc.add(BHETA,term_THETA) 305 | THETA = self.shrinkL1Lq(THETA.get(),1/mu2) 306 | 307 | THETA = THETA.astype('float32') 308 | 309 | # updating Lagrange multipliers 310 | term_lambda2 = misc.subtract(BHETA, gpuarray.to_gpu(THETA)) 311 | linalg.scale(mu2,term_lambda2) 312 | term_lambda2 = gpuarray.to_gpu(term_lambda2.get()) 313 | lambda2 = misc.add(lambda2, term_lambda2) # on GPU 314 | 315 | err1 = self.errorCoef(BHETA.get(), THETA) 316 | 317 | THETA = gpuarray.to_gpu((THETA)) 318 | 319 | # reporting errors 320 | if (self.verbose and (i % self.step == 0)): 321 | print('\t\tIteration %5.0f, ||Z - C|| = %2.5e' % (i, err1)) 322 | i += 1 323 | 324 | 325 | THETA = THETA.get() 326 | Err = [err1, err2] 327 | if(self.verbose): 328 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e' % (i, err1)) 329 | 330 | else: #CPU version 331 | 332 | # defining penalty parameters e constraint to minimize, lambda and C matrix respectively 333 | THETA = np.zeros([self.num_columns, self.num_columns]) 334 | lambda2 = np.zeros([self.num_columns, self.num_columns]) 335 | 336 | P = self.data.T.dot(self.data) 337 | OP1 = np.multiply(P, mu1) 338 | 339 | if self.affine == True: 340 | 341 | # INITIALIZATION 342 | lambda3 = np.zeros(self.num_columns).T 343 | 344 | A = np.linalg.inv(np.multiply(mu1,P) + np.multiply(mu2, np.eye(self.num_columns, dtype=int)) + np.multiply(mu2, np.ones([self.num_columns,self.num_columns]) )) 345 | 346 | OP3 = np.multiply(mu2, np.ones([self.num_columns, 
self.num_columns])) 347 | 348 | while ( (err1 > thr1 or err2 > thr1) and i < self.max_iter): 349 | 350 | # updating Bheta 351 | OP2 = np.multiply(THETA - np.divide(lambda2,mu2), mu2) 352 | OP4 = np.matlib.repmat(lambda3, self.num_columns, 1) 353 | BHETA = A.dot(OP1 + OP2 + OP3 + OP4 ) 354 | 355 | # updating C 356 | THETA = BHETA + np.divide(lambda2,mu2) 357 | THETA = self.shrinkL1Lq(THETA, 1/mu2) 358 | 359 | # updating Lagrange multipliers 360 | lambda2 = lambda2 + np.multiply(mu2,BHETA - THETA) 361 | lambda3 = lambda3 + np.multiply(mu2, np.ones([1,self.num_columns]) - np.sum(BHETA,axis=0)) 362 | 363 | err1 = self.errorCoef(BHETA, THETA) 364 | err2 = self.errorCoef(np.sum(BHETA,axis=0), np.ones([1, self.num_columns])) 365 | 366 | # mu1 = min(mu1 * (1 + 10 ^ -5), 10 ^ 2 * mu1p); 367 | # mu2 = min(mu2 * (1 + 10 ^ -5), 10 ^ 2 * mu2p); 368 | 369 | # reporting errors 370 | if (self.verbose and (i % self.step == 0)): 371 | print('\t\tIteration = %d, ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e' % (i, err1, err2)) 372 | i += 1 373 | 374 | Err = [err1, err2] 375 | 376 | if(self.verbose): 377 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e. \n' % (i, err1,err2)) 378 | else: 379 | print '\t\tCPU not affine' 380 | 381 | A = np.linalg.inv(OP1 + np.multiply(mu2, np.eye(self.num_columns, dtype=int))) 382 | 383 | while ( err1 > thr1 and i < self.max_iter): 384 | 385 | # updating Z 386 | OP2 = np.multiply(mu2, THETA) - lambda2 387 | BHETA = A.dot(OP1 + OP2) 388 | 389 | # updating C 390 | THETA = BHETA + np.divide(lambda2, mu2) 391 | THETA = self.shrinkL1Lq(THETA, 1/mu2) 392 | 393 | # updating Lagrange multipliers 394 | lambda2 = lambda2 + np.multiply(mu2,BHETA - THETA) 395 | 396 | # computing errors 397 | err1 = self.errorCoef(BHETA, THETA) 398 | 399 | # reporting errors 400 | if (self.verbose and (i % self.step == 0)): 401 | print('\t\tIteration %5.0f, ||Z - C|| = %2.5e' % (i, err1)) 402 | i += 1 403 | 404 | Err = [err1, err2] 405 | if(self.verbose): 406 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e' % (i, err1)) 407 | 408 | print("\t\t-ADMM- Time = %s seconds" % (time.time() - start_time)) 409 | 410 | return THETA, Err 411 | 412 | def rmRep(self, sInd, thr): 413 | 414 | ''' 415 | This function takes the data matrix and the indices of the representatives and removes the representatives 416 | that are too close to each other 417 | 418 | :param sInd: indices of the representatives 419 | :param thr: threshold for pruning the representatives, typically in [0.9,0.99] 420 | :return: representatives indices 421 | ''' 422 | 423 | Ys = self.data[:, sInd] 424 | 425 | Ns = Ys.shape[1] 426 | d = np.zeros([Ns, Ns]) 427 | 428 | # Computes a the distance matrix for all selected columns by the algorithm 429 | for i in xrange(0,Ns-1): 430 | for j in xrange(i+1,Ns): 431 | d[i,j] = np.linalg.norm(Ys[:,i] - Ys[:,j]) 432 | 433 | d = d + d.T # define symmetric matrix 434 | 435 | dsorti = np.argsort(d,axis=0)[::-1] 436 | dsort = np.flipud(np.sort(d,axis=0)) 437 | 438 | pind = np.arange(0,Ns) 439 | for i in xrange(0, Ns): 440 | if np.any(pind==i) == True: 441 | cum = 0 442 | t = -1 443 | while cum <= (thr * np.sum(dsort[:,i])): 444 | t += 1 445 | cum += dsort[t, i] 446 | 447 | pind = np.setdiff1d(pind, np.setdiff1d( dsorti[t:,i], np.arange(0,i+1), assume_unique=True), assume_unique=True) 448 | 449 | ind = sInd[pind] 450 | 451 | return ind 452 | 453 | def findRep(self,C, thr, norm): 454 | 455 | ''' 456 | This function takes the coefficient matrix with few 
nonzero rows and computes the indices of the nonzero rows 457 | :param C: NxN coefficient matrix 458 | :param thr: threshold for selecting the nonzero rows of C, typically in [0.9,0.99] 459 | :param norm: value of norm used in the L1/Lq minimization program in {1,2,inf} 460 | :return: the representatives indices on the basis of the ascending norm of the row of C (larger is the norm of 461 | a generic row most representative it is) 462 | ''' 463 | 464 | N = C.shape[0] 465 | 466 | r = np.zeros([1,N]) 467 | 468 | for i in xrange(0, N): 469 | 470 | r[:,i] = np.linalg.norm(C[i,:],norm) 471 | 472 | nrmInd = np.argsort(r)[0][::-1] #descending order 473 | nrm = r[0,nrmInd] 474 | 475 | # pick norm indices basing on the thresholding of the 'cumulative norm's sum' 476 | cssInd = nrmInd[np.cumsum(nrm)/np.sum(nrm) < thr] 477 | 478 | return cssInd, nrmInd 479 | 480 | 481 | def admm(self): 482 | 483 | ''' 484 | ''' 485 | # initializing penalty parameters 486 | self.reg_params = [self.alpha, self.alpha] 487 | 488 | thrS = 0.99 489 | thrP = 0.95 490 | 491 | #subtract mean from sample 492 | if self.normalize == True: 493 | self.data = self.data - np.matlib.repmat(np.mean(self.data, axis=1), self.num_columns,1).T 494 | 495 | self.repInd = [] 496 | if (self.PCA == True): 497 | print ('\t\tPerforming PCA...') 498 | pca = PCA(n_components = self.npc) 499 | self.data = pca.fit_transform(self.data) 500 | self.num_columns = self.data.shape[0] 501 | self.num_row = self.data.shape[0] 502 | self.num_columns = self.data.shape[1] 503 | 504 | 505 | self.C,_ = self.almLasso_mat_fun() 506 | 507 | self.sInd, self.nrmInd = self.findRep(self.C, thrS, self.norm_type) 508 | 509 | # custom procedure for removing redundant indices 510 | # self.repInd = self.rmRep(self.sInd, thrP) 511 | self.repInd = [] 512 | 513 | 514 | return self.nrmInd, self.sInd, self.repInd, self.C -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/__init__.py -------------------------------------------------------------------------------- /src/grid_search.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | from sklearn.svm import SVC 4 | from sklearn.model_selection import KFold 5 | from imblearn.over_sampling import SMOTE 6 | from sklearn.metrics import accuracy_score 7 | from sklearn.preprocessing import label_binarize 8 | 9 | import numpy as np 10 | np.set_printoptions(threshold=np.inf) 11 | import os 12 | import errno 13 | import random as rnd 14 | import itertools 15 | import sys 16 | sys.path.insert(0, './src') 17 | 18 | import Loader as lr 19 | import Classifier as clf 20 | import FeatureSelector as fs 21 | 22 | 23 | 24 | def tuning_analysis(fs, n_feats): 25 | 26 | min_var = 99999999 27 | min_hyp_par = {} 28 | 29 | for curr_fs_name,curr_fs in fs.iteritems(): 30 | 31 | voting_matrix = {} 32 | _res_voting = {} 33 | 34 | combs = curr_fs.keys() 35 | combs.sort() 36 | 37 | for comb in combs: 38 | voting_matrix[comb] = np.zeros([1,n_feats]) 39 | value = curr_fs[comb] 40 | # print ('hyper-params. comb. is %s'%comb) 41 | curr_var = np.var(value['ACC']) 42 | if curr_var < min_var: 43 | min_var = curr_var 44 | min_hyp_par = comb 45 | 46 | print 'Hyper-params. 
comb=%s has minimum variance of %s'%(min_hyp_par, min_var) 47 | 48 | combs = curr_fs.keys() 49 | combs.sort() 50 | 51 | # voting matrix dim: [num_comb, n_feats] 52 | # voting_matrix = np.zeros([len(combs), n_feats]) 53 | print '\nApplying majority voting...' 54 | for j in xrange(0,n_feats): 55 | _competitors = {} 56 | for comb in combs: 57 | _competitors[comb] = curr_fs[comb]['ACC'][j] 58 | 59 | #getting the winner accuracy for all the combinations computed 60 | winners = [comb for m in [max(_competitors.values())] for comb, val in _competitors.iteritems() if val == m] 61 | for winner in winners: 62 | voting_matrix[winner][0][j] = 1 63 | 64 | #getting the parameter with largest voting 65 | for comb in combs: 66 | _res_voting[comb] = np.sum(voting_matrix[comb][0]) 67 | 68 | _max = -9999999 69 | best_comb = {} 70 | BS = {} 71 | for comb in combs: 72 | if _res_voting[comb] > _max: 73 | _max = _res_voting[comb] 74 | best_comb = comb 75 | print ('Parameters set: '+ comb.__str__() +' got votes: ' + _res_voting[comb].__str__()) 76 | 77 | BS[fs_name] = best_comb 78 | 79 | print ('\nBest parameters set found on development set for: ' + fs_name.__str__() + ' is: ' + best_comb.__str__()) 80 | 81 | return BS 82 | 83 | def create_grid(params): 84 | 85 | comb = [] 86 | for t in itertools.product(*params): 87 | comb.append(t) 88 | 89 | return comb 90 | 91 | def classificationDecisionRule(clf_score, cls, clf_name, target): 92 | 93 | n_classes = len(cls) 94 | DTS = {} 95 | 96 | for ccn in clf_name: 97 | hits = [] 98 | res = [] 99 | preds = [] 100 | 101 | for i in xrange(0,n_classes): 102 | # print 'classifier e_' + str(cls[i]) 103 | 104 | #ensemble scores on class 'C' for the testing set 105 | e_th = clf_score['C'+str(cls[i])]['accuracy'][ccn] 106 | res.append(e_th) 107 | 108 | hits.append((e_th == cls[i]).astype('int').flatten()) 109 | 110 | # ensemble scores and hits for the testing set 111 | ensemble_res = np.vstack(res) 112 | ensemble_hits = np.vstack(hits) 113 | 114 | # Applying decision rules 115 | for i in xrange(0, ensemble_hits.shape[1]): # number of sample 116 | hits = ensemble_hits[:,i] #it has a 1 in a position whether the classifier e_i has predicted the class w_i for the i-th pattern 117 | ens_preds = ensemble_res[:,i] #it's simply the predictions of all the trained classifier for the i-th pattern 118 | cond = np.sum(hits) #count the number of true positive for the i-th pattern 119 | 120 | if cond == 1: #rule 1 121 | pred = cls[np.where(hits==1)[0].squeeze()] #retrieve the cls for the 'only' true positive 122 | preds.append(pred) 123 | 124 | elif cond == 0 or cond > 1: # rule 1-2 (tie) 125 | 126 | # we find the majority votes (frequency) among all classifier (e.g., ) [[4 2][5 1][6 2][7 2]] 127 | unique, counts = np.unique(ens_preds, return_counts=True) 128 | maj_rule = np.asarray((unique, counts)).T 129 | 130 | # we find the 'majority' index, then its class 131 | ind_max = np.argmax(maj_rule[:, 1]) 132 | pred = maj_rule[ind_max, 0] 133 | max = maj_rule[ind_max, 1] 134 | 135 | # we look for a 'tie of the tie', then we look for the majority class among all the tied classes 136 | tied_cls = np.where(maj_rule[:, 1] == max)[0] 137 | if ( len(np.where(maj_rule[:, 1] == max)[0]) ) > 1: #tie of the tie 138 | pred = maj_rule[tied_cls,0] 139 | 140 | # pick one tied cls randomly 141 | pred = pred[rnd.randint(0,len(pred)-1)] 142 | preds.append(pred) 143 | 144 | else: 145 | preds.append(pred) 146 | 147 | #compute accuracy 148 | test_score = accuracy_score(target, preds) 149 | 150 | dic_test_score = { 151 
| ccn: test_score 152 | } 153 | 154 | DTS.update(dic_test_score) 155 | 156 | return DTS 157 | 158 | def checkFolder(root, path_output): 159 | 160 | #folders to generate recursively 161 | path = root+'/'+path_output 162 | 163 | try: 164 | os.makedirs(path) 165 | except OSError as exc: # Python >2.5 166 | if exc.errno == errno.EEXIST and os.path.isdir(path): 167 | pass 168 | else: 169 | raise 170 | 171 | 172 | 173 | if __name__ == '__main__': 174 | 175 | ''' LOADING ANY DATASET ''' 176 | dataset_dir = '/dataset' 177 | dataset_type = '/BIOLOGICAL' 178 | dataset_name = '/LUNG_DISCRETE' 179 | 180 | resample = True 181 | 182 | path_data_folder = dataset_dir + dataset_type + dataset_name 183 | path_data_file = path_data_folder + dataset_name 184 | 185 | variables = ['X', 'Y'] 186 | # NB: If you get an error such as: 'Please use HDF reader for matlab v7.3 files',please change the 'format variable' to 'matlab_v73' 187 | D = lr.Loader(file_path=path_data_file, 188 | format='matlab', 189 | variables=variables, 190 | name=dataset_name[1:] 191 | ).getVariables(variables=variables) 192 | 193 | dataset = lr.Dataset(D['X'], D['Y']) 194 | 195 | # check if the data are already standardized, if not standardize it 196 | dataset.standardizeDataset() 197 | 198 | n_classes = dataset.classes.shape[0] 199 | cls = np.unique(dataset.classes) 200 | 201 | num_min_cls = 9999999 202 | if resample == True: 203 | 204 | print ('Dataset before resampling %s w/ size: %s and number of classes: %s---> %s' % ( 205 | dataset_name[1:], dataset.data.shape, n_classes, cls)) 206 | 207 | # discriminating classes the whole dataset 208 | dataset_train = lr.Dataset(dataset.data, dataset.target) 209 | dataset_train.separateSampleClass() 210 | data, target = dataset_train.getSampleClass() 211 | 212 | for i in xrange(0, n_classes): 213 | print ('# sample for class C' + str(i + 1) + ':', data[i].shape) 214 | if data[i].shape[0] < num_min_cls: 215 | num_min_cls = data[i].shape[0] 216 | 217 | resample = '/BALANCED' 218 | print 'Re-sampling dataset...' 
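# NB: SMOTE oversamples every minority class up to the size of the largest class;
# k_neighbors is capped at (size of the smallest class - 1) so that each minority sample
# has enough same-class neighbours to interpolate from. The 'kind' argument and
# fit_sample() follow the older imbalanced-learn API; newer releases drop 'kind'
# and rename fit_sample() to fit_resample().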
219 | dataset.data, dataset.target = SMOTE(kind='regular', k_neighbors=num_min_cls-1).fit_sample(dataset.data, dataset.target) 220 | else: 221 | resample = '/UNBALANCED' 222 | 223 | # shuffling data 224 | dataset.shufflingDataset() 225 | 226 | n_classes = dataset.classes.shape[0] 227 | cls = np.unique(dataset.classes) 228 | 229 | print ('Dataset %s w/ size: %s and number of classes: %s---> %s' %(dataset_name[1:], dataset.data.shape, n_classes, cls)) 230 | 231 | # discriminating classes the whole dataset 232 | dataset_train = lr.Dataset(dataset.data, dataset.target) 233 | dataset_train.separateSampleClass() 234 | data, target = dataset_train.getSampleClass() 235 | 236 | for i in xrange(0, n_classes): 237 | print ('# sample for class C' + str(i + 1) + ':', data[i].shape) 238 | 239 | 240 | ################################### TUNING PARAMS ################################### 241 | 242 | FS = {} 243 | 244 | #CLASSIFIERS 245 | clf_name = [ 246 | "SVM" 247 | # "Decision Tree", 248 | # "KNN" 249 | ] 250 | model = [ 251 | SVC(kernel="linear") 252 | # DecisionTreeClassifier(max_depth=5), 253 | # KNeighborsClassifier(n_neighbors=1), 254 | ] 255 | 256 | max_num_feat = 300 257 | step = 1 258 | 259 | # initializing feature selector parameters 260 | params = { 261 | # the smaller alpha the sparser C matrix (fewer representatives) 262 | 'SMBA': 263 | { 264 | 'alpha': 5, #typically alpha in [2,50] 265 | 'norm_type': 1, 266 | 'max_iter': 3000, 267 | 'thr': [10**-8], 268 | 'type_indices': 'nrmInd', 269 | 'normalize': False, 270 | 'GPU': False, 271 | 'device': 0, 272 | 'PCA': False, 273 | 'verbose': False, 274 | 'step': 1, 275 | 'affine': False, 276 | }, 277 | 'RFS': 278 | { 279 | 'gamma': 0 280 | }, 281 | 'll_l21': 282 | { 283 | 'z': 0 284 | }, 285 | 'ls_l21': 286 | { 287 | 'z': 0 288 | }, 289 | 'Relief': 290 | { 291 | 'k': 0 292 | }, 293 | 'MRMR': 294 | { 295 | 296 | 'num_feats': max_num_feat 297 | }, 298 | 'MI': 299 | { 300 | 'n_neighbors': 0 301 | }, 302 | # the bigger is alpha the sparser is the C matrix (fewer representatives) 303 | 'EN': 304 | { 305 | 'alpha': 1, # default value is 1 306 | }, 307 | # the bigger is alpha the sparser is the C matrix (fewer representatives) 308 | 'LASSO': 309 | { 310 | 'alpha': 1 # default value is 1 311 | } 312 | } 313 | 314 | 315 | slb_fs = { 316 | 317 | 'LASSO': fs.FeatureSelector(name='LASSO', tp='SLB', params=params['LASSO']), 318 | 'EN': fs.FeatureSelector(name='EN', tp='SLB', params=params['EN']), 319 | 'SMBA': fs.FeatureSelector(name='SMBA', tp='SLB', params=params['SMBA']), 320 | 'RFS': fs.FeatureSelector(name='RFS', tp='SLB',params=params['RFS']), 321 | 'll_l21': fs.FeatureSelector(name='ll_l21', tp='SLB',params=params['ll_l21']), #injection not working 322 | 'ls_l21': fs.FeatureSelector(name='ls_l21', tp='SLB',params=params['ls_l21']), 323 | 324 | 'Relief': fs.FeatureSelector(name='Relief', tp='filter', params=params['Relief']), 325 | 'MRMR': fs.FeatureSelector(name='MRMR', tp='ITB', params=params['MRMR']), 326 | 'MI': fs.FeatureSelector(name='MI', tp='filter', params=params['MI']) 327 | } 328 | 329 | tuned_parameters = { 330 | 331 | 'LASSO': {'alpha': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]}, 332 | 'EN': {'alpha': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]}, 333 | 'SMBA': {'alpha': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]}, 334 | 'RFS': {'gamma': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]}, 335 | 'll_l21': {'z': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]}, 336 | 'ls_l21': {'z': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 
1e-3,1e-2, 1, 5, 10]}, 337 | 'Relief': {'k': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}, 338 | 'MRMR': {'num_feats': [max_num_feat]}, 339 | 'MI': {'n_neighbors': [1, 2, 3, 5, 7, 10]} 340 | } 341 | 342 | 343 | if max_num_feat > dataset.data.shape[1]: 344 | max_num_feat = dataset.data.shape[1] 345 | 346 | print ('\nMax number of features to use: ', max_num_feat) 347 | 348 | #setting the parameters for k-fold CV 349 | k_fold = 5 350 | 351 | X = dataset.data 352 | y = dataset.target 353 | kf = KFold(n_splits=k_fold) 354 | 355 | tuning_type = 'CSFS-PAPER' 356 | 357 | ################################### TFS ################################### 358 | 359 | if tuning_type == 'TFS': 360 | 361 | res_path_output = '/TFS/RESULTS/' 362 | 363 | # tuning process on all feature selector 364 | for fs_name, fs_model in slb_fs.iteritems(): 365 | 366 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 367 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name 368 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name) 369 | 370 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...' 371 | FS.update({fs_name: {}}) 372 | comb = [] 373 | params_name = [] 374 | 375 | for name, tun_par in tuned_parameters[fs_name].iteritems(): 376 | comb.append(tun_par) 377 | params_name.append(name) 378 | 379 | #create all combinations parameters 380 | combs = create_grid(comb) 381 | 382 | n_iter = 1 383 | #loop on each combination 384 | for comb in combs: 385 | 386 | FS[fs_name].update({comb: {}}) 387 | CV = np.ones([k_fold, max_num_feat])*0 388 | avg_scores = [] 389 | std_scores = [] 390 | 391 | print ('\tComputing '+n_iter.__str__() +'-th combination...') 392 | 393 | # set i-th parameters combination parameters for the current feature selector 394 | fs_model.setParams(comb,params_name,params[fs_name]) 395 | 396 | cc_fold = 0 397 | for train_index, test_index in kf.split(X): 398 | 399 | kth_scores = [] 400 | 401 | X_train, X_test = X[train_index, :], X[test_index, :] 402 | y_train, y_test = y[train_index], y[test_index] 403 | 404 | idx = fs_model.fit(X_train, y_train) 405 | # print idx 406 | # idx = list(range(1, 20)) 407 | 408 | #classification step on the first max_num_feat 409 | for n_rep in xrange(step, max_num_feat + step, step): # first n_rep indices 410 | 411 | X_train_fs = X_train[:, idx[0:n_rep]] 412 | X_test_fs = X_test[:, idx[0:n_rep]] 413 | 414 | _clf = clf.Classifier(names=clf_name, classifiers=model) 415 | DTS = _clf.train_and_classify(X_train_fs, y_train, X_test_fs, y_test) 416 | 417 | _score = DTS['SVM'] 418 | kth_scores.append(_score) # it contains the max_num_feat scores for the k-th CV fold 419 | 420 | CV[cc_fold,:] = kth_scores 421 | cc_fold += 1 422 | 423 | avg_scores = np.mean(CV, axis=0) 424 | std_scores = np.std(CV, axis=0) 425 | 426 | FS[fs_name][comb]['ACC'] = avg_scores 427 | FS[fs_name][comb]['STD'] = std_scores 428 | 429 | n_iter +=1 430 | 431 | #tuning analysis 432 | print 'Applying tuning analysis...' 
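        # Illustrative sketch only; tuning_analysis() is defined elsewhere in this script and
        # its exact behaviour is not reproduced here. A plausible selection rule keeps, for
        # each feature selector, the parameter combination with the highest mean CV accuracy
        # over the first num_feat (here 10) ranked features, e.g.:
        #
        #   best = {name: max(res.keys(), key=lambda c: np.mean(res[c]['ACC'][:10]))
        #           for name, res in FS.iteritems()}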
433 | num_feat = 10 434 | best_params = tuning_analysis(FS,num_feat) 435 | 436 | print 'Saving results...\n' 437 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample 438 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample) 439 | # 440 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle: 441 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL) 442 | 443 | elif tuning_type == 'CSFS': 444 | 445 | res_path_output = '/CSFS/RESULTS/' 446 | 447 | # tuning process on all feature selector 448 | for fs_name, fs_model in slb_fs.iteritems(): 449 | 450 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 451 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name 452 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name) 453 | 454 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...' 455 | FS.update({fs_name: {}}) 456 | comb = [] 457 | params_name = [] 458 | 459 | for name, tun_par in tuned_parameters[fs_name].iteritems(): 460 | comb.append(tun_par) 461 | params_name.append(name) 462 | 463 | #create all combinations parameters 464 | combs = create_grid(comb) 465 | 466 | n_iter = 1 467 | #loop on each combination 468 | for comb in combs: 469 | 470 | FS[fs_name].update({comb: {}}) 471 | CV = np.ones([k_fold, max_num_feat])*0 472 | avg_scores = [] 473 | std_scores = [] 474 | 475 | print ('\tComputing '+n_iter.__str__() +'-th combination...') 476 | 477 | # set i-th parameters combination parameters for the current feature selector 478 | fs_model.setParams(comb,params_name,params[fs_name]) 479 | 480 | cc_fold = 0 481 | # k-fold CV 482 | for train_index, test_index in kf.split(X): 483 | 484 | kth_scores = [] 485 | 486 | csfs_res = {} 487 | cls_res = {} 488 | k_computing_time = 0 489 | 490 | for i in xrange(0, n_classes): 491 | cls_res = { 492 | 493 | 'C' + str(cls[i]): {} 494 | } 495 | csfs_res.update(cls_res) 496 | 497 | X_train_kth, X_test_kth = X[train_index], X[test_index] 498 | y_train, y_test = y[train_index], y[test_index] 499 | 500 | ''' For the training data in each class we find the representative features and use them as a best subset feature 501 | (in representing each class sample) to perform classification 502 | ''' 503 | 504 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 505 | # curr_res_fs_fold = path_data_folder + '/' + fs_path_output + '/' + fs_name + resample 506 | # checkFolder(path_data_folder, fs_path_output + '/' + fs_name + resample) 507 | 508 | # discriminating classes for the k-th fold of the training set 509 | data_train = lr.Dataset(X_train_kth, y_train) 510 | data_train.separateSampleClass() 511 | ktrain_data, ktrain_target = data_train.getSampleClass() 512 | 513 | for i in xrange(0, n_classes): 514 | 515 | idx = fs_model.fit(ktrain_data[i], ktrain_target[i]) 516 | 517 | csfs_res['C' + str(cls[i])]['idx'] = idx 518 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name] 519 | 520 | # learning a classifier (ccn) for each subset of 'n_rep' feature 521 | for j in xrange(0, max_num_feat): 522 | n_rep = j + 1 # first n_rep indices 523 | 524 | for i in xrange(0, n_classes): 525 | # get subset of feature from the i-th class 526 | idx = csfs_res['C' + str(cls[i])]['idx'] 527 | 528 | X_train_fs = X_train_kth[:, idx[0:n_rep]] 529 | 530 | _clf = clf.Classifier(names=clf_name, classifiers=model) 531 
| _clf.train(X_train_fs, y_train) 532 | 533 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test) 534 | 535 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test) 536 | 537 | _score = DTS['SVM'] 538 | kth_scores.append(_score) 539 | # print kth_scores 540 | 541 | CV[cc_fold,:] = kth_scores 542 | cc_fold += 1 543 | 544 | avg_scores = np.mean(CV, axis=0) 545 | std_scores = np.std(CV, axis=0) 546 | 547 | # print avg_scores 548 | 549 | FS[fs_name][comb]['ACC'] = avg_scores 550 | FS[fs_name][comb]['STD'] = std_scores 551 | 552 | n_iter +=1 553 | 554 | #tuning analysis 555 | print 'Applying tuning analysis...' 556 | num_feat = 10 557 | best_params = tuning_analysis(FS,num_feat) 558 | 559 | print best_params 560 | 561 | # print 'Saving results...\n' 562 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample 563 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample) 564 | # 565 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle: 566 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL) 567 | 568 | elif tuning_type == 'CSFS-PAPER': 569 | 570 | res_path_output = '/CSFS_PAPER/RESULTS/' 571 | 572 | # tuning process on all feature selector 573 | for fs_name, fs_model in slb_fs.iteritems(): 574 | 575 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it 576 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name 577 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name) 578 | 579 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...' 580 | FS.update({fs_name: {}}) 581 | comb = [] 582 | params_name = [] 583 | 584 | for name, tun_par in tuned_parameters[fs_name].iteritems(): 585 | comb.append(tun_par) 586 | params_name.append(name) 587 | 588 | #create all combinations parameters 589 | combs = create_grid(comb) 590 | 591 | n_iter = 1 592 | #loop on each combination 593 | for comb in combs: 594 | 595 | FS[fs_name].update({comb: {}}) 596 | CV = np.ones([k_fold, max_num_feat])*0 597 | avg_scores = [] 598 | std_scores = [] 599 | 600 | print ('\tComputing '+n_iter.__str__() +'-th combination...') 601 | 602 | # set i-th parameters combination parameters for the current feature selector 603 | fs_model.setParams(comb,params_name,params[fs_name]) 604 | 605 | cc_fold = 0 606 | # k-fold CV 607 | for train_index, test_index in kf.split(X): 608 | 609 | kth_scores = [] 610 | 611 | csfs_res = {} 612 | cls_res = {} 613 | k_computing_time = 0 614 | 615 | for i in xrange(0, n_classes): 616 | cls_res = { 617 | 618 | 'C' + str(cls[i]): {} 619 | } 620 | csfs_res.update(cls_res) 621 | 622 | X_train_kth, X_test_kth = X[train_index], X[test_index] 623 | y_train, y_test = y[train_index], y[test_index] 624 | 625 | # CLASS BINARIZATION 626 | lb = label_binarize(cls, classes=y_train) 627 | 628 | for i in xrange(0, n_classes): 629 | num_min_cls = 9999999 630 | k_neighbors = 5 631 | 632 | # discriminating classes the whole dataset 633 | dataset_train = lr.Dataset(X_train_kth, lb[i]) 634 | dataset_train.separateSampleClass() 635 | data, target = dataset_train.getSampleClass() 636 | 637 | for j in xrange(0, 2): 638 | if data[j].shape[0] < num_min_cls: 639 | num_min_cls = data[j].shape[0] 640 | 641 | if num_min_cls == 1: 642 | num_min_cls += 1 643 | 644 | # CLASS BALANCING 645 | data_cls, target_cls = 
SMOTE(kind='regular',k_neighbors=num_min_cls-1).fit_sample(X_train_kth, lb[i]) 646 | 647 | # Performing feature selection on each class 648 | 649 | idx = fs_model.fit(data_cls, target_cls) 650 | 651 | csfs_res['C' + str(cls[i])]['idx'] = idx 652 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name] 653 | 654 | # Classification 655 | ens_class = {} 656 | # learning a classifier (ccn) for each subset of 'n_rep' feature 657 | for n_rep in xrange(step, max_num_feat + step, step): # first n_rep indices 658 | 659 | for i in xrange(0, n_classes): 660 | # get subset of feature from the i-th class 661 | idx = csfs_res['C' + str(cls[i])]['idx'] 662 | 663 | X_train_fs = X_train_kth[:, idx[0:n_rep]] 664 | 665 | _clf = clf.Classifier(names=clf_name, classifiers=model) 666 | _clf.train(X_train_fs, y_train) 667 | 668 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test) 669 | 670 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test) 671 | 672 | _score = DTS['SVM'] 673 | kth_scores.append(_score) 674 | 675 | CV[cc_fold, :] = kth_scores 676 | cc_fold += 1 677 | 678 | avg_scores = np.mean(CV, axis=0) 679 | std_scores = np.std(CV, axis=0) 680 | 681 | FS[fs_name][comb]['ACC'] = avg_scores 682 | FS[fs_name][comb]['STD'] = std_scores 683 | 684 | n_iter += 1 685 | 686 | #tuning analysis 687 | print 'Applying tuning analysis...' 688 | num_feat = 10 689 | best_params = tuning_analysis(FS,num_feat) 690 | 691 | print best_params 692 | 693 | # print 'Saving results...\n' 694 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample 695 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample) 696 | # 697 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle: 698 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL) 699 | 700 | else: 701 | print 'Wrong tuning type selected' 702 | -------------------------------------------------------------------------------- /src/skfeature/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/CIFE.py: -------------------------------------------------------------------------------- 1 | import LCSI 2 | 3 | 4 | def cife(X, y, **kwargs): 5 | """ 6 | This function implements the CIFE feature selection 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data, guaranteed to be discrete 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | kwargs: {dictionary} 15 | n_selected_features: {int} 16 | number of features to select 17 | 18 | Output 19 | ------ 20 | F: {numpy array}, shape (n_features,) 21 | index of selected features, F[1] is the most important feature 22 | 23 | Reference 24 | --------- 25 | Brown, Gavin et al. 
"Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 26 | """ 27 | 28 | if 'n_selected_features' in kwargs.keys(): 29 | n_selected_features = kwargs['n_selected_features'] 30 | F = LCSI.lcsi(X, y, beta=1, gamma=1, n_selected_features=n_selected_features) 31 | else: 32 | F = LCSI.lcsi(X, y, beta=1, gamma=1) 33 | return F 34 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/CMIM.py: -------------------------------------------------------------------------------- 1 | from skfeature.utility.entropy_estimators import * 2 | 3 | 4 | def cmim(X, y, **kwargs): 5 | """ 6 | This function implements the CMIM feature selection. 7 | The scoring criteria is calculated based on the formula j_cmim=I(f;y)-max_j(I(fj;f)-I(fj;f|y)) 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | Input data, guaranteed to be a discrete numpy array 13 | y: {numpy array}, shape (n_samples,) 14 | guaranteed to be a numpy array 15 | kwargs: {dictionary} 16 | n_selected_features: {int} 17 | number of features to select 18 | 19 | Output 20 | ------ 21 | F: {numpy array}, shape (n_features,) 22 | index of selected features, F(1) is the most important feature 23 | 24 | Reference 25 | --------- 26 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 27 | """ 28 | 29 | n_samples, n_features = X.shape 30 | # index of selected features, initialized to be empty 31 | F = [] 32 | # indicate whether the user specifies the number of features 33 | is_n_selected_features_specified = False 34 | 35 | if 'n_selected_features' in kwargs.keys(): 36 | n_selected_features = kwargs['n_selected_features'] 37 | is_n_selected_features_specified = True 38 | 39 | # t1 stores I(f;y) for each feature f 40 | t1 = np.zeros(n_features) 41 | 42 | # max stores max(I(fj;f)-I(fj;f|y)) for each feature f 43 | # we assign an extreme small value to max[i] ito make it is smaller than possible value of max(I(fj;f)-I(fj;f|y)) 44 | max = -10000000*np.ones(n_features) 45 | for i in range(n_features): 46 | f = X[:, i] 47 | t1[i] = midd(f, y) 48 | 49 | # make sure that j_cmi is positive at the very beginning 50 | j_cmim = 1 51 | 52 | while True: 53 | if len(F) == 0: 54 | # select the feature whose mutual information is the largest 55 | idx = np.argmax(t1) 56 | F.append(idx) 57 | f_select = X[:, idx] 58 | 59 | if is_n_selected_features_specified is True: 60 | if len(F) == n_selected_features: 61 | break 62 | if is_n_selected_features_specified is not True: 63 | if j_cmim <= 0: 64 | break 65 | 66 | # we assign an extreme small value to j_cmim to ensure it is smaller than all possible values of j_cmim 67 | j_cmim = -1000000000000 68 | for i in range(n_features): 69 | if i not in F: 70 | f = X[:, i] 71 | t2 = midd(f_select, f) 72 | t3 = cmidd(f_select, f, y) 73 | if t2-t3 > max[i]: 74 | max[i] = t2-t3 75 | # calculate j_cmim for feature i (not in F) 76 | t = t1[i] - max[i] 77 | # record the largest j_cmim and the corresponding feature index 78 | if t > j_cmim: 79 | j_cmim = t 80 | idx = i 81 | F.append(idx) 82 | f_select = X[:, idx] 83 | 84 | return np.array(F) -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/DISR.py: -------------------------------------------------------------------------------- 1 | from 
skfeature.utility.entropy_estimators import * 2 | from skfeature.utility.mutual_information import conditional_entropy 3 | 4 | 5 | def disr(X, y, **kwargs): 6 | """ 7 | This function implement the DISR feature selection. 8 | The scoring criteria is calculated based on the formula j_disr=sum_j(I(f,fj;y)/H(f,fj,y)) 9 | 10 | Input 11 | ----- 12 | X: {numpy array}, shape (n_samples, n_features) 13 | input data, guaranteed to be a discrete data matrix 14 | y: {numpy array}, shape (n_samples,) 15 | input class labels 16 | 17 | kwargs: {dictionary} 18 | n_selected_features: {int} 19 | number of features to select 20 | 21 | Output 22 | ------ 23 | F: {numpy array}, shape (n_features, ) 24 | index of selected features, F[1] is the most important feature 25 | 26 | Reference 27 | --------- 28 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 29 | """ 30 | 31 | n_samples, n_features = X.shape 32 | # index of selected features, initialized to be empty 33 | F = [] 34 | # indicate whether the user specifies the number of features 35 | is_n_selected_features_specified = False 36 | 37 | if 'n_selected_features' in kwargs.keys(): 38 | n_selected_features = kwargs['n_selected_features'] 39 | is_n_selected_features_specified = True 40 | 41 | # sum stores sum_j(I(f,fj;y)/H(f,fj,y)) for each feature f 42 | sum = np.zeros(n_features) 43 | 44 | # make sure that j_cmi is positive at the very beginning 45 | j_disr = 1 46 | 47 | while True: 48 | if len(F) == 0: 49 | # t1 stores I(f;y) for each feature f 50 | t1 = np.zeros(n_features) 51 | for i in range(n_features): 52 | f = X[:, i] 53 | t1[i] = midd(f, y) 54 | # select the feature whose mutual information is the largest 55 | idx = np.argmax(t1) 56 | F.append(idx) 57 | f_select = X[:, idx] 58 | 59 | if is_n_selected_features_specified is True: 60 | if len(F) == n_selected_features: 61 | break 62 | if is_n_selected_features_specified is not True: 63 | if j_disr <= 0: 64 | break 65 | 66 | # we assign an extreme small value to j_disr to ensure that it is smaller than all possible value of j_disr 67 | j_disr = -1000000000000 68 | for i in range(n_features): 69 | if i not in F: 70 | f = X[:, i] 71 | t1 = midd(f_select, y) + cmidd(f, y, f_select) 72 | t2 = entropyd(f) + conditional_entropy(f_select, f) + (conditional_entropy(y, f_select) - cmidd(y, f, f_select)) 73 | sum[i] += np.true_divide(t1, t2) 74 | # record the largest j_disr and the corresponding feature index 75 | if sum[i] > j_disr: 76 | j_disr = sum[i] 77 | idx = i 78 | F.append(idx) 79 | f_select = X[:, idx] 80 | 81 | return np.array(F) 82 | 83 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/FCBF.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skfeature.utility.mutual_information import su_calculation 3 | 4 | 5 | def fcbf(X, y, **kwargs): 6 | """ 7 | This function implements Fast Correlation Based Filter algorithm 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data, guaranteed to be discrete 13 | y: {numpy array}, shape (n_samples,) 14 | input class labels 15 | kwargs: {dictionary} 16 | delta: {float} 17 | delta is a threshold parameter, the default value of delta is 0 18 | 19 | Output 20 | ------ 21 | F: {numpy array}, shape (n_features,) 22 | index of selected features, F[1] is the most important feature 23 | 24 | Reference 25 
| --------- 26 | Yu, Lei and Liu, Huan. "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution." ICML 2003. 27 | """ 28 | 29 | n_samples, n_features = X.shape 30 | if 'delta' in kwargs.keys(): 31 | delta = kwargs['delta'] 32 | else: 33 | # the default value of delta is 0 34 | delta = 0 35 | 36 | # t1[:,0] stores index of features, t1[:,1] stores symmetrical uncertainty of features 37 | t1 = np.zeros((n_features, 2)) 38 | for i in range(n_features): 39 | f = X[:, i] 40 | t1[i, 0] = i 41 | t1[i, 1] = su_calculation(f, y) 42 | s_list = t1[t1[:, 1] > delta, :] 43 | # index of selected features, initialized to be empty 44 | F = [] 45 | while len(s_list) != 0: 46 | # select the largest su inside s_list 47 | idx = np.argmax(s_list[:, 1]) 48 | # record the index of the feature with the largest su 49 | fp = X[:, s_list[idx, 0]] 50 | np.delete(s_list, idx, 0) 51 | F.append(s_list[idx, 0]) 52 | for i in s_list[:, 0]: 53 | fi = X[:, i] 54 | if su_calculation(fp, fi) >= t1[i, 1]: 55 | # construct the mask for feature whose su is larger than su(fp,y) 56 | idx = s_list[:, 0] != i 57 | idx = np.array([idx, idx]) 58 | idx = np.transpose(idx) 59 | # delete the feature by using the mask 60 | s_list = s_list[idx] 61 | length = len(s_list)/2 62 | s_list = s_list.reshape((length, 2)) 63 | return np.array(F, dtype=int) -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/ICAP.py: -------------------------------------------------------------------------------- 1 | from skfeature.utility.entropy_estimators import * 2 | 3 | 4 | def icap(X, y, **kwargs): 5 | """ 6 | This function implements the ICAP feature selection. 7 | The scoring criteria is calculated based on the formula j_icap = I(f;y) - max_j(0,(I(fj;f)-I(fj;f|y))) 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data, guaranteed to be a discrete data matrix 13 | y: {numpy array}, shape (n_samples,) 14 | input class labels 15 | kwargs: {dictionary} 16 | n_selected_features: {int} 17 | number of features to select 18 | 19 | Output 20 | ------ 21 | F: {numpy array}, shape (n_features,) 22 | index of selected features, F(1) is the most important feature 23 | """ 24 | n_samples, n_features = X.shape 25 | # index of selected features, initialized to be empty 26 | F = [] 27 | # indicate whether the user specifies the number of features 28 | is_n_selected_features_specified = False 29 | if 'n_selected_features' in kwargs.keys(): 30 | n_selected_features = kwargs['n_selected_features'] 31 | is_n_selected_features_specified = True 32 | 33 | # t1 contains I(f;y) for each feature f 34 | t1 = np.zeros(n_features) 35 | # max contains max_j(0,(I(fj;f)-I(fj;f|y))) for each feature f 36 | max = np.zeros(n_features) 37 | for i in range(n_features): 38 | f = X[:, i] 39 | t1[i] = midd(f, y) 40 | 41 | # make sure that j_cmi is positive at the very beginning 42 | j_icap = 1 43 | 44 | while True: 45 | if len(F) == 0: 46 | # select the feature whose mutual information is the largest 47 | idx = np.argmax(t1) 48 | F.append(idx) 49 | f_select = X[:, idx] 50 | 51 | if is_n_selected_features_specified is True: 52 | if len(F) == n_selected_features: 53 | break 54 | if is_n_selected_features_specified is not True: 55 | if j_icap <= 0: 56 | break 57 | 58 | # we assign an extreme small value to j_icap to ensure it is smaller than all possible values of j_icap 59 | j_icap = -1000000000000 60 | for i in range(n_features): 61 | if i not in 
F: 62 | f = X[:, i] 63 | t2 = midd(f_select, f) 64 | t3 = cmidd(f_select, f, y) 65 | if t2-t3 > max[i]: 66 | max[i] = t2-t3 67 | # calculate j_icap for feature i (not in F) 68 | t = t1[i] - max[i] 69 | # record the largest j_icap and the corresponding feature index 70 | if t > j_icap: 71 | j_icap = t 72 | idx = i 73 | F.append(idx) 74 | f_select = X[:, idx] 75 | 76 | return np.array(F) 77 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/JMI.py: -------------------------------------------------------------------------------- 1 | import LCSI 2 | 3 | 4 | def jmi(X, y, **kwargs): 5 | """ 6 | This function implements the JMI feature selection 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data, guaranteed to be discrete 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | kwargs: {dictionary} 15 | n_selected_features: {int} 16 | number of features to select 17 | 18 | Output 19 | ------ 20 | F: {numpy array}, shape (n_features,) 21 | index of selected features, F[1] is the most important feature 22 | 23 | Reference 24 | --------- 25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 26 | """ 27 | if 'n_selected_features' in kwargs.keys(): 28 | n_selected_features = kwargs['n_selected_features'] 29 | F = LCSI.lcsi(X, y, function_name='JMI', n_selected_features=n_selected_features) 30 | else: 31 | F = LCSI.lcsi(X, y, function_name='JMI') 32 | return F -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/LCSI.py: -------------------------------------------------------------------------------- 1 | from skfeature.utility.entropy_estimators import * 2 | 3 | 4 | def lcsi(X, y, **kwargs): 5 | """ 6 | This function implements the basic scoring criteria for linear combination of shannon information term. 7 | The scoring criteria is calculated based on the formula j_cmi=I(f;y)-beta*sum_j(I(fj;f))+gamma*sum(I(fj;f|y)) 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data, guaranteed to be a discrete data matrix 13 | y: {numpy array}, shape (n_samples,) 14 | input class labels 15 | kwargs: {dictionary} 16 | Parameters for different feature selection algorithms. 17 | beta: {float} 18 | beta is the parameter in j_cmi=I(f;y)-beta*sum(I(fj;f))+gamma*sum(I(fj;f|y)) 19 | gamma: {float} 20 | gamma is the parameter in j_cmi=I(f;y)-beta*sum(I(fj;f))+gamma*sum(I(fj;f|y)) 21 | function_name: {string} 22 | name of the feature selection function 23 | n_selected_features: {int} 24 | number of features to select 25 | 26 | Output 27 | ------ 28 | F: {numpy array}, shape: (n_features,) 29 | index of selected features, F[1] is the most important feature 30 | 31 | Reference 32 | --------- 33 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 
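    Example
    -------
    Illustrative usage only, on random discrete data (this example is not part of the
    original scikit-feature documentation):
    >>> import numpy as np
    >>> X = np.random.randint(0, 3, (40, 12))
    >>> y = np.random.randint(0, 2, 40)
    >>> F = lcsi(X, y, beta=1, gamma=0.5, n_selected_features=4)  # indices of 4 selected features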
34 | """ 35 | 36 | n_samples, n_features = X.shape 37 | # index of selected features, initialized to be empty 38 | F = [] 39 | # indicate whether the user specifies the number of features 40 | is_n_selected_features_specified = False 41 | # initialize the parameters 42 | if 'beta' in kwargs.keys(): 43 | beta = kwargs['beta'] 44 | if 'gamma' in kwargs.keys(): 45 | gamma = kwargs['gamma'] 46 | if 'n_selected_features' in kwargs.keys(): 47 | n_selected_features = kwargs['n_selected_features'] 48 | is_n_selected_features_specified = True 49 | 50 | # select the feature whose j_cmi is the largest 51 | # t1 stores I(f;y) for each feature f 52 | t1 = np.zeros(n_features) 53 | # t2 sotres sum_j(I(fj;f)) for each feature f 54 | t2 = np.zeros(n_features) 55 | # t3 stores sum_j(I(fj;f|y)) for each feature f 56 | t3 = np.zeros(n_features) 57 | for i in range(n_features): 58 | f = X[:, i] 59 | t1[i] = midd(f, y) 60 | 61 | # make sure that j_cmi is positive at the very beginning 62 | j_cmi = 1 63 | 64 | while True: 65 | if len(F) == 0: 66 | # select the feature whose mutual information is the largest 67 | idx = np.argmax(t1) 68 | F.append(idx) 69 | f_select = X[:, idx] 70 | 71 | if is_n_selected_features_specified is True: 72 | if len(F) == n_selected_features: 73 | break 74 | if is_n_selected_features_specified is not True: 75 | if j_cmi < 0: 76 | break 77 | 78 | # we assign an extreme small value to j_cmi to ensure it is smaller than all possible values of j_cmi 79 | j_cmi = -1000000000000 80 | if 'function_name' in kwargs.keys(): 81 | if kwargs['function_name'] == 'MRMR': 82 | beta = 1.0 / len(F) 83 | elif kwargs['function_name'] == 'JMI': 84 | beta = 1.0 / len(F) 85 | gamma = 1.0 / len(F) 86 | for i in range(n_features): 87 | if i not in F: 88 | f = X[:, i] 89 | t2[i] += midd(f_select, f) 90 | t3[i] += cmidd(f_select, f, y) 91 | # calculate j_cmi for feature i (not in F) 92 | t = t1[i] - beta*t2[i] + gamma*t3[i] 93 | # record the largest j_cmi and the corresponding feature index 94 | if t > j_cmi: 95 | j_cmi = t 96 | idx = i 97 | F.append(idx) 98 | f_select = X[:, idx] 99 | 100 | return np.array(F) 101 | 102 | 103 | 104 | 105 | 106 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/MIFS.py: -------------------------------------------------------------------------------- 1 | import LCSI 2 | 3 | 4 | def mifs(X, y, **kwargs): 5 | """ 6 | This function implements the MIFS feature selection 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data, guaranteed to be discrete 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | kwargs: {dictionary} 15 | n_selected_features: {int} 16 | number of features to select 17 | 18 | Output 19 | ------ 20 | F: {numpy array}, shape (n_features,) 21 | index of selected features, F[1] is the most important feature 22 | 23 | Reference 24 | --------- 25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 
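    Example
    -------
    Illustrative usage only (not part of the original documentation); beta defaults
    to 0.5 when it is not supplied:
    >>> import numpy as np
    >>> X = np.random.randint(0, 4, (50, 15))
    >>> y = np.random.randint(0, 2, 50)
    >>> F = mifs(X, y, beta=0.8, n_selected_features=5)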
26 | """ 27 | 28 | if 'beta' not in kwargs.keys(): 29 | beta = 0.5 30 | else: 31 | beta = kwargs['beta'] 32 | if 'n_selected_features' in kwargs.keys(): 33 | n_selected_features = kwargs['n_selected_features'] 34 | F = LCSI.lcsi(X, y, beta=beta, gamma=0, n_selected_features=n_selected_features) 35 | else: 36 | F = LCSI.lcsi(X, y, beta=beta, gamma=0) 37 | return F 38 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/MIM.py: -------------------------------------------------------------------------------- 1 | import LCSI 2 | 3 | 4 | def mim(X, y, **kwargs): 5 | """ 6 | This function implements the MIM feature selection 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data, guaranteed to be discrete 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | kwargs: {dictionary} 15 | n_selected_features: {int} 16 | number of features to select 17 | 18 | Output 19 | ------ 20 | F: {numpy array}, shape (n_features, ) 21 | index of selected features, F[1] is the most important feature 22 | 23 | Reference 24 | --------- 25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 26 | """ 27 | 28 | if 'n_selected_features' in kwargs.keys(): 29 | n_selected_features = kwargs['n_selected_features'] 30 | F = LCSI.lcsi(X, y, beta=0, gamma=0, n_selected_features=n_selected_features) 31 | else: 32 | F = LCSI.lcsi(X, y, beta=0, gamma=0) 33 | return F 34 | -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/MRMR.py: -------------------------------------------------------------------------------- 1 | import LCSI 2 | 3 | 4 | def mrmr(X, y, **kwargs): 5 | """ 6 | This function implements the MRMR feature selection 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data, guaranteed to be discrete 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | kwargs: {dictionary} 15 | n_selected_features: {int} 16 | number of features to select 17 | 18 | Output 19 | ------ 20 | F: {numpy array}, shape (n_features,) 21 | index of selected features, F[1] is the most important feature 22 | 23 | Reference 24 | --------- 25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012. 
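    Example
    -------
    Illustrative usage only (not part of the original documentation); internally this
    calls LCSI with an adaptive beta = 1/|F| and gamma = 0:
    >>> import numpy as np
    >>> X = np.random.randint(0, 3, (60, 20))
    >>> y = np.random.randint(0, 3, 60)
    >>> F = mrmr(X, y, n_selected_features=5)  # top-5 MRMR feature indices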
26 | """ 27 | if 'n_selected_features' in kwargs.keys(): 28 | n_selected_features = kwargs['n_selected_features'] 29 | F = LCSI.lcsi(X, y, gamma=0, function_name='MRMR', n_selected_features=n_selected_features) 30 | else: 31 | F = LCSI.lcsi(X, y, gamma=0, function_name='MRMR') 32 | return F -------------------------------------------------------------------------------- /src/skfeature/function/information_theoretical_based/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/information_theoretical_based/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/SPEC.py: -------------------------------------------------------------------------------- 1 | import numpy.matlib 2 | import numpy as np 3 | from scipy.sparse import * 4 | from sklearn.metrics.pairwise import rbf_kernel 5 | from numpy import linalg as LA 6 | 7 | 8 | def spec(X, **kwargs): 9 | """ 10 | This function implements the SPEC feature selection 11 | 12 | Input 13 | ----- 14 | X: {numpy array}, shape (n_samples, n_features) 15 | input data 16 | kwargs: {dictionary} 17 | style: {int} 18 | style == -1, the first feature ranking function, use all eigenvalues 19 | style == 0, the second feature ranking function, use all except the 1st eigenvalue 20 | style >= 2, the third feature ranking function, use the first k except 1st eigenvalue 21 | W: {sparse matrix}, shape (n_samples, n_samples} 22 | input affinity matrix 23 | 24 | Output 25 | ------ 26 | w_fea: {numpy array}, shape (n_features,) 27 | SPEC feature score for each feature 28 | 29 | Reference 30 | --------- 31 | Zhao, Zheng and Liu, Huan. "Spectral Feature Selection for Supervised and Unsupervised Learning." ICML 2007. 
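    Example
    -------
    Illustrative usage only (not part of the original documentation); when no affinity
    matrix W is supplied, an RBF-kernel affinity is built internally:
    >>> import numpy as np
    >>> X = np.random.rand(30, 10)
    >>> scores = spec(X, style=0)
    >>> idx = feature_ranking(scores, style=0)  # for style 0, higher scores rank first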
32 | """ 33 | 34 | if 'style' not in kwargs: 35 | kwargs['style'] = 0 36 | if 'W' not in kwargs: 37 | kwargs['W'] = rbf_kernel(X, gamma=1) 38 | 39 | style = kwargs['style'] 40 | W = kwargs['W'] 41 | if type(W) is numpy.ndarray: 42 | W = csc_matrix(W) 43 | 44 | n_samples, n_features = X.shape 45 | 46 | # build the degree matrix 47 | X_sum = np.array(W.sum(axis=1)) 48 | D = np.zeros((n_samples, n_samples)) 49 | for i in range(n_samples): 50 | D[i, i] = X_sum[i] 51 | 52 | # build the laplacian matrix 53 | L = D - W 54 | d1 = np.power(np.array(W.sum(axis=1)), -0.5) 55 | d1[np.isinf(d1)] = 0 56 | d2 = np.power(np.array(W.sum(axis=1)), 0.5) 57 | v = np.dot(np.diag(d2[:, 0]), np.ones(n_samples)) 58 | v = v/LA.norm(v) 59 | 60 | # build the normalized laplacian matrix 61 | L_hat = (np.matlib.repmat(d1, 1, n_samples)) * np.array(L) * np.matlib.repmat(np.transpose(d1), n_samples, 1) 62 | 63 | # calculate and construct spectral information 64 | s, U = np.linalg.eigh(L_hat) 65 | s = np.flipud(s) 66 | U = np.fliplr(U) 67 | 68 | # begin to select features 69 | w_fea = np.ones(n_features)*1000 70 | 71 | for i in range(n_features): 72 | f = X[:, i] 73 | F_hat = np.dot(np.diag(d2[:, 0]), f) 74 | l = LA.norm(F_hat) 75 | if l < 100*np.spacing(1): 76 | w_fea[i] = 1000 77 | continue 78 | else: 79 | F_hat = F_hat/l 80 | a = np.array(np.dot(np.transpose(F_hat), U)) 81 | a = np.multiply(a, a) 82 | a = np.transpose(a) 83 | 84 | # use f'Lf formulation 85 | if style == -1: 86 | w_fea[i] = np.sum(a * s) 87 | # using all eigenvalues except the 1st 88 | elif style == 0: 89 | a1 = a[0:n_samples-1] 90 | w_fea[i] = np.sum(a1 * s[0:n_samples-1])/(1-np.power(np.dot(np.transpose(F_hat), v), 2)) 91 | # use first k except the 1st 92 | else: 93 | a1 = a[n_samples-style:n_samples-1] 94 | w_fea[i] = np.sum(a1 * (2-s[n_samples-style: n_samples-1])) 95 | 96 | if style != -1 and style != 0: 97 | w_fea[w_fea == 1000] = -1000 98 | 99 | return w_fea 100 | 101 | 102 | def feature_ranking(score, **kwargs): 103 | if 'style' not in kwargs: 104 | kwargs['style'] = 0 105 | style = kwargs['style'] 106 | 107 | # if style = -1 or 0, ranking features in descending order, the higher the score, the more important the feature is 108 | if style == -1 or style == 0: 109 | idx = np.argsort(score, 0) 110 | return idx[::-1] 111 | # if style != -1 and 0, ranking features in ascending order, the lower the score, the more important the feature is 112 | elif style != -1 and style != 0: 113 | idx = np.argsort(score, 0) 114 | return idx -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/similarity_based/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/fisher_score.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.sparse import * 3 | from skfeature.utility.construct_W import construct_W 4 | 5 | 6 | def fisher_score(X, y): 7 | """ 8 | This function implements the fisher score feature selection, steps are as follows: 9 | 1. Construct the affinity matrix W in fisher score way 10 | 2. 
For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W 11 | 3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones) 12 | 4. Fisher score for the r-th feature is score = (fr_hat'*D*fr_hat)/(fr_hat'*L*fr_hat)-1 13 | 14 | Input 15 | ----- 16 | X: {numpy array}, shape (n_samples, n_features) 17 | input data 18 | y: {numpy array}, shape (n_samples,) 19 | input class labels 20 | 21 | Output 22 | ------ 23 | score: {numpy array}, shape (n_features,) 24 | fisher score for each feature 25 | 26 | Reference 27 | --------- 28 | He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005. 29 | Duda, Richard et al. "Pattern classification." John Wiley & Sons, 2012. 30 | """ 31 | 32 | # Construct weight matrix W in a fisherScore way 33 | kwargs = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y} 34 | W = construct_W(X, **kwargs) 35 | 36 | # build the diagonal D matrix from affinity matrix W 37 | D = np.array(W.sum(axis=1)) 38 | L = W 39 | tmp = np.dot(np.transpose(D), X) 40 | D = diags(np.transpose(D), [0]) 41 | Xt = np.transpose(X) 42 | t1 = np.transpose(np.dot(Xt, D.todense())) 43 | t2 = np.transpose(np.dot(Xt, L.todense())) 44 | # compute the numerator of Lr 45 | D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum() 46 | # compute the denominator of Lr 47 | L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum() 48 | # avoid the denominator of Lr to be 0 49 | D_prime[D_prime < 1e-12] = 10000 50 | lap_score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :] 51 | 52 | # compute fisher score from laplacian score, where fisher_score = 1/lap_score - 1 53 | score = 1.0/lap_score - 1 54 | return np.transpose(score) 55 | 56 | 57 | def feature_ranking(score): 58 | """ 59 | Rank features in descending order according to fisher score, the larger the fisher score, the more important the 60 | feature is 61 | """ 62 | idx = np.argsort(score, 0) 63 | return idx[::-1] -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/lap_score.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.sparse import * 3 | from skfeature.utility.construct_W import construct_W 4 | 5 | 6 | def lap_score(X, **kwargs): 7 | """ 8 | This function implements the laplacian score feature selection, steps are as follows: 9 | 1. Construct the affinity matrix W if it is not specified 10 | 2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W 11 | 3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones) 12 | 4. Laplacian score for the r-th feature is score = (fr_hat'*L*fr_hat)/(fr_hat'*D*fr_hat) 13 | 14 | Input 15 | ----- 16 | X: {numpy array}, shape (n_samples, n_features) 17 | input data 18 | kwargs: {dictionary} 19 | W: {sparse matrix}, shape (n_samples, n_samples) 20 | input affinity matrix 21 | 22 | Output 23 | ------ 24 | score: {numpy array}, shape (n_features,) 25 | laplacian score for each feature 26 | 27 | Reference 28 | --------- 29 | He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005. 
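    Example
    -------
    Illustrative usage only (not part of the original documentation). The affinity
    matrix is passed explicitly because the code below always reads kwargs['W']:
    >>> from skfeature.utility.construct_W import construct_W
    >>> import numpy as np
    >>> X = np.random.rand(30, 10)
    >>> W = construct_W(X)
    >>> scores = lap_score(X, W=W)
    >>> idx = feature_ranking(scores)  # smallest laplacian score ranked first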
30 | """ 31 | 32 | # if 'W' is not specified, use the default W 33 | if 'W' not in kwargs.keys(): 34 | W = construct_W(X) 35 | # construct the affinity matrix W 36 | W = kwargs['W'] 37 | # build the diagonal D matrix from affinity matrix W 38 | D = np.array(W.sum(axis=1)) 39 | L = W 40 | tmp = np.dot(np.transpose(D), X) 41 | D = diags(np.transpose(D), [0]) 42 | Xt = np.transpose(X) 43 | t1 = np.transpose(np.dot(Xt, D.todense())) 44 | t2 = np.transpose(np.dot(Xt, L.todense())) 45 | # compute the numerator of Lr 46 | D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum() 47 | # compute the denominator of Lr 48 | L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum() 49 | # avoid the denominator of Lr to be 0 50 | D_prime[D_prime < 1e-12] = 10000 51 | 52 | # compute laplacian score for all features 53 | score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :] 54 | return np.transpose(score) 55 | 56 | 57 | def feature_ranking(score): 58 | """ 59 | Rank features in ascending order according to their laplacian scores, the smaller the laplacian score is, the more 60 | important the feature is 61 | """ 62 | idx = np.argsort(score, 0) 63 | return idx 64 | -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/reliefF.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randrange 3 | from sklearn.metrics.pairwise import pairwise_distances 4 | 5 | 6 | def reliefF(X, y, **kwargs): 7 | """ 8 | This function implements the reliefF feature selection 9 | 10 | Input 11 | ----- 12 | X: {numpy array}, shape (n_samples, n_features) 13 | input data 14 | y: {numpy array}, shape (n_samples,) 15 | input class labels 16 | kwargs: {dictionary} 17 | parameters of reliefF: 18 | k: {int} 19 | choices for the number of neighbors (default k = 5) 20 | 21 | Output 22 | ------ 23 | score: {numpy array}, shape (n_features,) 24 | reliefF score for each feature 25 | 26 | Reference 27 | --------- 28 | Robnik-Sikonja, Marko et al. "Theoretical and empirical analysis of relieff and rrelieff." Machine Learning 2003. 29 | Zhao, Zheng et al. "On Similarity Preserving Feature Selection." TKDE 2013. 
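    Example
    -------
    Illustrative usage only, on toy data (not part of the original documentation):
    >>> import numpy as np
    >>> X = np.random.rand(40, 8)
    >>> y = np.random.randint(0, 2, 40)
    >>> scores = reliefF(X, y, k=3)     # 3 nearest hits/misses per sampled instance
    >>> idx = feature_ranking(scores)   # highest reliefF score ranked first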
30 | """ 31 | 32 | if "k" not in kwargs.keys(): 33 | k = 5 34 | else: 35 | k = kwargs["k"] 36 | n_samples, n_features = X.shape 37 | 38 | # calculate pairwise distances between instances 39 | distance = pairwise_distances(X, metric='manhattan') 40 | 41 | score = np.zeros(n_features) 42 | 43 | # the number of sampled instances is equal to the number of total instances 44 | for iter in range(n_samples): 45 | idx = randrange(0, n_samples, 1) 46 | near_hit = [] 47 | near_miss = dict() 48 | 49 | self_fea = X[idx, :] 50 | c = np.unique(y).tolist() 51 | 52 | stop_dict = dict() 53 | for label in c: 54 | stop_dict[label] = 0 55 | del c[c.index(y[idx])] 56 | 57 | p_dict = dict() 58 | p_label_idx = float(len(y[y == y[idx]]))/float(n_samples) 59 | 60 | for label in c: 61 | p_label_c = float(len(y[y == label]))/float(n_samples) 62 | p_dict[label] = p_label_c/(1-p_label_idx) 63 | near_miss[label] = [] 64 | 65 | distance_sort = [] 66 | distance[idx, idx] = np.max(distance[idx, :]) 67 | 68 | for i in range(n_samples): 69 | distance_sort.append([distance[idx, i], int(i), y[i]]) 70 | distance_sort.sort(key=lambda x: x[0]) 71 | 72 | for i in range(n_samples): 73 | # find k nearest hit points 74 | if distance_sort[i][2] == y[idx]: 75 | if len(near_hit) < k: 76 | near_hit.append(distance_sort[i][1]) 77 | elif len(near_hit) == k: 78 | stop_dict[y[idx]] = 1 79 | else: 80 | # find k nearest miss points for each label 81 | if len(near_miss[distance_sort[i][2]]) < k: 82 | near_miss[distance_sort[i][2]].append(distance_sort[i][1]) 83 | else: 84 | if len(near_miss[distance_sort[i][2]]) == k: 85 | stop_dict[distance_sort[i][2]] = 1 86 | stop = True 87 | for (key, value) in stop_dict.items(): 88 | if value != 1: 89 | stop = False 90 | if stop: 91 | break 92 | 93 | # update reliefF score 94 | near_hit_term = np.zeros(n_features) 95 | for ele in near_hit: 96 | near_hit_term = np.array(abs(self_fea-X[ele, :]))+np.array(near_hit_term) 97 | 98 | near_miss_term = dict() 99 | for (label, miss_list) in near_miss.items(): 100 | near_miss_term[label] = np.zeros(n_features) 101 | for ele in miss_list: 102 | near_miss_term[label] = np.array(abs(self_fea-X[ele, :]))+np.array(near_miss_term[label]) 103 | score += near_miss_term[label]/(k*p_dict[label]) 104 | score -= near_hit_term/k 105 | return score 106 | 107 | 108 | def feature_ranking(score): 109 | """ 110 | Rank features in descending order according to reliefF score, the higher the reliefF score, the more important the 111 | feature is 112 | """ 113 | idx = np.argsort(score, 0) 114 | return idx[::-1] 115 | 116 | 117 | 118 | 119 | 120 | 121 | -------------------------------------------------------------------------------- /src/skfeature/function/similarity_based/trace_ratio.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skfeature.utility.construct_W import construct_W 3 | 4 | 5 | def trace_ratio(X, y, n_selected_features, **kwargs): 6 | """ 7 | This function implements the trace ratio criterion for feature selection 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data 13 | y: {numpy array}, shape (n_samples,) 14 | input class labels 15 | n_selected_features: {int} 16 | number of features to select 17 | kwargs: {dictionary} 18 | style: {string} 19 | style == 'fisher', build between-class matrix and within-class affinity matrix in a fisher score way 20 | style == 'laplacian', build between-class matrix and within-class affinity matrix in a laplacian score way 21 | 
verbose: {boolean} 22 | True if user want to print out the objective function value in each iteration, False if not 23 | 24 | Output 25 | ------ 26 | feature_idx: {numpy array}, shape (n_features,) 27 | the ranked (descending order) feature index based on subset-level score 28 | feature_score: {numpy array}, shape (n_features,) 29 | the feature-level score 30 | subset_score: {float} 31 | the subset-level score 32 | 33 | Reference 34 | --------- 35 | Feiping Nie et al. "Trace Ratio Criterion for Feature Selection." AAAI 2008. 36 | """ 37 | 38 | # if 'style' is not specified, use the fisher score way to built two affinity matrix 39 | if 'style' not in kwargs.keys(): 40 | kwargs['style'] = 'fisher' 41 | # get the way to build affinity matrix, 'fisher' or 'laplacian' 42 | style = kwargs['style'] 43 | n_samples, n_features = X.shape 44 | 45 | # if 'verbose' is not specified, do not output the value of objective function 46 | if 'verbose' not in kwargs: 47 | kwargs['verbose'] = False 48 | verbose = kwargs['verbose'] 49 | 50 | if style is 'fisher': 51 | kwargs_within = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y} 52 | # build within class and between class laplacian matrix L_w and L_b 53 | W_within = construct_W(X, **kwargs_within) 54 | L_within = np.eye(n_samples) - W_within 55 | L_tmp = np.eye(n_samples) - np.ones([n_samples, n_samples])/n_samples 56 | L_between = L_within - L_tmp 57 | 58 | if style is 'laplacian': 59 | kwargs_within = {"metric": "euclidean", "neighbor_mode": "knn", "weight_mode": "heat_kernel", "k": 5, 't': 1} 60 | # build within class and between class laplacian matrix L_w and L_b 61 | W_within = construct_W(X, **kwargs_within) 62 | D_within = np.diag(np.array(W_within.sum(1))[:, 0]) 63 | L_within = D_within - W_within 64 | W_between = np.dot(np.dot(D_within, np.ones([n_samples, n_samples])), D_within)/np.sum(D_within) 65 | D_between = np.diag(np.array(W_between.sum(1))) 66 | L_between = D_between - W_between 67 | 68 | # build X'*L_within*X and X'*L_between*X 69 | L_within = (np.transpose(L_within) + L_within)/2 70 | L_between = (np.transpose(L_between) + L_between)/2 71 | S_within = np.array(np.dot(np.dot(np.transpose(X), L_within), X)) 72 | S_between = np.array(np.dot(np.dot(np.transpose(X), L_between), X)) 73 | 74 | # reflect the within-class or local affinity relationship encoded on graph, Sw = X*Lw*X' 75 | S_within = (np.transpose(S_within) + S_within)/2 76 | # reflect the between-class or global affinity relationship encoded on graph, Sb = X*Lb*X' 77 | S_between = (np.transpose(S_between) + S_between)/2 78 | 79 | # take the absolute values of diagonal 80 | s_within = np.absolute(S_within.diagonal()) 81 | s_between = np.absolute(S_between.diagonal()) 82 | s_between[s_between == 0] = 1e-14 # this number if from authors' code 83 | 84 | # preprocessing 85 | fs_idx = np.argsort(np.divide(s_between, s_within), 0)[::-1] 86 | k = np.sum(s_between[0:n_selected_features])/np.sum(s_within[0:n_selected_features]) 87 | s_within = s_within[fs_idx[0:n_selected_features]] 88 | s_between = s_between[fs_idx[0:n_selected_features]] 89 | 90 | # iterate util converge 91 | count = 0 92 | while True: 93 | score = np.sort(s_between-k*s_within)[::-1] 94 | I = np.argsort(s_between-k*s_within)[::-1] 95 | idx = I[0:n_selected_features] 96 | old_k = k 97 | k = np.sum(s_between[idx])/np.sum(s_within[idx]) 98 | if verbose: 99 | print 'obj at iter ' + str(count+1) + ': ' + str(k) 100 | count += 1 101 | if abs(k - old_k) < 1e-3: 102 | break 103 | 104 | # get feature index, 
feature-level score and subset-level score 105 | feature_idx = fs_idx[I] 106 | feature_score = score 107 | subset_score = k 108 | 109 | return feature_idx, feature_score, subset_score 110 | 111 | 112 | 113 | -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/MCFS.py: -------------------------------------------------------------------------------- 1 | import scipy 2 | import numpy as np 3 | from sklearn import linear_model 4 | from skfeature.utility.construct_W import construct_W 5 | 6 | 7 | def mcfs(X, n_selected_features, **kwargs): 8 | """ 9 | This function implements unsupervised feature selection for multi-cluster data. 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | n_selected_features: {int} 16 | number of features to select 17 | kwargs: {dictionary} 18 | W: {sparse matrix}, shape (n_samples, n_samples) 19 | affinity matrix 20 | n_clusters: {int} 21 | number of clusters (default is 5) 22 | 23 | Output 24 | ------ 25 | W: {numpy array}, shape(n_features, n_clusters) 26 | feature weight matrix 27 | 28 | Reference 29 | --------- 30 | Cai, Deng et al. "Unsupervised Feature Selection for Multi-Cluster Data." KDD 2010. 31 | """ 32 | 33 | # use the default affinity matrix 34 | if 'W' not in kwargs: 35 | W = construct_W(X) 36 | else: 37 | W = kwargs['W'] 38 | # default number of clusters is 5 39 | if 'n_clusters' not in kwargs: 40 | n_clusters = 5 41 | else: 42 | n_clusters = kwargs['n_clusters'] 43 | 44 | # solve the generalized eigen-decomposition problem and get the top K 45 | # eigen-vectors with respect to the smallest eigenvalues 46 | W = W.toarray() 47 | W = (W + W.T) / 2 48 | W_norm = np.diag(np.sqrt(1 / W.sum(1))) 49 | W = np.dot(W_norm, np.dot(W, W_norm)) 50 | WT = W.T 51 | W[W < WT] = WT[W < WT] 52 | eigen_value, ul = scipy.linalg.eigh(a=W) 53 | Y = np.dot(W_norm, ul[:, -1*n_clusters-1:-1]) 54 | 55 | # solve K L1-regularized regression problem using LARs algorithm with cardinality constraint being d 56 | n_sample, n_feature = X.shape 57 | W = np.zeros((n_feature, n_clusters)) 58 | for i in range(n_clusters): 59 | clf = linear_model.Lars(n_nonzero_coefs=n_selected_features) 60 | clf.fit(X, Y[:, i]) 61 | W[:, i] = clf.coef_ 62 | return W 63 | 64 | 65 | def feature_ranking(W): 66 | """ 67 | This function computes MCFS score and ranking features according to feature weights matrix W 68 | """ 69 | mcfs_score = W.max(1) 70 | idx = np.argsort(mcfs_score, 0) 71 | idx = idx[::-1] 72 | return idx -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/NDFS.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sys 3 | import math 4 | import sklearn.cluster 5 | from skfeature.utility.construct_W import construct_W 6 | 7 | 8 | def ndfs(X, **kwargs): 9 | """ 10 | This function implement unsupervised feature selection using nonnegative spectral analysis, i.e., 11 | min_{F,W} Tr(F^T L F) + alpha*(||XW-F||_F^2 + beta*||W||_{2,1}) + gamma/2 * ||F^T F - I||_F^2 12 | s.t. 
F >= 0 13 | 14 | Input 15 | ----- 16 | X: {numpy array}, shape (n_samples, n_features) 17 | input data 18 | kwargs: {dictionary} 19 | W: {sparse matrix}, shape {n_samples, n_samples} 20 | affinity matrix 21 | alpha: {float} 22 | Parameter alpha in objective function 23 | beta: {float} 24 | Parameter beta in objective function 25 | gamma: {float} 26 | a very large number used to force F^T F = I 27 | F0: {numpy array}, shape (n_samples, n_clusters) 28 | initialization of the pseudo label matirx F, if not provided 29 | n_clusters: {int} 30 | number of clusters 31 | verbose: {boolean} 32 | True if user want to print out the objective function value in each iteration, false if not 33 | 34 | Output 35 | ------ 36 | W: {numpy array}, shape(n_features, n_clusters) 37 | feature weight matrix 38 | 39 | Reference: 40 | Li, Zechao, et al. "Unsupervised Feature Selection Using Nonnegative Spectral Analysis." AAAI. 2012. 41 | """ 42 | 43 | # default gamma is 10e8 44 | if 'gamma' not in kwargs: 45 | gamma = 10e8 46 | else: 47 | gamma = kwargs['gamma'] 48 | # use the default affinity matrix 49 | if 'W' not in kwargs: 50 | W = construct_W(X) 51 | else: 52 | W = kwargs['W'] 53 | if 'alpha' not in kwargs: 54 | alpha = 1 55 | else: 56 | alpha = kwargs['alpha'] 57 | if 'beta' not in kwargs: 58 | beta = 1 59 | else: 60 | beta = kwargs['beta'] 61 | if 'F0' not in kwargs: 62 | if 'n_clusters' not in kwargs: 63 | print >>sys.stderr, "either F0 or n_clusters should be provided" 64 | else: 65 | # initialize F 66 | n_clusters = kwargs['n_clusters'] 67 | F = kmeans_initialization(X, n_clusters) 68 | else: 69 | F = kwargs['F0'] 70 | if 'verbose' not in kwargs: 71 | verbose = False 72 | else: 73 | verbose = kwargs['verbose'] 74 | 75 | n_samples, n_features = X.shape 76 | 77 | # initialize D as identity matrix 78 | D = np.identity(n_features) 79 | I = np.identity(n_samples) 80 | 81 | # build laplacian matrix 82 | L = np.array(W.sum(1))[:, 0] - W 83 | 84 | max_iter = 1000 85 | obj = np.zeros(max_iter) 86 | for iter_step in range(max_iter): 87 | # update W 88 | T = np.linalg.inv(np.dot(X.transpose(), X) + beta * D + 1e-6*np.eye(n_features)) 89 | W = np.dot(np.dot(T, X.transpose()), F) 90 | # update D 91 | temp = np.sqrt((W*W).sum(1)) 92 | temp[temp < 1e-16] = 1e-16 93 | temp = 0.5 / temp 94 | D = np.diag(temp) 95 | # update M 96 | M = L + alpha * (I - np.dot(np.dot(X, T), X.transpose())) 97 | M = (M + M.transpose())/2 98 | # update F 99 | denominator = np.dot(M, F) + gamma*np.dot(np.dot(F, F.transpose()), F) 100 | temp = np.divide(gamma*F, denominator) 101 | F = F*np.array(temp) 102 | temp = np.diag(np.sqrt(np.diag(1 / (np.dot(F.transpose(), F) + 1e-16)))) 103 | F = np.dot(F, temp) 104 | 105 | # calculate objective function 106 | obj[iter_step] = np.trace(np.dot(np.dot(F.transpose(), M), F)) + gamma/4*np.linalg.norm(np.dot(F.transpose(), F)-np.identity(n_clusters), 'fro') 107 | if verbose: 108 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 109 | 110 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 111 | break 112 | return W 113 | 114 | 115 | def kmeans_initialization(X, n_clusters): 116 | """ 117 | This function uses kmeans to initialize the pseudo label 118 | 119 | Input 120 | ----- 121 | X: {numpy array}, shape (n_samples, n_features) 122 | input data 123 | n_clusters: {int} 124 | number of clusters 125 | 126 | Output 127 | ------ 128 | Y: {numpy array}, shape (n_samples, n_clusters) 129 | pseudo label matrix 130 | """ 131 | 132 | n_samples, n_features = X.shape 133 | 
kmeans = sklearn.cluster.KMeans(n_clusters=n_clusters, init='k-means++', n_init=10, max_iter=300, 134 | tol=0.0001, precompute_distances=True, verbose=0, 135 | random_state=None, copy_x=True, n_jobs=1) 136 | kmeans.fit(X) 137 | labels = kmeans.labels_ 138 | Y = np.zeros((n_samples, n_clusters)) 139 | for row in range(0, n_samples): 140 | Y[row, labels[row]] = 1 141 | T = np.dot(Y.transpose(), Y) 142 | F = np.dot(Y, np.sqrt(np.linalg.inv(T))) 143 | F = F + 0.02*np.ones((n_samples, n_clusters)) 144 | return F 145 | 146 | 147 | def calculate_obj(X, W, F, L, alpha, beta): 148 | """ 149 | This function calculates the objective function of NDFS 150 | """ 151 | # Tr(F^T L F) 152 | T1 = np.trace(np.dot(np.dot(F.transpose(), L), F)) 153 | T2 = np.linalg.norm(np.dot(X, W) - F, 'fro') 154 | T3 = (np.sqrt((W*W).sum(1))).sum() 155 | obj = T1 + alpha*(T2 + beta*T3) 156 | return obj -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/RFS.py: -------------------------------------------------------------------------------- 1 | import math 2 | import numpy as np 3 | from numpy import linalg as LA 4 | from skfeature.utility.sparse_learning import generate_diagonal_matrix 5 | from skfeature.utility.sparse_learning import calculate_l21_norm 6 | 7 | 8 | def rfs(X, Y, **kwargs): 9 | """ 10 | This function implementS efficient and robust feature selection via joint l21-norms minimization 11 | min_W||X^T W - Y||_2,1 + gamma||W||_2,1 12 | 13 | Input 14 | ----- 15 | X: {numpy array}, shape (n_samples, n_features) 16 | input data 17 | Y: {numpy array}, shape (n_samples, n_classes) 18 | input class label matrix, each row is a one-hot-coding class label 19 | kwargs: {dictionary} 20 | gamma: {float} 21 | parameter in RFS 22 | verbose: boolean 23 | True if want to display the objective function value, false if not 24 | 25 | Output 26 | ------ 27 | W: {numpy array}, shape(n_samples, n_features) 28 | feature weight matrix 29 | 30 | Reference 31 | --------- 32 | Nie, Feiping et al. "Efficient and Robust Feature Selection via Joint l2,1-Norms Minimization" NIPS 2010. 
33 | """ 34 | 35 | # default gamma is 1 36 | if 'gamma' not in kwargs: 37 | gamma = 1 38 | else: 39 | gamma = kwargs['gamma'] 40 | if 'verbose' not in kwargs: 41 | verbose = False 42 | else: 43 | verbose = kwargs['verbose'] 44 | 45 | n_samples, n_features = X.shape 46 | A = np.zeros((n_samples, n_samples + n_features)) 47 | A[:, 0:n_features] = X 48 | A[:, n_features:n_features+n_samples] = gamma*np.eye(n_samples) 49 | D = np.eye(n_features+n_samples) 50 | 51 | max_iter = 1000 52 | obj = np.zeros(max_iter) 53 | for iter_step in range(max_iter): 54 | # update U as U = D^{-1} A^T (A D^-1 A^T)^-1 Y 55 | D_inv = LA.inv(D) 56 | temp = LA.inv(np.dot(np.dot(A, D_inv), A.T) + 1e-6*np.eye(n_samples)) # (A D^-1 A^T)^-1 57 | U = np.dot(np.dot(np.dot(D_inv, A.T), temp), Y) 58 | # update D as D_ii = 1 / 2 / ||U(i,:)|| 59 | D = generate_diagonal_matrix(U) 60 | 61 | obj[iter_step] = calculate_obj(X, Y, U[0:n_features, :], gamma) 62 | 63 | if verbose: 64 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 65 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 66 | break 67 | 68 | # the first d rows of U are the feature weights 69 | W = U[0:n_features, :] 70 | return W 71 | 72 | 73 | def calculate_obj(X, Y, W, gamma): 74 | """ 75 | This function calculates the objective function of rfs 76 | """ 77 | temp = np.dot(X, W) - Y 78 | return calculate_l21_norm(temp) + gamma*calculate_l21_norm(W) -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/UDFS.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy 3 | import math 4 | from skfeature.utility.sparse_learning import generate_diagonal_matrix, calculate_l21_norm 5 | from sklearn.metrics.pairwise import pairwise_distances 6 | 7 | 8 | def udfs(X, **kwargs): 9 | """ 10 | This function implements l2,1-norm regularized discriminative feature 11 | selection for unsupervised learning, i.e., min_W Tr(W^T M W) + gamma ||W||_{2,1}, s.t. W^T W = I 12 | 13 | Input 14 | ----- 15 | X: {numpy array}, shape (n_samples, n_features) 16 | input data 17 | kwargs: {dictionary} 18 | gamma: {float} 19 | parameter in the objective function of UDFS (default is 1) 20 | n_clusters: {int} 21 | Number of clusters 22 | k: {int} 23 | number of nearest neighbor 24 | verbose: {boolean} 25 | True if want to display the objective function value, false if not 26 | 27 | Output 28 | ------ 29 | W: {numpy array}, shape(n_features, n_clusters) 30 | feature weight matrix 31 | 32 | Reference 33 | Yang, Yi et al. "l2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning." AAAI 2012. 
34 | """ 35 | 36 | # default gamma is 0.1 37 | if 'gamma' not in kwargs: 38 | gamma = 0.1 39 | else: 40 | gamma = kwargs['gamma'] 41 | # default k is set to be 5 42 | if 'k' not in kwargs: 43 | k = 5 44 | else: 45 | k = kwargs['k'] 46 | if 'n_clusters' not in kwargs: 47 | n_clusters = 5 48 | else: 49 | n_clusters = kwargs['n_clusters'] 50 | if 'verbose' not in kwargs: 51 | verbose = False 52 | else: 53 | verbose = kwargs['verbose'] 54 | 55 | # construct M 56 | n_sample, n_feature = X.shape 57 | M = construct_M(X, k, gamma) 58 | 59 | D = np.eye(n_feature) 60 | max_iter = 1000 61 | obj = np.zeros(max_iter) 62 | for iter_step in range(max_iter): 63 | # update W as the eigenvectors of P corresponding to the first n_clusters 64 | # smallest eigenvalues 65 | P = M + gamma*D 66 | eigen_value, eigen_vector = scipy.linalg.eigh(a=P) 67 | W = eigen_vector[:, 0:n_clusters] 68 | # update D as D_ii = 1 / 2 / ||W(i,:)|| 69 | D = generate_diagonal_matrix(W) 70 | 71 | obj[iter_step] = calculate_obj(X, W, M, gamma) 72 | if verbose: 73 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 74 | 75 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 76 | break 77 | return W 78 | 79 | 80 | def construct_M(X, k, gamma): 81 | """ 82 | This function constructs the M matrix described in the paper 83 | """ 84 | n_sample, n_feature = X.shape 85 | Xt = X.T 86 | D = pairwise_distances(X) 87 | # sort the distance matrix D in ascending order 88 | idx = np.argsort(D, axis=1) 89 | # choose the k-nearest neighbors for each instance 90 | idx_new = idx[:, 0:k+1] 91 | H = np.eye(k+1) - 1/(k+1) * np.ones((k+1, k+1)) 92 | I = np.eye(k+1) 93 | Mi = np.zeros((n_sample, n_sample)) 94 | for i in range(n_sample): 95 | Xi = Xt[:, idx_new[i, :]] 96 | Xi_tilde =np.dot(Xi, H) 97 | Bi = np.linalg.inv(np.dot(Xi_tilde.T, Xi_tilde) + gamma*I) 98 | Si = np.zeros((n_sample, k+1)) 99 | for q in range(k+1): 100 | Si[idx_new[q], q] = 1 101 | Mi = Mi + np.dot(np.dot(Si, np.dot(np.dot(H, Bi), H)), Si.T) 102 | M = np.dot(np.dot(X.T, Mi), X) 103 | return M 104 | 105 | 106 | def calculate_obj(X, W, M, gamma): 107 | """ 108 | This function calculates the objective function of ls_l21 described in the paper 109 | """ 110 | return np.trace(np.dot(np.dot(W.T, M), W)) + gamma*calculate_l21_norm(W) -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/sparse_learning_based/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/ll_l21.py: -------------------------------------------------------------------------------- 1 | import math 2 | import numpy as np 3 | from numpy import linalg as LA 4 | from skfeature.utility.sparse_learning import euclidean_projection, calculate_l21_norm 5 | 6 | 7 | def proximal_gradient_descent(X, Y, z, **kwargs): 8 | """ 9 | This function implements supervised sparse feature selection via l2,1 norm, i.e., 10 | min_{W} sum_{i}log(1+exp(-yi*(W'*x+C))) + z*||W||_{2,1} 11 | 12 | Input 13 | ----- 14 | X: {numpy array}, shape (n_samples, n_features) 15 | input data 16 | Y: {numpy array}, shape (n_samples, n_classes) 17 | input class labels, each row is a one-hot-coding 
class label, guaranteed to be a numpy array 18 | z: {float} 19 | regularization parameter 20 | kwargs: {dictionary} 21 | verbose: {boolean} 22 | True if user want to print out the objective function value in each iteration, false if not 23 | 24 | Output 25 | ------ 26 | W: {numpy array}, shape (n_features, n_classes) 27 | weight matrix 28 | obj: {numpy array}, shape (n_iterations,) 29 | objective function value during iterations 30 | value_gamma: {numpy array}, shape (n_iterations,s) 31 | suitable step size during iterations 32 | 33 | 34 | Reference: 35 | Liu, Jun, et al. "Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization." UAI. 2009. 36 | """ 37 | 38 | if 'verbose' not in kwargs: 39 | verbose = False 40 | else: 41 | verbose = kwargs['verbose'] 42 | 43 | # Starting point initialization # 44 | n_samples, n_features = X.shape 45 | n_samples, n_classes = Y.shape 46 | 47 | # the indices of positive samples 48 | p_flag = (Y == 1) 49 | # the total number of positive samples 50 | n_positive_samples = np.sum(p_flag, 0) 51 | # the total number of negative samples 52 | n_negative_samples = n_samples - n_positive_samples 53 | n_positive_samples = n_positive_samples.astype(float) 54 | n_negative_samples = n_negative_samples.astype(float) 55 | 56 | # initialize a starting point 57 | W = np.zeros((n_features, n_classes)) 58 | C = np.log(np.divide(n_positive_samples, n_negative_samples)) 59 | 60 | # compute XW = X*W 61 | XW = np.dot(X, W) 62 | 63 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent 64 | # the intial guess of the Lipschitz continuous gradient 65 | gamma = 1.0/(n_samples*n_classes) 66 | 67 | # assign Wp with W, and XWp with XW 68 | XWp = XW 69 | WWp =np.zeros((n_features, n_classes)) 70 | CCp = np.zeros((1, n_classes)) 71 | 72 | alphap = 0 73 | alpha = 1 74 | 75 | # indicates whether the gradient step only changes a little 76 | flag = False 77 | 78 | max_iter = 1000 79 | value_gamma = np.zeros(max_iter) 80 | obj = np.zeros(max_iter) 81 | for iter_step in range(max_iter): 82 | # step1: compute search point S based on Wp and W (with beta) 83 | beta = (alphap-1)/alpha 84 | S = W + beta*WWp 85 | SC = C + beta*CCp 86 | 87 | # step2: line search for gamma and compute the new approximation solution W 88 | XS = XW + beta*(XW - XWp) 89 | aa = -np.multiply(Y, XS+np.tile(SC, (n_samples, 1))) 90 | # fun_S is the logistic loss at the search point 91 | bb = np.maximum(aa, 0) 92 | fun_S = np.sum(np.log(np.exp(-bb)+np.exp(aa-bb))+bb)/(n_samples*n_classes) 93 | # compute prob = [p_1;p_2;...;p_m] 94 | prob = 1.0/(1+np.exp(aa)) 95 | 96 | b = np.multiply(-Y, (1-prob))/(n_samples*n_classes) 97 | # compute the gradient of C 98 | GC = np.sum(b, 0) 99 | # compute the gradient of W as X'*b 100 | G = np.dot(np.transpose(X), b) 101 | 102 | # copy W and XW to Wp and XWp 103 | Wp = W 104 | XWp = XW 105 | Cp = C 106 | 107 | while True: 108 | # let S walk in a step in the antigradient of S to get V and then do the L1/L2-norm regularized projection 109 | V = S - G/gamma 110 | C = SC - GC/gamma 111 | W = euclidean_projection(V, n_features, n_classes, z, gamma) 112 | 113 | # the difference between the new approximate solution W and the search point S 114 | V = W - S 115 | # compute XW = X*W 116 | XW = np.dot(X, W) 117 | aa = -np.multiply(Y, XW+np.tile(C, (n_samples, 1))) 118 | # fun_W is the logistic loss at the new approximate solution 119 | bb = np.maximum(aa, 0) 120 | fun_W = np.sum(np.log(np.exp(-bb)+np.exp(aa-bb))+bb)/(n_samples*n_classes) 121 | 122 | 
r_sum = (LA.norm(V, 'fro')**2 + LA.norm(C-SC, 2)**2) / 2 123 | l_sum = fun_W - fun_S - np.sum(np.multiply(V, G)) - np.inner((C-SC), GC) 124 | 125 | # determine weather the gradient step makes little improvement 126 | if r_sum <= 1e-20: 127 | flag = True 128 | break 129 | 130 | # the condition is fun_W <= fun_S + + + gamma/2 * ( + ) 131 | if l_sum < r_sum*gamma: 132 | break 133 | else: 134 | gamma = max(2*gamma, l_sum/r_sum) 135 | value_gamma[iter_step] = gamma 136 | 137 | # step3: update alpha and alphap, and check weather converge 138 | alphap = alpha 139 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2 140 | 141 | WWp = W - Wp 142 | CCp = C - Cp 143 | 144 | # calculate obj 145 | obj[iter_step] = fun_W 146 | obj[iter_step] += z*calculate_l21_norm(W) 147 | 148 | if verbose: 149 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 150 | 151 | if flag is True: 152 | break 153 | 154 | # determine weather converge 155 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 156 | break 157 | return W, obj, value_gamma 158 | -------------------------------------------------------------------------------- /src/skfeature/function/sparse_learning_based/ls_l21.py: -------------------------------------------------------------------------------- 1 | import math 2 | import numpy as np 3 | from numpy import linalg as LA 4 | from skfeature.utility.sparse_learning import euclidean_projection, calculate_l21_norm 5 | 6 | 7 | def proximal_gradient_descent(X, Y, z, **kwargs): 8 | """ 9 | This function implements supervised sparse feature selection via l2,1 norm, i.e., 10 | min_{W} ||XW-Y||_F^2 + z*||W||_{2,1} 11 | 12 | Input 13 | ----- 14 | X: {numpy array}, shape (n_samples, n_features) 15 | input data, guaranteed to be a numpy array 16 | Y: {numpy array}, shape (n_samples, n_classes) 17 | input class labels, each row is a one-hot-coding class label 18 | z: {float} 19 | regularization parameter 20 | kwargs: {dictionary} 21 | verbose: {boolean} 22 | True if user want to print out the objective function value in each iteration, false if not 23 | 24 | Output 25 | ------ 26 | W: {numpy array}, shape (n_features, n_classes) 27 | weight matrix 28 | obj: {numpy array}, shape (n_iterations,) 29 | objective function value during iterations 30 | value_gamma: {numpy array}, shape (n_iterations,) 31 | suitable step size during iterations 32 | 33 | Reference 34 | --------- 35 | Liu, Jun, et al. "Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization." UAI. 2009. 
36 | """ 37 | 38 | if 'verbose' not in kwargs: 39 | verbose = False 40 | else: 41 | verbose = kwargs['verbose'] 42 | 43 | # starting point initialization 44 | n_samples, n_features = X.shape 45 | n_samples, n_classes = Y.shape 46 | 47 | # compute X'Y 48 | XtY = np.dot(np.transpose(X), Y) 49 | 50 | # initialize a starting point 51 | W = XtY 52 | 53 | # compute XW = X*W 54 | XW = np.dot(X, W) 55 | 56 | # compute l2,1 norm of W 57 | W_norm = calculate_l21_norm(W) 58 | 59 | if W_norm >= 1e-6: 60 | ratio = init_factor(W_norm, XW, Y, z) 61 | W = ratio*W 62 | XW = ratio*XW 63 | 64 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent 65 | # initialize step size gamma = 1 66 | gamma = 1 67 | 68 | # assign Wp with W, and XWp with XW 69 | XWp = XW 70 | WWp =np.zeros((n_features, n_classes)) 71 | alphap = 0 72 | alpha = 1 73 | 74 | # indicate whether the gradient step only changes a little 75 | flag = False 76 | 77 | max_iter = 1000 78 | value_gamma = np.zeros(max_iter) 79 | obj = np.zeros(max_iter) 80 | for iter_step in range(max_iter): 81 | # step1: compute search point S based on Wp and W (with beta) 82 | beta = (alphap-1)/alpha 83 | S = W + beta*WWp 84 | 85 | # step2: line search for gamma and compute the new approximation solution W 86 | XS = XW + beta*(XW - XWp) 87 | # compute X'* XS 88 | XtXS = np.dot(np.transpose(X), XS) 89 | # obtain the gradient g 90 | G = XtXS - XtY 91 | # copy W and XW to Wp and XWp 92 | Wp = W 93 | XWp = XW 94 | 95 | while True: 96 | # let S walk in a step in the antigradient of S to get V and then do the L1/L2-norm regularized projection 97 | V = S - G/gamma 98 | W = euclidean_projection(V, n_features, n_classes, z, gamma) 99 | # the difference between the new approximate solution W and the search point S 100 | V = W - S 101 | # compute XW = X*W 102 | XW = np.dot(X, W) 103 | XV = XW - XS 104 | r_sum = LA.norm(V, 'fro')**2 105 | l_sum = LA.norm(XV, 'fro')**2 106 | 107 | # determine weather the gradient step makes little improvement 108 | if r_sum <= 1e-20: 109 | flag = True 110 | break 111 | 112 | # the condition is ||XV||_2^2 <= gamma * ||V||_2^2 113 | if l_sum < r_sum*gamma: 114 | break 115 | else: 116 | gamma = max(2*gamma, l_sum/r_sum) 117 | value_gamma[iter_step] = gamma 118 | 119 | # step3: update alpha and alphap, and check weather converge 120 | alphap = alpha 121 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2 122 | 123 | WWp = W - Wp 124 | XWY = XW -Y 125 | 126 | # calculate obj 127 | obj[iter_step] = LA.norm(XWY, 'fro')**2/2 128 | obj[iter_step] += z*calculate_l21_norm(W) 129 | 130 | if verbose: 131 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 132 | 133 | if flag is True: 134 | break 135 | 136 | # determine weather converge 137 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 138 | break 139 | return W, obj, value_gamma 140 | 141 | 142 | def init_factor(W_norm, XW, Y, z): 143 | """ 144 | Initialize the starting point of W, according to the author's code 145 | """ 146 | n_samples, n_classes = XW.shape 147 | a = np.inner(np.reshape(XW, n_samples*n_classes), np.reshape(Y, n_samples*n_classes)) - z*W_norm 148 | b = LA.norm(XW, 'fro')**2 149 | ratio = a / b 150 | return ratio -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/CFS.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skfeature.utility.mutual_information import 
su_calculation 3 | 4 | 5 | def merit_calculation(X, y): 6 | """ 7 | This function calculates the merit of X given class labels y, where 8 | merits = (k * rcf)/sqrt(k+k*(k-1)*rff) 9 | rcf = (1/k)*sum(su(fi,y)) for all fi in X 10 | rff = (1/(k*(k-1)))*sum(su(fi,fj)) for all fi and fj in X 11 | 12 | Input 13 | ---------- 14 | X: {numpy array}, shape (n_samples, n_features) 15 | input data 16 | y: {numpy array}, shape (n_samples,) 17 | input class labels 18 | 19 | Output 20 | ---------- 21 | merits: {float} 22 | merit of a feature subset X 23 | """ 24 | 25 | n_samples, n_features = X.shape 26 | rff = 0 27 | rcf = 0 28 | for i in range(n_features): 29 | fi = X[:, i] 30 | rcf += su_calculation(fi, y) 31 | for j in range(n_features): 32 | if j > i: 33 | fj = X[:, j] 34 | rff += su_calculation(fi, fj) 35 | rff *= 2 36 | merits = rcf / np.sqrt(n_features + rff) 37 | return merits 38 | 39 | 40 | def cfs(X, y): 41 | """ 42 | This function uses a correlation based heuristic to evaluate the worth of features which is called CFS 43 | 44 | Input 45 | ----- 46 | X: {numpy array}, shape (n_samples, n_features) 47 | input data 48 | y: {numpy array}, shape (n_samples,) 49 | input class labels 50 | 51 | Output 52 | ------ 53 | F: {numpy array} 54 | index of selected features 55 | 56 | Reference 57 | --------- 58 | Zhao, Zheng et al. "Advancing Feature Selection Research - ASU Feature Selection Repository" 2010. 59 | """ 60 | 61 | n_samples, n_features = X.shape 62 | F = [] 63 | # M stores the merit values 64 | M = [] 65 | while True: 66 | merit = -100000000000 67 | idx = -1 68 | for i in range(n_features): 69 | if i not in F: 70 | F.append(i) 71 | # calculate the merit of current selected features 72 | t = merit_calculation(X[:, F], y) 73 | if t > merit: 74 | merit = t 75 | idx = i 76 | F.pop() 77 | F.append(idx) 78 | M.append(merit) 79 | if len(M) > 5: 80 | if M[len(M)-1] <= M[len(M)-2]: 81 | if M[len(M)-2] <= M[len(M)-3]: 82 | if M[len(M)-3] <= M[len(M)-4]: 83 | if M[len(M)-4] <= M[len(M)-5]: 84 | break 85 | return np.array(F) 86 | 87 | -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/statistical_based/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/chi_square.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.feature_selection import chi2 3 | 4 | 5 | def chi_square(X, y): 6 | """ 7 | This function implements the chi-square feature selection (existing method for classification in scikit-learn) 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data 13 | y: {numpy array},shape (n_samples,) 14 | input class labels 15 | 16 | Output 17 | ------ 18 | F: {numpy array}, shape (n_features,) 19 | chi-square score for each feature 20 | """ 21 | F, pval = chi2(X, y) 22 | return F 23 | 24 | 25 | def feature_ranking(F): 26 | """ 27 | Rank features in descending order according to chi2-score, the higher the chi2-score, the more important the feature is 28 | """ 29 | idx = np.argsort(F) 30 | return idx[::-1] 
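
A minimal usage sketch for the chi_square and feature_ranking functions above, on hypothetical non-negative toy data (scikit-learn's chi2 requires non-negative feature values); the dataset below is invented purely for illustration and is not part of the repository:

import numpy as np
from skfeature.function.statistical_based.chi_square import chi_square, feature_ranking

# hypothetical non-negative toy data: 6 samples, 4 features, binary labels
X = np.array([[1., 0., 3., 2.],
              [2., 1., 0., 3.],
              [0., 2., 1., 1.],
              [3., 0., 2., 0.],
              [1., 3., 0., 2.],
              [2., 1., 1., 0.]])
y = np.array([0, 0, 1, 1, 0, 1])

scores = chi_square(X, y)        # chi-square score for each feature
idx = feature_ranking(scores)    # feature indices, most important first
X_selected = X[:, idx[0:2]]      # keep the two top-ranked features
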
-------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/f_score.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.feature_selection import f_classif 3 | 4 | 5 | def f_score(X, y): 6 | """ 7 | This function implements the anova f_value feature selection (existing method for classification in scikit-learn), 8 | where f_score = sum((ni/(c-1))*(mean_i - mean)^2)/((1/(n - c))*sum((ni-1)*std_i^2)) 9 | 10 | Input 11 | ----- 12 | X: {numpy array}, shape (n_samples, n_features) 13 | input data 14 | y : {numpy array},shape (n_samples,) 15 | input class labels 16 | 17 | Output 18 | ------ 19 | F: {numpy array}, shape (n_features,) 20 | f-score for each feature 21 | """ 22 | 23 | F, pval = f_classif(X, y) 24 | return F 25 | 26 | 27 | def feature_ranking(F): 28 | """ 29 | Rank features in descending order according to f-score, the higher the f-score, the more important the feature is 30 | """ 31 | idx = np.argsort(F) 32 | return idx[::-1] -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/gini_index.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def gini_index(X, y): 5 | """ 6 | This function implements the gini index feature selection. 7 | 8 | Input 9 | ---------- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data 12 | y: {numpy array}, shape (n_samples,) 13 | input class labels 14 | 15 | Output 16 | ---------- 17 | gini: {numpy array}, shape (n_features, ) 18 | gini index value of each feature 19 | """ 20 | 21 | n_samples, n_features = X.shape 22 | 23 | # initialize gini_index for all features to be 0.5 24 | gini = np.ones(n_features) * 0.5 25 | 26 | # For i-th feature we define fi = x[:,i] ,v include all unique values in fi 27 | for i in range(n_features): 28 | v = np.unique(X[:, i]) 29 | for j in range(len(v)): 30 | # left_y contains labels of instances whose i-th feature value is less than or equal to v[j] 31 | left_y = y[X[:, i] <= v[j]] 32 | # right_y contains labels of instances whose i-th feature value is larger than v[j] 33 | right_y = y[X[:, i] > v[j]] 34 | 35 | # gini_left is sum of square of probability of occurrence of v[i] in left_y 36 | # gini_right is sum of square of probability of occurrence of v[i] in right_y 37 | gini_left = 0 38 | gini_right = 0 39 | 40 | for k in range(np.min(y), np.max(y)+1): 41 | if len(left_y) != 0: 42 | # t1_left is probability of occurrence of k in left_y 43 | t1_left = np.true_divide(len(left_y[left_y == k]), len(left_y)) 44 | t2_left = np.power(t1_left, 2) 45 | gini_left += t2_left 46 | 47 | if len(right_y) != 0: 48 | # t1_right is probability of occurrence of k in left_y 49 | t1_right = np.true_divide(len(right_y[right_y == k]), len(right_y)) 50 | t2_right = np.power(t1_right, 2) 51 | gini_right += t2_right 52 | 53 | gini_left = 1 - gini_left 54 | gini_right = 1 - gini_right 55 | 56 | # weighted average of len(left_y) and len(right_y) 57 | t1_gini = (len(left_y) * gini_left + len(right_y) * gini_right) 58 | 59 | # compute the gini_index for the i-th feature 60 | value = np.true_divide(t1_gini, len(y)) 61 | 62 | if value < gini[i]: 63 | gini[i] = value 64 | return gini 65 | 66 | 67 | def feature_ranking(W): 68 | """ 69 | Rank features in descending order according to their gini index values, the smaller the gini index, 70 | the more important the feature is 
71 | """ 72 | idx = np.argsort(W) 73 | return idx 74 | 75 | 76 | 77 | 78 | 79 | 80 | -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/low_variance.py: -------------------------------------------------------------------------------- 1 | from sklearn.feature_selection import VarianceThreshold 2 | 3 | 4 | def low_variance_feature_selection(X, threshold): 5 | """ 6 | This function implements the low_variance feature selection (existing method in scikit-learn) 7 | 8 | Input 9 | ----- 10 | X: {numpy array}, shape (n_samples, n_features) 11 | input data 12 | p:{float} 13 | parameter used to calculate the threshold(threshold = p*(1-p)) 14 | 15 | Output 16 | ------ 17 | X_new: {numpy array}, shape (n_samples, n_selected_features) 18 | data with selected features 19 | """ 20 | sel = VarianceThreshold(threshold) 21 | return sel.fit_transform(X) -------------------------------------------------------------------------------- /src/skfeature/function/statistical_based/t_score.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def t_score(X, y): 5 | """ 6 | This function calculates t_score for each feature, where t_score is only used for binary problem 7 | t_score = |mean1-mean2|/sqrt(((std1^2)/n1)+((std2^2)/n2))) 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data 13 | y: {numpy array}, shape (n_samples,) 14 | input class labels 15 | 16 | Output 17 | ------ 18 | F: {numpy array}, shape (n_features,) 19 | t-score for each feature 20 | """ 21 | 22 | n_samples, n_features = X.shape 23 | F = np.zeros(n_features) 24 | c = np.unique(y) 25 | if len(c) == 2: 26 | for i in range(n_features): 27 | f = X[:, i] 28 | # class0 contains instances belonging to the first class 29 | # class1 contains instances belonging to the second class 30 | class0 = f[y == c[0]] 31 | class1 = f[y == c[1]] 32 | mean0 = np.mean(class0) 33 | mean1 = np.mean(class1) 34 | std0 = np.std(class0) 35 | std1 = np.std(class1) 36 | n0 = len(class0) 37 | n1 = len(class1) 38 | t = mean0 - mean1 39 | t0 = np.true_divide(std0**2, n0) 40 | t1 = np.true_divide(std1**2, n1) 41 | F[i] = np.true_divide(t, (t0 + t1)**0.5) 42 | else: 43 | print('y should be guaranteed to a binary class vector') 44 | exit(0) 45 | return np.abs(F) 46 | 47 | 48 | def feature_ranking(F): 49 | """ 50 | Rank features in descending order according to t-score, the higher the t-score, the more important the feature is 51 | """ 52 | idx = np.argsort(F) 53 | return idx[::-1] 54 | 55 | -------------------------------------------------------------------------------- /src/skfeature/function/streaming/__init__.py: -------------------------------------------------------------------------------- 1 | __author__ = 'jundongl' 2 | -------------------------------------------------------------------------------- /src/skfeature/function/streaming/alpha_investing.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn import linear_model 3 | 4 | 5 | def alpha_investing(X, y, w0, dw): 6 | """ 7 | This function implements streamwise feature selection (SFS) algorithm alpha_investing for binary regression or 8 | univariate regression 9 | 10 | Input 11 | ----- 12 | X: {numpy array}, shape (n_samples, n_features) 13 | input data, assume feature arrives one at each time step 14 | y: {numpy array}, shape (n_samples,) 15 | input class labels or regression 
target 16 | 17 | Output 18 | ------ 19 | F: {numpy array}, shape (n_selected_features,) 20 | index of selected features in a streamwise way 21 | 22 | Reference 23 | --------- 24 | Zhou, Jing et al. "Streaming Feature Selection using Alpha-investing." KDD 2006. 25 | """ 26 | 27 | n_samples, n_features = X.shape 28 | w = w0 29 | F = [] # selected features 30 | for i in range(n_features): 31 | x_can = X[:, i] # generate next feature 32 | alpha = w/2/(i+1) 33 | X_old = X[:, F] 34 | if i is 0: 35 | X_old = np.ones((n_samples, 1)) 36 | linreg_old = linear_model.LinearRegression() 37 | linreg_old.fit(X_old, y) 38 | error_old = 1 - linreg_old.score(X_old, y) 39 | if i is not 0: 40 | # model built with only X_old 41 | linreg_old = linear_model.LinearRegression() 42 | linreg_old.fit(X_old, y) 43 | error_old = 1 - linreg_old.score(X_old, y) 44 | 45 | # model built with X_old & {x_can} 46 | X_new = np.concatenate((X_old, x_can.reshape(n_samples, 1)), axis=1) 47 | logreg_new = linear_model.LinearRegression() 48 | logreg_new.fit(X_new, y) 49 | error_new = 1 - logreg_new.score(X_new, y) 50 | 51 | # calculate p-value 52 | pval = np.exp((error_new - error_old)/(2*error_old/n_samples)) 53 | if pval < alpha: 54 | F.append(i) 55 | w = w + dw - alpha 56 | else: 57 | w -= alpha 58 | return np.array(F) 59 | 60 | -------------------------------------------------------------------------------- /src/skfeature/function/structure/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /src/skfeature/function/structure/graph_fs.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def soft_threshold(A,b): 5 | """ 6 | This function implement the soft-threshold operator 7 | Input: 8 | A: {numpy scalar, vector, or matrix} 9 | b: scalar} 10 | """ 11 | res = np.zeros(A.shape) 12 | res[A > b] = A[A > b] - b 13 | res[A < -b] = A[A < -b] + b 14 | return res 15 | 16 | 17 | def calculate_obj(X, y, w, lambda1, lambda2, T): 18 | return 1/2 * (np.linalg.norm(y- np.dot(X, w), 'fro'))**2 + lambda1*np.abs(w).sum() + lambda2*np.abs(np.dot(T, w)).sum() 19 | 20 | 21 | def graph_fs(X, y, **kwargs): 22 | """ 23 | This function implement the graph structural feature selection algorithm GOSCAR 24 | 25 | Objective Function 26 | min_{w} 1/2 ||X*w - y||_F^2 + lambda1 ||w||_1 + lambda2 \sum_{(i,j) \in E} max{|w_i|, |w|_j} 27 | 28 | Input: 29 | X: {numpy array}, shape (n_samples, n_features) 30 | Input data, guaranteed to be a numpy array 31 | y: {numpy array}, shape (n_samples, 1) 32 | Input data, the label matrix 33 | edge_list: {numpy array}, shape (n_edges, 2) 34 | Input data, each row is a pair of linked features, note feature index should start from 0 35 | lambda1: {float} 36 | Parameter lambda1 in objective function 37 | lambda2: {float} 38 | Parameter labmda2 in objective function 39 | rho: {flot} 40 | parameter used for optimization 41 | max_iter: {int} 42 | maximal iteration 43 | verbose: {boolean} True or False 44 | True if we want to print out the objective function value in each iteration, False if not 45 | 46 | Output: 47 | w: the weights of the features 48 | obj: the value of the objective function in each iteration 49 | """ 50 | 51 | if 'lambda1' not in kwargs: 52 | lambda1 = 0.8 53 | else: 54 | lambda1 = kwargs['lambda1'] 55 | if 'lambda2' not in kwargs: 56 | lambda2 = 0.8 57 | else: 58 | lambda2 = kwargs['lambda2'] 59 | if 'edge_list' not 
in kwargs: 60 | print 'Error using function, the network structure E is required' 61 | raise() 62 | else : 63 | edge_list = kwargs['edge_list'] 64 | if 'max_iter' not in kwargs: 65 | max_iter = 300 66 | else: 67 | max_iter = kwargs['max_iter'] 68 | if 'verbose' not in kwargs: 69 | verbose = 0 70 | else: 71 | verbose = kwargs['verbose'] 72 | if 'rho' not in kwargs: 73 | rho = 5 74 | else: 75 | rho = kwargs['rho'] 76 | 77 | n_samples, n_features = X.shape 78 | 79 | # construct T from E 80 | ind1 = edge_list[:, 0] 81 | ind2 = edge_list[:, 1] 82 | num_edge = ind1.shape[0] 83 | T = np.zeros((num_edge*2, n_features)) 84 | for i in range(num_edge): 85 | T[i, ind1[i]] = 0.5 86 | T[i, ind2[i]] = 0.5 87 | T[i+num_edge, ind1[i]] = 0.5 88 | T[i+num_edge, ind2[i]] = -0.5 89 | 90 | # calculate F = X^T X + rho(I + T^T * T) 91 | F = np.dot(X.T, X) + rho*(np.identity(n_features) + np.dot(T.T, T)) 92 | 93 | # Cholesky factorization of F = R^T R 94 | R = np.linalg.cholesky(F) # NOTE, this return F = R R^T 95 | R = R.T 96 | Rinv = np.linalg.inv(R) 97 | Rtinv = Rinv.T 98 | 99 | # initialize p, q, mu , v to be zero vectors 100 | p = np.zeros((2*num_edge, 1)) 101 | q = np.zeros((n_features, 1)) 102 | mu = np.zeros((n_features, 1)) 103 | v = np.zeros((2*num_edge, 1)) 104 | 105 | # start the main loop 106 | iter = 0 107 | obj = np.zeros((max_iter,1)) 108 | while iter < max_iter: 109 | print iter 110 | # update w 111 | b = np.dot(X.T, y) - mu - np.dot(T.T, v) + rho*np.dot(T.T,p) + rho*q 112 | w_hat = np.dot(Rtinv, b) 113 | w = np.dot(Rinv, w_hat) 114 | 115 | # update q 116 | q = soft_threshold(w + 1/rho*mu, lambda1/rho) 117 | # update p 118 | 119 | p = soft_threshold(np.dot(T, w)+1/rho*v, lambda2/rho) 120 | # update mu, v 121 | mu += rho*(w - q) 122 | v += rho*(np.dot(T, w) - p) 123 | 124 | # calculate objective function 125 | obj[iter] = calculate_obj(X, y, w, lambda1, lambda2, T) 126 | if verbose: 127 | print 'obj at iter ' + str(iter) + ': ' + str(obj[iter]) 128 | iter += 1 129 | return w, obj, q 130 | 131 | def feature_ranking(w): 132 | T = w.abs() 133 | idx = np.argsort(T, 0) 134 | return idx[::-1] 135 | -------------------------------------------------------------------------------- /src/skfeature/function/structure/group_fs.py: -------------------------------------------------------------------------------- 1 | import math 2 | import numpy as np 3 | from skfeature.utility.sparse_learning import tree_lasso_projection, tree_norm 4 | 5 | 6 | def group_fs(X, y, z1, z2, idx, **kwargs): 7 | """ 8 | This function implements supervised sparse group feature selection with least square loss, i.e., 9 | min_{w} ||Xw-y||_2^2 + z_1||w||_1 + z_2*sum_{i} h_{i}||w_{G_{i}}|| where h_i is the weight for the i-th group 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | y: {numpy array}, shape (n_samples,) 16 | input class labels or regression target 17 | z1: {float} 18 | regularization parameter of L1 norm for each element 19 | z2: {float} 20 | regularization parameter of L2 norm for the non-overlapping group 21 | idx: {numpy array}, shape (3, n_nodes) 22 | 3*nodes matrix, where nodes denotes the number of groups 23 | idx[1,:] contains the starting index of a group 24 | idx[2,: contains the ending index of a group 25 | idx[3,:] contains the corresponding weight (w_{j}) 26 | kwargs: {dictionary} 27 | verbose: {boolean} 28 | True if user want to print out the objective function value in each iteration, false if not 29 | 30 | Output 31 | ------ 32 | w: {numpy array}, shape 
(n_features, ) 33 | weight matrix 34 | obj: {numpy array}, shape (n_iterations, ) 35 | objective function value during iterations 36 | value_gamma: {numpy array}, shape (n_iterations, ) 37 | suitable step size during iterations 38 | 39 | Reference 40 | --------- 41 | Liu, Jun, et al. "Moreau-Yosida Regularization for Grouped Tree Structure Learning." NIPS. 2010. 42 | Liu, Jun, et al. "SLEP: Sparse Learning with Efficient Projections." http://www.public.asu.edu/~jye02/Software/SLEP, 2009. 43 | """ 44 | if 'verbose' not in kwargs: 45 | verbose = False 46 | else: 47 | verbose = kwargs['verbose'] 48 | 49 | # starting point initialization 50 | n_samples, n_features = X.shape 51 | 52 | # compute X'y 53 | Xty = np.dot(np.transpose(X), y) 54 | 55 | # initialize a starting point 56 | w = np.zeros(n_features) 57 | 58 | # compute Xw = X*w 59 | Xw = np.dot(X, w) 60 | 61 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent 62 | # initialize step size gamma = 1 63 | gamma = 1 64 | 65 | # assign wp with w, and Xwp with Xw 66 | Xwp = Xw 67 | wwp = np.zeros(n_features) 68 | alphap = 0 69 | alpha = 1 70 | 71 | # indicates whether the gradient step only changes a little 72 | flag = False 73 | 74 | max_iter = 1000 75 | value_gamma = np.zeros(max_iter) 76 | obj = np.zeros(max_iter) 77 | for iter_step in range(max_iter): 78 | # step1: compute search point s based on wp and w (with beta) 79 | beta = (alphap-1)/alpha 80 | s = w + beta*wwp 81 | 82 | # step2: line search for gamma and compute the new approximation solution w 83 | Xs = Xw + beta*(Xw - Xwp) 84 | # compute X'* Xs 85 | XtXs = np.dot(np.transpose(X), Xs) 86 | # obtain the gradient g 87 | G = XtXs - Xty 88 | # copy w and Xw to wp and Xwp 89 | wp = w 90 | Xwp = Xw 91 | 92 | while True: 93 | # let s walk in a step in the antigradient of s to get v and then do the L1/L2-norm regularized projection 94 | v = s - G/gamma 95 | # tree overlapping group lasso projection 96 | n_nodes = int(idx.shape[1]) 97 | idx_tmp = np.zeros((3, n_nodes+1)) 98 | idx_tmp[0:2, :] = np.concatenate((np.array([[-1], [-1]]), idx[0:2, :]), axis=1) 99 | idx_tmp[2, :] = np.concatenate((np.array([z1/gamma]), z2/gamma*idx[2, :]), axis=1) 100 | w = tree_lasso_projection(v, n_features, idx_tmp, n_nodes+1) 101 | # the difference between the new approximate solution w and the search point s 102 | v = w - s 103 | # compute Xw = X*w 104 | Xw = np.dot(X, w) 105 | Xv = Xw - Xs 106 | r_sum = np.inner(v, v) 107 | l_sum = np.inner(Xv, Xv) 108 | # determine weather the gradient step makes little improvement 109 | if r_sum <= 1e-20: 110 | flag = True 111 | break 112 | 113 | # the condition is ||Xv||_2^2 <= gamma * ||v||_2^2 114 | if l_sum <= r_sum*gamma: 115 | break 116 | else: 117 | gamma = max(2*gamma, l_sum/r_sum) 118 | value_gamma[iter_step] = gamma 119 | 120 | # step3: update alpha and alphap, and check weather converge 121 | alphap = alpha 122 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2 123 | 124 | wwp = w - wp 125 | Xwy = Xw -y 126 | 127 | # calculate the regularization part 128 | idx_tmp = np.zeros((3, n_nodes+1)) 129 | idx_tmp[0:2, :] = np.concatenate((np.array([[-1], [-1]]), idx[0:2, :]), axis=1) 130 | idx_tmp[2, :] = np.concatenate((np.array([z1]), z2*idx[2, :]), axis=1) 131 | tree_norm_val = tree_norm(w, n_features, idx_tmp, n_nodes+1) 132 | 133 | # function value = loss + regularization 134 | obj[iter_step] = np.inner(Xwy, Xwy)/2 + tree_norm_val 135 | 136 | if verbose: 137 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 
138 | 139 | if flag is True: 140 | break 141 | 142 | # determine weather converge 143 | if iter_step >= 2 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 144 | break 145 | 146 | return w, obj, value_gamma 147 | 148 | 149 | -------------------------------------------------------------------------------- /src/skfeature/function/structure/tree_fs.py: -------------------------------------------------------------------------------- 1 | import math 2 | import numpy as np 3 | from skfeature.utility.sparse_learning import tree_lasso_projection, tree_norm 4 | 5 | 6 | def tree_fs(X, y, z, idx, **kwargs): 7 | """ 8 | This function implements tree structured group lasso regularization with least square loss, i.e., 9 | min_{w} ||Xw-Y||_2^2 + z\sum_{i}\sum_{j} h_{j}^{i}|||w_{G_{j}^{i}}|| where h_{j}^{i} is the weight for the j-th group 10 | from the i-th level (the root node is in level 0) 11 | 12 | Input 13 | ----- 14 | X: {numpy array}, shape (n_samples, n_features) 15 | input data 16 | y: {numpy array}, shape (n_samples,) 17 | input class labels or regression target 18 | z: {float} 19 | regularization parameter of L2 norm for the non-overlapping group 20 | idx: {numpy array}, shape (3, n_nodes) 21 | 3*nodes matrix, where nodes denotes the number of nodes of the tree 22 | idx(1,:) contains the starting index 23 | idx(2,:) contains the ending index 24 | idx(3,:) contains the corresponding weight (w_{j}) 25 | kwargs: {dictionary} 26 | verbose: {boolean} 27 | True if user want to print out the objective function value in each iteration, false if not 28 | 29 | Output 30 | ------ 31 | w: {numpy array}, shape (n_features,) 32 | weight vector 33 | obj: {numpy array}, shape (n_iterations,) 34 | objective function value during iterations 35 | value_gamma: {numpy array}, shape (n_iterations,) 36 | suitable step size during iterations 37 | 38 | Note for input parameter idx: 39 | (1) For idx, if each entry in w is a leaf node of the tree and the weight for this leaf node are the same, then 40 | idx[0,0] = -1 and idx[1,0] = -1, idx[2,0] denotes the common weight 41 | (2) In idx, the features of the left tree is smaller than the right tree (idx[0,i] is always smaller than idx[1,i]) 42 | 43 | Reference: 44 | Liu, Jun, et al. "Moreau-Yosida Regularization for Grouped Tree Structure Learning." NIPS. 2010. 45 | Liu, Jun, et al. "SLEP: Sparse Learning with Efficient Projections." http://www.public.asu.edu/~jye02/Software/SLEP, 2009. 
46 | """ 47 | 48 | if 'verbose' not in kwargs: 49 | verbose = False 50 | else: 51 | verbose = kwargs['verbose'] 52 | 53 | # starting point initialization 54 | n_samples, n_features = X.shape 55 | 56 | # compute X'y 57 | Xty = np.dot(np.transpose(X), y) 58 | 59 | # initialize a starting point 60 | w = np.zeros(n_features) 61 | 62 | # compute Xw = X*w 63 | Xw = np.dot(X, w) 64 | 65 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent 66 | # initialize step size gamma = 1 67 | gamma = 1 68 | 69 | # assign wp with w, and Xwp with Xw 70 | Xwp = Xw 71 | wwp = np.zeros(n_features) 72 | alphap = 0 73 | alpha = 1 74 | 75 | # indicates whether the gradient step only changes a little 76 | flag = False 77 | 78 | max_iter = 1000 79 | value_gamma = np.zeros(max_iter) 80 | obj = np.zeros(max_iter) 81 | for iter_step in range(max_iter): 82 | # step1: compute search point s based on wp and w (with beta) 83 | beta = (alphap-1)/alpha 84 | s = w + beta*wwp 85 | 86 | # step2: line search for gamma and compute the new approximation solution w 87 | Xs = Xw + beta*(Xw - Xwp) 88 | # compute X'* Xs 89 | XtXs = np.dot(np.transpose(X), Xs) 90 | 91 | # obtain the gradient g 92 | G = XtXs - Xty 93 | 94 | # copy w and Xw to wp and Xwp 95 | wp = w 96 | Xwp = Xw 97 | 98 | while True: 99 | # let s walk in a step in the antigradient of s to get v and then do the L1/L2-norm regularized projection 100 | v = s - G/gamma 101 | # tree overlapping group lasso projection 102 | n_nodes = int(idx.shape[1]) 103 | idx_tmp = idx.copy() 104 | idx_tmp[2, :] = idx[2, :] * z / gamma 105 | w = tree_lasso_projection(v, n_features, idx_tmp, n_nodes) 106 | # the difference between the new approximate solution w and the search point s 107 | v = w - s 108 | # compute Xw = X*w 109 | Xw = np.dot(X, w) 110 | Xv = Xw - Xs 111 | r_sum = np.inner(v, v) 112 | l_sum = np.inner(Xv, Xv) 113 | # determine weather the gradient step makes little improvement 114 | if r_sum <= 1e-20: 115 | flag = True 116 | break 117 | 118 | # the condition is ||Xv||_2^2 <= gamma * ||v||_2^2 119 | if l_sum <= r_sum*gamma: 120 | break 121 | else: 122 | gamma = max(2*gamma, l_sum/r_sum) 123 | value_gamma[iter_step] = gamma 124 | 125 | # step3: update alpha and alphap, and check weather converge 126 | alphap = alpha 127 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2 128 | 129 | wwp = w - wp 130 | Xwy = Xw -y 131 | # calculate the regularization part 132 | tree_norm_val = tree_norm(w, n_features, idx, n_nodes) 133 | 134 | # function value = loss + regularization 135 | obj[iter_step] = np.inner(Xwy, Xwy)/2 + z*tree_norm_val 136 | 137 | if verbose: 138 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step]) 139 | 140 | if flag is True: 141 | break 142 | 143 | # determine whether converge 144 | if iter_step >= 2 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3: 145 | break 146 | 147 | return w, obj, value_gamma 148 | 149 | 150 | 151 | 152 | -------------------------------------------------------------------------------- /src/skfeature/function/wrapper/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/wrapper/__init__.py -------------------------------------------------------------------------------- /src/skfeature/function/wrapper/decision_tree_backward.py: 
-------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.tree import DecisionTreeClassifier 3 | from sklearn.cross_validation import KFold 4 | from sklearn.metrics import accuracy_score 5 | 6 | 7 | def decision_tree_backward(X, y, n_selected_features): 8 | """ 9 | This function implements the backward feature selection algorithm based on decision tree 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | y: {numpy array}, shape (n_samples,) 16 | input class labels 17 | n_selected_features : {int} 18 | number of selected features 19 | 20 | Output 21 | ------ 22 | F: {numpy array}, shape (n_features, ) 23 | index of selected features 24 | """ 25 | 26 | n_samples, n_features = X.shape 27 | # using 10 fold cross validation 28 | cv = KFold(n_samples, n_folds=10, shuffle=True) 29 | # choose decision tree as the classifier 30 | clf = DecisionTreeClassifier() 31 | 32 | # selected feature set, initialized to contain all features 33 | F = range(n_features) 34 | count = n_features 35 | 36 | while count > n_selected_features: 37 | max_acc = 0 38 | for i in range(n_features): 39 | if i in F: 40 | F.remove(i) 41 | X_tmp = X[:, F] 42 | acc = 0 43 | for train, test in cv: 44 | clf.fit(X_tmp[train], y[train]) 45 | y_predict = clf.predict(X_tmp[test]) 46 | acc_tmp = accuracy_score(y[test], y_predict) 47 | acc += acc_tmp 48 | acc = float(acc)/10 49 | F.append(i) 50 | # record the feature which results in the largest accuracy 51 | if acc > max_acc: 52 | max_acc = acc 53 | idx = i 54 | # delete the feature which results in the largest accuracy 55 | F.remove(idx) 56 | count -= 1 57 | return np.array(F) 58 | 59 | 60 | -------------------------------------------------------------------------------- /src/skfeature/function/wrapper/decision_tree_forward.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.tree import DecisionTreeClassifier 3 | from sklearn.cross_validation import KFold 4 | from sklearn.metrics import accuracy_score 5 | 6 | 7 | def decision_tree_forward(X, y, n_selected_features): 8 | """ 9 | This function implements the forward feature selection algorithm based on decision tree 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | y: {numpy array}, shape (n_samples, ) 16 | input class labels 17 | n_selected_features: {int} 18 | number of selected features 19 | 20 | Output 21 | ------ 22 | F: {numpy array}, shape (n_features,) 23 | index of selected features 24 | """ 25 | 26 | n_samples, n_features = X.shape 27 | # using 10 fold cross validation 28 | cv = KFold(n_samples, n_folds=10, shuffle=True) 29 | # choose decision tree as the classifier 30 | clf = DecisionTreeClassifier() 31 | 32 | # selected feature set, initialized to be empty 33 | F = [] 34 | count = 0 35 | while count < n_selected_features: 36 | max_acc = 0 37 | for i in range(n_features): 38 | if i not in F: 39 | F.append(i) 40 | X_tmp = X[:, F] 41 | acc = 0 42 | for train, test in cv: 43 | clf.fit(X_tmp[train], y[train]) 44 | y_predict = clf.predict(X_tmp[test]) 45 | acc_tmp = accuracy_score(y[test], y_predict) 46 | acc += acc_tmp 47 | acc = float(acc)/10 48 | F.pop() 49 | # record the feature which results in the largest accuracy 50 | if acc > max_acc: 51 | max_acc = acc 52 | idx = i 53 | # add the feature which results in the largest accuracy 54 | F.append(idx) 55 | count += 1 56 | return np.array(F) 57 | 58 | 
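
A minimal, hypothetical usage sketch for the two decision-tree wrappers above; the random data is invented for illustration only and needs at least 10 samples, since both functions run 10-fold cross-validation internally:

import numpy as np
from skfeature.function.wrapper.decision_tree_forward import decision_tree_forward
from skfeature.function.wrapper.decision_tree_backward import decision_tree_backward

# hypothetical toy data: 30 samples, 8 features, binary labels
np.random.seed(0)
X = np.random.rand(30, 8)
y = np.random.randint(0, 2, 30)

F_fwd = decision_tree_forward(X, y, 3)   # indices of 3 features added greedily
F_bwd = decision_tree_backward(X, y, 3)  # indices of 3 features kept after greedy elimination
X_selected = X[:, F_fwd]
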
-------------------------------------------------------------------------------- /src/skfeature/function/wrapper/svm_backward.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.svm import SVC 3 | from sklearn.cross_validation import KFold 4 | from sklearn.metrics import accuracy_score 5 | 6 | 7 | def svm_backward(X, y, n_selected_features): 8 | """ 9 | This function implements the backward feature selection algorithm based on SVM 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | y: {numpy array}, shape (n_samples,) 16 | input class labels 17 | n_selected_features: {int} 18 | number of selected features 19 | 20 | Output 21 | ------ 22 | F: {numpy array}, shape (n_features, ) 23 | index of selected features 24 | """ 25 | 26 | n_samples, n_features = X.shape 27 | # using 10 fold cross validation 28 | cv = KFold(n_samples, n_folds=10, shuffle=True) 29 | # choose SVM as the classifier 30 | clf = SVC() 31 | 32 | # selected feature set, initialized to contain all features 33 | F = range(n_features) 34 | count = n_features 35 | 36 | while count > n_selected_features: 37 | max_acc = 0 38 | for i in range(n_features): 39 | if i in F: 40 | F.remove(i) 41 | X_tmp = X[:, F] 42 | acc = 0 43 | for train, test in cv: 44 | clf.fit(X_tmp[train], y[train]) 45 | y_predict = clf.predict(X_tmp[test]) 46 | acc_tmp = accuracy_score(y[test], y_predict) 47 | acc += acc_tmp 48 | acc = float(acc)/10 49 | F.append(i) 50 | # record the feature which results in the largest accuracy 51 | if acc > max_acc: 52 | max_acc = acc 53 | idx = i 54 | # delete the feature which results in the largest accuracy 55 | F.remove(idx) 56 | count -= 1 57 | return np.array(F) 58 | 59 | 60 | -------------------------------------------------------------------------------- /src/skfeature/function/wrapper/svm_forward.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.svm import SVC 3 | from sklearn.cross_validation import KFold 4 | from sklearn.metrics import accuracy_score 5 | 6 | 7 | def svm_forward(X, y, n_selected_features): 8 | """ 9 | This function implements the forward feature selection algorithm based on SVM 10 | 11 | Input 12 | ----- 13 | X: {numpy array}, shape (n_samples, n_features) 14 | input data 15 | y: {numpy array}, shape (n_samples,) 16 | input class labels 17 | n_selected_features: {int} 18 | number of selected features 19 | 20 | Output 21 | ------ 22 | F: {numpy array}, shape (n_features, ) 23 | index of selected features 24 | """ 25 | 26 | n_samples, n_features = X.shape 27 | # using 10 fold cross validation 28 | cv = KFold(n_samples, n_folds=10, shuffle=True) 29 | # choose SVM as the classifier 30 | clf = SVC() 31 | 32 | # selected feature set, initialized to be empty 33 | F = [] 34 | count = 0 35 | while count < n_selected_features: 36 | max_acc = 0 37 | for i in range(n_features): 38 | if i not in F: 39 | F.append(i) 40 | X_tmp = X[:, F] 41 | acc = 0 42 | for train, test in cv: 43 | clf.fit(X_tmp[train], y[train]) 44 | y_predict = clf.predict(X_tmp[test]) 45 | acc_tmp = accuracy_score(y[test], y_predict) 46 | acc += acc_tmp 47 | acc = float(acc)/10 48 | F.pop() 49 | # record the feature which results in the largest accuracy 50 | if acc > max_acc: 51 | max_acc = acc 52 | idx = i 53 | # add the feature which results in the largest accuracy 54 | F.append(idx) 55 | count += 1 56 | return np.array(F) 
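
The SVM wrappers above follow the same calling convention; a hypothetical sketch (again, random data for illustration only, with the SVC left at scikit-learn's defaults):

import numpy as np
from skfeature.function.wrapper.svm_forward import svm_forward
from skfeature.function.wrapper.svm_backward import svm_backward

# hypothetical toy data: 40 samples, 10 features, binary labels
np.random.seed(1)
X = np.random.rand(40, 10)
y = np.random.randint(0, 2, 40)

F_fwd = svm_forward(X, y, 4)    # greedy forward selection of 4 features
F_bwd = svm_backward(X, y, 4)   # greedy backward elimination down to 4 features
X_selected = X[:, F_fwd]
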
-------------------------------------------------------------------------------- /src/skfeature/utility/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/utility/__init__.py -------------------------------------------------------------------------------- /src/skfeature/utility/construct_W.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.sparse import * 3 | from sklearn.metrics.pairwise import pairwise_distances 4 | 5 | 6 | def construct_W(X, **kwargs): 7 | """ 8 | Construct the affinity matrix W through different ways 9 | 10 | Notes 11 | ----- 12 | if kwargs is null, use the default parameter settings; 13 | if kwargs is not null, construct the affinity matrix according to parameters in kwargs 14 | 15 | Input 16 | ----- 17 | X: {numpy array}, shape (n_samples, n_features) 18 | input data 19 | kwargs: {dictionary} 20 | parameters to construct different affinity matrix W: 21 | y: {numpy array}, shape (n_samples, 1) 22 | the true label information needed under the 'supervised' neighbor mode 23 | metric: {string} 24 | choices for different distance measures 25 | 'euclidean' - use euclidean distance 26 | 'cosine' - use cosine distance (default) 27 | neighbor_mode: {string} 28 | indicates how to construct the graph 29 | 'knn' - put an edge between two nodes if and only if they are among the 30 | k nearest neighbors of each other (default) 31 | 'supervised' - put an edge between two nodes if they belong to same class 32 | and they are among the k nearest neighbors of each other 33 | weight_mode: {string} 34 | indicates how to assign weights for each edge in the graph 35 | 'binary' - 0-1 weighting, every edge receives weight of 1 (default) 36 | 'heat_kernel' - if nodes i and j are connected, put weight W_ij = exp(-norm(x_i - x_j)/2t^2) 37 | this weight mode can only be used under 'euclidean' metric and you are required 38 | to provide the parameter t 39 | 'cosine' - if nodes i and j are connected, put weight cosine(x_i,x_j). 40 | this weight mode can only be used under 'cosine' metric 41 | k: {int} 42 | choices for the number of neighbors (default k = 5) 43 | t: {float} 44 | parameter for the 'heat_kernel' weight_mode 45 | fisher_score: {boolean} 46 | indicates whether to build the affinity matrix in a fisher score way, in which W_ij = 1/n_l if yi = yj = l; 47 | otherwise W_ij = 0 (default fisher_score = false) 48 | reliefF: {boolean} 49 | indicates whether to build the affinity matrix in a reliefF way, NH(x) and NM(x,y) denotes a set of 50 | k nearest points to x with the same class as x, and a different class (the class y), respectively. 
51 | W_ij = 1 if i = j; W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) (default reliefF = false) 52 | 53 | Output 54 | ------ 55 | W: {sparse matrix}, shape (n_samples, n_samples) 56 | output affinity matrix W 57 | """ 58 | 59 | # default metric is 'cosine' 60 | if 'metric' not in kwargs.keys(): 61 | kwargs['metric'] = 'cosine' 62 | 63 | # default neighbor mode is 'knn' and default neighbor size is 5 64 | if 'neighbor_mode' not in kwargs.keys(): 65 | kwargs['neighbor_mode'] = 'knn' 66 | if kwargs['neighbor_mode'] == 'knn' and 'k' not in kwargs.keys(): 67 | kwargs['k'] = 5 68 | if kwargs['neighbor_mode'] == 'supervised' and 'k' not in kwargs.keys(): 69 | kwargs['k'] = 5 70 | if kwargs['neighbor_mode'] == 'supervised' and 'y' not in kwargs.keys(): 71 | print ('Warning: label is required in the supervised neighborMode!!!') 72 | exit(0) 73 | 74 | # default weight mode is 'binary', default t in heat kernel mode is 1 75 | if 'weight_mode' not in kwargs.keys(): 76 | kwargs['weight_mode'] = 'binary' 77 | if kwargs['weight_mode'] == 'heat_kernel': 78 | if kwargs['metric'] != 'euclidean': 79 | kwargs['metric'] = 'euclidean' 80 | if 't' not in kwargs.keys(): 81 | kwargs['t'] = 1 82 | elif kwargs['weight_mode'] == 'cosine': 83 | if kwargs['metric'] != 'cosine': 84 | kwargs['metric'] = 'cosine' 85 | 86 | # default fisher_score and reliefF mode are 'false' 87 | if 'fisher_score' not in kwargs.keys(): 88 | kwargs['fisher_score'] = False 89 | if 'reliefF' not in kwargs.keys(): 90 | kwargs['reliefF'] = False 91 | 92 | n_samples, n_features = np.shape(X) 93 | 94 | # choose 'knn' neighbor mode 95 | if kwargs['neighbor_mode'] == 'knn': 96 | k = kwargs['k'] 97 | if kwargs['weight_mode'] == 'binary': 98 | if kwargs['metric'] == 'euclidean': 99 | # compute pairwise euclidean distances 100 | D = pairwise_distances(X) 101 | D **= 2 102 | # sort the distance matrix D in ascending order 103 | dump = np.sort(D, axis=1) 104 | idx = np.argsort(D, axis=1) 105 | # choose the k-nearest neighbors for each instance 106 | idx_new = idx[:, 0:k+1] 107 | G = np.zeros((n_samples*(k+1), 3)) 108 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) 109 | G[:, 1] = np.ravel(idx_new, order='F') 110 | G[:, 2] = 1 111 | # build the sparse affinity matrix W 112 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 113 | bigger = np.transpose(W) > W 114 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 115 | return W 116 | 117 | elif kwargs['metric'] == 'cosine': 118 | # normalize the data first 119 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5) 120 | for i in range(n_samples): 121 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i]) 122 | # compute pairwise cosine distances 123 | D_cosine = np.dot(X, np.transpose(X)) 124 | # sort the distance matrix D in descending order 125 | dump = np.sort(-D_cosine, axis=1) 126 | idx = np.argsort(-D_cosine, axis=1) 127 | idx_new = idx[:, 0:k+1] 128 | G = np.zeros((n_samples*(k+1), 3)) 129 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) 130 | G[:, 1] = np.ravel(idx_new, order='F') 131 | G[:, 2] = 1 132 | # build the sparse affinity matrix W 133 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 134 | bigger = np.transpose(W) > W 135 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 136 | return W 137 | 138 | elif kwargs['weight_mode'] == 'heat_kernel': 139 | t = kwargs['t'] 140 | # compute pairwise euclidean distances 141 | D = pairwise_distances(X) 142 | D **= 2 143 | # 
sort the distance matrix D in ascending order 144 | dump = np.sort(D, axis=1) 145 | idx = np.argsort(D, axis=1) 146 | idx_new = idx[:, 0:k+1] 147 | dump_new = dump[:, 0:k+1] 148 | # compute the pairwise heat kernel distances 149 | dump_heat_kernel = np.exp(-dump_new/(2*t*t)) 150 | G = np.zeros((n_samples*(k+1), 3)) 151 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) 152 | G[:, 1] = np.ravel(idx_new, order='F') 153 | G[:, 2] = np.ravel(dump_heat_kernel, order='F') 154 | # build the sparse affinity matrix W 155 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 156 | bigger = np.transpose(W) > W 157 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 158 | return W 159 | 160 | elif kwargs['weight_mode'] == 'cosine': 161 | # normalize the data first 162 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5) 163 | for i in range(n_samples): 164 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i]) 165 | # compute pairwise cosine distances 166 | D_cosine = np.dot(X, np.transpose(X)) 167 | # sort the distance matrix D in ascending order 168 | dump = np.sort(-D_cosine, axis=1) 169 | idx = np.argsort(-D_cosine, axis=1) 170 | idx_new = idx[:, 0:k+1] 171 | dump_new = -dump[:, 0:k+1] 172 | G = np.zeros((n_samples*(k+1), 3)) 173 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) 174 | G[:, 1] = np.ravel(idx_new, order='F') 175 | G[:, 2] = np.ravel(dump_new, order='F') 176 | # build the sparse affinity matrix W 177 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 178 | bigger = np.transpose(W) > W 179 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 180 | return W 181 | 182 | # choose supervised neighborMode 183 | elif kwargs['neighbor_mode'] == 'supervised': 184 | k = kwargs['k'] 185 | # get true labels and the number of classes 186 | y = kwargs['y'] 187 | label = np.unique(y) 188 | n_classes = np.unique(y).size 189 | # construct the weight matrix W in a fisherScore way, W_ij = 1/n_l if yi = yj = l, otherwise W_ij = 0 190 | if kwargs['fisher_score'] is True: 191 | W = lil_matrix((n_samples, n_samples)) 192 | for i in range(n_classes): 193 | class_idx = (y == label[i]) 194 | class_idx_all = (class_idx[:, np.newaxis] & class_idx[np.newaxis, :]) 195 | W[class_idx_all] = 1.0/np.sum(np.sum(class_idx)) 196 | return W 197 | 198 | # construct the weight matrix W in a reliefF way, NH(x) and NM(x,y) denotes a set of k nearest 199 | # points to x with the same class as x, a different class (the class y), respectively. 
W_ij = 1 if i = j; 200 | # W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) 201 | if kwargs['reliefF'] is True: 202 | # when xj in NH(xi) 203 | G = np.zeros((n_samples*(k+1), 3)) 204 | id_now = 0 205 | for i in range(n_classes): 206 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0] 207 | D = pairwise_distances(X[class_idx, :]) 208 | D **= 2 209 | idx = np.argsort(D, axis=1) 210 | idx_new = idx[:, 0:k+1] 211 | n_smp_class = (class_idx[idx_new[:]]).size 212 | if len(class_idx) <= k: 213 | k = len(class_idx) - 1 214 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1) 215 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F') 216 | G[id_now:n_smp_class+id_now, 2] = 1.0/k 217 | id_now += n_smp_class 218 | W1 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 219 | # when i = j, W_ij = 1 220 | for i in range(n_samples): 221 | W1[i, i] = 1 222 | # when x_j in NM(x_i, y) 223 | G = np.zeros((n_samples*k*(n_classes - 1), 3)) 224 | id_now = 0 225 | for i in range(n_classes): 226 | class_idx1 = np.column_stack(np.where(y == label[i]))[:, 0] 227 | X1 = X[class_idx1, :] 228 | for j in range(n_classes): 229 | if label[j] != label[i]: 230 | class_idx2 = np.column_stack(np.where(y == label[j]))[:, 0] 231 | X2 = X[class_idx2, :] 232 | D = pairwise_distances(X1, X2) 233 | idx = np.argsort(D, axis=1) 234 | idx_new = idx[:, 0:k] 235 | n_smp_class = len(class_idx1)*k 236 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx1, (k, 1)).reshape(-1) 237 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx2[idx_new[:]], order='F') 238 | G[id_now:n_smp_class+id_now, 2] = -1.0/((n_classes-1)*k) 239 | id_now += n_smp_class 240 | W2 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 241 | bigger = np.transpose(W2) > W2 242 | W2 = W2 - W2.multiply(bigger) + np.transpose(W2).multiply(bigger) 243 | W = W1 + W2 244 | return W 245 | 246 | if kwargs['weight_mode'] == 'binary': 247 | if kwargs['metric'] == 'euclidean': 248 | G = np.zeros((n_samples*(k+1), 3)) 249 | id_now = 0 250 | for i in range(n_classes): 251 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0] 252 | # compute pairwise euclidean distances for instances in class i 253 | D = pairwise_distances(X[class_idx, :]) 254 | D **= 2 255 | # sort the distance matrix D in ascending order for instances in class i 256 | idx = np.argsort(D, axis=1) 257 | idx_new = idx[:, 0:k+1] 258 | n_smp_class = len(class_idx)*(k+1) 259 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1) 260 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F') 261 | G[id_now:n_smp_class+id_now, 2] = 1 262 | id_now += n_smp_class 263 | # build the sparse affinity matrix W 264 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 265 | bigger = np.transpose(W) > W 266 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 267 | return W 268 | 269 | if kwargs['metric'] == 'cosine': 270 | # normalize the data first 271 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5) 272 | for i in range(n_samples): 273 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i]) 274 | G = np.zeros((n_samples*(k+1), 3)) 275 | id_now = 0 276 | for i in range(n_classes): 277 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0] 278 | # compute pairwise cosine distances for instances in class i 279 | D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :])) 280 | # sort the distance matrix D in descending 
order for instances in class i 281 | idx = np.argsort(-D_cosine, axis=1) 282 | idx_new = idx[:, 0:k+1] 283 | n_smp_class = len(class_idx)*(k+1) 284 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1) 285 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F') 286 | G[id_now:n_smp_class+id_now, 2] = 1 287 | id_now += n_smp_class 288 | # build the sparse affinity matrix W 289 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 290 | bigger = np.transpose(W) > W 291 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 292 | return W 293 | 294 | elif kwargs['weight_mode'] == 'heat_kernel': 295 | G = np.zeros((n_samples*(k+1), 3)) 296 | id_now = 0 297 | for i in range(n_classes): 298 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0] 299 | # compute pairwise cosine distances for instances in class i 300 | D = pairwise_distances(X[class_idx, :]) 301 | D **= 2 302 | # sort the distance matrix D in ascending order for instances in class i 303 | dump = np.sort(D, axis=1) 304 | idx = np.argsort(D, axis=1) 305 | idx_new = idx[:, 0:k+1] 306 | dump_new = dump[:, 0:k+1] 307 | t = kwargs['t'] 308 | # compute pairwise heat kernel distances for instances in class i 309 | dump_heat_kernel = np.exp(-dump_new/(2*t*t)) 310 | n_smp_class = len(class_idx)*(k+1) 311 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1) 312 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F') 313 | G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_heat_kernel, order='F') 314 | id_now += n_smp_class 315 | # build the sparse affinity matrix W 316 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 317 | bigger = np.transpose(W) > W 318 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 319 | return W 320 | 321 | elif kwargs['weight_mode'] == 'cosine': 322 | # normalize the data first 323 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5) 324 | for i in range(n_samples): 325 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i]) 326 | G = np.zeros((n_samples*(k+1), 3)) 327 | id_now = 0 328 | for i in range(n_classes): 329 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0] 330 | # compute pairwise cosine distances for instances in class i 331 | D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :])) 332 | # sort the distance matrix D in descending order for instances in class i 333 | dump = np.sort(-D_cosine, axis=1) 334 | idx = np.argsort(-D_cosine, axis=1) 335 | idx_new = idx[:, 0:k+1] 336 | dump_new = -dump[:, 0:k+1] 337 | n_smp_class = len(class_idx)*(k+1) 338 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1) 339 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F') 340 | G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_new, order='F') 341 | id_now += n_smp_class 342 | # build the sparse affinity matrix W 343 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples)) 344 | bigger = np.transpose(W) > W 345 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger) 346 | return W -------------------------------------------------------------------------------- /src/skfeature/utility/data_discretization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sklearn.preprocessing 3 | 4 | 5 | def data_discretization(X, n_bins): 6 | """ 7 | This function implements the data discretization function to discrete 
data into n_bins 8 | 9 | Input 10 | ----- 11 | X: {numpy array}, shape (n_samples, n_features) 12 | input data 13 | n_bins: {int} 14 | number of bins to be discretized 15 | 16 | Output 17 | ------ 18 | X_discretized: {numpy array}, shape (n_samples, n_features) 19 | output discretized data, where features are digitized to n_bins 20 | """ 21 | 22 | # normalize each feature 23 | min_max_scaler = sklearn.preprocessing.MinMaxScaler() 24 | X_normalized = min_max_scaler.fit_transform(X) 25 | 26 | # discretize X 27 | n_samples, n_features = X.shape 28 | X_discretized = np.zeros((n_samples, n_features)) 29 | bins = np.linspace(0, 1, n_bins) 30 | for i in range(n_features): 31 | X_discretized[:, i] = np.digitize(X_normalized[:, i], bins) 32 | 33 | return X_discretized 34 | -------------------------------------------------------------------------------- /src/skfeature/utility/entropy_estimators.py: -------------------------------------------------------------------------------- 1 | # Written by Greg Ver Steeg (http://www.isi.edu/~gregv/npeet.html) 2 | 3 | import scipy.spatial as ss 4 | from scipy.special import digamma 5 | from math import log 6 | import numpy.random as nr 7 | import numpy as np 8 | import random 9 | 10 | 11 | # continuous estimators 12 | 13 | def entropy(x, k=3, base=2): 14 | """ 15 | The classic K-L k-nearest neighbor continuous entropy estimator x should be a list of vectors, 16 | e.g. x = [[1.3],[3.7],[5.1],[2.4]] if x is a one-dimensional scalar and we have four samples 17 | """ 18 | 19 | assert k <= len(x)-1, "Set k smaller than num. samples - 1" 20 | d = len(x[0]) 21 | N = len(x) 22 | intens = 1e-10 # small noise to break degeneracy, see doc. 23 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x] 24 | tree = ss.cKDTree(x) 25 | nn = [tree.query(point, k+1, p=float('inf'))[0][k] for point in x] 26 | const = digamma(N)-digamma(k) + d*log(2) 27 | return (const + d*np.mean(map(log, nn)))/log(base) 28 | 29 | 30 | def mi(x, y, k=3, base=2): 31 | """ 32 | Mutual information of x and y; x, y should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]] 33 | if x is a one-dimensional scalar and we have four samples 34 | """ 35 | 36 | assert len(x) == len(y), "Lists should have same length" 37 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1" 38 | intens = 1e-10 # small noise to break degeneracy, see doc. 39 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x] 40 | y = [list(p + intens * nr.rand(len(y[0]))) for p in y] 41 | points = zip2(x, y) 42 | # Find nearest neighbors in joint space, p=inf means max-norm 43 | tree = ss.cKDTree(points) 44 | dvec = [tree.query(point, k+1, p=float('inf'))[0][k] for point in points] 45 | a, b, c, d = avgdigamma(x, dvec), avgdigamma(y, dvec), digamma(k), digamma(len(x)) 46 | return (-a-b+c+d)/log(base) 47 | 48 | 49 | def cmi(x, y, z, k=3, base=2): 50 | """ 51 | Mutual information of x and y, conditioned on z; x, y, z should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]] 52 | if x is a one-dimensional scalar and we have four samples 53 | """ 54 | 55 | assert len(x) == len(y), "Lists should have same length" 56 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1" 57 | intens = 1e-10 # small noise to break degeneracy, see doc. 
58 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x] 59 | y = [list(p + intens * nr.rand(len(y[0]))) for p in y] 60 | z = [list(p + intens * nr.rand(len(z[0]))) for p in z] 61 | points = zip2(x, y, z) 62 | # Find nearest neighbors in joint space, p=inf means max-norm 63 | tree = ss.cKDTree(points) 64 | dvec = [tree.query(point, k+1, p=float('inf'))[0][k] for point in points] 65 | a, b, c, d = avgdigamma(zip2(x, z), dvec), avgdigamma(zip2(y, z), dvec), avgdigamma(z, dvec), digamma(k) 66 | return (-a-b+c+d)/log(base) 67 | 68 | 69 | def kldiv(x, xp, k=3, base=2): 70 | """ 71 | KL Divergence between p and q for x~p(x), xp~q(x); x, xp should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]] 72 | if x is a one-dimensional scalar and we have four samples 73 | """ 74 | 75 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1" 76 | assert k <= len(xp) - 1, "Set k smaller than num. samples - 1" 77 | assert len(x[0]) == len(xp[0]), "Two distributions must have same dim." 78 | d = len(x[0]) 79 | n = len(x) 80 | m = len(xp) 81 | const = log(m) - log(n-1) 82 | tree = ss.cKDTree(x) 83 | treep = ss.cKDTree(xp) 84 | nn = [tree.query(point, k+1, p=float('inf'))[0][k] for point in x] 85 | nnp = [treep.query(point, k, p=float('inf'))[0][k-1] for point in x] 86 | return (const + d*np.mean(map(log, nnp))-d*np.mean(map(log, nn)))/log(base) 87 | 88 | 89 | # Discrete estimators 90 | def entropyd(sx, base=2): 91 | """ 92 | Discrete entropy estimator given a list of samples which can be any hashable object 93 | """ 94 | 95 | return entropyfromprobs(hist(sx), base=base) 96 | 97 | 98 | def midd(x, y): 99 | """ 100 | Discrete mutual information estimator given a list of samples which can be any hashable object 101 | """ 102 | 103 | return -entropyd(zip(x, y))+entropyd(x)+entropyd(y) 104 | 105 | 106 | def cmidd(x, y, z): 107 | """ 108 | Discrete mutual information estimator given a list of samples which can be any hashable object 109 | """ 110 | 111 | return entropyd(zip(y, z))+entropyd(zip(x, z))-entropyd(zip(x, y, z))-entropyd(z) 112 | 113 | 114 | def hist(sx): 115 | # Histogram from list of samples 116 | d = dict() 117 | for s in sx: 118 | d[s] = d.get(s, 0) + 1 119 | return map(lambda z: float(z)/len(sx), d.values()) 120 | 121 | 122 | def entropyfromprobs(probs, base=2): 123 | # Turn a normalized list of probabilities of discrete outcomes into entropy (base 2) 124 | return -sum(map(elog, probs))/log(base) 125 | 126 | 127 | def elog(x): 128 | # for entropy, 0 log 0 = 0. but we get an error for putting log 0 129 | if x <= 0. or x >= 1.: 130 | return 0 131 | else: 132 | return x*log(x) 133 | 134 | 135 | # Mixed estimators 136 | def micd(x, y, k=3, base=2, warning=True): 137 | """ If x is continuous and y is discrete, compute mutual information 138 | """ 139 | 140 | overallentropy = entropy(x, k, base) 141 | n = len(y) 142 | word_dict = dict() 143 | for sample in y: 144 | word_dict[sample] = word_dict.get(sample, 0) + 1./n 145 | yvals = list(set(word_dict.keys())) 146 | 147 | mi = overallentropy 148 | for yval in yvals: 149 | xgiveny = [x[i] for i in range(n) if y[i] == yval] 150 | if k <= len(xgiveny) - 1: 151 | mi -= word_dict[yval]*entropy(xgiveny, k, base) 152 | else: 153 | if warning: 154 | print "Warning, after conditioning, on y=", yval, " insufficient data. Assuming maximal entropy in this case." 
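                # too few samples with this label for the k-NN estimate: fall back to the
                # unconditioned entropy H(x), so this value of y contributes no information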
155 | mi -= word_dict[yval]*overallentropy 156 | return mi # units already applied 157 | 158 | 159 | # Utility functions 160 | def vectorize(scalarlist): 161 | """ 162 | Turn a list of scalars into a list of one-d vectors 163 | """ 164 | 165 | return [(x,) for x in scalarlist] 166 | 167 | 168 | def shuffle_test(measure, x, y, z=False, ns=200, ci=0.95, **kwargs): 169 | """ 170 | Shuffle test 171 | Repeatedly shuffle the x-values and then estimate measure(x,y,[z]). 172 | Returns the mean and conf. interval ('ci=0.95' default) over 'ns' runs, 'measure' could me mi,cmi, 173 | e.g. Keyword arguments can be passed. Mutual information and CMI should have a mean near zero. 174 | """ 175 | 176 | xp = x[:] # A copy that we can shuffle 177 | outputs = [] 178 | for i in range(ns): 179 | random.shuffle(xp) 180 | if z: 181 | outputs.append(measure(xp, y, z, **kwargs)) 182 | else: 183 | outputs.append(measure(xp, y, **kwargs)) 184 | outputs.sort() 185 | return np.mean(outputs), (outputs[int((1.-ci)/2*ns)], outputs[int((1.+ci)/2*ns)]) 186 | 187 | 188 | # Internal functions 189 | def avgdigamma(points, dvec): 190 | # This part finds number of neighbors in some radius in the marginal space 191 | # returns expectation value of 192 | N = len(points) 193 | tree = ss.cKDTree(points) 194 | avg = 0. 195 | for i in range(N): 196 | dist = dvec[i] 197 | # subtlety, we don't include the boundary point, 198 | # but we are implicitly adding 1 to kraskov def bc center point is included 199 | num_points = len(tree.query_ball_point(points[i], dist-1e-15, p=float('inf'))) 200 | avg += digamma(num_points)/N 201 | return avg 202 | 203 | 204 | def zip2(*args): 205 | # zip2(x,y) takes the lists of vectors and makes it a list of vectors in a joint space 206 | # E.g. zip2([[1],[2],[3]],[[4],[5],[6]]) = [[1,4],[2,5],[3,6]] 207 | return [sum(sublist, []) for sublist in zip(*args)] 208 | -------------------------------------------------------------------------------- /src/skfeature/utility/mutual_information.py: -------------------------------------------------------------------------------- 1 | import entropy_estimators as ee 2 | 3 | 4 | def information_gain(f1, f2): 5 | """ 6 | This function calculates the information gain, where ig(f1,f2) = H(f1) - H(f1|f2) 7 | 8 | Input 9 | ----- 10 | f1: {numpy array}, shape (n_samples,) 11 | f2: {numpy array}, shape (n_samples,) 12 | 13 | Output 14 | ------ 15 | ig: {float} 16 | """ 17 | 18 | ig = ee.entropyd(f1) - conditional_entropy(f1, f2) 19 | return ig 20 | 21 | 22 | def conditional_entropy(f1, f2): 23 | """ 24 | This function calculates the conditional entropy, where ce = H(f1) - I(f1;f2) 25 | 26 | Input 27 | ----- 28 | f1: {numpy array}, shape (n_samples,) 29 | f2: {numpy array}, shape (n_samples,) 30 | 31 | Output 32 | ------ 33 | ce: {float} 34 | ce is conditional entropy of f1 and f2 35 | """ 36 | 37 | ce = ee.entropyd(f1) - ee.midd(f1, f2) 38 | return ce 39 | 40 | 41 | def su_calculation(f1, f2): 42 | """ 43 | This function calculates the symmetrical uncertainty, where su(f1,f2) = 2*IG(f1,f2)/(H(f1)+H(f2)) 44 | 45 | Input 46 | ----- 47 | f1: {numpy array}, shape (n_samples,) 48 | f2: {numpy array}, shape (n_samples,) 49 | 50 | Output 51 | ------ 52 | su: {float} 53 | su is the symmetrical uncertainty of f1 and f2 54 | 55 | """ 56 | 57 | # calculate information gain of f1 and f2, t1 = ig(f1,f2) 58 | t1 = information_gain(f1, f2) 59 | # calculate entropy of f1, t2 = H(f1) 60 | t2 = ee.entropyd(f1) 61 | # calculate entropy of f2, t3 = H(f2) 62 | t3 = ee.entropyd(f2) 63 | # su(f1,f2) = 
2*t1/(t2+t3) 64 | su = 2.0*t1/(t2+t3) 65 | 66 | return su -------------------------------------------------------------------------------- /src/skfeature/utility/sparse_learning.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from numpy import linalg as LA 3 | 4 | 5 | def feature_ranking(W): 6 | """ 7 | This function ranks features according to the feature weights matrix W 8 | 9 | Input: 10 | ----- 11 | W: {numpy array}, shape (n_features, n_classes) 12 | feature weights matrix 13 | 14 | Output: 15 | ------ 16 | idx: {numpy array}, shape {n_features,} 17 | feature index ranked in descending order by feature importance 18 | """ 19 | T = (W*W).sum(1) 20 | idx = np.argsort(T, 0) 21 | return idx[::-1] 22 | 23 | 24 | def generate_diagonal_matrix(U): 25 | """ 26 | This function generates a diagonal matrix D from an input matrix U as D_ii = 0.5 / ||U[i,:]|| 27 | 28 | Input: 29 | ----- 30 | U: {numpy array}, shape (n_samples, n_features) 31 | 32 | Output: 33 | ------ 34 | D: {numpy array}, shape (n_samples, n_samples) 35 | """ 36 | temp = np.sqrt(np.multiply(U, U).sum(1)) 37 | temp[temp < 1e-16] = 1e-16 38 | temp = 0.5 / temp 39 | D = np.diag(temp) 40 | return D 41 | 42 | 43 | def calculate_l21_norm(X): 44 | """ 45 | This function calculates the l21 norm of a matrix X, i.e., \sum ||X[i,:]||_2 46 | 47 | Input: 48 | ----- 49 | X: {numpy array}, shape (n_samples, n_features) 50 | 51 | Output: 52 | ------ 53 | l21_norm: {float} 54 | """ 55 | return (np.sqrt(np.multiply(X, X).sum(1))).sum() 56 | 57 | 58 | def construct_label_matrix(label): 59 | """ 60 | This function converts a 1d numpy array to a 2d array, for each instance, the class label is 1 or 0 61 | 62 | Input: 63 | ----- 64 | label: {numpy array}, shape(n_samples,) 65 | 66 | Output: 67 | ------ 68 | label_matrix: {numpy array}, shape(n_samples, n_classes) 69 | """ 70 | 71 | n_samples = label.shape[0] 72 | unique_label = np.unique(label) 73 | n_classes = unique_label.shape[0] 74 | label_matrix = np.zeros((n_samples, n_classes)) 75 | for i in range(n_classes): 76 | label_matrix[label == unique_label[i], i] = 1 77 | 78 | return label_matrix.astype(int) 79 | 80 | 81 | def construct_label_matrix_pan(label): 82 | """ 83 | This function converts a 1d numpy array to a 2d array, for each instance, the class label is 1 or -1 84 | 85 | Input: 86 | ----- 87 | label: {numpy array}, shape(n_samples,) 88 | 89 | Output: 90 | ------ 91 | label_matrix: {numpy array}, shape(n_samples, n_classes) 92 | """ 93 | n_samples = label.shape[0] 94 | unique_label = np.unique(label) 95 | n_classes = unique_label.shape[0] 96 | label_matrix = np.zeros((n_samples, n_classes)) 97 | for i in range(n_classes): 98 | label_matrix[label == unique_label[i], i] = 1 99 | label_matrix[label_matrix == 0] = -1 100 | 101 | return label_matrix.astype(int) 102 | 103 | 104 | def euclidean_projection(V, n_features, n_classes, z, gamma): 105 | """ 106 | L2 Norm regularized euclidean projection min_W 1/2 ||W- V||_2^2 + z * ||W||_2 107 | """ 108 | W_projection = np.zeros((n_features, n_classes)) 109 | for i in range(n_features): 110 | if LA.norm(V[i, :]) > z/gamma: 111 | W_projection[i, :] = (1-z/(gamma*LA.norm(V[i, :])))*V[i, :] 112 | else: 113 | W_projection[i, :] = np.zeros(n_classes) 114 | return W_projection 115 | 116 | 117 | def tree_lasso_projection(v, n_features, idx, n_nodes): 118 | """ 119 | This functions solves the following optimization problem min_w 1/2 ||w-v||_2^2 + \sum z_i||w_{G_{i}}|| 120 | where w and v are of 
dimensions of n_features; z_i >=0, and G_{i} follows the tree structure 121 | """ 122 | # test whether the first node is special 123 | if idx[0, 0] == -1 and idx[1, 0] == -1: 124 | w_projection = np.zeros(n_features) 125 | z = idx[2, 0] 126 | for j in range(n_features): 127 | if v[j] > z: 128 | w_projection[j] = v[j] - z 129 | else: 130 | if v[j] < -z: 131 | w_projection[j] = v[j] + z 132 | else: 133 | w_projection[j] = 0 134 | i = 1 135 | 136 | else: 137 | w = v.copy() 138 | i = 0 139 | 140 | # sequentially process each node 141 | while i < n_nodes: 142 | # compute the L2 norm of this group 143 | two_norm = 0 144 | start_idx = int(idx[0, i] - 1) 145 | end_idx = int(idx[1, i]) 146 | for j in range(start_idx, end_idx): 147 | two_norm += w_projection[j] * w_projection[j] 148 | two_norm = np.sqrt(two_norm) 149 | z = idx[2, i] 150 | if two_norm > z: 151 | ratio = (two_norm - z) / two_norm 152 | # shrinkage this group by ratio 153 | for j in range(start_idx, end_idx): 154 | w_projection[j] *= ratio 155 | else: 156 | for j in range(start_idx, end_idx): 157 | w_projection[j] = 0 158 | i += 1 159 | return w_projection 160 | 161 | 162 | def tree_norm(w, n_features, idx, n_nodes): 163 | """ 164 | This function computes \sum z_i||w_{G_{i}}|| 165 | """ 166 | obj = 0 167 | # test whether the first node is special 168 | if idx[0, 0] == -1 and idx[1, 0] == -1: 169 | z = idx[2, 0] 170 | for j in range(n_features): 171 | obj += np.abs(w[j]) 172 | obj *= z 173 | i = 1 174 | else: 175 | i = 0 176 | 177 | # sequentially process each node 178 | while i < n_nodes: 179 | two_norm = 0 180 | start_idx = int(idx[0, i] - 1) 181 | end_idx = int(idx[1, i]) 182 | for j in range(start_idx, end_idx): 183 | two_norm += w[j] * w[j] 184 | two_norm = np.sqrt(two_norm) 185 | z = idx[2, i] 186 | obj += z*two_norm 187 | i += 1 188 | return obj 189 | 190 | -------------------------------------------------------------------------------- /src/skfeature/utility/unsupervised_evaluation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sklearn.utils.linear_assignment_ as la 3 | from sklearn.metrics import accuracy_score 4 | from sklearn.metrics.cluster import normalized_mutual_info_score 5 | from sklearn.cluster import KMeans 6 | 7 | 8 | def best_map(l1, l2): 9 | """ 10 | Permute labels of l2 to match l1 as much as possible 11 | """ 12 | if len(l1) != len(l2): 13 | print "L1.shape must == L2.shape" 14 | exit(0) 15 | 16 | label1 = np.unique(l1) 17 | n_class1 = len(label1) 18 | 19 | label2 = np.unique(l2) 20 | n_class2 = len(label2) 21 | 22 | n_class = max(n_class1, n_class2) 23 | G = np.zeros((n_class, n_class)) 24 | 25 | for i in range(0, n_class1): 26 | for j in range(0, n_class2): 27 | ss = l1 == label1[i] 28 | tt = l2 == label2[j] 29 | G[i, j] = np.count_nonzero(ss & tt) 30 | 31 | A = la.linear_assignment(-G) 32 | 33 | new_l2 = np.zeros(l2.shape) 34 | for i in range(0, n_class2): 35 | new_l2[l2 == label2[A[i][1]]] = label1[A[i][0]] 36 | return new_l2.astype(int) 37 | 38 | 39 | def evaluation(X_selected, n_clusters, y): 40 | """ 41 | This function calculates ARI, ACC and NMI of clustering results 42 | 43 | Input 44 | ----- 45 | X_selected: {numpy array}, shape (n_samples, n_selected_features} 46 | input data on the selected features 47 | n_clusters: {int} 48 | number of clusters 49 | y: {numpy array}, shape (n_samples,) 50 | true labels 51 | 52 | Output 53 | ------ 54 | nmi: {float} 55 | Normalized Mutual Information 56 | acc: {float} 57 | Accuracy 58 | """ 59 | 
k_means = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10, max_iter=300, 60 | tol=0.0001, precompute_distances=True, verbose=0, 61 | random_state=None, copy_x=True, n_jobs=1) 62 | 63 | k_means.fit(X_selected) 64 | y_predict = k_means.labels_ 65 | 66 | # calculate NMI 67 | nmi = normalized_mutual_info_score(y, y_predict) 68 | 69 | # calculate ACC 70 | y_permuted_predict = best_map(y, y_predict) 71 | acc = accuracy_score(y, y_permuted_predict) 72 | 73 | return nmi, acc --------------------------------------------------------------------------------
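
For orientation, the sketch below shows one way the utility helpers dumped above could be exercised together: build a supervised affinity matrix with `construct_W`, discretize the data before calling the discrete information-theoretic helpers, and score an arbitrary feature subset with the unsupervised `evaluation` routine. It assumes the legacy environment the repository targets (Python 2.7 with an older scikit-learn release that still ships `sklearn.utils.linear_assignment_`), that `src/` is on `PYTHONPATH`, and it uses synthetic data plus a placeholder feature subset instead of the `.mat` datasets and a real selector; treat it as an illustration, not as part of the repository.

```python
import numpy as np

from skfeature.utility.construct_W import construct_W
from skfeature.utility.data_discretization import data_discretization
from skfeature.utility.mutual_information import su_calculation
from skfeature.utility import unsupervised_evaluation

# synthetic stand-in for one of the .mat datasets under dataset/BIOLOGICAL
np.random.seed(0)
X = np.random.rand(60, 200)            # 60 samples, 200 features
y = np.repeat([1, 2, 3], 20)           # three balanced classes

# supervised affinity matrix: per-class 5-NN graph with heat-kernel weights
W = construct_W(X, neighbor_mode='supervised', weight_mode='heat_kernel',
                k=5, t=1, y=y)

# discretize into 10 bins before using the discrete entropy/MI estimators
X_disc = data_discretization(X, 10)

# symmetrical uncertainty between the first two discretized features
su = su_calculation(X_disc[:, 0], X_disc[:, 1])

# score an arbitrary 20-feature subset by clustering quality (NMI and ACC);
# in a real run the subset would come from one of the selectors under
# src/skfeature/function or from the class-specific selection step
subset = np.arange(20)
nmi, acc = unsupervised_evaluation.evaluation(X[:, subset], n_clusters=3, y=y)

print('SU(f0, f1) = %.3f  NMI = %.3f  ACC = %.3f' % (su, nmi, acc))
```

On the actual benchmarks, `X` and `y` would be loaded from the corresponding `.mat` file (presumably through `Loader.py`/`Dataset.py`), and the feature subset would be produced by the selection pipeline rather than the placeholder index range used above.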