├── LICENSE
├── README.md
├── dataset
│   └── BIOLOGICAL
│       ├── ALLAML
│       │   └── ALLAML.mat
│       ├── COLON
│       │   └── COLON.mat
│       ├── LUNG_DISCRETE
│       │   └── LUNG_DISCRETE.mat
│       ├── LYMPHOMA
│       │   └── LYMPHOMA.mat
│       └── WISCONSIN
│           └── WISCONSIN.mat
├── img
│   ├── GS-CSFS_vs_OM-CSFS.png
│   └── TFS_vs_OM.png
└── src
    ├── CSFS_SMBA.py
    ├── Classifier.py
    ├── Dataset.py
    ├── FeatureSelector.py
    ├── Loader.py
    ├── SMBA.py
    ├── __init__.py
    ├── grid_search.py
    └── skfeature
        ├── __init__.py
        ├── function
        │   ├── __init__.py
        │   ├── information_theoretical_based
        │   │   ├── CIFE.py
        │   │   ├── CMIM.py
        │   │   ├── DISR.py
        │   │   ├── FCBF.py
        │   │   ├── ICAP.py
        │   │   ├── JMI.py
        │   │   ├── LCSI.py
        │   │   ├── MIFS.py
        │   │   ├── MIM.py
        │   │   ├── MRMR.py
        │   │   └── __init__.py
        │   ├── similarity_based
        │   │   ├── SPEC.py
        │   │   ├── __init__.py
        │   │   ├── fisher_score.py
        │   │   ├── lap_score.py
        │   │   ├── reliefF.py
        │   │   └── trace_ratio.py
        │   ├── sparse_learning_based
        │   │   ├── MCFS.py
        │   │   ├── NDFS.py
        │   │   ├── RFS.py
        │   │   ├── UDFS.py
        │   │   ├── __init__.py
        │   │   ├── ll_l21.py
        │   │   └── ls_l21.py
        │   ├── statistical_based
        │   │   ├── CFS.py
        │   │   ├── __init__.py
        │   │   ├── chi_square.py
        │   │   ├── f_score.py
        │   │   ├── gini_index.py
        │   │   ├── low_variance.py
        │   │   └── t_score.py
        │   ├── streaming
        │   │   ├── __init__.py
        │   │   └── alpha_investing.py
        │   ├── structure
        │   │   ├── __init__.py
        │   │   ├── graph_fs.py
        │   │   ├── group_fs.py
        │   │   └── tree_fs.py
        │   └── wrapper
        │       ├── __init__.py
        │       ├── decision_tree_backward.py
        │       ├── decision_tree_forward.py
        │       ├── svm_backward.py
        │       └── svm_forward.py
        └── utility
            ├── __init__.py
            ├── construct_W.py
            ├── data_discretization.py
            ├── entropy_estimators.py
            ├── mutual_information.py
            ├── sparse_learning.py
            └── unsupervised_evaluation.py
/README.md:
--------------------------------------------------------------------------------
1 | # Background
2 |
3 | Feature selection (FS) plays a key role in several scientific fields, and in particular in computational biology, where it makes it possible to work with models that use fewer variables. Such models are easier to explain and may speed up experimental validation by providing valuable insight into the importance of the selected features and their role. We propose a novel two-step procedure for FS. First, a sparse-coding-based learning technique is used to find the best subset of features for each class of the training set. In doing so, it is assumed that a class can be represented by a subset of features, called **_representatives_**, such that each sample of that class can be described as a linear combination of them. Second, the discovered feature subsets are fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in a classification task. To this end, an ensemble of classifiers is built by training one classifier per class on its own feature subset (the one discovered in the previous step), and a suitable decision rule is adopted to combine the ensemble responses (see the sketch below).
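The two-step procedure can be sketched as follows. This is a schematic, simplified example: `select_representatives` and `make_classifier` are illustrative placeholders, not part of this repository's API; the actual implementation (with SMOTE re-balancing, k-fold cross validation and random tie-breaking in the decision rule) lives in `src/CSFS_SMBA.py`.

```python
# Schematic sketch of the two-step SMBA-CSFS pipeline (illustrative only).
import numpy as np

def csfs_ensemble_predict(X_train, y_train, X_test, n_rep,
                          select_representatives, make_classifier):
    classes = np.unique(y_train)
    subset, clf = {}, {}

    # Step 1: for each class, find its representative features via sparse coding.
    for c in classes:
        X_c = X_train[y_train == c]
        subset[c] = select_representatives(X_c)[:n_rep]

    # Step 2: train one classifier per class on that class's own feature subset.
    for c in classes:
        clf[c] = make_classifier().fit(X_train[:, subset[c]], y_train)

    # Decision rule: a classifier "hits" when it predicts its own class; keep a
    # unique hit, otherwise fall back to majority voting among all predictions.
    preds = []
    for x in X_test:
        votes = [clf[c].predict(x[subset[c]].reshape(1, -1))[0] for c in classes]
        hits = [c for c, v in zip(classes, votes) if v == c]
        if len(hits) == 1:
            preds.append(hits[0])
        else:
            values, counts = np.unique(votes, return_counts=True)
            preds.append(values[np.argmax(counts)])
    return np.array(preds)
```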
4 |
5 | # A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection (SMBA-CSFS)
6 |
7 | Feature selection has been widely used for eliminating redundant or irrelevant features, and it can be done in two ways: Traditional Feature Selection (TFS), which selects features for all classes at once, and Class-Specific Feature Selection (CSFS), which finds a different set of features for each class. Several methods have been proposed for the latter approach. Unlike a TFS algorithm, where a single feature subset is selected for discriminating among all the classes of a supervised classification problem, a CSFS algorithm selects one subset of features per class. A general CSFS framework can use any traditional **_feature selector_** to choose a possibly different subset for each class; depending on the type of feature selector, the overall process may change slightly.
8 |
9 | The Sparse-Coding Based Approach for Class-Specific Feature Selection (SMBA-CSFS) is rooted in the concept of **_Compressed Sensing_**. In essence, it solves a joint-sparse optimization problem that looks for a subset of features, the **_representatives_**, that best reconstructs/represents the entire dataset as a linear combination of the retrieved feature components. The goal is to represent each class-sample set of the training set using only a few representative features.
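A sketch of the underlying optimization, written in the same informal notation used by the docstrings in `src/SMBA.py` (the notation below is ours, not quoted from the paper): given a class-sample matrix `X` whose columns are the features, SMBA looks for a row-sparse coefficient matrix `C`,

```
min_C  1/2 * ||X - X*C||_F^2  +  lambda * sum_j ||c^j||_q        with q in {1, 2}

optionally subject to  1^T * C = 1^T                             (affine case)
```

where `c^j` is the j-th row of `C`. The features whose rows of `C` have the largest norms act as representatives; in the code, `SMBA.almLasso_mat_fun()` solves this problem with an ADMM scheme and `SMBA.findRep()` ranks the features by the row norms of `C`.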
10 |
11 | # Prerequisites and requirements
12 |
13 | ## Pre-requisites
14 | 1. Python 2.7 (the code uses Python 2 syntax, e.g. `print` statements and `xrange`)
15 | 2. CUDA 5.0 or greater (optional). For installation, please refer to the [official CUDA documentation](http://docs.nvidia.com/cuda/#axzz4al7PKeAs).
16 |
17 | ## Requirements
18 | The software is written in Python. To work correctly, it requires the following packages:
19 |
20 | - NumPy
21 | - SciPy
22 | - scikit-learn (sklearn)
23 | - hdf5storage
24 | - optional: pycuda
25 | - optional: skcuda
26 |
27 | **NB**: The SMBA class can run faster by exploiting a CUDA environment. If you cannot install the optional GPU dependencies (pycuda and skcuda), you must manually remove the code that depends on these packages.
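For reference, a typical environment setup might look like the following (package names only, no versions are pinned by this repository; note that `src/CSFS_SMBA.py` and `src/grid_search.py` additionally import `imblearn`, i.e. the imbalanced-learn package, for SMOTE re-sampling):

```
pip install numpy scipy scikit-learn hdf5storage imbalanced-learn

# optional, only needed for the GPU code path of SMBA.py
pip install pycuda scikit-cuda
```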
28 |
29 | # Usage
30 |
31 | In order to use this algorithm, go into the project folder `src/` and run the file `CSFS_SMBA.py`.
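For example, assuming the dataset paths hard-coded at the top of `main()` in `CSFS_SMBA.py` (`dataset_dir`, `dataset_type`, `dataset_name`) have been adjusted to point to your local copy of the `dataset` folder:

```
cd src
python CSFS_SMBA.py
```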
32 |
33 | # Results
34 |
35 | ![Comparison of TFS methods against SMBA and SMBA-CSFS](img/TFS_vs_OM.png)
36 | `Comparison of several TFS accuracies against SMBA and SMBA-CSFS on nine data sets:
37 | (a) ALLAML(2), (b) LEUKEMIA(2), (c) CLL_SUB_111(3), (d) GLIOMA(4), (e) LUNG_C(5), (f) LUNG_D(7), (g) DLBCL(9), (h) CARCINOM(11), (i) GCM(14), when a varying number of features is selected. SVM classifier with 5-fold CV was used.`
38 |
39 |
40 |
41 | ![Comparison of CSFS methods against SMBA-CSFS](img/GS-CSFS_vs_OM-CSFS.png)
42 | `Comparison of several CSFS accuracies against SMBA-CSFS on nine data sets:
43 | (a) ALLAML(2), (b) LEUKEMIA(2), (c) CLL_SUB_111(3), (d) GLIOMA(4), (e) LUNG_C(5), (f) LUNG_D(7), (g) DLBCL(9), (h) CARCINOM(11), (i) GCM(14), when a varying number of features is selected. SVM classifier with 5-fold CV was used.`
44 |
45 | # Authors
46 |
47 | Davide Nardone, University of Naples Parthenope, Department of Science and Technologies, M.Sc. in Applied Computer Science
48 | https://www.linkedin.com/in/davide-nardone-127428102/
49 |
50 | # Contacts
51 |
52 | For any kind of problem, question, idea or suggestion, please don't hesitate to contact me at:
53 | - **davide.nardone@live.it**
54 |
55 | # Papers that cite CSFS-SMBA
56 |
57 | If you use the software in a scientific publication, please consider citing the following scientific manuscript:
58 |
59 | ```
60 | @article{nardone2019,
61 | author = {Nardone, Davide and Ciaramella, Angelo and Staiano, Antonino},
62 | year = {2019},
63 | month = {11},
64 | pages = {25},
65 | title = {A Sparse-Modeling Based Approach for Class Specific Feature Selection},
66 | volume = {5},
67 | journal = {PeerJ Computer Science},
68 | doi = {10.7717/peerj-cs.237}
69 | }
70 | ```
71 |
--------------------------------------------------------------------------------
/dataset/BIOLOGICAL/ALLAML/ALLAML.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/ALLAML/ALLAML.mat
--------------------------------------------------------------------------------
/dataset/BIOLOGICAL/COLON/COLON.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/COLON/COLON.mat
--------------------------------------------------------------------------------
/dataset/BIOLOGICAL/LUNG_DISCRETE/LUNG_DISCRETE.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/LUNG_DISCRETE/LUNG_DISCRETE.mat
--------------------------------------------------------------------------------
/dataset/BIOLOGICAL/LYMPHOMA/LYMPHOMA.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/LYMPHOMA/LYMPHOMA.mat
--------------------------------------------------------------------------------
/dataset/BIOLOGICAL/WISCONSIN/WISCONSIN.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/dataset/BIOLOGICAL/WISCONSIN/WISCONSIN.mat
--------------------------------------------------------------------------------
/img/GS-CSFS_vs_OM-CSFS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/img/GS-CSFS_vs_OM-CSFS.png
--------------------------------------------------------------------------------
/img/TFS_vs_OM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/img/TFS_vs_OM.png
--------------------------------------------------------------------------------
/src/CSFS_SMBA.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from sklearn.svm import SVC
4 | from sklearn.metrics import accuracy_score
5 | from sklearn.model_selection import KFold
6 | from imblearn.over_sampling import SMOTE #dependency
7 |
8 | import numpy as np
9 | np.set_printoptions(threshold=np.inf)
10 | import random as rnd
11 | import time
12 | import os
13 | import errno
14 | import pickle
15 | import sys
16 |
17 | # sys.path.insert(0, './src')
18 | import Loader as lr
19 | import Dataset as ds
20 | import Classifier as i_clf
21 | import FeatureSelector as fs
22 |
23 |
24 | def checkFolder(root, path_output):
25 |
26 | #folders to generate recursively
27 | path = root+'/'+path_output
28 |
29 | try:
30 | os.makedirs(path)
31 | except OSError as exc: # Python >2.5
32 | if exc.errno == errno.EEXIST and os.path.isdir(path):
33 | pass
34 | else:
35 | raise
36 |
37 |
38 |
39 | def classificationDecisionRule(clf_score, cls, clf_name, target):
40 |
41 | n_classes = len(cls)
42 | DTS = {}
43 |
44 | for ccn in clf_name:
45 | hits = []
46 | res = []
47 | preds = []
48 |
49 | for i in xrange(0,n_classes):
50 |
51 | #ensemble scores on class 'C' for the testing set
52 | e_th = clf_score['C'+str(cls[i])]['accuracy'][ccn]
53 | res.append(e_th)
54 |
55 | hits.append((e_th == cls[i]).astype('int').flatten())
56 |
57 | # ensemble scores and hits for the testing set
58 | ensemble_res = np.vstack(res)
59 | ensemble_hits = np.vstack(hits)
60 |
61 | # Applying decision rules
62 | for i in xrange(0, ensemble_hits.shape[1]): # number of sample
63 | hits = ensemble_hits[:,i] #it has a 1 in a position whether the classifier e_i has predicted the class w_i for the i-th pattern
64 | ens_preds = ensemble_res[:,i] #it's simply the predictions of all the trained classifier for the i-th pattern
65 | cond = np.sum(hits) #count the number of true positive for the i-th pattern
66 |
67 | if cond == 1: #rule 1
68 | pred = cls[np.where(hits==1)[0].squeeze()] #retrieve the cls for the 'only' true positive
69 | preds.append(pred)
70 |
71 | elif cond == 0 or cond > 1: # rule 1-2 (tie)
72 |
73 | # we find the majority votes (frequency) among all classifier (e.g., ) [[4 2][5 1][6 2][7 2]]
74 | unique, counts = np.unique(ens_preds, return_counts=True)
75 | maj_rule = np.asarray((unique, counts)).T
76 |
77 | # we find the 'majority' index, then its class
78 | ind_max = np.argmax(maj_rule[:, 1])
79 | pred = maj_rule[ind_max, 0]
80 | max = maj_rule[ind_max, 1]
81 |
82 | # we look for a 'tie of the tie', then we look for the majority class among all the tied classes
83 | tied_cls = np.where(maj_rule[:, 1] == max)[0]
84 | if ( len(np.where(maj_rule[:, 1] == max)[0]) ) > 1: #tie of the tie
85 | pred = maj_rule[tied_cls,0]
86 |
87 | # pick one tied cls randomly
88 | pred = pred[rnd.randint(0,len(pred)-1)]
89 | preds.append(pred)
90 |
91 | else:
92 | preds.append(pred)
93 |
94 | #compute accuracy
95 | test_score = accuracy_score(target, preds)
96 |
97 | dic_test_score = {
98 | ccn: test_score
99 | }
100 |
101 | DTS.update(dic_test_score)
102 |
103 | return DTS
104 |
105 | def main():
106 | ''' LOADING ANY DATASET '''
107 | dataset_dir = '/dataset'
108 | dataset_type = '/BIOLOGICAL'
109 | dataset_name = '/WISCONSIN'
110 |
111 | # this variable decides whether or not to balance the dataset
112 | resample = True
113 | p_step = 1
114 |
115 | # defining directory paths for saving partial and complete result
116 | path_data_folder = dataset_dir + dataset_type + dataset_name
117 | path_data_file = path_data_folder + dataset_name
118 | variables = ['X', 'Y']
119 |
120 | print ('%d.Loading and pre-processing the data...\n' % p_step)
121 | p_step += 1
122 | # NB: if you get an error such as 'Please use HDF reader for matlab v7.3 files', please change the 'format' variable to 'matlab_v73'
123 | D = lr.Loader(file_path=path_data_file,
124 | format='matlab',
125 | variables=variables,
126 | name=dataset_name[1:]
127 | ).getVariables(variables=variables)
128 |
129 | dataset = ds.Dataset(D['X'], D['Y'])
130 |
131 | n_classes = dataset.classes.shape[0]
132 | cls = np.unique(dataset.classes)
133 |
134 | # check if the data are already standardized, if not standardize it
135 | dataset.standardizeDataset()
136 |
137 | # re-sampling dataset
138 | num_min_cls = 9999999
139 | print ('%d.Class-sample separation...\n' % p_step)
140 | p_step += 1
141 | if resample == True:
142 |
143 | print ('\tDataset %s before resampling w/ size: %s and number of classes: %s---> %s' % (
144 | dataset_name[1:], dataset.data.shape, n_classes, cls))
145 |
146 | # discriminating classes of the whole dataset
147 | dataset_train = ds.Dataset(dataset.data, dataset.target)
148 | dataset_train.separateSampleClass()
149 | data, target = dataset_train.getSampleClass()
150 |
151 | for i in xrange(0, n_classes):
152 | print ('\t\t#sample for class C%s: %s' % (i + 1, data[i].shape))
153 | if data[i].shape[0] < num_min_cls:
154 | num_min_cls = data[i].shape[0]
155 |
156 | resample = '/BALANCED'
157 | print ('%d.Class balancing...' % p_step)
158 | dataset.data, dataset.target = SMOTE(kind='regular', k_neighbors=num_min_cls - 1).fit_sample(dataset.data,
159 | dataset.target)
160 | p_step += 1
161 | else:
162 | resample = '/UNBALANCED'
163 |
164 | # shuffling data
165 | print ('\tShuffling data...')
166 | dataset.shufflingDataset()
167 |
168 | print ('\tDataset %s w/ size: %s and number of classes: %s---> %s' % (
169 | dataset_name[1:], dataset.data.shape, n_classes, cls))
170 |
171 | # discriminating classes the whole dataset
172 | dataset_train = ds.Dataset(dataset.data, dataset.target)
173 | dataset_train.separateSampleClass()
174 | data, target = dataset_train.getSampleClass()
175 |
176 | for i in xrange(0, n_classes):
177 | print ('\t\t#sample for class C%s: %s' % (i + 1, data[i].shape))
178 |
179 | # Max number of features to use
180 | max_num_feat = 300
181 | step = 1
182 | # max_num_feat = dataset.data.shape[1]
183 |
184 | if max_num_feat > dataset.data.shape[1]:
185 | max_num_feat = dataset.data.shape[1]
186 |
187 | alpha = 10  # regularization parameter (typically alpha in [2, 50])
188 |
189 | params = {
190 |
191 | 'SMBA':
192 | # the smaller is alpha the sparser is the C matrix (fewer representatives)
193 | {
194 | 'alpha': alpha,
195 | 'norm_type': 1,
196 | 'max_iter': 3000,
197 | 'thr': [10 ** -8],
198 | 'type_indices': 'nrmInd',
199 | 'normalize': False,
200 | 'GPU': False,
201 | 'device': 0,
202 | 'PCA': False,
203 | 'verbose': False,
204 | 'step': 1,
205 | 'affine': False,
206 | }
207 | # it's possible to add other FS methods by modifying the correct file
208 | }
209 |
210 | fs_model = fs.FeatureSelector(name='SMBA', tp='SLB', params=params['SMBA'])
211 | fs_name = 'SMBA'
212 |
213 | # CLASSIFIERS (it's possible to add other classifier methods by adding entries into this list)
214 | clf_name = [
215 | "SVM"
216 | # "Decision Tree",
217 | # "KNN"
218 | ]
219 | model = [
220 | SVC(kernel="linear")
221 | # DecisionTreeClassifier(max_depth=5),
222 | # KNeighborsClassifier(n_neighbors=1)
223 | ]
224 |
225 | '''Perform K-fold Cross Validation...'''
226 | k_fold = 10
227 |
228 | #defining result folders
229 | fs_path_output = '/CSFS/FS/K_FOLD'
230 | checkFolder(path_data_folder, fs_path_output)
231 |
232 | res_path_output = '/CSFS/RESULTS/K_FOLD'
233 | checkFolder(path_data_folder, res_path_output)
234 |
235 | all_scores = {}
236 | all_scores.update({fs_name: []})
237 |
238 | cc_fold = 0
239 | conf_dataset = {}
240 |
241 | X = dataset.data
242 | y = dataset.target
243 | kf = KFold(n_splits=k_fold)
244 |
245 | print ('%d.Running the Intra-Class-Specific Feature Selection and building the ensemble classifier...\n' % p_step)
246 | p_step += 1
247 | for train_index, test_index in kf.split(X):
248 |
249 | X_train_kth, X_test_kth = X[train_index], X[test_index]
250 | y_train, y_test = y[train_index], y[test_index]
251 |
252 | print ('\tDOING %s-CROSS VALIDATION W/ TRAINING SET SIZE: %s' % (cc_fold + 1, X_train_kth.shape))
253 |
254 | ''' For the training data in each class we find the representative features and use them as a best subset feature
255 | (in representing each class sample) to perform classification
256 | '''
257 |
258 | csfs_res = {}
259 |
260 | for i in xrange(0, n_classes):
261 | cls_res = {
262 | 'C' + str(cls[i]): {}
263 | }
264 | csfs_res.update(cls_res)
265 |
266 | kth_scores = {}
267 | for i in xrange(0, len(clf_name)):
268 | kth_scores.update({clf_name[i]: []})
269 |
270 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
271 | curr_res_fs_fold = path_data_folder + '/' + fs_path_output + '/' + fs_name + resample
272 | checkFolder(path_data_folder, fs_path_output + '/' + fs_name + resample)
273 |
274 | # discriminating classes for the k-th fold of the training set
275 | data_train = ds.Dataset(X_train_kth, y_train)
276 | data_train.separateSampleClass()
277 | ktrain_data, ktrain_target = data_train.getSampleClass()
278 | K_cls_ind_train = data_train.ind_class
279 |
280 | for i in xrange(0, n_classes):
281 | # print ('Train set size C' + str(i + 1) + ':', ktrain_data[i].shape)
282 |
283 | print ('\tPerforming feature selection on class %d with shape %s' % (cls[i] + 1, ktrain_data[i].shape))
284 |
285 | start_time = time.time()
286 | idx = fs_model.fit(ktrain_data[i], ktrain_target[i])
287 |
288 | # print idx
289 |
290 | print('\tTotal Time = %s seconds\n' % (time.time() - start_time))
291 |
292 | csfs_res['C' + str(cls[i])]['idx'] = idx
293 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name]
294 |
295 | # with open(curr_res_fs_fold + '/' + str(cc_fold + 1) + '-fold' + '.pickle', 'wb') as handle:
296 | # pickle.dump(csfs_res, handle, protocol=pickle.HIGHEST_PROTOCOL)
297 |
298 | ens_class = {}
299 | # learning a classifier (ccn) for each subset of 'n_rep' feature
300 | for j in xrange(0, max_num_feat):
301 | n_rep = j + 1 # first n_rep indices
302 |
303 | for i in xrange(0, n_classes):
304 | # get subset of feature from the i-th class
305 | idx = csfs_res['C' + str(cls[i])]['idx']
306 |
307 | # print idx[0:n_rep]
308 |
309 | X_train_fs = X_train_kth[:, idx[0:n_rep]]
310 |
311 | _clf = i_clf.Classifier(names=clf_name, classifiers=model)
312 | _clf.train(X_train_fs, y_train)
313 |
314 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test)
315 |
316 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test)
317 |
318 | for i in xrange(0, len(clf_name)):
319 | _score = DTS[clf_name[i]]
320 | # print ('Accuracy w/ %d feature: %f' % (n_rep, _score))
321 | kth_scores[clf_name[i]].append(_score)
322 |
323 | x = np.arange(1, max_num_feat + 1)
324 |
325 | kth_results = {
326 | 'clf_name': clf_name,
327 | 'x': x,
328 | 'scores': kth_scores,
329 | }
330 |
331 | all_scores[fs_name].append(kth_results)
332 |
333 | # saving k-th dataset configuration
334 | # with open(path_data_folder + fs_path_output + '/' + str(cc_fold + 1) + '-fold_conf_dataset.pickle',
335 | # 'wb') as handle: # TODO: customize output name for recognizing FS parameters' method
336 | # pickle.dump(conf_dataset, handle, protocol=pickle.HIGHEST_PROTOCOL)
337 |
338 | cc_fold += 1
339 |
340 | # print all_scores
341 |
342 | print('%s.Averaging results...\n' % p_step)
343 | p_step += 1
344 | # Averaging results on k-fold
345 |
346 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
347 | curr_res_output_fold = path_data_folder + '/' + res_path_output + '/' + fs_name + resample
348 | checkFolder(path_data_folder, res_path_output + '/' + fs_name + resample)
349 |
350 | M = {}
351 | for i in xrange(0, len(clf_name)):
352 | M.update({clf_name[i]: np.ones([k_fold, max_num_feat]) * 0})
353 |
354 | avg_scores = {}
355 | std_scores = {}
356 | for i in xrange(0, len(clf_name)):
357 | avg_scores.update({clf_name[i]: []})
358 | std_scores.update({clf_name[i]: []})
359 |
360 | # k-fold results for each classifier
361 | for k in xrange(0, k_fold):
362 | for clf in clf_name:
363 | M[clf][k, :] = all_scores[fs_name][k]['scores'][clf][:max_num_feat]
364 |
365 | for clf in clf_name:
366 | avg_scores[clf] = np.mean(M[clf], axis=0)
367 | std_scores[clf] = np.std(M[clf], axis=0)
368 |
369 | x = np.arange(1, max_num_feat + 1)
370 | results = {
371 | 'clf_name': clf_name,
372 | 'x': x,
373 | 'M': M,
374 | 'scores': avg_scores,
375 | 'std': std_scores
376 | }
377 |
378 | # print avg_scores
379 |
380 | with open(curr_res_output_fold + '/clf_results.pickle', 'wb') as handle:
381 | pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
382 | print ('Done with %s, [%d-cross validation] ' % (dataset_name[1:], k_fold))
383 |
384 |
385 | if __name__ == '__main__':
386 | main()
387 |
--------------------------------------------------------------------------------
/src/Classifier.py:
--------------------------------------------------------------------------------
1 | from sklearn.neighbors import KNeighborsClassifier
2 | from sklearn.svm import SVC
3 | from sklearn.tree import DecisionTreeClassifier
4 | from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
5 | from sklearn.linear_model import LogisticRegression
6 | # add the needed classifiers when using this class
7 |
8 |
9 |
10 | class Classifier:
11 |
12 | def __init__(self, names=None, classifiers=None):
13 |
14 | self.cv_scores = {}
15 |
16 | #Default classifiers and parameters
17 | if names == None:
18 |
19 | self.names = [
20 | "KNN", "Logistic Regression", "SVM",
21 | "Decision Tree", "Random Forest", "AdaBoost"
22 | ]
23 |
24 | self.classifiers = [
25 |
26 | KNeighborsClassifier(n_neighbors=1),
27 | LogisticRegression(C=1e5),
28 | SVC(kernel="linear"),
29 | DecisionTreeClassifier(max_depth=5),
30 | RandomForestClassifier(max_depth=5, n_estimators=10),
31 | AdaBoostClassifier()
32 | ]
33 |
34 | else:
35 | self.names = names
36 | self.classifiers = classifiers
37 |
38 | for name in self.names:
39 | self.cv_scores[name] = []
40 |
41 |
42 |
43 | def train(self, X_train, y_train):
44 |
45 | for name, clf in zip(self.names, self.classifiers):
46 |
47 | # Training the algorithm using the selected predictors and target.
48 | clf.fit(X_train, y_train)
49 |
50 | def classify(self, X_test, y_test):
51 |
52 | # Collect the test-set predictions of each classifier
53 | DTS = {}
54 |
55 | for name, clf in zip(self.names, self.classifiers):
56 |
57 | preds = clf.predict(X_test)
58 |
59 | dic_label = {
60 | name: preds
61 | }
62 |
63 | DTS.update(dic_label)
64 |
65 | return DTS
--------------------------------------------------------------------------------
/src/Dataset.py:
--------------------------------------------------------------------------------
1 | from sklearn import preprocessing
2 | from sklearn.preprocessing import StandardScaler
3 |
4 | import hdf5storage #dependency
5 | import numpy as np
6 | np.set_printoptions(threshold=np.inf)
7 |
8 |
9 | class Dataset:
10 | def __init__(self, X, y):
11 |
12 | self.data = X
13 | self.target = y.flatten()
14 |
15 | # removing any row with at least one NaN value, together with the corresponding target value
16 | nan_mask = ~np.isnan(self.data).any(axis=1)
17 | self.data, self.target = self.data[nan_mask], self.target[nan_mask]
18 |
19 | self.num_sample, self.num_features = self.data.shape[0], self.data.shape[1]
20 |
21 | # retrieving unique label for Dataset
22 | self.classes = np.unique(self.target)
23 |
24 | def standardizeDataset(self):
25 |
26 | # standardize the data to zero mean and unit variance, unless it already is
27 | if np.sum(np.std(self.data, axis=0)).astype('int32') == self.num_features and np.sum(
28 | np.mean(self.data, axis=0)) < 10 ** -7:
29 | print ('\tThe data were already standardized!')
30 | else:
31 | print ('Standardizing data....')
32 | self.data = StandardScaler().fit_transform(self.data)
33 |
34 | def normalizeDataset(self, norm):
35 |
36 | normalizer = preprocessing.Normalizer(norm=norm)
37 | self.data = normalizer.fit_transform(self.data)
38 |
39 | def scalingDataset(self):
40 |
41 | min_max_scaler = preprocessing.MinMaxScaler()
42 | self.data = min_max_scaler.fit_transform(self.data)
43 |
44 | def shufflingDataset(self):
45 |
46 | idx = np.random.permutation(self.data.shape[0])
47 | self.data = self.data[idx]
48 | self.target = self.target[idx]
49 |
50 |
51 | def split(self, split_ratio=0.8):
52 |
53 | # shuffling data
54 | indices = np.random.permutation(self.num_sample)
55 |
56 | start = int(split_ratio * self.num_sample)
57 | training_idx, test_idx = indices[:start], indices[start:]
58 | X_train, X_test = self.data[training_idx, :], self.data[test_idx, :]
59 | y_train, y_test = self.target[training_idx], self.target[test_idx]
60 |
61 | return X_train, y_train, X_test, y_test, training_idx, test_idx
62 |
63 | def separateSampleClass(self):
64 |
65 | # Discriminating the classes sample
66 | self.ind_class = []
67 | for i in xrange(0, len(self.classes)):
68 | self.ind_class.append(np.where(self.target == self.classes[i]))
69 |
70 | def getSampleClass(self):
71 |
72 | data = []
73 | target = []
74 | # Selecting the 'train sample' on the basis of the previously retrieved indices
75 | for i in xrange(0, len(self.classes)):
76 | data.append(self.data[self.ind_class[i]])
77 | target.append(self.target[self.ind_class[i]])
78 |
79 | return data, target
80 |
81 | def getIndClass(self):
82 |
83 | return self.ind_class
--------------------------------------------------------------------------------
/src/FeatureSelector.py:
--------------------------------------------------------------------------------
1 | from sklearn.feature_selection import SelectFromModel
2 | from sklearn.linear_model import ElasticNet, Lasso
3 | from sklearn.feature_selection import mutual_info_classif
4 |
5 | from skfeature.utility.sparse_learning import construct_label_matrix, feature_ranking
6 | from skfeature.function.sparse_learning_based import RFS, ls_l21, ll_l21, MCFS, NDFS, UDFS
7 |
8 | from skfeature.function.similarity_based import reliefF, fisher_score
9 | from skfeature.function.information_theoretical_based import MRMR
10 |
11 | import sys
12 | sys.path.insert(0, './src')
13 | import numpy as np
14 | np.set_printoptions(threshold=np.inf)
15 | import SMBA as fs
16 |
17 |
18 | class FeatureSelector:
19 |
20 | def __init__(self, model=None, name=None, tp=None, params=None):
21 |
22 | self.name = name
23 | self.model = model
24 | self.tp = tp
25 | self.params = params
26 |
27 | def setParams(self, comb_par, params_name, params):
28 |
29 | for par_name, par in zip(params_name, comb_par):
30 | params[par_name] = par
31 |
32 | self.params = params
33 |
34 | def fit(self, X, y):
35 |
36 | idx = []
37 |
38 | if self.tp == 'ITB':
39 |
40 | if self.name == 'MRMR':
41 | idx = MRMR.mrmr(X, y, n_selected_features=self.params['num_feats'])
42 |
43 | elif self.tp == 'filter':
44 |
45 | if self.name == 'Relief':
46 | score = reliefF.reliefF(X, y, k=self.params['k'])
47 | idx = reliefF.feature_ranking(score)
48 |
49 | if self.name == 'Fisher':
50 | # obtain the score of each feature on the training set
51 | score = fisher_score.fisher_score(X, y)
52 |
53 | # rank features in descending order according to score
54 | idx = fisher_score.feature_ranking(score)
55 |
56 | if self.name == 'MI':
57 | idx = np.argsort(mutual_info_classif(X, y, n_neighbors=self.params['n_neighbors']))[::-1]
58 |
59 | elif self.tp == 'wrapper':
60 |
61 | model_fit = self.model.fit(X, y)
62 | model = SelectFromModel(model_fit, prefit=True)
63 | idx = model.get_support(indices=True)
64 | elif self.tp == 'SLB':
65 |
66 | # one-hot-encode on target
67 | y = construct_label_matrix(y)
68 |
69 | if self.name == 'SMBA':
70 | scba = fs.SMBA(data=X, alpha=self.params['alpha'], norm_type=self.params['norm_type'],
71 | verbose=self.params['verbose'], thr=self.params['thr'], max_iter=self.params['max_iter'],
72 | affine=self.params['affine'],
73 | normalize=self.params['normalize'],
74 | step=self.params['step'],
75 | PCA=self.params['PCA'],
76 | GPU=self.params['GPU'],
77 | device = self.params['device'])
78 |
79 | nrmInd, sInd, repInd, _ = scba.admm()
80 | if self.params['type_indices'] == 'nrmInd':
81 | idx = nrmInd
82 | elif self.params['type_indices'] == 'repInd':
83 | idx = repInd
84 | else:
85 | idx = sInd
86 |
87 | if self.name == 'RFS':
88 | W = RFS.rfs(X, y, gamma=self.params['gamma'])
89 | idx = feature_ranking(W)
90 |
91 | if self.name == 'll_l21':
92 | # obtain the feature weight matrix
93 | W, _, _ = ll_l21.proximal_gradient_descent(X, y, z=self.params['z'], verbose=False)
94 | # sort the feature scores in an ascending order according to the feature scores
95 | idx = feature_ranking(W)
96 | if self.name == 'ls_l21':
97 | # obtain the feature weight matrix
98 | W, _, _ = ls_l21.proximal_gradient_descent(X, y, z=self.params['z'], verbose=False)
99 |
100 | # sort the feature scores in an ascending order according to the feature scores
101 | idx = feature_ranking(W)
102 |
103 | if self.name == 'LASSO':
104 |
105 | LASSO = Lasso(alpha=self.params['alpha'], positive=True)
106 |
107 | y_pred_lasso = LASSO.fit(X, y)
108 |
109 | if y_pred_lasso.coef_.ndim == 1:
110 | coeff = y_pred_lasso.coef_
111 | else:
112 | coeff = np.asarray(y_pred_lasso.coef_[0, :])
113 |
114 | idx = np.argsort(-coeff)
115 |
116 | if self.name == 'EN': # elastic net L1
117 |
118 | enet = ElasticNet(alpha=self.params['alpha'], l1_ratio=1, positive=True)
119 | y_pred_enet = enet.fit(X, y)
120 |
121 | if y_pred_enet.coef_.ndim == 1:
122 | coeff = y_pred_enet.coef_
123 | else:
124 | coeff = np.asarray(y_pred_enet.coef_[0, :])
125 |
126 | idx = np.argsort(-coeff)
127 |
128 | return idx
--------------------------------------------------------------------------------
/src/Loader.py:
--------------------------------------------------------------------------------
1 | import hdf5storage #dependency
2 | import numpy as np
3 |
4 | np.set_printoptions(threshold=np.inf)
5 | import scipy.io as sio
6 |
7 | class Loader:
8 | def __init__(self, file_path, name, variables, format, k_fold=None):
9 |
10 |
11 | # This class provides several methods for loading many types of dataset (matlab, csv, txt, etc.)
12 |
13 | if format == 'matlab': # classic workspace
14 |
15 | mc = sio.loadmat(file_path)
16 |
17 | for variable in variables:
18 | setattr(self, variable, mc[variable])
19 |
20 | elif format == 'matlab_struct': # struct one level
21 | print ('Loading data...')
22 |
23 | mc = sio.loadmat(file_path)
24 | mc = mc[name][0, 0]
25 |
26 | for variable in variables:
27 | setattr(self, variable, mc[variable])
28 |
29 | elif format == 'custom_matlab':
30 | print ('Loading data...')
31 |
32 | mc = sio.loadmat(file_path)
33 | mc = mc[name][0, 0]
34 |
35 | for variable in variables:
36 | setattr(self, variable, mc[variable][0, 0])
37 |
38 | elif format == 'matlab_v73':
39 | mc = hdf5storage.loadmat(file_path)
40 |
41 | for variable in variables:
42 | setattr(self, variable, mc[variable])
43 |
44 | def getVariables(self, variables):
45 |
46 | D = {}
47 |
48 | for variable in variables:
49 | D[variable] = getattr(self, variable)
50 |
51 | return D
--------------------------------------------------------------------------------
/src/SMBA.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from sklearn.decomposition import PCA
4 |
5 | import numpy as np
6 | import numpy.matlib
7 | np.set_printoptions(threshold=np.inf)
8 | import pycuda.autoinit
9 | import pycuda.gpuarray as gpuarray
10 | import skcuda.linalg as linalg
11 | import skcuda.misc as misc
12 | import time
13 |
14 |
15 | class SMBA():
16 |
17 | def __init__(self, data, alpha=10, norm_type=1,
18 | verbose=False, step=5, thr=[10**-8,-1], max_iter=5000,
19 | affine=False,
20 | normalize=True,
21 | PCA=False, npc=10, GPU=False, device=0):
22 |
23 | self.data = data
24 | self.alpha = alpha
25 | self.norm_type=norm_type
26 | self.verbose = verbose
27 | self.step = step
28 | self.thr = thr
29 | self.max_iter = max_iter
30 | self.affine = affine
31 | self.normalize = normalize
32 | self.device = device
33 | self.PCA = PCA
34 | self.npc = npc
35 | self.GPU = GPU
36 |
37 | self.num_rows = data.shape[0]
38 | self.num_columns = data.shape[1]
39 |
40 | if(self.GPU==True):
41 | # self.data = self.data.astype('float32')
42 | linalg.init()
43 | # dev = misc.get_current_device()
44 | # dev = misc.init_device(n=self.device)
45 | # print misc.get_dev_attrs(dev)
46 |
47 |
48 | def computeLambda(self):
49 | print ('\t\tComputing lambda...')
50 |
51 | T = np.zeros(self.num_columns)
52 |
53 | if (self.GPU == True):
54 |
55 | if not self.affine:
56 |
57 | gpu_data = gpuarray.to_gpu(self.data)
58 | C_gpu = linalg.dot(gpu_data, gpu_data, transa='T')
59 |
60 | for i in xrange(self.num_columns):
61 | T[i] = linalg.norm(C_gpu[i,:])
62 |
63 | else:
64 |
65 | gpu_data = gpuarray.to_gpu(self.data)
66 |
67 | # affine transformation
68 | y_mean_gpu = misc.mean(gpu_data,axis=1)
69 |
70 | # creating affine matrix to subtract to the data (may encounter problem with strides)
71 | aff_mat = np.zeros([self.num_rows,self.num_columns]).astype('f')
72 | for i in xrange(0,self.num_columns):
73 | aff_mat[:,i] = y_mean_gpu.get()
74 |
75 |
76 | aff_mat_gpu = gpuarray.to_gpu(aff_mat)
77 | gpu_data_aff = misc.subtract(aff_mat_gpu,gpu_data)
78 |
79 | C_gpu = linalg.dot(gpu_data, gpu_data_aff, transa='T')
80 |
81 | #computing euclidean norm (rows)
82 | for i in xrange(self.num_columns):
83 | T[i] = linalg.norm(C_gpu[i,:])
84 | else:
85 |
86 | if not self.affine:
87 |
88 | T = np.linalg.norm(np.dot(self.data.T, self.data), axis=1)
89 |
90 | else:
91 | #affine transformation
92 | y_mean = np.mean(self.data, axis=1)
93 |
94 | tmp_mat = np.outer(y_mean, np.ones(self.num_columns)) - self.data
95 |
96 | T = np.linalg.norm(np.dot(self.data.T, tmp_mat),axis=1)
97 |
98 | _lambda = np.amax(T)
99 |
100 | return _lambda
101 |
102 | def shrinkL1Lq(self, C1, _lambda):
103 |
104 | D,N = C1.shape
105 | C2 = []
106 | if self.norm_type == 1:
107 |
108 | #TODO: incapsulate into one function
109 | # soft thresholding
110 | C2 = np.abs(C1) - _lambda
111 | ind = C2 < 0
112 | C2[ind] = 0
113 | C2 = np.multiply(C2, np.sign(C1))
114 | elif self.norm_type == 2:
115 | r = np.zeros([D,1])
116 | for j in xrange(0,D):
117 | th = np.linalg.norm(C1[j,:]) - _lambda
118 | r[j] = 0 if th < 0 else th
119 | C2 = np.multiply(np.matlib.repmat(np.divide(r, (r + _lambda )), 1, N), C1)
120 | elif self.norm_type == 'inf':
121 | # TODO: write it
122 | print ''
123 |
124 | return C2
125 |
126 | def errorCoef(self, Z, C):
127 |
128 | err = np.sum(np.abs(Z-C)) / (np.shape(C)[0] * np.shape(C)[1])
129 |
130 | return err
131 | # err = sum(sum(abs(Z - C))) / (size(C, 1) * size(C, 2));
132 |
133 | def almLasso_mat_fun(self):
134 |
135 | '''
136 | This function implements the Augmented Lagrangian Multipliers (ADMM) method for the Lasso problem.
137 | The Lagrangian form of the Lasso can be expressed as follows:
138 | 
139 | MIN{ 1/2 ||y - X*BHETA||_2^2 + lambda ||THETA||_1 }   s.t.  BHETA - THETA = 0
140 | 
141 | When applied to this problem, the ADMM updates take the form
142 | 
143 | BHETA^(t+1) = (X^T X + rho*I)^-1 (X^T y + rho*THETA^t - mu^t)
144 | THETA^(t+1) = Shrinkage_{lambda/rho}(BHETA^(t+1) + mu^t/rho)
145 | mu^(t+1) = mu^t + rho*(BHETA^(t+1) - THETA^(t+1))
146 | 
147 | The algorithm involves a 'ridge regression' update for BHETA, a soft-thresholding (shrinkage) step for THETA and
148 | then a simple linear update for mu.
149 | 
150 | NB: this ADMM version contains several variations, such as the use of two penalty parameters (mu1, mu2)
151 | instead of just one.
152 | '''
153 |
154 | print ('\tADMM processing...')
155 |
156 | alpha1 = alpha2 = 0
157 | if (len(self.reg_params) == 1):
158 | alpha1 = self.reg_params[0]
159 | alpha2 = self.reg_params[0]
160 | elif (len(self.reg_params) == 2):
161 | alpha1 = self.reg_params[0]
162 | alpha2 = self.reg_params[1]
163 |
164 | #thresholds parameters for stopping criteria
165 | if (len(self.thr) == 1):
166 | thr1 = self.thr[0]
167 | thr2 = self.thr[0]
168 | elif (len(self.thr) == 2):
169 | thr1 = self.thr[0]
170 | thr2 = self.thr[1]
171 |
172 | # entry condition
173 | err1 = 10 * thr1
174 | err2 = 10 * thr2
175 |
176 | start_time = time.time()
177 |
178 | # setting penalty parameters for the ALM
179 | mu1p = alpha1 * 1/self.computeLambda()
180 | print("\t\t-Compute Lambda- Time = %s seconds" % (time.time() - start_time))
181 | mu2p = alpha2 * 1
182 |
183 | mu1 = mu1p
184 | mu2 = mu2p
185 |
186 | i = 1
187 | start_time = time.time()
188 | if self.GPU == True:
189 |
190 | # initializing the coefficient matrix and the Lagrange multipliers (THETA and lambda2, respectively)
191 | THETA = misc.zeros((self.num_columns,self.num_columns),dtype='float64')
192 | lambda2 = misc.zeros((self.num_columns,self.num_columns),dtype='float64')
193 |
194 | gpu_data = gpuarray.to_gpu(self.data)
195 | P_GPU = linalg.dot(gpu_data,gpu_data,transa='T')
196 |
197 | OP1 = P_GPU
198 | linalg.scale(np.float32(mu1), OP1)
199 |
200 | OP2 = linalg.eye(self.num_columns)
201 | linalg.scale(mu2,OP2)
202 |
203 |
204 | if self.affine == True:
205 |
206 | print ('\t\tGPU affine...')
207 |
208 | OP3 = misc.ones((self.num_columns, self.num_columns), dtype='float64')
209 | linalg.scale(mu2, OP3)
210 | lambda3 = misc.zeros((1, self.num_columns), dtype='float64')
211 |
212 | # TODO: Because of some problem with linalg.inv version of scikit-cuda we fix it using np.linalg.inv of numpy
213 | A = np.linalg.inv(misc.add(misc.add(OP1.get(), OP2.get()), OP3.get()))
214 |
215 | A_GPU = gpuarray.to_gpu(A)
216 |
217 | while ( (err1 > thr1 or err2 > thr1) and i < self.max_iter):
218 |
219 | _lambda2 = gpuarray.to_gpu(lambda2)
220 | _lambda3 = gpuarray.to_gpu(lambda3)
221 |
222 | linalg.scale(1/mu2, _lambda2)
223 | term_OP2 = gpuarray.to_gpu(_lambda2.get())
224 |
225 | OP2 = gpuarray.to_gpu(misc.subtract(THETA, term_OP2))
226 | linalg.scale(mu2,OP2)
227 |
228 | OP4 = gpuarray.to_gpu(np.matlib.repmat(_lambda3.get(), self.num_columns, 1))
229 |
230 | # updating Z
231 | BHETA = linalg.dot(A_GPU,misc.add(misc.add(misc.add(OP1,OP2),OP3),OP4))
232 |
233 | # deallocating unnecessary GPU variables
234 | OP2.gpudata.free()
235 | OP4.gpudata.free()
236 | _lambda2.gpudata.free()
237 | _lambda3.gpudata.free()
238 |
239 | # updating C
240 | THETA = misc.add(BHETA,term_OP2)
241 | THETA = self.shrinkL1Lq(THETA.get(),1/mu2)
242 | THETA = THETA.astype('float64')
243 |
244 | # updating Lagrange multipliers
245 | term_lambda2 = misc.subtract(BHETA, gpuarray.to_gpu(THETA))
246 |
247 | linalg.scale(mu2,term_lambda2)
248 | term_lambda2 = gpuarray.to_gpu(term_lambda2.get())
249 | lambda2 = misc.add(lambda2, term_lambda2) # on GPU
250 |
251 | term_lambda3 = misc.subtract(misc.ones((1, self.num_columns), dtype='float64'), misc.sum(BHETA,axis=0))
252 | linalg.scale(mu2,term_lambda3)
253 | term_lambda3 = gpuarray.to_gpu(term_lambda3.get())
254 | lambda3 = misc.add(lambda3, term_lambda3) # on GPU
255 |
256 | # deallocating unnecessary GPU variables
257 | term_OP2.gpudata.free()
258 | term_lambda2.gpudata.free()
259 | term_lambda3.gpudata.free()
260 |
261 | err1 = self.errorCoef(BHETA.get(), THETA)
262 | err2 = self.errorCoef(np.sum(BHETA.get(), axis=0), np.ones([1, self.num_columns]))
263 |
264 | # deallocating unnecessary GPU variables
265 | BHETA.gpudata.free()
266 |
267 | THETA = gpuarray.to_gpu((THETA))
268 |
269 | # reporting errors
270 | if (self.verbose and (i % self.step == 0)):
271 | print('\t\tIteration = %d, ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e' % (i, err1, err2))
272 | i += 1
273 |
274 | THETA = THETA.get()
275 |
276 | Err = [err1, err2]
277 | if(self.verbose):
278 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e. \n' % (i, err1, err2))
279 |
280 | else:
281 | print '\t\tGPU not affine'
282 |
283 | # TODO: Because of some problem with linalg.inv version of scikit-cuda we fix it using np.linalg.inv of numpy
284 | A = np.linalg.inv(misc.add(OP1.get(), OP2.get()))
285 | A_GPU = gpuarray.to_gpu(A)
286 |
287 | while ( err1 > thr1 and i < self.max_iter):
288 |
289 | _lambda2 = gpuarray.to_gpu(lambda2)
290 |
291 | term_OP2 = THETA
292 | linalg.scale(mu2, term_OP2)
293 |
294 | term_OP2 = misc.subtract(term_OP2, _lambda2)
295 |
296 | OP2 = gpuarray.to_gpu(term_OP2.get())
297 |
298 |
299 | BHETA = linalg.dot(A_GPU, misc.add(OP1 , OP2))
300 |
301 | linalg.scale(1 / mu2, _lambda2)
302 | term_THETA = gpuarray.to_gpu(_lambda2.get())
303 |
304 | THETA = misc.add(BHETA,term_THETA)
305 | THETA = self.shrinkL1Lq(THETA.get(),1/mu2)
306 |
307 | THETA = THETA.astype('float32')
308 |
309 | # updating Lagrange multipliers
310 | term_lambda2 = misc.subtract(BHETA, gpuarray.to_gpu(THETA))
311 | linalg.scale(mu2,term_lambda2)
312 | term_lambda2 = gpuarray.to_gpu(term_lambda2.get())
313 | lambda2 = misc.add(lambda2, term_lambda2) # on GPU
314 |
315 | err1 = self.errorCoef(BHETA.get(), THETA)
316 |
317 | THETA = gpuarray.to_gpu((THETA))
318 |
319 | # reporting errors
320 | if (self.verbose and (i % self.step == 0)):
321 | print('\t\tIteration %5.0f, ||Z - C|| = %2.5e' % (i, err1))
322 | i += 1
323 |
324 |
325 | THETA = THETA.get()
326 | Err = [err1, err2]
327 | if(self.verbose):
328 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e' % (i, err1))
329 |
330 | else: #CPU version
331 |
332 | # initializing the coefficient matrix and the Lagrange multipliers (THETA and lambda2, respectively)
333 | THETA = np.zeros([self.num_columns, self.num_columns])
334 | lambda2 = np.zeros([self.num_columns, self.num_columns])
335 |
336 | P = self.data.T.dot(self.data)
337 | OP1 = np.multiply(P, mu1)
338 |
339 | if self.affine == True:
340 |
341 | # INITIALIZATION
342 | lambda3 = np.zeros(self.num_columns).T
343 |
344 | A = np.linalg.inv(np.multiply(mu1,P) + np.multiply(mu2, np.eye(self.num_columns, dtype=int)) + np.multiply(mu2, np.ones([self.num_columns,self.num_columns]) ))
345 |
346 | OP3 = np.multiply(mu2, np.ones([self.num_columns, self.num_columns]))
347 |
348 | while ( (err1 > thr1 or err2 > thr1) and i < self.max_iter):
349 |
350 | # updating Bheta
351 | OP2 = np.multiply(THETA - np.divide(lambda2,mu2), mu2)
352 | OP4 = np.matlib.repmat(lambda3, self.num_columns, 1)
353 | BHETA = A.dot(OP1 + OP2 + OP3 + OP4 )
354 |
355 | # updating C
356 | THETA = BHETA + np.divide(lambda2,mu2)
357 | THETA = self.shrinkL1Lq(THETA, 1/mu2)
358 |
359 | # updating Lagrange multipliers
360 | lambda2 = lambda2 + np.multiply(mu2,BHETA - THETA)
361 | lambda3 = lambda3 + np.multiply(mu2, np.ones([1,self.num_columns]) - np.sum(BHETA,axis=0))
362 |
363 | err1 = self.errorCoef(BHETA, THETA)
364 | err2 = self.errorCoef(np.sum(BHETA,axis=0), np.ones([1, self.num_columns]))
365 |
366 | # mu1 = min(mu1 * (1 + 10 ^ -5), 10 ^ 2 * mu1p);
367 | # mu2 = min(mu2 * (1 + 10 ^ -5), 10 ^ 2 * mu2p);
368 |
369 | # reporting errors
370 | if (self.verbose and (i % self.step == 0)):
371 | print('\t\tIteration = %d, ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e' % (i, err1, err2))
372 | i += 1
373 |
374 | Err = [err1, err2]
375 |
376 | if(self.verbose):
377 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e, ||1 - C^T 1|| = %2.5e. \n' % (i, err1,err2))
378 | else:
379 | print '\t\tCPU not affine'
380 |
381 | A = np.linalg.inv(OP1 + np.multiply(mu2, np.eye(self.num_columns, dtype=int)))
382 |
383 | while ( err1 > thr1 and i < self.max_iter):
384 |
385 | # updating Z
386 | OP2 = np.multiply(mu2, THETA) - lambda2
387 | BHETA = A.dot(OP1 + OP2)
388 |
389 | # updating C
390 | THETA = BHETA + np.divide(lambda2, mu2)
391 | THETA = self.shrinkL1Lq(THETA, 1/mu2)
392 |
393 | # updating Lagrange multipliers
394 | lambda2 = lambda2 + np.multiply(mu2,BHETA - THETA)
395 |
396 | # computing errors
397 | err1 = self.errorCoef(BHETA, THETA)
398 |
399 | # reporting errors
400 | if (self.verbose and (i % self.step == 0)):
401 | print('\t\tIteration %5.0f, ||Z - C|| = %2.5e' % (i, err1))
402 | i += 1
403 |
404 | Err = [err1, err2]
405 | if(self.verbose):
406 | print ('\t\tTerminating ADMM at iteration %5.0f, \n ||Z - C|| = %2.5e' % (i, err1))
407 |
408 | print("\t\t-ADMM- Time = %s seconds" % (time.time() - start_time))
409 |
410 | return THETA, Err
411 |
412 | def rmRep(self, sInd, thr):
413 |
414 | '''
415 | This function takes the data matrix and the indices of the representatives and removes the representatives
416 | that are too close to each other
417 |
418 | :param sInd: indices of the representatives
419 | :param thr: threshold for pruning the representatives, typically in [0.9,0.99]
420 | :return: representatives indices
421 | '''
422 |
423 | Ys = self.data[:, sInd]
424 |
425 | Ns = Ys.shape[1]
426 | d = np.zeros([Ns, Ns])
427 |
428 | # Compute the distance matrix between all the columns selected by the algorithm
429 | for i in xrange(0,Ns-1):
430 | for j in xrange(i+1,Ns):
431 | d[i,j] = np.linalg.norm(Ys[:,i] - Ys[:,j])
432 |
433 | d = d + d.T # define symmetric matrix
434 |
435 | dsorti = np.argsort(d,axis=0)[::-1]
436 | dsort = np.flipud(np.sort(d,axis=0))
437 |
438 | pind = np.arange(0,Ns)
439 | for i in xrange(0, Ns):
440 | if np.any(pind==i) == True:
441 | cum = 0
442 | t = -1
443 | while cum <= (thr * np.sum(dsort[:,i])):
444 | t += 1
445 | cum += dsort[t, i]
446 |
447 | pind = np.setdiff1d(pind, np.setdiff1d( dsorti[t:,i], np.arange(0,i+1), assume_unique=True), assume_unique=True)
448 |
449 | ind = sInd[pind]
450 |
451 | return ind
452 |
453 | def findRep(self,C, thr, norm):
454 |
455 | '''
456 | This function takes the coefficient matrix with few nonzero rows and computes the indices of the nonzero rows
457 | :param C: NxN coefficient matrix
458 | :param thr: threshold for selecting the nonzero rows of C, typically in [0.9,0.99]
459 | :param norm: value of norm used in the L1/Lq minimization program in {1,2,inf}
460 | :return: the representative indices, sorted by descending row norm of C (the larger the norm of
461 | a row, the more representative the corresponding feature is)
462 | '''
463 |
464 | N = C.shape[0]
465 |
466 | r = np.zeros([1,N])
467 |
468 | for i in xrange(0, N):
469 |
470 | r[:,i] = np.linalg.norm(C[i,:],norm)
471 |
472 | nrmInd = np.argsort(r)[0][::-1] #descending order
473 | nrm = r[0,nrmInd]
474 |
475 | # pick norm indices basing on the thresholding of the 'cumulative norm's sum'
476 | cssInd = nrmInd[np.cumsum(nrm)/np.sum(nrm) < thr]
477 |
478 | return cssInd, nrmInd
479 |
480 |
481 | def admm(self):
482 |
483 | '''Run (optional) normalization/PCA and the ADMM solver, then rank the features by the row norms of C.
484 | Returns (nrmInd, sInd, repInd, C): indices sorted by decreasing row norm, thresholded indices, pruned representative indices, and the coefficient matrix.'''
485 | # initializing penalty parameters
486 | self.reg_params = [self.alpha, self.alpha]
487 |
488 | thrS = 0.99
489 | thrP = 0.95
490 |
491 | #subtract mean from sample
492 | if self.normalize == True:
493 | self.data = self.data - np.matlib.repmat(np.mean(self.data, axis=1), self.num_columns,1).T
494 |
495 | self.repInd = []
496 | if (self.PCA == True):
497 | print ('\t\tPerforming PCA...')
498 | pca = PCA(n_components = self.npc)
499 | self.data = pca.fit_transform(self.data)
500 | 
501 | # refresh the matrix dimensions (they may change after PCA)
502 | self.num_rows, self.num_columns = self.data.shape
503 |
504 |
505 | self.C,_ = self.almLasso_mat_fun()
506 |
507 | self.sInd, self.nrmInd = self.findRep(self.C, thrS, self.norm_type)
508 |
509 | # custom procedure for removing redundant indices
510 | # self.repInd = self.rmRep(self.sInd, thrP)
511 | self.repInd = []
512 |
513 |
514 | return self.nrmInd, self.sInd, self.repInd, self.C
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/__init__.py
--------------------------------------------------------------------------------
/src/grid_search.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | from sklearn.svm import SVC
4 | from sklearn.model_selection import KFold
5 | from imblearn.over_sampling import SMOTE
6 | from sklearn.metrics import accuracy_score
7 | from sklearn.preprocessing import label_binarize
8 |
9 | import numpy as np
10 | np.set_printoptions(threshold=np.inf)
11 | import os
12 | import errno
13 | import random as rnd
14 | import itertools
15 | import sys
16 | sys.path.insert(0, './src')
17 |
18 | import Loader as lr
19 | import Dataset as ds
20 | import Classifier as clf
21 | import FeatureSelector as fs
21 |
22 |
23 |
24 | def tuning_analysis(fs, n_feats):
25 |
26 | min_var = 99999999
27 | min_hyp_par = {}
28 |
29 | for curr_fs_name,curr_fs in fs.iteritems():
30 |
31 | voting_matrix = {}
32 | _res_voting = {}
33 |
34 | combs = curr_fs.keys()
35 | combs.sort()
36 |
37 | for comb in combs:
38 | voting_matrix[comb] = np.zeros([1,n_feats])
39 | value = curr_fs[comb]
40 | # print ('hyper-params. comb. is %s'%comb)
41 | curr_var = np.var(value['ACC'])
42 | if curr_var < min_var:
43 | min_var = curr_var
44 | min_hyp_par = comb
45 |
46 | print 'Hyper-params. comb=%s has minimum variance of %s'%(min_hyp_par, min_var)
47 |
48 | combs = curr_fs.keys()
49 | combs.sort()
50 |
51 | # voting matrix dim: [num_comb, n_feats]
52 | # voting_matrix = np.zeros([len(combs), n_feats])
53 | print '\nApplying majority voting...'
54 | for j in xrange(0,n_feats):
55 | _competitors = {}
56 | for comb in combs:
57 | _competitors[comb] = curr_fs[comb]['ACC'][j]
58 |
59 | #getting the winner accuracy for all the combinations computed
60 | winners = [comb for m in [max(_competitors.values())] for comb, val in _competitors.iteritems() if val == m]
61 | for winner in winners:
62 | voting_matrix[winner][0][j] = 1
63 |
64 | #getting the parameter with largest voting
65 | for comb in combs:
66 | _res_voting[comb] = np.sum(voting_matrix[comb][0])
67 |
68 | _max = -9999999
69 | best_comb = {}
70 | BS = {}
71 | for comb in combs:
72 | if _res_voting[comb] > _max:
73 | _max = _res_voting[comb]
74 | best_comb = comb
75 | print ('Parameters set: '+ comb.__str__() +' got votes: ' + _res_voting[comb].__str__())
76 |
77 | BS[curr_fs_name] = best_comb
78 |
79 | print ('\nBest parameters set found on development set for: ' + curr_fs_name.__str__() + ' is: ' + best_comb.__str__())
80 |
81 | return BS
82 |
83 | def create_grid(params):
84 |
85 | comb = []
86 | for t in itertools.product(*params):
87 | comb.append(t)
88 |
89 | return comb
90 |
91 | def classificationDecisionRule(clf_score, cls, clf_name, target):
92 |
93 | n_classes = len(cls)
94 | DTS = {}
95 |
96 | for ccn in clf_name:
97 | hits = []
98 | res = []
99 | preds = []
100 |
101 | for i in xrange(0,n_classes):
102 | # print 'classifier e_' + str(cls[i])
103 |
104 | #ensemble scores on class 'C' for the testing set
105 | e_th = clf_score['C'+str(cls[i])]['accuracy'][ccn]
106 | res.append(e_th)
107 |
108 | hits.append((e_th == cls[i]).astype('int').flatten())
109 |
110 | # ensemble scores and hits for the testing set
111 | ensemble_res = np.vstack(res)
112 | ensemble_hits = np.vstack(hits)
113 |
114 | # Applying decision rules
115 | for i in xrange(0, ensemble_hits.shape[1]): # number of sample
116 | hits = ensemble_hits[:,i] #it has a 1 in a position whether the classifier e_i has predicted the class w_i for the i-th pattern
117 | ens_preds = ensemble_res[:,i] #it's simply the predictions of all the trained classifier for the i-th pattern
118 | cond = np.sum(hits) #count the number of true positive for the i-th pattern
119 |
120 | if cond == 1: #rule 1
121 | pred = cls[np.where(hits==1)[0].squeeze()] #retrieve the cls for the 'only' true positive
122 | preds.append(pred)
123 |
124 | elif cond == 0 or cond > 1: # rule 1-2 (tie)
125 |
126 | # we find the majority votes (frequency) among all classifier (e.g., ) [[4 2][5 1][6 2][7 2]]
127 | unique, counts = np.unique(ens_preds, return_counts=True)
128 | maj_rule = np.asarray((unique, counts)).T
129 |
130 | # we find the 'majority' index, then its class
131 | ind_max = np.argmax(maj_rule[:, 1])
132 | pred = maj_rule[ind_max, 0]
133 | max = maj_rule[ind_max, 1]
134 |
135 | # we look for a 'tie of the tie', then we look for the majority class among all the tied classes
136 | tied_cls = np.where(maj_rule[:, 1] == max)[0]
137 | if ( len(np.where(maj_rule[:, 1] == max)[0]) ) > 1: #tie of the tie
138 | pred = maj_rule[tied_cls,0]
139 |
140 | # pick one tied cls randomly
141 | pred = pred[rnd.randint(0,len(pred)-1)]
142 | preds.append(pred)
143 |
144 | else:
145 | preds.append(pred)
146 |
147 | #compute accuracy
148 | test_score = accuracy_score(target, preds)
149 |
150 | dic_test_score = {
151 | ccn: test_score
152 | }
153 |
154 | DTS.update(dic_test_score)
155 |
156 | return DTS
157 |
158 | def checkFolder(root, path_output):
159 |
160 | #folders to generate recursively
161 | path = root+'/'+path_output
162 |
163 | try:
164 | os.makedirs(path)
165 | except OSError as exc: # Python >2.5
166 | if exc.errno == errno.EEXIST and os.path.isdir(path):
167 | pass
168 | else:
169 | raise
170 |
171 |
172 |
173 | if __name__ == '__main__':
174 |
175 | ''' LOADING ANY DATASET '''
176 | dataset_dir = '/dataset'
177 | dataset_type = '/BIOLOGICAL'
178 | dataset_name = '/LUNG_DISCRETE'
179 |
180 | resample = True
181 |
182 | path_data_folder = dataset_dir + dataset_type + dataset_name
183 | path_data_file = path_data_folder + dataset_name
184 |
185 | variables = ['X', 'Y']
186 | # NB: if you get an error such as 'Please use HDF reader for matlab v7.3 files', please change the 'format' variable to 'matlab_v73'
187 | D = lr.Loader(file_path=path_data_file,
188 | format='matlab',
189 | variables=variables,
190 | name=dataset_name[1:]
191 | ).getVariables(variables=variables)
192 |
193 | dataset = ds.Dataset(D['X'], D['Y'])
194 |
195 | # check if the data are already standardized, if not standardize it
196 | dataset.standardizeDataset()
197 |
198 | n_classes = dataset.classes.shape[0]
199 | cls = np.unique(dataset.classes)
200 |
201 | num_min_cls = 9999999
202 | if resample == True:
203 |
204 | print ('Dataset before resampling %s w/ size: %s and number of classes: %s---> %s' % (
205 | dataset_name[1:], dataset.data.shape, n_classes, cls))
206 |
207 | # discriminating classes the whole dataset
208 | dataset_train = ds.Dataset(dataset.data, dataset.target)
209 | dataset_train.separateSampleClass()
210 | data, target = dataset_train.getSampleClass()
211 |
212 | for i in xrange(0, n_classes):
213 | print ('# sample for class C' + str(i + 1) + ':', data[i].shape)
214 | if data[i].shape[0] < num_min_cls:
215 | num_min_cls = data[i].shape[0]
216 |
217 | resample = '/BALANCED'
218 | print 'Re-sampling dataset...'
219 | dataset.data, dataset.target = SMOTE(kind='regular', k_neighbors=num_min_cls-1).fit_sample(dataset.data, dataset.target)
220 | else:
221 | resample = '/UNBALANCED'
222 |
223 | # shuffling data
224 | dataset.shufflingDataset()
225 |
226 | n_classes = dataset.classes.shape[0]
227 | cls = np.unique(dataset.classes)
228 |
229 | print ('Dataset %s w/ size: %s and number of classes: %s---> %s' %(dataset_name[1:], dataset.data.shape, n_classes, cls))
230 |
231 | # discriminating classes the whole dataset
232 | dataset_train = ds.Dataset(dataset.data, dataset.target)
233 | dataset_train.separateSampleClass()
234 | data, target = dataset_train.getSampleClass()
235 |
236 | for i in xrange(0, n_classes):
237 | print ('# sample for class C' + str(i + 1) + ':', data[i].shape)
238 |
239 |
240 | ################################### TUNING PARAMS ###################################
241 |
242 | FS = {}
243 |
244 | #CLASSIFIERS
245 | clf_name = [
246 | "SVM"
247 | # "Decision Tree",
248 | # "KNN"
249 | ]
250 | model = [
251 | SVC(kernel="linear")
252 | # DecisionTreeClassifier(max_depth=5),
253 | # KNeighborsClassifier(n_neighbors=1),
254 | ]
255 |
256 | max_num_feat = 300
257 | step = 1
258 |
259 | # initializing feature selector parameters
260 | params = {
261 |     # the smaller the alpha, the sparser the C matrix (fewer representatives)
262 | 'SMBA':
263 | {
264 | 'alpha': 5, #typically alpha in [2,50]
265 | 'norm_type': 1,
266 | 'max_iter': 3000,
267 | 'thr': [10**-8],
268 | 'type_indices': 'nrmInd',
269 | 'normalize': False,
270 | 'GPU': False,
271 | 'device': 0,
272 | 'PCA': False,
273 | 'verbose': False,
274 | 'step': 1,
275 | 'affine': False,
276 | },
277 | 'RFS':
278 | {
279 | 'gamma': 0
280 | },
281 | 'll_l21':
282 | {
283 | 'z': 0
284 | },
285 | 'ls_l21':
286 | {
287 | 'z': 0
288 | },
289 | 'Relief':
290 | {
291 | 'k': 0
292 | },
293 | 'MRMR':
294 | {
295 |
296 | 'num_feats': max_num_feat
297 | },
298 | 'MI':
299 | {
300 | 'n_neighbors': 0
301 | },
302 |     # the larger the alpha, the sparser the C matrix (fewer representatives)
303 | 'EN':
304 | {
305 | 'alpha': 1, # default value is 1
306 | },
307 |     # the larger the alpha, the sparser the C matrix (fewer representatives)
308 | 'LASSO':
309 | {
310 | 'alpha': 1 # default value is 1
311 | }
312 | }
313 |
314 |
315 | slb_fs = {
316 |
317 | 'LASSO': fs.FeatureSelector(name='LASSO', tp='SLB', params=params['LASSO']),
318 | 'EN': fs.FeatureSelector(name='EN', tp='SLB', params=params['EN']),
319 | 'SMBA': fs.FeatureSelector(name='SMBA', tp='SLB', params=params['SMBA']),
320 | 'RFS': fs.FeatureSelector(name='RFS', tp='SLB',params=params['RFS']),
321 | 'll_l21': fs.FeatureSelector(name='ll_l21', tp='SLB',params=params['ll_l21']), #injection not working
322 | 'ls_l21': fs.FeatureSelector(name='ls_l21', tp='SLB',params=params['ls_l21']),
323 |
324 | 'Relief': fs.FeatureSelector(name='Relief', tp='filter', params=params['Relief']),
325 | 'MRMR': fs.FeatureSelector(name='MRMR', tp='ITB', params=params['MRMR']),
326 | 'MI': fs.FeatureSelector(name='MI', tp='filter', params=params['MI'])
327 | }
328 |
329 | tuned_parameters = {
330 |
331 | 'LASSO': {'alpha': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]},
332 | 'EN': {'alpha': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]},
333 | 'SMBA': {'alpha': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]},
334 | 'RFS': {'gamma': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]},
335 | 'll_l21': {'z': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]},
336 | 'ls_l21': {'z': [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]},
337 | 'Relief': {'k': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
338 | 'MRMR': {'num_feats': [max_num_feat]},
339 | 'MI': {'n_neighbors': [1, 2, 3, 5, 7, 10]}
340 | }
341 |
342 |
343 | if max_num_feat > dataset.data.shape[1]:
344 | max_num_feat = dataset.data.shape[1]
345 |
346 | print ('\nMax number of features to use: ', max_num_feat)
347 |
348 | #setting the parameters for k-fold CV
349 | k_fold = 5
350 |
351 | X = dataset.data
352 | y = dataset.target
353 | kf = KFold(n_splits=k_fold)
354 |
355 | tuning_type = 'CSFS-PAPER'
356 |
357 | ################################### TFS ###################################
358 |
359 | if tuning_type == 'TFS':
360 |
361 | res_path_output = '/TFS/RESULTS/'
362 |
363 | # tuning process on all feature selector
364 | for fs_name, fs_model in slb_fs.iteritems():
365 |
366 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
367 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name
368 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name)
369 |
370 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...'
371 | FS.update({fs_name: {}})
372 | comb = []
373 | params_name = []
374 |
375 | for name, tun_par in tuned_parameters[fs_name].iteritems():
376 | comb.append(tun_par)
377 | params_name.append(name)
378 |
379 |             # create all parameter combinations
380 | combs = create_grid(comb)
381 |
382 | n_iter = 1
383 | #loop on each combination
384 | for comb in combs:
385 |
386 | FS[fs_name].update({comb: {}})
387 |             CV = np.zeros([k_fold, max_num_feat])
388 | avg_scores = []
389 | std_scores = []
390 |
391 | print ('\tComputing '+n_iter.__str__() +'-th combination...')
392 |
393 |             # set the i-th parameter combination for the current feature selector
394 | fs_model.setParams(comb,params_name,params[fs_name])
395 |
396 | cc_fold = 0
397 | for train_index, test_index in kf.split(X):
398 |
399 | kth_scores = []
400 |
401 | X_train, X_test = X[train_index, :], X[test_index, :]
402 | y_train, y_test = y[train_index], y[test_index]
403 |
404 | idx = fs_model.fit(X_train, y_train)
405 | # print idx
406 | # idx = list(range(1, 20))
407 |
408 |                 # classification step on the first max_num_feat features
409 | for n_rep in xrange(step, max_num_feat + step, step): # first n_rep indices
410 |
411 | X_train_fs = X_train[:, idx[0:n_rep]]
412 | X_test_fs = X_test[:, idx[0:n_rep]]
413 |
414 | _clf = clf.Classifier(names=clf_name, classifiers=model)
415 | DTS = _clf.train_and_classify(X_train_fs, y_train, X_test_fs, y_test)
416 |
417 | _score = DTS['SVM']
418 | kth_scores.append(_score) # it contains the max_num_feat scores for the k-th CV fold
419 |
420 | CV[cc_fold,:] = kth_scores
421 | cc_fold += 1
422 |
423 | avg_scores = np.mean(CV, axis=0)
424 | std_scores = np.std(CV, axis=0)
425 |
426 | FS[fs_name][comb]['ACC'] = avg_scores
427 | FS[fs_name][comb]['STD'] = std_scores
428 |
429 | n_iter +=1
430 |
431 | #tuning analysis
432 | print 'Applying tuning analysis...'
433 | num_feat = 10
434 | best_params = tuning_analysis(FS,num_feat)
435 |
436 | print 'Saving results...\n'
437 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample
438 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample)
439 | #
440 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle:
441 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
442 |
443 | elif tuning_type == 'CSFS':
444 |
445 | res_path_output = '/CSFS/RESULTS/'
446 |
447 | # tuning process on all feature selector
448 | for fs_name, fs_model in slb_fs.iteritems():
449 |
450 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
451 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name
452 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name)
453 |
454 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...'
455 | FS.update({fs_name: {}})
456 | comb = []
457 | params_name = []
458 |
459 | for name, tun_par in tuned_parameters[fs_name].iteritems():
460 | comb.append(tun_par)
461 | params_name.append(name)
462 |
463 |             # create all parameter combinations
464 | combs = create_grid(comb)
465 |
466 | n_iter = 1
467 | #loop on each combination
468 | for comb in combs:
469 |
470 | FS[fs_name].update({comb: {}})
471 |             CV = np.zeros([k_fold, max_num_feat])
472 | avg_scores = []
473 | std_scores = []
474 |
475 | print ('\tComputing '+n_iter.__str__() +'-th combination...')
476 |
477 |             # set the i-th parameter combination for the current feature selector
478 | fs_model.setParams(comb,params_name,params[fs_name])
479 |
480 | cc_fold = 0
481 | # k-fold CV
482 | for train_index, test_index in kf.split(X):
483 |
484 | kth_scores = []
485 |
486 | csfs_res = {}
487 | cls_res = {}
488 | k_computing_time = 0
489 |
490 | for i in xrange(0, n_classes):
491 | cls_res = {
492 |
493 | 'C' + str(cls[i]): {}
494 | }
495 | csfs_res.update(cls_res)
496 |
497 | X_train_kth, X_test_kth = X[train_index], X[test_index]
498 | y_train, y_test = y[train_index], y[test_index]
499 |
500 |                 ''' For the training data of each class we find the representative features and use them as the best
501 |                     feature subset (in representing the samples of that class) to perform classification
502 |                 '''
503 |
504 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
505 | # curr_res_fs_fold = path_data_folder + '/' + fs_path_output + '/' + fs_name + resample
506 | # checkFolder(path_data_folder, fs_path_output + '/' + fs_name + resample)
507 |
508 | # discriminating classes for the k-th fold of the training set
509 | data_train = lr.Dataset(X_train_kth, y_train)
510 | data_train.separateSampleClass()
511 | ktrain_data, ktrain_target = data_train.getSampleClass()
512 |
513 | for i in xrange(0, n_classes):
514 |
515 | idx = fs_model.fit(ktrain_data[i], ktrain_target[i])
516 |
517 | csfs_res['C' + str(cls[i])]['idx'] = idx
518 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name]
519 |
520 |                 # learning a classifier (ccn) for each subset of 'n_rep' features
521 | for j in xrange(0, max_num_feat):
522 | n_rep = j + 1 # first n_rep indices
523 |
524 | for i in xrange(0, n_classes):
525 | # get subset of feature from the i-th class
526 | idx = csfs_res['C' + str(cls[i])]['idx']
527 |
528 | X_train_fs = X_train_kth[:, idx[0:n_rep]]
529 |
530 | _clf = clf.Classifier(names=clf_name, classifiers=model)
531 | _clf.train(X_train_fs, y_train)
532 |
533 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test)
534 |
535 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test)
536 |
537 | _score = DTS['SVM']
538 | kth_scores.append(_score)
539 | # print kth_scores
540 |
541 | CV[cc_fold,:] = kth_scores
542 | cc_fold += 1
543 |
544 | avg_scores = np.mean(CV, axis=0)
545 | std_scores = np.std(CV, axis=0)
546 |
547 | # print avg_scores
548 |
549 | FS[fs_name][comb]['ACC'] = avg_scores
550 | FS[fs_name][comb]['STD'] = std_scores
551 |
552 | n_iter +=1
553 |
554 | #tuning analysis
555 | print 'Applying tuning analysis...'
556 | num_feat = 10
557 | best_params = tuning_analysis(FS,num_feat)
558 |
559 | print best_params
560 |
561 | # print 'Saving results...\n'
562 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample
563 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample)
564 | #
565 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle:
566 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
567 |
568 | elif tuning_type == 'CSFS-PAPER':
569 |
570 | res_path_output = '/CSFS_PAPER/RESULTS/'
571 |
572 | # tuning process on all feature selector
573 | for fs_name, fs_model in slb_fs.iteritems():
574 |
575 | # check whether the 'curr_res_fs_fold' directory exists, otherwise create it
576 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample + '/'+ fs_name
577 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample + '/'+ fs_name)
578 |
579 | print '\nTuning hyper-parameters on ' +fs_name.__str__()+ ' for accuracy by means of k-fold CV...'
580 | FS.update({fs_name: {}})
581 | comb = []
582 | params_name = []
583 |
584 | for name, tun_par in tuned_parameters[fs_name].iteritems():
585 | comb.append(tun_par)
586 | params_name.append(name)
587 |
588 |             # create all parameter combinations
589 | combs = create_grid(comb)
590 |
591 | n_iter = 1
592 | #loop on each combination
593 | for comb in combs:
594 |
595 | FS[fs_name].update({comb: {}})
596 |             CV = np.zeros([k_fold, max_num_feat])
597 | avg_scores = []
598 | std_scores = []
599 |
600 | print ('\tComputing '+n_iter.__str__() +'-th combination...')
601 |
602 |             # set the i-th parameter combination for the current feature selector
603 | fs_model.setParams(comb,params_name,params[fs_name])
604 |
605 | cc_fold = 0
606 | # k-fold CV
607 | for train_index, test_index in kf.split(X):
608 |
609 | kth_scores = []
610 |
611 | csfs_res = {}
612 | cls_res = {}
613 | k_computing_time = 0
614 |
615 | for i in xrange(0, n_classes):
616 | cls_res = {
617 |
618 | 'C' + str(cls[i]): {}
619 | }
620 | csfs_res.update(cls_res)
621 |
622 | X_train_kth, X_test_kth = X[train_index], X[test_index]
623 | y_train, y_test = y[train_index], y[test_index]
624 |
625 |                 # CLASS BINARIZATION (one-vs-rest label column for each class)
626 |                 lb = label_binarize(y_train, classes=cls)
627 |
628 | for i in xrange(0, n_classes):
629 | num_min_cls = 9999999
630 | k_neighbors = 5
631 |
632 |                     # separating the k-th training fold by the one-vs-rest label of the i-th class
633 |                     dataset_train = lr.Dataset(X_train_kth, lb[:, i])
634 | dataset_train.separateSampleClass()
635 | data, target = dataset_train.getSampleClass()
636 |
637 | for j in xrange(0, 2):
638 | if data[j].shape[0] < num_min_cls:
639 | num_min_cls = data[j].shape[0]
640 |
641 | if num_min_cls == 1:
642 | num_min_cls += 1
643 |
644 | # CLASS BALANCING
645 |                     data_cls, target_cls = SMOTE(kind='regular', k_neighbors=num_min_cls-1).fit_sample(X_train_kth, lb[:, i])
646 |
647 | # Performing feature selection on each class
648 |
649 | idx = fs_model.fit(data_cls, target_cls)
650 |
651 | csfs_res['C' + str(cls[i])]['idx'] = idx
652 | csfs_res['C' + str(cls[i])]['params'] = params[fs_name]
653 |
654 | # Classification
655 | ens_class = {}
656 |                 # learning a classifier (ccn) for each subset of 'n_rep' features
657 | for n_rep in xrange(step, max_num_feat + step, step): # first n_rep indices
658 |
659 | for i in xrange(0, n_classes):
660 | # get subset of feature from the i-th class
661 | idx = csfs_res['C' + str(cls[i])]['idx']
662 |
663 | X_train_fs = X_train_kth[:, idx[0:n_rep]]
664 |
665 | _clf = clf.Classifier(names=clf_name, classifiers=model)
666 | _clf.train(X_train_fs, y_train)
667 |
668 | csfs_res['C' + str(cls[i])]['accuracy'] = _clf.classify(X_test_kth[:, idx[0:n_rep]], y_test)
669 |
670 | DTS = classificationDecisionRule(csfs_res, cls, clf_name, y_test)
671 |
672 | _score = DTS['SVM']
673 | kth_scores.append(_score)
674 |
675 | CV[cc_fold, :] = kth_scores
676 | cc_fold += 1
677 |
678 | avg_scores = np.mean(CV, axis=0)
679 | std_scores = np.std(CV, axis=0)
680 |
681 | FS[fs_name][comb]['ACC'] = avg_scores
682 | FS[fs_name][comb]['STD'] = std_scores
683 |
684 | n_iter += 1
685 |
686 | #tuning analysis
687 | print 'Applying tuning analysis...'
688 | num_feat = 10
689 | best_params = tuning_analysis(FS,num_feat)
690 |
691 | print best_params
692 |
693 | # print 'Saving results...\n'
694 | # curr_res_output_fold = path_data_folder + '/' + res_path_output + '/TUNING/' + resample
695 | # checkFolder(path_data_folder, res_path_output + '/TUNING/' + resample)
696 | #
697 | # with open(curr_res_output_fold + '/' + 'best_params.pickle', 'wb') as handle:
698 | # pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
699 |
700 | else:
701 | print 'Wrong tuning type selected'
702 |
--------------------------------------------------------------------------------
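
The closing portion of `grid_search.py` above applies the ensemble decision rule: each class-specific classifier votes, the class with the most votes wins, and ties are resolved by drawing one of the tied classes at random. Below is a minimal, self-contained sketch of that voting scheme (illustrative only, not the repository's `classificationDecisionRule`; all names are made up):

```python
import random
import numpy as np

def majority_vote(per_class_predictions):
    """Toy majority-vote rule: every class-specific classifier casts one vote
    per test sample; ties between classes are broken uniformly at random."""
    final = []
    for votes in zip(*per_class_predictions):          # all classifiers' votes for one sample
        labels, counts = np.unique(votes, return_counts=True)
        tied = labels[counts == counts.max()]          # classes sharing the maximum vote count
        final.append(random.choice(tied))              # random tie-break, as in the code above
    return np.array(final)

# three class-specific classifiers, four test samples
ensemble_preds = [[0, 1, 2, 1],
                  [1, 2, 2, 1],
                  [2, 2, 0, 1]]
print(majority_vote(ensemble_preds))  # e.g. [0 2 2 1]; the first entry is a three-way tie broken at random
```
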
/src/skfeature/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/CIFE.py:
--------------------------------------------------------------------------------
1 | import LCSI
2 |
3 |
4 | def cife(X, y, **kwargs):
5 | """
6 | This function implements the CIFE feature selection
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data, guaranteed to be discrete
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 | kwargs: {dictionary}
15 | n_selected_features: {int}
16 | number of features to select
17 |
18 | Output
19 | ------
20 | F: {numpy array}, shape (n_features,)
21 | index of selected features, F[1] is the most important feature
22 |
23 | Reference
24 | ---------
25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
26 | """
27 |
28 | if 'n_selected_features' in kwargs.keys():
29 | n_selected_features = kwargs['n_selected_features']
30 | F = LCSI.lcsi(X, y, beta=1, gamma=1, n_selected_features=n_selected_features)
31 | else:
32 | F = LCSI.lcsi(X, y, beta=1, gamma=1)
33 | return F
34 |
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/CMIM.py:
--------------------------------------------------------------------------------
1 | from skfeature.utility.entropy_estimators import *
2 |
3 |
4 | def cmim(X, y, **kwargs):
5 | """
6 | This function implements the CMIM feature selection.
7 | The scoring criteria is calculated based on the formula j_cmim=I(f;y)-max_j(I(fj;f)-I(fj;f|y))
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | Input data, guaranteed to be a discrete numpy array
13 | y: {numpy array}, shape (n_samples,)
14 | guaranteed to be a numpy array
15 | kwargs: {dictionary}
16 | n_selected_features: {int}
17 | number of features to select
18 |
19 | Output
20 | ------
21 | F: {numpy array}, shape (n_features,)
22 | index of selected features, F(1) is the most important feature
23 |
24 | Reference
25 | ---------
26 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
27 | """
28 |
29 | n_samples, n_features = X.shape
30 | # index of selected features, initialized to be empty
31 | F = []
32 | # indicate whether the user specifies the number of features
33 | is_n_selected_features_specified = False
34 |
35 | if 'n_selected_features' in kwargs.keys():
36 | n_selected_features = kwargs['n_selected_features']
37 | is_n_selected_features_specified = True
38 |
39 | # t1 stores I(f;y) for each feature f
40 | t1 = np.zeros(n_features)
41 |
42 | # max stores max(I(fj;f)-I(fj;f|y)) for each feature f
43 |     # we assign an extremely small value to max[i] to make it smaller than any possible value of max(I(fj;f)-I(fj;f|y))
44 | max = -10000000*np.ones(n_features)
45 | for i in range(n_features):
46 | f = X[:, i]
47 | t1[i] = midd(f, y)
48 |
49 | # make sure that j_cmi is positive at the very beginning
50 | j_cmim = 1
51 |
52 | while True:
53 | if len(F) == 0:
54 | # select the feature whose mutual information is the largest
55 | idx = np.argmax(t1)
56 | F.append(idx)
57 | f_select = X[:, idx]
58 |
59 | if is_n_selected_features_specified is True:
60 | if len(F) == n_selected_features:
61 | break
62 | if is_n_selected_features_specified is not True:
63 | if j_cmim <= 0:
64 | break
65 |
66 |         # we assign an extremely small value to j_cmim to ensure it is smaller than all possible values of j_cmim
67 | j_cmim = -1000000000000
68 | for i in range(n_features):
69 | if i not in F:
70 | f = X[:, i]
71 | t2 = midd(f_select, f)
72 | t3 = cmidd(f_select, f, y)
73 | if t2-t3 > max[i]:
74 | max[i] = t2-t3
75 | # calculate j_cmim for feature i (not in F)
76 | t = t1[i] - max[i]
77 | # record the largest j_cmim and the corresponding feature index
78 | if t > j_cmim:
79 | j_cmim = t
80 | idx = i
81 | F.append(idx)
82 | f_select = X[:, idx]
83 |
84 | return np.array(F)
--------------------------------------------------------------------------------
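
Assuming a discretized data matrix, as the docstring requires, `cmim` can be called directly with the number of features to keep. A hypothetical toy run (import path taken from the file header above, executed under the Python 2 stack the rest of `src/` assumes):

```python
import numpy as np
from skfeature.function.information_theoretical_based.CMIM import cmim

# tiny discrete toy data: 6 samples, 4 features, binary labels
X = np.array([[1, 0, 2, 1],
              [0, 1, 2, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 2, 1],
              [0, 0, 0, 0]])
y = np.array([1, 0, 1, 0, 1, 0])

# greedy selection driven by j_cmim = I(f;y) - max_j(I(fj;f) - I(fj;f|y))
F = cmim(X, y, n_selected_features=2)
print(F)  # indices of the two selected features, most informative first
```
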
/src/skfeature/function/information_theoretical_based/DISR.py:
--------------------------------------------------------------------------------
1 | from skfeature.utility.entropy_estimators import *
2 | from skfeature.utility.mutual_information import conditional_entropy
3 |
4 |
5 | def disr(X, y, **kwargs):
6 | """
7 |     This function implements the DISR feature selection.
8 | The scoring criteria is calculated based on the formula j_disr=sum_j(I(f,fj;y)/H(f,fj,y))
9 |
10 | Input
11 | -----
12 | X: {numpy array}, shape (n_samples, n_features)
13 | input data, guaranteed to be a discrete data matrix
14 | y: {numpy array}, shape (n_samples,)
15 | input class labels
16 |
17 | kwargs: {dictionary}
18 | n_selected_features: {int}
19 | number of features to select
20 |
21 | Output
22 | ------
23 | F: {numpy array}, shape (n_features, )
24 | index of selected features, F[1] is the most important feature
25 |
26 | Reference
27 | ---------
28 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
29 | """
30 |
31 | n_samples, n_features = X.shape
32 | # index of selected features, initialized to be empty
33 | F = []
34 | # indicate whether the user specifies the number of features
35 | is_n_selected_features_specified = False
36 |
37 | if 'n_selected_features' in kwargs.keys():
38 | n_selected_features = kwargs['n_selected_features']
39 | is_n_selected_features_specified = True
40 |
41 | # sum stores sum_j(I(f,fj;y)/H(f,fj,y)) for each feature f
42 | sum = np.zeros(n_features)
43 |
44 | # make sure that j_cmi is positive at the very beginning
45 | j_disr = 1
46 |
47 | while True:
48 | if len(F) == 0:
49 | # t1 stores I(f;y) for each feature f
50 | t1 = np.zeros(n_features)
51 | for i in range(n_features):
52 | f = X[:, i]
53 | t1[i] = midd(f, y)
54 | # select the feature whose mutual information is the largest
55 | idx = np.argmax(t1)
56 | F.append(idx)
57 | f_select = X[:, idx]
58 |
59 | if is_n_selected_features_specified is True:
60 | if len(F) == n_selected_features:
61 | break
62 | if is_n_selected_features_specified is not True:
63 | if j_disr <= 0:
64 | break
65 |
66 |         # we assign an extremely small value to j_disr to ensure that it is smaller than all possible values of j_disr
67 | j_disr = -1000000000000
68 | for i in range(n_features):
69 | if i not in F:
70 | f = X[:, i]
71 | t1 = midd(f_select, y) + cmidd(f, y, f_select)
72 | t2 = entropyd(f) + conditional_entropy(f_select, f) + (conditional_entropy(y, f_select) - cmidd(y, f, f_select))
73 | sum[i] += np.true_divide(t1, t2)
74 | # record the largest j_disr and the corresponding feature index
75 | if sum[i] > j_disr:
76 | j_disr = sum[i]
77 | idx = i
78 | F.append(idx)
79 | f_select = X[:, idx]
80 |
81 | return np.array(F)
82 |
83 |
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/FCBF.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from skfeature.utility.mutual_information import su_calculation
3 |
4 |
5 | def fcbf(X, y, **kwargs):
6 | """
7 | This function implements Fast Correlation Based Filter algorithm
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data, guaranteed to be discrete
13 | y: {numpy array}, shape (n_samples,)
14 | input class labels
15 | kwargs: {dictionary}
16 | delta: {float}
17 | delta is a threshold parameter, the default value of delta is 0
18 |
19 | Output
20 | ------
21 | F: {numpy array}, shape (n_features,)
22 | index of selected features, F[1] is the most important feature
23 |
24 | Reference
25 | ---------
26 | Yu, Lei and Liu, Huan. "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution." ICML 2003.
27 | """
28 |
29 | n_samples, n_features = X.shape
30 | if 'delta' in kwargs.keys():
31 | delta = kwargs['delta']
32 | else:
33 | # the default value of delta is 0
34 | delta = 0
35 |
36 | # t1[:,0] stores index of features, t1[:,1] stores symmetrical uncertainty of features
37 | t1 = np.zeros((n_features, 2))
38 | for i in range(n_features):
39 | f = X[:, i]
40 | t1[i, 0] = i
41 | t1[i, 1] = su_calculation(f, y)
42 | s_list = t1[t1[:, 1] > delta, :]
43 | # index of selected features, initialized to be empty
44 | F = []
45 | while len(s_list) != 0:
46 | # select the largest su inside s_list
47 | idx = np.argmax(s_list[:, 1])
48 | # record the index of the feature with the largest su
49 | fp = X[:, s_list[idx, 0]]
50 | np.delete(s_list, idx, 0)
51 | F.append(s_list[idx, 0])
52 | for i in s_list[:, 0]:
53 | fi = X[:, i]
54 | if su_calculation(fp, fi) >= t1[i, 1]:
55 | # construct the mask for feature whose su is larger than su(fp,y)
56 | idx = s_list[:, 0] != i
57 | idx = np.array([idx, idx])
58 | idx = np.transpose(idx)
59 | # delete the feature by using the mask
60 | s_list = s_list[idx]
61 | length = len(s_list)/2
62 | s_list = s_list.reshape((length, 2))
63 | return np.array(F, dtype=int)
--------------------------------------------------------------------------------
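
A hedged usage sketch of `fcbf` with an explicit symmetrical-uncertainty threshold. Note that the function indexes data columns with values drawn from a float array (`s_list[idx, 0]`), so this sketch assumes the older NumPy that ships with the project's Python 2 environment:

```python
import numpy as np
from skfeature.function.information_theoretical_based.FCBF import fcbf

# discrete toy data; delta is the symmetrical-uncertainty threshold (default 0)
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 0]])
y = np.array([1, 0, 1, 0, 1, 0])

F = fcbf(X, y, delta=0)
print(F)  # indices of the predominant features kept after redundancy removal
```
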
/src/skfeature/function/information_theoretical_based/ICAP.py:
--------------------------------------------------------------------------------
1 | from skfeature.utility.entropy_estimators import *
2 |
3 |
4 | def icap(X, y, **kwargs):
5 | """
6 | This function implements the ICAP feature selection.
7 | The scoring criteria is calculated based on the formula j_icap = I(f;y) - max_j(0,(I(fj;f)-I(fj;f|y)))
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data, guaranteed to be a discrete data matrix
13 | y: {numpy array}, shape (n_samples,)
14 | input class labels
15 | kwargs: {dictionary}
16 | n_selected_features: {int}
17 | number of features to select
18 |
19 | Output
20 | ------
21 | F: {numpy array}, shape (n_features,)
22 | index of selected features, F(1) is the most important feature
23 | """
24 | n_samples, n_features = X.shape
25 | # index of selected features, initialized to be empty
26 | F = []
27 | # indicate whether the user specifies the number of features
28 | is_n_selected_features_specified = False
29 | if 'n_selected_features' in kwargs.keys():
30 | n_selected_features = kwargs['n_selected_features']
31 | is_n_selected_features_specified = True
32 |
33 | # t1 contains I(f;y) for each feature f
34 | t1 = np.zeros(n_features)
35 | # max contains max_j(0,(I(fj;f)-I(fj;f|y))) for each feature f
36 | max = np.zeros(n_features)
37 | for i in range(n_features):
38 | f = X[:, i]
39 | t1[i] = midd(f, y)
40 |
41 | # make sure that j_cmi is positive at the very beginning
42 | j_icap = 1
43 |
44 | while True:
45 | if len(F) == 0:
46 | # select the feature whose mutual information is the largest
47 | idx = np.argmax(t1)
48 | F.append(idx)
49 | f_select = X[:, idx]
50 |
51 | if is_n_selected_features_specified is True:
52 | if len(F) == n_selected_features:
53 | break
54 | if is_n_selected_features_specified is not True:
55 | if j_icap <= 0:
56 | break
57 |
58 |         # we assign an extremely small value to j_icap to ensure it is smaller than all possible values of j_icap
59 | j_icap = -1000000000000
60 | for i in range(n_features):
61 | if i not in F:
62 | f = X[:, i]
63 | t2 = midd(f_select, f)
64 | t3 = cmidd(f_select, f, y)
65 | if t2-t3 > max[i]:
66 | max[i] = t2-t3
67 | # calculate j_icap for feature i (not in F)
68 | t = t1[i] - max[i]
69 | # record the largest j_icap and the corresponding feature index
70 | if t > j_icap:
71 | j_icap = t
72 | idx = i
73 | F.append(idx)
74 | f_select = X[:, idx]
75 |
76 | return np.array(F)
77 |
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/JMI.py:
--------------------------------------------------------------------------------
1 | import LCSI
2 |
3 |
4 | def jmi(X, y, **kwargs):
5 | """
6 | This function implements the JMI feature selection
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data, guaranteed to be discrete
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 | kwargs: {dictionary}
15 | n_selected_features: {int}
16 | number of features to select
17 |
18 | Output
19 | ------
20 | F: {numpy array}, shape (n_features,)
21 | index of selected features, F[1] is the most important feature
22 |
23 | Reference
24 | ---------
25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
26 | """
27 | if 'n_selected_features' in kwargs.keys():
28 | n_selected_features = kwargs['n_selected_features']
29 | F = LCSI.lcsi(X, y, function_name='JMI', n_selected_features=n_selected_features)
30 | else:
31 | F = LCSI.lcsi(X, y, function_name='JMI')
32 | return F
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/LCSI.py:
--------------------------------------------------------------------------------
1 | from skfeature.utility.entropy_estimators import *
2 |
3 |
4 | def lcsi(X, y, **kwargs):
5 | """
6 | This function implements the basic scoring criteria for linear combination of shannon information term.
7 | The scoring criteria is calculated based on the formula j_cmi=I(f;y)-beta*sum_j(I(fj;f))+gamma*sum(I(fj;f|y))
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data, guaranteed to be a discrete data matrix
13 | y: {numpy array}, shape (n_samples,)
14 | input class labels
15 | kwargs: {dictionary}
16 | Parameters for different feature selection algorithms.
17 | beta: {float}
18 | beta is the parameter in j_cmi=I(f;y)-beta*sum(I(fj;f))+gamma*sum(I(fj;f|y))
19 | gamma: {float}
20 | gamma is the parameter in j_cmi=I(f;y)-beta*sum(I(fj;f))+gamma*sum(I(fj;f|y))
21 | function_name: {string}
22 | name of the feature selection function
23 | n_selected_features: {int}
24 | number of features to select
25 |
26 | Output
27 | ------
28 | F: {numpy array}, shape: (n_features,)
29 | index of selected features, F[1] is the most important feature
30 |
31 | Reference
32 | ---------
33 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
34 | """
35 |
36 | n_samples, n_features = X.shape
37 | # index of selected features, initialized to be empty
38 | F = []
39 | # indicate whether the user specifies the number of features
40 | is_n_selected_features_specified = False
41 | # initialize the parameters
42 | if 'beta' in kwargs.keys():
43 | beta = kwargs['beta']
44 | if 'gamma' in kwargs.keys():
45 | gamma = kwargs['gamma']
46 | if 'n_selected_features' in kwargs.keys():
47 | n_selected_features = kwargs['n_selected_features']
48 | is_n_selected_features_specified = True
49 |
50 | # select the feature whose j_cmi is the largest
51 | # t1 stores I(f;y) for each feature f
52 | t1 = np.zeros(n_features)
53 |     # t2 stores sum_j(I(fj;f)) for each feature f
54 | t2 = np.zeros(n_features)
55 | # t3 stores sum_j(I(fj;f|y)) for each feature f
56 | t3 = np.zeros(n_features)
57 | for i in range(n_features):
58 | f = X[:, i]
59 | t1[i] = midd(f, y)
60 |
61 | # make sure that j_cmi is positive at the very beginning
62 | j_cmi = 1
63 |
64 | while True:
65 | if len(F) == 0:
66 | # select the feature whose mutual information is the largest
67 | idx = np.argmax(t1)
68 | F.append(idx)
69 | f_select = X[:, idx]
70 |
71 | if is_n_selected_features_specified is True:
72 | if len(F) == n_selected_features:
73 | break
74 | if is_n_selected_features_specified is not True:
75 | if j_cmi < 0:
76 | break
77 |
78 |         # we assign an extremely small value to j_cmi to ensure it is smaller than all possible values of j_cmi
79 | j_cmi = -1000000000000
80 | if 'function_name' in kwargs.keys():
81 | if kwargs['function_name'] == 'MRMR':
82 | beta = 1.0 / len(F)
83 | elif kwargs['function_name'] == 'JMI':
84 | beta = 1.0 / len(F)
85 | gamma = 1.0 / len(F)
86 | for i in range(n_features):
87 | if i not in F:
88 | f = X[:, i]
89 | t2[i] += midd(f_select, f)
90 | t3[i] += cmidd(f_select, f, y)
91 | # calculate j_cmi for feature i (not in F)
92 | t = t1[i] - beta*t2[i] + gamma*t3[i]
93 | # record the largest j_cmi and the corresponding feature index
94 | if t > j_cmi:
95 | j_cmi = t
96 | idx = i
97 | F.append(idx)
98 | f_select = X[:, idx]
99 |
100 | return np.array(F)
101 |
102 |
103 |
104 |
105 |
106 |
--------------------------------------------------------------------------------
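
`CIFE`, `JMI`, `MIFS`, `MIM` and `MRMR` in this folder are thin parameterizations of `lcsi`: each one either fixes `beta`/`gamma` or passes `function_name` so the inner loop adapts them. A short summary of the calls made by those files, plus one direct call reproducing MIM on toy data (illustrative only):

```python
import numpy as np
from skfeature.function.information_theoretical_based import LCSI

# J(f) = I(f;y) - beta*sum_j I(fj;f) + gamma*sum_j I(fj;f|y)
#   MIM  -> lcsi(X, y, beta=0,   gamma=0)                 # relevance only
#   MIFS -> lcsi(X, y, beta=0.5, gamma=0)                 # redundancy penalty, beta user-settable
#   CIFE -> lcsi(X, y, beta=1,   gamma=1)                 # redundancy and conditional redundancy
#   JMI  -> lcsi(X, y, function_name='JMI')               # beta = gamma = 1/|F| inside the loop
#   MRMR -> lcsi(X, y, gamma=0,  function_name='MRMR')    # beta = 1/|F| inside the loop

X = np.array([[1, 0, 2], [0, 1, 2], [1, 0, 1], [0, 1, 0], [1, 1, 2], [0, 0, 1]])
y = np.array([1, 0, 1, 0, 1, 0])

# beta=0, gamma=0 reduces the criterion to plain I(f;y) ranking, i.e. MIM
F = LCSI.lcsi(X, y, beta=0, gamma=0, n_selected_features=2)
print(F)
```
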
/src/skfeature/function/information_theoretical_based/MIFS.py:
--------------------------------------------------------------------------------
1 | import LCSI
2 |
3 |
4 | def mifs(X, y, **kwargs):
5 | """
6 | This function implements the MIFS feature selection
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data, guaranteed to be discrete
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 | kwargs: {dictionary}
15 | n_selected_features: {int}
16 | number of features to select
17 |
18 | Output
19 | ------
20 | F: {numpy array}, shape (n_features,)
21 | index of selected features, F[1] is the most important feature
22 |
23 | Reference
24 | ---------
25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
26 | """
27 |
28 | if 'beta' not in kwargs.keys():
29 | beta = 0.5
30 | else:
31 | beta = kwargs['beta']
32 | if 'n_selected_features' in kwargs.keys():
33 | n_selected_features = kwargs['n_selected_features']
34 | F = LCSI.lcsi(X, y, beta=beta, gamma=0, n_selected_features=n_selected_features)
35 | else:
36 | F = LCSI.lcsi(X, y, beta=beta, gamma=0)
37 | return F
38 |
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/MIM.py:
--------------------------------------------------------------------------------
1 | import LCSI
2 |
3 |
4 | def mim(X, y, **kwargs):
5 | """
6 | This function implements the MIM feature selection
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data, guaranteed to be discrete
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 | kwargs: {dictionary}
15 | n_selected_features: {int}
16 | number of features to select
17 |
18 | Output
19 | ------
20 | F: {numpy array}, shape (n_features, )
21 | index of selected features, F[1] is the most important feature
22 |
23 | Reference
24 | ---------
25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
26 | """
27 |
28 | if 'n_selected_features' in kwargs.keys():
29 | n_selected_features = kwargs['n_selected_features']
30 | F = LCSI.lcsi(X, y, beta=0, gamma=0, n_selected_features=n_selected_features)
31 | else:
32 | F = LCSI.lcsi(X, y, beta=0, gamma=0)
33 | return F
34 |
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/MRMR.py:
--------------------------------------------------------------------------------
1 | import LCSI
2 |
3 |
4 | def mrmr(X, y, **kwargs):
5 | """
6 | This function implements the MRMR feature selection
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data, guaranteed to be discrete
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 | kwargs: {dictionary}
15 | n_selected_features: {int}
16 | number of features to select
17 |
18 | Output
19 | ------
20 | F: {numpy array}, shape (n_features,)
21 | index of selected features, F[1] is the most important feature
22 |
23 | Reference
24 | ---------
25 | Brown, Gavin et al. "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection." JMLR 2012.
26 | """
27 | if 'n_selected_features' in kwargs.keys():
28 | n_selected_features = kwargs['n_selected_features']
29 | F = LCSI.lcsi(X, y, gamma=0, function_name='MRMR', n_selected_features=n_selected_features)
30 | else:
31 | F = LCSI.lcsi(X, y, gamma=0, function_name='MRMR')
32 | return F
--------------------------------------------------------------------------------
/src/skfeature/function/information_theoretical_based/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/information_theoretical_based/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/similarity_based/SPEC.py:
--------------------------------------------------------------------------------
1 | import numpy.matlib
2 | import numpy as np
3 | from scipy.sparse import *
4 | from sklearn.metrics.pairwise import rbf_kernel
5 | from numpy import linalg as LA
6 |
7 |
8 | def spec(X, **kwargs):
9 | """
10 | This function implements the SPEC feature selection
11 |
12 | Input
13 | -----
14 | X: {numpy array}, shape (n_samples, n_features)
15 | input data
16 | kwargs: {dictionary}
17 | style: {int}
18 | style == -1, the first feature ranking function, use all eigenvalues
19 | style == 0, the second feature ranking function, use all except the 1st eigenvalue
20 | style >= 2, the third feature ranking function, use the first k except 1st eigenvalue
21 | W: {sparse matrix}, shape (n_samples, n_samples}
22 | input affinity matrix
23 |
24 | Output
25 | ------
26 | w_fea: {numpy array}, shape (n_features,)
27 | SPEC feature score for each feature
28 |
29 | Reference
30 | ---------
31 | Zhao, Zheng and Liu, Huan. "Spectral Feature Selection for Supervised and Unsupervised Learning." ICML 2007.
32 | """
33 |
34 | if 'style' not in kwargs:
35 | kwargs['style'] = 0
36 | if 'W' not in kwargs:
37 | kwargs['W'] = rbf_kernel(X, gamma=1)
38 |
39 | style = kwargs['style']
40 | W = kwargs['W']
41 | if type(W) is numpy.ndarray:
42 | W = csc_matrix(W)
43 |
44 | n_samples, n_features = X.shape
45 |
46 | # build the degree matrix
47 | X_sum = np.array(W.sum(axis=1))
48 | D = np.zeros((n_samples, n_samples))
49 | for i in range(n_samples):
50 | D[i, i] = X_sum[i]
51 |
52 | # build the laplacian matrix
53 | L = D - W
54 | d1 = np.power(np.array(W.sum(axis=1)), -0.5)
55 | d1[np.isinf(d1)] = 0
56 | d2 = np.power(np.array(W.sum(axis=1)), 0.5)
57 | v = np.dot(np.diag(d2[:, 0]), np.ones(n_samples))
58 | v = v/LA.norm(v)
59 |
60 | # build the normalized laplacian matrix
61 | L_hat = (np.matlib.repmat(d1, 1, n_samples)) * np.array(L) * np.matlib.repmat(np.transpose(d1), n_samples, 1)
62 |
63 | # calculate and construct spectral information
64 | s, U = np.linalg.eigh(L_hat)
65 | s = np.flipud(s)
66 | U = np.fliplr(U)
67 |
68 | # begin to select features
69 | w_fea = np.ones(n_features)*1000
70 |
71 | for i in range(n_features):
72 | f = X[:, i]
73 | F_hat = np.dot(np.diag(d2[:, 0]), f)
74 | l = LA.norm(F_hat)
75 | if l < 100*np.spacing(1):
76 | w_fea[i] = 1000
77 | continue
78 | else:
79 | F_hat = F_hat/l
80 | a = np.array(np.dot(np.transpose(F_hat), U))
81 | a = np.multiply(a, a)
82 | a = np.transpose(a)
83 |
84 | # use f'Lf formulation
85 | if style == -1:
86 | w_fea[i] = np.sum(a * s)
87 | # using all eigenvalues except the 1st
88 | elif style == 0:
89 | a1 = a[0:n_samples-1]
90 | w_fea[i] = np.sum(a1 * s[0:n_samples-1])/(1-np.power(np.dot(np.transpose(F_hat), v), 2))
91 | # use first k except the 1st
92 | else:
93 | a1 = a[n_samples-style:n_samples-1]
94 | w_fea[i] = np.sum(a1 * (2-s[n_samples-style: n_samples-1]))
95 |
96 | if style != -1 and style != 0:
97 | w_fea[w_fea == 1000] = -1000
98 |
99 | return w_fea
100 |
101 |
102 | def feature_ranking(score, **kwargs):
103 | if 'style' not in kwargs:
104 | kwargs['style'] = 0
105 | style = kwargs['style']
106 |
107 | # if style = -1 or 0, ranking features in descending order, the higher the score, the more important the feature is
108 | if style == -1 or style == 0:
109 | idx = np.argsort(score, 0)
110 | return idx[::-1]
111 | # if style != -1 and 0, ranking features in ascending order, the lower the score, the more important the feature is
112 | elif style != -1 and style != 0:
113 | idx = np.argsort(score, 0)
114 | return idx
--------------------------------------------------------------------------------
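
A hedged usage sketch of `spec` and its companion `feature_ranking`. The `style` flag chosen here (0) keeps all eigenvalues except the first, and the affinity matrix is the same RBF kernel that the defaults above fall back to:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from skfeature.function.similarity_based import SPEC

np.random.seed(0)
X = np.random.rand(20, 5)

# style=0: use all eigenvalues except the first; W defaults to rbf_kernel(X, gamma=1) if omitted
score = SPEC.spec(X, style=0, W=rbf_kernel(X, gamma=1))
idx = SPEC.feature_ranking(score, style=0)  # descending order for style -1 / 0
print(idx)
```
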
/src/skfeature/function/similarity_based/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/similarity_based/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/similarity_based/fisher_score.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.sparse import *
3 | from skfeature.utility.construct_W import construct_W
4 |
5 |
6 | def fisher_score(X, y):
7 | """
8 | This function implements the fisher score feature selection, steps are as follows:
9 | 1. Construct the affinity matrix W in fisher score way
10 | 2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W
11 | 3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones)
12 | 4. Fisher score for the r-th feature is score = (fr_hat'*D*fr_hat)/(fr_hat'*L*fr_hat)-1
13 |
14 | Input
15 | -----
16 | X: {numpy array}, shape (n_samples, n_features)
17 | input data
18 | y: {numpy array}, shape (n_samples,)
19 | input class labels
20 |
21 | Output
22 | ------
23 | score: {numpy array}, shape (n_features,)
24 | fisher score for each feature
25 |
26 | Reference
27 | ---------
28 | He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005.
29 | Duda, Richard et al. "Pattern classification." John Wiley & Sons, 2012.
30 | """
31 |
32 | # Construct weight matrix W in a fisherScore way
33 | kwargs = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y}
34 | W = construct_W(X, **kwargs)
35 |
36 | # build the diagonal D matrix from affinity matrix W
37 | D = np.array(W.sum(axis=1))
38 | L = W
39 | tmp = np.dot(np.transpose(D), X)
40 | D = diags(np.transpose(D), [0])
41 | Xt = np.transpose(X)
42 | t1 = np.transpose(np.dot(Xt, D.todense()))
43 | t2 = np.transpose(np.dot(Xt, L.todense()))
44 | # compute the numerator of Lr
45 | D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum()
46 | # compute the denominator of Lr
47 | L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum()
48 | # avoid the denominator of Lr to be 0
49 | D_prime[D_prime < 1e-12] = 10000
50 | lap_score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :]
51 |
52 | # compute fisher score from laplacian score, where fisher_score = 1/lap_score - 1
53 | score = 1.0/lap_score - 1
54 | return np.transpose(score)
55 |
56 |
57 | def feature_ranking(score):
58 | """
59 | Rank features in descending order according to fisher score, the larger the fisher score, the more important the
60 | feature is
61 | """
62 | idx = np.argsort(score, 0)
63 | return idx[::-1]
--------------------------------------------------------------------------------
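
Fisher score is computed per feature and then ranked in descending order; a minimal usage sketch on a small random dataset (assumed, for illustration only):

```python
import numpy as np
from skfeature.function.similarity_based import fisher_score

np.random.seed(0)
X = np.random.rand(30, 6)
y = np.array([0] * 15 + [1] * 15)

score = fisher_score.fisher_score(X, y)    # one score per feature, larger = more discriminative
idx = fisher_score.feature_ranking(score)  # features sorted by decreasing fisher score
print(idx[:3])
```
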
/src/skfeature/function/similarity_based/lap_score.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.sparse import *
3 | from skfeature.utility.construct_W import construct_W
4 |
5 |
6 | def lap_score(X, **kwargs):
7 | """
8 | This function implements the laplacian score feature selection, steps are as follows:
9 | 1. Construct the affinity matrix W if it is not specified
10 | 2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W
11 | 3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones)
12 | 4. Laplacian score for the r-th feature is score = (fr_hat'*L*fr_hat)/(fr_hat'*D*fr_hat)
13 |
14 | Input
15 | -----
16 | X: {numpy array}, shape (n_samples, n_features)
17 | input data
18 | kwargs: {dictionary}
19 | W: {sparse matrix}, shape (n_samples, n_samples)
20 | input affinity matrix
21 |
22 | Output
23 | ------
24 | score: {numpy array}, shape (n_features,)
25 | laplacian score for each feature
26 |
27 | Reference
28 | ---------
29 | He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005.
30 | """
31 |
32 | # if 'W' is not specified, use the default W
33 | if 'W' not in kwargs.keys():
34 | W = construct_W(X)
35 |     else:
36 |         W = kwargs['W']
37 | # build the diagonal D matrix from affinity matrix W
38 | D = np.array(W.sum(axis=1))
39 | L = W
40 | tmp = np.dot(np.transpose(D), X)
41 | D = diags(np.transpose(D), [0])
42 | Xt = np.transpose(X)
43 | t1 = np.transpose(np.dot(Xt, D.todense()))
44 | t2 = np.transpose(np.dot(Xt, L.todense()))
45 | # compute the numerator of Lr
46 | D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum()
47 | # compute the denominator of Lr
48 | L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum()
49 | # avoid the denominator of Lr to be 0
50 | D_prime[D_prime < 1e-12] = 10000
51 |
52 | # compute laplacian score for all features
53 | score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :]
54 | return np.transpose(score)
55 |
56 |
57 | def feature_ranking(score):
58 | """
59 | Rank features in ascending order according to their laplacian scores, the smaller the laplacian score is, the more
60 | important the feature is
61 | """
62 | idx = np.argsort(score, 0)
63 | return idx
64 |
--------------------------------------------------------------------------------
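
Laplacian score is unsupervised and ranks features in ascending order; the affinity matrix is built with `construct_W`, here using the same heat-kernel kNN settings that the 'laplacian' style of `trace_ratio` further below uses (illustrative sketch):

```python
import numpy as np
from skfeature.utility.construct_W import construct_W
from skfeature.function.similarity_based import lap_score

np.random.seed(0)
X = np.random.rand(30, 6)

# heat-kernel kNN affinity matrix
W = construct_W(X, metric='euclidean', neighbor_mode='knn', weight_mode='heat_kernel', k=5, t=1)
score = lap_score.lap_score(X, W=W)     # smaller laplacian score = more important feature
idx = lap_score.feature_ranking(score)
print(idx[:3])
```
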
/src/skfeature/function/similarity_based/reliefF.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from random import randrange
3 | from sklearn.metrics.pairwise import pairwise_distances
4 |
5 |
6 | def reliefF(X, y, **kwargs):
7 | """
8 | This function implements the reliefF feature selection
9 |
10 | Input
11 | -----
12 | X: {numpy array}, shape (n_samples, n_features)
13 | input data
14 | y: {numpy array}, shape (n_samples,)
15 | input class labels
16 | kwargs: {dictionary}
17 | parameters of reliefF:
18 | k: {int}
19 | choices for the number of neighbors (default k = 5)
20 |
21 | Output
22 | ------
23 | score: {numpy array}, shape (n_features,)
24 | reliefF score for each feature
25 |
26 | Reference
27 | ---------
28 | Robnik-Sikonja, Marko et al. "Theoretical and empirical analysis of relieff and rrelieff." Machine Learning 2003.
29 | Zhao, Zheng et al. "On Similarity Preserving Feature Selection." TKDE 2013.
30 | """
31 |
32 | if "k" not in kwargs.keys():
33 | k = 5
34 | else:
35 | k = kwargs["k"]
36 | n_samples, n_features = X.shape
37 |
38 | # calculate pairwise distances between instances
39 | distance = pairwise_distances(X, metric='manhattan')
40 |
41 | score = np.zeros(n_features)
42 |
43 | # the number of sampled instances is equal to the number of total instances
44 | for iter in range(n_samples):
45 | idx = randrange(0, n_samples, 1)
46 | near_hit = []
47 | near_miss = dict()
48 |
49 | self_fea = X[idx, :]
50 | c = np.unique(y).tolist()
51 |
52 | stop_dict = dict()
53 | for label in c:
54 | stop_dict[label] = 0
55 | del c[c.index(y[idx])]
56 |
57 | p_dict = dict()
58 | p_label_idx = float(len(y[y == y[idx]]))/float(n_samples)
59 |
60 | for label in c:
61 | p_label_c = float(len(y[y == label]))/float(n_samples)
62 | p_dict[label] = p_label_c/(1-p_label_idx)
63 | near_miss[label] = []
64 |
65 | distance_sort = []
66 | distance[idx, idx] = np.max(distance[idx, :])
67 |
68 | for i in range(n_samples):
69 | distance_sort.append([distance[idx, i], int(i), y[i]])
70 | distance_sort.sort(key=lambda x: x[0])
71 |
72 | for i in range(n_samples):
73 | # find k nearest hit points
74 | if distance_sort[i][2] == y[idx]:
75 | if len(near_hit) < k:
76 | near_hit.append(distance_sort[i][1])
77 | elif len(near_hit) == k:
78 | stop_dict[y[idx]] = 1
79 | else:
80 | # find k nearest miss points for each label
81 | if len(near_miss[distance_sort[i][2]]) < k:
82 | near_miss[distance_sort[i][2]].append(distance_sort[i][1])
83 | else:
84 | if len(near_miss[distance_sort[i][2]]) == k:
85 | stop_dict[distance_sort[i][2]] = 1
86 | stop = True
87 | for (key, value) in stop_dict.items():
88 | if value != 1:
89 | stop = False
90 | if stop:
91 | break
92 |
93 | # update reliefF score
94 | near_hit_term = np.zeros(n_features)
95 | for ele in near_hit:
96 | near_hit_term = np.array(abs(self_fea-X[ele, :]))+np.array(near_hit_term)
97 |
98 | near_miss_term = dict()
99 | for (label, miss_list) in near_miss.items():
100 | near_miss_term[label] = np.zeros(n_features)
101 | for ele in miss_list:
102 | near_miss_term[label] = np.array(abs(self_fea-X[ele, :]))+np.array(near_miss_term[label])
103 | score += near_miss_term[label]/(k*p_dict[label])
104 | score -= near_hit_term/k
105 | return score
106 |
107 |
108 | def feature_ranking(score):
109 | """
110 | Rank features in descending order according to reliefF score, the higher the reliefF score, the more important the
111 | feature is
112 | """
113 | idx = np.argsort(score, 0)
114 | return idx[::-1]
115 |
116 |
117 |
118 |
119 |
120 |
121 |
--------------------------------------------------------------------------------
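
`reliefF` samples instances and compares each one against its k nearest hits and misses; a hedged toy call:

```python
import numpy as np
from skfeature.function.similarity_based import reliefF

np.random.seed(0)
X = np.random.rand(40, 8)
y = np.array([0, 1] * 20)

score = reliefF.reliefF(X, y, k=5)    # k nearest hits/misses per sampled instance (default k=5)
idx = reliefF.feature_ranking(score)  # higher reliefF score = more important feature
print(idx[:3])
```
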
/src/skfeature/function/similarity_based/trace_ratio.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from skfeature.utility.construct_W import construct_W
3 |
4 |
5 | def trace_ratio(X, y, n_selected_features, **kwargs):
6 | """
7 | This function implements the trace ratio criterion for feature selection
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data
13 | y: {numpy array}, shape (n_samples,)
14 | input class labels
15 | n_selected_features: {int}
16 | number of features to select
17 | kwargs: {dictionary}
18 | style: {string}
19 | style == 'fisher', build between-class matrix and within-class affinity matrix in a fisher score way
20 | style == 'laplacian', build between-class matrix and within-class affinity matrix in a laplacian score way
21 | verbose: {boolean}
22 |             True if the user wants to print out the objective function value in each iteration, False if not
23 |
24 | Output
25 | ------
26 | feature_idx: {numpy array}, shape (n_features,)
27 | the ranked (descending order) feature index based on subset-level score
28 | feature_score: {numpy array}, shape (n_features,)
29 | the feature-level score
30 | subset_score: {float}
31 | the subset-level score
32 |
33 | Reference
34 | ---------
35 | Feiping Nie et al. "Trace Ratio Criterion for Feature Selection." AAAI 2008.
36 | """
37 |
38 |     # if 'style' is not specified, use the fisher score way to build the two affinity matrices
39 | if 'style' not in kwargs.keys():
40 | kwargs['style'] = 'fisher'
41 | # get the way to build affinity matrix, 'fisher' or 'laplacian'
42 | style = kwargs['style']
43 | n_samples, n_features = X.shape
44 |
45 | # if 'verbose' is not specified, do not output the value of objective function
46 | if 'verbose' not in kwargs:
47 | kwargs['verbose'] = False
48 | verbose = kwargs['verbose']
49 |
50 |     if style == 'fisher':
51 | kwargs_within = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y}
52 | # build within class and between class laplacian matrix L_w and L_b
53 | W_within = construct_W(X, **kwargs_within)
54 | L_within = np.eye(n_samples) - W_within
55 | L_tmp = np.eye(n_samples) - np.ones([n_samples, n_samples])/n_samples
56 | L_between = L_within - L_tmp
57 |
58 |     if style == 'laplacian':
59 | kwargs_within = {"metric": "euclidean", "neighbor_mode": "knn", "weight_mode": "heat_kernel", "k": 5, 't': 1}
60 | # build within class and between class laplacian matrix L_w and L_b
61 | W_within = construct_W(X, **kwargs_within)
62 | D_within = np.diag(np.array(W_within.sum(1))[:, 0])
63 | L_within = D_within - W_within
64 | W_between = np.dot(np.dot(D_within, np.ones([n_samples, n_samples])), D_within)/np.sum(D_within)
65 | D_between = np.diag(np.array(W_between.sum(1)))
66 | L_between = D_between - W_between
67 |
68 | # build X'*L_within*X and X'*L_between*X
69 | L_within = (np.transpose(L_within) + L_within)/2
70 | L_between = (np.transpose(L_between) + L_between)/2
71 | S_within = np.array(np.dot(np.dot(np.transpose(X), L_within), X))
72 | S_between = np.array(np.dot(np.dot(np.transpose(X), L_between), X))
73 |
74 | # reflect the within-class or local affinity relationship encoded on graph, Sw = X*Lw*X'
75 | S_within = (np.transpose(S_within) + S_within)/2
76 | # reflect the between-class or global affinity relationship encoded on graph, Sb = X*Lb*X'
77 | S_between = (np.transpose(S_between) + S_between)/2
78 |
79 | # take the absolute values of diagonal
80 | s_within = np.absolute(S_within.diagonal())
81 | s_between = np.absolute(S_between.diagonal())
82 |     s_between[s_between == 0] = 1e-14    # this number is from the authors' code
83 |
84 | # preprocessing
85 | fs_idx = np.argsort(np.divide(s_between, s_within), 0)[::-1]
86 | k = np.sum(s_between[0:n_selected_features])/np.sum(s_within[0:n_selected_features])
87 | s_within = s_within[fs_idx[0:n_selected_features]]
88 | s_between = s_between[fs_idx[0:n_selected_features]]
89 |
90 |     # iterate until convergence
91 | count = 0
92 | while True:
93 | score = np.sort(s_between-k*s_within)[::-1]
94 | I = np.argsort(s_between-k*s_within)[::-1]
95 | idx = I[0:n_selected_features]
96 | old_k = k
97 | k = np.sum(s_between[idx])/np.sum(s_within[idx])
98 | if verbose:
99 | print 'obj at iter ' + str(count+1) + ': ' + str(k)
100 | count += 1
101 | if abs(k - old_k) < 1e-3:
102 | break
103 |
104 | # get feature index, feature-level score and subset-level score
105 | feature_idx = fs_idx[I]
106 | feature_score = score
107 | subset_score = k
108 |
109 | return feature_idx, feature_score, subset_score
110 |
111 |
112 |
113 |
--------------------------------------------------------------------------------
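
`trace_ratio` returns the ranked feature indices together with feature-level and subset-level scores. A sketch using the fisher-style affinity matrices (the module itself contains Python 2 print statements, so it is assumed to run in that environment):

```python
import numpy as np
from skfeature.function.similarity_based import trace_ratio

np.random.seed(0)
X = np.random.rand(30, 10)
y = np.array([0] * 15 + [1] * 15)

# iterate on the ratio of between-class to within-class scatter over the candidate subset until it converges
feat_idx, feat_score, subset_score = trace_ratio.trace_ratio(X, y, 5, style='fisher', verbose=False)
print(feat_idx[:5])   # indices of the top-ranked features
```
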
/src/skfeature/function/sparse_learning_based/MCFS.py:
--------------------------------------------------------------------------------
1 | import scipy
2 | import numpy as np
3 | from sklearn import linear_model
4 | from skfeature.utility.construct_W import construct_W
5 |
6 |
7 | def mcfs(X, n_selected_features, **kwargs):
8 | """
9 | This function implements unsupervised feature selection for multi-cluster data.
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | n_selected_features: {int}
16 | number of features to select
17 | kwargs: {dictionary}
18 | W: {sparse matrix}, shape (n_samples, n_samples)
19 | affinity matrix
20 | n_clusters: {int}
21 | number of clusters (default is 5)
22 |
23 | Output
24 | ------
25 | W: {numpy array}, shape(n_features, n_clusters)
26 | feature weight matrix
27 |
28 | Reference
29 | ---------
30 | Cai, Deng et al. "Unsupervised Feature Selection for Multi-Cluster Data." KDD 2010.
31 | """
32 |
33 | # use the default affinity matrix
34 | if 'W' not in kwargs:
35 | W = construct_W(X)
36 | else:
37 | W = kwargs['W']
38 | # default number of clusters is 5
39 | if 'n_clusters' not in kwargs:
40 | n_clusters = 5
41 | else:
42 | n_clusters = kwargs['n_clusters']
43 |
44 | # solve the generalized eigen-decomposition problem and get the top K
45 | # eigen-vectors with respect to the smallest eigenvalues
46 | W = W.toarray()
47 | W = (W + W.T) / 2
48 | W_norm = np.diag(np.sqrt(1 / W.sum(1)))
49 | W = np.dot(W_norm, np.dot(W, W_norm))
50 | WT = W.T
51 | W[W < WT] = WT[W < WT]
52 | eigen_value, ul = scipy.linalg.eigh(a=W)
53 | Y = np.dot(W_norm, ul[:, -1*n_clusters-1:-1])
54 |
55 | # solve K L1-regularized regression problem using LARs algorithm with cardinality constraint being d
56 | n_sample, n_feature = X.shape
57 | W = np.zeros((n_feature, n_clusters))
58 | for i in range(n_clusters):
59 | clf = linear_model.Lars(n_nonzero_coefs=n_selected_features)
60 | clf.fit(X, Y[:, i])
61 | W[:, i] = clf.coef_
62 | return W
63 |
64 |
65 | def feature_ranking(W):
66 | """
67 | This function computes MCFS score and ranking features according to feature weights matrix W
68 | """
69 | mcfs_score = W.max(1)
70 | idx = np.argsort(mcfs_score, 0)
71 | idx = idx[::-1]
72 | return idx
--------------------------------------------------------------------------------
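
`mcfs` solves a sparse regression problem against the spectral embedding and returns a feature-weight matrix, which `feature_ranking` collapses into an ordering. A hedged toy call assuming 3 clusters:

```python
import numpy as np
from skfeature.function.sparse_learning_based import MCFS

np.random.seed(0)
X = np.random.rand(50, 20)

# W defaults to construct_W(X) when omitted; n_clusters defaults to 5
W = MCFS.mcfs(X, n_selected_features=5, n_clusters=3)
idx = MCFS.feature_ranking(W)   # rank features by their maximum weight across clusters
print(idx[:5])
```
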
/src/skfeature/function/sparse_learning_based/NDFS.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import sys
3 | import math
4 | import sklearn.cluster
5 | from skfeature.utility.construct_W import construct_W
6 |
7 |
8 | def ndfs(X, **kwargs):
9 | """
10 |     This function implements unsupervised feature selection using nonnegative spectral analysis, i.e.,
11 | min_{F,W} Tr(F^T L F) + alpha*(||XW-F||_F^2 + beta*||W||_{2,1}) + gamma/2 * ||F^T F - I||_F^2
12 | s.t. F >= 0
13 |
14 | Input
15 | -----
16 | X: {numpy array}, shape (n_samples, n_features)
17 | input data
18 | kwargs: {dictionary}
19 | W: {sparse matrix}, shape {n_samples, n_samples}
20 | affinity matrix
21 | alpha: {float}
22 | Parameter alpha in objective function
23 | beta: {float}
24 | Parameter beta in objective function
25 | gamma: {float}
26 | a very large number used to force F^T F = I
27 | F0: {numpy array}, shape (n_samples, n_clusters)
28 |             initialization of the pseudo label matrix F; if not provided, it is initialized via k-means
29 | n_clusters: {int}
30 | number of clusters
31 | verbose: {boolean}
32 |             True if the user wants to print out the objective function value in each iteration, false if not
33 |
34 | Output
35 | ------
36 | W: {numpy array}, shape(n_features, n_clusters)
37 | feature weight matrix
38 |
39 | Reference:
40 | Li, Zechao, et al. "Unsupervised Feature Selection Using Nonnegative Spectral Analysis." AAAI. 2012.
41 | """
42 |
43 | # default gamma is 10e8
44 | if 'gamma' not in kwargs:
45 | gamma = 10e8
46 | else:
47 | gamma = kwargs['gamma']
48 | # use the default affinity matrix
49 | if 'W' not in kwargs:
50 | W = construct_W(X)
51 | else:
52 | W = kwargs['W']
53 | if 'alpha' not in kwargs:
54 | alpha = 1
55 | else:
56 | alpha = kwargs['alpha']
57 | if 'beta' not in kwargs:
58 | beta = 1
59 | else:
60 | beta = kwargs['beta']
61 | if 'F0' not in kwargs:
62 | if 'n_clusters' not in kwargs:
63 | print >>sys.stderr, "either F0 or n_clusters should be provided"
64 | else:
65 | # initialize F
66 | n_clusters = kwargs['n_clusters']
67 | F = kmeans_initialization(X, n_clusters)
68 | else:
69 | F = kwargs['F0']
70 | if 'verbose' not in kwargs:
71 | verbose = False
72 | else:
73 | verbose = kwargs['verbose']
74 |
75 | n_samples, n_features = X.shape
76 |
77 | # initialize D as identity matrix
78 | D = np.identity(n_features)
79 | I = np.identity(n_samples)
80 |
81 | # build laplacian matrix
82 | L = np.array(W.sum(1))[:, 0] - W
83 |
84 | max_iter = 1000
85 | obj = np.zeros(max_iter)
86 | for iter_step in range(max_iter):
87 | # update W
88 | T = np.linalg.inv(np.dot(X.transpose(), X) + beta * D + 1e-6*np.eye(n_features))
89 | W = np.dot(np.dot(T, X.transpose()), F)
90 | # update D
91 | temp = np.sqrt((W*W).sum(1))
92 | temp[temp < 1e-16] = 1e-16
93 | temp = 0.5 / temp
94 | D = np.diag(temp)
95 | # update M
96 | M = L + alpha * (I - np.dot(np.dot(X, T), X.transpose()))
97 | M = (M + M.transpose())/2
98 | # update F
99 | denominator = np.dot(M, F) + gamma*np.dot(np.dot(F, F.transpose()), F)
100 | temp = np.divide(gamma*F, denominator)
101 | F = F*np.array(temp)
102 | temp = np.diag(np.sqrt(np.diag(1 / (np.dot(F.transpose(), F) + 1e-16))))
103 | F = np.dot(F, temp)
104 |
105 | # calculate objective function
106 | obj[iter_step] = np.trace(np.dot(np.dot(F.transpose(), M), F)) + gamma/4*np.linalg.norm(np.dot(F.transpose(), F)-np.identity(n_clusters), 'fro')
107 | if verbose:
108 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
109 |
110 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
111 | break
112 | return W
113 |
114 |
115 | def kmeans_initialization(X, n_clusters):
116 | """
117 | This function uses kmeans to initialize the pseudo label
118 |
119 | Input
120 | -----
121 | X: {numpy array}, shape (n_samples, n_features)
122 | input data
123 | n_clusters: {int}
124 | number of clusters
125 |
126 | Output
127 | ------
128 | Y: {numpy array}, shape (n_samples, n_clusters)
129 | pseudo label matrix
130 | """
131 |
132 | n_samples, n_features = X.shape
133 | kmeans = sklearn.cluster.KMeans(n_clusters=n_clusters, init='k-means++', n_init=10, max_iter=300,
134 | tol=0.0001, precompute_distances=True, verbose=0,
135 | random_state=None, copy_x=True, n_jobs=1)
136 | kmeans.fit(X)
137 | labels = kmeans.labels_
138 | Y = np.zeros((n_samples, n_clusters))
139 | for row in range(0, n_samples):
140 | Y[row, labels[row]] = 1
141 | T = np.dot(Y.transpose(), Y)
142 | F = np.dot(Y, np.sqrt(np.linalg.inv(T)))
143 | F = F + 0.02*np.ones((n_samples, n_clusters))
144 | return F
145 |
146 |
147 | def calculate_obj(X, W, F, L, alpha, beta):
148 | """
149 | This function calculates the objective function of NDFS
150 | """
151 | # Tr(F^T L F)
152 | T1 = np.trace(np.dot(np.dot(F.transpose(), L), F))
153 | T2 = np.linalg.norm(np.dot(X, W) - F, 'fro')
154 | T3 = (np.sqrt((W*W).sum(1))).sum()
155 | obj = T1 + alpha*(T2 + beta*T3)
156 | return obj
--------------------------------------------------------------------------------
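A minimal usage sketch for `NDFS.py` above (illustrative only; the heat-kernel affinity settings and the row-norm ranking are assumptions, not fixed by the module):

```python
import numpy as np
from skfeature.function.sparse_learning_based import NDFS
from skfeature.utility.construct_W import construct_W

X = np.random.rand(100, 50)

# heat-kernel affinity matrix (an assumed, commonly used setting)
W_aff = construct_W(X, metric='euclidean', neighbor_mode='knn',
                    weight_mode='heat_kernel', k=5, t=1)

# feature weight matrix, shape (n_features, n_clusters)
W = NDFS.ndfs(X, W=W_aff, n_clusters=5)

# rank features by the l2-norm of their weight rows (largest first)
score = np.sqrt((W * W).sum(axis=1))
idx = np.argsort(score)[::-1]
print(idx[0:10])
```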
/src/skfeature/function/sparse_learning_based/RFS.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | from numpy import linalg as LA
4 | from skfeature.utility.sparse_learning import generate_diagonal_matrix
5 | from skfeature.utility.sparse_learning import calculate_l21_norm
6 |
7 |
8 | def rfs(X, Y, **kwargs):
9 | """
10 |     This function implements efficient and robust feature selection via joint l2,1-norms minimization, i.e.,
11 |     min_W ||X W - Y||_{2,1} + gamma*||W||_{2,1}
12 |
13 | Input
14 | -----
15 | X: {numpy array}, shape (n_samples, n_features)
16 | input data
17 | Y: {numpy array}, shape (n_samples, n_classes)
18 | input class label matrix, each row is a one-hot-coding class label
19 | kwargs: {dictionary}
20 | gamma: {float}
21 | parameter in RFS
22 |         verbose: {boolean}
23 |             True if the user wants to display the objective function value, false if not
24 |
25 | Output
26 | ------
27 |     W: {numpy array}, shape (n_features, n_classes)
28 | feature weight matrix
29 |
30 | Reference
31 | ---------
32 | Nie, Feiping et al. "Efficient and Robust Feature Selection via Joint l2,1-Norms Minimization" NIPS 2010.
33 | """
34 |
35 | # default gamma is 1
36 | if 'gamma' not in kwargs:
37 | gamma = 1
38 | else:
39 | gamma = kwargs['gamma']
40 | if 'verbose' not in kwargs:
41 | verbose = False
42 | else:
43 | verbose = kwargs['verbose']
44 |
45 | n_samples, n_features = X.shape
46 | A = np.zeros((n_samples, n_samples + n_features))
47 | A[:, 0:n_features] = X
48 | A[:, n_features:n_features+n_samples] = gamma*np.eye(n_samples)
49 | D = np.eye(n_features+n_samples)
50 |
51 | max_iter = 1000
52 | obj = np.zeros(max_iter)
53 | for iter_step in range(max_iter):
54 | # update U as U = D^{-1} A^T (A D^-1 A^T)^-1 Y
55 | D_inv = LA.inv(D)
56 | temp = LA.inv(np.dot(np.dot(A, D_inv), A.T) + 1e-6*np.eye(n_samples)) # (A D^-1 A^T)^-1
57 | U = np.dot(np.dot(np.dot(D_inv, A.T), temp), Y)
58 | # update D as D_ii = 1 / 2 / ||U(i,:)||
59 | D = generate_diagonal_matrix(U)
60 |
61 | obj[iter_step] = calculate_obj(X, Y, U[0:n_features, :], gamma)
62 |
63 | if verbose:
64 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
65 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
66 | break
67 |
68 | # the first d rows of U are the feature weights
69 | W = U[0:n_features, :]
70 | return W
71 |
72 |
73 | def calculate_obj(X, Y, W, gamma):
74 | """
75 | This function calculates the objective function of rfs
76 | """
77 | temp = np.dot(X, W) - Y
78 | return calculate_l21_norm(temp) + gamma*calculate_l21_norm(W)
--------------------------------------------------------------------------------
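A minimal usage sketch for `RFS.py` above (illustrative; the hand-built one-hot label matrix and the row-norm ranking are assumptions):

```python
import numpy as np
from skfeature.function.sparse_learning_based import RFS

# synthetic 3-class problem
X = np.random.rand(90, 40)
y = np.repeat([0, 1, 2], 30)

# one-hot label matrix Y, shape (n_samples, n_classes)
Y = np.zeros((90, 3))
Y[np.arange(90), y] = 1

W = RFS.rfs(X, Y, gamma=0.1)

# rank features by the l2-norm of the corresponding rows of W (largest first)
idx = np.argsort(np.sqrt((W * W).sum(axis=1)))[::-1]
print(idx[0:10])
```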
/src/skfeature/function/sparse_learning_based/UDFS.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.linalg
3 | import math
4 | from skfeature.utility.sparse_learning import generate_diagonal_matrix, calculate_l21_norm
5 | from sklearn.metrics.pairwise import pairwise_distances
6 |
7 |
8 | def udfs(X, **kwargs):
9 | """
10 | This function implements l2,1-norm regularized discriminative feature
11 | selection for unsupervised learning, i.e., min_W Tr(W^T M W) + gamma ||W||_{2,1}, s.t. W^T W = I
12 |
13 | Input
14 | -----
15 | X: {numpy array}, shape (n_samples, n_features)
16 | input data
17 | kwargs: {dictionary}
18 | gamma: {float}
19 | parameter in the objective function of UDFS (default is 1)
20 | n_clusters: {int}
21 | Number of clusters
22 | k: {int}
23 | number of nearest neighbor
24 | verbose: {boolean}
25 |             True if the user wants to display the objective function value, false if not
26 |
27 | Output
28 | ------
29 | W: {numpy array}, shape(n_features, n_clusters)
30 | feature weight matrix
31 |
32 | Reference
33 | Yang, Yi et al. "l2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning." AAAI 2012.
34 | """
35 |
36 | # default gamma is 0.1
37 | if 'gamma' not in kwargs:
38 | gamma = 0.1
39 | else:
40 | gamma = kwargs['gamma']
41 | # default k is set to be 5
42 | if 'k' not in kwargs:
43 | k = 5
44 | else:
45 | k = kwargs['k']
46 | if 'n_clusters' not in kwargs:
47 | n_clusters = 5
48 | else:
49 | n_clusters = kwargs['n_clusters']
50 | if 'verbose' not in kwargs:
51 | verbose = False
52 | else:
53 | verbose = kwargs['verbose']
54 |
55 | # construct M
56 | n_sample, n_feature = X.shape
57 | M = construct_M(X, k, gamma)
58 |
59 | D = np.eye(n_feature)
60 | max_iter = 1000
61 | obj = np.zeros(max_iter)
62 | for iter_step in range(max_iter):
63 | # update W as the eigenvectors of P corresponding to the first n_clusters
64 | # smallest eigenvalues
65 | P = M + gamma*D
66 | eigen_value, eigen_vector = scipy.linalg.eigh(a=P)
67 | W = eigen_vector[:, 0:n_clusters]
68 | # update D as D_ii = 1 / 2 / ||W(i,:)||
69 | D = generate_diagonal_matrix(W)
70 |
71 | obj[iter_step] = calculate_obj(X, W, M, gamma)
72 | if verbose:
73 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
74 |
75 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
76 | break
77 | return W
78 |
79 |
80 | def construct_M(X, k, gamma):
81 | """
82 | This function constructs the M matrix described in the paper
83 | """
84 | n_sample, n_feature = X.shape
85 | Xt = X.T
86 | D = pairwise_distances(X)
87 | # sort the distance matrix D in ascending order
88 | idx = np.argsort(D, axis=1)
89 | # choose the k-nearest neighbors for each instance
90 | idx_new = idx[:, 0:k+1]
91 |     H = np.eye(k+1) - 1.0/(k+1) * np.ones((k+1, k+1))
92 |     I = np.eye(k+1)
93 |     Mi = np.zeros((n_sample, n_sample))
94 |     for i in range(n_sample):
95 |         Xi = Xt[:, idx_new[i, :]]
96 |         Xi_tilde = np.dot(Xi, H)
97 |         Bi = np.linalg.inv(np.dot(Xi_tilde.T, Xi_tilde) + gamma*I)
98 |         Si = np.zeros((n_sample, k+1))
99 |         for q in range(k+1):
100 |             Si[idx_new[i, q], q] = 1
101 |         Mi = Mi + np.dot(np.dot(Si, np.dot(np.dot(H, Bi), H)), Si.T)
102 | M = np.dot(np.dot(X.T, Mi), X)
103 | return M
104 |
105 |
106 | def calculate_obj(X, W, M, gamma):
107 | """
108 |     This function calculates the objective function of UDFS described in the paper
109 | """
110 | return np.trace(np.dot(np.dot(W.T, M), W)) + gamma*calculate_l21_norm(W)
--------------------------------------------------------------------------------
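A minimal usage sketch for `UDFS.py` above (unsupervised; data and parameters are illustrative, and the row-norm ranking is an assumption):

```python
import numpy as np
from skfeature.function.sparse_learning_based import UDFS

X = np.random.rand(100, 30)

# feature weight matrix, shape (n_features, n_clusters)
W = UDFS.udfs(X, gamma=0.1, k=5, n_clusters=5)

# rank features by the l2-norm of their weight rows (largest first)
idx = np.argsort(np.sqrt((W * W).sum(axis=1)))[::-1]
print(idx[0:10])
```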
/src/skfeature/function/sparse_learning_based/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/sparse_learning_based/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/sparse_learning_based/ll_l21.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | from numpy import linalg as LA
4 | from skfeature.utility.sparse_learning import euclidean_projection, calculate_l21_norm
5 |
6 |
7 | def proximal_gradient_descent(X, Y, z, **kwargs):
8 | """
9 | This function implements supervised sparse feature selection via l2,1 norm, i.e.,
10 | min_{W} sum_{i}log(1+exp(-yi*(W'*x+C))) + z*||W||_{2,1}
11 |
12 | Input
13 | -----
14 | X: {numpy array}, shape (n_samples, n_features)
15 | input data
16 | Y: {numpy array}, shape (n_samples, n_classes)
17 | input class labels, each row is a one-hot-coding class label, guaranteed to be a numpy array
18 | z: {float}
19 | regularization parameter
20 | kwargs: {dictionary}
21 | verbose: {boolean}
22 |             True if the user wants to print out the objective function value in each iteration, false if not
23 |
24 | Output
25 | ------
26 | W: {numpy array}, shape (n_features, n_classes)
27 | weight matrix
28 | obj: {numpy array}, shape (n_iterations,)
29 | objective function value during iterations
30 |         value_gamma: {numpy array}, shape (n_iterations,)
31 | suitable step size during iterations
32 |
33 |
34 | Reference:
35 | Liu, Jun, et al. "Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization." UAI. 2009.
36 | """
37 |
38 | if 'verbose' not in kwargs:
39 | verbose = False
40 | else:
41 | verbose = kwargs['verbose']
42 |
43 | # Starting point initialization #
44 | n_samples, n_features = X.shape
45 | n_samples, n_classes = Y.shape
46 |
47 | # the indices of positive samples
48 | p_flag = (Y == 1)
49 | # the total number of positive samples
50 | n_positive_samples = np.sum(p_flag, 0)
51 | # the total number of negative samples
52 | n_negative_samples = n_samples - n_positive_samples
53 | n_positive_samples = n_positive_samples.astype(float)
54 | n_negative_samples = n_negative_samples.astype(float)
55 |
56 | # initialize a starting point
57 | W = np.zeros((n_features, n_classes))
58 | C = np.log(np.divide(n_positive_samples, n_negative_samples))
59 |
60 | # compute XW = X*W
61 | XW = np.dot(X, W)
62 |
63 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent
64 |     # the initial guess of the Lipschitz constant of the gradient
65 | gamma = 1.0/(n_samples*n_classes)
66 |
67 | # assign Wp with W, and XWp with XW
68 | XWp = XW
69 | WWp =np.zeros((n_features, n_classes))
70 | CCp = np.zeros((1, n_classes))
71 |
72 | alphap = 0
73 | alpha = 1
74 |
75 | # indicates whether the gradient step only changes a little
76 | flag = False
77 |
78 | max_iter = 1000
79 | value_gamma = np.zeros(max_iter)
80 | obj = np.zeros(max_iter)
81 | for iter_step in range(max_iter):
82 | # step1: compute search point S based on Wp and W (with beta)
83 | beta = (alphap-1)/alpha
84 | S = W + beta*WWp
85 | SC = C + beta*CCp
86 |
87 | # step2: line search for gamma and compute the new approximation solution W
88 | XS = XW + beta*(XW - XWp)
89 | aa = -np.multiply(Y, XS+np.tile(SC, (n_samples, 1)))
90 | # fun_S is the logistic loss at the search point
91 | bb = np.maximum(aa, 0)
92 | fun_S = np.sum(np.log(np.exp(-bb)+np.exp(aa-bb))+bb)/(n_samples*n_classes)
93 | # compute prob = [p_1;p_2;...;p_m]
94 | prob = 1.0/(1+np.exp(aa))
95 |
96 | b = np.multiply(-Y, (1-prob))/(n_samples*n_classes)
97 | # compute the gradient of C
98 | GC = np.sum(b, 0)
99 | # compute the gradient of W as X'*b
100 | G = np.dot(np.transpose(X), b)
101 |
102 | # copy W and XW to Wp and XWp
103 | Wp = W
104 | XWp = XW
105 | Cp = C
106 |
107 | while True:
108 |             # take a gradient step from S to get V, then apply the L1/L2-norm regularized projection
109 | V = S - G/gamma
110 | C = SC - GC/gamma
111 | W = euclidean_projection(V, n_features, n_classes, z, gamma)
112 |
113 | # the difference between the new approximate solution W and the search point S
114 | V = W - S
115 | # compute XW = X*W
116 | XW = np.dot(X, W)
117 | aa = -np.multiply(Y, XW+np.tile(C, (n_samples, 1)))
118 | # fun_W is the logistic loss at the new approximate solution
119 | bb = np.maximum(aa, 0)
120 | fun_W = np.sum(np.log(np.exp(-bb)+np.exp(aa-bb))+bb)/(n_samples*n_classes)
121 |
122 | r_sum = (LA.norm(V, 'fro')**2 + LA.norm(C-SC, 2)**2) / 2
123 | l_sum = fun_W - fun_S - np.sum(np.multiply(V, G)) - np.inner((C-SC), GC)
124 |
125 |             # determine whether the gradient step makes little improvement
126 | if r_sum <= 1e-20:
127 | flag = True
128 | break
129 |
130 |             # the line-search condition is fun_W <= fun_S + <V, G> + <C-SC, GC> + gamma/2 * (||V||_F^2 + ||C-SC||_2^2)
131 | if l_sum < r_sum*gamma:
132 | break
133 | else:
134 | gamma = max(2*gamma, l_sum/r_sum)
135 | value_gamma[iter_step] = gamma
136 |
137 |         # step3: update alpha and alphap, and check whether it has converged
138 | alphap = alpha
139 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2
140 |
141 | WWp = W - Wp
142 | CCp = C - Cp
143 |
144 | # calculate obj
145 | obj[iter_step] = fun_W
146 | obj[iter_step] += z*calculate_l21_norm(W)
147 |
148 | if verbose:
149 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
150 |
151 | if flag is True:
152 | break
153 |
154 |         # determine whether it has converged
155 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
156 | break
157 | return W, obj, value_gamma
158 |
--------------------------------------------------------------------------------
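A minimal usage sketch for `ll_l21.py` above (illustrative). The logistic loss inside `proximal_gradient_descent` multiplies the label matrix elementwise with the margins, so the sketch uses a +1/-1 one-vs-rest coding rather than a 0/1 one-hot matrix; treat that coding, and the row-norm ranking, as assumptions:

```python
import numpy as np
from skfeature.function.sparse_learning_based import ll_l21

# synthetic binary problem
X = np.random.rand(80, 30)
y = np.repeat([0, 1], 40)

# +1/-1 one-vs-rest class indicator matrix, shape (n_samples, n_classes)
Y = -np.ones((80, 2))
Y[np.arange(80), y] = 1

W, obj, value_gamma = ll_l21.proximal_gradient_descent(X, Y, z=0.01)

# rank features by the l2-norm of their weight rows (largest first)
idx = np.argsort(np.sqrt((W * W).sum(axis=1)))[::-1]
print(idx[0:10])
```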
/src/skfeature/function/sparse_learning_based/ls_l21.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | from numpy import linalg as LA
4 | from skfeature.utility.sparse_learning import euclidean_projection, calculate_l21_norm
5 |
6 |
7 | def proximal_gradient_descent(X, Y, z, **kwargs):
8 | """
9 | This function implements supervised sparse feature selection via l2,1 norm, i.e.,
10 | min_{W} ||XW-Y||_F^2 + z*||W||_{2,1}
11 |
12 | Input
13 | -----
14 | X: {numpy array}, shape (n_samples, n_features)
15 | input data, guaranteed to be a numpy array
16 | Y: {numpy array}, shape (n_samples, n_classes)
17 | input class labels, each row is a one-hot-coding class label
18 | z: {float}
19 | regularization parameter
20 | kwargs: {dictionary}
21 | verbose: {boolean}
22 |             True if the user wants to print out the objective function value in each iteration, false if not
23 |
24 | Output
25 | ------
26 | W: {numpy array}, shape (n_features, n_classes)
27 | weight matrix
28 | obj: {numpy array}, shape (n_iterations,)
29 | objective function value during iterations
30 | value_gamma: {numpy array}, shape (n_iterations,)
31 | suitable step size during iterations
32 |
33 | Reference
34 | ---------
35 | Liu, Jun, et al. "Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization." UAI. 2009.
36 | """
37 |
38 | if 'verbose' not in kwargs:
39 | verbose = False
40 | else:
41 | verbose = kwargs['verbose']
42 |
43 | # starting point initialization
44 | n_samples, n_features = X.shape
45 | n_samples, n_classes = Y.shape
46 |
47 | # compute X'Y
48 | XtY = np.dot(np.transpose(X), Y)
49 |
50 | # initialize a starting point
51 | W = XtY
52 |
53 | # compute XW = X*W
54 | XW = np.dot(X, W)
55 |
56 | # compute l2,1 norm of W
57 | W_norm = calculate_l21_norm(W)
58 |
59 | if W_norm >= 1e-6:
60 | ratio = init_factor(W_norm, XW, Y, z)
61 | W = ratio*W
62 | XW = ratio*XW
63 |
64 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent
65 | # initialize step size gamma = 1
66 | gamma = 1
67 |
68 | # assign Wp with W, and XWp with XW
69 | XWp = XW
70 | WWp =np.zeros((n_features, n_classes))
71 | alphap = 0
72 | alpha = 1
73 |
74 | # indicate whether the gradient step only changes a little
75 | flag = False
76 |
77 | max_iter = 1000
78 | value_gamma = np.zeros(max_iter)
79 | obj = np.zeros(max_iter)
80 | for iter_step in range(max_iter):
81 | # step1: compute search point S based on Wp and W (with beta)
82 | beta = (alphap-1)/alpha
83 | S = W + beta*WWp
84 |
85 | # step2: line search for gamma and compute the new approximation solution W
86 | XS = XW + beta*(XW - XWp)
87 | # compute X'* XS
88 | XtXS = np.dot(np.transpose(X), XS)
89 | # obtain the gradient g
90 | G = XtXS - XtY
91 | # copy W and XW to Wp and XWp
92 | Wp = W
93 | XWp = XW
94 |
95 | while True:
96 |             # take a gradient step from S to get V, then apply the L1/L2-norm regularized projection
97 | V = S - G/gamma
98 | W = euclidean_projection(V, n_features, n_classes, z, gamma)
99 | # the difference between the new approximate solution W and the search point S
100 | V = W - S
101 | # compute XW = X*W
102 | XW = np.dot(X, W)
103 | XV = XW - XS
104 | r_sum = LA.norm(V, 'fro')**2
105 | l_sum = LA.norm(XV, 'fro')**2
106 |
107 |             # determine whether the gradient step makes little improvement
108 | if r_sum <= 1e-20:
109 | flag = True
110 | break
111 |
112 | # the condition is ||XV||_2^2 <= gamma * ||V||_2^2
113 | if l_sum < r_sum*gamma:
114 | break
115 | else:
116 | gamma = max(2*gamma, l_sum/r_sum)
117 | value_gamma[iter_step] = gamma
118 |
119 |         # step3: update alpha and alphap, and check whether it has converged
120 | alphap = alpha
121 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2
122 |
123 | WWp = W - Wp
124 | XWY = XW -Y
125 |
126 | # calculate obj
127 | obj[iter_step] = LA.norm(XWY, 'fro')**2/2
128 | obj[iter_step] += z*calculate_l21_norm(W)
129 |
130 | if verbose:
131 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
132 |
133 | if flag is True:
134 | break
135 |
136 |         # determine whether it has converged
137 | if iter_step >= 1 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
138 | break
139 | return W, obj, value_gamma
140 |
141 |
142 | def init_factor(W_norm, XW, Y, z):
143 | """
144 | Initialize the starting point of W, according to the author's code
145 | """
146 | n_samples, n_classes = XW.shape
147 | a = np.inner(np.reshape(XW, n_samples*n_classes), np.reshape(Y, n_samples*n_classes)) - z*W_norm
148 | b = LA.norm(XW, 'fro')**2
149 | ratio = a / b
150 | return ratio
--------------------------------------------------------------------------------
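A minimal usage sketch for `ls_l21.py` above (least-squares loss, so a 0/1 one-hot label matrix is used; the data, the value of `z`, and the ranking rule are illustrative assumptions):

```python
import numpy as np
from skfeature.function.sparse_learning_based import ls_l21

# synthetic 3-class problem
X = np.random.rand(90, 40)
y = np.repeat([0, 1, 2], 30)

# one-hot label matrix Y, shape (n_samples, n_classes)
Y = np.zeros((90, 3))
Y[np.arange(90), y] = 1

# z controls the strength of the l2,1 penalty and hence the row sparsity of W
W, obj, value_gamma = ls_l21.proximal_gradient_descent(X, Y, z=0.1)

idx = np.argsort(np.sqrt((W * W).sum(axis=1)))[::-1]
print(idx[0:10])
```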
/src/skfeature/function/statistical_based/CFS.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from skfeature.utility.mutual_information import su_calculation
3 |
4 |
5 | def merit_calculation(X, y):
6 | """
7 | This function calculates the merit of X given class labels y, where
8 | merits = (k * rcf)/sqrt(k+k*(k-1)*rff)
9 | rcf = (1/k)*sum(su(fi,y)) for all fi in X
10 | rff = (1/(k*(k-1)))*sum(su(fi,fj)) for all fi and fj in X
11 |
12 | Input
13 | ----------
14 | X: {numpy array}, shape (n_samples, n_features)
15 | input data
16 | y: {numpy array}, shape (n_samples,)
17 | input class labels
18 |
19 | Output
20 | ----------
21 | merits: {float}
22 | merit of a feature subset X
23 | """
24 |
25 | n_samples, n_features = X.shape
26 | rff = 0
27 | rcf = 0
28 | for i in range(n_features):
29 | fi = X[:, i]
30 | rcf += su_calculation(fi, y)
31 | for j in range(n_features):
32 | if j > i:
33 | fj = X[:, j]
34 | rff += su_calculation(fi, fj)
35 | rff *= 2
36 | merits = rcf / np.sqrt(n_features + rff)
37 | return merits
38 |
39 |
40 | def cfs(X, y):
41 | """
42 |     This function uses a correlation-based heuristic (CFS) to evaluate the worth of a feature subset
43 |
44 | Input
45 | -----
46 | X: {numpy array}, shape (n_samples, n_features)
47 | input data
48 | y: {numpy array}, shape (n_samples,)
49 | input class labels
50 |
51 | Output
52 | ------
53 | F: {numpy array}
54 | index of selected features
55 |
56 | Reference
57 | ---------
58 | Zhao, Zheng et al. "Advancing Feature Selection Research - ASU Feature Selection Repository" 2010.
59 | """
60 |
61 | n_samples, n_features = X.shape
62 | F = []
63 | # M stores the merit values
64 | M = []
65 | while True:
66 | merit = -100000000000
67 | idx = -1
68 | for i in range(n_features):
69 | if i not in F:
70 | F.append(i)
71 | # calculate the merit of current selected features
72 | t = merit_calculation(X[:, F], y)
73 | if t > merit:
74 | merit = t
75 | idx = i
76 | F.pop()
77 | F.append(idx)
78 | M.append(merit)
79 | if len(M) > 5:
80 | if M[len(M)-1] <= M[len(M)-2]:
81 | if M[len(M)-2] <= M[len(M)-3]:
82 | if M[len(M)-3] <= M[len(M)-4]:
83 | if M[len(M)-4] <= M[len(M)-5]:
84 | break
85 | return np.array(F)
86 |
87 |
--------------------------------------------------------------------------------
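A minimal usage sketch for `CFS.py` above (illustrative; CFS relies on symmetrical uncertainty, which is normally computed on discrete features, so the toy data below is integer-valued):

```python
import numpy as np
from skfeature.function.statistical_based import CFS

# small discrete toy data; the greedy search is quadratic in the number of features per step
X = np.random.randint(0, 3, size=(50, 10))
y = np.random.randint(0, 2, size=50)

F = CFS.cfs(X, y)   # indices of the greedily selected features, in selection order
print(F)
```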
/src/skfeature/function/statistical_based/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/statistical_based/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/statistical_based/chi_square.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.feature_selection import chi2
3 |
4 |
5 | def chi_square(X, y):
6 | """
7 | This function implements the chi-square feature selection (existing method for classification in scikit-learn)
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data
13 | y: {numpy array},shape (n_samples,)
14 | input class labels
15 |
16 | Output
17 | ------
18 | F: {numpy array}, shape (n_features,)
19 | chi-square score for each feature
20 | """
21 | F, pval = chi2(X, y)
22 | return F
23 |
24 |
25 | def feature_ranking(F):
26 | """
27 | Rank features in descending order according to chi2-score, the higher the chi2-score, the more important the feature is
28 | """
29 | idx = np.argsort(F)
30 | return idx[::-1]
--------------------------------------------------------------------------------
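A minimal usage sketch for `chi_square.py` above (illustrative; chi-square scores require non-negative feature values such as counts or frequencies, and `f_score.py` below follows the same score-then-rank call pattern):

```python
import numpy as np
from skfeature.function.statistical_based import chi_square

X = np.random.randint(0, 10, size=(100, 20)).astype(float)  # non-negative counts
y = np.random.randint(0, 2, size=100)

F = chi_square.chi_square(X, y)       # chi-square score per feature
idx = chi_square.feature_ranking(F)   # feature indices, best first
print(idx[0:5])
```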
/src/skfeature/function/statistical_based/f_score.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.feature_selection import f_classif
3 |
4 |
5 | def f_score(X, y):
6 | """
7 | This function implements the anova f_value feature selection (existing method for classification in scikit-learn),
8 | where f_score = sum((ni/(c-1))*(mean_i - mean)^2)/((1/(n - c))*sum((ni-1)*std_i^2))
9 |
10 | Input
11 | -----
12 | X: {numpy array}, shape (n_samples, n_features)
13 | input data
14 | y : {numpy array},shape (n_samples,)
15 | input class labels
16 |
17 | Output
18 | ------
19 | F: {numpy array}, shape (n_features,)
20 | f-score for each feature
21 | """
22 |
23 | F, pval = f_classif(X, y)
24 | return F
25 |
26 |
27 | def feature_ranking(F):
28 | """
29 | Rank features in descending order according to f-score, the higher the f-score, the more important the feature is
30 | """
31 | idx = np.argsort(F)
32 | return idx[::-1]
--------------------------------------------------------------------------------
/src/skfeature/function/statistical_based/gini_index.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def gini_index(X, y):
5 | """
6 | This function implements the gini index feature selection.
7 |
8 | Input
9 | ----------
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data
12 | y: {numpy array}, shape (n_samples,)
13 | input class labels
14 |
15 | Output
16 | ----------
17 | gini: {numpy array}, shape (n_features, )
18 | gini index value of each feature
19 | """
20 |
21 | n_samples, n_features = X.shape
22 |
23 | # initialize gini_index for all features to be 0.5
24 | gini = np.ones(n_features) * 0.5
25 |
26 | # For i-th feature we define fi = x[:,i] ,v include all unique values in fi
27 | for i in range(n_features):
28 | v = np.unique(X[:, i])
29 | for j in range(len(v)):
30 | # left_y contains labels of instances whose i-th feature value is less than or equal to v[j]
31 | left_y = y[X[:, i] <= v[j]]
32 | # right_y contains labels of instances whose i-th feature value is larger than v[j]
33 | right_y = y[X[:, i] > v[j]]
34 |
35 | # gini_left is sum of square of probability of occurrence of v[i] in left_y
36 | # gini_right is sum of square of probability of occurrence of v[i] in right_y
37 | gini_left = 0
38 | gini_right = 0
39 |
40 | for k in range(np.min(y), np.max(y)+1):
41 | if len(left_y) != 0:
42 | # t1_left is probability of occurrence of k in left_y
43 | t1_left = np.true_divide(len(left_y[left_y == k]), len(left_y))
44 | t2_left = np.power(t1_left, 2)
45 | gini_left += t2_left
46 |
47 | if len(right_y) != 0:
48 |                     # t1_right is probability of occurrence of k in right_y
49 | t1_right = np.true_divide(len(right_y[right_y == k]), len(right_y))
50 | t2_right = np.power(t1_right, 2)
51 | gini_right += t2_right
52 |
53 | gini_left = 1 - gini_left
54 | gini_right = 1 - gini_right
55 |
56 | # weighted average of len(left_y) and len(right_y)
57 | t1_gini = (len(left_y) * gini_left + len(right_y) * gini_right)
58 |
59 | # compute the gini_index for the i-th feature
60 | value = np.true_divide(t1_gini, len(y))
61 |
62 | if value < gini[i]:
63 | gini[i] = value
64 | return gini
65 |
66 |
67 | def feature_ranking(W):
68 | """
69 |     Rank features in ascending order of their gini index values; the smaller the gini index,
70 |     the more important the feature is
71 | """
72 | idx = np.argsort(W)
73 | return idx
74 |
75 |
76 |
77 |
78 |
79 |
80 |
--------------------------------------------------------------------------------
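A minimal usage sketch for `gini_index.py` above (illustrative; the implementation loops over `range(min(y), max(y)+1)`, so integer class labels are assumed):

```python
import numpy as np
from skfeature.function.statistical_based import gini_index

X = np.random.rand(100, 20)
y = np.random.randint(0, 3, size=100)   # integer labels

gini = gini_index.gini_index(X, y)
idx = gini_index.feature_ranking(gini)  # smaller gini index ranks first
print(idx[0:5])
```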
/src/skfeature/function/statistical_based/low_variance.py:
--------------------------------------------------------------------------------
1 | from sklearn.feature_selection import VarianceThreshold
2 |
3 |
4 | def low_variance_feature_selection(X, threshold):
5 | """
6 | This function implements the low_variance feature selection (existing method in scikit-learn)
7 |
8 | Input
9 | -----
10 | X: {numpy array}, shape (n_samples, n_features)
11 | input data
12 |     threshold: {float}
13 |         features with a training-set variance below this value are removed (e.g., threshold = p*(1-p) for boolean features)
14 |
15 | Output
16 | ------
17 | X_new: {numpy array}, shape (n_samples, n_selected_features)
18 | data with selected features
19 | """
20 | sel = VarianceThreshold(threshold)
21 | return sel.fit_transform(X)
--------------------------------------------------------------------------------
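A minimal usage sketch for `low_variance.py` above (illustrative; the p*(1-p) threshold follows the boolean-feature convention mentioned in the docstring):

```python
import numpy as np
from skfeature.function.statistical_based import low_variance

# boolean (Bernoulli) features
X = (np.random.rand(100, 20) > 0.5).astype(float)

p = 0.8
X_new = low_variance.low_variance_feature_selection(X, threshold=p * (1 - p))
print(X_new.shape)   # features with variance below p*(1-p) are dropped
```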
/src/skfeature/function/statistical_based/t_score.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def t_score(X, y):
5 | """
6 |     This function calculates the t-score for each feature; the t-score is only defined for binary problems
7 | t_score = |mean1-mean2|/sqrt(((std1^2)/n1)+((std2^2)/n2)))
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data
13 | y: {numpy array}, shape (n_samples,)
14 | input class labels
15 |
16 | Output
17 | ------
18 | F: {numpy array}, shape (n_features,)
19 | t-score for each feature
20 | """
21 |
22 | n_samples, n_features = X.shape
23 | F = np.zeros(n_features)
24 | c = np.unique(y)
25 | if len(c) == 2:
26 | for i in range(n_features):
27 | f = X[:, i]
28 | # class0 contains instances belonging to the first class
29 | # class1 contains instances belonging to the second class
30 | class0 = f[y == c[0]]
31 | class1 = f[y == c[1]]
32 | mean0 = np.mean(class0)
33 | mean1 = np.mean(class1)
34 | std0 = np.std(class0)
35 | std1 = np.std(class1)
36 | n0 = len(class0)
37 | n1 = len(class1)
38 | t = mean0 - mean1
39 | t0 = np.true_divide(std0**2, n0)
40 | t1 = np.true_divide(std1**2, n1)
41 | F[i] = np.true_divide(t, (t0 + t1)**0.5)
42 | else:
43 |         print('y should be a binary class vector')
44 | exit(0)
45 | return np.abs(F)
46 |
47 |
48 | def feature_ranking(F):
49 | """
50 | Rank features in descending order according to t-score, the higher the t-score, the more important the feature is
51 | """
52 | idx = np.argsort(F)
53 | return idx[::-1]
54 |
55 |
--------------------------------------------------------------------------------
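A minimal usage sketch for `t_score.py` above (illustrative; the score is defined for binary problems only):

```python
import numpy as np
from skfeature.function.statistical_based import t_score

X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)   # binary labels are required

F = t_score.t_score(X, y)
idx = t_score.feature_ranking(F)        # feature indices, best first
print(idx[0:5])
```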
/src/skfeature/function/streaming/__init__.py:
--------------------------------------------------------------------------------
1 | __author__ = 'jundongl'
2 |
--------------------------------------------------------------------------------
/src/skfeature/function/streaming/alpha_investing.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn import linear_model
3 |
4 |
5 | def alpha_investing(X, y, w0, dw):
6 | """
7 | This function implements streamwise feature selection (SFS) algorithm alpha_investing for binary regression or
8 | univariate regression
9 |
10 | Input
11 | -----
12 | X: {numpy array}, shape (n_samples, n_features)
13 | input data, assume feature arrives one at each time step
14 | y: {numpy array}, shape (n_samples,)
15 | input class labels or regression target
16 |
17 | Output
18 | ------
19 | F: {numpy array}, shape (n_selected_features,)
20 | index of selected features in a streamwise way
21 |
22 | Reference
23 | ---------
24 | Zhou, Jing et al. "Streaming Feature Selection using Alpha-investing." KDD 2006.
25 | """
26 |
27 | n_samples, n_features = X.shape
28 | w = w0
29 | F = [] # selected features
30 | for i in range(n_features):
31 | x_can = X[:, i] # generate next feature
32 | alpha = w/2/(i+1)
33 | X_old = X[:, F]
34 |         if i == 0:
35 | X_old = np.ones((n_samples, 1))
36 | linreg_old = linear_model.LinearRegression()
37 | linreg_old.fit(X_old, y)
38 | error_old = 1 - linreg_old.score(X_old, y)
39 |         if i != 0:
40 | # model built with only X_old
41 | linreg_old = linear_model.LinearRegression()
42 | linreg_old.fit(X_old, y)
43 | error_old = 1 - linreg_old.score(X_old, y)
44 |
45 | # model built with X_old & {x_can}
46 | X_new = np.concatenate((X_old, x_can.reshape(n_samples, 1)), axis=1)
47 | logreg_new = linear_model.LinearRegression()
48 | logreg_new.fit(X_new, y)
49 | error_new = 1 - logreg_new.score(X_new, y)
50 |
51 | # calculate p-value
52 | pval = np.exp((error_new - error_old)/(2*error_old/n_samples))
53 | if pval < alpha:
54 | F.append(i)
55 | w = w + dw - alpha
56 | else:
57 | w -= alpha
58 | return np.array(F)
59 |
60 |
--------------------------------------------------------------------------------
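A minimal usage sketch for `alpha_investing.py` above (illustrative; `w0` is the initial wealth and `dw` the wealth added per accepted feature, and the 0.05/0.05 values are assumed defaults, not fixed by the code):

```python
import numpy as np
from skfeature.function.streaming import alpha_investing

X = np.random.rand(100, 50)
y = np.random.rand(100)   # regression target (features are considered one at a time)

F = alpha_investing.alpha_investing(X, y, w0=0.05, dw=0.05)
print(F)                  # indices accepted by the streamwise selection
```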
/src/skfeature/function/structure/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/src/skfeature/function/structure/graph_fs.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def soft_threshold(A,b):
5 | """
6 |     This function implements the soft-thresholding operator
7 |     Input:
8 |         A: {numpy scalar, vector, or matrix}
9 |         b: {scalar}
10 | """
11 | res = np.zeros(A.shape)
12 | res[A > b] = A[A > b] - b
13 | res[A < -b] = A[A < -b] + b
14 | return res
15 |
16 |
17 | def calculate_obj(X, y, w, lambda1, lambda2, T):
18 | return 1/2 * (np.linalg.norm(y- np.dot(X, w), 'fro'))**2 + lambda1*np.abs(w).sum() + lambda2*np.abs(np.dot(T, w)).sum()
19 |
20 |
21 | def graph_fs(X, y, **kwargs):
22 | """
23 | This function implement the graph structural feature selection algorithm GOSCAR
24 |
25 | Objective Function
26 |     min_{w} 1/2 ||X*w - y||_F^2 + lambda1 ||w||_1 + lambda2 \sum_{(i,j) \in E} max{|w_i|, |w_j|}
27 |
28 | Input:
29 | X: {numpy array}, shape (n_samples, n_features)
30 | Input data, guaranteed to be a numpy array
31 | y: {numpy array}, shape (n_samples, 1)
32 | Input data, the label matrix
33 | edge_list: {numpy array}, shape (n_edges, 2)
34 | Input data, each row is a pair of linked features, note feature index should start from 0
35 | lambda1: {float}
36 | Parameter lambda1 in objective function
37 | lambda2: {float}
38 |             Parameter lambda2 in objective function
39 |         rho: {float}
40 | parameter used for optimization
41 | max_iter: {int}
42 | maximal iteration
43 | verbose: {boolean} True or False
44 | True if we want to print out the objective function value in each iteration, False if not
45 |
46 | Output:
47 | w: the weights of the features
48 | obj: the value of the objective function in each iteration
49 | """
50 |
51 | if 'lambda1' not in kwargs:
52 | lambda1 = 0.8
53 | else:
54 | lambda1 = kwargs['lambda1']
55 | if 'lambda2' not in kwargs:
56 | lambda2 = 0.8
57 | else:
58 | lambda2 = kwargs['lambda2']
59 | if 'edge_list' not in kwargs:
60 |         print 'Error using function, the network structure E is required'
61 |         raise ValueError('the network structure E (edge_list) is required')
62 |     else:
63 | edge_list = kwargs['edge_list']
64 | if 'max_iter' not in kwargs:
65 | max_iter = 300
66 | else:
67 | max_iter = kwargs['max_iter']
68 | if 'verbose' not in kwargs:
69 | verbose = 0
70 | else:
71 | verbose = kwargs['verbose']
72 | if 'rho' not in kwargs:
73 | rho = 5
74 | else:
75 | rho = kwargs['rho']
76 |
77 | n_samples, n_features = X.shape
78 |
79 | # construct T from E
80 | ind1 = edge_list[:, 0]
81 | ind2 = edge_list[:, 1]
82 | num_edge = ind1.shape[0]
83 | T = np.zeros((num_edge*2, n_features))
84 | for i in range(num_edge):
85 | T[i, ind1[i]] = 0.5
86 | T[i, ind2[i]] = 0.5
87 | T[i+num_edge, ind1[i]] = 0.5
88 | T[i+num_edge, ind2[i]] = -0.5
89 |
90 | # calculate F = X^T X + rho(I + T^T * T)
91 | F = np.dot(X.T, X) + rho*(np.identity(n_features) + np.dot(T.T, T))
92 |
93 | # Cholesky factorization of F = R^T R
94 |     R = np.linalg.cholesky(F)  # NOTE: np.linalg.cholesky returns a lower-triangular R with F = R R^T
95 | R = R.T
96 | Rinv = np.linalg.inv(R)
97 | Rtinv = Rinv.T
98 |
99 | # initialize p, q, mu , v to be zero vectors
100 | p = np.zeros((2*num_edge, 1))
101 | q = np.zeros((n_features, 1))
102 | mu = np.zeros((n_features, 1))
103 | v = np.zeros((2*num_edge, 1))
104 |
105 | # start the main loop
106 | iter = 0
107 | obj = np.zeros((max_iter,1))
108 | while iter < max_iter:
109 | print iter
110 | # update w
111 | b = np.dot(X.T, y) - mu - np.dot(T.T, v) + rho*np.dot(T.T,p) + rho*q
112 | w_hat = np.dot(Rtinv, b)
113 | w = np.dot(Rinv, w_hat)
114 |
115 | # update q
116 | q = soft_threshold(w + 1/rho*mu, lambda1/rho)
117 | # update p
118 |
119 | p = soft_threshold(np.dot(T, w)+1/rho*v, lambda2/rho)
120 | # update mu, v
121 | mu += rho*(w - q)
122 | v += rho*(np.dot(T, w) - p)
123 |
124 | # calculate objective function
125 | obj[iter] = calculate_obj(X, y, w, lambda1, lambda2, T)
126 | if verbose:
127 | print 'obj at iter ' + str(iter) + ': ' + str(obj[iter])
128 | iter += 1
129 | return w, obj, q
130 |
131 | def feature_ranking(w):
132 |     T = np.abs(w)
133 | idx = np.argsort(T, 0)
134 | return idx[::-1]
135 |
--------------------------------------------------------------------------------
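A minimal usage sketch for `graph_fs.py` above (illustrative; the tiny feature graph and the regularization values are assumptions, the target must be a column vector, and the implementation prints its iteration counter while running):

```python
import numpy as np
from skfeature.function.structure import graph_fs

X = np.random.rand(60, 10)
y = np.random.rand(60, 1)                    # target as a column vector

# a tiny feature graph: edges between features (0,1), (1,2) and (3,4)
edge_list = np.array([[0, 1], [1, 2], [3, 4]])

w, obj, q = graph_fs.graph_fs(X, y, edge_list=edge_list,
                              lambda1=0.1, lambda2=0.1, max_iter=20)

idx = graph_fs.feature_ranking(w)            # features sorted by |w|, largest first
print(idx[0:5])
```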
/src/skfeature/function/structure/group_fs.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | from skfeature.utility.sparse_learning import tree_lasso_projection, tree_norm
4 |
5 |
6 | def group_fs(X, y, z1, z2, idx, **kwargs):
7 | """
8 | This function implements supervised sparse group feature selection with least square loss, i.e.,
9 | min_{w} ||Xw-y||_2^2 + z_1||w||_1 + z_2*sum_{i} h_{i}||w_{G_{i}}|| where h_i is the weight for the i-th group
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | y: {numpy array}, shape (n_samples,)
16 | input class labels or regression target
17 | z1: {float}
18 | regularization parameter of L1 norm for each element
19 | z2: {float}
20 | regularization parameter of L2 norm for the non-overlapping group
21 | idx: {numpy array}, shape (3, n_nodes)
22 | 3*nodes matrix, where nodes denotes the number of groups
23 |         idx[0,:] contains the starting index of each group
24 |         idx[1,:] contains the ending index of each group
25 |         idx[2,:] contains the corresponding weight (w_{j})
26 | kwargs: {dictionary}
27 | verbose: {boolean}
28 |             True if the user wants to print out the objective function value in each iteration, false if not
29 |
30 | Output
31 | ------
32 | w: {numpy array}, shape (n_features, )
33 | weight matrix
34 | obj: {numpy array}, shape (n_iterations, )
35 | objective function value during iterations
36 | value_gamma: {numpy array}, shape (n_iterations, )
37 | suitable step size during iterations
38 |
39 | Reference
40 | ---------
41 | Liu, Jun, et al. "Moreau-Yosida Regularization for Grouped Tree Structure Learning." NIPS. 2010.
42 | Liu, Jun, et al. "SLEP: Sparse Learning with Efficient Projections." http://www.public.asu.edu/~jye02/Software/SLEP, 2009.
43 | """
44 | if 'verbose' not in kwargs:
45 | verbose = False
46 | else:
47 | verbose = kwargs['verbose']
48 |
49 | # starting point initialization
50 | n_samples, n_features = X.shape
51 |
52 | # compute X'y
53 | Xty = np.dot(np.transpose(X), y)
54 |
55 | # initialize a starting point
56 | w = np.zeros(n_features)
57 |
58 | # compute Xw = X*w
59 | Xw = np.dot(X, w)
60 |
61 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent
62 | # initialize step size gamma = 1
63 | gamma = 1
64 |
65 | # assign wp with w, and Xwp with Xw
66 | Xwp = Xw
67 | wwp = np.zeros(n_features)
68 | alphap = 0
69 | alpha = 1
70 |
71 | # indicates whether the gradient step only changes a little
72 | flag = False
73 |
74 | max_iter = 1000
75 | value_gamma = np.zeros(max_iter)
76 | obj = np.zeros(max_iter)
77 | for iter_step in range(max_iter):
78 | # step1: compute search point s based on wp and w (with beta)
79 | beta = (alphap-1)/alpha
80 | s = w + beta*wwp
81 |
82 | # step2: line search for gamma and compute the new approximation solution w
83 | Xs = Xw + beta*(Xw - Xwp)
84 | # compute X'* Xs
85 | XtXs = np.dot(np.transpose(X), Xs)
86 | # obtain the gradient g
87 | G = XtXs - Xty
88 | # copy w and Xw to wp and Xwp
89 | wp = w
90 | Xwp = Xw
91 |
92 | while True:
93 |             # take a gradient step from s to get v, then apply the L1/L2-norm regularized projection
94 | v = s - G/gamma
95 | # tree overlapping group lasso projection
96 | n_nodes = int(idx.shape[1])
97 | idx_tmp = np.zeros((3, n_nodes+1))
98 | idx_tmp[0:2, :] = np.concatenate((np.array([[-1], [-1]]), idx[0:2, :]), axis=1)
99 | idx_tmp[2, :] = np.concatenate((np.array([z1/gamma]), z2/gamma*idx[2, :]), axis=1)
100 | w = tree_lasso_projection(v, n_features, idx_tmp, n_nodes+1)
101 | # the difference between the new approximate solution w and the search point s
102 | v = w - s
103 | # compute Xw = X*w
104 | Xw = np.dot(X, w)
105 | Xv = Xw - Xs
106 | r_sum = np.inner(v, v)
107 | l_sum = np.inner(Xv, Xv)
108 |             # determine whether the gradient step makes little improvement
109 | if r_sum <= 1e-20:
110 | flag = True
111 | break
112 |
113 | # the condition is ||Xv||_2^2 <= gamma * ||v||_2^2
114 | if l_sum <= r_sum*gamma:
115 | break
116 | else:
117 | gamma = max(2*gamma, l_sum/r_sum)
118 | value_gamma[iter_step] = gamma
119 |
120 |         # step3: update alpha and alphap, and check whether it has converged
121 | alphap = alpha
122 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2
123 |
124 | wwp = w - wp
125 | Xwy = Xw -y
126 |
127 | # calculate the regularization part
128 | idx_tmp = np.zeros((3, n_nodes+1))
129 | idx_tmp[0:2, :] = np.concatenate((np.array([[-1], [-1]]), idx[0:2, :]), axis=1)
130 | idx_tmp[2, :] = np.concatenate((np.array([z1]), z2*idx[2, :]), axis=1)
131 | tree_norm_val = tree_norm(w, n_features, idx_tmp, n_nodes+1)
132 |
133 | # function value = loss + regularization
134 | obj[iter_step] = np.inner(Xwy, Xwy)/2 + tree_norm_val
135 |
136 | if verbose:
137 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
138 |
139 | if flag is True:
140 | break
141 |
142 |         # determine whether it has converged
143 | if iter_step >= 2 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
144 | break
145 |
146 | return w, obj, value_gamma
147 |
148 |
149 |
--------------------------------------------------------------------------------
/src/skfeature/function/structure/tree_fs.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | from skfeature.utility.sparse_learning import tree_lasso_projection, tree_norm
4 |
5 |
6 | def tree_fs(X, y, z, idx, **kwargs):
7 | """
8 | This function implements tree structured group lasso regularization with least square loss, i.e.,
9 |     min_{w} ||Xw-Y||_2^2 + z*\sum_{i}\sum_{j} h_{j}^{i} ||w_{G_{j}^{i}}|| where h_{j}^{i} is the weight for the j-th group
10 | from the i-th level (the root node is in level 0)
11 |
12 | Input
13 | -----
14 | X: {numpy array}, shape (n_samples, n_features)
15 | input data
16 | y: {numpy array}, shape (n_samples,)
17 | input class labels or regression target
18 | z: {float}
19 | regularization parameter of L2 norm for the non-overlapping group
20 | idx: {numpy array}, shape (3, n_nodes)
21 | 3*nodes matrix, where nodes denotes the number of nodes of the tree
22 | idx(1,:) contains the starting index
23 | idx(2,:) contains the ending index
24 | idx(3,:) contains the corresponding weight (w_{j})
25 | kwargs: {dictionary}
26 | verbose: {boolean}
27 |             True if the user wants to print out the objective function value in each iteration, false if not
28 |
29 | Output
30 | ------
31 | w: {numpy array}, shape (n_features,)
32 | weight vector
33 | obj: {numpy array}, shape (n_iterations,)
34 | objective function value during iterations
35 | value_gamma: {numpy array}, shape (n_iterations,)
36 | suitable step size during iterations
37 |
38 | Note for input parameter idx:
39 |     (1) For idx, if each entry in w is a leaf node of the tree and the weights for these leaf nodes are the same, then
40 |     idx[0,0] = -1 and idx[1,0] = -1, and idx[2,0] denotes the common weight
41 |     (2) In idx, the feature indices of the left subtree are smaller than those of the right subtree (idx[0,i] is always smaller than idx[1,i])
42 |
43 | Reference:
44 | Liu, Jun, et al. "Moreau-Yosida Regularization for Grouped Tree Structure Learning." NIPS. 2010.
45 | Liu, Jun, et al. "SLEP: Sparse Learning with Efficient Projections." http://www.public.asu.edu/~jye02/Software/SLEP, 2009.
46 | """
47 |
48 | if 'verbose' not in kwargs:
49 | verbose = False
50 | else:
51 | verbose = kwargs['verbose']
52 |
53 | # starting point initialization
54 | n_samples, n_features = X.shape
55 |
56 | # compute X'y
57 | Xty = np.dot(np.transpose(X), y)
58 |
59 | # initialize a starting point
60 | w = np.zeros(n_features)
61 |
62 | # compute Xw = X*w
63 | Xw = np.dot(X, w)
64 |
65 | # starting the main program, the Armijo Goldstein line search scheme + accelerated gradient descent
66 | # initialize step size gamma = 1
67 | gamma = 1
68 |
69 | # assign wp with w, and Xwp with Xw
70 | Xwp = Xw
71 | wwp = np.zeros(n_features)
72 | alphap = 0
73 | alpha = 1
74 |
75 | # indicates whether the gradient step only changes a little
76 | flag = False
77 |
78 | max_iter = 1000
79 | value_gamma = np.zeros(max_iter)
80 | obj = np.zeros(max_iter)
81 | for iter_step in range(max_iter):
82 | # step1: compute search point s based on wp and w (with beta)
83 | beta = (alphap-1)/alpha
84 | s = w + beta*wwp
85 |
86 | # step2: line search for gamma and compute the new approximation solution w
87 | Xs = Xw + beta*(Xw - Xwp)
88 | # compute X'* Xs
89 | XtXs = np.dot(np.transpose(X), Xs)
90 |
91 | # obtain the gradient g
92 | G = XtXs - Xty
93 |
94 | # copy w and Xw to wp and Xwp
95 | wp = w
96 | Xwp = Xw
97 |
98 | while True:
99 |             # take a gradient step from s to get v, then apply the L1/L2-norm regularized projection
100 | v = s - G/gamma
101 | # tree overlapping group lasso projection
102 | n_nodes = int(idx.shape[1])
103 | idx_tmp = idx.copy()
104 | idx_tmp[2, :] = idx[2, :] * z / gamma
105 | w = tree_lasso_projection(v, n_features, idx_tmp, n_nodes)
106 | # the difference between the new approximate solution w and the search point s
107 | v = w - s
108 | # compute Xw = X*w
109 | Xw = np.dot(X, w)
110 | Xv = Xw - Xs
111 | r_sum = np.inner(v, v)
112 | l_sum = np.inner(Xv, Xv)
113 |             # determine whether the gradient step makes little improvement
114 | if r_sum <= 1e-20:
115 | flag = True
116 | break
117 |
118 | # the condition is ||Xv||_2^2 <= gamma * ||v||_2^2
119 | if l_sum <= r_sum*gamma:
120 | break
121 | else:
122 | gamma = max(2*gamma, l_sum/r_sum)
123 | value_gamma[iter_step] = gamma
124 |
125 |         # step3: update alpha and alphap, and check whether it has converged
126 | alphap = alpha
127 | alpha = (1+math.sqrt(4*alpha*alpha+1))/2
128 |
129 | wwp = w - wp
130 | Xwy = Xw -y
131 | # calculate the regularization part
132 | tree_norm_val = tree_norm(w, n_features, idx, n_nodes)
133 |
134 | # function value = loss + regularization
135 | obj[iter_step] = np.inner(Xwy, Xwy)/2 + z*tree_norm_val
136 |
137 | if verbose:
138 | print 'obj at iter ' + str(iter_step+1) + ': ' + str(obj[iter_step])
139 |
140 | if flag is True:
141 | break
142 |
143 |         # determine whether it has converged
144 | if iter_step >= 2 and math.fabs(obj[iter_step] - obj[iter_step-1]) < 1e-3:
145 | break
146 |
147 | return w, obj, value_gamma
148 |
149 |
150 |
151 |
152 |
--------------------------------------------------------------------------------
/src/skfeature/function/wrapper/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/function/wrapper/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/function/wrapper/decision_tree_backward.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.tree import DecisionTreeClassifier
3 | from sklearn.cross_validation import KFold
4 | from sklearn.metrics import accuracy_score
5 |
6 |
7 | def decision_tree_backward(X, y, n_selected_features):
8 | """
9 | This function implements the backward feature selection algorithm based on decision tree
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | y: {numpy array}, shape (n_samples,)
16 | input class labels
17 | n_selected_features : {int}
18 | number of selected features
19 |
20 | Output
21 | ------
22 |     F: {numpy array}, shape (n_selected_features,)
23 | index of selected features
24 | """
25 |
26 | n_samples, n_features = X.shape
27 | # using 10 fold cross validation
28 | cv = KFold(n_samples, n_folds=10, shuffle=True)
29 | # choose decision tree as the classifier
30 | clf = DecisionTreeClassifier()
31 |
32 | # selected feature set, initialized to contain all features
33 | F = range(n_features)
34 | count = n_features
35 |
36 | while count > n_selected_features:
37 | max_acc = 0
38 | for i in range(n_features):
39 | if i in F:
40 | F.remove(i)
41 | X_tmp = X[:, F]
42 | acc = 0
43 | for train, test in cv:
44 | clf.fit(X_tmp[train], y[train])
45 | y_predict = clf.predict(X_tmp[test])
46 | acc_tmp = accuracy_score(y[test], y_predict)
47 | acc += acc_tmp
48 | acc = float(acc)/10
49 | F.append(i)
50 | # record the feature which results in the largest accuracy
51 | if acc > max_acc:
52 | max_acc = acc
53 | idx = i
54 | # delete the feature which results in the largest accuracy
55 | F.remove(idx)
56 | count -= 1
57 | return np.array(F)
58 |
59 |
60 |
--------------------------------------------------------------------------------
/src/skfeature/function/wrapper/decision_tree_forward.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.tree import DecisionTreeClassifier
3 | from sklearn.cross_validation import KFold
4 | from sklearn.metrics import accuracy_score
5 |
6 |
7 | def decision_tree_forward(X, y, n_selected_features):
8 | """
9 | This function implements the forward feature selection algorithm based on decision tree
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | y: {numpy array}, shape (n_samples, )
16 | input class labels
17 | n_selected_features: {int}
18 | number of selected features
19 |
20 | Output
21 | ------
22 |     F: {numpy array}, shape (n_selected_features,)
23 | index of selected features
24 | """
25 |
26 | n_samples, n_features = X.shape
27 | # using 10 fold cross validation
28 | cv = KFold(n_samples, n_folds=10, shuffle=True)
29 | # choose decision tree as the classifier
30 | clf = DecisionTreeClassifier()
31 |
32 | # selected feature set, initialized to be empty
33 | F = []
34 | count = 0
35 | while count < n_selected_features:
36 | max_acc = 0
37 | for i in range(n_features):
38 | if i not in F:
39 | F.append(i)
40 | X_tmp = X[:, F]
41 | acc = 0
42 | for train, test in cv:
43 | clf.fit(X_tmp[train], y[train])
44 | y_predict = clf.predict(X_tmp[test])
45 | acc_tmp = accuracy_score(y[test], y_predict)
46 | acc += acc_tmp
47 | acc = float(acc)/10
48 | F.pop()
49 | # record the feature which results in the largest accuracy
50 | if acc > max_acc:
51 | max_acc = acc
52 | idx = i
53 | # add the feature which results in the largest accuracy
54 | F.append(idx)
55 | count += 1
56 | return np.array(F)
57 |
58 |
--------------------------------------------------------------------------------
/src/skfeature/function/wrapper/svm_backward.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.svm import SVC
3 | from sklearn.cross_validation import KFold
4 | from sklearn.metrics import accuracy_score
5 |
6 |
7 | def svm_backward(X, y, n_selected_features):
8 | """
9 | This function implements the backward feature selection algorithm based on SVM
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | y: {numpy array}, shape (n_samples,)
16 | input class labels
17 | n_selected_features: {int}
18 | number of selected features
19 |
20 | Output
21 | ------
22 |     F: {numpy array}, shape (n_selected_features,)
23 | index of selected features
24 | """
25 |
26 | n_samples, n_features = X.shape
27 | # using 10 fold cross validation
28 | cv = KFold(n_samples, n_folds=10, shuffle=True)
29 | # choose SVM as the classifier
30 | clf = SVC()
31 |
32 | # selected feature set, initialized to contain all features
33 | F = range(n_features)
34 | count = n_features
35 |
36 | while count > n_selected_features:
37 | max_acc = 0
38 | for i in range(n_features):
39 | if i in F:
40 | F.remove(i)
41 | X_tmp = X[:, F]
42 | acc = 0
43 | for train, test in cv:
44 | clf.fit(X_tmp[train], y[train])
45 | y_predict = clf.predict(X_tmp[test])
46 | acc_tmp = accuracy_score(y[test], y_predict)
47 | acc += acc_tmp
48 | acc = float(acc)/10
49 | F.append(i)
50 | # record the feature which results in the largest accuracy
51 | if acc > max_acc:
52 | max_acc = acc
53 | idx = i
54 | # delete the feature which results in the largest accuracy
55 | F.remove(idx)
56 | count -= 1
57 | return np.array(F)
58 |
59 |
60 |
--------------------------------------------------------------------------------
/src/skfeature/function/wrapper/svm_forward.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.svm import SVC
3 | from sklearn.cross_validation import KFold
4 | from sklearn.metrics import accuracy_score
5 |
6 |
7 | def svm_forward(X, y, n_selected_features):
8 | """
9 | This function implements the forward feature selection algorithm based on SVM
10 |
11 | Input
12 | -----
13 | X: {numpy array}, shape (n_samples, n_features)
14 | input data
15 | y: {numpy array}, shape (n_samples,)
16 | input class labels
17 | n_selected_features: {int}
18 | number of selected features
19 |
20 | Output
21 | ------
22 |     F: {numpy array}, shape (n_selected_features,)
23 | index of selected features
24 | """
25 |
26 | n_samples, n_features = X.shape
27 | # using 10 fold cross validation
28 | cv = KFold(n_samples, n_folds=10, shuffle=True)
29 | # choose SVM as the classifier
30 | clf = SVC()
31 |
32 | # selected feature set, initialized to be empty
33 | F = []
34 | count = 0
35 | while count < n_selected_features:
36 | max_acc = 0
37 | for i in range(n_features):
38 | if i not in F:
39 | F.append(i)
40 | X_tmp = X[:, F]
41 | acc = 0
42 | for train, test in cv:
43 | clf.fit(X_tmp[train], y[train])
44 | y_predict = clf.predict(X_tmp[test])
45 | acc_tmp = accuracy_score(y[test], y_predict)
46 | acc += acc_tmp
47 | acc = float(acc)/10
48 | F.pop()
49 | # record the feature which results in the largest accuracy
50 | if acc > max_acc:
51 | max_acc = acc
52 | idx = i
53 | # add the feature which results in the largest accuracy
54 | F.append(idx)
55 | count += 1
56 | return np.array(F)
--------------------------------------------------------------------------------
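A minimal usage sketch for the wrapper modules above (shown here for `svm_forward`; the decision-tree and backward variants are called the same way). The data is synthetic, and the modules depend on the old `sklearn.cross_validation` API, so a correspondingly old scikit-learn is assumed:

```python
import numpy as np
from skfeature.function.wrapper import svm_forward

X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)

# greedy forward search with an SVM evaluated by 10-fold cross-validation
F = svm_forward.svm_forward(X, y, n_selected_features=5)
print(F)   # indices of the selected features, in selection order
```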
/src/skfeature/utility/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavideNardone/A-Sparse-Coding-Based-Approach-for-Class-Specific-Feature-Selection/32a4eba5ca437c8fe853c001b7a769ed716801d7/src/skfeature/utility/__init__.py
--------------------------------------------------------------------------------
/src/skfeature/utility/construct_W.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.sparse import *
3 | from sklearn.metrics.pairwise import pairwise_distances
4 |
5 |
6 | def construct_W(X, **kwargs):
7 | """
8 | Construct the affinity matrix W through different ways
9 |
10 | Notes
11 | -----
12 | if kwargs is null, use the default parameter settings;
13 | if kwargs is not null, construct the affinity matrix according to parameters in kwargs
14 |
15 | Input
16 | -----
17 | X: {numpy array}, shape (n_samples, n_features)
18 | input data
19 | kwargs: {dictionary}
20 | parameters to construct different affinity matrix W:
21 | y: {numpy array}, shape (n_samples, 1)
22 | the true label information needed under the 'supervised' neighbor mode
23 | metric: {string}
24 | choices for different distance measures
25 | 'euclidean' - use euclidean distance
26 | 'cosine' - use cosine distance (default)
27 | neighbor_mode: {string}
28 | indicates how to construct the graph
29 | 'knn' - put an edge between two nodes if and only if they are among the
30 | k nearest neighbors of each other (default)
31 | 'supervised' - put an edge between two nodes if they belong to same class
32 | and they are among the k nearest neighbors of each other
33 | weight_mode: {string}
34 | indicates how to assign weights for each edge in the graph
35 | 'binary' - 0-1 weighting, every edge receives weight of 1 (default)
36 | 'heat_kernel' - if nodes i and j are connected, put weight W_ij = exp(-||x_i - x_j||^2 / (2t^2))
37 | this weight mode can only be used under 'euclidean' metric and you are required
38 | to provide the parameter t
39 | 'cosine' - if nodes i and j are connected, put weight cosine(x_i,x_j).
40 | this weight mode can only be used under 'cosine' metric
41 | k: {int}
42 | choices for the number of neighbors (default k = 5)
43 | t: {float}
44 | parameter for the 'heat_kernel' weight_mode
45 | fisher_score: {boolean}
46 | indicates whether to build the affinity matrix in a fisher score way, in which W_ij = 1/n_l if yi = yj = l;
47 | otherwise W_ij = 0 (default fisher_score = false)
48 | reliefF: {boolean}
49 | indicates whether to build the affinity matrix in a reliefF way; NH(x) and NM(x,y) denote the sets of
50 | k nearest points to x with the same class as x and with a different class (the class y), respectively.
51 | W_ij = 1 if i = j; W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) (default reliefF = false)
52 |
53 | Output
54 | ------
55 | W: {sparse matrix}, shape (n_samples, n_samples)
56 | output affinity matrix W
57 | """
58 |
59 | # default metric is 'cosine'
60 | if 'metric' not in kwargs.keys():
61 | kwargs['metric'] = 'cosine'
62 |
63 | # default neighbor mode is 'knn' and default neighbor size is 5
64 | if 'neighbor_mode' not in kwargs.keys():
65 | kwargs['neighbor_mode'] = 'knn'
66 | if kwargs['neighbor_mode'] == 'knn' and 'k' not in kwargs.keys():
67 | kwargs['k'] = 5
68 | if kwargs['neighbor_mode'] == 'supervised' and 'k' not in kwargs.keys():
69 | kwargs['k'] = 5
70 | if kwargs['neighbor_mode'] == 'supervised' and 'y' not in kwargs.keys():
71 | print ('Warning: label is required in the supervised neighborMode!!!')
72 | exit(0)
73 |
74 | # default weight mode is 'binary', default t in heat kernel mode is 1
75 | if 'weight_mode' not in kwargs.keys():
76 | kwargs['weight_mode'] = 'binary'
77 | if kwargs['weight_mode'] == 'heat_kernel':
78 | if kwargs['metric'] != 'euclidean':
79 | kwargs['metric'] = 'euclidean'
80 | if 't' not in kwargs.keys():
81 | kwargs['t'] = 1
82 | elif kwargs['weight_mode'] == 'cosine':
83 | if kwargs['metric'] != 'cosine':
84 | kwargs['metric'] = 'cosine'
85 |
86 | # default fisher_score and reliefF mode are 'false'
87 | if 'fisher_score' not in kwargs.keys():
88 | kwargs['fisher_score'] = False
89 | if 'reliefF' not in kwargs.keys():
90 | kwargs['reliefF'] = False
91 |
92 | n_samples, n_features = np.shape(X)
93 |
94 | # choose 'knn' neighbor mode
95 | if kwargs['neighbor_mode'] == 'knn':
96 | k = kwargs['k']
97 | if kwargs['weight_mode'] == 'binary':
98 | if kwargs['metric'] == 'euclidean':
99 | # compute pairwise euclidean distances
100 | D = pairwise_distances(X)
101 | D **= 2
102 | # sort the distance matrix D in ascending order
103 | dump = np.sort(D, axis=1)
104 | idx = np.argsort(D, axis=1)
105 | # choose the k-nearest neighbors for each instance
106 | idx_new = idx[:, 0:k+1]
107 | G = np.zeros((n_samples*(k+1), 3))
108 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
109 | G[:, 1] = np.ravel(idx_new, order='F')
110 | G[:, 2] = 1
111 | # build the sparse affinity matrix W
112 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
113 | bigger = np.transpose(W) > W  # entries where W_ji > W_ij
114 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)  # symmetrize: keep max(W_ij, W_ji)
115 | return W
116 |
117 | elif kwargs['metric'] == 'cosine':
118 | # normalize the data first
119 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
120 | for i in range(n_samples):
121 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
122 | # compute pairwise cosine distances
123 | D_cosine = np.dot(X, np.transpose(X))
124 | # sort the distance matrix D in descending order
125 | dump = np.sort(-D_cosine, axis=1)
126 | idx = np.argsort(-D_cosine, axis=1)
127 | idx_new = idx[:, 0:k+1]
128 | G = np.zeros((n_samples*(k+1), 3))
129 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
130 | G[:, 1] = np.ravel(idx_new, order='F')
131 | G[:, 2] = 1
132 | # build the sparse affinity matrix W
133 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
134 | bigger = np.transpose(W) > W
135 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
136 | return W
137 |
138 | elif kwargs['weight_mode'] == 'heat_kernel':
139 | t = kwargs['t']
140 | # compute pairwise euclidean distances
141 | D = pairwise_distances(X)
142 | D **= 2
143 | # sort the distance matrix D in ascending order
144 | dump = np.sort(D, axis=1)
145 | idx = np.argsort(D, axis=1)
146 | idx_new = idx[:, 0:k+1]
147 | dump_new = dump[:, 0:k+1]
148 | # compute the pairwise heat kernel distances
149 | dump_heat_kernel = np.exp(-dump_new/(2*t*t))
150 | G = np.zeros((n_samples*(k+1), 3))
151 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
152 | G[:, 1] = np.ravel(idx_new, order='F')
153 | G[:, 2] = np.ravel(dump_heat_kernel, order='F')
154 | # build the sparse affinity matrix W
155 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
156 | bigger = np.transpose(W) > W
157 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
158 | return W
159 |
160 | elif kwargs['weight_mode'] == 'cosine':
161 | # normalize the data first
162 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
163 | for i in range(n_samples):
164 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
165 | # compute pairwise cosine distances
166 | D_cosine = np.dot(X, np.transpose(X))
167 | # sort the cosine similarities in descending order (via the negated values)
168 | dump = np.sort(-D_cosine, axis=1)
169 | idx = np.argsort(-D_cosine, axis=1)
170 | idx_new = idx[:, 0:k+1]
171 | dump_new = -dump[:, 0:k+1]
172 | G = np.zeros((n_samples*(k+1), 3))
173 | G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
174 | G[:, 1] = np.ravel(idx_new, order='F')
175 | G[:, 2] = np.ravel(dump_new, order='F')
176 | # build the sparse affinity matrix W
177 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
178 | bigger = np.transpose(W) > W
179 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
180 | return W
181 |
182 | # choose supervised neighborMode
183 | elif kwargs['neighbor_mode'] == 'supervised':
184 | k = kwargs['k']
185 | # get true labels and the number of classes
186 | y = kwargs['y']
187 | label = np.unique(y)
188 | n_classes = np.unique(y).size
189 | # construct the weight matrix W in a fisherScore way, W_ij = 1/n_l if yi = yj = l, otherwise W_ij = 0
190 | if kwargs['fisher_score'] is True:
191 | W = lil_matrix((n_samples, n_samples))
192 | for i in range(n_classes):
193 | class_idx = (y == label[i])
194 | class_idx_all = (class_idx[:, np.newaxis] & class_idx[np.newaxis, :])
195 | W[class_idx_all] = 1.0/np.sum(np.sum(class_idx))
196 | return W
197 |
198 | # construct the weight matrix W in a reliefF way; NH(x) and NM(x,y) denote the sets of k nearest
199 | # points to x with the same class as x and with a different class (the class y), respectively. W_ij = 1 if i = j;
200 | # W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y)
201 | if kwargs['reliefF'] is True:
202 | # when xj in NH(xi)
203 | G = np.zeros((n_samples*(k+1), 3))
204 | id_now = 0
205 | for i in range(n_classes):
206 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
207 | D = pairwise_distances(X[class_idx, :])
208 | D **= 2
209 | idx = np.argsort(D, axis=1)
210 | idx_new = idx[:, 0:k+1]
211 | n_smp_class = (class_idx[idx_new[:]]).size
212 | if len(class_idx) <= k:
213 | k = len(class_idx) - 1
214 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
215 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
216 | G[id_now:n_smp_class+id_now, 2] = 1.0/k
217 | id_now += n_smp_class
218 | W1 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
219 | # when i = j, W_ij = 1
220 | for i in range(n_samples):
221 | W1[i, i] = 1
222 | # when x_j in NM(x_i, y)
223 | G = np.zeros((n_samples*k*(n_classes - 1), 3))
224 | id_now = 0
225 | for i in range(n_classes):
226 | class_idx1 = np.column_stack(np.where(y == label[i]))[:, 0]
227 | X1 = X[class_idx1, :]
228 | for j in range(n_classes):
229 | if label[j] != label[i]:
230 | class_idx2 = np.column_stack(np.where(y == label[j]))[:, 0]
231 | X2 = X[class_idx2, :]
232 | D = pairwise_distances(X1, X2)
233 | idx = np.argsort(D, axis=1)
234 | idx_new = idx[:, 0:k]
235 | n_smp_class = len(class_idx1)*k
236 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx1, (k, 1)).reshape(-1)
237 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx2[idx_new[:]], order='F')
238 | G[id_now:n_smp_class+id_now, 2] = -1.0/((n_classes-1)*k)
239 | id_now += n_smp_class
240 | W2 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
241 | bigger = np.transpose(W2) > W2
242 | W2 = W2 - W2.multiply(bigger) + np.transpose(W2).multiply(bigger)
243 | W = W1 + W2
244 | return W
245 |
246 | if kwargs['weight_mode'] == 'binary':
247 | if kwargs['metric'] == 'euclidean':
248 | G = np.zeros((n_samples*(k+1), 3))
249 | id_now = 0
250 | for i in range(n_classes):
251 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
252 | # compute pairwise euclidean distances for instances in class i
253 | D = pairwise_distances(X[class_idx, :])
254 | D **= 2
255 | # sort the distance matrix D in ascending order for instances in class i
256 | idx = np.argsort(D, axis=1)
257 | idx_new = idx[:, 0:k+1]
258 | n_smp_class = len(class_idx)*(k+1)
259 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
260 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
261 | G[id_now:n_smp_class+id_now, 2] = 1
262 | id_now += n_smp_class
263 | # build the sparse affinity matrix W
264 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
265 | bigger = np.transpose(W) > W
266 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
267 | return W
268 |
269 | if kwargs['metric'] == 'cosine':
270 | # normalize the data first
271 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
272 | for i in range(n_samples):
273 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
274 | G = np.zeros((n_samples*(k+1), 3))
275 | id_now = 0
276 | for i in range(n_classes):
277 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
278 | # compute pairwise cosine distances for instances in class i
279 | D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
280 | # sort the distance matrix D in descending order for instances in class i
281 | idx = np.argsort(-D_cosine, axis=1)
282 | idx_new = idx[:, 0:k+1]
283 | n_smp_class = len(class_idx)*(k+1)
284 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
285 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
286 | G[id_now:n_smp_class+id_now, 2] = 1
287 | id_now += n_smp_class
288 | # build the sparse affinity matrix W
289 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
290 | bigger = np.transpose(W) > W
291 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
292 | return W
293 |
294 | elif kwargs['weight_mode'] == 'heat_kernel':
295 | G = np.zeros((n_samples*(k+1), 3))
296 | id_now = 0
297 | for i in range(n_classes):
298 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
299 | # compute pairwise euclidean distances for instances in class i
300 | D = pairwise_distances(X[class_idx, :])
301 | D **= 2
302 | # sort the distance matrix D in ascending order for instances in class i
303 | dump = np.sort(D, axis=1)
304 | idx = np.argsort(D, axis=1)
305 | idx_new = idx[:, 0:k+1]
306 | dump_new = dump[:, 0:k+1]
307 | t = kwargs['t']
308 | # compute pairwise heat kernel distances for instances in class i
309 | dump_heat_kernel = np.exp(-dump_new/(2*t*t))
310 | n_smp_class = len(class_idx)*(k+1)
311 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
312 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
313 | G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_heat_kernel, order='F')
314 | id_now += n_smp_class
315 | # build the sparse affinity matrix W
316 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
317 | bigger = np.transpose(W) > W
318 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
319 | return W
320 |
321 | elif kwargs['weight_mode'] == 'cosine':
322 | # normalize the data first
323 | X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
324 | for i in range(n_samples):
325 | X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
326 | G = np.zeros((n_samples*(k+1), 3))
327 | id_now = 0
328 | for i in range(n_classes):
329 | class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
330 | # compute pairwise cosine distances for instances in class i
331 | D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
332 | # sort the distance matrix D in descending order for instances in class i
333 | dump = np.sort(-D_cosine, axis=1)
334 | idx = np.argsort(-D_cosine, axis=1)
335 | idx_new = idx[:, 0:k+1]
336 | dump_new = -dump[:, 0:k+1]
337 | n_smp_class = len(class_idx)*(k+1)
338 | G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
339 | G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
340 | G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_new, order='F')
341 | id_now += n_smp_class
342 | # build the sparse affinity matrix W
343 | W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
344 | bigger = np.transpose(W) > W
345 | W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
346 | return W
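The sketch below shows two representative ways of calling `construct_W`, using only parameters documented in the docstring above; the toy data is arbitrary.

```python
# Sketch of two typical construct_W calls on toy data.
import numpy as np
from skfeature.utility.construct_W import construct_W

rng = np.random.RandomState(0)
X = rng.rand(30, 5)
y = np.array([0] * 15 + [1] * 15)

# unsupervised k-NN graph with heat-kernel weights (forces the euclidean metric)
W_heat = construct_W(X, neighbor_mode="knn", weight_mode="heat_kernel", k=5, t=1.0)

# supervised graph: connect k nearest neighbors that also share the same label
W_sup = construct_W(X, y=y, neighbor_mode="supervised", weight_mode="binary",
                    metric="euclidean", k=5)

print(W_heat.shape)                      # (30, 30) sparse affinity matrix
print(W_sup.shape)
```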
--------------------------------------------------------------------------------
/src/skfeature/utility/data_discretization.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import sklearn.preprocessing
3 |
4 |
5 | def data_discretization(X, n_bins):
6 | """
7 | This function discretizes the input data, digitizing each min-max-scaled feature into n_bins bins
8 |
9 | Input
10 | -----
11 | X: {numpy array}, shape (n_samples, n_features)
12 | input data
13 | n_bins: {int}
14 | number of bins to discretize each feature into
15 |
16 | Output
17 | ------
18 | X_discretized: {numpy array}, shape (n_samples, n_features)
19 | output discretized data, where features are digitized to n_bins
20 | """
21 |
22 | # normalize each feature
23 | min_max_scaler = sklearn.preprocessing.MinMaxScaler()
24 | X_normalized = min_max_scaler.fit_transform(X)
25 |
26 | # discretize X
27 | n_samples, n_features = X.shape
28 | X_discretized = np.zeros((n_samples, n_features))
29 | bins = np.linspace(0, 1, n_bins)
30 | for i in range(n_features):
31 | X_discretized[:, i] = np.digitize(X_normalized[:, i], bins)
32 |
33 | return X_discretized
34 |
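A small worked example of the function above: each feature is min-max scaled and then digitized, so the returned bin labels are integers starting at 1.

```python
# Worked example: min-max scaling followed by np.digitize into 5 bins.
import numpy as np
from skfeature.utility.data_discretization import data_discretization

X = np.array([[0.1, 10.0],
              [0.5, 20.0],
              [0.9, 30.0]])
X_disc = data_discretization(X, n_bins=5)
print(X_disc)        # per-column integer bin labels, here 1, 3 and 5
```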
--------------------------------------------------------------------------------
/src/skfeature/utility/entropy_estimators.py:
--------------------------------------------------------------------------------
1 | # Written by Greg Ver Steeg (http://www.isi.edu/~gregv/npeet.html)
2 |
3 | import scipy.spatial as ss
4 | from scipy.special import digamma
5 | from math import log
6 | import numpy.random as nr
7 | import numpy as np
8 | import random
9 |
10 |
11 | # continuous estimators
12 |
13 | def entropy(x, k=3, base=2):
14 | """
15 | The classic K-L k-nearest neighbor continuous entropy estimator. x should be a list of vectors,
16 | e.g. x = [[1.3],[3.7],[5.1],[2.4]] if x is a one-dimensional scalar and we have four samples
17 | """
18 |
19 | assert k <= len(x)-1, "Set k smaller than num. samples - 1"
20 | d = len(x[0])
21 | N = len(x)
22 | intens = 1e-10 # small noise to break degeneracy, see doc.
23 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
24 | tree = ss.cKDTree(x)
25 | nn = [tree.query(point, k+1, p=float('inf'))[0][k] for point in x]
26 | const = digamma(N)-digamma(k) + d*log(2)
27 | return (const + d*np.mean(map(log, nn)))/log(base)
28 |
29 |
30 | def mi(x, y, k=3, base=2):
31 | """
32 | Mutual information of x and y; x, y should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]]
33 | if x is a one-dimensional scalar and we have four samples
34 | """
35 |
36 | assert len(x) == len(y), "Lists should have same length"
37 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
38 | intens = 1e-10 # small noise to break degeneracy, see doc.
39 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
40 | y = [list(p + intens * nr.rand(len(y[0]))) for p in y]
41 | points = zip2(x, y)
42 | # Find nearest neighbors in joint space, p=inf means max-norm
43 | tree = ss.cKDTree(points)
44 | dvec = [tree.query(point, k+1, p=float('inf'))[0][k] for point in points]
45 | a, b, c, d = avgdigamma(x, dvec), avgdigamma(y, dvec), digamma(k), digamma(len(x))
46 | return (-a-b+c+d)/log(base)
47 |
48 |
49 | def cmi(x, y, z, k=3, base=2):
50 | """
51 | Mutual information of x and y, conditioned on z; x, y, z should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]]
52 | if x is a one-dimensional scalar and we have four samples
53 | """
54 |
55 | assert len(x) == len(y), "Lists should have same length"
56 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
57 | intens = 1e-10 # small noise to break degeneracy, see doc.
58 | x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
59 | y = [list(p + intens * nr.rand(len(y[0]))) for p in y]
60 | z = [list(p + intens * nr.rand(len(z[0]))) for p in z]
61 | points = zip2(x, y, z)
62 | # Find nearest neighbors in joint space, p=inf means max-norm
63 | tree = ss.cKDTree(points)
64 | dvec = [tree.query(point, k+1, p=float('inf'))[0][k] for point in points]
65 | a, b, c, d = avgdigamma(zip2(x, z), dvec), avgdigamma(zip2(y, z), dvec), avgdigamma(z, dvec), digamma(k)
66 | return (-a-b+c+d)/log(base)
67 |
68 |
69 | def kldiv(x, xp, k=3, base=2):
70 | """
71 | KL Divergence between p and q for x~p(x), xp~q(x); x, xp should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]]
72 | if x is a one-dimensional scalar and we have four samples
73 | """
74 |
75 | assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
76 | assert k <= len(xp) - 1, "Set k smaller than num. samples - 1"
77 | assert len(x[0]) == len(xp[0]), "Two distributions must have same dim."
78 | d = len(x[0])
79 | n = len(x)
80 | m = len(xp)
81 | const = log(m) - log(n-1)
82 | tree = ss.cKDTree(x)
83 | treep = ss.cKDTree(xp)
84 | nn = [tree.query(point, k+1, p=float('inf'))[0][k] for point in x]
85 | nnp = [treep.query(point, k, p=float('inf'))[0][k-1] for point in x]
86 | return (const + d*np.mean(map(log, nnp))-d*np.mean(map(log, nn)))/log(base)
87 |
88 |
89 | # Discrete estimators
90 | def entropyd(sx, base=2):
91 | """
92 | Discrete entropy estimator given a list of samples which can be any hashable object
93 | """
94 |
95 | return entropyfromprobs(hist(sx), base=base)
96 |
97 |
98 | def midd(x, y):
99 | """
100 | Discrete mutual information estimator given a list of samples which can be any hashable object
101 | """
102 |
103 | return -entropyd(zip(x, y))+entropyd(x)+entropyd(y)
104 |
105 |
106 | def cmidd(x, y, z):
107 | """
108 | Discrete conditional mutual information estimator of x and y given z, for samples which can be any hashable object
109 | """
110 |
111 | return entropyd(zip(y, z))+entropyd(zip(x, z))-entropyd(zip(x, y, z))-entropyd(z)
112 |
113 |
114 | def hist(sx):
115 | # Histogram from list of samples
116 | d = dict()
117 | for s in sx:
118 | d[s] = d.get(s, 0) + 1
119 | return map(lambda z: float(z)/len(sx), d.values())
120 |
121 |
122 | def entropyfromprobs(probs, base=2):
123 | # Turn a normalized list of probabilities of discrete outcomes into entropy (base 2)
124 | return -sum(map(elog, probs))/log(base)
125 |
126 |
127 | def elog(x):
128 | # for entropy, 0 log 0 = 0. but we get an error for putting log 0
129 | if x <= 0. or x >= 1.:
130 | return 0
131 | else:
132 | return x*log(x)
133 |
134 |
135 | # Mixed estimators
136 | def micd(x, y, k=3, base=2, warning=True):
137 | """ If x is continuous and y is discrete, compute mutual information
138 | """
139 |
140 | overallentropy = entropy(x, k, base)
141 | n = len(y)
142 | word_dict = dict()
143 | for sample in y:
144 | word_dict[sample] = word_dict.get(sample, 0) + 1./n
145 | yvals = list(set(word_dict.keys()))
146 |
147 | mi = overallentropy
148 | for yval in yvals:
149 | xgiveny = [x[i] for i in range(n) if y[i] == yval]
150 | if k <= len(xgiveny) - 1:
151 | mi -= word_dict[yval]*entropy(xgiveny, k, base)
152 | else:
153 | if warning:
154 | print "Warning, after conditioning, on y=", yval, " insufficient data. Assuming maximal entropy in this case."
155 | mi -= word_dict[yval]*overallentropy
156 | return mi # units already applied
157 |
158 |
159 | # Utility functions
160 | def vectorize(scalarlist):
161 | """
162 | Turn a list of scalars into a list of one-d vectors
163 | """
164 |
165 | return [(x,) for x in scalarlist]
166 |
167 |
168 | def shuffle_test(measure, x, y, z=False, ns=200, ci=0.95, **kwargs):
169 | """
170 | Shuffle test
171 | Repeatedly shuffle the x-values and then estimate measure(x,y,[z]).
172 | Returns the mean and confidence interval ('ci=0.95' by default) over 'ns' runs; 'measure' could be, e.g., mi or cmi.
173 | Keyword arguments can be passed. Mutual information and CMI should have a mean near zero.
174 | """
175 |
176 | xp = x[:] # A copy that we can shuffle
177 | outputs = []
178 | for i in range(ns):
179 | random.shuffle(xp)
180 | if z:
181 | outputs.append(measure(xp, y, z, **kwargs))
182 | else:
183 | outputs.append(measure(xp, y, **kwargs))
184 | outputs.sort()
185 | return np.mean(outputs), (outputs[int((1.-ci)/2*ns)], outputs[int((1.+ci)/2*ns)])
186 |
187 |
188 | # Internal functions
189 | def avgdigamma(points, dvec):
190 | # This part finds number of neighbors in some radius in the marginal space
191 | # returns the expected value of digamma(number of neighbors within that radius)
192 | N = len(points)
193 | tree = ss.cKDTree(points)
194 | avg = 0.
195 | for i in range(N):
196 | dist = dvec[i]
197 | # subtlety, we don't include the boundary point,
198 | # but we are implicitly adding 1 to kraskov def bc center point is included
199 | num_points = len(tree.query_ball_point(points[i], dist-1e-15, p=float('inf')))
200 | avg += digamma(num_points)/N
201 | return avg
202 |
203 |
204 | def zip2(*args):
205 | # zip2(x,y) takes the lists of vectors and makes it a list of vectors in a joint space
206 | # E.g. zip2([[1],[2],[3]],[[4],[5],[6]]) = [[1,4],[2,5],[3,6]]
207 | return [sum(sublist, []) for sublist in zip(*args)]
208 |
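A short sketch of the discrete estimators, assuming the Python 2 interpreter this module is written for (it still uses print statements); the sample lists are hand-made.

```python
# Sketch (Python 2): discrete entropy and mutual information in bits (base 2).
from skfeature.utility import entropy_estimators as ee

x = [0, 0, 1, 1, 2, 2, 0, 1]     # a discretized feature
y = [0, 0, 1, 1, 1, 1, 0, 0]     # class labels

print(ee.entropyd(x))            # H(x)
print(ee.midd(x, y))             # I(x; y) = H(x) + H(y) - H(x, y)
print(ee.cmidd(x, y, [0] * 8))   # I(x; y | z); with a constant z this equals I(x; y)
```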
--------------------------------------------------------------------------------
/src/skfeature/utility/mutual_information.py:
--------------------------------------------------------------------------------
1 | import entropy_estimators as ee
2 |
3 |
4 | def information_gain(f1, f2):
5 | """
6 | This function calculates the information gain, where ig(f1,f2) = H(f1) - H(f1|f2)
7 |
8 | Input
9 | -----
10 | f1: {numpy array}, shape (n_samples,)
11 | f2: {numpy array}, shape (n_samples,)
12 |
13 | Output
14 | ------
15 | ig: {float}
16 | """
17 |
18 | ig = ee.entropyd(f1) - conditional_entropy(f1, f2)
19 | return ig
20 |
21 |
22 | def conditional_entropy(f1, f2):
23 | """
24 | This function calculates the conditional entropy, where ce = H(f1) - I(f1;f2)
25 |
26 | Input
27 | -----
28 | f1: {numpy array}, shape (n_samples,)
29 | f2: {numpy array}, shape (n_samples,)
30 |
31 | Output
32 | ------
33 | ce: {float}
34 | ce is conditional entropy of f1 and f2
35 | """
36 |
37 | ce = ee.entropyd(f1) - ee.midd(f1, f2)
38 | return ce
39 |
40 |
41 | def su_calculation(f1, f2):
42 | """
43 | This function calculates the symmetrical uncertainty, where su(f1,f2) = 2*IG(f1,f2)/(H(f1)+H(f2))
44 |
45 | Input
46 | -----
47 | f1: {numpy array}, shape (n_samples,)
48 | f2: {numpy array}, shape (n_samples,)
49 |
50 | Output
51 | ------
52 | su: {float}
53 | su is the symmetrical uncertainty of f1 and f2
54 |
55 | """
56 |
57 | # calculate information gain of f1 and f2, t1 = ig(f1,f2)
58 | t1 = information_gain(f1, f2)
59 | # calculate entropy of f1, t2 = H(f1)
60 | t2 = ee.entropyd(f1)
61 | # calculate entropy of f2, t3 = H(f2)
62 | t3 = ee.entropyd(f2)
63 | # su(f1,f2) = 2*t1/(t2+t3)
64 | su = 2.0*t1/(t2+t3)
65 |
66 | return su
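These helpers simply chain the discrete estimators above; a sketch on two hand-made discretized features follows (Python 2, since the module relies on an implicit relative import of `entropy_estimators`).

```python
# Sketch (Python 2): information gain, conditional entropy and symmetrical
# uncertainty between two discretized features.
from skfeature.utility.mutual_information import (
    information_gain, conditional_entropy, su_calculation)

f1 = [0, 0, 1, 1, 2, 2]
f2 = [0, 0, 1, 1, 1, 1]

print(information_gain(f1, f2))      # H(f1) - H(f1 | f2)
print(conditional_entropy(f1, f2))   # H(f1) - I(f1; f2)
print(su_calculation(f1, f2))        # 2 * IG / (H(f1) + H(f2)), between 0 and 1
```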
--------------------------------------------------------------------------------
/src/skfeature/utility/sparse_learning.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from numpy import linalg as LA
3 |
4 |
5 | def feature_ranking(W):
6 | """
7 | This function ranks features according to the feature weights matrix W
8 |
9 | Input:
10 | -----
11 | W: {numpy array}, shape (n_features, n_classes)
12 | feature weights matrix
13 |
14 | Output:
15 | ------
16 | idx: {numpy array}, shape {n_features,}
17 | feature index ranked in descending order by feature importance
18 | """
19 | T = (W*W).sum(1)
20 | idx = np.argsort(T, 0)
21 | return idx[::-1]
22 |
23 |
24 | def generate_diagonal_matrix(U):
25 | """
26 | This function generates a diagonal matrix D from an input matrix U as D_ii = 0.5 / ||U[i,:]||
27 |
28 | Input:
29 | -----
30 | U: {numpy array}, shape (n_samples, n_features)
31 |
32 | Output:
33 | ------
34 | D: {numpy array}, shape (n_samples, n_samples)
35 | """
36 | temp = np.sqrt(np.multiply(U, U).sum(1))
37 | temp[temp < 1e-16] = 1e-16
38 | temp = 0.5 / temp
39 | D = np.diag(temp)
40 | return D
41 |
42 |
43 | def calculate_l21_norm(X):
44 | """
45 | This function calculates the l21 norm of a matrix X, i.e., \sum ||X[i,:]||_2
46 |
47 | Input:
48 | -----
49 | X: {numpy array}, shape (n_samples, n_features)
50 |
51 | Output:
52 | ------
53 | l21_norm: {float}
54 | """
55 | return (np.sqrt(np.multiply(X, X).sum(1))).sum()
56 |
57 |
58 | def construct_label_matrix(label):
59 | """
60 | This function converts a 1d label array into a 2d indicator matrix, where entry (i, j) is 1 if instance i belongs to class j and 0 otherwise
61 |
62 | Input:
63 | -----
64 | label: {numpy array}, shape(n_samples,)
65 |
66 | Output:
67 | ------
68 | label_matrix: {numpy array}, shape(n_samples, n_classes)
69 | """
70 |
71 | n_samples = label.shape[0]
72 | unique_label = np.unique(label)
73 | n_classes = unique_label.shape[0]
74 | label_matrix = np.zeros((n_samples, n_classes))
75 | for i in range(n_classes):
76 | label_matrix[label == unique_label[i], i] = 1
77 |
78 | return label_matrix.astype(int)
79 |
80 |
81 | def construct_label_matrix_pan(label):
82 | """
83 | This function converts a 1d label array into a 2d indicator matrix, where entry (i, j) is 1 if instance i belongs to class j and -1 otherwise
84 |
85 | Input:
86 | -----
87 | label: {numpy array}, shape(n_samples,)
88 |
89 | Output:
90 | ------
91 | label_matrix: {numpy array}, shape(n_samples, n_classes)
92 | """
93 | n_samples = label.shape[0]
94 | unique_label = np.unique(label)
95 | n_classes = unique_label.shape[0]
96 | label_matrix = np.zeros((n_samples, n_classes))
97 | for i in range(n_classes):
98 | label_matrix[label == unique_label[i], i] = 1
99 | label_matrix[label_matrix == 0] = -1
100 |
101 | return label_matrix.astype(int)
102 |
103 |
104 | def euclidean_projection(V, n_features, n_classes, z, gamma):
105 | """
106 | L2,1-norm regularized euclidean projection: row-wise soft-thresholding solving min_W 1/2 ||W - V||_F^2 + (z/gamma) * \sum_i ||W[i, :]||_2
107 | """
108 | W_projection = np.zeros((n_features, n_classes))
109 | for i in range(n_features):
110 | if LA.norm(V[i, :]) > z/gamma:
111 | W_projection[i, :] = (1-z/(gamma*LA.norm(V[i, :])))*V[i, :]
112 | else:
113 | W_projection[i, :] = np.zeros(n_classes)
114 | return W_projection
115 |
116 |
117 | def tree_lasso_projection(v, n_features, idx, n_nodes):
118 | """
119 | This function solves the following optimization problem: min_w 1/2 ||w - v||_2^2 + \sum_i z_i ||w_{G_i}||_2
120 | where w and v are of dimensions of n_features; z_i >=0, and G_{i} follows the tree structure
121 | """
122 | # test whether the first node is special
123 | if idx[0, 0] == -1 and idx[1, 0] == -1:
124 | w_projection = np.zeros(n_features)
125 | z = idx[2, 0]
126 | for j in range(n_features):
127 | if v[j] > z:
128 | w_projection[j] = v[j] - z
129 | else:
130 | if v[j] < -z:
131 | w_projection[j] = v[j] + z
132 | else:
133 | w_projection[j] = 0
134 | i = 1
135 |
136 | else:
137 | w_projection = v.copy()  # start from v; the loop below operates on w_projection
138 | i = 0
139 |
140 | # sequentially process each node
141 | while i < n_nodes:
142 | # compute the L2 norm of this group
143 | two_norm = 0
144 | start_idx = int(idx[0, i] - 1)
145 | end_idx = int(idx[1, i])
146 | for j in range(start_idx, end_idx):
147 | two_norm += w_projection[j] * w_projection[j]
148 | two_norm = np.sqrt(two_norm)
149 | z = idx[2, i]
150 | if two_norm > z:
151 | ratio = (two_norm - z) / two_norm
152 | # shrinkage this group by ratio
153 | for j in range(start_idx, end_idx):
154 | w_projection[j] *= ratio
155 | else:
156 | for j in range(start_idx, end_idx):
157 | w_projection[j] = 0
158 | i += 1
159 | return w_projection
160 |
161 |
162 | def tree_norm(w, n_features, idx, n_nodes):
163 | """
164 | This function computes \sum z_i||w_{G_{i}}||
165 | """
166 | obj = 0
167 | # test whether the first node is special
168 | if idx[0, 0] == -1 and idx[1, 0] == -1:
169 | z = idx[2, 0]
170 | for j in range(n_features):
171 | obj += np.abs(w[j])
172 | obj *= z
173 | i = 1
174 | else:
175 | i = 0
176 |
177 | # sequentially process each node
178 | while i < n_nodes:
179 | two_norm = 0
180 | start_idx = int(idx[0, i] - 1)
181 | end_idx = int(idx[1, i])
182 | for j in range(start_idx, end_idx):
183 | two_norm += w[j] * w[j]
184 | two_norm = np.sqrt(two_norm)
185 | z = idx[2, i]
186 | obj += z*two_norm
187 | i += 1
188 | return obj
189 |
190 |
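A brief sketch of the label-matrix and ranking utilities on toy inputs; the weight matrix `W` here is hypothetical, standing in for the output of an l2,1-regularized solver such as those in `sparse_learning_based`.

```python
# Sketch of the pure-numpy helpers; W is a hypothetical feature-weight matrix.
import numpy as np
from skfeature.utility.sparse_learning import (
    construct_label_matrix, calculate_l21_norm, feature_ranking)

y = np.array([0, 1, 2, 1, 0])
Y = construct_label_matrix(y)        # shape (5, 3), one indicator column per class
print(Y)

W = np.array([[0.9, 0.0, 0.1],       # one row per feature, one column per class
              [0.0, 0.0, 0.0],
              [0.2, 0.8, 0.3]])
print(calculate_l21_norm(W))         # sum of the row-wise l2 norms
print(feature_ranking(W))            # feature indices, most important row first: [0 2 1]
```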
--------------------------------------------------------------------------------
/src/skfeature/utility/unsupervised_evaluation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import sklearn.utils.linear_assignment_ as la
3 | from sklearn.metrics import accuracy_score
4 | from sklearn.metrics.cluster import normalized_mutual_info_score
5 | from sklearn.cluster import KMeans
6 |
7 |
8 | def best_map(l1, l2):
9 | """
10 | Permute labels of l2 to match l1 as much as possible
11 | """
12 | if len(l1) != len(l2):
13 | print "L1.shape must == L2.shape"
14 | exit(0)
15 |
16 | label1 = np.unique(l1)
17 | n_class1 = len(label1)
18 |
19 | label2 = np.unique(l2)
20 | n_class2 = len(label2)
21 |
22 | n_class = max(n_class1, n_class2)
23 | G = np.zeros((n_class, n_class))
24 |
25 | for i in range(0, n_class1):
26 | for j in range(0, n_class2):
27 | ss = l1 == label1[i]
28 | tt = l2 == label2[j]
29 | G[i, j] = np.count_nonzero(ss & tt)
30 |
31 | A = la.linear_assignment(-G)
32 |
33 | new_l2 = np.zeros(l2.shape)
34 | for i in range(0, n_class2):
35 | new_l2[l2 == label2[A[i][1]]] = label1[A[i][0]]
36 | return new_l2.astype(int)
37 |
38 |
39 | def evaluation(X_selected, n_clusters, y):
40 | """
41 | This function calculates ACC and NMI of clustering results
42 |
43 | Input
44 | -----
45 | X_selected: {numpy array}, shape (n_samples, n_selected_features}
46 | input data on the selected features
47 | n_clusters: {int}
48 | number of clusters
49 | y: {numpy array}, shape (n_samples,)
50 | true labels
51 |
52 | Output
53 | ------
54 | nmi: {float}
55 | Normalized Mutual Information
56 | acc: {float}
57 | Accuracy
58 | """
59 | k_means = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10, max_iter=300,
60 | tol=0.0001, precompute_distances=True, verbose=0,
61 | random_state=None, copy_x=True, n_jobs=1)
62 |
63 | k_means.fit(X_selected)
64 | y_predict = k_means.labels_
65 |
66 | # calculate NMI
67 | nmi = normalized_mutual_info_score(y, y_predict)
68 |
69 | # calculate ACC
70 | y_permuted_predict = best_map(y, y_predict)
71 | acc = accuracy_score(y, y_permuted_predict)
72 |
73 | return nmi, acc
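Finally, a sketch of the clustering-based evaluation on two synthetic blobs, assuming the legacy environment this module targets (it imports `sklearn.utils.linear_assignment_`, which newer scikit-learn releases removed).

```python
# Sketch: k-means on the "selected" features, scored against the true labels.
import numpy as np
from skfeature.utility.unsupervised_evaluation import evaluation

rng = np.random.RandomState(0)
X_selected = np.vstack([rng.randn(25, 4), rng.randn(25, 4) + 5.0])   # two separated blobs
y = np.array([0] * 25 + [1] * 25)

nmi, acc = evaluation(X_selected, n_clusters=2, y=y)
print(nmi)    # close to 1.0 when the selected features separate the classes well
print(acc)
```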
--------------------------------------------------------------------------------