├── LICENSE ├── OpenSA ├── Classification │ ├── CNN.py │ ├── ClassicCls.py │ ├── Cls.py │ ├── DeepCls.py │ ├── SAE.py │ └── __pycache__ │ │ ├── CNN.cpython-38.pyc │ │ ├── CNN.cpython-39.pyc │ │ ├── ClassicCls.cpython-38.pyc │ │ ├── ClassicCls.cpython-39.pyc │ │ ├── Cls.cpython-38.pyc │ │ ├── Cls.cpython-39.pyc │ │ └── SAE.cpython-38.pyc ├── Clustering │ ├── Cluster.py │ └── __pycache__ │ │ ├── Cluster.cpython-38.pyc │ │ └── Cluster.cpython-39.pyc ├── Data │ ├── Cls │ │ └── table.csv │ └── Rgs │ │ ├── Cdata1.csv │ │ ├── Cdata2.csv │ │ ├── Tdata1.csv │ │ ├── Tdata2.csv │ │ ├── Vdata1.csv │ │ └── Vdata2.csv ├── DataLoad │ ├── DataLoad.py │ └── __pycache__ │ │ ├── DataLoad.cpython-38.pyc │ │ └── DataLoad.cpython-39.pyc ├── Evaluate │ ├── RgsEvaluate.py │ └── __pycache__ │ │ ├── RgsEvaluate.cpython-38.pyc │ │ └── RgsEvaluate.cpython-39.pyc ├── Plot │ └── readme.txt ├── Preprocessing │ ├── Preprocessing.py │ └── __pycache__ │ │ ├── Preprocessing.cpython-38.pyc │ │ └── Preprocessing.cpython-39.pyc ├── Regression │ ├── CNN.py │ ├── ClassicRgs.py │ ├── CnnModel.py │ ├── DeepRgs.py │ ├── Rgs.py │ └── __pycache__ │ │ ├── CNN.cpython-38.pyc │ │ ├── ClassicRgs.cpython-38.pyc │ │ ├── ClassicRgs.cpython-39.pyc │ │ ├── CnnModel.cpython-38.pyc │ │ ├── Rgs.cpython-38.pyc │ │ └── Rgs.cpython-39.pyc ├── Simcalculation │ ├── SimCa.py │ └── __pycache__ │ │ ├── SimCa.cpython-38.pyc │ │ └── SimCa.cpython-39.pyc ├── WaveSelect │ ├── Cars.py │ ├── GA.py │ ├── Lar.py │ ├── Pca.py │ ├── Spa.py │ ├── Uve.py │ ├── WaveSelcet.py │ └── __pycache__ │ │ ├── Cars.cpython-38.pyc │ │ ├── Cars.cpython-39.pyc │ │ ├── GA.cpython-38.pyc │ │ ├── Lar.cpython-38.pyc │ │ ├── Lar.cpython-39.pyc │ │ ├── Pca.cpython-38.pyc │ │ ├── Pca.cpython-39.pyc │ │ ├── Spa.cpython-38.pyc │ │ ├── Spa.cpython-39.pyc │ │ ├── Uve.cpython-38.pyc │ │ ├── Uve.cpython-39.pyc │ │ ├── WaveSelcet.cpython-38.pyc │ │ └── WaveSelcet.cpython-39.pyc ├── example.py └── opt │ └── readme.txt ├── OpenSAV2 └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/OpenSA/Classification/CNN.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/15 9:36
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""

import os
import torch.nn.functional as F
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import Dataset
from sklearn.metrics import accuracy_score
import torch.optim as optim
# from EarlyStop import EarlyStopping
from sklearn.preprocessing import StandardScaler
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def conv_k(in_chs, out_chs, k=1, s=1, p=1):
    """ Build a size-k-kernel 1-D convolution layer with padding"""
    return nn.Conv1d(in_chs, out_chs, kernel_size=k, stride=s, padding=p, bias=False)

# custom Dataset pairing spectra with labels
class MyDataset(Dataset):
    def __init__(self, specs, labels):
        self.specs = specs
        self.labels = labels

    def __getitem__(self, index):
        spec, target = self.specs[index], self.labels[index]
        return spec, target

    def __len__(self):
        return len(self.specs)

# wrap the spectra in Datasets, optionally standardizing them first
def ZspPocess(X_train, X_test, y_train, y_test, need=True):  # True: standardize, False: keep the raw spectra
    if need == True:
        standscale = StandardScaler()
        X_train_Nom = standscale.fit_transform(X_train)
        X_test_Nom = standscale.transform(X_test)

        X_train_Nom = X_train_Nom[:, np.newaxis, :]
        X_test_Nom = X_test_Nom[:, np.newaxis, :]
        data_train = MyDataset(X_train_Nom, y_train)
        data_test = MyDataset(X_test_Nom, y_test)
        return data_train, data_test
    else:
        X_train = X_train[:, np.newaxis, :]  # e.g. (483, 1, 2074)
        X_test = X_test[:, np.newaxis, :]
        data_train = MyDataset(X_train, y_train)
        data_test = MyDataset(X_test, y_test)
        return data_train, data_test
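# A minimal usage sketch (hedged): the shapes and class count below are
# illustrative assumptions, not tied to the shipped datasets; it only
# sanity-checks ZspPocess together with a DataLoader.
def _demo_zsp_loaders(n_samples=100, n_features=404, n_classes=4, batch_size=16):
    X = np.random.rand(n_samples, n_features)
    y = np.random.randint(0, n_classes, n_samples)
    split = int(0.8 * n_samples)
    data_train, data_test = ZspPocess(X[:split], X[split:], y[:split], y[split:], need=True)
    loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, shuffle=True)
    spec, target = next(iter(loader))
    return spec.shape, target.shape  # expect (batch_size, 1, n_features) and (batch_size,)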
class CNN3Layers(nn.Module):
    def __init__(self, nls):
        super(CNN3Layers, self).__init__()
        self.CONV1 = nn.Sequential(
            nn.Conv1d(1, 64, 21, 1),
            nn.BatchNorm1d(64),  # normalize the conv output
            nn.ReLU(),
            nn.MaxPool1d(3, 3)
        )
        self.CONV2 = nn.Sequential(
            nn.Conv1d(64, 64, 19, 1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(3, 3)
        )
        self.CONV3 = nn.Sequential(
            nn.Conv1d(64, 64, 17, 1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(3, 3),
        )
        self.fc = nn.Sequential(
            # nn.Linear(4224, nls)
            nn.Linear(384, nls)
        )

    def forward(self, x):
        x = self.CONV1(x)
        x = self.CONV2(x)
        x = self.CONV3(x)
        x = x.view(x.size(0), -1)
        out = self.fc(x)
        # return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so an extra softmax here would flatten the gradients
        return out

class mlpmodel(nn.Module):
    def __init__(self, inputdim, outputdim):
        super(mlpmodel, self).__init__()
        self.fc1 = nn.Linear(inputdim, inputdim // 2)
        self.fc2 = nn.Linear(inputdim // 2, inputdim // 4)
        self.fc3 = nn.Linear(inputdim // 4, outputdim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x


def CNNTrain(X_train, X_test, y_train, y_test, BATCH_SIZE, n_epochs, nls):

    data_train, data_test = ZspPocess(X_train, X_test, y_train, y_test, need=True)
    train_loader = torch.utils.data.DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = torch.utils.data.DataLoader(data_test, batch_size=BATCH_SIZE, shuffle=True)

    store_path = ".//model//all//CNN18"

    model = CNN3Layers(nls=nls).to(device)
    optimizer = optim.Adam(model.parameters(),
                           lr=0.0001, weight_decay=0.0001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, verbose=1, eps=1e-06,
                                                           patience=10)
    criterion = nn.CrossEntropyLoss().to(device)  # cross-entropy loss for the multi-class problem
    # early_stopping = EarlyStopping(patience=30, delta=1e-4, path=store_path, verbose=False)

    for epoch in range(n_epochs):
        train_acc = []
        for i, data in enumerate(train_loader):
            model.train()  # switch to training mode
            inputs, labels = data
            inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
            labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
            output = model(inputs)  # cnn output
            train_loss = criterion(output, labels)  # cross entropy loss
            optimizer.zero_grad()  # clear gradients for this training step
            train_loss.backward()  # backpropagation, compute gradients
            optimizer.step()  # apply gradients
            _, predicted = torch.max(output.data, 1)
            y_predicted = predicted.detach().cpu().numpy()
            y_label = labels.detach().cpu().numpy()
            acc = accuracy_score(y_label, y_predicted)
            train_acc.append(acc)

        with torch.no_grad():  # no gradients needed for evaluation
            test_acc = []
            testloss = []
            for i, data in enumerate(test_loader):
                model.eval()  # switch to evaluation mode
                inputs, labels = data
                inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
                labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
                outputs = model(inputs)
                test_loss = criterion(outputs, labels)  # cross entropy loss
                _, predicted = torch.max(outputs.data, 1)
                predicted = predicted.cpu().numpy()
                labels = labels.cpu().numpy()
                acc = accuracy_score(labels, predicted)
                test_acc.append(acc)
                testloss.append(test_loss.item())
            avg_loss = np.mean(testloss)

        scheduler.step(avg_loss)
        # early_stopping(avg_loss, model)
        # if early_stopping.early_stop:
        #     print(f'Early stopping! Best validation loss: {early_stopping.get_best_score()}')
        #     break

    # early stopping used to checkpoint the best weights; with it disabled,
    # save the final weights so CNNtest can reload them
    os.makedirs(os.path.dirname(store_path), exist_ok=True)
    torch.save(model.state_dict(), store_path)
def CNNtest(X_train, X_test, y_train, y_test, BATCH_SIZE, nls):

    data_train, data_test = ZspPocess(X_train, X_test, y_train, y_test, need=True)
    test_loader = torch.utils.data.DataLoader(data_test, batch_size=BATCH_SIZE, shuffle=True)

    store_path = ".//model//all//CNN18"

    model = CNN3Layers(nls=nls).to(device)

    model.load_state_dict(torch.load(store_path))
    test_acc = []
    for i, data in enumerate(test_loader):
        model.eval()  # switch to evaluation mode
        inputs, labels = data
        inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
        labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        predicted = predicted.cpu().numpy()
        labels = labels.cpu().numpy()
        acc = accuracy_score(labels, predicted)
        test_acc.append(acc)
    return np.mean(test_acc)


def CNN(X_train, X_test, y_train, y_test, BATCH_SIZE, n_epochs, nls):

    CNNTrain(X_train, X_test, y_train, y_test, BATCH_SIZE, n_epochs, nls)
    acc = CNNtest(X_train, X_test, y_train, y_test, BATCH_SIZE, nls)

    return acc
--------------------------------------------------------------------------------
/OpenSA/Classification/ClassicCls.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""


from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import sklearn.svm as svm
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

def ANN(X_train, X_test, y_train, y_test, StandScaler=None):

    if StandScaler:
        scaler = StandardScaler()  # standardize the features
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

    # MLP with two hidden layers (10 and 8 neurons). On the solver choice:
    # 'lbfgs' tends to work well on small datasets, 'adam' is fairly robust,
    # and 'sgd' (stochastic gradient descent) can perform best when its
    # hyperparameters (learning rate, number of iterations) are tuned carefully.
    # clf = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(8,8), random_state=1, activation='relu')
    clf = MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
                        beta_2=0.999, early_stopping=False, epsilon=1e-08,
                        hidden_layer_sizes=(10, 8), learning_rate='constant',
                        learning_rate_init=0.001, max_iter=200, momentum=0.9,
                        nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
                        solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
                        warm_start=False)

    clf.fit(X_train, y_train.ravel())
    predict_results = clf.predict(X_test)
    acc = accuracy_score(predict_results, y_test.ravel())

    return acc

def SVM(X_train, X_test, y_train, y_test):

    clf = svm.SVC(C=1, gamma=1e-3)
    clf.fit(X_train, y_train)

    predict_results = clf.predict(X_test)
    acc = accuracy_score(predict_results, y_test.ravel())

    return acc
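# Hedged sketch: the fixed C/gamma in SVM() above are illustrative; a small
# grid search can pick them on the training set instead (the candidate grid
# below is an assumption, not tuned for any shipped dataset).
def SVMGridSearch(X_train, y_train):
    from sklearn.model_selection import GridSearchCV
    param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1e-4, 1e-3, 1e-2, 'scale']}
    search = GridSearchCV(svm.SVC(), param_grid, cv=5, n_jobs=-1)
    search.fit(X_train, y_train.ravel())
    return search.best_estimator_, search.best_params_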
def PLS_DA(X_train, X_test, y_train, y_test):

    # one-hot encode the labels so PLS regression can fit them
    y_train = pd.get_dummies(y_train)
    # build the model
    model = PLSRegression(n_components=228)
    model.fit(X_train, y_train)
    # predict, then map the score matrix back to class labels via argmax
    y_pred = model.predict(X_test)
    y_pred = np.array([np.argmax(i) for i in y_pred])
    acc = accuracy_score(y_test, y_pred)

    return acc

def RF(X_train, X_test, y_train, y_test):

    RF = RandomForestClassifier(n_estimators=15, max_depth=3, min_samples_split=3, min_samples_leaf=3)
    RF.fit(X_train, y_train)
    y_pred = RF.predict(X_test)
    acc = accuracy_score(y_test, y_pred)

    return acc
--------------------------------------------------------------------------------
/OpenSA/Classification/Cls.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""

from Classification.ClassicCls import ANN, SVM, PLS_DA, RF
from Classification.CNN import CNN
from Classification.SAE import SAE

def QualitativeAnalysis(model, X_train, X_test, y_train, y_test):

    if model == "PLS_DA":
        acc = PLS_DA(X_train, X_test, y_train, y_test)
    elif model == "ANN":
        acc = ANN(X_train, X_test, y_train, y_test)
    elif model == "SVM":
        acc = SVM(X_train, X_test, y_train, y_test)
    elif model == "RF":
        acc = RF(X_train, X_test, y_train, y_test)
    elif model == "CNN":
        acc = CNN(X_train, X_test, y_train, y_test, 16, 160, 4)
    elif model == "SAE":
        acc = SAE(X_train, X_test, y_train, y_test)
    else:
        # raising keeps the failure loud; returning here would leave acc unbound
        raise ValueError("no such model for QualitativeAnalysis: " + str(model))

    return acc
--------------------------------------------------------------------------------
/OpenSA/Classification/DeepCls.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""

--------------------------------------------------------------------------------
/OpenSA/Classification/SAE.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/15 9:36
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""



import torch
from torch import nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch import optim
import torch.utils.data as data
import numpy as np
import time
from sklearn.metrics import accuracy_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MyDataset(data.Dataset):
    def __init__(self, specs, labels):
        self.specs = specs
        self.labels = labels
    def __getitem__(self, index):
        spec, target = self.specs[index], self.labels[index]
        return spec, target
    def __len__(self):
        return len(self.specs)
class AutoEncoder(nn.Module):

    def __init__(self, inputDim, hiddenDim):
        super().__init__()
        self.inputDim = inputDim
        self.hiddenDim = hiddenDim
        self.encoder = nn.Linear(inputDim, hiddenDim, bias=True)
        self.decoder = nn.Linear(hiddenDim, inputDim, bias=True)
        self.act = F.relu

    def forward(self, x, rep=False):

        hidden = self.encoder(x)
        hidden = self.act(hidden)
        if rep == False:
            out = self.decoder(hidden)
            # out = self.act(out)
            return out
        else:
            return hidden


# renamed from SAE so the network class is not shadowed by the SAE()
# convenience function defined at the bottom of this module
class StackedAutoEncoder(nn.Module):

    def __init__(self, encoderList):

        super().__init__()

        self.encoderList = encoderList
        self.en1 = encoderList[0]
        self.en2 = encoderList[1]
        # self.en3 = encoderList[2]

        self.fc = nn.Linear(128, 4, bias=True)

    def forward(self, x):

        out = x
        out = self.en1(out, rep=True)
        out = self.en2(out, rep=True)
        # out = self.en3(out, rep=True)
        out = self.fc(out)
        # out = F.log_softmax(out)

        return out


class SAE_net(object):
    def __init__(self, AE_epoch=200, SAE_epoch=200,
                 input_dim=404, hidden1_dim=512,
                 hidden2_dim=128, output_dim=4,
                 batch_size=128):
        self.AE_epoch = AE_epoch
        self.SAE_epoch = SAE_epoch
        self.input_dim = input_dim
        self.hidden1_dim = hidden1_dim
        self.hidden2_dim = hidden2_dim
        self.output_dim = output_dim
        self.batch_size = batch_size
        self.train_loader = None

        encoder1 = AutoEncoder(self.input_dim, self.hidden1_dim)
        encoder2 = AutoEncoder(self.hidden1_dim, self.hidden2_dim)
        self.encoder_list = [encoder1, encoder2]


    def trainAE(self, x_train, y_train, encoderList, trainLayer, batchSize, epoch, useCuda=False):
        if useCuda:
            for i in range(len(encoderList)):
                encoderList[i].to(device)

        optimizer = optim.Adam(encoderList[trainLayer].parameters())
        criterion = nn.MSELoss()

        data_train = MyDataset(x_train, y_train)
        self.train_loader = torch.utils.data.DataLoader(data_train, batch_size=batchSize, shuffle=True)

        if trainLayer != 0:  # layer 0 has no preceding encoders, so nothing to freeze
            for layer in range(trainLayer):  # freeze every encoder before the one being trained
                for param in encoderList[layer].parameters():
                    param.requires_grad = False

        for ep in range(epoch):
            sum_loss = 0
            for batch_idx, (x, target) in enumerate(self.train_loader):
                optimizer.zero_grad()
                if useCuda:
                    x, target = x.to(device), target.to(device)
                x, target = Variable(x).type(torch.FloatTensor), Variable(target).type(torch.LongTensor)
                x = x.view(x.size(0), -1)
                # push the batch through the frozen layers to build this layer's input
                out = x
                if trainLayer != 0:
                    for layer in range(trainLayer):
                        out = encoderList[layer](out, rep=True)

                # train the selected autoencoder to reconstruct its own input
                pred = encoderList[trainLayer](out, rep=False).cpu()

                loss = criterion(pred, out)
                sum_loss += loss.item()
                loss.backward()
                optimizer.step()
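    # Greedy layer-wise pretraining, summarized: fit() calls trainAE once per
    # encoder; layer k trains to reconstruct the frozen outputs of layers
    # 0..k-1, then trainClassifier() below unfreezes everything and fine-tunes
    # the stacked encoders plus the linear head with cross-entropy.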
    def trainClassifier(self, model, epoch, useCuda=False):
        if useCuda:
            model = model.to(device)

        # unfreeze all parameters for fine-tuning
        for param in model.parameters():
            param.requires_grad = True

        optimizer = optim.Adam(model.parameters())
        criterion = nn.CrossEntropyLoss()

        for ep in range(epoch):
            # training
            sum_loss = 0
            for batch_idx, (x, target) in enumerate(self.train_loader):
                optimizer.zero_grad()
                if useCuda:
                    x, target = x.to(device), target.to(device)
                x, target = Variable(x).type(torch.FloatTensor), Variable(target).type(torch.LongTensor)
                x = x.view(-1, 404)

                out = model(x)

                loss = criterion(out, target)
                sum_loss += loss.item()
                loss.backward()
                optimizer.step()
        self.model = model

    def fit(self, x_train=None, y_train=None):
        x_train = x_train[:, np.newaxis, :]
        x_train = torch.from_numpy(x_train)
        x_train = x_train.float()

        # pre-train each autoencoder layer greedily, then fine-tune
        for i in range(2):
            self.trainAE(x_train=x_train, y_train=y_train,
                         encoderList=self.encoder_list, trainLayer=i, batchSize=self.batch_size,
                         epoch=self.AE_epoch)
        model = StackedAutoEncoder(encoderList=self.encoder_list)
        self.trainClassifier(model=model, epoch=self.SAE_epoch)

    def predict_proba(self, x_test):
        x_test = torch.from_numpy(x_test)
        x_test = x_test.float()
        x_test = x_test[:, np.newaxis, :]
        x_test = Variable(x_test)
        x_test = x_test.view(-1, 404)

        out = self.model(x_test)
        outdata = out.data
        self.y_proba = outdata
        y_proba = outdata.numpy()
        return y_proba

    def predict(self, x_test):
        # recompute the scores so predict() does not depend on a prior
        # predict_proba() call having set self.y_proba
        y_proba = self.predict_proba(x_test)
        y_pred = np.argmax(y_proba, axis=1)
        return y_pred

# argument order matches the call in Classification/Cls.py
def SAE(X_train, X_test, y_train, y_test):

    clf = SAE_net()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # ACC
    acc = accuracy_score(y_test, y_pred)

    return acc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/CNN.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/CNN.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/CNN.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/CNN.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/ClassicCls.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/ClassicCls.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/ClassicCls.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/ClassicCls.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/Cls.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/Cls.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/Cls.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/Cls.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/SAE.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/SAE.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Clustering/Cluster.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""



from sklearn.cluster import KMeans
import numpy as np

def Kmeans(data, n_clusters=10, iter_num=30):

    cluster = KMeans(n_clusters=n_clusters, random_state=0, max_iter=iter_num)
    cluster.fit(data)
    label = cluster.labels_  # cluster assignment for every sample

    return label
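# Hedged sketch: n_clusters defaults to 10 above; the silhouette score offers
# one data-driven way to pick it (the candidate range below is an assumption).
def KmeansSilhouette(data, candidates=range(2, 11)):
    from sklearn.metrics import silhouette_score
    best_k, best_score = None, -1.0
    for k in candidates:
        labels = Kmeans(data, n_clusters=k)
        score = silhouette_score(data, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score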
class FCM:
    def __init__(self, data, clust_num, iter_num=10, m=2):
        self.data = data
        self.cnum = clust_num
        self.sample_num = data.shape[0]
        self.m = m  # fuzziness exponent
        self.dim = data.shape[-1]  # dimensionality of the last axis of the data
        Jlist = []  # stores the objective value of every iteration

        U = self.Initial_U(self.sample_num, self.cnum)
        for i in range(0, iter_num):  # default: 10 iterations
            C = self.Cen_Iter(self.data, U, self.cnum, self.m)
            U = self.U_Iter(U, C, self.m)
            print("Iteration %d:" % (i + 1), "cluster centers", C)
            J = self.J_calcu(self.data, U, C, self.m)  # evaluate the objective function
            Jlist = np.append(Jlist, J)
        self.label = np.argmax(U, axis=0)  # final cluster label of every sample
        self.Clast = C  # final matrix of cluster centers
        self.Jlist = Jlist  # history of objective values

    # initialize the membership matrix U
    def Initial_U(self, sample_num, cluster_n):
        U = np.random.rand(sample_num, cluster_n)  # sample_num samples, cluster_n clusters
        row_sum = np.sum(U, axis=1)  # row sums, shape sample_num*1
        row_sum = 1 / row_sum  # take the reciprocal of every entry
        U = np.multiply(U.T, row_sum)  # normalize so every column of U sums to 1
        return U  # cluster_n*sample_num

    # update the cluster centers
    def Cen_Iter(self, data, U, cluster_n, m):
        c_new = np.empty(shape=[0, self.dim])  # self.dim is the last dimension of the data
        for i in range(0, cluster_n):  # e.g. dim is 2 for scatter points, 1 for pixel values
            u_ij_m = U[i, :] ** m  # (sample_num,)
            sum_u = np.sum(u_ij_m)
            ux = np.dot(u_ij_m, data)  # (dim,)
            ux = np.reshape(ux, (1, self.dim))  # (1,dim)
            c_new = np.append(c_new, ux / sum_u, axis=0)  # append the center row-wise
        return c_new  # cluster_num*dim

    # update the membership matrix
    def U_Iter(self, U, c, m):
        for i in range(0, self.cnum):
            for j in range(0, self.sample_num):
                sum = 0
                for k in range(0, self.cnum):
                    temp = (np.linalg.norm(self.data[j, :] - c[i, :]) /
                            np.linalg.norm(self.data[j, :] - c[k, :])) ** (
                            2 / (m - 1))
                    sum = temp + sum
                U[i, j] = 1 / sum

        return U

    # evaluate the objective function
    def J_calcu(self, data, U, c, m):
        temp1 = np.zeros(U.shape)
        for i in range(0, U.shape[0]):
            for j in range(0, U.shape[1]):
                temp1[i, j] = (np.linalg.norm(data[j, :] - c[i, :])) ** 2 * U[i, j] ** m

        J = np.sum(np.sum(temp1))
        print("objective value: %.2f" % J)
        return J

def Fcm(data, n_clusters=10, iter_num=30):

    # FCM runs its iterations in __init__; the final labels live on the instance
    fcm = FCM(data, n_clusters, iter_num)
    label = fcm.label

    return label

def Cluster(method, data):
    if method == "Kmeans":
        label = Kmeans(data)
    elif method == "Fcm":
        label = Fcm(data)
    else:
        raise ValueError("no such clustering method: " + str(method))
    return label
--------------------------------------------------------------------------------
/OpenSA/Clustering/__pycache__/Cluster.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Clustering/__pycache__/Cluster.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Clustering/__pycache__/Cluster.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Clustering/__pycache__/Cluster.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/DataLoad/DataLoad.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""



from sklearn.model_selection import train_test_split
import numpy as np

# random train/test split
def random(data, label, test_ratio=0.2, random_state=123):
    """
    :param data: shape (n_samples, n_features)
    :param label: shape (n_sample, )
    :param test_ratio: the ratio of the test set, default: 0.2
    :param random_state: the random seed, default: 123
    :return: X_train :(n_samples, n_features)
             X_test: (n_samples, n_features)
             y_train: (n_sample, )
             y_test: (n_sample, )
    """

    X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=test_ratio, random_state=random_state)

    return X_train, X_test, y_train, y_test
# split the dataset with the SPXY algorithm
def spxy(data, label, test_size=0.2):
    """
    :param data: shape (n_samples, n_features)
    :param label: shape (n_sample, )
    :param test_size: the ratio of test_size, default: 0.2
    :return: X_train :(n_samples, n_features)
             X_test: (n_samples, n_features)
             y_train: (n_sample, )
             y_test: (n_sample, )
    """
    x_backup = data
    y_backup = label
    M = data.shape[0]
    N = round((1 - test_size) * M)
    samples = np.arange(M)

    label = (label - np.mean(label)) / np.std(label)
    D = np.zeros((M, M))
    Dy = np.zeros((M, M))

    for i in range(M - 1):
        xa = data[i, :]
        ya = label[i]
        for j in range((i + 1), M):
            xb = data[j, :]
            yb = label[j]
            D[i, j] = np.linalg.norm(xa - xb)
            Dy[i, j] = np.linalg.norm(ya - yb)

    Dmax = np.max(D)
    Dymax = np.max(Dy)
    D = D / Dmax + Dy / Dymax

    maxD = D.max(axis=0)
    index_row = D.argmax(axis=0)
    index_column = maxD.argmax()

    m = np.zeros(N)
    m[0] = index_row[index_column]
    m[1] = index_column
    m = m.astype(int)

    dminmax = np.zeros(N)
    dminmax[1] = D[m[0], m[1]]

    for i in range(2, N):
        pool = np.delete(samples, m[:i])
        dmin = np.zeros(M - i)
        for j in range(M - i):
            indexa = pool[j]
            d = np.zeros(i)
            for k in range(i):
                indexb = m[k]
                if indexa < indexb:
                    d[k] = D[indexa, indexb]
                else:
                    d[k] = D[indexb, indexa]
            dmin[j] = np.min(d)
        dminmax[i] = np.max(dmin)
        index = np.argmax(dmin)
        m[i] = pool[index]

    m_complement = np.delete(np.arange(data.shape[0]), m)

    X_train = data[m, :]
    y_train = y_backup[m]
    X_test = data[m_complement, :]
    y_test = y_backup[m_complement]

    return X_train, X_test, y_train, y_test

# split the dataset with the Kennard-Stone algorithm
def ks(data, label, test_size=0.2):
    """
    :param data: shape (n_samples, n_features)
    :param label: shape (n_sample, )
    :param test_size: the ratio of test_size, default: 0.2
    :return: X_train: (n_samples, n_features)
             X_test: (n_samples, n_features)
             y_train: (n_sample, )
             y_test: (n_sample, )
    """
    M = data.shape[0]
    N = round((1 - test_size) * M)
    samples = np.arange(M)

    D = np.zeros((M, M))

    for i in range((M - 1)):
        xa = data[i, :]
        for j in range((i + 1), M):
            xb = data[j, :]
            D[i, j] = np.linalg.norm(xa - xb)

    maxD = np.max(D, axis=0)
    index_row = np.argmax(D, axis=0)
    index_column = np.argmax(maxD)

    m = np.zeros(N)
    m[0] = np.array(index_row[index_column])
    m[1] = np.array(index_column)
    m = m.astype(int)
    dminmax = np.zeros(N)
    dminmax[1] = D[m[0], m[1]]

    for i in range(2, N):
        pool = np.delete(samples, m[:i])
        dmin = np.zeros((M - i))
        for j in range((M - i)):
            indexa = pool[j]
            d = np.zeros(i)
            for k in range(i):
                indexb = m[k]
                if indexa < indexb:
                    d[k] = D[indexa, indexb]
                else:
                    d[k] = D[indexb, indexa]
            dmin[j] = np.min(d)
        dminmax[i] = np.max(dmin)
        index = np.argmax(dmin)
        m[i] = pool[index]

    m_complement = np.delete(np.arange(data.shape[0]), m)

    X_train = data[m, :]
    y_train = label[m]
    X_test = data[m_complement, :]
    y_test = label[m_complement]

    return X_train, X_test, y_train, y_test
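# Hedged sanity-check sketch: run the three splitters on random stand-in data
# (shapes are assumptions); each should return an 80/20 partition. SetSplit is
# defined further down in this module.
def _demo_setsplit(n_samples=100, n_features=404):
    data = np.random.rand(n_samples, n_features)
    label = np.random.rand(n_samples)
    for method in ("random", "ks", "spxy"):
        X_train, X_test, y_train, y_test = SetSplit(method, data, label, test_size=0.2)
        print(method, X_train.shape, X_test.shape)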
# one public regression dataset and one public classification dataset serve as the examples
def LoadNirtest(type):

    if type == "Rgs":
        CDataPath1 = './/Data//Rgs//Cdata1.csv'
        VDataPath1 = './/Data//Rgs//Vdata1.csv'
        TDataPath1 = './/Data//Rgs//Tdata1.csv'

        Cdata1 = np.loadtxt(open(CDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        Vdata1 = np.loadtxt(open(VDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        Tdata1 = np.loadtxt(open(TDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)

        Nirdata1 = np.concatenate((Cdata1, Vdata1))
        Nirdata = np.concatenate((Nirdata1, Tdata1))
        data = Nirdata[:, :-4]
        label = Nirdata[:, -1]

    elif type == "Cls":
        path = './/Data//Cls//table.csv'
        Nirdata = np.loadtxt(open(path, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        data = Nirdata[:, :-1]
        label = Nirdata[:, -1]

    else:
        raise ValueError("type must be 'Rgs' or 'Cls', got: " + str(type))

    return data, label

def SetSplit(method, data, label, test_size=0.2, randomseed=123):

    """
    :param method: the method to split the train and test sets; one of: random, kennard-stone(ks), spxy
    :param data: shape (n_samples, n_features)
    :param label: shape (n_sample, )
    :param test_size: the ratio of test_size, default: 0.2
    :return: X_train: (n_samples, n_features)
             X_test: (n_samples, n_features)
             y_train: (n_sample, )
             y_test: (n_sample, )
    """

    if method == "random":
        X_train, X_test, y_train, y_test = random(data, label, test_size, randomseed)
    elif method == "spxy":
        X_train, X_test, y_train, y_test = spxy(data, label, test_size)
    elif method == "ks":
        X_train, X_test, y_train, y_test = ks(data, label, test_size)
    else:
        # raising keeps the failure loud; printing would return unbound names
        raise ValueError("no such method to split the dataset: " + str(method))

    return X_train, X_test, y_train, y_test
--------------------------------------------------------------------------------
/OpenSA/DataLoad/__pycache__/DataLoad.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/DataLoad/__pycache__/DataLoad.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/DataLoad/__pycache__/DataLoad.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/DataLoad/__pycache__/DataLoad.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Evaluate/RgsEvaluate.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License:Apache-2.0 license

"""

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import numpy as np


def ModelRgsevaluate(y_pred, y_true):

    mse = mean_squared_error(y_true, y_pred)
    R2 = r2_score(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)

    return np.sqrt(mse), R2, mae
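# Hedged hand-computable check of ModelRgsevaluate: with errors
# (0.1, -0.1, 0.2), mse = 0.06/3 = 0.02, so rmse ~ 0.141 and mae ~ 0.133.
def _demo_rgsevaluate():
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.1, 1.9, 3.2])
    rmse, R2, mae = ModelRgsevaluate(y_pred, y_true)
    return rmse, R2, mae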
def ModelRgsevaluatePro(y_pred, y_true, yscale):

    yscaler = yscale
    # undo the target scaling so the metrics are reported in the original units
    y_true = yscaler.inverse_transform(y_true)
    y_pred = yscaler.inverse_transform(y_pred)

    mse = mean_squared_error(y_true, y_pred)
    R2 = r2_score(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)

    return np.sqrt(mse), R2, mae
--------------------------------------------------------------------------------
/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Plot/readme.txt:
--------------------------------------------------------------------------------
Not provided yet
--------------------------------------------------------------------------------
/OpenSA/Preprocessing/Preprocessing.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github :
@WeChat : Fu_siry
@License:

"""
import numpy as np
from scipy import signal
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from copy import deepcopy
import pandas as pd
# import pywt  # optional dependency, imported lazily inside wave()

# ref1: based on examples from students at Hunan Normal University, partially modified
# ref2: https://blog.csdn.net/qq2512446791

# min-max normalization
def MMS(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after MinMaxScaler :(n_samples, n_features)
    """
    return MinMaxScaler().fit_transform(data)


# standardization
def SS(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after StandardScaler :(n_samples, n_features)
    """
    return StandardScaler().fit_transform(data)


# mean centering
def CT(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after mean centering :(n_samples, n_features)
    """
    for i in range(data.shape[0]):
        MEAN = np.mean(data[i])
        data[i] = data[i] - MEAN
    return data


# standard normal variate transform
def SNV(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after SNV :(n_samples, n_features)
    """
    m = data.shape[0]
    n = data.shape[1]
    data_std = np.std(data, axis=1)  # standard deviation of each spectrum
    data_average = np.mean(data, axis=1)  # mean of each spectrum
    # SNV: per spectrum, subtract the mean and divide by the standard deviation
    data_snv = [[((data[i][j] - data_average[i]) / data_std[i]) for j in range(n)] for i in range(m)]
    return np.array(data_snv)



# moving-average smoothing
def MA(data, WSZ=11):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :param WSZ: window size (odd int)
    :return: data after MA :(n_samples, n_features)
    """

    for i in range(data.shape[0]):
        out0 = np.convolve(data[i], np.ones(WSZ, dtype=int), 'valid') / WSZ  # WSZ is the (odd) window width
        r = np.arange(1, WSZ - 1, 2)
        start = np.cumsum(data[i, :WSZ - 1])[::2] / r
        stop = (np.cumsum(data[i, :-WSZ:-1])[::2] / r)[::-1]
        data[i] = np.concatenate((start, out0, stop))
    return data


# Savitzky-Golay smoothing filter
def SG(data, w=11, p=2):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :param w: window length (int)
    :param p: polynomial order (int)
    :return: data after SG :(n_samples, n_features)
    """
    return signal.savgol_filter(data, w, p)


# first derivative
def D1(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after first derivative :(n_samples, n_features - 1)
    """
    n, p = data.shape
    Di = np.ones((n, p - 1))
    for i in range(n):
        Di[i] = np.diff(data[i])
    return Di
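# Hedged sketch: the preprocessing steps compose by simple chaining, e.g.
# smoothing followed by scatter correction; the random stand-in spectra and
# shapes below are assumptions. Preprocessing() is defined at the end of
# this module.
def _demo_chain(n_samples=20, n_features=404):
    spectra = np.random.rand(n_samples, n_features)
    out = Preprocessing('SG', spectra)   # Savitzky-Golay smoothing first
    out = Preprocessing('SNV', out)      # then standard normal variate
    return out.shape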
# second derivative
def D2(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after second derivative :(n_samples, n_features - 2)
    """
    data = deepcopy(data)
    if isinstance(data, pd.DataFrame):
        data = data.values
    temp2 = (pd.DataFrame(data)).diff(axis=1)
    temp3 = np.delete(temp2.values, 0, axis=1)
    temp4 = (pd.DataFrame(temp3)).diff(axis=1)
    spec_D2 = np.delete(temp4.values, 0, axis=1)
    return spec_D2


# detrending (DT)
def DT(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after DT :(n_samples, n_features)
    """
    lenth = data.shape[1]
    x = np.asarray(range(lenth), dtype=np.float32)
    out = np.array(data)
    l = LinearRegression()
    for i in range(out.shape[0]):
        l.fit(x.reshape(-1, 1), out[i].reshape(-1, 1))
        k = l.coef_[0][0]    # slope as a scalar
        b = l.intercept_[0]  # intercept as a scalar
        for j in range(out.shape[1]):
            out[i][j] = out[i][j] - (j * k + b)

    return out


# multiplicative scatter correction
def MSC(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after MSC :(n_samples, n_features)
    """
    n, p = data.shape
    msc = np.ones((n, p))
    mean = np.mean(data, axis=0)  # mean spectrum, computed once

    # linear fit of each spectrum against the mean spectrum
    for i in range(n):
        y = data[i, :]
        l = LinearRegression()
        l.fit(mean.reshape(-1, 1), y.reshape(-1, 1))
        k = l.coef_
        b = l.intercept_
        msc[i, :] = (y - b) / k
    return msc

# wavelet denoising
def wave(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after wavelet denoising :(n_samples, n_features)
    """
    import pywt  # lazy import: PyWavelets is only needed for this method
    data = deepcopy(data)
    if isinstance(data, pd.DataFrame):
        data = data.values
    def wave_(data):
        w = pywt.Wavelet('db8')  # Daubechies-8 wavelet
        maxlev = pywt.dwt_max_level(len(data), w.dec_len)
        coeffs = pywt.wavedec(data, 'db8', level=maxlev)
        threshold = 0.04
        for i in range(1, len(coeffs)):
            coeffs[i] = pywt.threshold(coeffs[i], threshold * max(coeffs[i]))
        datarec = pywt.waverec(coeffs, 'db8')
        return datarec

    tmp = None
    for i in range(data.shape[0]):
        if (i == 0):
            tmp = wave_(data[i])
        else:
            tmp = np.vstack((tmp, wave_(data[i])))

    return tmp

def Preprocessing(method, data):

    if method == "None":
        data = data
    elif method == 'MMS':
        data = MMS(data)
    elif method == 'SS':
        data = SS(data)
    elif method == 'CT':
        data = CT(data)
    elif method == 'SNV':
        data = SNV(data)
    elif method == 'MA':
        data = MA(data)
    elif method == 'SG':
        data = SG(data)
    elif method == 'MSC':
        data = MSC(data)
    elif method == 'D1':
        data = D1(data)
    elif method == 'D2':
        data = D2(data)
    elif method == 'DT':
        data = DT(data)
    elif method == 'WVAE':
        data = wave(data)
    else:
        # raising beats silently returning the raw spectra
        raise ValueError("no such preprocessing method: " + str(method))

    return data
--------------------------------------------------------------------------------
/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/CNN.py:
--------------------------------------------------------------------------------
"""
Created on 2021-1-21
Author: Pengyou Fu
Describe: trains NIRS spectra with 1-D CNN regression models
"""

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import Dataset
import torch.nn.functional as F
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import torch.optim as optim
from Regression.CnnModel import ConvNet, DeepSpectra, AlexNet
from Evaluate.RgsEvaluate import ModelRgsevaluate, ModelRgsevaluatePro


LR = 0.001
BATCH_SIZE = 16
TBATCH_SIZE = 240


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# custom Dataset pairing spectra with targets
class MyDataset(Dataset):
    def __init__(self, specs, labels):
        self.specs = specs
        self.labels = labels

    def __getitem__(self, index):
        spec, target = self.specs[index], self.labels[index]
        return spec, target

    def __len__(self):
        return len(self.specs)



# wrap the data in Datasets, optionally standardizing the spectra first
def ZspPocessnew(X_train, X_test, y_train, y_test, need=True):  # True: standardize, False: keep the raw spectra

    global standscale
    global yscaler

    if need == True:
        standscale = StandardScaler()
        X_train_Nom = standscale.fit_transform(X_train)
        X_test_Nom = standscale.transform(X_test)

        # yscaler = StandardScaler()
        yscaler = MinMaxScaler()
        y_train = yscaler.fit_transform(y_train.reshape(-1, 1))
        y_test = yscaler.transform(y_test.reshape(-1, 1))

        X_train_Nom = X_train_Nom[:, np.newaxis, :]
        X_test_Nom = X_test_Nom[:, np.newaxis, :]

        data_train = MyDataset(X_train_Nom, y_train)
        data_test = MyDataset(X_test_Nom, y_test)
        return data_train, data_test
    else:
        yscaler = StandardScaler()
        # yscaler = MinMaxScaler()

        X_train_new = X_train[:, np.newaxis, :]
        X_test_new = X_test[:, np.newaxis, :]

        # reshape to a column vector, as sklearn scalers expect 2-D input
        y_train = yscaler.fit_transform(y_train.reshape(-1, 1))
        y_test = yscaler.transform(y_test.reshape(-1, 1))

        data_train = MyDataset(X_train_new, y_train)
        data_test = MyDataset(X_test_new, y_test)

        return data_train, data_test
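# Hedged round-trip sketch of the module-global yscaler that
# ModelRgsevaluatePro later inverts (shapes and values are illustrative
# assumptions; the call mutates the globals on purpose):
def _demo_yscaler_roundtrip():
    y = np.arange(10, dtype=np.float64)
    X = np.random.rand(10, 50)
    ZspPocessnew(X[:8], X[8:], y[:8], y[8:], need=True)
    restored = yscaler.inverse_transform(yscaler.transform(y.reshape(-1, 1)))
    return np.allclose(restored.ravel(), y)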
    criterion = nn.MSELoss().to(device)  # mean-squared-error loss for regression
    optimizer = optim.Adam(model.parameters(), lr=LR)  # Adam optimizer; add weight_decay for L2 regularization if needed
    # reduce the learning rate when the test RMSE stops improving
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, verbose=True, eps=1e-06,
                                                           patience=20)
    print("Start Training!")
    for epoch in range(EPOCH):
        train_losses = []
        model.train()  # switch to training mode
        train_rmse = []
        train_r2 = []
        train_mae = []
        for i, data in enumerate(train_loader):
            inputs, labels = data
            inputs = inputs.float().to(device)  # batch x
            labels = labels.float().to(device)  # batch y
            output = model(inputs)  # cnn output
            loss = criterion(output, labels)  # MSE
            optimizer.zero_grad()  # clear gradients for this training step
            loss.backward()  # backpropagation, compute gradients
            optimizer.step()  # apply gradients
            pred = output.detach().cpu().numpy()
            y_true = labels.detach().cpu().numpy()
            train_losses.append(loss.item())
            rmse, R2, mae = ModelRgsevaluatePro(pred, y_true, yscaler)
            train_rmse.append(rmse)
            train_r2.append(R2)
            train_mae.append(mae)
        avg_train_loss = np.mean(train_losses)
        avgrmse = np.mean(train_rmse)
        avgr2 = np.mean(train_r2)
        avgmae = np.mean(train_mae)
        print('Epoch:{}, TRAIN:rmse:{}, R2:{}, mae:{}'.format((epoch+1), (avgrmse), (avgr2), (avgmae)))
        print('lr:{}, avg_train_loss:{}'.format((optimizer.param_groups[0]['lr']), avg_train_loss))

        with torch.no_grad():  # no gradient tracking during evaluation
            model.eval()  # switch to evaluation mode
            test_rmse = []
            test_r2 = []
            test_mae = []
            for i, data in enumerate(test_loader):
                inputs, labels = data
                inputs = inputs.float().to(device)  # batch x
                labels = labels.float().to(device)  # batch y
                outputs = model(inputs)
                pred = outputs.detach().cpu().numpy()
                y_true = labels.detach().cpu().numpy()
                rmse, R2, mae = ModelRgsevaluatePro(pred, y_true, yscaler)
                test_rmse.append(rmse)
                test_r2.append(R2)
                test_mae.append(mae)
            avgrmse = np.mean(test_rmse)
            avgr2 = np.mean(test_r2)
            avgmae = np.mean(test_mae)
            print('EPOCH:{}, TEST: rmse:{}, R2:{}, mae:{}'.format((epoch+1), (avgrmse), (avgr2), (avgmae)))
            scheduler.step(avgrmse)  # step the scheduler on the epoch-average test RMSE

    return avgrmse, avgr2, avgmae
--------------------------------------------------------------------------------
/OpenSA/Regression/ClassicRgs.py:
--------------------------------------------------------------------------------

from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor

"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""

from sklearn.svm import SVR
from Evaluate.RgsEvaluate import ModelRgsevaluate

def Pls(X_train, X_test, y_train, y_test):

    model = PLSRegression(n_components=8)
    # fit the model
    model.fit(X_train, y_train)

    # predict the values
    y_pred = model.predict(X_test)

    Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)

    return Rmse, R2, Mae


def Svregression(X_train, X_test, y_train, y_test):

    model = SVR(C=2, gamma=1e-07, kernel='linear')
    model.fit(X_train, y_train)

    # predict the values
    y_pred = model.predict(X_test)
    Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)

    return Rmse, R2, Mae

def Anngression(X_train, X_test, y_train, y_test):

    model = MLPRegressor(
        hidden_layer_sizes=(20, 20), activation='relu', solver='adam', alpha=0.0001, batch_size='auto',
        learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=400, shuffle=True,
        random_state=1, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True,
        early_stopping=False, beta_1=0.9, beta_2=0.999, epsilon=1e-08)

    model.fit(X_train, y_train)

    # predict the values
    y_pred = model.predict(X_test)
    Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)

    return Rmse, R2, Mae

def ELM(X_train, X_test, y_train, y_test):

    import hpelm  # imported lazily: hpelm is an optional dependency

    model = hpelm.ELM(X_train.shape[1], 1)
    model.add_neurons(20, 'sigm')

    model.train(X_train, y_train, 'r')
    y_pred = model.predict(X_test)

    Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)

    return Rmse, R2, Mae
--------------------------------------------------------------------------------
/OpenSA/Regression/CnnModel.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=21, padding=0),
            nn.BatchNorm1d(16),
            nn.ReLU()
        )
        self.conv2 = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=19, padding=0),
            nn.BatchNorm1d(32),
            nn.ReLU()
        )
        self.conv3 = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=17, padding=0),
            nn.BatchNorm1d(64),
            nn.ReLU()
        )
        self.fc = nn.Linear(38080, 1)  # 38080 = 64 * (input_length - 54); e.g. 8960 or 17920 for shorter spectra
        self.drop = nn.Dropout(0.2)

    def forward(self, out):
        out = self.conv1(out)
        out = self.conv2(out)
        out = self.conv3(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out


class AlexNet(nn.Module):
    def __init__(self, num_classes=1, reduction=16):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # conv1
            nn.Conv1d(1, 16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(num_features=16),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
            # conv2
            nn.Conv1d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(num_features=32),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
            # conv3
            nn.Conv1d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
            # conv4
            nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(num_features=128),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
            # conv5
            nn.Conv1d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(num_features=192),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
        )
        self.reg = nn.Sequential(
            nn.Linear(3840, 1000),  # 3840 = 192 * final length; adjust to your own spectrum width
            nn.ReLU(inplace=True),
            nn.Linear(1000, 500),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(500, num_classes),
        )

    def forward(self, x):
        out = self.features(x)
        out = out.flatten(start_dim=1)
        out = self.reg(out)
        return out

class Inception(nn.Module):
    def __init__(self, in_c, c1, c2, c3, out_C):
        super(Inception, self).__init__()
        self.p1 = nn.Sequential(
            nn.Conv1d(in_c, c1, kernel_size=1, padding=0),
            nn.Conv1d(c1, c1, kernel_size=3, padding=1)
        )
        self.p2 = nn.Sequential(
            nn.Conv1d(in_c, c2, kernel_size=1, padding=0),
            nn.Conv1d(c2, c2, kernel_size=5, padding=2)
        )
        self.p3 = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_c, c3, kernel_size=3, padding=1),
        )
        self.conv_linear = nn.Conv1d((c1 + c2 + c3), out_C, 1, 1, 0, bias=True)
        self.short_cut = nn.Sequential()
        if in_c != out_C:
            self.short_cut = nn.Sequential(
                nn.Conv1d(in_c, out_C, 1, 1, 0, bias=False),
            )

    def forward(self, x):
        p1 = self.p1(x)
        p2 = self.p2(x)
        p3 = self.p3(x)
        out = torch.cat((p1, p2, p3), dim=1)
        out += self.short_cut(x)
        return out




class DeepSpectra(nn.Module):
    def __init__(self):
        super(DeepSpectra, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=3, padding=0)
        )
        self.Inception = Inception(16, 32, 32, 32, 96)
        self.fc = nn.Sequential(
            nn.Linear(20640, 5000),
            nn.Dropout(0.5),
            nn.Linear(5000, 1)
        )
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.Inception(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

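
# A quick self-check sketch (not from the original file): the fully connected
# input sizes above (38080 for ConvNet, 3840 for AlexNet, 20640 for DeepSpectra)
# are tied to one particular spectrum length. A dummy forward pass through the
# convolutional stages prints the size to put into nn.Linear for your own data.
# The length 649 below is an assumption, chosen so that ConvNet yields
# 64 * (649 - 54) = 38080; replace it with your spectrum width.
if __name__ == '__main__':
    x = torch.randn(2, 1, 649)  # (batch, channels, spectrum length)
    net = ConvNet()
    feats = net.conv3(net.conv2(net.conv1(x)))
    print(feats.flatten(start_dim=1).shape[1])  # -> 38080 for a length-649 input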
--------------------------------------------------------------------------------
/OpenSA/Regression/DeepRgs.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""
--------------------------------------------------------------------------------
/OpenSA/Regression/Rgs.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""

from Regression.ClassicRgs import Pls, Anngression, Svregression, ELM
from Regression.CNN import CNNTrain

def QuantitativeAnalysis(model, X_train, X_test, y_train, y_test):

    if model == "Pls":
        Rmse, R2, Mae = Pls(X_train, X_test, y_train, y_test)
    elif model == "ANN":
        Rmse, R2, Mae = Anngression(X_train, X_test, y_train, y_test)
    elif model == "SVR":
        Rmse, R2, Mae = Svregression(X_train, X_test, y_train, y_test)
    elif model == "ELM":
        Rmse, R2, Mae = ELM(X_train, X_test, y_train, y_test)
    elif model == "CNN":
        Rmse, R2, Mae = CNNTrain("AlexNet", X_train, X_test, y_train, y_test, 150)
    else:
        raise ValueError("unknown model for QuantitativeAnalysis: {}".format(model))

    return Rmse, R2, Mae
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/CNN.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/CNN.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/ClassicRgs.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/ClassicRgs.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/ClassicRgs.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/ClassicRgs.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/CnnModel.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/CnnModel.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/Rgs.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/Rgs.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/Rgs.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/Rgs.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/Simcalculation/SimCa.py:
--------------------------------------------------------------------------------
import numpy as np
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""


from numpy.linalg import norm
from skimage.metrics import structural_similarity as compare_ssim
from skimage.metrics import peak_signal_noise_ratio as compare_psnr

def Simcalculation(type, data1, data2):
    """
    :param type: similarity measure to use
    :param data1: spectrum, shape (1, length); or hyperspectral image, shape (H, W, C)
    :param data2: spectrum, shape (1, length); or hyperspectral image, shape (H, W, C)
    :return: similarity between the two inputs, float
    """

    if type == 'SAM':
        return SAM(data1, data2)
    elif type == 'SID':
        return SID(data1, data2)
    elif type == 'HsiSam':
        return HsiSam(data1, data2)
    elif type == 'mssim':
        return mssim(data1, data2)
    elif type == 'mpsnr':
        return mpsnr(data1, data2)
    else:
        raise ValueError("unknown method for Simcalculation: {}".format(type))

def SAM(x, y):
    """
    :param x: spectrum, shape (1, length)
    :param y: spectrum, shape (1, length)
    :return: spectral angle between the two spectra, in radians
    """
    s = np.sum(np.dot(x, y))
    t1 = norm(x) * norm(y)
    val = s / t1
    sam = np.arccos(val)  # the spectral angle is the arccos of the normalized inner product

    return sam

# spectral information divergence
def SID(x, y):
    """
    :param x: spectrum, shape (1, length)
    :param y: spectrum, shape (1, length)
    :return: spectral information divergence between the two spectra
    """
    p = np.zeros_like(x, dtype=float)
    q = np.zeros_like(y, dtype=float)
    Sid = 0
    for i in range(len(x)):
        p[i] = np.around((x[i] / np.sum(x)), 8)
        q[i] = np.around((y[i] / np.sum(y)), 8)
    for j in range(len(x)):
        Sid += p[j] * np.log10(p[j] / q[j]) + q[j] * np.log10(q[j] / p[j])
    return Sid

def mpsnr(x_true, x_pred):
    """
    :param x_true: hyperspectral image, shape (H, W, C)
    :param x_pred: hyperspectral image, shape (H, W, C)
    :return: mean PSNR between the original and reconstructed images over all bands
    """
    n_bands = x_true.shape[2]
    p = [compare_psnr(x_true[:, :, k], x_pred[:, :, k], data_range=10000) for k in range(n_bands)]
    return np.mean(p)


def HsiSam(x_true, x_pred):
    """
    :param x_true: hyperspectral image, shape (H, W, C)
    :param x_pred: hyperspectral image, shape (H, W, C)
    :return: mean spectral angle (in degrees) between original and reconstructed pixels
    """
    assert x_true.ndim == 3 and x_true.shape == x_pred.shape
    sam_rad = np.zeros((x_pred.shape[0], x_pred.shape[1]))
    for x in range(x_true.shape[0]):
        for y in range(x_true.shape[1]):
            tmp_pred = x_pred[x, y].ravel()
            tmp_true = x_true[x, y].ravel()
            sam_rad[x, y] = np.arccos(np.dot(tmp_pred, tmp_true) / (norm(tmp_pred) * norm(tmp_true)))
    sam_deg = sam_rad.mean() * 180 / np.pi
    return sam_deg
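
# Usage sketch for the dispatcher above (illustrative, not in the original file):
#     angle = Simcalculation('SAM', spec1, spec2)  # radians; 0 means identical direction
#     div = Simcalculation('SID', spec1, spec2)    # 0 means identical distributions
# SID assumes strictly positive intensities in both spectra.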


def mssim(x_true, x_pred):
    """
    :param x_true: hyperspectral image, shape (H, W, C)
    :param x_pred: hyperspectral image, shape (H, W, C)
    :return: mean structural similarity between the original and reconstructed images
    """
    SSIM = compare_ssim(im1=x_true, im2=x_pred, multichannel=True)
    return SSIM
--------------------------------------------------------------------------------
/OpenSA/Simcalculation/__pycache__/SimCa.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Simcalculation/__pycache__/SimCa.cpython-38.pyc
--------------------------------------------------------------------------------
/OpenSA/Simcalculation/__pycache__/SimCa.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Simcalculation/__pycache__/SimCa.cpython-39.pyc
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Cars.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import copy

# ref: https://blog.csdn.net/qq2512446791

def PC_Cross_Validation(X, y, pc, cv):
    '''
    X : spectral matrix, n x m
    y : concentration vector (reference chemical values)
    pc: maximum number of principal components
    cv: number of cross-validation folds
    return :
        RMSECV: RMSECV for each number of components
        rindex: index of the best number of components
    '''
    kf = KFold(n_splits=cv)
    RMSECV = []
    for i in range(pc):
        RMSE = []
        for train_index, test_index in kf.split(X):
            x_train, x_test = X[train_index], X[test_index]
            y_train, y_test = y[train_index], y[test_index]
            pls = PLSRegression(n_components=i + 1)
            pls.fit(x_train, y_train)
            y_predict = pls.predict(x_test)
            RMSE.append(np.sqrt(mean_squared_error(y_test, y_predict)))
        RMSE_mean = np.mean(RMSE)
        RMSECV.append(RMSE_mean)
    rindex = np.argmin(RMSECV)
    return RMSECV, rindex

def Cross_Validation(X, y, pc, cv):
    '''
    X : spectral matrix, n x m
    y : concentration vector (reference chemical values)
    pc: number of principal components
    cv: number of cross-validation folds
    return :
        mean RMSECV over the folds
    '''
    kf = KFold(n_splits=cv)
    RMSE = []
    for train_index, test_index in kf.split(X):
        x_train, x_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        pls = PLSRegression(n_components=pc)
        pls.fit(x_train, y_train)
        y_predict = pls.predict(x_test)
        RMSE.append(np.sqrt(mean_squared_error(y_test, y_predict)))
    RMSE_mean = np.mean(RMSE)
    return RMSE_mean

def CARS_Cloud(X, y, N=50, f=20, cv=10):
    p = 0.8
    m, n = X.shape
    u = np.power((n / 2), (1 / (N - 1)))
    k = (1 / (N - 1)) * np.log(n / 2)
    cal_num = np.round(m * p)
    b2 = np.arange(n)
    x = copy.deepcopy(X)
    D = np.vstack((np.array(b2).reshape(1, -1), X))
    WaveData = []
    WaveNum = []
    RMSECV = []
    r = []
    for i in range(1, N + 1):
        r.append(u * np.exp(-1 * k * i))
        wave_num = int(np.round(r[i - 1] * n))
        WaveNum = np.hstack((WaveNum, wave_num))
        cal_index = np.random.choice(np.arange(m), size=int(cal_num), replace=False)
        wave_index = b2[:wave_num].reshape(1, -1)[0]
        xcal = x[np.ix_(list(cal_index), list(wave_index))]
        ycal = y[cal_index]
        x = x[:, wave_index]
        D = D[:, wave_index]
        d = D[0, :].reshape(1, -1)
        wnum = n - wave_num
        if wnum > 0:
            d = np.hstack((d, np.full((1, wnum), -1)))
        if len(WaveData) == 0:
            WaveData = d
        else:
            WaveData = np.vstack((WaveData, d.reshape(1, -1)))

        if wave_num < f:
            f = wave_num

        pls = PLSRegression(n_components=f)
        pls.fit(xcal, ycal)
        beta = pls.coef_
        b = np.abs(beta)
        b2 = np.argsort(-b, axis=0)
        coef = copy.deepcopy(beta)
        coeff = coef[b2, :].reshape(len(b2), -1)
        rmsecv, rindex = PC_Cross_Validation(xcal, ycal, f, cv)
        RMSECV.append(Cross_Validation(xcal, ycal, rindex + 1, cv))

    WAVE = []

    for i in range(WaveData.shape[0]):
        wd = WaveData[i, :]
        WD = np.ones((len(wd)))
        for j in range(len(wd)):
            ind = np.where(wd == j)
            if len(ind[0]) == 0:
                WD[j] = 0
            else:
                WD[j] = wd[ind[0]]
        if len(WAVE) == 0:
            WAVE = copy.deepcopy(WD)
        else:
            WAVE = np.vstack((WAVE, WD.reshape(1, -1)))


    MinIndex = np.argmin(RMSECV)
    Optimal = WAVE[MinIndex, :]
    boindex = np.where(Optimal != 0)
    OptWave = boindex[0]


    return OptWave
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/GA.py:
--------------------------------------------------------------------------------
from deap import base
from deap import creator
from deap import tools
import numpy as np
import random
from sklearn import model_selection
from sklearn.cross_decomposition import PLSRegression

creator.create('FitnessMax', base.Fitness, weights=(1.0,))  # for minimization, set weights to (-1.0,)
creator.create('Individual', list, fitness=creator.FitnessMax)



def GA(X, y, number_of_generation=10):

    scaled_x_train = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scaled_y_train = (y - y.mean()) / y.std(ddof=1)


    toolbox = base.Toolbox()
    min_boundary = np.zeros(X.shape[1])
    max_boundary = np.ones(X.shape[1]) * 1.0

    # basic GA parameters

    probability_of_crossover = 0.5
    probability_of_mutation = 0.2
    threshold_of_variable_selection = 0.5

    def create_ind_uniform(min_boundary, max_boundary):
        index = []
        for min_, max_ in zip(min_boundary, max_boundary):
            index.append(random.uniform(min_, max_))
        return index

    # register individual and population constructors
    toolbox.register('create_ind', create_ind_uniform, min_boundary, max_boundary)
    toolbox.register('individual', tools.initIterate, creator.Individual, toolbox.create_ind)
    toolbox.register('population', tools.initRepeat, list, toolbox.individual)



    def evalOneMax(individual):
        individual_array = np.array(individual)
        selected_x_variable_numbers = np.where(individual_array > threshold_of_variable_selection)[0]
        selected_scaled_x_train = scaled_x_train[:, selected_x_variable_numbers]
        max_number_of_components = 10
        if len(selected_x_variable_numbers):
            # cross-validated R2 over candidate numbers of PLS components
            pls_components = np.arange(1, min(np.linalg.matrix_rank(selected_scaled_x_train) + 1,
                                              max_number_of_components + 1), 1)
            r2_cv_all = []
            for pls_component in pls_components:
                model_in_cv = PLSRegression(n_components=pls_component)
                estimated_y_train_in_cv = np.ndarray.flatten(
                    model_selection.cross_val_predict(model_in_cv, selected_scaled_x_train, scaled_y_train,
                                                      cv=5))
                estimated_y_train_in_cv = estimated_y_train_in_cv * y.std(ddof=1) + y.mean()
                r2_cv_all.append(
                    1 - sum((y - estimated_y_train_in_cv) ** 2) / sum((y - y.mean()) ** 2))
            value = [np.max(r2_cv_all)]
            return value
        return [-np.inf]  # no variable selected: assign the worst possible fitness

    toolbox.register('evaluate', evalOneMax)
    # two-point crossover
    toolbox.register('mate', tools.cxTwoPoint)
    # bit-flip mutation
    toolbox.register('mutate', tools.mutFlipBit, indpb=0.05)
    # tournament selection
    toolbox.register('select', tools.selTournament, tournsize=3)
    # initialize the population
    random.seed()
    pop = toolbox.population(n=len(y))


    for generation in range(number_of_generation):
        print('-- Generation {0} --'.format(generation + 1))

        offspring = toolbox.select(pop, len(pop))
        offspring = list(map(toolbox.clone, offspring))

        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < probability_of_crossover:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < probability_of_mutation:
                toolbox.mutate(mutant)
                del mutant.fitness.values

        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
        # individuals re-evaluated in this generation
        print('  Evaluated %i individuals' % len(invalid_ind))

        pop[:] = offspring
        fits = [ind.fitness.values[0] for ind in pop]

        length = len(pop)
        mean = sum(fits) / length
        sum2 = sum(x * x for x in fits)
        std = abs(sum2 / length - mean ** 2) ** 0.5

        print('  Min %s' % min(fits))
        print('  Max %s' % max(fits))

    best_individual = tools.selBest(pop, 1)[0]
    best_individual_array = np.array(best_individual)
    selected_x_variable_numbers = np.where(best_individual_array > threshold_of_variable_selection)[0]

    return selected_x_variable_numbers

if __name__ == '__main__':
    pass
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Lar.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""


from sklearn import linear_model
import numpy as np

def Lar(X, y, nums=40):
    '''
    X : predictor (spectral) matrix
    y : labels
    nums : number of feature points to keep, 40 by default
    return : indices of the selected variables
    '''
    Lars = linear_model.Lars()
    Lars.fit(X, y)
    corflist = np.abs(Lars.coef_)

    corf = np.asarray(corflist)
    SpectrumList = corf.argsort()[-1:-(nums + 1):-1]
    SpectrumList = np.sort(SpectrumList)

    return SpectrumList
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Pca.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""

from sklearn.decomposition import PCA

def Pca(X, nums=20):
    """
    :param X: raw spectrum data, shape (n_samples, n_features)
    :param nums: number of principal components retained
    :return: X_reduction: spectral data after dimensionality reduction
    """
    pca = PCA(n_components=nums)  # number of components to keep
    pca.fit(X)
    X_reduction = pca.transform(X)

    return X_reduction
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Spa.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""


import numpy as np
from scipy.linalg import qr
import scipy.stats

# ref: https://blog.csdn.net/qq2512446791

class SPA:

    def _projections_qr(self, X, k, M):
        '''
        X : predictor (spectral) matrix
        k : index of the initial column for the projection operations
        M : number of variables in the result
        return : indices of the variable chain produced by the projections
        '''

        X_projected = X.copy()

        # squared sums of the column vectors
        norms = np.sum((X ** 2), axis=0)
        # largest column squared sum
        norm_max = np.amax(norms)

        # scale column k so that it becomes the "largest" column
        X_projected[:, k] = X_projected[:, k] * 2 * norm_max / norms[k]

        # QR decomposition with column pivoting; order is the column permutation
        _, __, order = qr(X_projected, 0, pivoting=True)

        return order[:M].T

    def _validation(self, Xcal, ycal, var_sel, Xval=None, yval=None):
        '''
        [yhat, e] = validation(Xcal, ycal, var_sel, Xval, yval) --> validation with a separate validation set
        [yhat, e] = validation(Xcal, ycal, var_sel)             --> cross-validation
        '''
        N = Xcal.shape[0]  # number of calibration samples
        if Xval is None:  # decide whether a separate validation set is used
            NV = 0
        else:
            NV = Xval.shape[0]  # number of validation samples

        yhat = e = None

        # validation with a separate validation set
        if NV > 0:
            Xcal_ones = np.hstack(
                [np.ones((N, 1)), Xcal[:, var_sel].reshape(N, -1)])

            # multiple linear regression with an offset term
            b = np.linalg.lstsq(Xcal_ones, ycal, rcond=None)[0]
            # predict on the validation set
            X = np.hstack([np.ones((NV, 1)), Xval[:, var_sel]])
            yhat = X.dot(b)
            # prediction errors
            e = yval - yhat
        else:
            # allocate yhat with the proper size
            yhat = np.zeros((N, 1))
            for i in range(N):
                # leave sample i out of the calibration set
                cal = np.hstack([np.arange(i), np.arange(i + 1, N)])
                X = Xcal[cal, :][:, var_sel.astype(int)]
                y = ycal[cal]
                xtest = Xcal[i, var_sel]
                X_ones = np.hstack([np.ones((N - 1, 1)), X.reshape(N - 1, -1)])
                # multiple linear regression with an offset term
                b = np.linalg.lstsq(X_ones, y, rcond=None)[0]
                # predict the held-out sample
                yhat[i] = np.hstack([np.ones(1), xtest]).dot(b)
            # prediction errors
            e = ycal - yhat

        return yhat, e

    def spa(self, Xcal, ycal, m_min=1, m_max=None,
            Xval=None, yval=None, autoscaling=1):
        '''
        [var_sel, var_sel_phase2] = spa(Xcal, ycal, m_min, m_max, Xval, yval, autoscaling) --> validation with a separate validation set
        [var_sel, var_sel_phase2] = spa(Xcal, ycal, m_min, m_max, autoscaling)             --> cross-validation

        If m_min is empty, it defaults to m_min = 1.
        If m_max is empty:
        1. with a separate validation set, m_max = min(N-1, K)
        2. with cross-validation, m_max = min(N-2, K)

        autoscaling : whether to autoscale the columns, yes = 1, no = 0, default 1

        '''

        assert (autoscaling == 0 or autoscaling == 1), "autoscaling must be 0 or 1"

        N, K = Xcal.shape

        if m_max is None:
            if Xval is None:
                m_max = min(N - 1, K)
            else:
                m_max = min(N - 2, K)

        assert (m_max <= min(N - 1, K)), "invalid m_max"

        # Phase 1: projection operations on the calibration set

        normalization_factor = None
        if autoscaling == 1:
            normalization_factor = np.std(
                Xcal, ddof=1, axis=0).reshape(1, -1)[0]
        else:
            normalization_factor = np.ones((1, K))[0]

        Xcaln = np.empty((N, K))
        for k in range(K):
            x = Xcal[:, k]
            Xcaln[:, k] = (x - np.mean(x)) / normalization_factor[k]

        SEL = np.zeros((m_max, K))

        for k in range(K):
            SEL[:, k] = self._projections_qr(Xcaln, k, m_max)

        # Phase 2: evaluate the candidate variable subsets

        PRESS = float('inf') * np.ones((m_max + 1, K))

        for k in range(K):
            for m in range(m_min, m_max + 1):
                var_sel = SEL[:m, k].astype(int)
                _, e = self._validation(Xcal, ycal, var_sel, Xval, yval)
                PRESS[m, k] = np.conj(e).T.dot(e)

        PRESSmin = np.min(PRESS, axis=0)
        m_sel = np.argmin(PRESS, axis=0)
        k_sel = np.argmin(PRESSmin)

        # the chain starting from band k_sel is best, with m_sel(k_sel) bands
        var_sel_phase2 = SEL[:m_sel[k_sel], k_sel].astype(int)

        # Phase 3: final variable elimination

        # step 3.1: compute the relevance index
        Xcal2 = np.hstack([np.ones((N, 1)), Xcal[:, var_sel_phase2]])
        b = np.linalg.lstsq(Xcal2, ycal, rcond=None)[0]
        std_deviation = np.std(Xcal2, ddof=1, axis=0)

        relev = np.abs(b * std_deviation.T)
        relev = relev[1:]

        index_increasing_relev = np.argsort(relev, axis=0)
        index_decreasing_relev = index_increasing_relev[::-1].reshape(1, -1)[0]

        PRESS_scree = np.empty(len(var_sel_phase2))
        yhat = e = None
        for i in range(len(var_sel_phase2)):
            var_sel = var_sel_phase2[index_decreasing_relev[:i + 1]]
            _, e = self._validation(Xcal, ycal, var_sel, Xval, yval)

            PRESS_scree[i] = np.conj(e).T.dot(e)

        RMSEP_scree = np.sqrt(PRESS_scree / len(e))

        # step 3.3: F-test criterion
        PRESS_scree_min = np.min(PRESS_scree)
        alpha = 0.25
        dof = len(e)
        fcrit = scipy.stats.f.ppf(1 - alpha, dof, dof)
        PRESS_crit = PRESS_scree_min * fcrit

        # find the smallest number of variables whose PRESS is not significantly larger than PRESS_scree_min

        i_crit = np.min(np.nonzero(PRESS_scree < PRESS_crit))
        i_crit = max(m_min, i_crit)

        var_sel = var_sel_phase2[index_decreasing_relev[:i_crit]]

        return var_sel

    def __repr__(self):
        return "SPA()"
--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Uve.py:
--------------------------------------------------------------------------------
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github : https://github.com/FuSiry/OpenSA
@WeChat : Fu_siry
@License: Apache-2.0 license

"""




from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
from numpy.linalg import matrix_rank as rank
import numpy as np

class UVE:
    def __init__(self, x, y, ncomp=1, nrep=500, testSize=0.2):

        '''
        x : predictor (spectral) matrix
        y : labels
        ncomp : number of latent components
        nrep : number of Monte-Carlo repetitions
        testSize : test fraction used in each PLS split
        return : spectral data after wavelength selection
        '''

        self.x = x
        self.y = y
        # the number of latent components should not exceed any dimension of the predictor matrix
        self.ncomp = min([ncomp, rank(x)])
        self.nrep = nrep
        self.testSize = testSize
        self.criteria = None

        self.featureIndex = None
        self.featureR2 = np.full(self.x.shape[1], np.nan)
        self.selFeature = None

    def calcCriteria(self):
        # stability criterion: mean PLS coefficient divided by its standard deviation over random splits
        PLSCoef = np.zeros((self.nrep, self.x.shape[1]))
        ss = ShuffleSplit(n_splits=self.nrep, test_size=self.testSize)
        step = 0
        for train, test in ss.split(self.x, self.y):
            xtrain = self.x[train, :]
            ytrain = self.y[train]
            plsModel = PLSRegression(min([self.ncomp, rank(xtrain)]))
            plsModel.fit(xtrain, ytrain)
            PLSCoef[step, :] = plsModel.coef_.T
            step += 1
        meanCoef = np.mean(PLSCoef, axis=0)
        stdCoef = np.std(PLSCoef, axis=0)
        self.criteria = meanCoef / stdCoef

    def evalCriteria(self, cv=3):
        # rank features by the absolute stability criterion and score growing subsets by cross-validation
        self.featureIndex = np.argsort(-np.abs(self.criteria))
        for i in range(self.x.shape[1]):
            xi = self.x[:, self.featureIndex[:i + 1]]
            if i < self.ncomp:
                regModel = LinearRegression()
            else:
                regModel = PLSRegression(min([self.ncomp, rank(xi)]))
            cvScore = cross_val_score(regModel, xi, self.y, cv=cv)
            self.featureR2[i] = np.mean(cvScore)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

"Clarity reveals essence; spectra tell things apart." As the fingerprint of matter, spectra are widely used in compositional analysis. With the development and spread of miniature spectrometers and spectral imagers, spectrum-based analysis will no longer be confined to industry and the laboratory: it is about to enter everyday life, sensing everything and revealing the large through the small. This series of articles is dedicated to popularizing and applying spectral analysis techniques.


@[TOC](Table of Contents)



# Preface
A typical spectral analysis model (near-infrared spectroscopy is used for illustration; the workflows for visible, mid/far-infrared, fluorescence, Raman, and hyperspectral analysis are similar) is built as follows. During modeling, an algorithm first selects the training samples, the spectra are then preprocessed or their features extracted, a calibration model is built for quantitative analysis, and finally the model is transferred or adapted to different instruments or environments. The choice of training samples, the spectral preprocessing, the wavelength selection, the calibration model, the model transfer, and the parameters of all these algorithms therefore all affect how well the model performs in practice.

![Figure 1. NIR spectroscopy modeling and application workflow](https://img-blog.csdnimg.cn/e4038170fff643468cacfed4fb34ab04.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
A complete algorithm library named OpenSA (OpenSpectrumAnalysis) covers the common steps of the spectral analysis workflow: training-sample partitioning, spectral preprocessing, wavelength selection, and calibration modeling. The overall architecture of the library is shown below.
![OpenSA architecture](https://img-blog.csdnimg.cn/cf63e5d8980542bf824cb889d01f2e00.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
The sample partitioning module provides random, SPXY, and KS splits. The preprocessing module provides the common spectral preprocessing methods. The wavelength selection module provides SPA, CARS, LARS, UVE, PCA, and other feature-reduction methods. The analysis module consists of spectral similarity calculation, clustering, classification (qualitative analysis), and regression (quantitative analysis): the similarity submodule provides SAM, SID, MSSIM, MPSNR, and other measures; the clustering submodule provides KMeans, FCM, and others; the classification submodule provides classic chemometric methods such as ANN, SVM, PLS-DA, and RF as well as recent deep-learning methods such as CNN, AE, and Transformer; the regression submodule likewise provides classic quantitative methods such as ANN, SVR, and PLS along with CNN, AE, and Transformer. The model evaluation module provides the common evaluation metrics. The automatic parameter optimization module searches for the best model settings via grid search, genetic algorithms, or Bayesian optimization. The visualization module provides visual feedback throughout the analysis, useful for scientific plotting and model selection. A complete spectral analysis can be run in just a few lines of code, as the sketch below shows. (Note: the automatic parameter optimization and visualization modules are not open-sourced yet; that will wait until after graduation.)
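The condensed workflow might look like the following. This sketch is illustrative rather than copied from the repository: `QuantitativeAnalysis` is defined in `Regression/Rgs.py` and `Preprocessing` in `Preprocessing/Preprocessing.py`, while the location of `LoadNirtest` in `DataLoad/DataLoad.py` is an assumption based on the directory layout, and scikit-learn's `train_test_split` stands in for the package's own random/KS/SPXY splitters.

```python
from sklearn.model_selection import train_test_split
from DataLoad.DataLoad import LoadNirtest            # assumed module path
from Preprocessing.Preprocessing import Preprocessing
from Regression.Rgs import QuantitativeAnalysis

# load the public regression dataset, preprocess, split, and calibrate
data, label = LoadNirtest('Rgs')
data = Preprocessing('SNV', data)
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.2, random_state=123)
Rmse, R2, Mae = QuantitativeAnalysis('Pls', X_train, X_test, y_train, y_test)
print('RMSE: {}, R2: {}, MAE: {}'.format(Rmse, R2, Mae))
```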


This post open-sources the OpenSA spectral preprocessing module and shows how to use it.
# Changelog 20220521
OpenSA has been extended as follows:
1. The genetic algorithm (GA) was added to the wavelength selection module.
2. ELM and a plain convolutional network were added to the quantitative analysis module, together with reimplementations of the DeepSpectra network from a Q1 paper and the 1-D AlexNet from a Q2 paper.

# 1. Loading spectral data
Two open datasets are provided as examples: one public quantitative (regression) dataset and one public qualitative (classification) dataset. This post only demonstrates the quantitative dataset.
## 1.1 Loading the spectra

```python
# a public regression dataset and a public classification dataset serve as examples
def LoadNirtest(type):

    if type == "Rgs":
        CDataPath1 = './/Data//Rgs//Cdata1.csv'
        VDataPath1 = './/Data//Rgs//Vdata1.csv'
        TDataPath1 = './/Data//Rgs//Tdata1.csv'

        Cdata1 = np.loadtxt(open(CDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        Vdata1 = np.loadtxt(open(VDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        Tdata1 = np.loadtxt(open(TDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)

        Nirdata1 = np.concatenate((Cdata1, Vdata1))
        Nirdata = np.concatenate((Nirdata1, Tdata1))
        data = Nirdata[:, :-4]
        label = Nirdata[:, -1]

    elif type == "Cls":
        path = './/Data//Cls//table.csv'
        Nirdata = np.loadtxt(open(path, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
        data = Nirdata[:, :-1]
        label = Nirdata[:, -1]

    return data, label

```
## 1.2 Visualizing the spectra
```python
# load the raw data and visualize it
data, label = LoadNirtest('Rgs')
plotspc(data, "raw specturm")
```
The open dataset's spectra look like this:
![Raw spectra](https://img-blog.csdnimg.cn/04a9549619fd48198c9072c2d1acfd99.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)

# 2. Spectral preprocessing
## 2.1 The preprocessing module
The common preprocessing methods are wrapped so that users only need to change a method name to switch between them. The core code of the preprocessing module follows.
```python
"""
-*- coding: utf-8 -*-
@Time :2022/04/12 17:10
@Author : Pengyou FU
@blogs : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
@github :
@WeChat : Fu_siry
@License:

"""
import numpy as np
from scipy import signal
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from copy import deepcopy
import pandas as pd
import pywt


# min-max normalization
def MMS(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after MinMaxScaler :(n_samples, n_features)
    """
    return MinMaxScaler().fit_transform(data)


# standardization
def SS(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after StandardScaler :(n_samples, n_features)
    """
    return StandardScaler().fit_transform(data)


# mean centering
def CT(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after mean centering :(n_samples, n_features)
    """
    for i in range(data.shape[0]):
        MEAN = np.mean(data[i])
        data[i] = data[i] - MEAN
    return data


# standard normal variate transform
def SNV(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after SNV :(n_samples, n_features)
    """
    m = data.shape[0]
    n = data.shape[1]
    # standard deviation of each spectrum
    data_std = np.std(data, axis=1)
    # mean of each spectrum
    data_average = np.mean(data, axis=1)
    # SNV: centre each spectrum by its mean and scale by its standard deviation
    data_snv = [[((data[i][j] - data_average[i]) / data_std[i]) for j in range(n)] for i in range(m)]
    return np.array(data_snv)



# moving average smoothing
def MA(data, WSZ=11):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :param WSZ: window size (odd int)
    :return: data after MA :(n_samples, n_features)
    """

    for i in range(data.shape[0]):
        out0 = np.convolve(data[i], np.ones(WSZ, dtype=int), 'valid') / WSZ  # WSZ is the (odd) window width
        r = np.arange(1, WSZ - 1, 2)
        start = np.cumsum(data[i, :WSZ - 1])[::2] / r         # shrinking windows at the left edge
        stop = (np.cumsum(data[i, :-WSZ:-1])[::2] / r)[::-1]  # shrinking windows at the right edge
        data[i] = np.concatenate((start, out0, stop))
    return data


# Savitzky-Golay smoothing filter
def SG(data, w=11, p=2):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :param w: window size (odd int)
    :param p: polynomial order
    :return: data after SG :(n_samples, n_features)
    """
    return signal.savgol_filter(data, w, p)


# first derivative
def D1(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after first derivative :(n_samples, n_features - 1)
    """
    n, p = data.shape
    Di = np.ones((n, p - 1))
    for i in range(n):
        Di[i] = np.diff(data[i])
    return Di


# second derivative
def D2(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after second derivative :(n_samples, n_features - 2)
    """
    data = deepcopy(data)
    if isinstance(data, pd.DataFrame):
        data = data.values
    temp2 = (pd.DataFrame(data)).diff(axis=1)
    temp3 = np.delete(temp2.values, 0, axis=1)
    temp4 = (pd.DataFrame(temp3)).diff(axis=1)
    spec_D2 = np.delete(temp4.values, 0, axis=1)
    return spec_D2


# detrending (DT)
def DT(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after DT :(n_samples, n_features)
    """
    lenth = data.shape[1]
    x = np.asarray(range(lenth), dtype=np.float32)
    out = np.array(data)
    l = LinearRegression()
    for i in range(out.shape[0]):
        l.fit(x.reshape(-1, 1), out[i].reshape(-1, 1))
        k = l.coef_
        b = l.intercept_
        for j in range(out.shape[1]):
            out[i][j] = out[i][j] - (j * k + b)

    return out


# multiplicative scatter correction
def MSC(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after MSC :(n_samples, n_features)
    """
    n, p = data.shape
    msc = np.ones((n, p))
    mean = np.mean(data, axis=0)  # mean spectrum as the reference

    # linear fit of each spectrum against the reference
    for i in range(n):
        y = data[i, :]
        l = LinearRegression()
        l.fit(mean.reshape(-1, 1), y.reshape(-1, 1))
        k = l.coef_
        b = l.intercept_
        msc[i, :] = ((y - b) / k).ravel()
    return msc

# wavelet denoising
def wave(data):
    """
    :param data: raw spectrum data, shape (n_samples, n_features)
    :return: data after wavelet denoising :(n_samples, n_features)
    """
    data = deepcopy(data)
    if isinstance(data, pd.DataFrame):
        data = data.values
    def wave_(data):
        w = pywt.Wavelet('db8')  # use the Daubechies-8 wavelet
        maxlev = pywt.dwt_max_level(len(data), w.dec_len)
        coeffs = pywt.wavedec(data, 'db8', level=maxlev)
        threshold = 0.04
        for i in range(1, len(coeffs)):
            coeffs[i] = pywt.threshold(coeffs[i], threshold * max(coeffs[i]))
        datarec = pywt.waverec(coeffs, 'db8')
        return datarec

    tmp = None
    for i in range(data.shape[0]):
        if (i == 0):
            tmp = wave_(data[i])
        else:
            tmp = np.vstack((tmp, wave_(data[i])))

    return tmp

def Preprocessing(method, data):

    if method == "None":
        data = data
    elif method == 'MMS':
        data = MMS(data)
    elif method == 'SS':
        data = SS(data)
    elif method == 'CT':
        data = CT(data)
    elif method == 'SNV':
        data = SNV(data)
    elif method == 'MA':
        data = MA(data)
    elif method == 'SG':
        data = SG(data)
    elif method == 'MSC':
        data = MSC(data)
    elif method == 'D1':
        data = D1(data)
    elif method == 'D2':
        data = D2(data)
    elif method == 'DT':
        data = DT(data)
    elif method == 'WVAE':
        data = wave(data)
    else:
        print("no such preprocessing method!")

    return data


```
## 2.2 Using the preprocessing module
The file example.py shows how to use the preprocessing module: two lines of code are enough for any of the common preprocessing methods.
Example 1: multiplicative scatter correction (MSC) with OpenSA
```python
# load the raw data and visualize it
data, label = LoadNirtest('Rgs')
plotspc(data, "raw specturm")
# preprocess the spectra and visualize the result
method = "MSC"
Preprocessingdata = Preprocessing(method, data)
plotspc(Preprocessingdata, method)
```
The preprocessed spectra look like this:
![MSC](https://img-blog.csdnimg.cn/3b38f01e6ebe4a22821274bca50aa5a2.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)


Example 2: SNV preprocessing with OpenSA

```python
# load the raw data and visualize it
data, label = LoadNirtest('Rgs')
plotspc(data, "raw specturm")
# preprocess the spectra and visualize the result
method = "SNV"
Preprocessingdata = Preprocessing(method, data)
plotspc(Preprocessingdata, method)
```
The preprocessed spectra look like this:
![SNV](https://img-blog.csdnimg.cn/558d1c710da04519b72cab08da67e9cc.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
# Summary
With OpenSA, spectral preprocessing takes only a few lines of code. The complete code is available from the [GitHub repository](https://github.com/FuSiry/OpenSA). If it is useful to you, please leave a like!
The code is currently for academic use only. If it helps your research, please cite the author's paper; commercial use without permission is not allowed. Contributions extending the algorithms covered by OpenSA are welcome.
--------------------------------------------------------------------------------