├── LICENSE
├── OpenSA
    ├── Classification
    │   ├── CNN.py
    │   ├── ClassicCls.py
    │   ├── Cls.py
    │   ├── DeepCls.py
    │   ├── SAE.py
    │   └── __pycache__
    │   │   ├── CNN.cpython-38.pyc
    │   │   ├── CNN.cpython-39.pyc
    │   │   ├── ClassicCls.cpython-38.pyc
    │   │   ├── ClassicCls.cpython-39.pyc
    │   │   ├── Cls.cpython-38.pyc
    │   │   ├── Cls.cpython-39.pyc
    │   │   └── SAE.cpython-38.pyc
    ├── Clustering
    │   ├── Cluster.py
    │   └── __pycache__
    │   │   ├── Cluster.cpython-38.pyc
    │   │   └── Cluster.cpython-39.pyc
    ├── Data
    │   ├── Cls
    │   │   └── table.csv
    │   └── Rgs
    │   │   ├── Cdata1.csv
    │   │   ├── Cdata2.csv
    │   │   ├── Tdata1.csv
    │   │   ├── Tdata2.csv
    │   │   ├── Vdata1.csv
    │   │   └── Vdata2.csv
    ├── DataLoad
    │   ├── DataLoad.py
    │   └── __pycache__
    │   │   ├── DataLoad.cpython-38.pyc
    │   │   └── DataLoad.cpython-39.pyc
    ├── Evaluate
    │   ├── RgsEvaluate.py
    │   └── __pycache__
    │   │   ├── RgsEvaluate.cpython-38.pyc
    │   │   └── RgsEvaluate.cpython-39.pyc
    ├── Plot
    │   └── readme.txt
    ├── Preprocessing
    │   ├── Preprocessing.py
    │   └── __pycache__
    │   │   ├── Preprocessing.cpython-38.pyc
    │   │   └── Preprocessing.cpython-39.pyc
    ├── Regression
    │   ├── CNN.py
    │   ├── ClassicRgs.py
    │   ├── CnnModel.py
    │   ├── DeepRgs.py
    │   ├── Rgs.py
    │   └── __pycache__
    │   │   ├── CNN.cpython-38.pyc
    │   │   ├── ClassicRgs.cpython-38.pyc
    │   │   ├── ClassicRgs.cpython-39.pyc
    │   │   ├── CnnModel.cpython-38.pyc
    │   │   ├── Rgs.cpython-38.pyc
    │   │   └── Rgs.cpython-39.pyc
    ├── Simcalculation
    │   ├── SimCa.py
    │   └── __pycache__
    │   │   ├── SimCa.cpython-38.pyc
    │   │   └── SimCa.cpython-39.pyc
    ├── WaveSelect
    │   ├── Cars.py
    │   ├── GA.py
    │   ├── Lar.py
    │   ├── Pca.py
    │   ├── Spa.py
    │   ├── Uve.py
    │   ├── WaveSelcet.py
    │   └── __pycache__
    │   │   ├── Cars.cpython-38.pyc
    │   │   ├── Cars.cpython-39.pyc
    │   │   ├── GA.cpython-38.pyc
    │   │   ├── Lar.cpython-38.pyc
    │   │   ├── Lar.cpython-39.pyc
    │   │   ├── Pca.cpython-38.pyc
    │   │   ├── Pca.cpython-39.pyc
    │   │   ├── Spa.cpython-38.pyc
    │   │   ├── Spa.cpython-39.pyc
    │   │   ├── Uve.cpython-38.pyc
    │   │   ├── Uve.cpython-39.pyc
    │   │   ├── WaveSelcet.cpython-38.pyc
    │   │   └── WaveSelcet.cpython-39.pyc
    ├── example.py
    └── opt
    │   └── readme.txt
├── OpenSAV2
└── README.md


/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [yyyy] [name of copyright owner]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/OpenSA/Classification/CNN.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/15 9:36
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | import torch.nn.functional as F
 13 | import numpy as np
 14 | import torch
 15 | import torch.nn as nn
 16 | from torch.autograd import Variable
 17 | from torch.utils.data import Dataset
 18 | from sklearn.metrics import accuracy_score,auc,roc_curve,precision_recall_curve,f1_score
 19 | import torch.optim as optim
 20 | # from EarlyStop import EarlyStopping
 21 | from sklearn.preprocessing import scale,MinMaxScaler,Normalizer,StandardScaler
 22 | import time
 23 | 
 24 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 25 | 
 26 | 
 27 | 
 28 | 
 29 | def conv_k(in_chs, out_chs, k=1, s=1, p=1):
 30 |     """ Build size k kernel's convolution layer with padding"""
 31 |     return nn.Conv1d(in_chs, out_chs, kernel_size=k, stride=s, padding=p, bias=False)
 32 | 
 33 | #自定义加载数据集
 34 | class MyDataset(Dataset):
 35 |     def __init__(self,specs,labels):
 36 |         self.specs = specs
 37 |         self.labels = labels
 38 | 
 39 |     def __getitem__(self, index):
 40 |         spec,target = self.specs[index],self.labels[index]
 41 |         return spec,target
 42 | 
 43 |     def __len__(self):
 44 |         return len(self.specs)
 45 | 
 46 | ###定义是否需要标准化
 47 | def ZspPocess(X_train, X_test,y_train,y_test,need=True): #True:需要标准化，Flase：不需要标准化
 48 |     if (need == True):
 49 |         # X_train_Nom = scale(X_train)
 50 |         # X_test_Nom = scale(X_test)
 51 |         standscale = StandardScaler()
 52 |         X_train_Nom = standscale.fit_transform(X_train)
 53 |         X_test_Nom = standscale.transform(X_test)
 54 | 
 55 |         X_train_Nom = X_train_Nom[:, np.newaxis, :]
 56 |         X_test_Nom = X_test_Nom[:, np.newaxis, :]
 57 |         data_train = MyDataset(X_train_Nom, y_train)
 58 |         ##使用loader加载测试数据
 59 |         data_test = MyDataset(X_test_Nom, y_test)
 60 |         return data_train, data_test
 61 |     else:
 62 |         X_train = X_train[:, np.newaxis, :]  # （483， 1， 2074）
 63 |         X_test = X_test[:, np.newaxis, :]
 64 |         data_train = MyDataset(X_train, y_train)
 65 |         ##使用loader加载测试数据
 66 |         data_test = MyDataset(X_test, y_test)
 67 |         return data_train, data_test
 68 | 
 69 | class CNN3Lyaers(nn.Module):
 70 |     def __init__(self, nls):
 71 |         super(CNN3Lyaers, self).__init__()
 72 |         self.CONV1 = nn.Sequential(
 73 |             nn.Conv1d(1, 64, 21, 1),
 74 |             nn.BatchNorm1d(64),  # 对输出做均一化
 75 |             nn.ReLU(),
 76 |             nn.MaxPool1d(3, 3)
 77 |         )
 78 |         self.CONV2 = nn.Sequential(
 79 |             nn.Conv1d(64, 64, 19, 1),
 80 |             nn.BatchNorm1d(64),  # 对输出做均一化
 81 |             nn.ReLU(),
 82 |             nn.MaxPool1d(3, 3)
 83 |         )
 84 |         self.CONV3 = nn.Sequential(
 85 |             nn.Conv1d(64, 64, 17, 1),
 86 |             nn.BatchNorm1d(64),  # 对输出做均一化
 87 |             nn.ReLU(),
 88 |             nn.MaxPool1d(3, 3),
 89 |         )
 90 |         self.fc = nn.Sequential(
 91 |             # nn.Linear(4224, nls)
 92 |             nn.Linear(384, nls)
 93 |         )
 94 | 
 95 |     def forward(self, x):
 96 |         x = self.CONV1(x)
 97 |         x = self.CONV2(x)
 98 |         x = self.CONV3(x)
 99 |         x = x.view(x.size(0), -1)
100 |         # print(x.size())
101 |         out = self.fc(x)
102 |         out = F.softmax(out,dim=1)
103 |         return out
104 | 
105 | class mlpmodel(nn.Module):
106 |     def __init__(self, inputdim, outputdim):
107 |         super(mlpmodel, self).__init__()
108 |         self.fc1 = nn.Linear(inputdim, inputdim//2)
109 |         self.fc2= nn.Linear(inputdim//2, inputdim // 4)
110 |         self.fc3 = nn.Linear(inputdim//4, outputdim)
111 |     def forward(self, x):
112 |         x = self.fc1(x)
113 |         x = self.fc2(x)
114 |         x = self.fc3(x)
115 |         # x = F.softmax(x, dim=1)
116 |         return x
117 | 
118 | 
119 | def CNNTrain(X_train, X_test,y_train,y_test, BATCH_SIZE, n_epochs, nls):
120 | 
121 | 
122 |     data_train, data_test = ZspPocess(X_train, X_test,y_train,y_test,need=True)
123 |     train_loader = torch.utils.data.DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)
124 |     test_loader = torch.utils.data.DataLoader(data_test, batch_size=BATCH_SIZE, shuffle=True)
125 | 
126 |     store_path = ".//model//all//CNN18"
127 | 
128 |     model = CNN3Lyaers(nls=nls).to(device)
129 |     optimizer = optim.Adam(model.parameters(),
130 |                            lr=0.0001,weight_decay=0.0001)
131 |     scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, verbose=1, eps=1e-06,
132 |                                                            patience=10)
133 |     criterion = nn.CrossEntropyLoss().to(device)  # 损失函数为焦损函数，多用于类别不平衡的多分类问题
134 |     #early_stopping = EarlyStopping(patience=30, delta=1e-4, path=store_path, verbose=False)
135 | 
136 |     for epoch in range(n_epochs):
137 |         train_acc = []
138 |         for i, data in enumerate(train_loader):  # gives batch data, normalize x when iterate train_loader
139 |             model.train()  # 不训练
140 |             inputs, labels = data  # 输入和标签都等于data
141 |             inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
142 |             labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
143 |             output = model(inputs)  # cnn output
144 |             trian_loss = criterion(output, labels)  # cross entropy loss
145 |             optimizer.zero_grad()  # clear gradients for this training step
146 |             trian_loss.backward()  # backpropagation, compute gradients
147 |             optimizer.step()  # apply gradients
148 |             _, predicted = torch.max(output.data, 1)
149 |             y_predicted = predicted.detach().cpu().numpy()
150 |             y_label = labels.detach().cpu().numpy()
151 |             acc = accuracy_score(y_label, y_predicted)
152 |             train_acc.append(acc)
153 | 
154 |         with torch.no_grad():  # 无梯度
155 |             test_acc = []
156 |             testloss = []
157 |             for i, data in enumerate(test_loader):
158 |                 model.eval()  # 不训练
159 |                 inputs, labels = data  # 输入和标签都等于data
160 |                 inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
161 |                 labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
162 |                 outputs = model(inputs)  # 输出等于进入网络后的输入
163 |                 test_loss = criterion(outputs, labels)  # cross entropy loss
164 |                 _, predicted = torch.max(outputs.data,1)
165 |                 predicted = predicted.cpu().numpy()
166 |                 labels = labels.cpu().numpy()
167 |                 acc = accuracy_score(labels, predicted)
168 |                 test_acc.append(acc)
169 |                 testloss.append(test_loss.item())
170 |                 avg_loss = np.mean(testloss)
171 | 
172 |             scheduler.step(avg_loss)
173 |             # early_stopping(avg_loss, model)
174 |             # if early_stopping.early_stop:
175 |             #     print(f'Early stopping! Best validation loss: {early_stopping.get_best_score()}')
176 |             #     break
177 | 
178 | def CNNtest(X_train, X_test, y_train, y_test, BATCH_SIZE, nls):
179 |     # data_train, data_test = DataLoad(tp, test_ratio, 0, 404)
180 | 
181 |     data_train, data_test = ZspPocess(X_train, X_test, y_train, y_test, need=True)
182 |     test_loader = torch.utils.data.DataLoader(data_test, batch_size=BATCH_SIZE, shuffle=True)
183 | 
184 |     store_path = ".//model//all//CNN18"
185 | 
186 |     model = CNN3Lyaers(nls=nls).to(device)
187 | 
188 |     model.load_state_dict(torch.load(store_path))
189 |     test_acc = []
190 |     for i, data in enumerate(test_loader):
191 |         model.eval()  # 不训练
192 |         inputs, labels = data  # 输入和标签都等于data
193 |         inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
194 |         labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
195 |         outputs = model(inputs)  # 输出等于进入网络后的输入
196 |         _, predicted = torch.max(outputs.data, 1)
197 |         predicted = predicted.cpu().numpy()
198 |         labels = labels.cpu().numpy()
199 |         acc = accuracy_score(labels, predicted)
200 |         test_acc.append(acc)
201 |     return np.mean(test_acc)
202 | 
203 | 
204 | def CNN(X_train, X_test, y_train, y_test, BATCH_SIZE, n_epochs,nls):
205 | 
206 |     CNNTrain(X_train, X_test, y_train, y_test,BATCH_SIZE,n_epochs,nls)
207 |     acc = CNNtest(X_train, X_test, y_train, y_test,BATCH_SIZE,nls)
208 | 
209 |     return acc


--------------------------------------------------------------------------------
/OpenSA/Classification/ClassicCls.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | 
13 | from sklearn.neural_network import MLPClassifier
14 | from sklearn.preprocessing import StandardScaler
15 | from sklearn.metrics import accuracy_score
16 | import sklearn.svm as svm
17 | import numpy as np
18 | from sklearn.cross_decomposition import PLSRegression
19 | from sklearn.ensemble import RandomForestClassifier
20 | import pandas  as pd
21 | 
22 | def ANN(X_train, X_test, y_train, y_test, StandScaler=None):
23 | 
24 |     if StandScaler:
25 |         scaler = StandardScaler() # 标准化转换
26 |         X_train = scaler.fit_transform(X_train)
27 |         X_test = scaler.transform(X_test)
28 | 
29 |     # 神经网络输入为2，第一隐藏层神经元个数为5，第二隐藏层神经元个数为2，输出结果为2分类。
30 |     # solver='lbfgs',  MLP的求解方法：L-BFGS 在小数据上表现较好，Adam 较为鲁棒，
31 |     # SGD在参数调整较优时会有最佳表现（分类效果与迭代次数）,SGD标识随机梯度下降。
32 |     #clf =  MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(8,8), random_state=1, activation='relu')
33 |     clf =  MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
34 |                   beta_2=0.999, early_stopping=False, epsilon=1e-08,
35 |                   hidden_layer_sizes=(10, 8), learning_rate='constant',
36 |                   learning_rate_init=0.001, max_iter=200, momentum=0.9,
37 |                   nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
38 |                   solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
39 |                   warm_start=False)
40 | 
41 |     clf.fit(X_train,y_train.ravel())
42 |     predict_results=clf.predict(X_test)
43 |     acc = accuracy_score(predict_results, y_test.ravel())
44 | 
45 |     return acc
46 | 
47 | def SVM(X_train, X_test, y_train, y_test):
48 | 
49 |     clf = svm.SVC(C=1, gamma=1e-3)
50 |     clf.fit(X_train, y_train)
51 | 
52 |     predict_results = clf.predict(X_test)
53 |     acc = accuracy_score(predict_results, y_test.ravel())
54 | 
55 |     return acc
56 | 
57 | def PLS_DA(X_train, X_test, y_train, y_test):
58 | 
59 |     y_train = pd.get_dummies(y_train)
60 |     # 建模
61 |     model = PLSRegression(n_components=228)
62 |     model.fit(X_train, y_train)
63 |     # 预测
64 |     y_pred = model.predict(X_test)
65 |     # 将预测结果（类别矩阵）转换为数值标签
66 |     y_pred = np.array([np.argmax(i) for i in y_pred])
67 |     acc = accuracy_score(y_test, y_pred)
68 | 
69 |     return acc
70 | 
71 | def RF(X_train, X_test, y_train, y_test):
72 | 
73 |     RF = RandomForestClassifier(n_estimators=15,max_depth=3,min_samples_split=3,min_samples_leaf=3)
74 |     RF.fit(X_train, y_train)
75 |     y_pred = RF.predict(X_test)
76 |     acc = accuracy_score(y_test, y_pred)
77 | 
78 |     return acc
79 | 


--------------------------------------------------------------------------------
/OpenSA/Classification/Cls.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | from Classification.ClassicCls import ANN, SVM, PLS_DA, RF
13 | from Classification.CNN import CNN
14 | from Classification.SAE import SAE
15 | 
16 | def  QualitativeAnalysis(model, X_train, X_test, y_train, y_test):
17 | 
18 |     if model == "PLS_DA":
19 |         acc = PLS_DA(X_train, X_test, y_train, y_test)
20 |     elif model == "ANN":
21 |         acc = ANN(X_train, X_test, y_train, y_test)
22 |     elif model == "SVM":
23 |         acc = SVM(X_train, X_test, y_train, y_test)
24 |     elif model == "RF":
25 |         acc = RF(X_train, X_test, y_train, y_test)
26 |     elif model == "CNN":
27 |         acc = CNN(X_train, X_test, y_train, y_test, 16, 160, 4)
28 |     elif model == "SAE":
29 |         acc = SAE(X_train, X_test, y_train, y_test)
30 |     else:
31 |         print("no this model of QuantitativeAnalysis")
32 | 
33 |     return acc


--------------------------------------------------------------------------------
/OpenSA/Classification/DeepCls.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | 


--------------------------------------------------------------------------------
/OpenSA/Classification/SAE.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/15 9:36
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | 
 13 | 
 14 | import torch
 15 | from torch import nn
 16 | import torch.nn.functional as F
 17 | from torch.autograd import Variable
 18 | from torch import optim
 19 | import torch.utils.data as data
 20 | import numpy as np
 21 | import time
 22 | from sklearn.metrics import accuracy_score
 23 | 
 24 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 25 | 
 26 | class MyDataset(data.Dataset):
 27 |     def __init__(self,specs,labels):
 28 |         self.specs = specs
 29 |         self.labels = labels
 30 |     def __getitem__(self, index):
 31 |         spec,target = self.specs[index],self.labels[index]
 32 |         return spec,target
 33 |     def __len__(self):
 34 |         return len(self.specs)
 35 | 
 36 | 
 37 | class AutoEncoder(nn.Module):
 38 | 
 39 |     def __init__(self, inputDim, hiddenDim):
 40 |         super().__init__()
 41 |         self.inputDim = inputDim
 42 |         self.hiddenDim = hiddenDim
 43 |         self.encoder = nn.Linear(inputDim, hiddenDim, bias=True)
 44 |         self.decoder = nn.Linear(hiddenDim, inputDim, bias=True)
 45 |         self.act = F.relu
 46 | 
 47 |     def forward(self, x, rep=False):
 48 | 
 49 |         hidden = self.encoder(x)
 50 |         hidden = self.act(hidden)
 51 |         if rep == False:
 52 |             out = self.decoder(hidden)
 53 |             #out = self.act(out)
 54 |             return out
 55 |         else:
 56 |             return hidden
 57 | 
 58 | 
 59 | class SAE(nn.Module):
 60 | 
 61 |     def __init__(self, encoderList):
 62 | 
 63 |         super().__init__()
 64 | 
 65 |         self.encoderList = encoderList
 66 |         self.en1 = encoderList[0]
 67 |         self.en2 = encoderList[1]
 68 |         # self.en3 = encoderList[2]
 69 | 
 70 |         self.fc = nn.Linear(128, 4, bias=True)
 71 | 
 72 |     def forward(self, x):
 73 | 
 74 |         out = x
 75 |         out = self.en1(out, rep=True)
 76 |         out = self.en2(out, rep=True)
 77 |         #out = self.en3(out, rep=True)
 78 |         out = self.fc(out)
 79 |         # out = F.log_softmax(out)
 80 | 
 81 |         return out
 82 | 
 83 | 
 84 | class SAE_net(object):
 85 |     def __init__(self, AE_epoch = 200, SAE_epoch = 200,
 86 |                  input_dim = 404, hidden1_dim = 512,
 87 |                  hidden2_dim = 128, output_dim = 4,
 88 |                  batch_size = 128):
 89 |         self.AE_epoch = AE_epoch
 90 |         self.SAE_epoch = SAE_epoch
 91 |         self.input_dim = input_dim
 92 |         self.hidden1_dim = hidden1_dim
 93 |         self.hidden2_dim = hidden2_dim
 94 |         self.output_dim = output_dim
 95 |         self.batch_size = batch_size
 96 |         self.train_loader = None
 97 | 
 98 |         encoder1 = AutoEncoder(self.input_dim, self.hidden1_dim)
 99 |         encoder2 = AutoEncoder(self.hidden1_dim, self.hidden2_dim)
100 |         self.encoder_list = [encoder1, encoder2]
101 | 
102 | 
103 |     def trainAE(self, x_train, y_train, encoderList, trainLayer, batchSize, epoch, useCuda=False):
104 |         if useCuda:
105 |             for i in range(len(encoderList)):
106 |                 encoderList[i].to(device)
107 | 
108 |         optimizer = optim.Adam(encoderList[trainLayer].parameters())
109 |         ceriation = nn.MSELoss()
110 | 
111 |         data_train = MyDataset(x_train, y_train)
112 |         self.train_loader = torch.utils.data.DataLoader(data_train, batch_size=batchSize, shuffle=True)
113 | 
114 |         for i in range(epoch):
115 |             sum_loss = 0
116 |             if trainLayer != 0:  # 单独处理第0层，因为第一个编码器之前没有前驱的编码器了
117 |                 for i in range(trainLayer):  # 冻结要训练前面的所有参数
118 |                     for param in encoderList[i].parameters():
119 |                         param.requires_grad = False
120 | 
121 |             for batch_idx, (x, target) in enumerate(self.train_loader):
122 |                 optimizer.zero_grad()
123 |                 if useCuda:
124 |                     x, target = x.to(device), target.to(device)
125 |                 x, target = Variable(x).type(torch.FloatTensor), Variable(target).type(torch.LongTensor)
126 |                 # x = x.view(-1, 404)
127 |                 x = x.view(x.size(0), -1)
128 |                 # 产生需要训练层的输入数据
129 |                 # inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
130 |                 # labels = Variable(labels).type(torch.LongTensor).to(device)  # batch y
131 |                 out = x
132 |                 if trainLayer != 0:
133 |                     for i in range(trainLayer):
134 |                         out = encoderList[i](out, rep=True)
135 | 
136 |                 # 训练指定的自编码器
137 |                 pred = encoderList[trainLayer](out, rep=False).cpu()
138 | 
139 |                 loss = ceriation(pred, out)
140 |                 sum_loss += loss.item()
141 |                 loss.backward()
142 |                 optimizer.step()
143 | 
144 |     def trainClassifier(self, model, epoch, useCuda=False):
145 |         if useCuda:
146 |             model = model.to(device)
147 | 
148 |         # 解锁参数
149 |         for param in model.parameters():
150 |             param.requires_grad = True
151 | 
152 |         optimizer = optim.Adam(model.parameters())
153 |         ceriation = nn.CrossEntropyLoss()
154 | 
155 |         for i in range(epoch):
156 |             # trainning
157 |             sum_loss = 0
158 |             for batch_idx, (x, target) in enumerate(self.train_loader):
159 |                 optimizer.zero_grad()
160 |                 if useCuda:
161 |                     x, target = x.to(device), target.to(device)
162 |                 x, target = Variable(x).type(torch.FloatTensor), Variable(target).type(torch.LongTensor)
163 |                 x = x.view(-1, 404)
164 | 
165 |                 out = model(x)
166 | 
167 |                 loss = ceriation(out, target)
168 |                 sum_loss += loss.item()
169 |                 loss.backward()
170 |                 optimizer.step()
171 |         self.model = model
172 | 
173 |     def fit(self, x_train = None, y_train = None):
174 |         x_train = x_train[:, np.newaxis, :]
175 |         x_train = torch.from_numpy(x_train)
176 |         x_train = x_train.float()
177 | 
178 |         # pre-train
179 |         for i in range(2):
180 |             self.trainAE(x_train=x_train, y_train=y_train,
181 |                         encoderList = self.encoder_list, trainLayer=i, batchSize=self.batch_size,
182 |                          epoch = self.AE_epoch)
183 |         model = SAE(encoderList=self.encoder_list)
184 |         self.trainClassifier(model=model, epoch=self.SAE_epoch)
185 | 
186 |     def predict_proba(self, x_test):
187 |         x_test = torch.from_numpy(x_test)
188 |         x_test = x_test.float()
189 |         x_test = x_test[:, np.newaxis, :]
190 |         x_test = Variable(x_test)
191 |         x_test = x_test.view(-1, 404)
192 | 
193 |         out = self.model(x_test)
194 |         outdata = out.data
195 |         self.y_proba = outdata
196 |         y_proba = outdata.numpy()
197 |         return y_proba
198 | 
199 |     def predict(self, x_test):
200 |         _, y_out = torch.max(self.y_proba, 1)
201 |         y_pred = []
202 |         for i in y_out:
203 |             y_pred.append(i)
204 |         return y_pred
205 | 
206 | def SAE(X_train, y_train, X_test, y_test):
207 | 
208 |     clf = SAE_net()
209 |     clf.fit(X_train, y_train)
210 |     y_proba = clf.predict_proba(X_test)
211 |     y_pred = clf.predict(X_test)
212 |     # ACC
213 |     acc = accuracy_score(y_test, y_pred)
214 | 
215 |     return acc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/CNN.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/CNN.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/CNN.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/CNN.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/ClassicCls.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/ClassicCls.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/ClassicCls.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/ClassicCls.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/Cls.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/Cls.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/Cls.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/Cls.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Classification/__pycache__/SAE.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Classification/__pycache__/SAE.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Clustering/Cluster.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | 
 13 | 
 14 | from sklearn.cluster import KMeans
 15 | import numpy as np
 16 | 
 17 | def Kmeans(data, n_clusters=10, iter_num=30):
 18 | 
 19 |     cluster = KMeans(n_clusters=n_clusters, random_state=0, max_iter=iter_num)
 20 |     cluster.fit(data)
 21 |     label = cluster.labels_  # 对原数据表进行类别标记
 22 | 
 23 |     return label
 24 | 
 25 | class FCM:
 26 |     def __init__(self, data, clust_num, iter_num=10, m=2) :
 27 |         self.data = data
 28 |         self.cnum = clust_num
 29 |         self.sample_num=data.shape[0]
 30 |         self.m = m
 31 |         self.dim = data.shape[-1]  # 数据最后一维度数
 32 |         Jlist=[]   # 存储目标函数计算值的
 33 | 
 34 |         U = self.Initial_U(self.sample_num, self.cnum)
 35 |         for i in range(0, iter_num): # 迭代次数默认为10
 36 |             C = self.Cen_Iter(self.data, U, self.cnum)
 37 |             U = self.U_Iter(U, C)
 38 |             print("第%d次迭代" %(i+1) ,end="")
 39 |             print("聚类中心",C)
 40 |             J = self.J_calcu(self.data, U, C)  # 计算目标函数
 41 |             Jlist = np.append(Jlist, J)
 42 |         self.label = np.argmax(U, axis=0)  # 所有样本的分类标签
 43 |         self.Clast = C    # 最后的类中心矩阵
 44 |         self.Jlist = Jlist  # 存储目标函数计算值的矩阵
 45 | 
 46 |     # 初始化隶属度矩阵U
 47 |     def Initial_U(self, sample_num, cluster_n):
 48 |         U = np.random.rand(sample_num, cluster_n)  # sample_num为样本个数, cluster_n为分类数
 49 |         row_sum = np.sum(U, axis=1)  # 按行求和 row_sum: sample_num*1
 50 |         row_sum = 1 / row_sum    # 该矩阵每个数取倒数
 51 |         U = np.multiply(U.T, row_sum)  # 确保U的每列和为1 (cluster_n*sample_num).*(sample_num*1)
 52 |         return U   # cluster_n*sample_num
 53 | 
 54 |     # 计算类中心
 55 |     def Cen_Iter(self, data, U, cluster_n, m):
 56 |         c_new = np.empty(shape=[0, self.dim])  # self.dim为样本矩阵的最后一维度
 57 |         for i in range(0, cluster_n):          # 如散点的dim为2，图片像素值的dim为1
 58 |             u_ij_m = U[i, :] ** m  # (sample_num,)
 59 |             sum_u = np.sum(u_ij_m)
 60 |             ux = np.dot(u_ij_m, data)  # (dim,)
 61 |             ux = np.reshape(ux, (1, self.dim))  # (1,dim)
 62 |             c_new = np.append(c_new, ux / sum_u, axis=0)   # 按列的方向添加类中心到类中心矩阵
 63 |         return c_new  # cluster_num*dim
 64 | 
 65 |     # 隶属度矩阵迭代
 66 |     def U_Iter(self, U, c, m):
 67 |         for i in range(0, self.cnum):
 68 |             for j in range(0, self.sample_num):
 69 |                 sum = 0
 70 |                 for k in range(0, self.cnum):
 71 |                     temp = (np.linalg.norm(self.data[j, :] - c[i, :]) /
 72 |                             np.linalg.norm(self.data[j, :] - c[k, :])) ** (
 73 |                                 2 / (m - 1))
 74 |                     sum = temp + sum
 75 |                 U[i, j] = 1 / sum
 76 | 
 77 |         return U
 78 | 
 79 |     # 计算目标函数值
 80 |     def J_calcu(self, data, U, c, m):
 81 |         temp1 = np.zeros(U.shape)
 82 |         for i in range(0, U.shape[0]):
 83 |             for j in range(0, U.shape[1]):
 84 |                 temp1[i, j] = (np.linalg.norm(data[j, :] - c[i, :])) ** 2 * U[i, j] ** m
 85 | 
 86 |         J = np.sum(np.sum(temp1))
 87 |         print("目标函数值:%.2f" %J)
 88 |         return J
 89 | 
 90 | def Fcm(data, n_clusters=10, iter_num=30):
 91 | 
 92 |     Fcm = FCM(data, n_clusters, iter_num)
 93 |     label =Fcm.U_Iter()
 94 | 
 95 |     return  label
 96 | 
 97 | def Cluster(method, data):
 98 |     if method == "Kmeans":
 99 |         label = Kmeans(data)
100 |     if method == "Fcm":
101 |         label = Fcm(data)
102 |     return label
103 | 
104 | 
105 | 
106 | 


--------------------------------------------------------------------------------
/OpenSA/Clustering/__pycache__/Cluster.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Clustering/__pycache__/Cluster.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Clustering/__pycache__/Cluster.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Clustering/__pycache__/Cluster.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/DataLoad/DataLoad.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | 
 13 | 
 14 | from sklearn.model_selection import train_test_split
 15 | import numpy as np
 16 | 
 17 | #随机划分数据集
 18 | def random(data, label, test_ratio=0.2, random_state=123):
 19 |     """
 20 |     :param data: shape (n_samples, n_features)
 21 |     :param label: shape (n_sample, )
 22 |     :param test_size: the ratio of test_size, default: 0.2
 23 |     :param random_state: the randomseed, default: 123
 24 |     :return: X_train :(n_samples, n_features)
 25 |              X_test: (n_samples, n_features)
 26 |              y_train: (n_sample, )
 27 |              y_test: (n_sample, )
 28 |     """
 29 | 
 30 |     X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=test_ratio, random_state=random_state)
 31 | 
 32 |     return X_train, X_test, y_train, y_test
 33 | 
 34 | #利用SPXY算法划分数据集
 35 | def spxy(data, label, test_size=0.2):
 36 |     """
 37 |     :param data: shape (n_samples, n_features)
 38 |     :param label: shape (n_sample, )
 39 |     :param test_size: the ratio of test_size, default: 0.2
 40 |     :return: X_train :(n_samples, n_features)
 41 |              X_test: (n_samples, n_features)
 42 |              y_train: (n_sample, )
 43 |              y_test: (n_sample, )
 44 |     """
 45 |     x_backup = data
 46 |     y_backup = label
 47 |     M = data.shape[0]
 48 |     N = round((1 - test_size) * M)
 49 |     samples = np.arange(M)
 50 | 
 51 |     label = (label - np.mean(label)) / np.std(label)
 52 |     D = np.zeros((M, M))
 53 |     Dy = np.zeros((M, M))
 54 | 
 55 |     for i in range(M - 1):
 56 |         xa = data[i, :]
 57 |         ya = label[i]
 58 |         for j in range((i + 1), M):
 59 |             xb = data[j, :]
 60 |             yb = label[j]
 61 |             D[i, j] = np.linalg.norm(xa - xb)
 62 |             Dy[i, j] = np.linalg.norm(ya - yb)
 63 | 
 64 |     Dmax = np.max(D)
 65 |     Dymax = np.max(Dy)
 66 |     D = D / Dmax + Dy / Dymax
 67 | 
 68 |     maxD = D.max(axis=0)
 69 |     index_row = D.argmax(axis=0)
 70 |     index_column = maxD.argmax()
 71 | 
 72 |     m = np.zeros(N)
 73 |     m[0] = index_row[index_column]
 74 |     m[1] = index_column
 75 |     m = m.astype(int)
 76 | 
 77 |     dminmax = np.zeros(N)
 78 |     dminmax[1] = D[m[0], m[1]]
 79 | 
 80 |     for i in range(2, N):
 81 |         pool = np.delete(samples, m[:i])
 82 |         dmin = np.zeros(M - i)
 83 |         for j in range(M - i):
 84 |             indexa = pool[j]
 85 |             d = np.zeros(i)
 86 |             for k in range(i):
 87 |                 indexb = m[k]
 88 |                 if indexa < indexb:
 89 |                     d[k] = D[indexa, indexb]
 90 |                 else:
 91 |                     d[k] = D[indexb, indexa]
 92 |             dmin[j] = np.min(d)
 93 |         dminmax[i] = np.max(dmin)
 94 |         index = np.argmax(dmin)
 95 |         m[i] = pool[index]
 96 | 
 97 |     m_complement = np.delete(np.arange(data.shape[0]), m)
 98 | 
 99 |     X_train = data[m, :]
100 |     y_train = y_backup[m]
101 |     X_test = data[m_complement, :]
102 |     y_test = y_backup[m_complement]
103 | 
104 |     return X_train, X_test, y_train, y_test
105 | 
106 | #利用kennard-stone算法划分数据集
107 | def ks(data, label, test_size=0.2):
108 |     """
109 |     :param data: shape (n_samples, n_features)
110 |     :param label: shape (n_sample, )
111 |     :param test_size: the ratio of test_size, default: 0.2
112 |     :return: X_train: (n_samples, n_features)
113 |              X_test: (n_samples, n_features)
114 |              y_train: (n_sample, )
115 |              y_test: (n_sample, )
116 |     """
117 |     M = data.shape[0]
118 |     N = round((1 - test_size) * M)
119 |     samples = np.arange(M)
120 | 
121 |     D = np.zeros((M, M))
122 | 
123 |     for i in range((M - 1)):
124 |         xa = data[i, :]
125 |         for j in range((i + 1), M):
126 |             xb = data[j, :]
127 |             D[i, j] = np.linalg.norm(xa - xb)
128 | 
129 |     maxD = np.max(D, axis=0)
130 |     index_row = np.argmax(D, axis=0)
131 |     index_column = np.argmax(maxD)
132 | 
133 |     m = np.zeros(N)
134 |     m[0] = np.array(index_row[index_column])
135 |     m[1] = np.array(index_column)
136 |     m = m.astype(int)
137 |     dminmax = np.zeros(N)
138 |     dminmax[1] = D[m[0], m[1]]
139 | 
140 |     for i in range(2, N):
141 |         pool = np.delete(samples, m[:i])
142 |         dmin = np.zeros((M - i))
143 |         for j in range((M - i)):
144 |             indexa = pool[j]
145 |             d = np.zeros(i)
146 |             for k in range(i):
147 |                 indexb = m[k]
148 |                 if indexa < indexb:
149 |                     d[k] = D[indexa, indexb]
150 |                 else:
151 |                     d[k] = D[indexb, indexa]
152 |             dmin[j] = np.min(d)
153 |         dminmax[i] = np.max(dmin)
154 |         index = np.argmax(dmin)
155 |         m[i] = pool[index]
156 | 
157 |     m_complement = np.delete(np.arange(data.shape[0]), m)
158 | 
159 |     X_train = data[m, :]
160 |     y_train = label[m]
161 |     X_test = data[m_complement, :]
162 |     y_test = label[m_complement]
163 | 
164 |     return X_train, X_test, y_train, y_test
165 | 
166 | # 分别使用一个回归、一个分类的公开数据集做为example
167 | def LoadNirtest(type):
168 | 
169 |     if type == "Rgs":
170 |         CDataPath1 = './/Data//Rgs//Cdata1.csv'
171 |         VDataPath1 = './/Data//Rgs//Vdata1.csv'
172 |         TDataPath1 = './/Data//Rgs//Tdata1.csv'
173 | 
174 |         Cdata1 = np.loadtxt(open(CDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
175 |         Vdata1 = np.loadtxt(open(VDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
176 |         Tdata1 = np.loadtxt(open(TDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
177 | 
178 |         Nirdata1 = np.concatenate((Cdata1, Vdata1))
179 |         Nirdata = np.concatenate((Nirdata1, Tdata1))
180 |         data = Nirdata[:, :-4]
181 |         label = Nirdata[:, -1]
182 | 
183 |     elif type == "Cls":
184 |         path = './/Data//Cls//table.csv'
185 |         Nirdata = np.loadtxt(open(path, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
186 |         data = Nirdata[:, :-1]
187 |         label = Nirdata[:, -1]
188 | 
189 |     return data, label
190 | 
191 | def SetSplit(method, data, label, test_size=0.2, randomseed=123):
192 | 
193 |     """
194 |     :param method: the method to split trainset and testset, include: random, kennard-stone(ks), spxy
195 |     :param data: shape (n_samples, n_features)
196 |     :param label: shape (n_sample, )
197 |     :param test_size: the ratio of test_size, default: 0.2
198 |     :return: X_train: (n_samples, n_features)
199 |              X_test: (n_samples, n_features)
200 |              y_train: (n_sample, )
201 |              y_test: (n_sample, )
202 |     """
203 | 
204 |     if method == "random":
205 |         X_train, X_test, y_train, y_test = random(data, label, test_size, randomseed)
206 |     elif method == "spxy":
207 |         X_train, X_test, y_train, y_test = spxy(data, label, test_size)
208 |     elif method == "ks":
209 |         X_train, X_test, y_train, y_test = ks(data, label, test_size)
210 |     else:
211 |         print("no this  method of split dataset! ")
212 | 
213 |     return X_train, X_test, y_train, y_test
214 | 


--------------------------------------------------------------------------------
/OpenSA/DataLoad/__pycache__/DataLoad.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/DataLoad/__pycache__/DataLoad.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/DataLoad/__pycache__/DataLoad.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/DataLoad/__pycache__/DataLoad.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Evaluate/RgsEvaluate.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | from sklearn.preprocessing import scale,MinMaxScaler,Normalizer,StandardScaler
13 | from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
14 | from sklearn.neural_network import MLPRegressor
15 | import numpy as np
16 | 
17 | 
18 | def ModelRgsevaluate(y_pred, y_true):
19 | 
20 |     mse = mean_squared_error(y_true,y_pred)
21 |     R2  = r2_score(y_true,y_pred)
22 |     mae = mean_absolute_error(y_true,y_pred)
23 | 
24 |     return np.sqrt(mse), R2, mae
25 | 
26 | def ModelRgsevaluatePro(y_pred, y_true, yscale):
27 | 
28 |     yscaler = yscale
29 |     y_true = yscaler.inverse_transform(y_true)
30 |     y_pred = yscaler.inverse_transform(y_pred)
31 | 
32 |     mse = mean_squared_error(y_true,y_pred)
33 |     R2  = r2_score(y_true,y_pred)
34 |     mae = mean_absolute_error(y_true, y_pred)
35 | 
36 |     return np.sqrt(mse), R2, mae


--------------------------------------------------------------------------------
/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Evaluate/__pycache__/RgsEvaluate.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Plot/readme.txt:
--------------------------------------------------------------------------------
1 | 暂不提供


--------------------------------------------------------------------------------
/OpenSA/Preprocessing/Preprocessing.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github :
  7 |     @WeChat : Fu_siry
  8 |     @License：
  9 | 
 10 | """
 11 | import numpy as np
 12 | from scipy import signal
 13 | from sklearn.linear_model import LinearRegression
 14 | from sklearn.preprocessing import MinMaxScaler, StandardScaler
 15 | from copy import deepcopy
 16 | import pandas as pd
 17 | #import pywt
 18 | 
 19 | # ref1: 湖南示范大学同学实列，并做了部分修改
 20 | # ref2: https://blog.csdn.net/qq2512446791
 21 | 
 22 | # 最大最小值归一化
 23 | def MMS(data):
 24 |     """
 25 |        :param data: raw spectrum data, shape (n_samples, n_features)
 26 |        :return: data after MinMaxScaler :(n_samples, n_features)
 27 |        """
 28 |     return MinMaxScaler().fit_transform(data)
 29 | 
 30 | 
 31 | # 标准化
 32 | def SS(data):
 33 |     """
 34 |         :param data: raw spectrum data, shape (n_samples, n_features)
 35 |        :return: data after StandScaler :(n_samples, n_features)
 36 |        """
 37 |     return StandardScaler().fit_transform(data)
 38 | 
 39 | 
 40 | # 均值中心化
 41 | def CT(data):
 42 |     """
 43 |        :param data: raw spectrum data, shape (n_samples, n_features)
 44 |        :return: data after MeanScaler :(n_samples, n_features)
 45 |        """
 46 |     for i in range(data.shape[0]):
 47 |         MEAN = np.mean(data[i])
 48 |         data[i] = data[i] - MEAN
 49 |     return data
 50 | 
 51 | 
 52 | # 标准正态变换
 53 | def SNV(data):
 54 |     """
 55 |         :param data: raw spectrum data, shape (n_samples, n_features)
 56 |        :return: data after SNV :(n_samples, n_features)
 57 |     """
 58 |     m = data.shape[0]
 59 |     n = data.shape[1]
 60 |     print(m, n)  #
 61 |     # 求标准差
 62 |     data_std = np.std(data, axis=1)  # 每条光谱的标准差
 63 |     # 求平均值
 64 |     data_average = np.mean(data, axis=1)  # 每条光谱的平均值
 65 |     # SNV计算
 66 |     data_snv = [[((data[i][j] - data_average[i]) / data_std[i]) for j in range(n)] for i in range(m)]
 67 |     return  np.array(data_snv)
 68 | 
 69 | 
 70 | 
 71 | # 移动平均平滑
 72 | def MA(data, WSZ=11):
 73 |     """
 74 |        :param data: raw spectrum data, shape (n_samples, n_features)
 75 |        :param WSZ: int
 76 |        :return: data after MA :(n_samples, n_features)
 77 |     """
 78 | 
 79 |     for i in range(data.shape[0]):
 80 |         out0 = np.convolve(data[i], np.ones(WSZ, dtype=int), 'valid') / WSZ # WSZ是窗口宽度，是奇数
 81 |         r = np.arange(1, WSZ - 1, 2)
 82 |         start = np.cumsum(data[i, :WSZ - 1])[::2] / r
 83 |         stop = (np.cumsum(data[i, :-WSZ:-1])[::2] / r)[::-1]
 84 |         data[i] = np.concatenate((start, out0, stop))
 85 |     return data
 86 | 
 87 | 
 88 | # Savitzky-Golay平滑滤波
 89 | def SG(data, w=11, p=2):
 90 |     """
 91 |        :param data: raw spectrum data, shape (n_samples, n_features)
 92 |        :param w: int
 93 |        :param p: int
 94 |        :return: data after SG :(n_samples, n_features)
 95 |     """
 96 |     return signal.savgol_filter(data, w, p)
 97 | 
 98 | 
 99 | # 一阶导数
100 | def D1(data):
101 |     """
102 |        :param data: raw spectrum data, shape (n_samples, n_features)
103 |        :return: data after First derivative :(n_samples, n_features)
104 |     """
105 |     n, p = data.shape
106 |     Di = np.ones((n, p - 1))
107 |     for i in range(n):
108 |         Di[i] = np.diff(data[i])
109 |     return Di
110 | 
111 | 
112 | # 二阶导数
113 | def D2(data):
114 |     """
115 |        :param data: raw spectrum data, shape (n_samples, n_features)
116 |        :return: data after second derivative :(n_samples, n_features)
117 |     """
118 |     data = deepcopy(data)
119 |     if isinstance(data, pd.DataFrame):
120 |         data = data.values
121 |     temp2 = (pd.DataFrame(data)).diff(axis=1)
122 |     temp3 = np.delete(temp2.values, 0, axis=1)
123 |     temp4 = (pd.DataFrame(temp3)).diff(axis=1)
124 |     spec_D2 = np.delete(temp4.values, 0, axis=1)
125 |     return spec_D2
126 | 
127 | 
128 | # 趋势校正(DT)
129 | def DT(data):
130 |     """
131 |        :param data: raw spectrum data, shape (n_samples, n_features)
132 |        :return: data after DT :(n_samples, n_features)
133 |     """
134 |     lenth = data.shape[1]
135 |     x = np.asarray(range(lenth), dtype=np.float32)
136 |     out = np.array(data)
137 |     l = LinearRegression()
138 |     for i in range(out.shape[0]):
139 |         l.fit(x.reshape(-1, 1), out[i].reshape(-1, 1))
140 |         k = l.coef_
141 |         b = l.intercept_
142 |         for j in range(out.shape[1]):
143 |             out[i][j] = out[i][j] - (j * k + b)
144 | 
145 |     return out
146 | 
147 | 
148 | # 多元散射校正
149 | def MSC(data):
150 |     """
151 |        :param data: raw spectrum data, shape (n_samples, n_features)
152 |        :return: data after MSC :(n_samples, n_features)
153 |     """
154 |     n, p = data.shape
155 |     msc = np.ones((n, p))
156 | 
157 |     for j in range(n):
158 |         mean = np.mean(data, axis=0)
159 | 
160 |     # 线性拟合
161 |     for i in range(n):
162 |         y = data[i, :]
163 |         l = LinearRegression()
164 |         l.fit(mean.reshape(-1, 1), y.reshape(-1, 1))
165 |         k = l.coef_
166 |         b = l.intercept_
167 |         msc[i, :] = (y - b) / k
168 |     return msc
169 | 
170 | # 小波变换
171 | def wave(data):
172 |     """
173 |        :param data: raw spectrum data, shape (n_samples, n_features)
174 |        :return: data after wave :(n_samples, n_features)
175 |     """
176 |     data = deepcopy(data)
177 |     if isinstance(data, pd.DataFrame):
178 |         data = data.values
179 |     def wave_(data):
180 |         w = pywt.Wavelet('db8')  # 选用Daubechies8小波
181 |         maxlev = pywt.dwt_max_level(len(data), w.dec_len)
182 |         coeffs = pywt.wavedec(data, 'db8', level=maxlev)
183 |         threshold = 0.04
184 |         for i in range(1, len(coeffs)):
185 |             coeffs[i] = pywt.threshold(coeffs[i], threshold * max(coeffs[i]))
186 |         datarec = pywt.waverec(coeffs, 'db8')
187 |         return datarec
188 | 
189 |     tmp = None
190 |     for i in range(data.shape[0]):
191 |         if (i == 0):
192 |             tmp = wave_(data[i])
193 |         else:
194 |             tmp = np.vstack((tmp, wave_(data[i])))
195 | 
196 |     return tmp
197 | 
198 | def Preprocessing(method, data):
199 | 
200 |     if method == "None":
201 |         data = data
202 |     elif method == 'MMS':
203 |         data = MMS(data)
204 |     elif method == 'SS':
205 |         data = SS(data)
206 |     elif method == 'CT':
207 |         data = CT(data)
208 |     elif method == 'SNV':
209 |         data = SNV(data)
210 |     elif method == 'MA':
211 |         data = MA(data)
212 |     elif method == 'SG':
213 |         data = SG(data)
214 |     elif method == 'MSC':
215 |         data = MSC(data)
216 |     elif method == 'D1':
217 |         data = D1(data)
218 |     elif method == 'D2':
219 |         data = D2(data)
220 |     elif method == 'DT':
221 |         data = DT(data)
222 |     elif method == 'WVAE':
223 |         data = wave(data)
224 |     else:
225 |         print("no this method of preprocessing!")
226 | 
227 |     return data


--------------------------------------------------------------------------------
/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Preprocessing/__pycache__/Preprocessing.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/CNN.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     Create on 2021-1-21
  3 |     Author：Pengyou Fu
  4 |     Describe：this for train NIRS with use 1-D Resnet model to transfer
  5 | """
  6 | 
  7 | import numpy as np
  8 | import torch
  9 | import torch.nn as nn
 10 | from torch.autograd import Variable
 11 | from torch.utils.data import Dataset
 12 | import torchvision
 13 | import torch.nn.functional as F
 14 | from sklearn.preprocessing import scale,MinMaxScaler,Normalizer,StandardScaler
 15 | import torch.optim as optim
 16 | from Regression.CnnModel import ConvNet, DeepSpectra, AlexNet
 17 | import os
 18 | from datetime import datetime
 19 | from Evaluate.RgsEvaluate import ModelRgsevaluate, ModelRgsevaluatePro
 20 | import matplotlib.pyplot  as plt
 21 | 
 22 | 
 23 | LR = 0.001
 24 | BATCH_SIZE = 16
 25 | TBATCH_SIZE = 240
 26 | 
 27 | 
 28 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 29 | 
 30 | #自定义加载数据集
 31 | class MyDataset(Dataset):
 32 |     def __init__(self,specs,labels):
 33 |         self.specs = specs
 34 |         self.labels = labels
 35 | 
 36 |     def __getitem__(self, index):
 37 |         spec,target = self.specs[index],self.labels[index]
 38 |         return spec,target
 39 | 
 40 |     def __len__(self):
 41 |         return len(self.specs)
 42 | 
 43 | 
 44 | 
 45 | ###定义是否需要标准化
 46 | def ZspPocessnew(X_train, X_test, y_train, y_test, need=True): #True:需要标准化，Flase：不需要标准化
 47 | 
 48 |     global standscale
 49 |     global yscaler
 50 | 
 51 |     if (need == True):
 52 |         standscale = StandardScaler()
 53 |         X_train_Nom = standscale.fit_transform(X_train)
 54 |         X_test_Nom = standscale.transform(X_test)
 55 | 
 56 |         #yscaler = StandardScaler()
 57 |         yscaler = MinMaxScaler()
 58 |         y_train = yscaler.fit_transform(y_train.reshape(-1, 1))
 59 |         y_test = yscaler.transform(y_test.reshape(-1, 1))
 60 | 
 61 |         X_train_Nom = X_train_Nom[:, np.newaxis, :]
 62 |         X_test_Nom = X_test_Nom[:, np.newaxis, :]
 63 | 
 64 |         ##使用loader加载测试数据
 65 |         data_train = MyDataset(X_train_Nom, y_train)
 66 |         data_test = MyDataset(X_test_Nom, y_test)
 67 |         return data_train, data_test
 68 |     elif((need == False)):
 69 |         yscaler = StandardScaler()
 70 |         # yscaler = MinMaxScaler()
 71 | 
 72 |         X_train_new = X_train[:, np.newaxis, :]  #
 73 |         X_test_new = X_test[:, np.newaxis, :]
 74 | 
 75 |         y_train = yscaler.fit_transform(y_train)
 76 |         y_test = yscaler.transform(y_test)
 77 | 
 78 |         data_train = MyDataset(X_train_new, y_train)
 79 |         ##使用loader加载测试数据
 80 |         data_test = MyDataset(X_test_new, y_test)
 81 | 
 82 |         return data_train, data_test
 83 | 
 84 | 
 85 | 
 86 | 
 87 | def CNNTrain(NetType, X_train, X_test, y_train, y_test, EPOCH):
 88 | 
 89 | 
 90 |     data_train, data_test = ZspPocessnew(X_train, X_test, y_train, y_test, need=True)
 91 |     # data_train, data_test = ZPocess(X_train, X_test, y_train, y_test)
 92 | 
 93 |     train_loader = torch.utils.data.DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)
 94 |     test_loader = torch.utils.data.DataLoader(data_test, batch_size=TBATCH_SIZE, shuffle=True)
 95 | 
 96 |     if NetType == 'ConNet':
 97 |         model = ConvNet().to(device)
 98 |     elif NetType == 'AlexNet':
 99 |         model = AlexNet().to(device)
100 |     elif NetType == 'DeepSpectra':
101 |         model = DeepSpectra().to(device)
102 | 
103 | 
104 | 
105 |     criterion = nn.MSELoss().to(device)  # 损失函数为焦损函数，多用于类别不平衡的多分类问题
106 |     optimizer = optim.Adam(model.parameters(), lr=LR)#,  weight_decay=0.001)  # 优化方式为mini-batch momentum-SGD，并采用L2正则化（权重衰减）
107 |     # # initialize the early_stopping object
108 |     scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, verbose=1, eps=1e-06,
109 |                                                            patience=20)
110 |     print("Start Training!")  # 定义遍历数据集的次数
111 |     # to track the training loss as the model trains
112 |     for epoch in range(EPOCH):
113 |         train_losses = []
114 |         model.train()  # 不训练
115 |         train_rmse = []
116 |         train_r2 = []
117 |         train_mae = []
118 |         for i, data in enumerate(train_loader):  # gives batch data, normalize x when iterate train_loader
119 |             inputs, labels = data  # 输入和标签都等于data
120 |             inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
121 |             labels = Variable(labels).type(torch.FloatTensor).to(device)  # batch y
122 |             output = model(inputs)  # cnn output
123 |             loss = criterion(output, labels)  # MSE
124 |             optimizer.zero_grad()  # clear gradients for this training step
125 |             loss.backward()  # backpropagation, compute gradients
126 |             optimizer.step()  # apply gradients
127 |             pred = output.detach().cpu().numpy()
128 |             y_true = labels.detach().cpu().numpy()
129 |             train_losses.append(loss.item())
130 |             rmse, R2, mae = ModelRgsevaluatePro(pred, y_true, yscaler)
131 |             # plotpred(pred, y_true, yscaler))
132 |             train_rmse.append(rmse)
133 |             train_r2.append(R2)
134 |             train_mae.append(mae)
135 |         avg_train_loss = np.mean(train_losses)
136 |         avgrmse = np.mean(train_rmse)
137 |         avgr2 = np.mean(train_r2)
138 |         avgmae = np.mean(train_mae)
139 |         print('Epoch:{}, TRAIN:rmse:{}, R2:{}, mae:{}'.format((epoch+1), (avgrmse), (avgr2), (avgmae)))
140 |         print('lr:{}, avg_train_loss:{}'.format((optimizer.param_groups[0]['lr']), avg_train_loss))
141 | 
142 |         with torch.no_grad():  # 无梯度
143 |             model.eval()  # 不训练
144 |             test_rmse = []
145 |             test_r2 = []
146 |             test_mae = []
147 |             for i, data in enumerate(test_loader):
148 |                 inputs, labels = data  # 输入和标签都等于data
149 |                 inputs = Variable(inputs).type(torch.FloatTensor).to(device)  # batch x
150 |                 labels = Variable(labels).type(torch.FloatTensor).to(device)  # batch y
151 |                 outputs = model(inputs)  # 输出等于进入网络后的输入
152 |                 pred = outputs.detach().cpu().numpy()
153 |                 y_true = labels.detach().cpu().numpy()
154 |                 rmse, R2, mae = ModelRgsevaluatePro(pred, y_true, yscaler)
155 |                 test_rmse.append(rmse)
156 |                 test_r2.append(R2)
157 |                 test_mae.append(mae)
158 |             avgrmse = np.mean(test_rmse)
159 |             avgr2   = np.mean(test_r2)
160 |             avgmae = np.mean(test_mae)
161 |             print('EPOCH：{}, TEST: rmse:{}, R2:{}, mae:{}'.format((epoch+1), (avgrmse), (avgr2), (avgmae)))
162 |             # 将每次测试结果实时写入acc.txt文件中
163 |             scheduler.step(rmse)
164 | 
165 |     return avgrmse, avgr2, avgmae
166 | 
167 | 
168 | 
169 | 
170 | 
171 | 
172 | 
173 | 
174 | 
175 | 
176 | 
177 | #
178 | # def CNN(X_train, X_test, y_train, y_test, BATCH_SIZE, n_epochs):
179 | #
180 | #     CNNTrain(X_train, X_test, y_train, y_test,BATCH_SIZE,n_epochs)
181 | 


--------------------------------------------------------------------------------
/OpenSA/Regression/ClassicRgs.py:
--------------------------------------------------------------------------------
 1 | 
 2 | from sklearn.cross_decomposition import PLSRegression
 3 | from sklearn.neural_network import MLPRegressor
 4 | # import hpelm
 5 | 
 6 | """
 7 |     -*- coding: utf-8 -*-
 8 |     @Time   :2022/04/12 17:10
 9 |     @Author : Pengyou FU
10 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
11 |     @github : https://github.com/FuSiry/OpenSA
12 |     @WeChat : Fu_siry
13 |     @License：Apache-2.0 license
14 | 
15 | """
16 | 
17 | from sklearn.svm import SVR
18 | from Evaluate.RgsEvaluate import ModelRgsevaluate
19 | 
20 | def Pls( X_train, X_test, y_train, y_test):
21 | 
22 | 
23 |     model = PLSRegression(n_components=8)
24 |     # fit the model
25 |     model.fit(X_train, y_train)
26 | 
27 |     # predict the values
28 |     y_pred = model.predict(X_test)
29 | 
30 |     Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)
31 | 
32 |     return Rmse, R2, Mae
33 | 
34 | 
35 | def Svregression(X_train, X_test, y_train, y_test):
36 | 
37 | 
38 |     model = SVR(C=2, gamma=1e-07, kernel='linear')
39 |     model.fit(X_train, y_train)
40 | 
41 |     # predict the values
42 |     y_pred = model.predict(X_test)
43 |     Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)
44 | 
45 |     return Rmse, R2, Mae
46 | 
47 | def Anngression(X_train, X_test, y_train, y_test):
48 | 
49 | 
50 |     model = MLPRegressor(
51 |         hidden_layer_sizes=(20, 20), activation='relu', solver='adam', alpha=0.0001, batch_size='auto',
52 |         learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=400, shuffle=True,
53 |         random_state=1, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True,
54 |         early_stopping=False, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
55 | 
56 |     model.fit(X_train, y_train)
57 | 
58 |     # predict the values
59 |     y_pred = model.predict(X_test)
60 |     Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)
61 | 
62 |     return Rmse, R2, Mae
63 | 
64 | def ELM(X_train, X_test, y_train, y_test):
65 | 
66 |     model = hpelm.ELM(X_train.shape[1], 1)
67 |     model.add_neurons(20, 'sigm')
68 | 
69 | 
70 |     model.train(X_train, y_train, 'r')
71 |     y_pred = model.predict(X_test)
72 | 
73 | 
74 |     Rmse, R2, Mae = ModelRgsevaluate(y_pred, y_test)
75 | 
76 |     return Rmse, R2, Mae


--------------------------------------------------------------------------------
/OpenSA/Regression/CnnModel.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | from collections.abc import Iterable
  5 | 
  6 | class ConvNet(nn.Module):
  7 |     def __init__(self):
  8 |         super(ConvNet,self).__init__()
  9 |         self.conv1 = nn.Sequential(
 10 |             nn.Conv1d(1, 16, kernel_size=21, padding=0),
 11 |             nn.BatchNorm1d(16),
 12 |             nn.ReLU()
 13 |         )
 14 |         self.conv2 = nn.Sequential(
 15 |             nn.Conv1d(16, 32, kernel_size=19, padding=0),
 16 |             nn.BatchNorm1d(32),
 17 |             nn.ReLU()
 18 |         )
 19 |         self.conv3 = nn.Sequential(
 20 |             nn.Conv1d(32, 64, kernel_size=17, padding=0),
 21 |             nn.BatchNorm1d(64),
 22 |             nn.ReLU()
 23 |         )
 24 |         self.fc = nn. Linear(38080,1) #8960 ,17920
 25 |         self.drop = nn.Dropout(0.2)
 26 | 
 27 |     def forward(self,out):
 28 |       out = self.conv1(out)
 29 |       out = self.conv2(out)
 30 |       out = self.conv3(out)
 31 |       out = out.view(out.size(0),-1)
 32 |       # print(out.size(1))
 33 |       out = self.fc(out)
 34 |       return out
 35 | 
 36 | 
 37 | class AlexNet(nn.Module):
 38 |     def __init__(self, num_classes=1, reduction=16):
 39 |         super(AlexNet, self).__init__()
 40 |         self.features = nn.Sequential(
 41 |             # conv1
 42 |             nn.Conv1d(1, 16, kernel_size=3, stride=1, padding=1),
 43 |             nn.BatchNorm1d(num_features=16),
 44 |             nn.ReLU(inplace=True),
 45 |             # nn.LeakyReLU(inplace=True),
 46 |             nn.MaxPool1d(kernel_size=2, stride=2),
 47 |             # conv2
 48 |             nn.Conv1d(16, 32, kernel_size=3, stride=1, padding=1),
 49 |             nn.BatchNorm1d(num_features=32),
 50 |             nn.ReLU(inplace=True),
 51 |             # nn.LeakyReLU(inplace=True),
 52 |             nn.MaxPool1d(kernel_size=2, stride=2),
 53 |             # conv3
 54 |             nn.Conv1d(32, 64, kernel_size=3, stride=1, padding=1),
 55 |             nn.ReLU(inplace=True),
 56 |             # nn.LeakyReLU(inplace=True),
 57 |             nn.MaxPool1d(kernel_size=2, stride=2),
 58 |             # conv4
 59 |             nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1),
 60 |             nn.BatchNorm1d(num_features=128),
 61 |             nn.ReLU(inplace=True),
 62 |             # nn.LeakyReLU(inplace=True),
 63 |             nn.MaxPool1d(kernel_size=2, stride=2),
 64 |             # conv5
 65 |             nn.Conv1d(128, 192, kernel_size=3, stride=1, padding=1),
 66 |             nn.BatchNorm1d(num_features=192),
 67 |             nn.ReLU(inplace=True),
 68 |             nn.MaxPool1d(kernel_size=2, stride=2),
 69 |             # SELayer(256, reduction),
 70 |             # nn.LeakyReLU(inplace=True),
 71 |         )
 72 |         self.reg = nn.Sequential(
 73 |             nn.Linear(3840, 1000),  #根据自己数据集修改
 74 |             nn.ReLU(inplace=True),
 75 |             # nn.LeakyReLU(inplace=True),
 76 |             nn.Linear(1000, 500),
 77 |             nn.ReLU(inplace=True),
 78 |             # nn.LeakyReLU(inplace=True),
 79 |             nn.Dropout(0.5),
 80 |             nn.Linear(500, num_classes),
 81 |         )
 82 | 
 83 |     def forward(self, x):
 84 |         out = self.features(x)
 85 |         out = out.flatten(start_dim=1)
 86 |         out = self.reg(out)
 87 |         return out
 88 | 
 89 | class Inception(nn.Module):
 90 |     def __init__(self,in_c,c1,c2,c3,out_C):
 91 |         super(Inception,self).__init__()
 92 |         self.p1 = nn.Sequential(
 93 |             nn.Conv1d(in_c, c1,kernel_size=1,padding=0),
 94 |             nn.Conv1d(c1, c1, kernel_size=3, padding=1)
 95 |         )
 96 |         self.p2 = nn.Sequential(
 97 |             nn.Conv1d(in_c, c2,kernel_size=1,padding=0),
 98 |             nn.Conv1d(c2, c2, kernel_size=5, padding=2)
 99 | 
100 |         )
101 |         self.p3 = nn.Sequential(
102 |             nn.MaxPool1d(kernel_size=3,stride=1,padding=1),
103 |             nn.Conv1d(in_c, c3,kernel_size=3,padding=1),
104 |         )
105 |         self.conv_linear = nn.Conv1d((c1+c2+c3), out_C, 1, 1, 0, bias=True)
106 |         self.short_cut = nn.Sequential()
107 |         if in_c != out_C:
108 |             self.short_cut = nn.Sequential(
109 |                 nn.Conv1d(in_c, out_C, 1, 1, 0, bias=False),
110 | 
111 |             )
112 |     def forward(self, x):
113 |         p1 = self.p1(x)
114 |         p2 = self.p2(x)
115 |         p3 = self.p3(x)
116 |         out =  torch.cat((p1,p2,p3),dim=1)
117 |         out += self.short_cut(x)
118 |         return out
119 | 
120 | 
121 | 
122 | 
123 | class DeepSpectra(nn.Module):
124 |     def __init__(self):
125 |         super(DeepSpectra, self).__init__()
126 |         self.conv1 = nn.Sequential(
127 |             nn.Conv1d(1, 16, kernel_size=5, stride=3, padding=0)
128 |         )
129 |         self.Inception = Inception(16, 32, 32, 32, 96)
130 |         self.fc = nn.Sequential(
131 |             nn.Linear(20640, 5000),
132 |             nn.Dropout(0.5),
133 |             nn.Linear(5000, 1)
134 |         )
135 |         self.dropout = nn.Dropout(0.1)
136 | 
137 |     def forward(self, x):
138 |         x = self.conv1(x)
139 |         x = self.Inception(x)
140 |         x = x.view(x.size(0), -1)
141 |         x = self.fc(x)
142 | 
143 |         return x
144 | 
145 | 


--------------------------------------------------------------------------------
/OpenSA/Regression/DeepRgs.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 


--------------------------------------------------------------------------------
/OpenSA/Regression/Rgs.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | from Regression.ClassicRgs import Pls, Anngression, Svregression, ELM
13 | from Regression.CNN import CNNTrain
14 | 
15 | def  QuantitativeAnalysis(model, X_train, X_test, y_train, y_test):
16 | 
17 |     if model == "Pls":
18 |         Rmse, R2, Mae = Pls(X_train, X_test, y_train, y_test)
19 |     elif model == "ANN":
20 |         Rmse, R2, Mae = Anngression(X_train, X_test, y_train, y_test)
21 |     elif model == "SVR":
22 |         Rmse, R2, Mae = Svregression(X_train, X_test, y_train, y_test)
23 |     elif model == "ELM":
24 |         Rmse, R2, Mae = ELM(X_train, X_test, y_train, y_test)
25 |     elif model == "CNN":
26 |         Rmse, R2, Mae = CNNTrain("AlexNet",X_train, X_test, y_train, y_test,  150)
27 |     else:
28 |         print("no this model of QuantitativeAnalysis")
29 | 
30 |     return Rmse, R2, Mae 


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/CNN.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/CNN.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/ClassicRgs.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/ClassicRgs.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/ClassicRgs.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/ClassicRgs.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/CnnModel.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/CnnModel.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/Rgs.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/Rgs.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Regression/__pycache__/Rgs.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Regression/__pycache__/Rgs.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/Simcalculation/SimCa.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | """
  3 |     -*- coding: utf-8 -*-
  4 |     @Time   :2022/04/12 17:10
  5 |     @Author : Pengyou FU
  6 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  7 |     @github : https://github.com/FuSiry/OpenSA
  8 |     @WeChat : Fu_siry
  9 |     @License：Apache-2.0 license
 10 | 
 11 | """
 12 | 
 13 | 
 14 | from numpy.linalg import norm
 15 | # from skimage.measure import compare_psnr, compare_ssim
 16 | # from skimage.metrics import structural_similarity as compare_ssim
 17 | # from skimage.metrics import peak_signal_noise_ratio as compare_psnr
 18 | 
 19 | def Simcalculation(type,data1, data2):
 20 |     """
 21 |        :param method: 计算相似度的方法
 22 |        :param data1: 光谱数据：格式：(1，length),高光谱图像：格式：(H, W, C)
 23 |        :param data2: 光谱数据：格式：(1，length),高光谱图像：格式：(H, W, C)
 24 |        :return: 计算原始光谱数据与目标光谱数据的相似度，float
 25 |     """
 26 | 
 27 |     if type == 'SAM':
 28 |         return SAM(data1, data2)
 29 |     elif type == 'SID':
 30 |         return SID(data1,data2)
 31 |     elif type == 'HsiSam':
 32 |         return HsiSam(data1,data2)
 33 |     elif type == 'mssim':
 34 |         return mssim(data1,data2)
 35 |     elif type == 'mpsnr':
 36 |         return mpsnr(data1,data2)
 37 |     else:
 38 |         print("no this method of Simcalculation!")
 39 | 
 40 | def SAM(x,y):
 41 |     """
 42 |        :param x_true: 光谱数据：格式：(1，length)
 43 |        :param x_pred: 光谱数据：格式：(1，length)
 44 |        :return: 计算原始光谱数据与目标光谱数据的光谱角差异
 45 |     """
 46 |     s = np.sum(np.dot(x,y))
 47 |     t1 = (norm(x)) * (norm(y))
 48 |     val = s/t1
 49 |     sam = 1.0/np.cos(val)
 50 | 
 51 |     return sam
 52 | 
 53 | # 计算SID
 54 | def SID(x,y):
 55 |     """
 56 |        :param x_true: 光谱数据：格式：(1，length)
 57 |        :param x_pred: 光谱数据：格式：(1，length)
 58 |        :return: 计算原始光谱数据与目标光谱数据的光谱角差异
 59 |        References
 60 |        :param x_true: 光谱数据：格式：(1，length)
 61 |        :param x_pred: 光谱数据：格式：(1，length)
 62 |        :return: 计算原始光谱数据与目标光谱数据的光谱角差异
 63 |        References
 64 |        ----------
 65 |        ----------
 66 |        """
 67 |     p = np.zeros_like(x,dtype=np.float)
 68 |     q = np.zeros_like(y,dtype=np.float)
 69 |     Sid = 0
 70 |     for i in range(len(x)):
 71 |         p[i] = np.around((x[i]/np.sum(x)), 8)
 72 |         q[i] = np.around((y[i]/np.sum(y)), 8)
 73 |     for j in range(len(x)):
 74 |         Sid += p[j]*np.log10(p[j]/q[j])+q[j]*np.log10(q[j]/p[j])
 75 |     return Sid
 76 | 
 77 | def mpsnr(x_true, x_pred):
 78 |     """
 79 |         :param x_true: 高光谱图像：格式：(H, W, C)
 80 |         :param x_pred: 高光谱图像：格式：(H, W, C)
 81 |         :return: 计算原始高光谱数据与重构高光谱数据的均方误差
 82 |     """
 83 |     n_bands = x_true.shape[2]
 84 |     p = [compare_psnr(x_true[:, :, k], x_pred[:, :, k], data_range=(0, 10000)) for k in range(n_bands)]
 85 |     return np.mean(p)
 86 | 
 87 | 
 88 | def HsiSam(x_true, x_pred):
 89 |     """
 90 |         :param x_true: 高光谱图像：格式：(H, W, C)
 91 |         :param x_pred: 高光谱图像：格式：(H, W, C)
 92 |         :return: 计算原始高光谱数据与重构高光谱数据的光谱角相似度
 93 |     """
 94 |     assert x_true.ndim ==3 and x_true.shape == x_pred.shape
 95 |     sam_rad = np.zeros(x_pred.shape[0, 1])
 96 |     for x in range(x_true.shape[0]):
 97 |         for y in range(x_true.shape[1]):
 98 |             tmp_pred = x_pred[x, y].ravel()
 99 |             tmp_true = x_true[x, y].ravel()
100 |             sam_rad[x, y] = np.arccos(tmp_pred / (norm(tmp_pred) * tmp_true / norm(tmp_true)))
101 |     sam_deg = sam_rad.mean() * 180 / np.pi
102 |     return sam_deg
103 | 
104 | 
105 | def mssim(x_true,x_pred):
106 |     """
107 |         :param x_true: 高光谱图像：格式：(H, W, C)
108 |         :param x_pred: 高光谱图像：格式：(H, W, C)
109 |         :return: 计算原始高光谱数据与重构高光谱数据的结构相似度
110 |     """
111 |     SSIM = compare_ssim(im1=x_true, im2=x_pred, multichannel=True)
112 |     return SSIM


--------------------------------------------------------------------------------
/OpenSA/Simcalculation/__pycache__/SimCa.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Simcalculation/__pycache__/SimCa.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/Simcalculation/__pycache__/SimCa.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/Simcalculation/__pycache__/SimCa.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Cars.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | import numpy as np
 13 | import matplotlib.pyplot as plt
 14 | from matplotlib import font_manager as fm, rcParams
 15 | from sklearn.cross_decomposition import PLSRegression
 16 | from sklearn.model_selection import KFold
 17 | from sklearn.metrics import mean_squared_error
 18 | import copy
 19 | 
 20 | # ref: https://blog.csdn.net/qq2512446791
 21 | 
 22 | def PC_Cross_Validation(X, y, pc, cv):
 23 |     '''
 24 |         x :光谱矩阵 nxm
 25 |         y :浓度阵 （化学值）
 26 |         pc:最大主成分数
 27 |         cv:交叉验证数量
 28 |     return :
 29 |         RMSECV:各主成分数对应的RMSECV
 30 |         PRESS :各主成分数对应的PRESS
 31 |         rindex:最佳主成分数
 32 |     '''
 33 |     kf = KFold(n_splits=cv)
 34 |     RMSECV = []
 35 |     for i in range(pc):
 36 |         RMSE = []
 37 |         for train_index, test_index in kf.split(X):
 38 |             x_train, x_test = X[train_index], X[test_index]
 39 |             y_train, y_test = y[train_index], y[test_index]
 40 |             pls = PLSRegression(n_components=i + 1)
 41 |             pls.fit(x_train, y_train)
 42 |             y_predict = pls.predict(x_test)
 43 |             RMSE.append(np.sqrt(mean_squared_error(y_test, y_predict)))
 44 |         RMSE_mean = np.mean(RMSE)
 45 |         RMSECV.append(RMSE_mean)
 46 |     rindex = np.argmin(RMSECV)
 47 |     return RMSECV, rindex
 48 | 
 49 | def Cross_Validation(X, y, pc, cv):
 50 |     '''
 51 |      x :光谱矩阵 nxm
 52 |      y :浓度阵 （化学值）
 53 |      pc:最大主成分数
 54 |      cv:交叉验证数量
 55 |      return :
 56 |             RMSECV:各主成分数对应的RMSECV
 57 |     '''
 58 |     kf = KFold(n_splits=cv)
 59 |     RMSE = []
 60 |     for train_index, test_index in kf.split(X):
 61 |         x_train, x_test = X[train_index], X[test_index]
 62 |         y_train, y_test = y[train_index], y[test_index]
 63 |         pls = PLSRegression(n_components=pc)
 64 |         pls.fit(x_train, y_train)
 65 |         y_predict = pls.predict(x_test)
 66 |         RMSE.append(np.sqrt(mean_squared_error(y_test, y_predict)))
 67 |     RMSE_mean = np.mean(RMSE)
 68 |     return RMSE_mean
 69 | 
 70 | def CARS_Cloud(X, y, N=50, f=20, cv=10):
 71 |     p = 0.8
 72 |     m, n = X.shape
 73 |     u = np.power((n/2), (1/(N-1)))
 74 |     k = (1/(N-1)) * np.log(n/2)
 75 |     cal_num = np.round(m * p)
 76 |     # val_num = m - cal_num
 77 |     b2 = np.arange(n)
 78 |     x = copy.deepcopy(X)
 79 |     D = np.vstack((np.array(b2).reshape(1, -1), X))
 80 |     WaveData = []
 81 |     # Coeff = []
 82 |     WaveNum =[]
 83 |     RMSECV = []
 84 |     r = []
 85 |     for i in range(1, N+1):
 86 |         r.append(u*np.exp(-1*k*i))
 87 |         wave_num = int(np.round(r[i-1]*n))
 88 |         WaveNum = np.hstack((WaveNum, wave_num))
 89 |         cal_index = np.random.choice    \
 90 |             (np.arange(m), size=int(cal_num), replace=False)
 91 |         wave_index = b2[:wave_num].reshape(1, -1)[0]
 92 |         xcal = x[np.ix_(list(cal_index), list(wave_index))]
 93 |         #xcal = xcal[:,wave_index].reshape(-1,wave_num)
 94 |         ycal = y[cal_index]
 95 |         x = x[:, wave_index]
 96 |         D = D[:, wave_index]
 97 |         d = D[0, :].reshape(1,-1)
 98 |         wnum = n - wave_num
 99 |         if wnum > 0:
100 |             d = np.hstack((d, np.full((1, wnum), -1)))
101 |         if len(WaveData) == 0:
102 |             WaveData = d
103 |         else:
104 |             WaveData  = np.vstack((WaveData, d.reshape(1, -1)))
105 | 
106 |         if wave_num < f:
107 |             f = wave_num
108 | 
109 |         pls = PLSRegression(n_components=f)
110 |         pls.fit(xcal, ycal)
111 |         beta = pls.coef_
112 |         b = np.abs(beta)
113 |         b2 = np.argsort(-b, axis=0)
114 |         coef = copy.deepcopy(beta)
115 |         coeff = coef[b2, :].reshape(len(b2), -1)
116 |         rmsecv, rindex = PC_Cross_Validation(xcal, ycal, f, cv)
117 |         RMSECV.append(Cross_Validation(xcal, ycal, rindex+1, cv))
118 | 
119 |     WAVE = []
120 | 
121 |     for i in range(WaveData.shape[0]):
122 |         wd = WaveData[i, :]
123 |         # cd = CoeffData[i, :]
124 |         WD = np.ones((len(wd)))
125 |         # CO = np.ones((len(wd)))
126 |         for j in range(len(wd)):
127 |             ind = np.where(wd == j)
128 |             if len(ind[0]) == 0:
129 |                 WD[j] = 0
130 |                 # CO[j] = 0
131 |             else:
132 |                 WD[j] = wd[ind[0]]
133 |                 # CO[j] = cd[ind[0]]
134 |         if len(WAVE) == 0:
135 |             WAVE = copy.deepcopy(WD)
136 |         else:
137 |             WAVE = np.vstack((WAVE, WD.reshape(1, -1)))
138 | 
139 | 
140 |     MinIndex = np.argmin(RMSECV)
141 |     Optimal = WAVE[MinIndex, :]
142 |     boindex = np.where(Optimal != 0)
143 |     OptWave = boindex[0]
144 | 
145 | 
146 |     return OptWave


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/GA.py:
--------------------------------------------------------------------------------
  1 | from deap import base
  2 | from deap import creator
  3 | from deap import tools
  4 | import pandas as pd
  5 | import numpy as np
  6 | import random
  7 | from sklearn import model_selection
  8 | from sklearn.cross_decomposition import PLSRegression
  9 | 
 10 | creator.create('FitnessMax', base.Fitness, weights=(1.0,))  # for minimization, set weights as (-1.0,)
 11 | creator.create('Individual', list, fitness=creator.FitnessMax)
 12 | 
 13 | 
 14 | 
 15 | def GA(X, y, number_of_generation=10):
 16 | 
 17 |     scaled_x_train = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
 18 |     scaled_y_train = (y - y.mean()) / y.std(ddof=1)
 19 | 
 20 | 
 21 |     toolbox = base.Toolbox()
 22 |     min_boundary = np.zeros(X.shape[1])
 23 |     max_boundary = np.ones(X.shape[1]) * 1.0
 24 | 
 25 |     # 基础参数
 26 | 
 27 |     probability_of_crossover = 0.5
 28 |     probability_of_mutation = 0.2
 29 |     threshold_of_variable_selection = 0.5
 30 | 
 31 |     def create_ind_uniform(min_boundary, max_boundary):
 32 |         index = []
 33 |         for min, max in zip(min_boundary, max_boundary):
 34 |             index.append(random.uniform(min, max))
 35 |         return index
 36 | 
 37 |     # individual 个体
 38 |     # population 种群
 39 |     toolbox.register('create_ind', create_ind_uniform, min_boundary, max_boundary)
 40 |     toolbox.register('individual', tools.initIterate, creator.Individual, toolbox.create_ind)
 41 |     toolbox.register('population', tools.initRepeat, list, toolbox.individual)
 42 | 
 43 | 
 44 | 
 45 |     def evalOneMax(individual):
 46 |         individual_array = np.array(individual)
 47 |         selected_x_variable_numbers = np.where(individual_array > threshold_of_variable_selection)[0]
 48 |         selected_scaled_x_train = scaled_x_train[:, selected_x_variable_numbers]
 49 |         max_number_of_components = 10
 50 |         if len(selected_x_variable_numbers):
 51 |             # cross-validation
 52 |             pls_components = np.arange(1, min(np.linalg.matrix_rank(selected_scaled_x_train) + 1,
 53 |                                               max_number_of_components + 1), 1)
 54 |             r2_cv_all = []
 55 |             for pls_component in pls_components:
 56 |                 model_in_cv = PLSRegression(n_components=pls_component)
 57 |                 estimated_y_train_in_cv = np.ndarray.flatten(
 58 |                     model_selection.cross_val_predict(model_in_cv, selected_scaled_x_train, scaled_y_train,
 59 |                                                       cv=5))
 60 |                 estimated_y_train_in_cv = estimated_y_train_in_cv * y.std(ddof=1) + y.mean()
 61 |                 r2_cv_all.append(
 62 |                     1 - sum((y - estimated_y_train_in_cv) ** 2) / sum((y - y.mean()) ** 2))
 63 |             value = [np.max(r2_cv_all)]
 64 |         return value
 65 | 
 66 |     toolbox.register('evaluate', evalOneMax)
 67 |     # 加入交叉变换
 68 |     toolbox.register('mate', tools.cxTwoPoint)
 69 |     # 设置突变几率
 70 |     toolbox.register('mutate', tools.mutFlipBit, indpb=0.05)
 71 |     # 挑选个体
 72 |     toolbox.register('select', tools.selTournament, tournsize=3)
 73 |     # 种群
 74 |     random.seed()
 75 |     pop = toolbox.population(n=len(y))
 76 | 
 77 | 
 78 |     for generation in range(number_of_generation):
 79 |         print('-- Generation {0} --'.format(generation + 1))
 80 | 
 81 |         offspring = toolbox.select(pop, len(pop))
 82 |         offspring = list(map(toolbox.clone, offspring))
 83 | 
 84 |         for child1, child2 in zip(offspring[::2], offspring[1::2]):
 85 |             if random.random() < probability_of_crossover:
 86 |                 toolbox.mate(child1, child2)
 87 |                 del child1.fitness.values
 88 |                 del child2.fitness.values
 89 | 
 90 |         for mutant in offspring:
 91 |             if random.random() < probability_of_mutation:
 92 |                 toolbox.mutate(mutant)
 93 |                 del mutant.fitness.values
 94 | 
 95 |         invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
 96 |         fitnesses = map(toolbox.evaluate, invalid_ind)
 97 |         for ind, fit in zip(invalid_ind, fitnesses):
 98 |             ind.fitness.values = fit
 99 |         # 选出来的个体(描述符)
100 |         print('  Evaluated %i individuals' % len(invalid_ind))
101 | 
102 |         pop[:] = offspring
103 |         fits = [ind.fitness.values[0] for ind in pop]
104 | 
105 |         length = len(pop)
106 |         mean = sum(fits) / length
107 |         sum2 = sum(x * x for x in fits)
108 |         std = abs(sum2 / length - mean ** 2) ** 0.5
109 | 
110 |         print('  Min %s' % min(fits))
111 |         print('  Max %s' % max(fits))
112 | 
113 |     best_individual = tools.selBest(pop, 1)[0]
114 |     best_individual_array = np.array(best_individual)
115 |     selected_x_variable_numbers = np.where(best_individual_array > threshold_of_variable_selection)[0]
116 | 
117 |     return selected_x_variable_numbers
118 | 
119 | if __name__ == '__main__':
120 |     pass


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Lar.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | 
13 | from sklearn import linear_model
14 | import numpy as np
15 | 
16 | def Lar(X, y, nums=40):
17 |     '''
18 |            X : 预测变量矩阵
19 |            y ：标签
20 |            nums : 选择的特征点的数目，默认为40
21 |            return ：选择变量集的索引
22 |     '''
23 |     Lars = linear_model.Lars()
24 |     Lars.fit(X, y)
25 |     corflist = np.abs(Lars.coef_)
26 | 
27 |     corf = np.asarray(corflist)
28 |     SpectrumList = corf.argsort()[-1:-(nums+1):-1]
29 |     SpectrumList = np.sort(SpectrumList)
30 | 
31 |     return SpectrumList


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Pca.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | from sklearn.decomposition import PCA
13 | 
14 | def Pca(X, nums=20):
15 |     """
16 |        :param X: raw spectrum data, shape (n_samples, n_features)
17 |        :param nums: Number of principal components retained
18 |        :return: X_reduction：Spectral data after dimensionality reduction
19 |     """
20 |     pca = PCA(n_components=nums)  # 保留的特征数码
21 |     pca.fit(X)
22 |     X_reduction = pca.transform(X)
23 | 
24 |     return X_reduction
25 | 


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Spa.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | 
 13 | import pandas as pd
 14 | import numpy as np
 15 | from scipy.linalg import qr, inv, pinv
 16 | import scipy.stats
 17 | import scipy.io as scio
 18 | # from progress.bar import Bar
 19 | from matplotlib import pyplot as plt
 20 | 
 21 | # ref: https://blog.csdn.net/qq2512446791
 22 | 
 23 | class SPA:
 24 | 
 25 |     def _projections_qr(self, X, k, M):
 26 |         '''
 27 |         X : 预测变量矩阵
 28 |         K ：投影操作的初始列的索引
 29 |         M : 结果包含的变量个数
 30 |         return ：由投影操作生成的变量集的索引
 31 |         '''
 32 | 
 33 |         X_projected = X.copy()
 34 | 
 35 |         # 计算列向量的平方和
 36 |         norms = np.sum((X ** 2), axis=0)
 37 |         # 找到norms中数值最大列的平方和
 38 |         norm_max = np.amax(norms)
 39 | 
 40 |         # 缩放第K列 使其成为“最大的”列
 41 |         X_projected[:, k] = X_projected[:, k] * 2 * norm_max / norms[k]
 42 | 
 43 |         # 矩阵分割 ，order 为列交换索引
 44 |         _, __, order = qr(X_projected, 0, pivoting=True)
 45 | 
 46 |         return order[:M].T
 47 | 
 48 |     def _validation(self, Xcal, ycal, var_sel, Xval=None, yval=None):
 49 |         '''
 50 |         [yhat,e] = validation(Xcal,var_sel,ycal,Xval,yval) -->  使用单独的验证集进行验证
 51 |         [yhat,e] = validation(Xcal,ycalvar_sel) --> 交叉验证
 52 |         '''
 53 |         N = Xcal.shape[0]  # N 测试集的个数
 54 |         if Xval is None:  # 判断是否使用验证集
 55 |             NV = 0
 56 |         else:
 57 |             NV = Xval.shape[0]  # NV 验证集的个数
 58 | 
 59 |         yhat = e = None
 60 | 
 61 |         # 使用单独的验证集进行验证
 62 |         if NV > 0:
 63 |             Xcal_ones = np.hstack(
 64 |                 [np.ones((N, 1)), Xcal[:, var_sel].reshape(N, -1)])
 65 | 
 66 |             # 对偏移量进行多元线性回归
 67 |             b = np.linalg.lstsq(Xcal_ones, ycal, rcond=None)[0]
 68 |             # 对验证集进行预测
 69 |             np_ones = np.ones((NV, 1))
 70 |             Xval_ = Xval[:, var_sel]
 71 |             X = np.hstack([np.ones((NV, 1)), Xval[:, var_sel]])
 72 |             yhat = X.dot(b)
 73 |             # 计算误差
 74 |             e = yval - yhat
 75 |         else:
 76 |             # 为yhat 设置适当大小
 77 |             yhat = np.zeros((N, 1))
 78 |             for i in range(N):
 79 |                 # 从测试集中 去除掉第 i 项
 80 |                 cal = np.hstack([np.arange(i), np.arange(i + 1, N)])
 81 |                 X = Xcal[cal, :][:, var_sel.astype(np.int)]
 82 |                 y = ycal[cal]
 83 |                 xtest = Xcal[i, var_sel]
 84 |                 # ytest = ycal[i]
 85 |                 X_ones = np.hstack([np.ones((N - 1, 1)), X.reshape(N - 1, -1)])
 86 |                 # 对偏移量进行多元线性回归
 87 |                 b = np.linalg.lstsq(X_ones, y, rcond=None)[0]
 88 |                 # 对验证集进行预测
 89 |                 yhat[i] = np.hstack([np.ones(1), xtest]).dot(b)
 90 |             # 计算误差
 91 |             e = ycal - yhat
 92 | 
 93 |         return yhat, e
 94 | 
 95 |     def spa(self, Xcal, ycal, m_min=1, m_max=None, Xval=None, yval=None, autoscaling=1):
 96 |         '''
 97 |          [var_sel,var_sel_phase2] = spa(Xcal,ycal,m_min,m_max,Xval,yval,autoscaling) --> 使用单独的验证集进行验证
 98 |          [var_sel,var_sel_phase2] = spa(Xcal,ycal,m_min,m_max,autoscaling) --> 交叉验证
 99 | 
100 |          如果 m_min 为空时， 默认 m_min = 1
101 |          如果 m_max 为空时：
102 |              1. 当使用单独的验证集进行验证时， m_max = min(N-1, K)
103 |              2. 当使用交叉验证时，m_max = min(N-2, K)
104 | 
105 |          autoscaling : 是否使用自动刻度 yes = 1，no = 0, 默认为 1
106 | 
107 |          '''
108 | 
109 |         assert (autoscaling == 0 or autoscaling == 1), "请选择是否使用自动计算"
110 | 
111 |         N, K = Xcal.shape
112 | 
113 |         if m_max is None:
114 |             if Xval is None:
115 |                 m_max = min(N - 1, K)
116 |             else:
117 |                 m_max = min(N - 2, K)
118 | 
119 |             assert (m_max < min(N - 1, K)), "m_max 参数异常"
120 | 
121 |         # 第一步： 对测试集进行投影操作
122 | 
123 |         #
124 | 
125 |         normalization_factor = None
126 |         if autoscaling == 1:
127 |             normalization_factor = np.std(
128 |                 Xcal, ddof=1, axis=0).reshape(1, -1)[0]
129 |         else:
130 |             normalization_factor = np.ones((1, K))[0]
131 | 
132 |         Xcaln = np.empty((N, K))
133 |         for k in range(K):
134 |             x = Xcal[:, k]
135 |             Xcaln[:, k] = (x - np.mean(x)) / normalization_factor[k]
136 | 
137 |         SEL = np.zeros((m_max, K))
138 | 
139 |         # 进度条
140 |         # with Bar('Projections :', max=K) as bar:
141 |         for k in range(K):
142 |             SEL[:, k] = self._projections_qr(Xcaln, k, m_max)
143 |         #        bar.next()
144 | 
145 |         # 第二步： 进行评估
146 | 
147 |         PRESS = float('inf') * np.ones((m_max + 1, K))
148 | 
149 |         # with Bar('Evaluation of variable subsets :', max=(K) * (m_max - m_min + 1)) as bar:
150 |         for k in range(K):
151 |             for m in range(m_min, m_max + 1):
152 |                 var_sel = SEL[:m, k].astype(np.int)
153 |                 _, e = self._validation(Xcal, ycal, var_sel, Xval, yval)
154 |                 PRESS[m, k] = np.conj(e).T.dot(e)
155 | 
156 |         #            bar.next()
157 | 
158 |         PRESSmin = np.min(PRESS, axis=0)
159 |         m_sel = np.argmin(PRESS, axis=0)
160 |         k_sel = np.argmin(PRESSmin)
161 | 
162 |         # 第 k_sel波段为初始波段时最佳，波段数目为 m_sel（k_sel）
163 |         var_sel_phase2 = SEL[:m_sel[k_sel], k_sel].astype(np.int)
164 | 
165 |         # 最后消去变量
166 | 
167 |         # 第 3.1 步 计算相关指数
168 |         Xcal2 = np.hstack([np.ones((N, 1)), Xcal[:, var_sel_phase2]])
169 |         b = np.linalg.lstsq(Xcal2, ycal, rcond=None)[0]
170 |         std_deviation = np.std(Xcal2, ddof=1, axis=0)
171 | 
172 |         relev = np.abs(b * std_deviation.T)
173 |         relev = relev[1:]
174 | 
175 |         index_increasing_relev = np.argsort(relev, axis=0)
176 |         index_decreasing_relev = index_increasing_relev[::-1].reshape(1, -1)[0]
177 | 
178 |         PRESS_scree = np.empty(len(var_sel_phase2))
179 |         yhat = e = None
180 |         for i in range(len(var_sel_phase2)):
181 |             var_sel = var_sel_phase2[index_decreasing_relev[:i + 1]]
182 |             _, e = self._validation(Xcal, ycal, var_sel, Xval, yval)
183 | 
184 |             PRESS_scree[i] = np.conj(e).T.dot(e)
185 | 
186 |         RMSEP_scree = np.sqrt(PRESS_scree / len(e))
187 | 
188 |         # 第 3.3： F-test 验证
189 |         PRESS_scree_min = np.min(PRESS_scree)
190 |         alpha = 0.25
191 |         dof = len(e)
192 |         fcrit = scipy.stats.f.ppf(1 - alpha, dof, dof)
193 |         PRESS_crit = PRESS_scree_min * fcrit
194 | 
195 |         # 找到不明显比 PRESS_scree_min 大的最小变量
196 | 
197 |         i_crit = np.min(np.nonzero(PRESS_scree < PRESS_crit))
198 |         i_crit = max(m_min, i_crit)
199 | 
200 |         var_sel = var_sel_phase2[index_decreasing_relev[:i_crit]]
201 | 
202 |         # print("var_sel")
203 |         # print(var_sel)
204 | 
205 |         return var_sel
206 | 
207 |     def __repr__(self):
208 |         return "SPA()"
209 | 


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/Uve.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | 
13 | 
14 | 
15 | from sklearn.cross_decomposition import PLSRegression
16 | from sklearn.linear_model import LinearRegression
17 | from sklearn.metrics import mean_squared_error
18 | from sklearn.model_selection import ShuffleSplit
19 | from sklearn.model_selection import cross_val_predict
20 | from sklearn.model_selection import cross_val_score
21 | from sklearn.utils import shuffle
22 | from numpy.linalg import matrix_rank as rank
23 | import numpy as np
24 | 
25 | class UVE:
26 |     def __init__(self, x, y, ncomp=1, nrep=500, testSize=0.2):
27 | 
28 |         '''
29 |         X : 预测变量矩阵
30 |         y ：标签
31 |         ncomp : 结果包含的变量个数
32 |         testSize: PLS中划分的数据集
33 |         return ：波长筛选后的光谱数据
34 |         '''
35 | 
36 |         self.x = x
37 |         self.y = y
38 |         # The number of latent components should not be larger than any dimension size of independent matrix
39 |         self.ncomp = min([ncomp, rank(x)])
40 |         self.nrep = nrep
41 |         self.testSize = testSize
42 |         self.criteria = None
43 | 
44 |         self.featureIndex = None
45 |         self.featureR2 = np.full(self.x.shape[1], np.nan)
46 |         self.selFeature = None
47 | 
48 |     def calcCriteria(self):
49 |         PLSCoef = np.zeros((self.nrep, self.x.shape[1]))
50 |         ss = ShuffleSplit(n_splits=self.nrep, test_size=self.testSize)
51 |         step = 0
52 |         for train, test in ss.split(self.x, self.y):
53 |             xtrain = self.x[train, :]
54 |             ytrain = self.y[train]
55 |             plsModel = PLSRegression(min([self.ncomp, rank(xtrain)]))
56 |             plsModel.fit(xtrain, ytrain)
57 |             PLSCoef[step, :] = plsModel.coef_.T
58 |             step += 1
59 |         meanCoef = np.mean(PLSCoef, axis=0)
60 |         stdCoef = np.std(PLSCoef, axis=0)
61 |         self.criteria = meanCoef / stdCoef
62 | 
63 |     def evalCriteria(self, cv=3):
64 |         self.featureIndex = np.argsort(-np.abs(self.criteria))
65 |         for i in range(self.x.shape[1]):
66 |             xi = self.x[:, self.featureIndex[:i + 1]]
67 |             if i<self.ncomp:
68 |                 regModel = LinearRegression()
69 |             else:
70 |                 regModel = PLSRegression(min([self.ncomp, rank(xi)]))
71 | 
72 |             cvScore = cross_val_score(regModel, xi, self.y, cv=cv)
73 |             self.featureR2[i] = np.mean(cvScore)
74 | 
75 |     def cutFeature(self, *args):
76 |         cuti = np.argmax(self.featureR2)
77 |         self.selFeature = self.featureIndex[:cuti+1]
78 |         if len(args) != 0:
79 |             returnx = list(args)
80 |             i = 0
81 |             for argi in args:
82 |                 if argi.shape[1] == self.x.shape[1]:
83 |                     returnx[i] = argi[:, self.selFeature]
84 |                 i += 1
85 |         return returnx


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/WaveSelcet.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     -*- coding: utf-8 -*-
 3 |     @Time   :2022/04/12 17:10
 4 |     @Author : Pengyou FU
 5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 6 |     @github : https://github.com/FuSiry/OpenSA
 7 |     @WeChat : Fu_siry
 8 |     @License：Apache-2.0 license
 9 | 
10 | """
11 | 
12 | from WaveSelect.Lar import Lar
13 | from WaveSelect.Spa import SPA
14 | from WaveSelect.Uve import UVE
15 | from WaveSelect.Cars import CARS_Cloud
16 | from WaveSelect.Pca import Pca
17 | from WaveSelect.GA import GA
18 | from sklearn.model_selection import train_test_split
19 | 
20 | def SpctrumFeatureSelcet(method, X, y):
21 |     """
22 |        :param method: 波长筛选/降维的方法，包括：Cars, Lars, Uve, Spa, Pca
23 |        :param X: 光谱数据, shape (n_samples, n_features)
24 |        :param y: 光谱数据对应标签：格式：(n_samples，)
25 |        :return: X_Feature： 波长筛选/降维后的数据, shape (n_samples, n_features)
26 |                 y：光谱数据对应的标签, (n_samples，)
27 |     """
28 |     if method == "None":
29 |         X_Feature = X
30 |     elif method== "Cars":
31 |         Featuresecletidx = CARS_Cloud(X, y)
32 |         X_Feature = X[:, Featuresecletidx]
33 |     elif method == "Lars":
34 |         Featuresecletidx = Lar(X, y)
35 |         X_Feature = X[:, Featuresecletidx]
36 |     elif method == "Uve":
37 |         Uve = UVE(X, y, 7)
38 |         Uve.calcCriteria()
39 |         Uve.evalCriteria(cv=5)
40 |         Featuresecletidx = Uve.cutFeature(X)
41 |         X_Feature = Featuresecletidx[0]
42 |     elif method == "Spa":
43 |         Xcal, Xval, ycal, yval = train_test_split(X, y, test_size=0.2)
44 |         Featuresecletidx = SPA().spa(
45 |             Xcal= Xcal, ycal=ycal, m_min=8, m_max=50, Xval=Xval, yval=yval, autoscaling=1)
46 |         X_Feature = X[:, Featuresecletidx]
47 |     elif method == "GA":
48 |         Featuresecletidx = GA(X, y, 10)
49 |         X_Feature = X[:, Featuresecletidx]
50 |     elif method == "Pca":
51 |         X_Feature = Pca(X)
52 |     else:
53 |         print("no this method of SpctrumFeatureSelcet!")
54 | 
55 |     return X_Feature, y


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Cars.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Cars.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Cars.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Cars.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/GA.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/GA.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Lar.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Lar.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Lar.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Lar.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Pca.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Pca.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Pca.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Pca.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Spa.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Spa.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Spa.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Spa.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Uve.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Uve.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/Uve.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/Uve.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/WaveSelcet.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/WaveSelcet.cpython-38.pyc


--------------------------------------------------------------------------------
/OpenSA/WaveSelect/__pycache__/WaveSelcet.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FuSiry/OpenSA/05ba2cd523ad76b365af829e19f4f683621de149/OpenSA/WaveSelect/__pycache__/WaveSelcet.cpython-39.pyc


--------------------------------------------------------------------------------
/OpenSA/example.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     -*- coding: utf-8 -*-
  3 |     @Time   :2022/04/12 17:10
  4 |     @Author : Pengyou FU
  5 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
  6 |     @github : https://github.com/FuSiry/OpenSA
  7 |     @WeChat : Fu_siry
  8 |     @License：Apache-2.0 license
  9 | 
 10 | """
 11 | 
 12 | 
 13 | import numpy as np
 14 | from DataLoad.DataLoad import SetSplit, LoadNirtest
 15 | from Preprocessing.Preprocessing import Preprocessing
 16 | from WaveSelect.WaveSelcet import SpctrumFeatureSelcet
 17 | # from Plot.SpectrumPlot import plotspc
 18 | # from Plot.SpectrumPlot import ClusterPlot
 19 | from Simcalculation.SimCa import Simcalculation
 20 | from Clustering.Cluster import Cluster
 21 | from Regression.Rgs import QuantitativeAnalysis
 22 | from Classification.Cls import QualitativeAnalysis
 23 | 
 24 | #光谱聚类分析
 25 | def SpectralClusterAnalysis(data, label, ProcessMethods, FslecetedMethods, ClusterMethods):
 26 |     """
 27 |      :param data: shape (n_samples, n_features), 光谱数据
 28 |      :param label: shape (n_samples, ), 光谱数据对应的标签(理化性质)
 29 |      :param ProcessMethods: string, 预处理的方法, 具体可以看预处理模块
 30 |      :param FslecetedMethods: string, 光谱波长筛选的方法, 提供UVE、SPA、Lars、Cars、Pca
 31 |      :param ClusterMethods : string, 聚类的方法，提供Kmeans聚类、FCM聚类
 32 |      :return: Clusterlabels: 返回的隶属矩阵
 33 | 
 34 |      """
 35 |     ProcesedData = Preprocessing(ProcessMethods, data)
 36 |     FeatrueData, _ = SpctrumFeatureSelcet(FslecetedMethods, ProcesedData, label)
 37 |     Clusterlabels = Cluster(ClusterMethods, FeatrueData)
 38 |     #ClusterPlot(data, Clusterlabels)
 39 |     return Clusterlabels
 40 | 
 41 | # 光谱定量分析
 42 | def SpectralQuantitativeAnalysis(data, label, ProcessMethods, FslecetedMethods, SetSplitMethods, model):
 43 | 
 44 |     """
 45 |     :param data: shape (n_samples, n_features), 光谱数据
 46 |     :param label: shape (n_samples, ), 光谱数据对应的标签(理化性质)
 47 |     :param ProcessMethods: string, 预处理的方法, 具体可以看预处理模块
 48 |     :param FslecetedMethods: string, 光谱波长筛选的方法, 提供UVE、SPA、Lars、Cars、Pca
 49 |     :param SetSplitMethods : string, 划分数据集的方法, 提供随机划分、KS划分、SPXY划分
 50 |     :param model : string, 定量分析模型, 包括ANN、PLS、SVR、ELM、CNN、SAE等，后续会不断补充完整
 51 |     :return: Rmse: float, Rmse回归误差评估指标
 52 |              R2: float, 回归拟合,
 53 |              Mae: float, Mae回归误差评估指标
 54 |     """
 55 |     ProcesedData = Preprocessing(ProcessMethods, data)
 56 |     FeatrueData, labels = SpctrumFeatureSelcet(FslecetedMethods, ProcesedData, label)
 57 |     X_train, X_test, y_train, y_test = SetSplit(SetSplitMethods, FeatrueData, labels, test_size=0.2, randomseed=123)
 58 |     Rmse, R2, Mae = QuantitativeAnalysis(model, X_train, X_test, y_train, y_test )
 59 |     return Rmse, R2, Mae
 60 | 
 61 | # 光谱定性分析
 62 | def SpectralQualitativeAnalysis(data, label, ProcessMethods, FslecetedMethods, SetSplitMethods, model):
 63 | 
 64 |     """
 65 |     :param data: shape (n_samples, n_features), 光谱数据
 66 |     :param label: shape (n_samples, ), 光谱数据对应的标签(理化性质)
 67 |     :param ProcessMethods: string, 预处理的方法, 具体可以看预处理模块
 68 |     :param FslecetedMethods: string, 光谱波长筛选的方法, 提供UVE、SPA、Lars、Cars、Pca
 69 |     :param SetSplitMethods : string, 划分数据集的方法, 提供随机划分、KS划分、SPXY划分
 70 |     :param model : string, 定性分析模型, 包括ANN、PLS_DA、SVM、RF、CNN、SAE等，后续会不断补充完整
 71 |     :return: acc： float, 分类准确率
 72 |     """
 73 | 
 74 |     ProcesedData = Preprocessing(ProcessMethods, data)
 75 |     FeatrueData, labels = SpctrumFeatureSelcet(FslecetedMethods, ProcesedData, label)
 76 |     X_train, X_test, y_train, y_test = SetSplit(SetSplitMethods, FeatrueData, labels, test_size=0.2, randomseed=123)
 77 |     acc = QualitativeAnalysis(model, X_train, X_test, y_train, y_test )
 78 | 
 79 |     return acc
 80 | 
 81 | 
 82 | 
 83 | 
 84 | 
 85 | 
 86 | 
 87 | 
 88 | 
 89 | if __name__ == '__main__':
 90 | 
 91 |     # ## 载入原始数据并可视化
 92 |     # data1, label1 = LoadNirtest('Cls')
 93 |     # #plotspc(data1, "raw specturm")
 94 |     # # 光谱定性分析演示
 95 |     # # 示意1: 预处理算法:MSC , 波长筛选算法: 不使用, 全波长建模, 数据集划分:随机划分, 定性分析模型: RF
 96 |     # acc = SpectralQualitativeAnalysis(data1, label1, "MSC", "Lars", "random", "PLS_DA")
 97 |     # print("The acc:{} of result!".format(acc))
 98 | 
 99 | 
100 |     ## 载入原始数据并可视化
101 |     data2, label2 = LoadNirtest('Rgs')
102 |     #plotspc(data2, "raw specturm")
103 |     # 光谱定量分析演示
104 |     # 示意1: 预处理算法:MSC , 波长筛选算法: Uve, 数据集划分:KS, 定性分量模型: SVR
105 |     RMSE, R2, MAE = SpectralQuantitativeAnalysis(data2, label2, "None", "None", "random", "CNN")
106 |     print("The RMSE:{} R2:{}, MAE:{} of result!".format(RMSE, R2, MAE))
107 | 
108 | 
109 | 
110 |     # ## 光谱预处理并可视化
111 |     # method = "SNV"
112 |     # Preprocessingdata = Preprocessing(method, data)
113 |     # plotspc(Preprocessingdata, method)
114 |     # ## 波长特征筛选并可视化
115 |     # method = 'Uve'
116 |     # SpectruSelected, y = SpctrumFeatureSelcet(method, data, label)
117 |     # print("全光谱数据维度")
118 |     # print(len(data[0,:]))
119 |     # print("经过{}波长筛选后的数据维度".format(method))
120 |     # print(len(SpectruSelected[0, :]))
121 |     # # #划分数据集
122 |     # X_train, X_test, y_train, y_test = SetSplit('spxy', SpectruSelected, y, 0.2, 123)
123 | 
124 | 
125 | 
126 | 
127 | 


--------------------------------------------------------------------------------
/OpenSA/opt/readme.txt:
--------------------------------------------------------------------------------
1 | 暂不提供


--------------------------------------------------------------------------------
/OpenSAV2:
--------------------------------------------------------------------------------
1 | #完善日志 20220521
2 | 给penSA算法完善
3 | 1、在波长筛选算法，加入了遗传算发GA
4 | 2、在定量分析算法中，加入ELM，普通卷积神经网络
5 | 以及复现了一区文章的网络DeepSpectra，和二区文章的网络1-D ALENET
6 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # OpenSA
  2 | Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process, a complete algorithm library is established, which is named opensa (openspectrum analysis).
  3 | # 系列文章目录
  4 | <font size =4 color=Red>“光晰本质，谱见不同”，光谱作为物质的指纹，被广泛应用于成分分析中。伴随微型光谱仪/光谱成像仪的发展与普及，基于光谱的分析技术将不只停留于工业和实验室，即将走入生活，实现万物感知，见微知著。本系列文章致力于光谱分析技术的科普和应用。
  5 | <hr style=" border:solid; width:100px; height:1px;" color=#000000 size=1">
  6 | 
  7 | 
  8 | 
  9 | 
 10 | @[TOC](文章目录)
 11 | 
 12 | </font>
 13 | 
 14 | <hr style=" border:solid; width:100px; height:1px;" color=#000000 size=1">
 15 | 
 16 | # 前言
 17 | 典型的光谱分析模型(以近红外光谱作为示意，可见光、中远红外、荧光、拉曼、高光谱等分析流程亦相似)建立流程如下所示，在建立过程中，需要使用算法对训练样本进行选择，然后使用预处理算法对光谱进行预处理，或对光谱的特征进行提取，再构建校正模型实现定量分析，最后针对不同测量仪器或环境，进行模型转移或传递。因此训练样本的选择、光谱的预处理、波长筛选、校正模型、模型传递以及上述算法的参数都影响着模型的应用效果。
 18 | 
 19 | ![图 1近红外光谱建模及应用流程](https://img-blog.csdnimg.cn/e4038170fff643468cacfed4fb34ab04.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
 20 | 针对光谱分析流程所涉及的常见的训练样本的划分、光谱的预处理、波长筛选、校正模型算法建立了完整的算法库，名为OpenSA(OpenSpectrumAnalysis)。整套算法库的架构如下所示。
 21 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/cf63e5d8980542bf824cb889d01f2e00.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
 22 | 样本划分模块提供随机划分、SPXY划分、KS划分三种数据集划分方法，光谱预处理模块提供常见光谱预处理，波长筛选模块提供Spa、Cars、Lars、Uve、Pca等特征降维方法，分析模块由光谱相似度计算、聚类、分类(定性分析)、回归(定量分析)构建，光谱相似度子模块计算提供SAM、SID、MSSIM、MPSNR等相似计算方法，聚类子模块提供KMeans、FCM等聚类方法，分类子模块提供ANN、SVM、PLS_DA、RF等经典化学计量学方法，亦提供CNN、AE、Transformer等前沿深度学习方法，回归子模块提供ANN、SVR、PLS等经典化学计量学定量分析方法，亦提供CNN、AE、Transformer等前沿深度学习定量分析方法。模型评估模块提供常见的评价指标，用于模型评估。自动参数优化模块用于自动进行最佳的模型设置参数寻找，提供网格搜索、遗传算法、贝叶斯概率三种最优参数寻找方法。可视化模块提供全程的分析可视化，可为科研绘图，模型选择提供视觉信息。可通过几行代码快速实现完整的光谱分析及应用（注: 自动参数优化模块和可视化模块暂不开源，等毕业后再说)
 23 | 
 24 | 
 25 | <hr style=" border:solid; width:100px; height:1px;" color=#000000 size=1">
 26 | 
 27 | 
 28 | <font  size=5 color=bule >本篇针对OpenSA的光谱预处理模块进行代码开源和使用示意
 29 | # 更新日志 20220521
 30 | 给OpenSA完善了一下
 31 | 1、在波长筛选算法中，加入了遗传算法GA
 32 | 2、在定量分析算法中，加入ELM，普通卷积神经网络
 33 | 以及复现了一区文章的网络DeepSpectra，和二区文章的网络1-D ALENET                                                                      
 34 | 
 35 | # 一、光谱数据读入
 36 | 提供两个开源数据作为实列，一个为公开定量分析数据集，一个为公开定性分析数据集，本章仅以公开定量分析数据集作为演示。
 37 | ##  1.1 光谱数据读入
 38 | 
 39 | ```python
 40 | # 分别使用一个回归、一个分类的公开数据集做为example
 41 | def LoadNirtest(type):
 42 | 
 43 |     if type == "Rgs":
 44 |         CDataPath1 = './/Data//Rgs//Cdata1.csv'
 45 |         VDataPath1 = './/Data//Rgs//Vdata1.csv'
 46 |         TDataPath1 = './/Data//Rgs//Tdata1.csv'
 47 | 
 48 |         Cdata1 = np.loadtxt(open(CDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
 49 |         Vdata1 = np.loadtxt(open(VDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
 50 |         Tdata1 = np.loadtxt(open(TDataPath1, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
 51 | 
 52 |         Nirdata1 = np.concatenate((Cdata1, Vdata1))
 53 |         Nirdata = np.concatenate((Nirdata1, Tdata1))
 54 |         data = Nirdata[:, :-4]
 55 |         label = Nirdata[:, -1]
 56 | 
 57 |     elif type == "Cls":
 58 |         path = './/Data//Cls//table.csv'
 59 |         Nirdata = np.loadtxt(open(path, 'rb'), dtype=np.float64, delimiter=',', skiprows=0)
 60 |         data = Nirdata[:, :-1]
 61 |         label = Nirdata[:, -1]
 62 | 
 63 |     return data, label
 64 | 
 65 | ```
 66 | ##  1.2 光谱可视化
 67 | ```python
 68 |     #载入原始数据并可视化
 69 |     data, label = LoadNirtest('Rgs')
 70 |     plotspc(data, "raw specturm")
 71 | ```
 72 | 采用的开源光谱如图所示:
 73 | ![原始光谱](https://img-blog.csdnimg.cn/04a9549619fd48198c9072c2d1acfd99.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
 74 | 
 75 | # 二、光谱预处理
 76 | ##  2.1 光谱预处理模块
 77 | 将常见的光谱进行了封装，使用者仅需要改变名字，即可选择对应的光谱分析，下面是光谱预处理模块的核心代码
 78 | ```python
 79 | """
 80 |     -*- coding: utf-8 -*-
 81 |     @Time   :2022/04/12 17:10
 82 |     @Author : Pengyou FU
 83 |     @blogs  : https://blog.csdn.net/Echo_Code?spm=1000.2115.3001.5343
 84 |     @github :
 85 |     @WeChat : Fu_siry
 86 |     @License：
 87 | 
 88 | """
 89 | import numpy as np
 90 | from scipy import signal
 91 | from sklearn.linear_model import LinearRegression
 92 | from sklearn.preprocessing import MinMaxScaler, StandardScaler
 93 | from copy import deepcopy
 94 | import pandas as pd
 95 | import pywt
 96 | 
 97 | 
 98 | # 最大最小值归一化
 99 | def MMS(data):
100 |     """
101 |        :param data: raw spectrum data, shape (n_samples, n_features)
102 |        :return: data after MinMaxScaler :(n_samples, n_features)
103 |        """
104 |     return MinMaxScaler().fit_transform(data)
105 | 
106 | 
107 | # 标准化
108 | def SS(data):
109 |     """
110 |         :param data: raw spectrum data, shape (n_samples, n_features)
111 |        :return: data after StandScaler :(n_samples, n_features)
112 |        """
113 |     return StandardScaler().fit_transform(data)
114 | 
115 | 
116 | # 均值中心化
117 | def CT(data):
118 |     """
119 |        :param data: raw spectrum data, shape (n_samples, n_features)
120 |        :return: data after MeanScaler :(n_samples, n_features)
121 |        """
122 |     for i in range(data.shape[0]):
123 |         MEAN = np.mean(data[i])
124 |         data[i] = data[i] - MEAN
125 |     return data
126 | 
127 | 
128 | # 标准正态变换
129 | def SNV(data):
130 |     """
131 |         :param data: raw spectrum data, shape (n_samples, n_features)
132 |        :return: data after SNV :(n_samples, n_features)
133 |     """
134 |     m = data.shape[0]
135 |     n = data.shape[1]
136 |     print(m, n)  #
137 |     # 求标准差
138 |     data_std = np.std(data, axis=1)  # 每条光谱的标准差
139 |     # 求平均值
140 |     data_average = np.mean(data, axis=1)  # 每条光谱的平均值
141 |     # SNV计算
142 |     data_snv = [[((data[i][j] - data_average[i]) / data_std[i]) for j in range(n)] for i in range(m)]
143 |     return  data_snv
144 | 
145 | 
146 | 
147 | # 移动平均平滑
148 | def MA(data, WSZ=11):
149 |     """
150 |        :param data: raw spectrum data, shape (n_samples, n_features)
151 |        :param WSZ: int
152 |        :return: data after MA :(n_samples, n_features)
153 |     """
154 | 
155 |     for i in range(data.shape[0]):
156 |         out0 = np.convolve(data[i], np.ones(WSZ, dtype=int), 'valid') / WSZ # WSZ是窗口宽度，是奇数
157 |         r = np.arange(1, WSZ - 1, 2)
158 |         start = np.cumsum(data[i, :WSZ - 1])[::2] / r
159 |         stop = (np.cumsum(data[i, :-WSZ:-1])[::2] / r)[::-1]
160 |         data[i] = np.concatenate((start, out0, stop))
161 |     return data
162 | 
163 | 
164 | # Savitzky-Golay平滑滤波
165 | def SG(data, w=11, p=2):
166 |     """
167 |        :param data: raw spectrum data, shape (n_samples, n_features)
168 |        :param w: int
169 |        :param p: int
170 |        :return: data after SG :(n_samples, n_features)
171 |     """
172 |     return signal.savgol_filter(data, w, p)
173 | 
174 | 
175 | # 一阶导数
176 | def D1(data):
177 |     """
178 |        :param data: raw spectrum data, shape (n_samples, n_features)
179 |        :return: data after First derivative :(n_samples, n_features)
180 |     """
181 |     n, p = data.shape
182 |     Di = np.ones((n, p - 1))
183 |     for i in range(n):
184 |         Di[i] = np.diff(data[i])
185 |     return Di
186 | 
187 | 
188 | # 二阶导数
189 | def D2(data):
190 |     """
191 |        :param data: raw spectrum data, shape (n_samples, n_features)
192 |        :return: data after second derivative :(n_samples, n_features)
193 |     """
194 |     data = deepcopy(data)
195 |     if isinstance(data, pd.DataFrame):
196 |         data = data.values
197 |     temp2 = (pd.DataFrame(data)).diff(axis=1)
198 |     temp3 = np.delete(temp2.values, 0, axis=1)
199 |     temp4 = (pd.DataFrame(temp3)).diff(axis=1)
200 |     spec_D2 = np.delete(temp4.values, 0, axis=1)
201 |     return spec_D2
202 | 
203 | 
204 | # 趋势校正(DT)
205 | def DT(data):
206 |     """
207 |        :param data: raw spectrum data, shape (n_samples, n_features)
208 |        :return: data after DT :(n_samples, n_features)
209 |     """
210 |     lenth = data.shape[1]
211 |     x = np.asarray(range(lenth), dtype=np.float32)
212 |     out = np.array(data)
213 |     l = LinearRegression()
214 |     for i in range(out.shape[0]):
215 |         l.fit(x.reshape(-1, 1), out[i].reshape(-1, 1))
216 |         k = l.coef_
217 |         b = l.intercept_
218 |         for j in range(out.shape[1]):
219 |             out[i][j] = out[i][j] - (j * k + b)
220 | 
221 |     return out
222 | 
223 | 
224 | # 多元散射校正
225 | def MSC(data):
226 |     """
227 |        :param data: raw spectrum data, shape (n_samples, n_features)
228 |        :return: data after MSC :(n_samples, n_features)
229 |     """
230 |     n, p = data.shape
231 |     msc = np.ones((n, p))
232 | 
233 |     for j in range(n):
234 |         mean = np.mean(data, axis=0)
235 | 
236 |     # 线性拟合
237 |     for i in range(n):
238 |         y = data[i, :]
239 |         l = LinearRegression()
240 |         l.fit(mean.reshape(-1, 1), y.reshape(-1, 1))
241 |         k = l.coef_
242 |         b = l.intercept_
243 |         msc[i, :] = (y - b) / k
244 |     return msc
245 | 
246 | # 小波变换
247 | def wave(data):
248 |     """
249 |        :param data: raw spectrum data, shape (n_samples, n_features)
250 |        :return: data after wave :(n_samples, n_features)
251 |     """
252 |     data = deepcopy(data)
253 |     if isinstance(data, pd.DataFrame):
254 |         data = data.values
255 |     def wave_(data):
256 |         w = pywt.Wavelet('db8')  # 选用Daubechies8小波
257 |         maxlev = pywt.dwt_max_level(len(data), w.dec_len)
258 |         coeffs = pywt.wavedec(data, 'db8', level=maxlev)
259 |         threshold = 0.04
260 |         for i in range(1, len(coeffs)):
261 |             coeffs[i] = pywt.threshold(coeffs[i], threshold * max(coeffs[i]))
262 |         datarec = pywt.waverec(coeffs, 'db8')
263 |         return datarec
264 | 
265 |     tmp = None
266 |     for i in range(data.shape[0]):
267 |         if (i == 0):
268 |             tmp = wave_(data[i])
269 |         else:
270 |             tmp = np.vstack((tmp, wave_(data[i])))
271 | 
272 |     return tmp
273 | 
274 | def Preprocessing(method, data):
275 | 
276 |     if method == "None":
277 |         data = data
278 |     elif method == 'MMS':
279 |         data = MMS(data)
280 |     elif method == 'SS':
281 |         data = SS(data)
282 |     elif method == 'CT':
283 |         data = CT(data)
284 |     elif method == 'SNV':
285 |         data = SNV(data)
286 |     elif method == 'MA':
287 |         data = MA(data)
288 |     elif method == 'SG':
289 |         data = SG(data)
290 |     elif method == 'MSC':
291 |         data = MSC(data)
292 |     elif method == 'D1':
293 |         data = D1(data)
294 |     elif method == 'D2':
295 |         data = D2(data)
296 |     elif method == 'DT':
297 |         data = DT(data)
298 |     elif method == 'WVAE':
299 |         data = wave(data)
300 |     else:
301 |         print("no this method of preprocessing!")
302 | 
303 |     return data
304 | 
305 | 
306 | ```
307 | ## 2 .2 光谱预处理的使用
308 | 在example.py文件中，提供了光谱预处理模块的使用方法，具体如下，仅需要两行代码即可实现所有常见的光谱预处理。
309 | 示意1：利用OpenSA实现MSC多元散射校正
310 | ```python
311 |  #载入原始数据并可视化
312 |     data, label = LoadNirtest('Rgs')
313 |     plotspc(data, "raw specturm")
314 |     #光谱预处理并可视化
315 |     method = "MSC"
316 |     Preprocessingdata = Preprocessing(method, data)
317 |     plotspc(Preprocessingdata, method)
318 | ```
319 | 预处理后的光谱数据如图所示:
320 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/3b38f01e6ebe4a22821274bca50aa5a2.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
321 | 
322 | 
323 | 示意2：利用OpenSA实现SNV预处理
324 | 
325 | ```python
326 |     #载入原始数据并可视化
327 |     data, label = LoadNirtest('Rgs')
328 |     plotspc(data, "raw specturm")
329 |     #光谱预处理并可视化
330 |     method = "SNV"
331 |     Preprocessingdata = Preprocessing(method, data)
332 |     plotspc(Preprocessingdata, method)
333 | ```
334 | 预处理后的光谱数据如图所示:
335 | ![SNV](https://img-blog.csdnimg.cn/558d1c710da04519b72cab08da67e9cc.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARWNob19Db2Rl,size_20,color_FFFFFF,t_70,g_se,x_16)
336 | # 总结
337 | <font color=#999AAA >利用OpenSA可以非常简单的实现对光谱的预处理，完整代码可从获得[GitHub仓库](https://github.com/FuSiry/OpenSA) 如果对您有用，请点赞！
338 | 代码现仅供学术使用，若对您的学术研究有帮助，请引用本人的论文，同时，未经许可不得用于商业化应用，欢迎大家继续补充OpenSA中所涉及到的算法
339 | 


--------------------------------------------------------------------------------