├── Lab1_PythonLearning
    ├── PythonLearning.md
    ├── Python基础教程(crossin全60课).pdf
    └── README.md
├── Lab2_SplicingSequencesCoding
    ├── EI-true-false_IE-true-false_seq.zip
    ├── sequence_coding.md
    └── splice_signal1.jpg
├── Lab3_Classifiers_KNN-LR-DT
    ├── EISplicing_DecisionTreeGraph.pdf
    └── classifiers1.md
├── Lab4_Classifiers_Bayes-SVM
    └── classifiers2.md
├── Lab5_PeptideSequencesCoding
    ├── AA531properties.txt
    ├── ACEtriPeptidesSequencesActivities.txt
    ├── SchematicOfGeneralApproachInQSAR.jpg
    └── sequence_coding2.md
├── Lab6_Regression_MLR-PLSR-SVR
    └── regress1.md
├── Lab7_FeatureReduction
    └── dimReduction.md
├── Lab8_UnsupervisedLearning
    └── Clustering.md
└── README.md


/Lab1_PythonLearning/PythonLearning.md:
--------------------------------------------------------------------------------
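  1 | # Lab 1: Python Quick Start
  2 | Reference: [Python基础教程(crossin全60课)](./Python基础教程(crossin全60课).pdf)
  3 | 
  4 | ## Objectives
  5 | * 1) Get to know Python
  6 | * 2) Master Python's basic data structures and common operations
  7 | * 3) Understand program control in Python: conditional statements and loops
  8 | * 4) Learn how to define and call functions in Python
  9 | 
 10 | ## Prepare the working directory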
 11 | ```
 12 | $ mkdir your_PattRecogLab_directory
 13 | $ cd your_PattRecogLab_directory
 14 | $ mkdir lab_01
 15 | $ cd lab_01
 16 | ```
 17 | 
 18 | ## 1. Installing and running Python
 19 | ### 1.1 Installation
 20 | * Windows: download [Python](https://www.python.org/downloads/) and follow the installer; under "Advanced Options", check "Add Python to environment variables".
 21 | * Linux and macOS usually ship with Python; run `python3 --version` to check.
 22 | 
 23 | ### 1.2 Running
 24 | * Windows: press Win+R to open "Run", then type cmd
 25 | * Linux: for remote access, Xshell, PuTTY, or MobaXterm all work
 26 | * macOS: open the Terminal app locally
 27 | * Python IDEs: PyCharm, Spyder, Jupyter Notebook (recommended)
 28 | 
 29 | ```
 30 | $ python3
 31 | # The first cry: type this at the Python prompt
 32 | >>> print("Hello World!")
 33 | ```
 34 | 
 35 | ## 2. Data structures
 36 | ### 2.1 Basic data types
 37 | ```python
 38 | name = 'Tomas' # string (single or double quotes both work)
 39 | myInt = 666 # integer
 40 | myFloat = 1.618 # float
 41 | myBool = True # boolean
 42 | ```
 43 | 
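 44 | ### 2.2 Variable naming rules
 45 | > 1) The first character must be a letter or an underscore
 46 | > 2) The remaining characters may be letters, underscores, or digits
 47 | > 3) Names are case-sensitive: myname and myName are different variables
 48 | 
 49 | **Some valid variable names**
 50 | > a, _abc, abc_12, a1b2_c3
 51 | 
 52 | **Some invalid variable names**
 53 | > 2dog, my-name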
 54 | 
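 55 | ### 2.3 Lists (list)
 56 | ```python
 57 | myList1 = [4, 2, 3, 2, 5, 1] # a pair of `square brackets` creates a list; here every element is an integer
 58 | myList2 = ['meat', 'egg', 'fish', 'milk'] # every element is a string
 59 | myList3 = [365, 'everyday', 0.618, True] # elements may also have different basic types
 60 | 
 61 | myList3[1] # gives 'everyday'; Python counts elements from 0 (unlike R and MATLAB, but like most other languages)
 62 | myList3[4] # raises IndexError: index 4 would be the 5th element of myList3, which does not exist
 63 | myList3[2] = 'happy' # replace the 3rd element, 0.618, with the string 'happy'
 64 | myList3.append(666.666) # add one element to the end of myList3 with the list method append
 65 | myList3[4] # no error now; gives 666.666
 66 | del(myList3[1]) # delete the 2nd element, 'everyday'
 67 | 
 68 | # myList3 is now [365, 'happy', True, 666.666]
 69 | myList3[-1] # the last element (note the difference: forward indexes start at 0, backward indexes start at -1)
 70 | myList3[1:3] # gives ['happy', True]; this is slicing (note: the index before the colon is included, the index after it is excluded, a classic trap for Python beginners)
 71 | myList3[:3] # gives [365, 'happy', True]; same as myList3[0:3]
 72 | myList3[1:] # gives ['happy', True, 666.666]; same as myList3[1:4] (index 4 would be a 5th element that does not exist, but the stop index is excluded, so the slice ends at the last element)
 73 | myList3[1:-1] # gives ['happy', True]; same as myList3[1:3]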
 74 | ```
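The half-open slicing rule above is easier to remember through two of its consequences: a slice `s[i:j]` has exactly `j - i` elements, and `s[:n] + s[n:]` always rebuilds the original list. The sketch below illustrates this; the list `letters` and the other variable names are made up for this example.

```python
# Hypothetical example list; any list behaves the same way.
letters = ['a', 'b', 'c', 'd', 'e']

first_three = letters[:3]    # indexes 0, 1, 2 (stop index 3 is excluded)
last_two = letters[-2:]      # from the second-to-last element to the end
middle = letters[1:-1]       # drop the first and the last element

# A slice s[i:j] holds j - i elements when 0 <= i <= j <= len(s).
span = letters[1:4]

# Adjacent slices never overlap and never leave a gap.
rejoined = letters[:2] + letters[2:]

print(first_three, last_two, middle, len(span), rejoined == letters)
```

Because the stop index is excluded, splitting a sequence at position `n` needs no `+1` or `-1` corrections.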
 75 | 
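 76 | ### 2.4 Strings
 77 | A single quote inside a string
 78 | > "What's your name?"
 79 | 
 80 | A double quote inside a string
 81 | > 'You are a "BAD" man'
 82 | 
 83 | Use \\' for a single quote and \\" for a double quote (\ is the escape character, \n is a newline, and a trailing \ also continues a line of code)
 84 | > 'I\\'m a \\"good\\" student'
 85 | 
 86 | String concatenation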
 87 | ```python
 88 | str1 = 'good'
 89 | str2 = 'student'
 90 | str3 = str1 + ' ' + str2
 91 | str4 = 'I\'m a ' + str1 + ' ' + str2
 92 | 
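 93 | str_and_num = str1 + 666 # raises TypeError: a string and a number cannot be added
 94 | str_and_num = str1 + str(666) # str() converts the number to a string first, so no error
 95 | str_and_num = 'good %d' % 666 # format a string with %; other specifiers include %f, %.2f, %s
 96 | "%s's score is %d" % ('Mike', 90) # several % specifiers format several values at once
 97 | 
 98 | # Suppose we have a list of strings
 99 | str_list = ['apple', 'pear', 'orange']
100 | '-'.join(str_list) # gives 'apple-pear-orange': the elements joined by a hyphen
101 | ''.join(str_list) # gives 'applepearorange': the separator may be an empty string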
102 | ```
103 | 
104 | Splitting strings
105 | ```python
106 | sentence = 'I am a sentence'
107 | sentence.split() # split() splits on whitespace by default and returns the pieces as a list
108 | section = 'Come on. Let\'s go. Go go go.'
109 | section.split('.') # use '.' as the separator
110 | ```
111 | 
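112 | String indexing and slicing
113 | ```python
114 | word = 'helloworld'
115 | word[0] # gives 'h'
116 | word[-2] # gives 'l'
117 | word[0] = 'w' # raises TypeError: characters of a string cannot be changed by index. To modify one, convert the string to a list (with list()), change the list, then join the characters back with ''.join(yourList)
118 | word[:5] # gives 'hello'; slicing works as it does for lists
119 | '^_^'.join(word) # gives 'h^_^e^_^l^_^l^_^o^_^w^_^o^_^r^_^l^_^d' (just showing off ^_^)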
120 | ```
121 | 
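122 | ### 2.5 Dictionaries (dict, similar to a hash in Perl)
123 | A dictionary is a collection of key/value pairs (key:value); pairs are separated by commas and the whole dictionary is wrapped in `curly braces`
124 | > 1) Keys must be unique
125 | > 2) Keys must be hashable (immutable) objects such as integers, floats, strings, booleans, or tuples of these; a list cannot be a key
126 | > 3) Values can be anything
127 | > 4) Key/value pairs cannot be accessed by position; look values up by key (since Python 3.7 a dict does preserve insertion order)
128 | ```python
129 | score = {'萧峰':95, '段誉':97, '虚竹':90, True:'Good brother', 100:'Perfect score'} # create a dictionary with curly braces
130 | score['虚竹'] # the value stored under this key: 90
131 | score[True] # the value stored under this key: 'Good brother'
132 | score['虚竹'] = 99 # update the score of '虚竹' (he later inherits more than a century of inner power from 无崖子 and 天山童姥)
133 | score['慕容复'] = 88 # add a score for '慕容复'
134 | del(score[100]) # delete the pair whose key is 100 (100 is a key of score here, not an index)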
135 | ```
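The key rules above can be checked directly. The sketch below is only an illustration; the variables `pair_counts`, `list_key_ok`, and `missing` are made up for this example. It shows a tuple used as a key, the `TypeError` raised by a list key, a lookup with a default value through `dict.get`, and iteration over `items()`.

```python
score = {'萧峰': 95, '段誉': 97}

# A tuple of strings is hashable, so it can be a key.
pair_counts = {('A', 'T'): 3}

# A list is mutable and unhashable, so using it as a key fails at run time.
try:
    bad = {['A', 'T']: 3}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# dict.get returns a default instead of raising KeyError for a missing key.
missing = score.get('慕容复', 0)

# items() yields (key, value) pairs in insertion order (Python 3.7+).
names = [name for name, points in score.items()]

print(pair_counts[('A', 'T')], list_key_ok, missing, names)
```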
136 | 
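137 | ### 2.6 Tuples (tuple)
138 | Tuples are like lists, except that their elements cannot be changed; they are created with `parentheses`. Tuples support the same indexing, slicing, and iteration as lists
139 | ```python
140 | position = (147, 258) # a tuple containing only numbers
141 | weather = ('sunny', 'cloudy', 'rainy') # a tuple containing only strings
142 | weather_id = (1, 'sunny', 2, 'cloudy', 3, 'rainy') # a tuple mixing numbers and strings
143 | weather_id[:2] # gives (1, 'sunny')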
144 | ```
145 | 
146 | ### 2.7 Type conversion
147 | ```python
148 | int('123') # gives 123: string to integer
149 | float('6.6') # gives 6.6: string to float
150 | str(168) # gives '168': number to string
151 | bool(0) # gives False: number to boolean
152 | int('abc') # raises ValueError: the string 'abc' cannot be read as a number
153 | 
154 | bool(-3) # gives True
155 | bool(False) # gives False; here False is the Python keyword, which equals 0 as a number
156 | bool('False') # gives True; 'False' is just a non-empty string
157 | bool('') # gives False: the string is empty
158 | bool(' ') # gives True: it looks empty but contains one space
159 | ```
160 | 
161 | 
162 | ## 3. Program control
163 | ### 3.1 Logical expressions
164 | ```python
165 | x1 = 2
166 | x2 = 8
167 | x1 < 3 # True
168 | x1 == x2 # False
169 | x1 != x2 # True
170 | 
171 | # -------- and ---------
172 | x1 < 10 and x2 < 10 # True
173 | x1 < 10 and x2 > 10 # False
174 | x1 > 10 and x2 < 10 # False
175 | x1 > 10 and x2 > 10 # False
176 | 
177 | # -------- or ---------
178 | x1 < 10 or x2 < 10 # True
179 | x1 < 10 or x2 > 10 # True
180 | x1 > 10 or x2 < 10 # True
181 | x1 > 10 or x2 > 10 # False
182 | 
183 | # -------- not ---------
184 | not(x1<10) # False
185 | not(x1>10) # True
186 | ```
187 | 
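188 | ### 3.2 Conditional statements
189 | **Syntax**
190 | > if condition: <br>
191 | >> statement 1 <br>
192 | >> statement 2 <br>
193 | 
194 | **Note: the `colon` after the condition is required. Statements inside the if block need `consistent indentation`, usually 4 spaces or one tab, and the whole file must be consistent: never mix spaces and tabs**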
195 | ```python
196 | if x1 < 10:
197 |   print('x1 is less than 10') # the interactive prompt shows three dots; indent (e.g. press Tab once), type the print call, then press Enter twice to see the output
198 | 
199 | # and
200 | if x1 < 10 and x2 < 10:
201 |   print('x1 and x2 are both less than 10')
202 | 
203 | # if...else...
204 | if x1 < x2:
205 |   print('x1 is less than x2')
206 | else:
207 |   print('x1 is not less than x2')
208 | 
209 | # if...elif...else
210 | if x1 > x2: # change x1 and x2 to get different outputs
211 |   print('x1 is more than x2')
212 | elif x1 > 10:
213 |   print('x1 is not more than x2, but x1 is greater than 10')
214 | else:
215 |   print('x1 is not more than x2, and x1 is not greater than 10')
216 | 
217 | # nested if
218 | if x1 < 10:
219 |   if x2 > 10:
220 |     print('x1 is less than 10, but x2 is greater than 10')
221 |   else:
222 |     print('x1 and x2 are both less than 10')
223 | ```
224 | 
225 | ### 3.3 Loops
226 | **Syntax of the while loop (again, do not forget the colon after the condition)**
227 | > while condition: <br>
228 | >> statement 1 <br>
229 | >> statement 2 <br>
230 | ```python
231 | iter_m = 0
232 | x1 = 2
233 | while x1 < 20:
234 |   x1 = x1*2
235 |   iter_m = iter_m+1
236 |   print('Iteration %d: x1 is %d' % (iter_m, x1))
237 | ```
238 | 
239 | **Syntax of the for loop (again, do not forget the colon after the range)**
240 | > for ... in iterable: <br>
241 | >> statement 1 <br>
242 | >> statement 2 <br>
243 | ```python
244 | x1 = 2
245 | for i in range(5,10): # range(5,10) yields the integers from 5 up to, but not including, 10
246 |   x1 = x1*i
247 |   print('i in range(5,10) is %d: x1 is %d' % (i, x1))
248 | ```
249 | 
250 | **Nested loops**
251 | ```python
252 | for i in range(0,5):
253 |   for j in range(0,i+1): # note that the range of the inner loop over j changes with the outer value of i
254 |     print('*', end='')
255 |   print()
256 | ```
257 | 
258 | **break: exit the `innermost enclosing` loop when the condition is met**
259 | ```python
260 | x1 = 2
261 | iter_m = 0
262 | while 1: # the bluntest condition: 1 is always true, so this is an infinite loop
263 |   x1 = x1+1
264 |   iter_m = iter_m+1
265 |   print('Iteration %d: x1 is %d' % (iter_m, x1))
266 |   if x1 > 10:
267 |     print('x1 is greater than 10, so the while loop stops!')
268 |     break # without this break, the loop would run forever
269 | ```
270 | 
271 | **continue: skip the rest of the `current iteration` when the condition is met**
272 | ```python
273 | x1 = 2
274 | iter_m = 0
275 | while x1 < 10:
276 |   x1 = x1+1
277 |   iter_m = iter_m+1
278 |   if x1 % 2 == 0: # if x1 is even, print a message and skip the rest of this iteration
279 |     print('x1 is an even number: %d' % x1)
280 |     continue
281 |   print('Iteration %d: x1 is an odd number %d' % (iter_m, x1)) # if x1 is odd, print a message with its value
282 | ```
283 | 
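284 | ## 4. Reading and writing files
285 | Suppose there is a file data.txt with the following content
286 | > Hi man! <br>
287 | > I am a file. <br>
288 | > Try read my mind and print it onto screen! <br>
289 | 
290 | Read the whole content at once and write it all to a new file
291 | ```python
292 | f = open('data.txt') # open() opens the file and returns a file object, stored in f
293 | data = f.read() # read() loads the whole content at once (use with care on very large files!)
294 | f_out = open('data_out.txt', 'w') # 'w' opens the file for writing; with no mode given, the file is opened for reading
295 | f_out.write(data) # write everything to the new file
296 | f_out.close() # close 'data_out.txt' (however complex the program, always close every file you open)
297 | f.close() # close 'data.txt'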
298 | ```
299 | 
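300 | Read and process one line at a time, writing each result to a new file
301 | ```python
302 | f = open('data.txt')
303 | line = f.readline() # readline() reads one line
304 | f_out = open('data_out.txt', 'w')
305 | iter_m = 1
306 | while line != '': # at the end of the file readline() returns an empty string, which ends the loop
307 |   line = str(iter_m) + ': ' + line # prefix the line with its number and a colon
308 |   f_out.write(line) # write the processed line to the output file
309 |   iter_m = iter_m+1 # increment the line number
310 |   line = f.readline() # read the next line (without this, the loop stays on the first line forever)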
311 | f_out.close()
312 | f.close()
313 | ```
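As a possible alternative to calling `close()` by hand, the same line-numbering task can be sketched with a `with` block, which closes the files automatically even if an error occurs, and with `enumerate`, which does the counting. To keep the sketch self-contained it first writes a small input file into a temporary directory; the paths and variable names are assumptions made for this example only.

```python
import os
import tempfile

# Create a throwaway input file so the example runs anywhere.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'data.txt')
out_path = os.path.join(workdir, 'data_out.txt')
with open(in_path, 'w') as f:
    f.write('Hi man!\nI am a file.\n')

# Both files are closed automatically when the with block ends.
with open(in_path) as f_in, open(out_path, 'w') as f_out:
    # Iterating over a file object yields one line at a time.
    for line_no, line in enumerate(f_in, start=1):
        f_out.write('%d: %s' % (line_no, line))

with open(out_path) as f:
    numbered = f.read()
print(numbered)
```

Iterating over the file object reads lazily, so this form also suits large files.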
314 | 
315 | ## 5. Functions
316 | The simplest function
317 | ```python
318 | # define the function sayHello
319 | def sayHello(): # note the colon after the parentheses
320 |   print('Hello world!')
321 | # call sayHello
322 | sayHello() # prints: Hello world!
323 | ```
324 | 
325 | Functions with parameters
326 | ```python
327 | # define sayHello with 1 parameter
328 | def sayHello(who): # one parameter
329 |   print(who + ' says Hello!')
330 | # call sayHello
331 | sayHello('Jack') # prints: Jack says Hello!
332 | 
333 | # define sayHello with 2 parameters (this replaces the previous definition)
334 | def sayHello(who, friend): # two parameters
335 |   print('%s says Hello to %s!' % (who, friend))
336 | # call sayHello
337 | sayHello('Jack', 'Rose') # prints: Jack says Hello to Rose!
338 | ```
339 | 
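340 | ## 6. Modules (module)
341 | A Python module is similar to a package in R or Perl, or a toolbox in MATLAB
342 | ```python
343 | import os # import the operating-system module os
344 | os.getcwd() # get the current working directory (cwd); this function is unavailable unless os has been imported
345 | dir(os) # list all names (functions and variables) defined in the os module
346 | 
347 | # Typing os.getcwd every time is tedious, and most of the os module may go unused
348 | from os import getcwd # import only the getcwd function
349 | getcwd() # get the current working directory without the module prefix
350 | 
351 | # Six letters just to get the working directory? Linux needs only three: pwd
352 | from os import getcwd as pwd # import only getcwd and rename it to pwd
353 | pwd() # get the current working directory, even faster!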
354 | ```
355 | 
356 | ## 7. Regular expressions
357 | See pp. 126-135 of [Python基础教程(crossin全60课)](./Python基础教程(crossin全60课).pdf).
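As a small taste ahead of the reference reading, here is a minimal sketch with the standard `re` module. It looks ahead to Lab 2 by locating every GT dinucleotide in a DNA string and stripping non-ACGT characters. The sequences are invented for illustration.

```python
import re

seq = 'AAGGTAAGTCCAGGT'  # made-up DNA string

# finditer yields one match object per non-overlapping occurrence of the pattern.
gt_positions = [m.start() for m in re.finditer('GT', seq)]

# A character class with ^ matches anything NOT listed; sub deletes those characters.
cleaned = re.sub('[^ACGT]', '', 'ACNGT-A')

print(gt_positions, cleaned)
```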
358 | 
359 | ## Homework
360 | Practice every command you can. The more mistakes you make, the faster you improve!
361 | 


--------------------------------------------------------------------------------
/Lab1_PythonLearning/Python基础教程(crossin全60课).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/a2a6f7feffb2b919a1c813fe4fef0c4f726aef92/Lab1_PythonLearning/Python基础教程(crossin全60课).pdf


--------------------------------------------------------------------------------
/Lab1_PythonLearning/README.md:
--------------------------------------------------------------------------------
1 | # Lab 1: Python Quick Start
2 | Handout: [PythonLearning.md](https://github.com/dai0992/Pattern-Recognition-and-Prediction/blob/master/Lab1_PythonLearning/PythonLearning.md)
3 | 
4 | Reference: [PDF tutorial](https://github.com/dai0992/Pattern-Recognition-and-Prediction/blob/master/Lab1_PythonLearning/Python基础教程(crossin全60课).pdf)
5 | 


--------------------------------------------------------------------------------
/Lab2_SplicingSequencesCoding/EI-true-false_IE-true-false_seq.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/a2a6f7feffb2b919a1c813fe4fef0c4f726aef92/Lab2_SplicingSequencesCoding/EI-true-false_IE-true-false_seq.zip


--------------------------------------------------------------------------------
/Lab2_SplicingSequencesCoding/sequence_coding.md:
--------------------------------------------------------------------------------
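  1 | # Lab 2: Sequence Encoding 1 (Splice Site Recognition as an Example)
  2 | 
  3 | ## Objectives
  4 | * 1) Learn the research background of splice site recognition
  5 | * 2) Implement the k-spaced base-pair composition encoding of DNA sequences in code
  6 | 
  7 | ## 1. RNA splicing in eukaryotic genes
  8 | * RNA splicing is the process of removing introns from precursor mRNA and joining the remaining exons into mature mRNA; it plays a key role in eukaryotic gene expression.
  9 | * When splicing goes wrong, the mature mRNA may lose an exon or retain an intron, which disrupts normal gene expression.
 10 | * Studies have shown that many human diseases are caused by abnormal RNA splicing. For example, in the globin genes of thalassemia patients, about 1/4 of the nucleotide mutations fall in the conserved boundary sequences at the 5' or 3' ends of introns.
 11 | * The positions where RNA splicing takes place are called splice sites. The 5' end of an intron (the exon-intron boundary) is the `donor site` (usually GT), and the 3' end of an intron (the intron-exon boundary) is the `acceptor site` (usually AG). Splice sites are the recognition signals for RNA splicing and a key factor in correct splicing.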
 12 | 
 13 | ![Sequence of splice sites](./splice_signal1.jpg?raw=true)
 14 | 
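 15 | ## 2. Current state of splice site prediction
 16 | * Most GT and AG dinucleotides in DNA sequences are not splice sites. We therefore face an extremely imbalanced classification task: picking out the very few true splice sites from a large number of GT/AG non-splice sites.
 17 | * Experimental approaches: splice sites are determined through wet-lab experiments and sequence alignment. Advantage: high reliability. Disadvantages: they yield no general conclusions about the splicing mechanism, and they are costly, which hinders large-scale use.
 18 | * Bioinformatics methods: the weight matrix model (WMM) represents a sequence by the nucleotide frequencies at each position [1]; the weight array method (WAM) accounts for dependencies between adjacent bases and is regarded as an extension of WMM [2]; other tools include GeneSplicer [3], NNsplice [4], SpliceView [5], SpliceMachine [6], and MaLDoSS [7].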
 19 | 
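 20 | ## 3. Dataset
 21 | * The HS<sup>3</sup>D dataset contains all human gene exons, introns, and splice regions extracted from GenBank Rel.123. It provides standardized material for training and evaluating gene prediction models and is a large dataset commonly used in human splice site prediction research [8]. It contains 2796/2880 true donor/acceptor sites and 271937/329374 false donor/acceptor sites. All sites follow the "GT-AG" rule and are represented by sequence samples 140 bp long, with the conserved dinucleotide GT at positions 71 and 72 for donor sites and the conserved dinucleotide AG at positions 69 and 70 for acceptor sites. All sequences have had non-ACGT bases removed, and redundant sequences have been dropped.
 22 | * From all false splice site sequences in HS<sup>3</sup>D, 2796/2880 false donor/acceptor sequences are drawn at random and combined with all true splice site sequences to build a dataset with balanced positive and negative samples.
 23 | * Donor (EI) and acceptor (IE) site sequences: [EI_true.seq, EI_false.seq; IE_true.seq, IE_false.seq](./EI-true-false_IE-true-false_seq.zip)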
 24 | 
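 25 | ## 4. Sequence-composition features: composition of k-spaced amino acid/base pairs [9]
 26 | * The feature is the frequency with which each ordered pair of bases separated by k arbitrary bases occurs in a nucleic acid sequence.
 27 | * Example: for k = 2, compute the frequency in the sequence of all 16 base pairs whose two members are separated by 2 bases.
 28 | * An upper limit for k is usually set, e.g. KMAX = 4, and the composition features are computed for k = 0, 1, ..., KMAX. For any nucleic acid sequence, the dimension of the k-spaced base-pair composition feature vector is 16x(KMAX+1).
 29 | * k-spaced composition features capture both sequence composition and associations between bases at different scales, and the feature dimension does not depend on the sequence length.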
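The definition above can be checked by hand on a tiny sequence. The sketch below is a simplified illustration rather than the reference program of this lab; the function name `k_spaced_freq` and the toy sequence `ACGTAC` are assumptions made for this example. For k = 0 the sequence contains the adjacent pairs AC, CG, GT, TA, AC, and for k = 2 it contains AT, CA, GC.

```python
from itertools import product

def k_spaced_freq(seq, k):
    """Frequency of each of the 16 ordered base pairs separated by exactly k bases."""
    counts = {a + b: 0 for a, b in product('ATCG', repeat=2)}
    n_pairs = len(seq) - k - 1          # number of positions where a pair fits
    for m in range(n_pairs):
        pair = seq[m] + seq[m + k + 1]  # first base, skip k bases, second base
        if pair in counts:              # ignore pairs with non-ATCG symbols
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}

seq = 'ACGTAC'
freq0 = k_spaced_freq(seq, 0)   # 5 adjacent pairs: AC, CG, GT, TA, AC
freq2 = k_spaced_freq(seq, 2)   # 3 pairs: AT, CA, GC

# Concatenating k = 0..KMAX gives a vector of length 16 * (KMAX + 1).
KMAX = 2
features = []
for k in range(KMAX + 1):
    features.extend(k_spaced_freq(seq, k).values())

print(freq0['AC'], freq2['AT'], len(features))
```

The vector length depends only on KMAX, which is why sequences of different lengths can share one feature matrix.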
 30 | 
 31 | ## 5. Preparing the working directory and Python packages
 32 | ```sh
 33 | # create the lab_02 folder
 34 | $ mkdir lab_02
 35 | $ cd lab_02
 36 | 
 37 | # if python3 is unavailable on the cluster, activate the base environment first
 38 | $ source /opt/miniconda3/bin/activate
 39 | $ conda activate
 40 | 
 41 | # first install pip, the Python package manager
 42 | $ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py   # download the install script
 43 | # if curl fails, download with wget instead
 44 | $ wget https://bootstrap.pypa.io/get-pip.py
 45 | 
 46 | $ python3 get-pip.py    # run the install script
 47 | 
 48 | # install 3 common packages: numpy (matrix operations), scipy (numerical computing), matplotlib (plotting)
 49 | $ pip3 install --user numpy scipy matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple
 50 | ```
 51 | 
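 52 | ## 6. Sequence encoding
 53 | * Reference program: kSpaceCoding.py
 54 | * Save the following code as a .py file (e.g. kSpaceCoding.py). What it does: reads 'EI_true1.seq', computes the kSpace features, and saves the result to an output file (e.g. 'EI_true1_kSpace.txt')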
 55 | ```python
 56 | import numpy as np # import numpy under the alias np
 57 | 
 58 | def file2matrix(filename, KMAX, bpTable):
 59 |     fr = open(filename) # open the file
 60 |     arrayOLines = fr.readlines() # read all lines
 61 |     fr.close() # close the file promptly
 62 | 
 63 |     numberOfLines = len(arrayOLines) # number of lines in the file
 64 |     returnMat = np.zeros((numberOfLines, 16*(KMAX+1))) # allocate the result matrix
 65 |     lineNum = 0
 66 | 
 67 |     for line in arrayOLines:
 68 |         line = line.strip() # strip whitespace, including the trailing newline
 69 |         listFromLine = line.split(': ') # split on ': '
 70 |         nt_seq = list(listFromLine[1]) # take the nucleotide sequence and convert it to a list
 71 |         del(nt_seq[70:72]) # delete the donor site at positions 71 and 72
 72 |         
 73 |         kSpaceVec = []
 74 |         for k in range(KMAX+1): # compute the kSpace features for each k
 75 |             bpFreq = bpTable.copy() # bpTable is a dict; copy it with dict.copy(), because Python passes arguments by object reference and counting in place would change the caller's table
 76 | 
 77 |             for m in range(len(nt_seq)-k-1): # scan the sequence and count each base pair
 78 |                 bpFreq[nt_seq[m]+nt_seq[m+1+k]] += 1 # the two-base substring is itself the dict key, so no chain of if statements is needed
 79 |             bpFreqVal = list(bpFreq.values()) # take the counts from bpFreq as a list (in key insertion order)
 80 |             kSpaceVec.extend(np.array(bpFreqVal)/(len(nt_seq)-k-1)) # for each k, divide the counts by the number of pairs scanned
 81 | 
 82 |         returnMat[lineNum,:] = kSpaceVec
 83 |         lineNum += 1
 84 |     return returnMat, lineNum
 85 | 
 86 | if __name__ == '__main__':
 87 |     filename = 'EI_true1.seq'
 88 |     KMAX = 4
 89 |     bpTable = {}
 90 |     for m in ('A','T','C','G'):
 91 |         for n in ('A','T','C','G'):
 92 |             bpTable[m+n] = 0
 93 | 
 94 |     kSpaceMat, SeqNum = file2matrix(filename, KMAX, bpTable)
 95 |     outputFileName = 'EI_true1_kSpace.txt'
 96 |     np.savetxt(outputFileName, kSpaceMat, fmt='%g', delimiter=',')
 97 |     print('The number of sequences is %d. Matrix of features is saved in %s' % (SeqNum, outputFileName))
 98 | ```
 99 | ```sh
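100 | # delete the 4 header lines of EI_true.seq
101 | $ sed '1,4d' EI_true.seq > EI_true1.seq
102 | # run the program (try changing KMAX in the main block and compare the results; editing the file for every change is tedious, so look up how to pass parameters on the command line)
103 | $ python3 kSpaceCoding.py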
104 | ```
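One possible way to pass such parameters is the standard `argparse` module, sketched below. The argument names `infile`, `outfile`, and `--kmax` are hypothetical choices for this example, not options of the reference program. Passing an explicit list to `parse_args` stands in for a real command line so that the sketch runs on its own.

```python
import argparse

def parse_args(argv=None):
    """Parse command-line options; argv=None means 'use sys.argv[1:]'."""
    parser = argparse.ArgumentParser(description='k-spaced base-pair composition')
    parser.add_argument('infile', help='input sequence file')
    parser.add_argument('outfile', help='output feature file')
    parser.add_argument('--kmax', type=int, default=4,
                        help='largest spacing k (default: 4)')
    return parser.parse_args(argv)

# Simulate: python3 kSpaceCoding.py EI_true1.seq EI_true1_kSpace.txt --kmax 2
args = parse_args(['EI_true1.seq', 'EI_true1_kSpace.txt', '--kmax', '2'])
default_args = parse_args(['in.seq', 'out.txt'])

print(args.infile, args.outfile, args.kmax, default_args.kmax)
```

In a real script, calling `parse_args()` with no argument would read the actual command line, and `-h` would print a generated help message.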
105 | 
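106 | ## Homework
107 | Write your own sequence encoding program. Don't be afraid of errors: the more mistakes you make, the faster you improve!
108 | 
109 | ## References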
110 | [1] Staden R. Computer methods to locate signals in nucleic acid sequences [J]. Nucleic Acids Research. 1984, 12(2):505. <br>
111 | [2] Zhang M Q, Marr T G. A weight array method for splicing signal analysis [J]. Computer applications in the biosciences: CABIOS, 1993, 9(5):499-509. <br>
112 | [3] Pertea M, Lin X Y, Salzberg S L. GeneSplicer: a new computational method for splice site prediction [J]. Nucleic Acids Research. 2001, 29:1185-1190. <br>
113 | [4] Reese M G, Eeckman F H, Kulp D, et al. Improved splice site detection in Genie [J]. Journal of Computational Biology, 1997, 4(3):311-323. <br>
114 | [5] Rogozin I B, Milanesi L. Analysis of donor splice signals in different organisms [J]. Journal of Molecular Evolution, 1997, 45(1):50-59. <br>
115 | [6] Degroeve S, Saeys Y, Baets B D, et al. SpliceMachine: predicting splice sites from high-dimensional local context representations [J]. Bioinformatics. 2005,21:1332-1338. <br>
116 | [7] Meher P K, Sahu T K, Rao A R. Prediction of donor splice sites using random forest with a new sequence encoding approach [J]. Biodata Mining. 2016,9:4. <br>
117 | [8] Pollastro P, Rampone S. HS3D, a dataset of Homo sapiens splice regions and its extraction procedure from a major public database [J]. International Journal of Modern Physics C, 2002, 13(8):1105–1117. <br>
118 | [9] Chen Y Z, Tang Y R, Sheng Z Y, et al. Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs [J]. BMC Bioinformatics, 2008, 9(1):101-112.
119 | 
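120 | ## Acknowledgments
121 | Part of the splice site research background is taken from the Hunan Agricultural University doctoral dissertation 《基于卡方决策表的分子序列信号位点预测》 (2019).<br>
122 | Thanks to Dr. 曾莹 for providing her dissertation!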
123 | 


--------------------------------------------------------------------------------
/Lab2_SplicingSequencesCoding/splice_signal1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/a2a6f7feffb2b919a1c813fe4fef0c4f726aef92/Lab2_SplicingSequencesCoding/splice_signal1.jpg


--------------------------------------------------------------------------------
/Lab3_Classifiers_KNN-LR-DT/EISplicing_DecisionTreeGraph.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/a2a6f7feffb2b919a1c813fe4fef0c4f726aef92/Lab3_Classifiers_KNN-LR-DT/EISplicing_DecisionTreeGraph.pdf


--------------------------------------------------------------------------------
/Lab3_Classifiers_KNN-LR-DT/classifiers1.md:
--------------------------------------------------------------------------------
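  1 | # Lab 3: Classifiers: KNN, Logistic Regression, Decision Tree
  2 | 
  3 | ## Objectives
  4 | * 1) Extract k-spaced composition features from the true and false donor site sequences; build a training set and a test set
  5 | * 2) Recognize splice sites with K-Nearest Neighbor (KNN)
  6 | * 3) Recognize splice sites with Logistic Regression (LR)
  7 | * 4) Recognize splice sites with a Decision Tree (DT)
  8 | 
  9 | ## Prepare the working directory
 10 | ```
 11 | $ mkdir lab_03
 12 | $ cd lab_03
 13 | # create symbolic links to the true and false donor site sequence files in lab_02
 14 | $ ln -s ../lab_02/EI_true.seq ../lab_02/EI_false.seq ./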
 15 | 
 16 | # if python3 is unavailable on the cluster, activate the base environment first
 17 | $ source /opt/miniconda3/bin/activate
 18 | $ conda activate
 19 | ```
 20 | 
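 21 | ## 1. Building the training and test sets
 22 | * 1) Write a more convenient k-spaced base-pair composition program (for encoding the true/false donor site sequences of HS3D) <br>
 23 | Reference program: kSpaceCoding_general.py. It avoids editing file names and other parameters inside the program for every run.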
 24 | ```python
 25 | import numpy as np # import numpy under the alias np
 26 | import sys # import sys to receive command-line arguments
 27 | 
 28 | def file2matrix(filename, bpTable, KMAX=2): # KMAX now has a default value (updated)
 29 |     fr = open(filename) # open the file
 30 |     arrayOLines = fr.readlines() # read all lines
 31 |     del(arrayOLines[:4]) # drop the 4 header lines (updated: no need to run sed beforehand)
 32 |     fr.close() # close the file promptly
 33 |     
 34 |     numberOfLines = len(arrayOLines) # number of remaining lines
 35 |     returnMat = np.zeros((numberOfLines, 16*(KMAX+1))) # allocate the result matrix
 36 |     
 37 |     lineNum = 0
 38 |     for line in arrayOLines:
 39 |         line = line.strip() # strip whitespace, including the trailing newline
 40 |         listFromLine = line.split(': ') # split on ': '
 41 |         nt_seq = list(listFromLine[1]) # take the nucleotide sequence and convert it to a list
 42 |         del(nt_seq[70:72]) # delete the donor site at positions 71 and 72
 43 |         
 44 |         kSpaceVec = []
 45 |         for k in range(KMAX+1): # compute the kSpace features for each k
 46 |             bpFreq = bpTable.copy() # bpTable is a dict; copy it with dict.copy(), because Python passes arguments by object reference and counting in place would change the caller's table
 47 | 
 48 |             for m in range(len(nt_seq)-k-1): # scan the sequence and count each base pair
 49 |                 sub_str = nt_seq[m]+nt_seq[m+1+k] # extract the two-base substring (updated)
 50 |                 if sub_str in bpFreq: # count only substrings that are keys of bpFreq (updated; NOTE: the false donor sequences contain non-standard bases)
 51 |                     bpFreq[sub_str] += 1 # the substring itself is the dict key
 52 |             bpFreqVal = list(bpFreq.values()) # take the counts from bpFreq as a list (in key insertion order)
 53 |             kSpaceVec.extend(np.array(bpFreqVal)/(len(nt_seq)-k-1)) # for each k, divide the counts by the number of pairs scanned
 54 | 
 55 |         returnMat[lineNum,:] = kSpaceVec
 56 |         lineNum += 1
 57 |         if (lineNum % 1000) == 0:
 58 |             print('Extracting k-spaced features: %d sequences, done!' % lineNum)
 59 |     return returnMat, lineNum
 60 | 
 61 | if __name__ == '__main__':
 62 |     filename = sys.argv[1]
 63 |     outputFileName = sys.argv[2]
 64 |     KMAX = int(sys.argv[3])
 65 |     bpTable = {}
 66 |     for m in ('A','T','C','G'):
 67 |         for n in ('A','T','C','G'):
 68 |             bpTable[m+n] = 0
 69 |     
 70 |     kSpaceMat, SeqNum = file2matrix(filename, bpTable, KMAX)
 71 |     np.savetxt(outputFileName, kSpaceMat, fmt='%g', delimiter=',')
 72 |     print('The number of sequences is %d. Matrix of features is saved in %s' % (SeqNum, outputFileName))
 73 | ```
 74 | 
 75 | ```bash
 76 | # encode the true donor sites: the command line gives the sequence file 'EI_true.seq', the output file 'EI_true_kSpace.txt', and KMAX = 4
 77 | $ python3 kSpaceCoding_general.py EI_true.seq EI_true_kSpace.txt 4
 78 | # encode the false donor sites: the command line gives the sequence file 'EI_false.seq', the output file 'EI_false_kSpace.txt', and KMAX = 4
 79 | $ python3 kSpaceCoding_general.py EI_false.seq EI_false_kSpace.txt 4
 80 | ```
 81 | 
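 82 | * 2) Build the training and test sets from the encoded feature files <br>
 83 | ```bash
 84 | # first install the machine learning package scikit-learn (imported as sklearn)
 85 | $ pip3 install --user scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple
 86 | ```
 87 | 
 88 | Reference program: getTrainTest.py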
 89 | ```python
 90 | import numpy as np
 91 | import sys
 92 | from random import sample # sample is used to draw random samples from the false-site data
 93 | from sklearn.model_selection import train_test_split # used to produce the training and test sets
 94 | 
 95 | trueSiteFileName = sys.argv[1]
 96 | falseSiteFileName = sys.argv[2]
 97 | trueSitesData = np.loadtxt(trueSiteFileName, delimiter = ',') # load the true-site data
 98 | numOfTrue = len(trueSitesData)
 99 | falseSitesData = np.loadtxt(falseSiteFileName, delimiter = ',') # load the false-site data
100 | numOfFalse = len(falseSitesData)
101 | randVec = sample(range(numOfFalse), numOfTrue) # random row indices, as many as there are true sites
102 | falseSitesData = falseSitesData[randVec,] # draw those rows from the false-site data
103 | 
104 | Data = np.vstack((trueSitesData, falseSitesData)) # stack the true-site and false-site data by rows
105 | Y = np.vstack((np.ones((numOfTrue,1)),np.zeros((numOfTrue,1)))) # label column: 1 for true sites, 0 for false sites
106 | testSize = 0.3 # 30% test set, 70% training set
107 | X_train, X_test, y_train, y_test = train_test_split(Data, Y, test_size = testSize, random_state = 0)
108 | 
109 | trainingSetFileName = sys.argv[3]
110 | testSetFileName = sys.argv[4]
111 | np.savetxt(trainingSetFileName, np.hstack((y_train, X_train)), fmt='%g', delimiter=',') # put Y in front of X column-wise and save to file
112 | np.savetxt(testSetFileName, np.hstack((y_test, X_test)), fmt='%g', delimiter=',')
113 | print('Generate training set(%d%%) and test set(%d%%): Done!' % ((1-testSize)*100, testSize*100))
114 | ```
115 | 
116 | ```bash
117 | # build the training and test sets: the command line gives the true-site data, the false-site data, the train file, and the test file
118 | $ python3 getTrainTest.py EI_true_kSpace.txt EI_false_kSpace.txt EI_train.txt EI_test.txt
119 | ```
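To make the role of `test_size` and `random_state` concrete, the sketch below shuffles sample indices with a fixed seed and cuts off a test fraction. It is a simplified stand-in written with the standard library only, not scikit-learn's actual algorithm; the function `split_indices` and its rounding rule are assumptions made for this example.

```python
import random

def split_indices(n_samples, test_size, seed=0):
    """Return (train_idx, test_idx) after a reproducible shuffle."""
    rng = random.Random(seed)           # a fixed seed gives the same split every run
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = int(round(n_samples * test_size))
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = split_indices(10, 0.3)
again = split_indices(10, 0.3)          # same seed, same split

print(len(train_idx), len(test_idx), again == (train_idx, test_idx))
```

Fixing the seed matters here because every classifier in this lab should be compared on the same training and test samples.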
120 | 
121 | ## 2. Splice site recognition with KNN
122 | Reference program: myKNN.py
123 | ```python
124 | import numpy as np
125 | from sklearn import neighbors # import the KNN module
126 | import sys
127 | 
128 | train = np.loadtxt(sys.argv[1], delimiter=',') # load the training set; the file name comes from the command line
129 | test = np.loadtxt(sys.argv[2], delimiter=',') # load the test set
130 | 
131 | n_neighbors = int(sys.argv[3]) # number of neighbors, given on the command line
132 | weights = 'uniform' # every neighbor has the same weight
133 | clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights) # create a KNN instance
134 | trX = train[:,1:]
135 | trY = train[:,0]
136 | clf.fit(trX, trY) # train the model
137 | 
138 | teX = test[:,1:]
139 | teY = test[:,0]
140 | predY = clf.predict(teX) # predict the test set
141 | Acc = sum(predY==teY)/len(teY) # accuracy: fraction of test samples predicted correctly
142 | print('Prediction Accuracy of KNN: %g%% (%d/%d)' % (Acc*100, sum(predY==teY), len(teY)))
143 | ```
144 | 
145 | ```bash
146 | # KNN classifier: the command line gives the training set, the test set, and the number of neighbors K
147 | $ python3 myKNN.py EI_train.txt EI_test.txt 10
148 | ```
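What `KNeighborsClassifier` does with uniform weights can be sketched in a few lines of plain Python: measure the distance from the query point to every training point, keep the K nearest, and let them vote. This is a teaching sketch under simplifying assumptions (Euclidean distance, brute-force search, toy 2-D data invented here), not scikit-learn's implementation. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to x with its label, nearest first.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
pred_b = knn_predict(train_X, train_y, (0.95, 1.0), k=3)
print(pred_a, pred_b)
```

An odd K avoids tied votes in a two-class problem such as true versus false splice sites.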
149 | 
150 | ## 3. Splice site recognition with Logistic Regression
151 | Reference program: myLR.py
152 | ```python
153 | import numpy as np
154 | from sklearn import linear_model # import the linear models module
155 | import sys
156 | 
157 | train = np.loadtxt(sys.argv[1], delimiter=',') # load the training set
158 | test = np.loadtxt(sys.argv[2], delimiter=',') # load the test set
159 | 
160 | maxIterations = int(sys.argv[3]) # maximum number of iterations, given on the command line
161 | clf = linear_model.LogisticRegression(max_iter=maxIterations) # create an LR instance
162 | trX = train[:,1:]
163 | trY = train[:,0]
164 | clf.fit(trX, trY) # train the model
165 | 
166 | teX = test[:,1:]
167 | teY = test[:,0]
168 | predY = clf.predict(teX) # predict the test set
169 | Acc = sum(predY==teY)/len(teY) # accuracy: fraction of test samples predicted correctly
170 | print('Prediction Accuracy of LR: %g%% (%d/%d)' % (Acc*100, sum(predY==teY), len(teY)))
171 | ```
172 | 
173 | ```bash
174 | # LR classifier: the command line gives the training set, the test set, and the maximum number of iterations
175 | $ python3 myLR.py EI_train.txt EI_test.txt 1000
176 | ```
177 | 
178 | ## 4. Splice site recognition with a Decision Tree
179 | Reference program: myDT.py
180 | ```python
181 | import numpy as np
182 | from sklearn import tree # import the decision tree module
183 | import sys
184 | import graphviz # import the Graphviz package
185 | 
186 | train = np.loadtxt(sys.argv[1], delimiter=',') # load the training set
187 | test = np.loadtxt(sys.argv[2], delimiter=',') # load the test set
188 | 
189 | clf = tree.DecisionTreeClassifier() # create a DT instance
190 | trX = train[:,1:]
191 | trY = train[:,0]
192 | clf.fit(trX, trY) # train the model
193 | 
194 | teX = test[:,1:]
195 | teY = test[:,0]
196 | predY = clf.predict(teX) # predict the test set
197 | Acc = sum(predY==teY)/len(teY) # accuracy: fraction of test samples predicted correctly
198 | print('Prediction Accuracy of DT: %g%% (%d/%d)' % (Acc*100, sum(predY==teY), len(teY)))
199 | 
200 | # Export the tree in Graphviz format
201 | graphFileName = sys.argv[3] # name of the graph file, given on the command line
202 | dotData = tree.export_graphviz(clf, out_file=None)
203 | graph = graphviz.Source(dotData)
204 | graph.render(graphFileName)
205 | print('The tree in Graphviz format is saved in "%s.pdf".' % graphFileName)
206 | ```
207 | 
208 | ```bash
209 | # install the graphviz Python package (rendering also needs the Graphviz `dot` program on the system)
210 | $ pip3 install --user graphviz -i https://pypi.tuna.tsinghua.edu.cn/simple
211 | # DT classifier: the command line gives the training set, the test set, and the name of the tree graph file
212 | $ python3 myDT.py EI_train.txt EI_test.txt EISplicing_DecisionTreeGraph
213 | ```
214 | [The resulting decision tree](https://github.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/blob/master/Lab3_Classifiers_KNN-LR-DT/EISplicing_DecisionTreeGraph.pdf)
215 | 
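216 | ## Homework
217 | 1. Try to understand every line of the `reference programs`.
218 | 2. In kSpaceCoding_general.py, positions 71 and 72 of the conserved donor dinucleotide GT are hard-coded; rewrite the program so that the `position information` is passed on the command line.
219 | 3. In getTrainTest.py, the test-set fraction testSize is hard-coded; rewrite the program so that the `split ratio` is passed on the command line.
220 | 4. Become fluent with the different classifiers in sklearn. <br>
221 | Don't be afraid of errors: the more mistakes you make, the faster you improve!
222 | 
223 | ## References
224 | * KNN user guide: [sklearn.neighbors.KNeighborsClassifier](https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification)
225 | * LR user guide: [sklearn.linear_model.LogisticRegression](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)
226 | * DT user guide: [sklearn.tree.DecisionTreeClassifier](https://scikit-learn.org/stable/modules/tree.html#classification)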
227 | 


--------------------------------------------------------------------------------
/Lab4_Classifiers_Bayes-SVM/classifiers2.md:
--------------------------------------------------------------------------------
  1 | # Lab 4: Classifiers: NB, SVC
  2 | 
  3 | ## Objectives
  4 | * 1) Recognize splice sites with Naive Bayes (NB)
  5 | * 2) Recognize splice sites with Support Vector Classification (SVC)
  6 | * 3) Understand the cross-validation procedure.
  7 | 
  8 | ## Prepare the working directory
  9 | ```
 10 | $ mkdir lab_04
 11 | $ cd lab_04
 12 | # create symbolic links to the donor site training and test files in lab_03
 13 | $ ln -s ../lab_03/EI_train.txt ../lab_03/EI_test.txt ./
 14 | 
 15 | # if python3 is unavailable on the cluster, activate the base environment first
 16 | $ source /opt/miniconda3/bin/activate
 17 | $ conda activate
 18 | ```
 19 | 
 20 | ## 1. Splice site recognition with NB
 21 | Reference program: myNB.py
 22 | ```python
 23 | import numpy as np
 24 | from sklearn import naive_bayes # import the NB module
 25 | import sys
 26 | 
 27 | train = np.loadtxt(sys.argv[1], delimiter=',') # load the training set; the file name comes from the command line
 28 | test = np.loadtxt(sys.argv[2], delimiter=',') # load the test set
 29 | 
 30 | clf = naive_bayes.GaussianNB() # create an NB instance
 31 | trX = train[:,1:]
 32 | trY = train[:,0]
 33 | clf.fit(trX, trY) # train the model
 34 | 
 35 | teX = test[:,1:]
 36 | teY = test[:,0]
 37 | predY = clf.predict(teX) # predict the test set
 38 | Acc = sum(predY==teY)/len(teY) # accuracy: fraction of test samples predicted correctly
 39 | print('Prediction Accuracy of NB: %g%%(%d/%d)' % (Acc*100, sum(predY==teY), len(teY)))
 40 | ```
 41 | 
 42 | ```bash
 43 | # NB classifier: the command line gives the training set and the test set
 44 | $ python3 myNB.py EI_train.txt EI_test.txt
 45 | ```
 46 | 
 47 | ## 2. 以SVC进行剪接位点识别
 48 | [不同核函数的SVC](https://scikit-learn.org/stable/_images/sphx_glr_plot_iris_svc_0011.png) <br>
 49 | 参考程序：mySVC.py
 50 | ```python3
 51 | import numpy as np
 52 | from sklearn import svm # 导入svm包
 53 | import sys
 54 | from sklearn import preprocessing # 导入数据预处理包
 55 | from sklearn.model_selection import GridSearchCV # 导入参数寻优包
 56 | from random import sample
 57 | 
 58 | train = np.loadtxt(sys.argv[1], delimiter=',') # 载入训练集
 59 | test = np.loadtxt(sys.argv[2], delimiter=',') # 载入测试集
 60 | 
 61 | train = train[sample(range(len(train)), 200),] # 考虑到SVM运行时间较长，从train中随机抽200样本用于后续建模
 62 | trX = train[:,1:]
 63 | trY = train[:,0]
 64 | teX = test[:,1:]
 65 | teY = test[:,0]
 66 | 
 67 | isScale = int(sys.argv[3]) # 建模前，是否将每个特征归一化到[-1,1]
 68 | kernelFunction = sys.argv[4] # {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’
 69 | isChooseCG = int(sys.argv[5]) # 是否寻找最优参数：c, g
 70 | 
 71 | if isScale:
 72 |     min_max_scaler = preprocessing.MinMaxScaler(feature_range=(-1,1))
 73 |     trX = min_max_scaler.fit_transform(trX)
 74 |     teX = min_max_scaler.transform(teX)
 75 | 
 76 | if isChooseCG:
 77 |     numOfFolds = int(sys.argv[6]) # number of cross-validation folds for the search, given on the command line
 78 |     C_range = np.power(2, np.arange(-5,15,2.0)) # candidate values of C: 2^-5, 2^-3, ..., 2^13
 79 |     gamma_range = np.power(2, np.arange(3,-15,-2.0)) # candidate values of gamma: 2^3, 2^1, ..., 2^-13
 80 |     parameters = dict(gamma=gamma_range, C=C_range) # put C and gamma in a dict for the grid search
 81 |     
 82 |     clf = svm.SVC(kernel=kernelFunction) # create an SVC instance
 83 |     grid = GridSearchCV(clf, param_grid=parameters, cv=numOfFolds) # create a GridSearchCV instance
 84 |     grid.fit(trX, trY) # grid search over C and gamma
 85 |     print("The best parameters are %s with a score of %g" % (grid.best_params_, grid.best_score_))
 86 |     clf = svm.SVC(kernel=kernelFunction, C=grid.best_params_['C'], gamma=grid.best_params_['gamma'])
 87 | else:
 88 |     clf = svm.SVC(kernel=kernelFunction)
 89 |     
 90 | clf.fit(trX, trY) # train the model
 91 | predY = clf.predict(teX) # predict the test set
 92 | Acc = sum(predY==teY)/len(teY) # accuracy: fraction of test samples predicted correctly
 93 | print('Prediction Accuracy of SVC: %g%%(%d/%d)' % (Acc*100, sum(predY==teY), len(teY)))
 94 | ```
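The scaling step in mySVC.py fits the scaler on the training set and only applies it to the test set. The standard-library sketch below illustrates that idea under stated assumptions: `fit_min_max`, `transform_min_max`, and the toy numbers are hypothetical, and this is not how `MinMaxScaler` is implemented internally.

```python
def fit_min_max(rows, low=-1.0, high=1.0):
    """Record each column's min and max from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns], (low, high)


def transform_min_max(rows, scaler):
    """Rescale rows using the min/max stored from the training rows."""
    ranges, (low, high) = scaler
    scaled = []
    for row in rows:
        new_row = []
        for value, (cmin, cmax) in zip(row, ranges):
            span = cmax - cmin
            # A constant column has no spread; this sketch maps it to `low`.
            ratio = (value - cmin) / span if span else 0.0
            new_row.append(low + ratio * (high - low))
        scaled.append(new_row)
    return scaled


# Hypothetical toy data with two features.
train_rows = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test_rows = [[5.0, 40.0]]

scaler = fit_min_max(train_rows)          # statistics come from train only
train_scaled = transform_min_max(train_rows, scaler)
test_scaled = transform_min_max(test_rows, scaler)
print(train_scaled)
print(test_scaled)
```

Because the test set reuses the training minimum and maximum, a test value outside the training range (40 in the second feature here) ends up outside [-1, 1]. That is expected and avoids leaking test-set information into the model.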
 95 | 
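To get a feel for how much work the parameter search in mySVC.py does, the following sketch rebuilds the same C and gamma grids with the standard library only and counts the combinations. The variable names are assumptions; the exponent ranges are copied from the `np.arange` calls in the reference program.

```python
from itertools import product

# Same exponents as np.arange(-5, 15, 2.0) and np.arange(3, -15, -2.0).
c_exponents = range(-5, 15, 2)       # -5, -3, ..., 13
gamma_exponents = range(3, -15, -2)  # 3, 1, ..., -13

c_values = [2.0 ** e for e in c_exponents]
gamma_values = [2.0 ** e for e in gamma_exponents]

# Every (C, gamma) pair that the grid search would evaluate.
grid = list(product(c_values, gamma_values))

num_folds = 10  # the fold count used in work_mySVC.sh
cv_fits = len(grid) * num_folds

print(len(c_values), len(gamma_values), len(grid))
print("model fits during cross-validation:", cv_fits)
```

Each grid point is trained once per fold, which is one reason the reference program subsamples the training set before searching.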
 96 | * SVM training takes a long time, so write the commands into a script and submit it as a job with qsub. <br>
 97 | work_mySVC.sh
 98 | ```bash
 99 | #!/bin/bash
100 | #$ -S /bin/bash
101 | #$ -N mySVC
102 | #$ -j y
103 | #$ -cwd
104 | 
105 | # Activate the base environment so that python3 runs correctly on the compute nodes
106 | source /opt/miniconda3/bin/activate
107 | conda activate
108 | 
109 | # SVC classifier: training and test sets given on the command line; scaling, linear kernel, parameter search, 10-fold cross-validation
110 | echo '------ scale: 1; kernel: linear; chooseCG: 1; numOfCV: 10 --------'
111 | python3 mySVC.py EI_train.txt EI_test.txt 1 linear 1 10
112 | echo
113 | 
114 | # Scaling, linear kernel, no parameter search
115 | echo '------ scale: 1; kernel: linear; chooseCG: 0 --------'
116 | python3 mySVC.py EI_train.txt EI_test.txt 1 linear 0
117 | echo
118 | 
119 | # Scaling, radial basis function (rbf) kernel, parameter search, 10-fold cross-validation
120 | echo '------ scale: 1; kernel: rbf; chooseCG: 1; numOfCV: 10 --------'
121 | python3 mySVC.py EI_train.txt EI_test.txt 1 rbf 1 10
122 | echo
123 | 
124 | # Scaling, radial basis function (rbf) kernel, no parameter search
125 | echo '------ scale: 1; kernel: rbf; chooseCG: 0 --------'
126 | python3 mySVC.py EI_train.txt EI_test.txt 1 rbf 0
127 | echo
128 | ```
129 | ```
130 | # Submit the job with qsub
131 | $ qsub work_mySVC.sh
132 | ```
133 | 
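134 | * Try more option combinations and see how the accuracy changes, e.g. the prediction accuracy without scaling, with different kernel functions, with and without parameter search, and with different numbers of cross-validation folds.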
135 | 
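One way to explore the option combinations systematically is to generate the command lines instead of typing them by hand. The sketch below is only an illustration: `build_commands` and the chosen fold counts are assumptions, the argument order follows mySVC.py (scale flag, kernel, search flag, optional fold count), and the `precomputed` kernel is left out because it expects a kernel matrix rather than raw features.

```python
from itertools import product

SCRIPT = "mySVC.py"
TRAIN, TEST = "EI_train.txt", "EI_test.txt"


def build_commands(scales=(0, 1),
                   kernels=("linear", "poly", "rbf", "sigmoid"),
                   fold_counts=(5, 10)):
    """Return one mySVC.py command line per option combination."""
    commands = []
    for scale, kernel in product(scales, kernels):
        base = f"python3 {SCRIPT} {TRAIN} {TEST} {scale} {kernel}"
        commands.append(f"{base} 0")  # no parameter search
        for folds in fold_counts:
            commands.append(f"{base} 1 {folds}")  # grid search with `folds` folds
    return commands


commands = build_commands()
for command in commands:
    print(command)
```

The printed lines can be pasted into a job script such as work_mySVC.sh, one `python3` call per line.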
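136 | ## Homework
137 | 1. Try to understand every line of the `reference programs`.  
138 | 2. Become proficient with the NB and SVC classifiers in the sklearn package.  
139 | 3. Complete the NB and SVC classification of acceptor splice sites.  
140 | Don't be afraid of error messages: the more mistakes you make, the faster you improve!  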
141 | 
142 | ## References
143 | * NB manual: [sklearn.naive_bayes.GaussianNB](https://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes)
144 | * SVM manual: [sklearn.svm.SVC](https://scikit-learn.org/stable/modules/svm.html#classification)
145 | 


--------------------------------------------------------------------------------
/Lab5_PeptideSequencesCoding/AA531properties.txt:
--------------------------------------------------------------------------------
 1 | AA	ANDN920101	ARGP820101	ARGP820102	ARGP820103	BEGF750101	BEGF750102	BEGF750103	BHAR880101	BIGC670101	BIOV880101	BIOV880102	BROC820101	BROC820102	BULH740101	BULH740102	BUNA790101	BUNA790102	BUNA790103	BURA740101	BURA740102	CHAM810101	CHAM820101	CHAM820102	CHAM830101	CHAM830102	CHAM830103	CHAM830104	CHAM830105	CHAM830106	CHAM830107	CHAM830108	CHOC750101	CHOC760101	CHOC760102	CHOC760103	CHOC760104	CHOP780101	CHOP780201	CHOP780202	CHOP780203	CHOP780204	CHOP780205	CHOP780206	CHOP780207	CHOP780208	CHOP780209	CHOP780210	CHOP780211	CHOP780212	CHOP780213	CHOP780214	CHOP780215	CHOP780216	CIDH920101	CIDH920102	CIDH920103	CIDH920104	CIDH920105	COHE430101	CRAJ730101	CRAJ730102	CRAJ730103	DAWD720101	DAYM780101	DAYM780201	DESM900101	DESM900102	EISD840101	EISD860101	EISD860102	EISD860103	FASG760101	FASG760102	FASG760103	FASG760104	FASG760105	FAUJ830101	FAUJ880101	FAUJ880102	FAUJ880103	FAUJ880104	FAUJ880105	FAUJ880106	FAUJ880107	FAUJ880108	FAUJ880109	FAUJ880110	FAUJ880111	FAUJ880112	FAUJ880113	FINA770101	FINA910101	FINA910102	FINA910103	FINA910104	GARJ730101	GEIM800101	GEIM800102	GEIM800103	GEIM800104	GEIM800105	GEIM800106	GEIM800107	GEIM800108	GEIM800109	GEIM800110	GEIM800111	GOLD730101	GOLD730102	GRAR740101	GRAR740102	GRAR740103	GUYH850101	HOPA770101	HOPT810101	HUTJ700101	HUTJ700102	HUTJ700103	ISOY800101	ISOY800102	ISOY800103	ISOY800104	ISOY800105	ISOY800106	ISOY800107	ISOY800108	JANJ780101	JANJ780102	JANJ780103	JANJ790101	JANJ790102	JOND750101	JOND750102	JOND920101	JOND920102	JUKT750101	JUNJ780101	KANM800101	KANM800102	KANM800103	KANM800104	KARP850101	KARP850102	KARP850103	KHAG800101	KLEP840101	KRIW710101	KRIW790101	KRIW790102	KRIW790103	KYTJ820101	LAWE840101	LEVM760101	LEVM760102	LEVM760103	LEVM760104	LEVM760105	LEVM760106	LEVM760107	LEVM780101	LEVM780102	LEVM780103	LEVM780104	LEVM780105	LEVM780106	LEWP710101	LIFS790101	LIFS790102	LIFS790103	MANP780101	MAXF760101	MAXF760102	MAXF760103	MAXF760104	MAXF760105	MAXF760106	MCMT640101	MEEJ800101	MEEJ800102	MEEJ810101	MEEJ810102	
MEIH800101	MEIH800102	MEIH800103	MIYS850101	NAGK730101	NAGK730102	NAGK730103	NAKH900101	NAKH900102	NAKH900103	NAKH900104	NAKH900105	NAKH900106	NAKH900107	NAKH900108	NAKH900109	NAKH900110	NAKH900111	NAKH900112	NAKH900113	NAKH920101	NAKH920102	NAKH920103	NAKH920104	NAKH920105	NAKH920106	NAKH920107	NAKH920108	NISK800101	NISK860101	NOZY710101	OOBM770101	OOBM770102	OOBM770103	OOBM770104	OOBM770105	OOBM850101	OOBM850102	OOBM850103	OOBM850104	OOBM850105	PALJ810101	PALJ810102	PALJ810103	PALJ810104	PALJ810105	PALJ810106	PALJ810107	PALJ810108	PALJ810109	PALJ810110	PALJ810111	PALJ810112	PALJ810113	PALJ810114	PALJ810115	PALJ810116	PARJ860101	PLIV810101	PONP800101	PONP800102	PONP800103	PONP800104	PONP800105	PONP800106	PONP800107	PONP800108	PRAM820101	PRAM820102	PRAM820103	PRAM900101	PRAM900102	PRAM900103	PRAM900104	PTIO830101	PTIO830102	QIAN880101	QIAN880102	QIAN880103	QIAN880104	QIAN880105	QIAN880106	QIAN880107	QIAN880108	QIAN880109	QIAN880110	QIAN880111	QIAN880112	QIAN880113	QIAN880114	QIAN880115	QIAN880116	QIAN880117	QIAN880118	QIAN880119	QIAN880120	QIAN880121	QIAN880122	QIAN880123	QIAN880124	QIAN880125	QIAN880126	QIAN880127	QIAN880128	QIAN880129	QIAN880130	QIAN880131	QIAN880132	QIAN880133	QIAN880134	QIAN880135	QIAN880136	QIAN880137	QIAN880138	QIAN880139	RACS770101	RACS770102	RACS770103	RACS820101	RACS820102	RACS820103	RACS820104	RACS820105	RACS820106	RACS820107	RACS820108	RACS820109	RACS820110	RACS820111	RACS820112	RACS820113	RACS820114	RADA880101	RADA880102	RADA880103	RADA880104	RADA880105	RADA880106	RADA880107	RADA880108	RICJ880101	RICJ880102	RICJ880103	RICJ880104	RICJ880105	RICJ880106	RICJ880107	RICJ880108	RICJ880109	RICJ880110	RICJ880111	RICJ880112	RICJ880113	RICJ880114	RICJ880115	RICJ880116	RICJ880117	ROBB760101	ROBB760102	ROBB760103	ROBB760104	ROBB760105	ROBB760106	ROBB760107	ROBB760108	ROBB760109	ROBB760110	ROBB760111	ROBB760112	ROBB760113	ROBB790101	ROSG850101	ROSG850102	ROSM880101	ROSM880102	ROSM880103	SIMZ760101	SNEP660101	SNEP660102	SNEP660103	SNEP660104	
SUEM840101	SUEM840102	SWER830101	TANS770101	TANS770102	TANS770103	TANS770104	TANS770105	TANS770106	TANS770107	TANS770108	TANS770109	TANS770110	VASM830101	VASM830102	VASM830103	VELV850101	VENT840101	VHEG790101	WARP780101	WEBA780101	WERD780101	WERD780102	WERD780103	WERD780104	WOEC730101	WOLR810101	WOLS870101	WOLS870102	WOLS870103	YUTK870101	YUTK870102	YUTK870103	YUTK870104	ZASB820101	ZIMJ680101	ZIMJ680102	ZIMJ680103	ZIMJ680104	ZIMJ680105	AURR980101	AURR980102	AURR980103	AURR980104	AURR980105	AURR980106	AURR980107	AURR980108	AURR980109	AURR980110	AURR980111	AURR980112	AURR980113	AURR980114	AURR980115	AURR980116	AURR980117	AURR980118	AURR980119	AURR980120	ONEK900101	ONEK900102	VINM940101	VINM940102	VINM940103	VINM940104	MUNV940101	MUNV940102	MUNV940103	MUNV940104	MUNV940105	WIMW960101	KIMC930101	MONM990101	BLAM930101	PARS000101	PARS000102	KUMS000101	KUMS000102	KUMS000103	KUMS000104	TAKK010101	FODM020101	NADH010101	NADH010102	NADH010103	NADH010104	NADH010105	NADH010106	NADH010107	MONM990201	KOEP990101	KOEP990102	CEDJ970101	CEDJ970102	CEDJ970103	CEDJ970104	CEDJ970105	FUKS010101	FUKS010102	FUKS010103	FUKS010104	FUKS010105	FUKS010106	FUKS010107	FUKS010108	FUKS010109	FUKS010110	FUKS010111	FUKS010112	MITS020101	TSAJ990101	TSAJ990102	COSI940101	PONP930101	WILM950101	WILM950102	WILM950103	WILM950104	KUHL950101	GUOD860101	JURD980101	BASU050101	BASU050102	BASU050103	SUYM030101	PUNT030101	PUNT030102	GEOR030101	GEOR030102	GEOR030103	GEOR030104	GEOR030105	GEOR030106	GEOR030107	GEOR030108	GEOR030109	ZHOH040101	ZHOH040102	ZHOH040103	BAEK050101	HARY940101	PONJ960101	DIGM050101	WOLR790101	OLSK800101	KIDA850101	GUYH850102	GUYH850104	GUYH850105	JACR890101	COWR900101	BLAS910101	CASG920101	CORJ870101	CORJ870102	CORJ870103	CORJ870104	CORJ870105	CORJ870106	CORJ870107	CORJ870108	MIYS990101	MIYS990102	MIYS990103	MIYS990104	MIYS990105	ENGD860101	FASG890101
 2 | A	4.35	0.61	1.18	1.56	1	0.77	0.37	0.357	52.6	16	44	7.3	3.9	-0.2	0.691	8.249	4.349	6.5	0.486	0.288	0.52	0.046	-0.368	0.71	-0.118	0	0	0	0	0	0	91.5	115	25	0.38	0.2	0.66	1.42	0.83	0.74	1.29	1.2	0.7	0.52	0.86	0.75	0.67	0.74	0.06	0.076	0.035	0.058	0.64	-0.45	-0.08	0.36	0.17	0.02	0.75	1.33	1	0.6	2.5	8.6	100	1.56	1.26	0.25	0.67	0	0	89.09	297	1.8	9.69	2.34	0.31	1.28	0.53	1	2.87	1.52	2.04	7.3	-0.01	0	0	0	0	4.76	1.08	1	1	1.2	1	0.28	1.29	1.13	1.55	1.19	0.84	0.86	0.91	0.91	0.8	1.1	0.93	0.75	88.3	0	8.1	31	0.1	1	-0.5	29.22	30.88	154.33	1.53	0.86	0.78	1.09	0.35	1.09	1.34	0.47	27.8	51	15	1.7	0.3	0.87	2.34	0.077	100	5.3	685	1.36	0.81	1.45	0.75	1.041	0.946	0.892	49.1	0	4.6	4.32	0.28	27.5	1.8	-0.48	-0.5	0.77	121.9	243.2	0.77	5.2	0.025	1.29	0.9	0.77	1.32	0.86	0.79	0.22	0.92	1	0.9	12.97	1.43	0.86	0.64	0.17	1.13	1	4.34	0.5	-0.1	1.1	1	0.93	0.94	87	2.36	1.29	0.96	0.72	7.99	3.73	5.74	-0.6	5.88	-0.57	5.39	-0.7	9.25	0.34	10.17	6.61	1.61	8.63	10.88	5.15	5.04	9.9	6.69	5.08	9.36	0.23	-0.22	0.5	-1.895	-1.404	-0.491	-9.475	-7.02	2.01	1.34	0.46	-2.49	4.55	1.3	1.32	0.81	0.9	0.84	0.65	1.08	1.34	1.15	0.89	0.82	0.98	0.69	0.87	0.91	0.92	2.1	-2.89	12.28	7.62	2.63	13.65	14.6	10.67	3.7	6.05	0.305	0.175	0.687	-6.7	1.29	0.9	0.78	1.1	1	0.12	0.26	0.64	0.29	0.68	0.34	0.57	0.33	0.13	0.31	0.21	0.18	-0.08	-0.18	-0.01	-0.19	-0.14	-0.31	-0.1	-0.25	-0.26	0.05	-0.44	-0.31	-0.02	-0.06	-0.05	-0.19	-0.43	-0.19	-0.25	-0.27	-0.42	-0.24	-0.14	0.01	-0.3	-0.23	0.08	0.934	0.941	1.16	0.85	1.58	0.82	0.78	0.88	0.3	0.4	1.48	0	1.02	0.93	0.99	17.05	14.53	1.81	0.52	0.13	1.29	1.42	93.7	-0.29	-0.06	0.7	0.7	0.5	1.2	1.6	1	1.1	1.4	1.8	1.8	1.3	0.7	1.4	1.1	0.8	1	0.7	6.5	2.3	6.7	2.3	-2.3	-2.7	0	-5	-3.3	-4.7	-3.7	-2.5	-5.1	-1	86.6	0.74	-0.67	-0.67	0.4	0.73	0.239	0.33	-0.11	-0.062	1.071	8	-0.4	1.42	0.946	0.79	1.194	0.497	0.937	0.289	0.328	0.945	0.842	0.135	0.507	0.159	0.03731	0	-12.04	10.04	0.89	0.52	0.16	0.15	-0.07	7	1.94	0.07	-1.73	0.09	8.5	6.8	18.08	18.56	-0.152	0.83	11.5	0	6	9.9	0.94	0.98	1.05	0.75	0.67	1.1	1.39	1.43	1.55	1.8	1.52	
1.49	1.73	1.33	1.87	1.19	0.77	0.93	1.09	0.71	13.4	-0.77	0.984	1.315	0.994	0.783	0.423	0.619	1.08	0.978	1.4	4.08	-0.35	0.5	0.96	0.343	0.32	8.9	9.2	14.1	13.4	9.8	0.7	58	51	41	32	24	5	-2	0.4	-0.04	-0.12	8.6	7.6	8.1	7.9	8.3	4.47	6.77	7.43	5.22	9.88	10.98	9.95	8.26	7.39	9.07	8.82	6.65	0	89.3	90	0.0373	0.85	0.06	2.62	-1.64	-2.34	0.78	25	1.1	0.1366	0.0728	0.151	-0.058	-0.17	-0.15	0.964	0.974	0.938	1.042	1.065	0.99	0.892	1.092	0.843	2.18	1.79	13.4	0.0166	90.1	91.5	1.076	1.12	1.38	-0.27	0.05	-0.31	-0.27	0.18	0.42	0.616	0.2	50.76	-0.414	-0.96	-0.26	-0.73	-1.35	-0.56	1.37	-0.02	0	-0.03	-0.04	-0.02	-1.6	-0.21
 3 | R	4.38	0.6	0.2	0.45	0.52	0.72	0.84	0.529	109.1	-70	-68	-3.6	3.2	-0.12	0.728	8.274	4.396	6.9	0.262	0.362	0.68	0.291	-1.03	1.06	0.124	1	1	1	5	0	1	202	225	90	0.01	0	0.95	0.98	0.93	1.01	0.44	1.25	0.34	1.24	0.9	0.9	0.89	1.05	0.07	0.106	0.099	0.085	1.05	-0.24	-0.09	-0.52	-0.7	-0.42	0.7	0.79	0.74	0.79	7.5	4.9	65	0.59	0.38	-1.76	-2.1	10	-0.96	174.2	238	12.5	8.99	1.82	-1.01	2.34	0.69	6.13	7.82	1.52	6.24	11.1	0.04	4	3	1	0	4.3	1.05	0.7	0.7	1.7	1.7	0.1	1	1.09	0.2	1	1.04	1.15	0.99	1	0.96	0.93	1.01	0.75	181.2	0.65	10.5	124	1.91	2.3	3	26.37	68.43	341.01	1.17	0.98	1.06	0.97	0.75	1.07	2.78	0.52	94.7	5	67	0.1	-1.4	0.85	1.18	0.051	83	2.6	382	1	0.85	1.15	0.79	1.038	1.028	0.901	133	1	6.5	6.55	0.34	105	-4.5	-0.06	3	3.72	121.4	206.6	2.38	6	0.2	0.96	0.99	0.88	0.98	0.97	0.9	0.28	0.93	0.68	1.02	11.72	1.18	0.94	0.62	0.76	0.48	1.18	26.66	0.8	-4.5	-0.4	-2	0.98	1.09	81	1.92	0.83	0.67	1.33	5.86	3.34	1.92	-1.18	1.54	-1.29	2.81	-0.91	3.96	-0.57	1.21	0.41	0.4	6.75	6.01	4.38	3.73	0.09	6.65	4.75	0.27	-0.26	-0.93	0	-1.475	-0.921	-0.554	-16.225	-10.131	0.84	0.95	-1.54	2.55	5.97	0.93	1.04	1.03	0.75	0.91	0.93	0.93	0.91	1.06	1.06	0.99	1.03	0	1.3	0.77	0.9	4.2	-3.3	11.49	6.81	2.45	11.28	13.24	11.05	2.53	5.7	0.227	0.083	0.59	51.5	0.96	0.99	0.88	0.95	0.7	0.04	-0.14	-0.1	-0.03	-0.22	0.22	0.23	0.1	0.08	0.18	0.07	0.21	0.05	-0.13	0.02	0.03	0.14	0.25	0.19	-0.02	-0.09	-0.11	-0.13	-0.1	0.04	0.02	0.06	0.17	0.06	-0.07	0.12	-0.4	-0.23	-0.04	0.21	-0.13	-0.09	-0.2	-0.01	0.962	1.112	1.72	2.02	1.14	2.6	1.75	0.99	0.9	1.2	1.02	0	1	1.52	1.19	21.25	17.82	-14.92	-1.32	-5	-13.6	-18.6	250.4	-2.71	-0.84	0.4	0.4	0.4	0.7	0.9	0.4	1.5	1.2	1.3	1	0.8	0.8	2.1	1	0.9	1.4	1.1	-0.9	-5.2	0.3	1.4	0.4	0.4	1.1	2.1	0	2	1	-1.2	2.6	0.3	162.2	0.64	12.1	3.89	0.3	0.73	0.211	-0.176	0.079	-0.167	1.033	0.1	-0.59	1.06	1.128	1.087	0.795	0.677	1.725	1.38	2.088	0.364	0.936	0.296	0.459	0.194	0.09593	0	39.23	6.18	0.88	0.49	-0.2	-0.37	-0.4	9.1	-19.92	2.88	2.52	-3.44	0	0	0	0	-0.089	0.83	14.28	52	10.76	4.6	1.15	1.14	0.81	0.9	0.76	1.05	0.95	1.33	1.39	1.73	
1.49	1.41	1.24	1.39	1.66	1.45	1.11	0.96	1.29	1.09	13.3	-0.68	1.008	1.31	1.026	0.807	0.503	0.753	0.976	0.784	1.23	3.91	-0.44	1.7	0.77	0.353	0.327	4.6	3.6	5.5	3.9	7.3	0.95	-184	-144	-109	-95	-79	-57	-41	1.5	-0.3	0.34	4.2	5	4.6	4.9	8.7	8.48	6.87	4.51	7.3	3.71	3.26	3.05	2.8	5.91	4.9	3.71	5.17	2.45	190.3	194	0.0959	0.2	-0.85	1.26	-3.28	1.6	1.58	-7	-5.1	0.0363	0.0394	-0.0103	0	0.37	0.32	1.143	1.129	1.137	1.069	1.131	1.132	1.154	1.239	1.038	2.71	3.2	8.5	-0.0762	192.8	196.1	1.361	-2.55	0	1.87	0.12	1.3	2	-5.4	-1.56	0	-0.7	48.66	-0.584	0.75	0.08	-1.03	-3.89	-0.26	1.33	0.44	0.07	0.09	0.07	0.08	12.3	2.11
 4 | N	4.75	0.06	0.23	0.27	0.35	0.55	0.97	0.463	75.7	-74	-72	-5.7	-2.8	0.08	0.596	8.747	4.755	7.5	0.193	0.229	0.76	0.134	0	1.37	0.289	1	1	0	2	1	1	135.2	160	63	0.12	0.03	1.56	0.67	0.89	1.46	0.81	0.59	1.42	1.64	0.66	1.21	1.86	1.13	0.161	0.083	0.191	0.091	1.56	-0.2	-0.7	-0.9	-0.9	-0.77	0.61	0.72	0.75	1.42	5	4.3	134	0.51	0.59	-0.64	-0.6	1.3	-0.86	132.12	236	-5.6	8.8	2.02	-0.6	1.6	0.58	2.95	4.58	1.52	4.37	8	0.06	2	3	0	0	3.64	0.85	1.7	1	1.2	1	0.25	0.81	1.06	1.2	0.94	0.66	0.6	0.72	1.64	1.1	1.57	1.36	0.69	125.1	1.33	11.6	56	0.48	2.2	0.2	38.3	41.7	207.9	0.6	0.74	1.56	1.14	2.12	0.88	0.92	2.16	60.1	22	49	0.4	-0.5	0.09	2.02	0.043	104	3	397	0.89	0.62	0.64	0.33	1.117	1.006	0.93	-3.6	0	5.9	6.24	0.31	58.7	-3.5	-0.87	0.2	1.98	117.5	207.1	1.45	5	0.1	0.9	0.76	1.28	0.95	0.73	1.25	0.42	0.6	0.54	0.62	11.42	0.64	0.74	3.14	2.62	1.11	0.87	13.28	0.8	-1.6	-4.2	-3	0.98	1.04	70	1.7	0.77	0.72	1.38	4.33	2.33	5.25	0.39	4.38	0.02	7.31	1.28	3.71	-0.27	1.36	1.84	0.73	4.18	5.75	4.81	5.94	0.94	4.49	5.75	2.31	-0.94	-2.65	0	-1.56	-1.178	-0.382	-12.48	-9.424	0.03	2.49	1.31	2.27	5.56	0.9	0.74	0.81	0.82	1.48	1.45	1.05	0.83	0.87	0.67	1.27	0.66	1.52	1.36	1.32	1.57	7	-3.41	11	6.17	2.27	12.24	11.79	10.85	2.12	5.04	0.322	0.09	0.489	20.1	0.9	0.76	1.28	0.8	0.6	-0.1	-0.03	0.09	-0.04	-0.09	-0.33	-0.36	-0.19	-0.07	-0.1	-0.04	-0.03	-0.08	0.28	0.41	0.02	-0.27	-0.53	-0.89	-0.77	-0.34	-0.4	0.05	0.06	0.03	0.1	0	-0.38	0	0.17	0.61	0.71	0.81	0.45	0.35	-0.11	-0.12	0.06	-0.06	0.986	1.038	1.97	0.88	0.77	2.07	1.32	1.02	2.73	1.24	0.99	4.14	1.31	0.92	1.15	34.81	13.59	-6.64	-0.01	-3.04	-6.63	-9.67	146.3	-1.18	-0.48	1.2	1.2	3.5	0.7	0.7	0.7	0	1.2	0.9	0.6	0.6	0.8	0.9	1.2	1.6	0.9	1.5	-5.1	0.3	-6.1	-3.3	-4.1	-4.2	-2	4.2	5.4	3.9	-0.6	4.6	4.7	-0.7	103.3	0.63	7.23	2.27	0.9	-0.01	0.249	-0.233	-0.136	0.166	0.784	0.1	-0.92	0.71	0.432	0.832	0.659	2.072	1.08	3.169	1.498	1.202	1.352	0.196	0.287	0.385	0.00359	0	4.25	5.63	0.89	0.42	1.03	0.69	-0.57	10	-9.68	3.22	1.45	0.84	8.2	6.2	17.47	18.24	-0.203	0.09	12.82	3.38	5.41	5.4	0.79	1.05	0.91	1.24	1.28	
0.72	0.67	0.55	0.6	0.73	0.58	0.67	0.7	0.64	0.7	1.33	1.39	0.82	1.03	0.95	12	-0.07	1.048	1.38	1.022	0.799	0.906	1.089	1.197	0.915	1.61	3.83	-0.38	1.7	0.39	0.409	0.384	4.4	5.1	3.2	3.7	3.6	1.47	-93	-84	-74	-73	-76	-77	-97	1.6	0.25	1.05	4.6	4.4	3.7	4	3.7	3.89	5.5	9.12	6.06	2.35	2.85	4.84	2.54	3.06	4.05	6.77	4.4	0	122.4	124.7	0.0036	-0.48	0.25	-1.27	0.83	2.81	1.2	-7	-3.5	-0.0345	-0.039	0.0381	0.027	0.18	0.22	0.944	0.988	0.902	0.828	0.762	0.873	1.144	0.927	0.956	1.85	2.83	7.6	-0.0786	127.5	138.3	1.056	-0.83	0.37	0.81	0.29	0.49	0.61	-1.3	-1.03	0.236	-0.5	45.8	-0.916	-1.94	-0.46	-5.29	-10.96	-2.87	6.29	0.63	0.1	0.13	0.13	0.1	4.8	0.96
 5 | D	4.76	0.46	0.05	0.14	0.44	0.65	0.97	0.511	68.4	-78	-91	-2.9	-2.8	-0.2	0.558	8.41	4.765	7	0.288	0.271	0.76	0.105	2.06	1.21	0.048	1	1	0	2	1	0	124.5	150	50	0.15	0.04	1.46	1.01	0.54	1.52	2.02	0.61	0.98	1.06	0.38	0.85	1.39	1.32	0.147	0.11	0.179	0.081	1.61	-1.52	-0.71	-1.09	-1.05	-1.04	0.6	0.97	0.89	1.24	2.5	5.5	106	0.23	0.27	-0.72	-1.2	1.9	-0.98	133.1	270	5.05	9.6	1.88	-0.77	1.6	0.59	2.78	4.74	1.52	3.78	9.2	0.15	1	4	0	1	5.69	0.85	3.2	1.7	0.7	0.7	0.21	1.1	0.94	1.55	1.07	0.59	0.66	0.74	1.4	1.6	1.41	1.22	0	110.8	1.38	13	54	0.78	6.5	3	37.09	40.66	194.91	1	0.69	1.5	0.77	2.16	1.24	1.77	1.15	60.6	19	50	0.4	-0.6	0.66	2.01	0.052	86	3.6	400	1.04	0.71	0.91	0.31	1.033	1.089	0.932	0	-1	5.7	6.04	0.33	40	-3.5	-0.75	2.5	1.99	121.2	215	1.43	5	0.1	1.04	0.72	1.41	1.03	0.69	1.47	0.73	0.48	0.5	0.47	10.85	0.92	0.72	1.92	1.08	1.18	1.39	12	-8.2	-2.8	-1.6	-0.5	1.01	1.08	71	1.67	1	0.9	1.04	5.14	2.23	2.11	-1.36	1.7	-1.54	3.07	-0.93	3.89	-0.56	1.18	0.59	0.75	6.24	6.13	5.75	5.26	0.35	4.97	5.96	0.94	-1.13	-4.12	0	-1.518	-1.162	-0.356	-12.144	-9.296	-2.05	3.32	-0.33	8.86	2.85	1.02	0.97	0.71	0.75	1.28	1.47	0.86	1.06	1	0.71	0.98	0.74	2.42	1.24	0.9	1.22	10	-3.38	10.97	6.18	2.29	10.98	13.78	10.21	2.6	4.95	0.335	0.14	0.632	38.5	1.04	0.72	1.41	0.65	0.5	0.01	0.15	0.33	0.11	-0.02	0.06	-0.46	-0.44	-0.71	-0.81	-0.58	-0.32	-0.24	0.05	-0.09	-0.06	-0.1	-0.54	-0.89	-1.01	-0.55	-0.11	-0.2	0.13	0.11	0.24	0.15	0.09	-0.31	-0.27	0.6	0.54	0.95	0.65	0.66	0.78	0.44	0.34	0.04	0.994	1.071	2.66	1.5	0.98	2.64	1.25	1.16	1.26	1.59	1.19	2.15	1.76	0.6	1.18	19.27	19.78	-8.72	0	-2.23	0	0	142.6	-1.02	-0.8	1.4	1.4	2.1	0.8	2.6	2.2	0.3	0.6	1	0.7	0.5	0.6	0.7	0.4	0.7	1.4	1.4	0.5	7.4	-3.1	-4.4	-4.4	-4.4	-2.6	3.1	3.9	1.9	-0.6	0	3.1	-1.2	97.8	0.62	8.72	1.57	0.8	0.54	0.171	-0.371	-0.285	-0.079	0.68	70	-1.31	1.01	1.311	0.53	1.056	1.498	1.64	0.917	3.379	1.315	1.366	0.289	0.223	0.283	0.1263	0	23.22	5.76	0.87	0.37	-0.24	-0.22	-0.8	13	-10.95	3.64	1.13	2.36	8.5	7	17.36	17.94	-0.355	0.64	11.68	49.7	2.77	2.8	1.19	1.05	1.39	1.72	1.58	1.14	1.64	
0.9	0.61	0.9	1.04	0.94	0.68	0.6	0.91	0.72	0.79	1.15	1.17	1.43	11.7	-0.15	1.068	1.372	1.022	0.822	0.87	0.932	1.266	1.038	1.89	3.02	-0.41	1.6	0.42	0.429	0.424	6.3	6	5.7	4.6	4.9	0.87	-97	-78	-47	-29	0	45	248	15	0.27	1.12	4.9	5.2	3.8	5.5	4.7	7.05	8.57	8.71	7.91	3.5	3.37	4.46	2.8	5.14	5.73	6.38	5.5	0	114.4	117.3	0.1263	-1.1	-0.2	-2.84	0.7	-0.48	1.35	2	-3.6	-0.1233	-0.0552	0.0047	0.016	0.37	0.41	0.916	0.892	0.857	0.97	0.836	0.915	0.925	0.919	0.906	1.75	2.33	8.2	-0.1278	117.1	135.2	1.29	-0.83	0.52	0.81	0.41	0.58	0.5	-2.36	-0.51	0.028	-1.4	43.17	-1.31	-5.68	-1.3	-6.13	-11.88	-4.31	8.93	0.72	0.12	0.17	0.19	0.19	9.2	1.36
 6 | C	4.65	1.07	1.89	1.23	0.06	0.65	0.84	0.346	68.3	168	90	-9.2	-14.3	-0.45	0.624	8.312	4.686	7.7	0.2	0.533	0.62	0.128	4.53	1.19	0.083	1	0	0	1	0	1	117.7	135	19	0.45	0.22	1.19	0.7	1.19	0.96	0.66	1.11	0.65	0.94	0.87	1.11	1.34	0.53	0.149	0.053	0.117	0.128	0.92	0.79	0.76	0.7	1.24	0.77	0.61	0.93	0.99	1.29	3	2.9	20	1.8	1.6	0.04	0.38	0.17	0.76	121.15	178	-16.5	8.35	1.92	1.54	1.77	0.66	2.43	4.47	1.52	3.41	14.4	0.12	0	0	0	0	3.67	0.95	1	1	1	1	0.28	0.79	1.32	1.44	0.95	1.27	0.91	1.12	0.93	0	1.05	0.92	1	112.4	2.75	5.5	55	-1.42	0.1	-1	50.7	53.83	219.79	0.89	1.39	0.6	0.5	0.5	1.04	1.44	0.41	15.5	74	5	4.6	0.9	1.52	1.65	0.02	44	1.3	241	0.82	1.17	0.7	1.46	0.96	0.878	0.925	0	0	-1	1.73	0.11	44.6	2.5	-0.32	-1	1.38	113.7	209.4	1.22	6.1	0.1	1.11	0.74	0.81	0.92	1.04	0.79	0.2	1.16	0.91	1.24	14.63	0.94	1.17	0.32	0.95	0.38	1.09	35.77	-6.8	-2.2	7.1	4.6	0.88	0.84	104	3.36	0.94	1.13	1.01	1.81	2.3	1.03	-0.34	1.11	-0.3	0.86	-0.41	1.07	-0.32	1.48	0.83	0.37	1.03	0.69	3.24	2.2	2.55	1.7	2.95	2.56	1.78	4.66	0	-2.035	-1.365	-0.67	-12.21	-8.19	1.98	1.07	0.2	-3.13	-0.78	0.92	0.7	1.12	1.12	0.69	1.43	1.22	1.27	1.03	1.04	0.71	1.01	0	0.83	0.5	0.62	1.4	-2.49	14.93	10.93	3.36	14.49	15.9	14.15	3.03	7.86	0.339	0.074	0.263	-8.4	1.11	0.74	0.8	0.95	1.9	-0.25	-0.15	0.03	-0.05	-0.15	-0.18	-0.15	-0.03	-0.09	-0.26	-0.12	-0.29	-0.25	-0.26	-0.27	-0.29	-0.64	-0.06	0.13	0.13	0.47	0.36	0.13	-0.11	-0.02	-0.19	0.3	0.41	0.19	0.42	0.18	0	-0.18	-0.38	-0.09	-0.31	0.03	0.19	0.37	0.9	0.866	0.5	0.9	1.04	0	3.14	1.14	0.72	2.98	0.86	0	1.05	1.08	2.32	28.84	30.57	1.28	0	-2.52	0	0	135.2	0	1.36	0.6	0.6	0.6	0.8	1.2	0.6	1.1	1.6	0.7	0	0.7	0.2	1.2	1.6	0.4	0.8	0.4	-1.3	0.8	-4.9	6.1	4.4	3.7	5.4	4.4	-0.3	6.2	4	-4.7	3.8	2.1	132.3	0.91	-0.34	-2	0.5	0.7	0.22	0.074	-0.184	0.38	0.922	26	0.17	0.73	0.481	1.268	0.678	1.348	1.004	1.767	0	0.932	1.032	0.159	0.592	0.187	0.08292	0	3.95	8.89	0.85	0.83	-0.12	-0.19	0.17	5.5	-1.24	0.71	-0.97	4.13	11	8.3	18.17	17.84	0	1.48	13.46	1.48	5.05	2.8	0.6	0.41	0.6	0.66	0.37	0.26	0.52	0.52	0.59	0.55	0.26	0.37	0.63	0.44	0.33	
0.44	0.44	0.67	0.26	0.65	11.6	-0.23	0.906	1.196	0.939	0.785	0.877	1.107	0.733	0.573	1.14	4.49	-0.47	0.6	0.42	0.319	0.198	0.6	1	0.1	0.8	3	1.17	116	137	169	182	194	224	329	0.7	0.57	-0.63	2.9	2.2	2	1.9	1.6	0.29	0.31	0.42	1.01	1.12	1.47	1.3	2.67	0.74	0.95	0.9	1.79	0	102.5	103.3	0.0829	2.1	0.49	0.73	9.3	5.03	0.55	32	2.5	0.2745	0.3557	0.3222	0.447	-0.06	-0.15	0.778	0.972	0.6856	0.5	1.015	0.644	1.035	0.662	0.896	3.89	2.22	22.6	0.5724	113.2	114.4	0.753	0.59	1.43	-1.05	-0.84	-0.87	-0.23	0.27	0.84	0.68	1.9	58.74	0.162	4.54	0.83	0.64	4.37	1.78	-4.47	-0.96	-0.16	-0.36	-0.38	-0.32	-2	-6.04
 7 | Q	4.37	0	0.72	0.51	0.44	0.72	0.64	0.493	89.7	-73	-117	-0.3	1.8	0.16	0.649	8.411	4.373	6	0.418	0.327	0.68	0.18	0.731	0.87	-0.105	1	1	1	3	0	1	161.1	180	71	0.07	0.01	0.98	1.11	1.1	0.96	1.22	1.22	0.75	0.7	1.65	0.65	1.09	0.77	0.074	0.098	0.037	0.098	0.84	-0.99	-0.4	-1.05	-1.2	-1.1	0.67	1.42	0.87	0.92	6	3.9	93	0.39	0.39	-0.69	-0.22	1.9	-1	146.15	185	6.3	9.13	2.17	-0.22	1.56	0.71	3.95	6.11	1.52	3.53	10.6	0.05	2	3	0	0	4.54	0.95	1	1	1	1	0.35	1.07	0.93	1.13	1.32	1.02	1.11	0.9	0.94	1.6	0.81	0.83	0.59	148.7	0.89	10.5	85	0.95	2.1	0.2	44.02	46.62	235.51	1.27	0.89	0.78	0.83	0.73	1.09	0.79	0.95	68.7	16	56	0.3	-0.7	0	2.17	0.041	84	2.4	313	1.14	0.98	1.14	0.75	1.165	1.025	0.885	20	0	6.1	6.13	0.39	80.7	-3.5	-0.32	0.2	2.58	118	205.4	1.75	6	0.1	1.27	0.8	0.98	1.1	1	0.92	0.26	0.95	0.28	1.18	11.76	1.22	0.89	0.8	0.91	0.41	1.13	17.56	-4.8	-2.5	-2.9	-2	1.02	1.11	66	1.75	1.1	1.18	0.81	3.98	2.36	2.3	-0.71	2.3	-0.71	2.31	-0.71	3.17	-0.34	1.57	1.2	0.61	4.76	4.68	4.45	4.5	0.87	5.39	4.24	1.14	-0.57	-2.76	0	-1.521	-1.116	-0.405	-13.689	-10.044	1.02	1.49	-1.12	1.79	4.15	1.04	1.25	1.03	0.95	1	0.94	0.95	1.13	1.43	1.06	1.01	0.63	1.44	1.06	1.06	0.66	6	-3.15	11.28	6.67	2.45	11.3	12.02	11.71	2.7	5.45	0.306	0.093	0.527	17.2	1.27	0.8	0.97	1	1	-0.03	-0.13	-0.23	0.26	-0.15	0.01	0.15	0.19	0.12	0.41	0.13	-0.27	-0.28	0.21	0.01	0.02	-0.11	0.07	-0.04	-0.12	-0.33	-0.67	-0.58	-0.47	-0.17	-0.04	-0.08	0.04	0.14	-0.29	0.09	-0.08	-0.01	0.01	0.11	-0.13	0.24	0.47	0.48	1.047	1.15	3.87	1.71	1.24	0	0.93	0.93	0.97	0.5	1.42	0	1.05	0.94	1.52	15.42	22.18	-5.54	-0.07	-3.84	-5.47	-9.31	177.7	-1.53	-0.73	1	1	0.4	0.7	0.8	1.5	1.3	1.4	1.3	1	0.2	1.3	1.6	2.1	0.9	1.4	1.1	1	-0.7	0.6	2.7	1.2	0.8	2.4	0.4	-0.4	-2	3.4	-0.5	0.2	-0.1	119.2	0.62	6.39	2.12	0.7	-0.1	0.26	-0.254	-0.067	-0.025	0.977	33	-0.91	1.02	1.615	1.038	1.29	0.711	1.078	2.372	0	0.704	0.998	0.236	0.383	0.236	0.07606	0	2.16	5.41	0.82	0.35	-0.55	-0.06	-0.26	8.6	-9.38	2.18	0.53	-1.14	6.3	8.5	17.93	18.51	-0.181	0	14.45	3.53	5.65	9	0.94	0.9	0.87	1.08	1.05	1.31	1.6	1.43	1.43	0.97	
1.41	1.52	0.88	1.37	1.24	1.43	0.95	1.02	1.08	0.87	12.8	-0.33	1.037	1.342	1.041	0.817	0.594	0.77	1.05	0.863	1.33	3.67	-0.4	1.6	0.8	0.395	0.436	2.8	2.9	3.7	4.8	2.4	0.73	-139	-128	-104	-95	-87	-67	-37	1.4	-0.02	1.67	4	4.1	3.1	4.4	4.7	2.87	5.24	5.42	6	1.66	2.3	2.64	2.86	2.22	3.63	3.89	4.52	1.25	146.9	149.4	0.0761	-0.42	0.31	-1.69	-0.04	0.16	1.19	0	-3.68	0.0325	0.0126	0.0246	-0.073	0.26	0.03	1.047	1.092	0.916	1.111	0.861	0.999	1.2	1.124	0.968	2.16	2.37	8.5	-0.1051	149.4	156.4	0.729	-0.78	0.22	1.1	0.46	0.7	1	-1.22	-0.96	0.251	-1.1	46.09	-0.905	-5.3	-0.83	-0.96	-1.34	-2.31	3.88	0.56	0.09	0.13	0.14	0.15	4.1	1.52
 8 | E	4.29	0.47	0.11	0.23	0.73	0.55	0.53	0.497	84.7	-106	-139	-7.1	-7.5	-0.3	0.632	8.368	4.295	7	0.538	0.262	0.68	0.151	1.77	0.84	-0.245	1	1	1	3	1	0	155.1	190	49	0.18	0.03	0.74	1.51	0.37	0.95	2.44	1.24	1.04	0.59	0.35	0.55	0.92	0.85	0.056	0.06	0.077	0.064	0.8	-0.8	-1.31	-0.83	-1.19	-1.14	0.66	1.66	0.37	0.64	5	6	102	0.19	0.23	-0.62	-0.76	3	-0.89	147.13	249	12	9.67	2.1	-0.64	1.56	0.72	3.78	5.97	1.52	3.31	11.4	0.07	1	4	0	1	5.48	1.15	1.7	1.7	0.7	0.7	0.33	1.49	1.2	1.67	1.64	0.57	0.37	0.41	0.97	0.4	1.4	1.05	0	140.5	0.92	12.3	83	0.83	6.2	3	41.84	44.98	223.16	1.63	0.66	0.97	0.92	0.65	1.14	2.54	0.64	68.2	16	55	0.3	-0.7	0.67	2.19	0.062	77	3.3	427	1.48	0.53	1.29	0.46	1.094	1.036	0.933	0	-1	5.6	6.17	0.37	62	-3.5	-0.71	2.5	2.63	118.2	213.6	1.77	6	0.1	1.44	0.75	0.99	1.44	0.66	1.02	0.08	0.61	0.59	0.62	11.89	1.67	0.62	1.01	0.28	1.02	1.04	17.26	-16.9	-7.5	0.7	1.1	1.02	1.12	72	1.74	1.54	0.33	0.75	6.1	3	2.63	-1.16	2.6	-1.17	2.7	-1.13	4.8	-0.43	1.15	1.63	1.5	7.82	9.34	7.05	6.07	0.08	7.76	6.04	0.94	-0.75	-3.64	0	-1.535	-1.163	-0.371	-13.815	-10.467	0.93	2.2	0.48	4.04	5.16	1.43	1.48	0.59	0.44	0.78	0.75	1.09	1.69	1.37	0.72	0.54	0.59	0.63	0.91	0.53	0.92	7.8	-2.94	11.19	6.38	2.31	12.55	13.59	11.71	3.3	5.1	0.282	0.135	0.669	34.3	1.44	0.75	1	1	0.7	-0.02	0.21	0.51	0.28	0.44	0.2	0.26	0.21	0.13	-0.06	-0.23	-0.25	-0.19	-0.06	0.09	-0.1	-0.39	-0.52	-0.34	-0.62	-0.75	-0.35	-0.28	-0.05	0.1	-0.04	-0.02	-0.2	-0.41	-0.22	-0.12	-0.12	-0.09	0.07	0.06	0.09	0.18	0.28	0.36	0.986	1.1	2.4	1.79	1.49	2.62	0.94	1.01	1.33	1.26	1.43	0	0.83	0.73	1.36	20.12	18.19	-6.81	-0.79	-3.43	-6.02	-9.45	182.9	-0.9	-0.77	1	1	0.4	2.2	2	3.3	0.5	0.9	0.8	1.1	0.7	1.6	1.7	0.8	0.3	0.8	0.7	7.8	10.3	2.2	2.5	-5	-8.1	3.1	-4.7	-1.8	-4.2	-4.3	-4.4	-5.2	-0.7	113.9	0.62	7.35	1.78	1.3	0.55	0.187	-0.409	-0.246	-0.184	0.97	6	-1.22	1.63	0.698	0.643	0.928	0.651	0.679	0.285	0	1.014	0.758	0.184	0.445	0.206	0.0058	0	16.81	5.37	0.84	0.38	-0.45	0.14	-0.63	12.5	-10.2	3.08	0.39	-0.07	8.8	4.9	18.16	17.97	-0.411	0.65	13.57	49.9	3.22	3.2	1.41	1.04	1.11	1.1	
0.94	2.3	2.07	1.7	1.34	1.73	1.76	1.55	1.16	1.43	1.88	1.27	0.92	1.07	1.31	1.19	12.2	-0.27	1.094	1.376	1.052	0.826	0.167	0.675	1.085	0.962	1.42	2.23	-0.41	1.6	0.53	0.405	0.514	6.9	6	8.8	7.8	4.4	0.96	-131	-115	-90	-74	-57	-8	117	1.3	-0.33	0.91	5.1	6.2	4.6	7.1	6.5	16.56	12.93	5.86	10.66	4.02	3.51	2.58	2.67	9.8	7.77	4.05	6.89	1.27	138.8	142.2	0.0058	-0.79	-0.1	-0.45	1.18	1.3	1.45	14	-3.2	-0.0484	-0.0295	-0.0639	-0.128	0.15	0.3	1.051	1.054	1.139	0.992	0.736	1.053	1.115	1.199	0.9	1.89	2.52	7.3	-0.1794	140.8	154.6	1.118	-0.92	0.71	1.17	0.38	0.68	0.33	-2.1	-0.37	0.043	-1.3	43.48	-1.218	-3.86	-0.73	-2.9	-4.56	-2.35	4.04	0.74	0.12	0.23	0.23	0.21	8.2	2.3
 9 | G	3.97	0.07	0.49	0.62	0.35	0.65	0.97	0.544	36.3	-13	-8	-1.2	-2.3	0	0.592	8.391	3.972	5.6	0.12	0.312	0	0	-0.525	1.52	0.104	0	0	0	0	1	0	66.4	75	23	0.36	0.18	1.56	0.57	0.75	1.56	0.76	0.42	1.41	1.64	0.63	0.74	1.46	1.68	0.102	0.085	0.19	0.152	1.63	-1	-0.84	-0.82	-0.57	-0.8	0.64	0.58	0.56	1.38	0.5	8.4	49	1.03	1.08	0.16	0	0	0	75.07	290	0	9.78	2.35	0	0	0	0	2.06	1	1	0	0	0	0	0	0	3.77	0.55	1	1.3	0.8	1.5	0.17	0.63	0.83	0.59	0.6	0.94	0.86	0.91	1.51	2	1.3	1.45	0	60	0.74	9	3	0.33	1.1	0	23.71	24.74	127.9	0.44	0.7	1.73	1.25	2.4	0.27	0.95	3.03	24.5	52	10	1.8	0.3	0.1	2.34	0.074	50	4.8	707	0.63	0.88	0.53	0.83	1.142	1.042	0.923	64.6	0	7.6	6.09	0.28	0	-0.4	0	0	0	0	300	0.58	4.2	0.025	0.56	0.92	1.64	0.61	0.89	1.67	0.58	0.61	0.79	0.56	12.43	0.46	0.97	0.63	5.02	3.84	0.46	0	0	-0.5	-0.2	0.2	1.01	1.01	90	2.06	0.72	0.9	1.35	6.91	3.36	5.66	-0.37	5.29	-0.48	6.52	-0.12	8.51	0.48	8.87	4.88	3.12	6.8	7.72	6.38	7.09	8.14	6.32	8.2	6.17	-0.07	-1.62	0	-1.898	-1.364	-0.534	-7.592	-5.456	0.12	2.07	0.64	-0.56	9.14	0.63	0.59	0.94	0.83	1.76	1.53	0.85	0.47	0.64	0.87	0.94	0.9	2.64	1.69	1.61	1.61	5.7	-3.25	12.01	7.31	2.55	15.36	14.18	10.95	3.13	6.16	0.352	0.201	0.67	-4.2	0.56	0.92	1.64	0.6	0.3	-0.02	-0.37	-0.09	-0.67	-0.73	-0.88	-0.71	-0.46	-0.39	-0.42	-0.15	-0.4	-0.1	0.23	0.13	0.19	0.46	0.37	-0.45	-0.72	-0.56	0.14	0.08	0.45	0.38	0.17	-0.14	0.28	-0.21	0.17	0.09	1.14	1.24	0.85	0.36	0.14	-0.12	0.14	-0.02	1.015	1.055	1.63	1.54	0.66	1.63	1.13	0.7	3.09	1.89	0.46	6.49	2.39	0.78	1.4	38.14	37.16	0.94	0	1.45	0.94	2.39	52.6	-0.34	-0.41	1.6	1.6	1.8	0.3	0.9	0.6	0.4	0.6	0.5	0.5	0.5	0.1	0.2	0.2	3.9	1.2	0.6	-8.6	-5.2	-6.8	-8.3	-4.2	-3.9	-3.4	5.7	-1.2	5.7	5.9	4.9	5.6	0.3	62.9	0.72	0	0	0	0	0.16	0.37	-0.073	-0.017	0.591	0.1	-0.67	0.5	0.36	0.725	1.015	1.848	0.901	4.259	0.5	2.355	1.349	0.051	0.39	0.049	0.00499	0	-7.85	7.99	0.92	0.41	-0.16	0.36	0.27	7.9	2.39	2.23	-5.36	0.3	7.1	6.4	18.24	18.57	-0.19	0.1	3.4	0	5.97	5.6	1.18	1.25	1.26	1.14	0.98	0.55	0.65	0.56	0.37	0.32	0.3	0.29	0.32	0.2	0.33	0.74	2.74	1.08	0.97	1.07	11.3	0	1.031	
1.382	1.018	0.784	1.162	1.361	1.104	1.405	2.06	4.24	0	1.3	0	0.389	0.374	9.4	9.4	4.1	4.6	0	0.64	-11	-13	-18	-22	-28	-47	-66	1.1	1.24	0.76	7.8	6.9	7	7.1	6.3	8.29	7.95	9.4	5.81	6.88	7.48	8.87	5.62	7.53	7.69	9.11	5.72	0	63.8	64.9	0.005	0	0.21	-1.15	-1.85	-1.06	0.68	-2	-0.64	-0.0464	-0.0589	0.0248	0.331	0.01	0.08	0.835	0.845	0.892	0.743	1.022	0.785	0.917	0.698	0.978	1.17	0.7	7	-0.0442	63.8	67.5	1.346	1.2	1.34	-0.16	0.31	-0.33	-0.22	0.09	0	0.501	-0.1	50.27	-0.684	-1.28	-0.4	-2.67	-5.82	-1.35	3.39	0.38	0.06	0.09	0.09	-0.02	-1	0
10 | H	4.63	0.61	0.31	0.29	0.6	0.83	0.75	0.323	91.9	50	47	-2.1	2	-0.12	0.646	8.415	4.63	8	0.4	0.2	0.7	0.23	0	1.07	0.138	1	1	1	3	0	1	167.3	195	43	0.17	0.02	0.95	1	0.87	0.95	0.73	1.77	1.22	1.86	0.54	0.9	0.78	0.96	0.14	0.047	0.093	0.054	0.77	1.07	0.43	0.16	-0.25	0.26	0.67	1.49	0.36	0.95	6	2	66	1	1	-0.4	0.64	0.99	-0.75	155.16	277	-38.5	9.17	1.82	0.13	2.99	0.64	4.66	5.23	1.52	5.66	10.2	0.08	1	1	1	0	2.84	1	1	1	1.2	1	0.21	1.33	1.09	1.21	1.03	0.81	1.07	1.01	0.9	0.96	0.85	0.96	0	152.6	0.58	10.4	96	-0.5	2.8	-0.5	59.64	65.99	242.54	1.03	1.06	0.83	0.67	1.19	1.07	0	0.89	50.7	34	34	0.8	-0.1	0.87	1.82	0.023	91	1.4	155	1.11	0.92	1.13	0.83	0.982	0.952	0.894	75.7	0	4.5	5.66	0.23	79	-3.2	-0.51	-0.5	2.76	118.2	219.9	1.78	6	0.1	1.22	1.08	0.68	1.31	0.85	0.81	0.14	0.93	0.38	1.12	12.16	0.98	1.06	2.05	0.57	0.3	0.71	21.81	-3.5	0.8	-0.7	-2.2	0.89	0.92	90	2.41	1.29	0.87	0.76	2.17	1.55	2.3	0.08	2.33	0.1	2.23	0.04	1.88	-0.19	1.07	1.14	0.46	2.7	2.15	2.69	2.99	0.2	2.11	2.1	0.47	0.11	1.28	0.5	-1.755	-1.215	-0.54	-17.55	-12.15	-0.14	1.27	-1.31	4.22	4.48	1.33	1.06	0.85	0.86	0.53	0.96	1.02	1.11	0.95	1.04	1.26	1.17	0.22	0.91	1.08	0.39	2.1	-2.84	12.84	7.85	2.57	11.59	15.35	12.07	3.57	5.8	0.215	0.125	0.594	12.6	1.22	1.08	0.69	0.85	0.8	-0.06	0.1	-0.23	-0.26	-0.14	-0.09	-0.05	0.27	0.32	0.51	0.37	0.28	0.29	0.24	0.22	-0.16	-0.04	-0.32	-0.34	-0.16	-0.04	0.02	0.09	-0.06	-0.09	0.19	-0.07	-0.19	0.21	0.17	0.42	0.18	0.05	-0.21	-0.31	-0.56	-0.2	-0.22	-0.45	0.882	0.911	0.86	1.59	0.99	0	1.03	1.87	1.33	2.71	1.27	0	0.4	1.08	1.06	23.07	22.63	-4.66	0.95	-5.61	-5.61	-11.22	188.1	-0.94	0.49	1.2	1.2	1.1	0.7	0.7	0.7	1.5	0.9	1	2.4	1.9	1.1	1.8	3.4	1.3	1.2	1	1.2	-2.8	-1	5.9	-2.5	-3	0.8	-0.3	3	-2.6	-0.8	1.6	-0.9	1.1	155.8	0.78	3.82	1.09	1	1.1	0.205	-0.078	0.32	0.056	0.85	0.1	-0.64	1.2	2.168	0.864	0.611	1.474	1.085	1.061	1.204	0.525	1.079	0.223	0.31	0.233	0.02415	0	6.28	7.49	0.83	0.7	-0.18	-0.25	-0.49	8.4	-10.27	2.41	1.74	1.11	10.1	9.2	18.49	18.64	0	1.1	13.69	51.6	7.59	8.2	1.15	1.01	1.43	0.96	0.83	0.83	1.36	0.66	0.89	0.46	0.83	
0.96	0.76	1.02	0.89	1.55	1.65	1.4	0.88	1.13	11.6	-0.06	0.95	1.279	0.967	0.777	0.802	1.034	0.906	0.724	1.25	4.08	-0.46	1.6	0.57	0.307	0.299	2.2	2.1	2	3.3	11.9	1.39	-73	-55	-35	-25	-31	-50	-70	1.4	-0.11	1.34	2.1	2.1	2	2.1	2.1	1.74	2.8	1.49	2.27	1.88	2.2	1.99	1.98	1.82	2.47	1.77	2.13	1.45	157.5	160	0.0242	0.22	-2.24	-0.74	7.17	-3	0.99	-26	-3.2	0.0549	0.0874	0.1335	0.195	-0.02	0.06	1.014	0.949	1.109	1.034	0.973	1.054	0.992	1.012	1.05	2.51	3.06	11.3	0.1643	159.3	163.2	0.985	-0.93	0.66	0.28	-0.41	0.13	0.37	-1.48	-2.28	0.165	0.4	49.33	-0.63	-0.62	-0.18	3.03	6.54	0.81	-1.65	0	0	-0.04	-0.04	-0.02	3	-1.23
11 | I	3.95	2.22	1.45	1.67	0.73	0.98	0.37	0.462	102	151	100	6.6	11	-2.26	0.809	8.195	4.224	7	0.37	0.411	1.02	0.186	0.791	0.66	0.23	2	1	0	2	0	0	168.8	175	18	0.6	0.19	0.47	1.08	1.6	0.47	0.67	0.98	0.78	0.87	1.94	1.35	0.59	0.53	0.043	0.034	0.013	0.056	0.29	0.76	1.39	2.17	2.06	1.81	0.9	0.99	1.75	0.67	5.5	4.5	96	1.27	1.44	0.73	1.9	1.2	0.99	131.17	284	12.4	9.68	2.36	1.8	4.19	0.96	4	4.92	1.9	3.49	16.1	-0.01	0	0	0	0	4.81	1.05	0.6	1	0.8	1	0.82	1.05	1.05	1.27	1.12	1.29	1.17	1.29	0.65	0.85	0.67	0.58	2.95	168.5	0	5.2	111	-1.13	0.8	-1.8	45	49.71	233.21	1.07	1.31	0.4	0.66	0.12	0.97	0.52	0.62	22.8	66	13	3.1	0.7	3.15	2.36	0.053	103	3.1	394	1.08	1.48	1.23	1.87	1.002	0.892	0.872	18.9	0	2.6	2.31	0.12	93.5	4.5	0.81	-1.8	1.83	118.9	217.9	1.56	7	0.19	0.97	1.45	0.51	0.93	1.47	0.5	0.22	1.81	2.6	1.54	15.67	1.04	1.24	0.92	0.26	0.4	0.68	19.06	13.9	11.8	8.5	7	0.79	0.76	105	4.17	0.94	1.54	0.8	5.48	2.52	9.12	1.44	8.78	1.31	9.94	1.77	6.47	0.39	10.91	12.91	1.61	3.48	1.8	4.4	4.32	15.25	4.51	4.95	13.73	1.19	5.58	1.8	-1.951	-1.189	-0.762	-15.608	-9.512	3.7	0.66	3.28	-10.87	2.1	0.87	1.01	1.47	1.59	0.55	0.57	0.98	0.84	0.99	1.14	1.67	1.38	0.43	0.27	0.36	0.79	-8	-1.72	14.77	9.99	3.08	14.63	14.1	12.95	7.69	7.51	0.278	0.1	0.564	-13	0.97	1.45	0.51	1.1	4	-0.07	-0.03	-0.22	0	-0.08	-0.03	0	-0.33	0	-0.15	0.31	-0.03	-0.01	-0.42	-0.27	-0.08	0.16	0.57	0.95	1.1	0.94	0.47	-0.04	-0.25	-0.48	-0.2	0.26	-0.06	0.29	-0.34	-0.54	-0.74	-1.17	-0.65	-0.51	-0.09	-0.07	0.42	0.09	0.766	0.742	0.57	0.67	1.09	2.32	1.26	1.61	0.45	1.31	1.12	0	0.83	1.74	0.81	16.66	20.28	4.92	2.04	-2.77	2.88	0.11	182.2	0.24	1.31	0.9	0.9	0.2	0.9	0.7	0.4	1.1	0.9	1.2	1.3	1.6	1.4	0.4	0.7	0.7	1.1	0.7	0.6	-4	3.2	-0.5	6.7	7.7	-0.1	-4.6	-0.5	-7	-0.5	-3.3	-4.5	4	158	0.88	-3.02	-3.02	0.4	2.97	0.273	0.149	0.001	-0.309	1.14	55	1.25	1.12	1.283	1.361	0.603	0.471	0.178	0.262	2.078	0.673	0.459	0.173	0.111	0.581	0	1	-18.32	8.72	0.76	0.79	-0.19	0.02	0.06	4.9	2.15	-4.44	-1.68	-1.03	16.8	10	18.62	19.21	-0.086	3.07	21.4	0.13	6.02	17.1	1.07	0.88	0.95	0.8	0.78	1.06	0.64	1.18	
1.47	1.09	1.25	1.04	1.15	1.58	0.9	0.61	0.64	1.14	0.97	1.05	12	-0.23	0.927	1.241	0.977	0.776	0.566	0.876	0.583	0.502	1.02	4.52	-0.56	0.6	0.84	0.296	0.306	7	6	7.1	6.5	17.2	1.29	107	106	104	106	102	83	28	0.5	-0.26	-0.77	4.6	5.1	6.7	5.2	3.7	3.3	2.72	1.76	2.36	10.08	9.74	7.73	8.95	6.96	6.56	5.05	5.47	0	163	163.9	0	3.14	3.48	4.38	3.02	7.26	0.47	91	4.5	0.4172	0.3805	0.4238	0.06	-0.28	-0.29	0.922	0.928	0.986	0.852	1.189	0.95	0.817	0.912	0.946	4.5	4.59	20.3	0.2758	164.9	162.6	0.926	1.16	2.32	-0.77	-0.69	-0.66	-0.8	0.37	1.81	0.943	1.4	57.3	1.237	5.54	1.1	5.04	10.93	3.83	-7.92	-1.89	-0.31	-0.33	-0.34	-0.28	-3.1	-4.81
12 | L	4.17	1.53	3.23	2.93	1	0.83	0.53	0.365	102	145	108	20	15	-2.46	0.842	8.423	4.385	6.5	0.42	0.4	0.98	0.186	1.07	0.69	-0.052	1	2	0	2	0	0	167.9	170	23	0.45	0.16	0.59	1.21	1.3	0.5	0.58	1.13	0.85	0.84	1.3	1.27	0.46	0.59	0.061	0.025	0.036	0.07	0.36	1.29	1.24	1.18	0.96	1.14	0.9	1.29	1.53	0.7	5.5	7.4	40	1.38	1.36	0.53	1.9	1	0.89	131.17	337	-11	9.6	2.36	1.7	2.59	0.92	4	4.92	1.52	4.45	10.1	-0.01	0	0	0	0	4.79	1.25	1	1	1	1	1	1.31	1.13	1.25	1.18	1.1	1.28	1.23	0.59	0.8	0.52	0.59	2.4	168.5	0	4.9	111	-1.18	0.8	-1.8	48.03	50.62	232.3	1.32	1.01	0.57	0.44	0.58	1.3	1.05	0.53	27.6	60	16	2.4	0.5	2.17	2.36	0.091	54	4.7	581	1.21	1.24	1.56	1.56	0.967	0.961	0.921	15.6	0	3.25	3.93	0.16	93.5	3.8	1.02	-1.8	2.08	118.1	205.6	1.54	7	0.19	1.3	1.02	0.58	1.31	1.04	0.57	0.19	1.3	1.42	1.26	14.9	1.36	0.98	0.37	0.21	0.65	1.01	18.78	8.8	10	11	9.6	0.85	0.82	104	3.93	1.23	1.26	0.63	9.16	3.4	15.36	1.82	16.52	2.16	12.64	1.02	10.94	0.52	16.22	21.66	1.37	8.44	8.03	8.11	9.88	22.28	8.23	8.03	16.64	1.03	5.01	1.8	-1.966	-1.315	-0.65	-15.728	-10.52	2.73	0.54	0.43	-7.16	3.24	1.3	1.22	1.03	1.24	0.49	0.56	1.04	1.39	1.22	1.02	0.94	1.05	0	0.67	0.77	0.5	-9.2	-1.61	14.1	9.37	2.98	14.01	16.49	13.07	5.88	7.37	0.262	0.104	0.541	-11.7	1.3	1.02	0.59	1.25	2	0.05	-0.02	0.41	0.47	0.61	0.2	0.48	0.57	0.5	0.56	0.7	0.62	0.28	-0.23	-0.25	-0.42	-0.57	0.09	0.32	0.23	0.25	0.32	-0.12	-0.44	-0.26	-0.46	0.04	0.34	-0.1	-0.22	-0.55	-0.54	-0.69	-0.8	-0.8	-0.81	-0.18	-0.36	0.24	0.825	0.798	0.51	1.03	1.21	0	0.91	1.09	0.96	0.57	1.33	0	1.06	1.03	1.26	10.89	14.3	4.92	1.76	-2.64	3.16	0.52	173.7	-0.12	1.21	0.9	0.9	0.2	0.9	0.3	0.6	2.6	1.1	1.2	1.2	1.4	1.9	0.8	0.7	0.7	0.9	0.5	3.2	-2.1	5.5	0.1	2.3	3.7	-3.7	-5.6	-2.3	-6.2	-2.8	-2	-5.4	2	164.1	0.85	-3.02	-3.02	0.6	2.49	0.281	0.129	-0.008	-0.264	1.14	33	1.22	1.29	1.192	1.111	0.595	0.656	0.808	0	0.414	0.758	0.665	0.215	0.619	0.083	0	1	-17.79	8.79	0.73	0.77	-0.44	0.06	-0.17	4.9	2.28	-4.19	-1.03	-0.98	15	12.2	18.6	19.01	-0.102	2.52	21.4	0.13	5.98	17.6	0.95	0.8	0.96	1.01	0.79	0.84	0.91	1.52	1.36	1.47	1.26	
1.4	1.8	1.63	1.65	1.36	0.66	1.16	0.87	0.84	13	-0.62	0.935	1.234	0.982	0.783	0.494	0.74	0.789	0.766	1.33	4.81	-0.48	0.4	0.92	0.287	0.34	7.4	7.7	9.1	10.6	17	1.44	95	103	103	104	103	82	36	0.3	-0.38	0.15	8.8	9.4	11	8.6	7.4	5.06	4.43	2.74	4.52	13.21	12.79	9.66	16.46	9.45	9	6.54	10.15	0	163.1	164	0	1.99	3.5	6.57	0.83	1.09	0.56	100	3.8	0.4251	0.3819	0.3926	0.138	-0.28	-0.36	1.085	1.11	1	1.193	1.192	1.106	0.994	1.276	0.885	4.71	4.72	20.8	0.2523	164.6	163.4	1.054	1.18	1.47	-1.1	-0.62	-0.53	-0.44	0.41	1.8	0.943	0.5	53.89	1.215	6.81	1.52	4.91	9.88	4.09	-8.68	-2.29	-0.37	-0.38	-0.37	-0.32	-2.8	-4.68
13 | K	4.36	1.15	0.06	0.15	0.6	0.55	0.75	0.466	105.1	-141	-188	-3.7	-2.5	-0.35	0.767	8.408	4.358	6.5	0.402	0.265	0.68	0.219	0	0.99	0.032	1	1	1	4	0	1	171.3	200	97	0.03	0	1.01	1.16	0.74	1.19	0.66	1.83	1.01	1.49	1	0.74	1.09	0.82	0.055	0.115	0.072	0.095	1.13	-0.36	-0.09	-0.56	-0.62	-0.41	0.82	1.03	1.18	1.1	7	6.6	56	0.15	0.33	-1.1	-0.57	5.7	-0.99	146.19	224	14.6	9.18	2.16	-0.99	1.89	0.78	4.77	6.89	1.52	4.87	10.9	0	2	1	1	0	4.27	1.15	0.7	0.7	1.7	1.7	0.09	1.33	1.08	1.2	1.27	0.86	1.01	0.86	0.82	0.94	0.94	0.91	1.5	175.6	0.33	11.3	119	1.4	5.3	3	57.1	63.21	300.46	1.26	0.77	1.01	1.25	0.83	1.2	0.79	0.98	103	3	85	0.05	-1.8	1.64	2.18	0.059	72	4.1	575	1.22	0.77	1.27	0.66	1.093	1.082	1.057	0	1	7.9	7.92	0.59	100	-3.9	-0.09	3	2.94	122	210.9	2.08	6	0.2	1.23	0.77	0.96	1.25	0.77	0.99	0.27	0.7	0.59	0.74	11.36	1.27	0.79	0.89	1.17	1.13	1.05	21.29	0.1	-3.2	-1.9	-3	1.05	1.23	65	1.23	1.23	0.81	0.84	6.01	3.36	3.2	-0.84	2.58	-1.02	4.67	-0.4	3.5	-0.75	1.04	1.15	0.62	6.25	6.11	5.25	6.31	0.16	8.36	4.93	0.58	-1.05	-4.18	0	-1.374	-1.074	-0.3	-12.366	-9.666	2.55	0.61	-1.71	-9.97	10.68	1.23	1.13	0.77	0.75	0.95	0.95	1.01	1.08	1.2	1	0.73	0.83	1.18	0.66	1.27	0.86	5.7	-3.31	10.8	5.72	2.12	11.96	13.28	9.93	1.79	4.88	0.391	0.058	0.407	36.8	1.23	0.77	0.96	1	0.7	0.26	0.12	-0.17	-0.19	0.03	-0.11	0.16	0.23	0.37	0.47	0.28	0.41	0.45	0.03	0.08	-0.09	0.04	-0.29	-0.46	-0.59	-0.55	-0.51	-0.33	-0.44	-0.39	-0.43	-0.42	-0.2	0.33	0	0.14	0.45	0.09	0.17	-0.14	-0.43	0.06	-0.15	-0.27	1.04	1.232	3.9	0.88	1.27	2.86	0.85	0.83	0.71	0.87	1.36	0	0.94	1	0.91	16.46	14.07	-5.55	0.08	-3.97	-5.63	-9.6	215.2	-2.05	-1.18	1	1	0.7	0.6	1	0.8	0.8	1.9	1.1	1.4	1	2.2	1.9	2	1.3	1.2	1.3	2.3	-4.1	0.5	7.3	-3.3	-2.9	-3.1	1	-1.2	2.8	1.3	-0.8	1	-0.9	115.5	0.52	6.13	2.46	0.4	1.5	0.228	-0.075	0.049	-0.371	0.939	1	-0.67	1.24	1.203	0.735	1.06	0.932	1.254	1.288	0.835	0.947	1.045	0.17	0.559	0.159	0.0371	0	9.71	4.4	0.97	0.31	-0.12	-0.16	-0.45	10.1	-9.52	2.84	1.41	-3.14	7.9	7.5	17.96	18.36	-0.062	1.6	15.71	49.5	9.74	3.5	1.03	1.06	0.97	0.66	0.84	1.08	0.8	0.82	
1.27	1.24	1.1	1.17	1.22	1.71	1.63	1.45	1.19	1.27	1.13	1.1	13	-0.65	1.102	1.367	1.029	0.834	0.615	0.784	1.026	0.841	1.34	3.77	-0.41	1.6	0.73	0.429	0.446	6.1	6.5	7.7	7.5	10.5	0.91	-24	-205	-148	-124	-9	-38	115	1.4	-0.18	0.29	6.3	5.8	4.4	6.7	7.9	12.98	10.2	9.67	12.68	3.39	2.54	2	1.89	7.81	6.01	5.45	7.59	3.67	165.1	167.3	0.0371	-1.19	-1.62	-2.78	-2.36	1.56	1.1	-26	-4.11	-0.0101	-0.0053	-0.0158	-0.112	0.32	0.24	0.944	0.946	0.952	0.979	0.478	1.003	0.944	1.008	0.893	2.12	2.5	6.1	-0.2134	170	162.5	1.105	-0.8	0.15	1.7	0.57	1.79	1.17	-2.53	-2.03	0.283	-1.6	42.92	-0.67	-5.62	-1.01	-5.99	-11.92	-4.08	7.7	1.01	0.17	0.32	0.33	0.3	8.8	3.88
14 | M	4.52	1.18	2.67	2.96	1	0.98	0.64	0.295	97.7	124	121	5.6	4.1	-1.47	0.709	8.418	4.513	0	0.417	0.375	0.78	0.221	0.656	0.59	-0.258	1	1	1	3	0	1	170.8	185	31	0.4	0.11	0.6	1.45	1.05	0.6	0.71	1.57	0.83	0.52	1.43	0.95	0.52	0.85	0.068	0.082	0.014	0.055	0.51	1.37	1.27	1.21	0.6	1	0.75	1.4	1.4	0.67	6	1.7	94	1.93	1.52	0.26	2.4	1.9	0.94	149.21	283	-10	9.21	2.28	1.23	2.35	0.77	4.43	6.36	1.52	4.8	10.4	0.04	0	0	0	0	4.25	1.15	1	1	1	1	0.74	1.54	1.23	1.37	1.49	0.88	1.15	0.96	0.58	0.39	0.69	0.6	1.3	162.2	0	5.7	105	-1.59	0.7	-1.3	69.32	55.32	202.65	1.66	1.06	0.3	0.45	0.22	0.55	0	0.68	33.5	52	20	1.9	0.4	1.67	2.28	0.024	93	1.1	132	1.45	1.05	1.83	0.86	0.947	0.862	0.804	6.8	0	1.4	2.44	0.08	94.1	1.9	0.81	-1.3	2.34	113.1	204	1.8	6.8	0.19	1.47	0.97	0.41	1.39	0.93	0.51	0.38	1.19	1.49	1.09	14.39	1.53	1.08	1.07	0	0	0.36	21.64	4.8	7.1	5.4	4	0.84	0.83	100	4.22	1.23	1.29	0.62	2.5	1.37	5.3	2.04	6	2.55	3.68	0.86	3.14	0.47	4.12	7.17	1.59	2.14	3.79	1.6	1.85	1.85	2.46	2.61	3.93	0.66	3.51	1.3	-1.963	-1.303	-0.659	-15.704	-10.424	1.75	0.7	0.15	-4.96	2.18	1.32	1.47	0.96	0.94	0.52	0.71	1.11	0.9	1.45	1.41	1.3	0.82	0.88	0	0.76	0.5	-4.2	-1.84	14.33	9.83	3.18	13.4	16.23	15	5.21	6.39	0.28	0.054	0.328	-14.2	1.47	0.97	0.39	1.15	1.9	0	0	0.13	0.27	0.39	0.43	0.41	0.79	0.63	0.58	0.61	0.21	0.11	-0.42	-0.57	-0.38	0.24	0.29	0.43	0.32	-0.05	-0.1	-0.21	-0.28	-0.14	-0.52	0.25	0.45	-0.01	-0.53	-0.47	-0.76	-0.86	-0.71	-0.56	-0.49	-0.44	-0.19	0.16	0.804	0.781	0.4	1.17	1.41	0	0.41	1.71	1.89	0	1.41	0	1.33	1.31	1	20.61	20.61	2.35	1.32	-3.83	1.03	-2.8	197.6	-0.24	1.27	0.3	0.3	0.8	0.3	1	1	1.7	1.7	1.5	2.7	2.8	1	1.3	1	0.8	0.8	0	5.3	-3.5	7.2	3.5	2.3	3.7	-2.1	-4.8	-4.3	-4.8	-1.6	-4.1	-5.3	1.8	172.9	0.85	-1.3	-1.67	0.3	1.3	0.253	-0.092	-0.041	0.077	1.2	54	1.02	1.21	0	1.092	0.831	0.425	0.886	0	0.982	1.028	0.668	0.239	0.431	0.198	0.08226	0	-8.86	9.15	0.74	0.76	-0.79	0.11	0.03	5.3	-1.48	-2.49	-0.27	-0.41	13.3	8.4	18.11	18.49	-0.107	1.4	16.25	1.43	5.74	14.9	0.88	1.12	0.99	1.02	0.98	0.9	1.1	1.68	2.13	1.64	1.14	1.84	2.21	1.76	1.35	1.35	
0.74	1.11	0.96	0.8	12.8	-0.5	0.952	1.269	0.963	0.806	0.444	0.736	0.812	0.729	1.12	4.48	-0.46	0.5	0.86	0.293	0.313	2.3	2.4	3.3	3	11.9	0.91	78	73	77	82	90	83	62	0.5	-0.09	-0.71	2.5	2.1	2.8	2.4	2.3	1.71	1.87	0.6	1.85	2.44	3.1	2.45	2.67	2.1	2.54	1.62	2.24	0	165.8	167	0.0823	1.42	0.21	-3.12	4.26	0.62	0.66	68	1.9	0.1747	0.1613	0.216	0.275	-0.26	-0.19	1.032	0.923	1.077	0.998	1.369	1.093	0.782	1.171	0.878	3.63	3.91	15.7	0.0197	167.7	165.9	0.974	0.55	1.78	-0.73	-0.38	-0.38	-0.31	0.44	1.18	0.738	0.5	52.75	1.02	4.76	1.09	3.34	7.47	3.11	-7.13	-1.36	-0.22	-0.3	-0.3	-0.25	-3.4	-3.66
15 | F	4.66	2.02	1.96	2.03	0.6	0.98	0.53	0.314	113.9	189	148	19.2	14.7	-2.33	0.756	8.228	4.663	9.4	0.318	0.318	0.7	0.29	1.06	0.71	0.015	1	1	1	4	0	1	203.4	210	24	0.5	0.14	0.6	1.13	1.38	0.66	0.61	1.1	0.93	1.04	1.5	1.5	0.3	0.44	0.059	0.041	0.065	0.065	0.62	1.48	1.53	1.01	1.29	1.35	0.77	1.15	1.26	1.05	6.5	3.6	41	1.42	1.46	0.61	2.3	1.1	0.92	165.19	284	-34.5	9.18	2.16	1.79	2.94	0.71	5.89	4.62	1.52	6.02	13.9	0.03	0	0	0	0	4.31	1.1	1	1	1	1	2.18	1.13	1.01	0.4	1.02	1.15	1.34	1.26	0.72	1.2	0.6	0.71	2.65	189	0	5.2	132	-2.12	1.4	-2.5	48.52	51.06	204.74	1.22	1.16	0.67	0.5	0.89	0.8	0.43	0.61	25.5	58	10	2.2	0.5	2.87	1.83	0.04	51	2.3	303	1.05	1.2	1.2	1.37	0.93	0.912	0.914	54.7	0	3.2	2.59	0.1	115.5	2.8	1.03	-2.5	2.97	118.2	203.7	1.9	7.1	0.39	1.07	1.32	0.59	1.02	1.21	0.77	0.08	1.25	1.3	1.23	14	1.19	1.16	0.86	0.28	0.45	0.65	29.4	13.2	13.9	13.4	12.6	0.78	0.73	108	4.37	1.23	1.37	0.58	3.83	1.94	6.51	1.38	6.58	1.42	6.34	1.29	6.36	1.3	9.6	7.76	1.24	2.73	2.93	3.52	3.72	6.47	3.59	4.36	10.99	0.48	5.27	2.5	-1.864	-1.135	-0.729	-20.504	-12.485	2.68	0.8	0.52	-6.64	4.37	1.09	1.1	1.13	1.41	0.88	0.72	0.96	1.02	0.92	1.32	1.56	1.23	2.2	0.47	0.37	0.96	-9.2	-1.63	13.43	8.99	3.02	14.08	14.18	13.27	6.6	6.62	0.195	0.104	0.577	-15.5	1.07	1.32	0.58	1.1	3.1	0.05	0.12	-0.03	0.24	0.06	0.15	0.03	0.48	0.15	0.1	-0.06	0.05	0	-0.18	-0.12	-0.32	0.08	0.24	0.36	0.48	0.2	0.2	-0.13	-0.04	-0.03	-0.33	0.09	0.07	0.25	-0.31	-0.29	-0.47	-0.39	-0.61	-0.25	-0.2	0.11	-0.02	0.34	0.773	0.723	0.43	0.85	1	0	1.07	1.52	1.2	1.27	1.3	2.11	0.41	1.51	1.25	16.26	19.61	2.98	2.09	-3.74	0.89	-2.85	228.6	0	1.27	1.2	1.2	0.2	0.5	0.9	0.6	1.9	1	1.3	1.9	2.9	1.8	0.3	0.7	0.5	0.1	1.2	1.6	-1.1	2.8	1.6	2.6	3	0.7	-1.8	0.8	-3.7	1.6	-4.1	-2.4	2.8	194.1	0.88	-3.24	-3.24	0.7	2.65	0.234	-0.011	0.438	0.074	1.086	18	1.92	1.16	0.963	1.052	0.377	1.348	0.803	0.393	1.336	0.622	0.881	0.087	0.077	0.682	0.0946	1	-21.98	7.98	0.52	0.87	-0.25	1.18	0.4	5	-0.76	-4.92	1.3	0.45	11.2	8.3	17.3	17.95	0.001	2.75	19.8	0.35	5.48	18.8	1.06	1.12	0.95	0.88	0.96	0.9	1	1.1	1.39	0.96	
1.14	0.86	1.35	1.22	0.67	1.2	1.04	1.05	0.84	0.95	12.1	-0.41	0.915	1.247	0.934	0.774	0.706	0.968	0.685	0.585	1.07	5.38	-0.55	0.4	0.59	0.292	0.314	3.3	3.4	5	4.5	23	1.34	92	108	128	132	131	117	120	0.3	-0.01	-0.67	3.7	4	5.6	3.9	2.7	2.32	1.92	1.18	1.68	5.27	4.97	5.41	7.32	3.91	3.59	3.51	4.34	0	190.8	191.9	0.0946	1.69	4.8	9.14	-1.36	2.57	0.47	100	2.8	0.4076	0.4201	0.3455	0.24	-0.41	-0.22	1.119	1.122	1.11	0.981	1.368	1.121	1.058	1.09	1.151	5.88	4.84	23.9	0.3561	193.5	198.8	0.869	0.67	1.72	-1.43	-0.45	-0.45	-0.55	0.5	1.74	1	1	53.45	1.938	5.06	1.09	5.2	11.35	3.67	-7.96	-2.22	-0.36	-0.34	-0.38	-0.33	-3.7	-4.65
16 | P	4.44	1.95	0.76	0.76	0.06	0.55	0.97	0.509	73.6	-20	-36	5.1	5.6	-0.98	0.73	0	4.471	0	0.208	0.34	0.36	0.131	-2.24	1.61	0	0	0	0	0	0	0	129.3	145	50	0.18	0.04	1.52	0.57	0.55	1.56	2.01	0	1.1	1.58	0.66	0.4	1.58	1.69	0.102	0.301	0.034	0.068	2.04	-0.12	-0.01	-0.06	-0.21	-0.09	0.76	0.49	0.36	1.47	5.5	5.2	56	0.27	0.54	-0.07	1.2	0.18	0.22	115.13	222	-86.2	10.64	1.95	0.72	2.67	0	2.72	4.11	1.52	4.31	17.8	0	0	0	0	0	0	0.71	1	13	1	0.1	0.39	0.63	0.82	0.21	0.68	0.8	0.61	0.65	1.66	2.1	1.77	1.67	2.6	122.2	0.39	8	32.5	0.73	0.9	0	36.13	39.21	179.93	0.25	1.16	1.55	2.96	0.43	1.78	0.37	0.63	51.5	25	45	0.6	-0.3	2.77	1.99	0.051	58	2.5	366	0.52	0.61	0.21	0.52	1.055	1.085	0.932	43.8	0	7	7.19	0.46	41.9	-1.6	2.03	-1.4	1.42	81.9	237.4	1.25	6.2	0.17	0.52	0.64	1.91	0.58	0.68	1.78	0.46	0.4	0.35	0.42	11.37	0.49	1.22	0.5	0.12	0	1.95	10.93	6.1	8	4.4	3.1	1	1.04	78	1.89	0.7	0.75	1.43	4.95	3.18	4.79	-0.05	5.29	0.11	3.62	-0.42	4.36	-0.19	2.24	3.51	0.67	6.28	7.21	5.65	6.22	2.38	5.2	4.84	1.96	-0.76	-3.03	0	-1.699	-1.236	-0.463	-11.893	-8.652	0.41	2.12	-0.58	5.19	5.14	0.63	0.57	0.75	0.46	1.47	1.51	0.91	0.48	0.72	0.69	0.69	0.73	1.34	1.54	1.62	1.3	2.1	-2.5	11.19	6.64	2.46	11.51	14.1	10.62	2.12	5.65	0.346	0.136	0.6	0.8	0.52	0.64	1.91	0.1	0.2	-0.19	-0.08	-0.43	-0.34	-0.76	-0.81	-1.12	-1.86	-1.4	-1.33	-1.03	-0.84	-0.42	-0.13	0.26	0.05	0.02	-0.31	-0.91	-1.24	-1.28	-0.79	-0.48	-0.29	-0.04	0.37	0.31	0.04	0.28	0.14	0.89	1.4	1.77	2.27	1.59	1.14	0.77	0.78	0.16	1.047	1.093	2.04	1.47	1.46	0	1.73	0.87	0.83	0.38	0.25	1.99	2.73	1.37	0	23.94	52.63	0	0	0	0	0	0	0	0	0.7	0.7	0.8	2.6	0.5	0.4	0.1	0.3	0.3	0.3	0	0	0.2	0	0.7	1.9	1.5	-7.7	8.1	-22.8	-24.4	-1.8	-6.6	7.4	2.6	6.5	3.6	-6	5.8	3.5	0.4	92.9	0.64	-1.75	-1.75	0.9	2.6	0.165	0.37	-0.016	-0.036	0.659	42	-0.49	0.65	2.093	1.249	3.159	0.179	0.748	0	0.415	0.579	1.385	0.151	0.739	0.366	0.01979	0	5.82	7.79	0.82	0.35	-0.59	0.11	-0.47	6.6	-3.68	-1.22	0.88	2.23	8.2	6.9	18.16	18.77	-0.181	2.7	17.43	1.58	6.3	14.8	1.18	1.31	1.05	1.33	1.12	1.67	0.94	0.15	0.03	0.15	0.44	0.2	0.07	0.07	0.03	
0.1	0.66	1.01	2.01	1.7	6.5	3	1.049	1.342	1.05	0.809	1.945	1.78	1.412	2.613	3.9	3.8	-0.23	1.7	-2.5	0.432	0.354	4.2	4.2	0.7	1.3	15	0.12	-79	-79	-81	-82	-85	-103	-132	1.6	0	0	4.9	5.4	4.7	5.3	6.9	5.41	4.79	5.6	5.7	3.8	3.42	3.2	3.3	4.54	4.04	4.28	4.56	0	121.6	122.9	0.0198	-1.14	0.71	-0.12	3.12	-0.15	0.69	25	-1.9	0.0019	-0.0492	0.0844	-0.478	0.13	0.15	1.299	1.362	1.266	1.332	1.241	1.314	1.309	0.8	1.816	2.09	2.45	9.9	-0.4188	123.1	123.4	0.82	0.54	0.85	-0.75	0.46	0.34	0.36	-0.2	0.86	0.711	-1	45.39	-0.503	-4.47	-0.62	-4.32	-10.86	-3.22	6.25	0.47	0.08	0.2	0.19	0.11	0.2	0.75
17 | S	4.5	0.05	0.97	0.81	0.35	0.55	0.84	0.507	54.9	-70	-60	-4.1	-3.5	-0.39	0.594	8.38	4.498	6.5	0.2	0.354	0.53	0.062	-0.524	1.34	0.225	1	0	0	1	0	0	99.1	115	44	0.22	0.08	1.43	0.77	0.75	1.43	0.74	0.96	1.55	0.93	0.63	0.79	1.41	1.49	0.12	0.139	0.125	0.106	1.52	-0.98	-0.93	-0.6	-0.83	-0.97	0.68	0.83	0.65	1.26	3	7	120	0.96	0.98	-0.26	0.01	0.73	-0.67	105.09	228	-7.5	9.21	2.19	-0.04	1.31	0.55	1.6	3.97	1.52	2.7	13.1	0.11	1	2	0	0	3.83	0.75	1.7	1	1.5	1	0.12	0.78	1.01	1.01	0.81	1.05	0.91	0.93	1.23	1.3	1.13	1.25	0	88.7	1.42	9.2	32	0.52	1.7	0.3	32.4	35.65	174.06	0.65	1.09	1.19	1.21	1.24	1.2	0.87	1.03	42	35	32	0.8	-0.1	0.07	2.21	0.069	117	4.5	593	0.74	0.92	0.48	0.82	1.169	1.048	0.923	44.4	0	5.25	5.37	0.27	29.3	-0.8	0.05	0.3	1.28	117.9	232	1.08	4.9	0.025	0.82	0.95	1.32	0.76	1.02	1.3	0.55	0.82	0.7	0.87	11.23	0.7	1.04	1.01	0.57	0.81	1.56	6.35	1.2	-3.7	-3.2	-2.9	1.02	1.04	83	1.81	0.78	0.77	1.34	6.84	2.83	7.55	0.25	7.68	0.3	7.24	0.14	6.26	-0.2	5.38	6.84	0.68	8.53	7.25	8.04	8.05	4.17	7.4	6.41	5.58	-0.67	-2.84	0	-1.753	-1.297	-0.455	-10.518	-7.782	1.47	0.94	-0.83	-1.6	6.78	0.78	0.77	1.02	0.7	1.29	1.46	0.95	1.05	0.84	0.86	0.65	0.98	1.43	1.08	1.34	1.4	6.5	-3.3	11.26	6.93	2.6	11.26	13.36	11.18	2.43	5.53	0.326	0.155	0.692	-2.5	0.82	0.95	1.33	0.75	0.9	-0.19	0.01	-0.1	-0.17	-0.26	-0.35	-0.47	-0.23	-0.28	-0.49	-0.28	-0.05	0.07	0.41	0.44	0.25	-0.12	0.11	-0.12	-0.31	-0.28	0.03	0.27	0.34	0.41	0.43	-0.11	-0.23	-0.23	0.22	0.24	0.4	0.63	0.33	0.32	0.13	-0.09	-0.29	-0.35	1.056	1.082	1.61	1.5	1.05	1.23	1.31	1.14	1.16	0.92	0.89	0	1.18	0.97	1.5	19.95	18.56	-3.4	0.04	-1.66	-3.44	-5.1	109.5	-0.75	-0.5	1.6	1.6	2.3	0.7	0.8	0.4	0.4	1.1	0.6	0.5	0.5	0.6	1.6	1.7	0.8	0.7	0.9	-3.9	-3.5	-3	-1.9	-1.7	-2.4	1.3	2.6	1.8	2.1	1.5	2.5	3.2	-1.2	85.6	0.66	4.35	0.1	0.4	0.04	0.236	0.022	-0.153	0.47	0.76	0.1	-0.55	0.71	0.523	1.093	1.444	1.151	1.145	0.16	1.089	1.14	1.257	0.01	0.689	0.15	0.08292	0	-1.54	7.08	0.96	0.49	-0.01	0.13	-0.11	7.5	-5.06	1.96	-1.63	0.57	7.4	8	17.57	18.06	-0.203	0.14	9.47	1.67	5.68	6.9	0.69	1.02	0.96	1.2	
1.25	0.81	0.69	0.61	0.44	0.67	0.66	0.68	0.65	0.42	0.71	1.02	0.64	0.71	0.76	0.65	12.2	-0.35	1.046	1.381	1.025	0.811	0.928	0.969	0.987	0.784	1.2	4.12	-0.39	0.7	0.53	0.416	0.376	4	5.5	3.9	3.8	2.6	0.84	-34	-26	-31	-34	-36	-41	-52	0.9	0.15	1.45	7.3	7.2	7.3	6.6	8.8	4.27	5.41	9.6	6.99	4.1	4.93	6.03	6	4.18	5.15	7.64	6.52	0	94.2	95.4	0.0829	-0.52	-0.62	-1.39	1.59	1.93	1	-2	-0.5	-0.0433	-0.0282	0.004	-0.177	0.05	0.16	0.947	0.932	0.956	0.984	1.097	0.911	0.986	0.886	1.003	1.66	1.82	8.2	-0.1629	94.2	102	1.342	-0.05	0.86	0.42	0.12	0.1	0.17	-0.4	-0.64	0.359	-0.7	47.24	-0.563	-1.92	-0.55	-3	-6.21	-1.85	4.08	0.55	0.09	0.1	0.12	0.11	-0.6	1.74
18 | T	4.35	0.05	0.84	0.91	0.44	0.83	0.75	0.444	71.2	-38	-54	0.8	1.1	-0.52	0.655	8.236	4.346	6.9	0.272	0.388	0.5	0.108	0	1.08	0.166	2	0	0	1	0	0	122.1	140	47	0.23	0.08	0.96	0.83	1.19	0.98	1.08	0.75	1.09	0.86	1.17	0.75	1.09	1.16	0.086	0.108	0.065	0.079	0.98	-0.7	-0.59	-1.2	-0.62	-0.77	0.7	0.94	1.15	1.05	5	6.1	97	1.11	1.01	-0.18	0.52	1.5	0.09	119.12	253	-28	9.1	2.09	0.26	3.03	0.63	2.6	4.11	1.73	3.17	16.7	0.04	1	2	0	0	3.87	0.75	1.7	1	1	1	0.21	0.77	1.17	0.55	0.85	1.2	1.14	1.05	1.04	0.6	0.88	1.08	0.45	118.2	0.71	8.6	61	0.07	1.5	-0.4	35.2	36.5	205.8	0.86	1.24	1.09	1.33	0.85	0.99	1.14	0.39	45	30	32	0.7	-0.2	0.07	2.1	0.059	107	3.7	490	0.81	1.18	0.77	1.36	1.073	1.051	0.934	31	0	4.8	5.16	0.26	51.3	-0.7	-0.35	-0.4	1.43	117.1	226.7	1.24	5	0.1	0.82	1.21	1.04	0.79	1.27	0.97	0.49	1.12	0.59	1.3	11.69	0.78	1.18	0.92	0.23	0.71	1.23	11.01	2.7	1.5	-1.7	-0.6	0.99	1.02	83	2.04	0.87	1.23	1.03	5.77	2.63	7.51	0.66	8.38	0.99	5.44	-0.13	5.66	-0.04	5.61	8.89	0.92	4.43	3.51	7.41	5.2	4.33	5.18	5.87	4.68	-0.36	-1.2	0.4	-1.767	-1.252	-0.515	-12.369	-8.764	2.39	1.09	-1.52	-4.75	8.6	0.8	0.86	1.19	1.2	1.05	0.96	1.15	0.74	0.97	1.15	0.98	1.2	0.28	1.12	0.87	1.11	5.2	-2.91	11.65	7.08	2.55	13	14.5	10.53	2.6	5.81	0.251	0.152	0.713	-5	0.82	1.21	1.03	0.75	1.7	-0.04	-0.34	-0.07	-0.2	-0.1	-0.37	-0.54	-0.33	-0.21	-0.44	-0.25	-0.16	-0.33	0.33	0.35	0.22	0	0.03	0.49	0.17	0.08	-0.15	0.47	0.27	0.36	0.5	-0.06	-0.02	-0.26	0.1	0.16	-0.1	0.29	0.13	0.21	-0.02	-0.27	-0.3	-0.04	1.008	1.043	1.48	1.96	0.87	2.48	1.57	0.96	0.97	1.38	0.81	1.24	0.77	1.38	1.18	18.92	21.09	-2.57	0.27	-2.31	-2.84	-5.15	142.1	-0.71	-0.27	0.3	0.3	1.6	0.8	0.7	1	0.5	0.6	1	0.5	0.6	0.7	0.9	1	0.3	0.8	2.1	-2.6	2.3	-4	-3.7	1.3	1.7	0	0.3	-0.7	0.6	1.2	1.7	0	-0.5	106.5	0.7	3.86	-0.42	0.4	0.44	0.213	0.136	-0.208	0.348	0.817	0.1	-0.28	0.78	1.961	1.214	1.172	0.749	1.487	0.218	1.732	0.863	1.055	0.1	0.785	0.074	0.09408	0	-4.15	7	0.92	0.38	0.05	0.28	0.09	6.6	-4.88	0.92	-2.09	-1.4	8.8	7	17.54	17.71	-0.17	0.54	15.77	1.66	5.66	9.5	0.87	0.8	1.03	1.13	1.41	0.77	0.92	0.75	
0.65	0.7	0.73	0.79	0.46	0.57	0.5	0.82	0.82	0.84	0.79	0.086	11.7	-0.11	0.997	1.324	0.998	0.795	0.884	1.053	0.784	0.569	0.99	4.11	-0.48	0.4	0.54	0.362	0.339	5.7	5.7	4.4	4.6	6.9	0.74	-7	-3	10	20	34	79	174	0.7	0.39	-0.7	6	6.1	5.6	5.3	5.1	3.83	5.36	8.95	5.16	4.98	5.55	5.62	5	4.45	5.46	7.12	5.08	0	119.6	121.5	0.0941	-0.08	0.65	1.81	2.31	0.19	1.05	7	-0.7	0.0589	0.0239	0.1462	-0.163	0.02	-0.08	1.017	1.023	1.018	0.992	0.822	0.988	1.11	0.832	1.189	2.18	2.45	10.3	-0.0701	120	126	0.871	-0.02	0.89	0.63	0.38	0.21	0.18	-0.34	-0.26	0.45	-0.4	49.26	-0.289	-3.99	-0.71	-1.91	-4.83	-1.97	4.02	0.25	0.04	0.01	0.03	0.05	-1.2	0.78
19 | W	4.7	2.65	0.77	1.08	0.73	0.77	0.97	0.305	135.4	145	163	16.3	17.8	-2.01	0.743	8.094	4.702	0	0.462	0.231	0.7	0.409	1.6	0.76	0.158	1	1	1.5	5	0	1	237.6	255	32	0.27	0.04	0.96	1.08	1.37	0.6	1.47	0.4	0.62	0.16	1.49	1.19	0.48	1.59	0.077	0.013	0.064	0.167	0.48	1.38	2.25	1.31	1.51	1.71	0.74	1.33	0.84	1.23	7	1.3	18	0.91	1.06	0.37	2.6	1.6	0.67	204.24	282	-33.7	9.44	2.43	2.25	3.21	0.84	8.08	7.68	1.52	5.9	13.2	0	1	0	0	0	4.75	1.1	1	1	1	1	5.7	1.18	1.32	1.86	1.18	1.15	1.13	1.15	0.67	0	0.62	0.68	3	227	0.13	5.4	170	-0.51	1.9	-3.4	56.92	60	237.01	1.05	1.17	0.74	0.62	0.62	1.03	1.79	0.63	34.7	49	17	1.6	0.3	3.77	2.38	0.014	25	0.8	99	0.97	1.18	1.17	0.79	0.925	0.917	0.803	70.5	0	4	2.78	0.15	145.5	-0.9	0.66	-3.4	3.58	118.4	203.7	2.21	7.6	0.56	0.99	1.14	0.76	0.97	1.26	0.79	0.43	1.54	0.89	1.75	13.93	1.01	1.07	1	0	0.93	1.1	42.53	14.9	18.1	17.1	15.1	0.83	0.87	94	3.82	1.06	1.13	0.87	1.34	1.15	2.51	1.02	2.89	1.35	1.64	0.26	2.22	0.77	2.67	2.11	1.63	0.8	0.47	1.68	2.1	2.21	1.06	2.31	2.2	0.9	5.2	3.4	-1.869	-1.03	-0.839	-26.166	-14.42	2.49	-4.65	1.25	-17.84	1.97	1.03	1.02	1.24	1.28	0.88	0.9	1.17	0.64	1.11	1.06	1.25	1.26	0	1.24	1.1	0.57	-10	-1.75	12.95	8.41	2.85	12.06	13.9	11.41	6.25	6.98	0.291	0.092	0.632	-7.9	0.99	1.14	0.75	1.1	2.2	-0.06	-0.01	-0.02	0.25	0.2	0.07	-0.1	0.15	0.02	0.14	0.21	0.32	0.36	-0.1	-0.15	-0.19	-0.1	0.15	0.34	0.45	0.22	0.09	-0.22	-0.08	-0.01	-0.32	0.19	0.16	0.15	-0.15	-0.44	-0.46	-0.37	-0.44	-0.17	-0.2	-0.09	-0.18	-0.06	0.848	0.867	0.75	0.83	1.23	0	0.98	1.96	1.58	1.53	1.27	0	1.22	1.12	1.33	23.36	19.78	2.33	2.51	-8.21	-0.18	-8.39	271.6	-0.59	0.88	1.1	1.1	0.3	2.1	1.7	1.4	3.1	1.4	1.5	1.1	2.1	0.4	0.4	0	0	0.4	2.7	1.2	-0.9	4	-0.9	-1	0.3	-3.4	3.4	-0.8	3.3	6.5	1.2	2.9	3	224.6	0.85	-2.86	-2.86	0.6	3	0.183	-0.011	0.493	0.05	1.107	77	0.5	1.05	1.925	1.114	0.452	1.283	0.803	0	1.781	0.777	0.881	0.166	0.16	0.463	0.05481	1	-16.19	8.07	0.2	0.86	-0.33	-0.12	-0.61	5.3	-5.88	-4.75	3.65	0.85	9.9	5.7	17.19	16.87	0.275	0.31	21.67	2.1	5.89	17.1	0.91	0.9	1.06	0.68	0.94	1.26	1.1	1.68	1.1	0.68	0.68	
1.52	1.57	1	1	0.58	0.58	1.06	0.91	1.25	12.4	-0.45	0.904	1.186	0.938	0.796	0.69	0.91	0.755	0.671	1.1	6.1	-0.48	0.7	0.58	0.268	0.291	1.3	1.2	1.2	1	24.2	1.8	59	69	102	118	116	130	179	0.9	0.21	-0.14	1.4	1.4	1.8	1.2	0.7	0.67	0.54	1.18	0.56	1.11	1.28	2.6	2.01	0.9	0.95	1.96	1.24	6.93	226.4	228.2	0.0548	1.76	2.29	5.91	2.61	3.59	0.7	109	-0.46	0.2362	0.4114	0.2657	0.564	-0.15	-0.28	0.895	0.879	0.971	0.96	1.017	0.939	0.841	0.981	0.852	6.46	5.64	24.5	0.3836	197.1	209.8	0.666	-0.19	0.82	-1.57	-0.98	-0.27	0.05	-0.01	1.46	0.878	1.6	53.59	0.514	0.21	-0.13	0.51	1.8	-0.11	0.79	-1.28	-0.21	-0.24	-0.33	-0.27	-1.9	-3.32
20 | Y	4.6	1.88	0.39	0.68	0.44	0.83	0.84	0.42	116.2	53	22	5.9	3.8	-2.24	0.743	8.183	4.604	6.8	0.161	0.429	0.7	0.298	4.91	1.07	0.094	1	1	1	5	0	1	203.6	230	60	0.15	0.03	1.14	0.69	1.47	1.14	0.68	0.73	0.99	0.96	1.07	1.96	1.23	1.01	0.082	0.065	0.114	0.125	1.08	1.49	1.53	1.05	0.66	1.11	0.71	0.49	1.41	1.35	7	3.4	41	1.1	0.89	0.02	1.6	1.8	-0.93	181.19	344	-10	9.11	2.2	0.96	2.94	0.71	6.47	4.73	1.52	6.72	13.9	0.03	1	2	0	0	4.3	1.1	1	1	1	1	1.26	0.71	0.88	1.08	0.77	1.39	1.37	1.21	0.92	1.8	0.41	0.98	2.85	193	0.2	6.2	136	-0.21	2.1	-2.3	51.73	51.15	229.15	0.7	1.28	1.14	0.94	1.44	0.69	0.73	0.83	55.2	24	41	0.5	-0.4	2.67	2.2	0.032	50	2.3	292	0.79	1.23	0.74	1.08	0.961	0.93	0.837	0	0	4.35	3.58	0.25	117.3	-1.3	1.24	-2.3	3.36	110	195.6	2.13	7.1	0.39	0.72	1.25	1.05	0.73	1.31	0.93	0.46	1.53	1.08	1.68	13.42	0.69	1.25	1.31	0.97	0.38	0.87	31.53	6.1	8.2	7.4	6.7	0.93	1.03	83	2.91	0.63	1.07	1.35	3.15	1.76	4.08	0.53	3.51	0.2	5.42	1.29	3.28	0.07	2.68	2.57	0.67	2.54	1.01	3.42	3.32	3.42	2.75	4.55	3.13	0.59	2.15	2.3	-1.686	-1.03	-0.656	-20.232	-12.36	2.23	-0.17	-2.21	9.25	2.4	0.71	0.72	1.35	1.45	1.28	1.12	0.8	0.73	0.72	1.35	1.26	1.23	1.53	0.54	1.24	1.78	-1.9	-2.42	13.29	8.53	2.79	12.64	14.76	11.52	3.03	6.73	0.293	0.081	0.495	2.9	0.72	1.25	1.05	1.1	2.8	-0.14	-0.29	-0.38	-0.3	-0.04	-0.31	-0.35	-0.19	-0.1	-0.08	0.16	0.11	0	-0.1	0.15	0.05	0.18	0.29	0.42	0.77	0.53	0.34	-0.11	0.06	-0.08	0.35	0.33	0.22	0.09	-0.02	-0.19	-0.05	-0.41	-0.49	-0.35	0.1	-0.25	0.07	-0.2	0.931	1.05	1.72	1.34	0.68	1.9	1.31	1.68	0.86	1.79	0.91	1.9	1.09	1.65	1.09	26.49	26.36	-0.14	1.63	-5.97	-1.77	-7.74	239.9	-1.02	0.33	1.9	1.9	0.8	1.8	0.4	1.2	0.6	0.2	0.8	1.3	0.8	1.1	0.3	1.2	0.8	0.9	0.5	-4.5	-3.7	-4.6	-0.6	4	3.3	4.8	2.9	3.1	3.8	1.3	-0.6	3.2	2.1	177.7	0.76	0.98	0.98	1.2	2.97	0.193	-0.138	0.381	0.22	1.02	66	1.67	0.67	0.802	1.34	0.816	1.283	1.227	0.654	0	0.907	1.101	0.066	0.06	0.737	0.05159	1	-1.51	6.9	0.49	0.64	-0.42	0.19	-0.61	5.7	-6.11	-1.39	2.32	0.01	8.8	6.8	17.99	18.23	0	2.97	18.03	1.61	5.66	15	1.04	1.12	0.94	0.8	0.82	0.99	0.73	0.65	
0.93	0.91	1.04	1.06	1.1	1.02	0.73	1.06	0.93	1.15	0.64	0.85	12.1	-0.17	0.929	1.199	0.981	0.788	0.778	1.009	0.665	0.56	0.98	5.19	-0.5	0.6	0.72	0.22	0.287	4.5	3.7	4.5	3.3	17.2	1.68	-11	11	36	44	43	27	-7	0.9	0.05	-0.49	3.6	3.2	3.3	3.1	2.4	2.75	2.26	3.26	2.16	4.07	3.55	6.15	3.96	3.46	2.96	4.85	3.01	5.06	194.6	197	0.0516	1.37	1.89	1.39	2.37	-2.58	1	56	-1.3	0.3167	0.3113	0.2998	0.322	-0.09	-0.03	1	0.902	1.157	1.12	0.836	1.09	0.866	1.075	0.945	5.01	4.46	19.5	0.25	231.7	237.2	0.531	-0.23	0.47	-0.56	-0.25	0.4	0.48	-0.08	0.51	0.88	0.5	51.79	1.699	3.34	0.69	2.87	7.61	2.17	-4.73	-0.88	-0.14	-0.23	-0.29	-0.23	0.7	-1.01
21 | V	3.95	1.32	1.08	1.14	0.82	0.98	0.37	0.386	85.1	123	117	3.5	2.1	-1.56	0.777	8.436	4.184	7	0.379	0.495	0.76	0.14	0.401	0.63	0.513	2	0	0	1	0	0	141.7	155	18	0.54	0.18	0.5	1.06	1.7	0.59	0.61	1.25	0.75	0.32	1.69	1.79	0.42	0.59	0.062	0.048	0.028	0.053	0.43	1.26	1.09	1.21	1.21	1.13	0.86	0.96	1.61	0.48	5	6.6	74	1.58	1.33	0.54	1.5	0.48	0.84	117.15	293	5.63	9.62	2.32	1.22	3.67	0.89	3	4.11	1.9	3.17	17.2	0.01	0	0	0	0	4.86	0.95	0.6	1	0.8	1	0.6	0.81	1.13	0.64	0.74	1.56	1.31	1.58	0.6	0.8	0.58	0.62	1.7	141.4	0	5.9	84	-1.27	0.9	-1.5	40.35	42.75	207.6	0.93	1.4	0.44	0.56	0.43	0.77	0	0.76	23.7	64	14	2.9	0.6	1.87	2.32	0.066	98	4.2	553	0.94	1.66	1.1	2	0.982	0.927	0.913	29.5	0	3.4	3.31	0.22	71.5	4.2	0.56	-1.5	1.49	121.7	220.3	1.29	6.4	0.15	0.91	1.49	0.47	0.93	1.43	0.46	0.08	1.81	2.63	1.53	15.71	0.98	1.33	0.87	0.24	0.48	0.58	13.92	2.7	3.3	5.9	4.6	0.81	0.81	94	3.49	0.97	1.41	0.83	6.65	2.53	5.12	-0.6	4.66	-0.79	6.18	-0.19	7.55	0.36	11.44	6.3	1.3	5.44	4.57	7	6.19	14.34	5.27	6.07	12.43	1.24	4.45	1.5	-1.981	-1.254	-0.728	-13.867	-8.778	3.5	1.32	0.54	-3.97	3.81	0.95	1.05	1.44	1.73	0.51	0.55	1.03	1.18	0.82	1.66	1.22	1.62	0.14	0.69	0.52	0.5	-3.7	-2.08	15.07	10.38	3.21	12.88	16.3	13.86	7.14	7.62	0.291	0.096	0.529	-10.9	0.91	1.49	0.47	0.95	4	-0.03	0.02	-0.01	-0.01	0.12	0.13	0.31	0.24	0.17	-0.01	0	0.06	-0.13	-0.07	-0.09	-0.15	0.29	0.48	0.76	0.69	0.67	0.58	0.06	0.11	-0.18	0	0.04	0.05	-0.1	-0.33	-0.45	-0.86	-1.32	-0.99	-0.7	-0.11	-0.06	0.29	0.18	0.825	0.817	0.59	0.89	0.88	1.62	1.11	1.56	0.64	0.95	0.93	0	0.88	1.7	1.01	17.06	21.87	4.04	1.18	-2.05	2.86	0.81	157.2	0.09	1.09	0.7	0.7	0.1	1.1	0.6	1.1	1.5	0.8	1.2	0.4	1.4	1.3	0.7	0.7	0.2	0.6	1	1.4	-4.4	2.5	2.3	6.8	7.1	2.7	-6	-3.5	-6.2	-4.6	-3.5	-6.3	1.4	141	0.86	-2.18	-2.18	0.4	1.69	0.255	0.245	-0.155	-0.212	0.95	0.1	0.91	0.99	0.409	1.428	0.64	0.654	0.625	0.167	0.946	0.561	0.643	0.285	0.356	0.301	0.00569	1	-16.22	8.88	0.85	0.72	-0.46	-0.08	-0.11	5.6	1.99	-2.69	-2.53	-1.29	12	9.4	18.3	18.98	-0.125	1.79	21.57	0.13	5.96	14.3	0.9	0.87	0.62	0.58	0.67	0.76	0.7	
1.14	1.18	0.81	1.03	0.94	0.94	1.08	0.51	0.46	0.53	0.74	0.77	1.12	11.9	-0.14	0.931	1.235	0.968	0.781	0.706	0.939	0.546	0.444	0.87	4.18	-0.53	0.5	0.63	0.307	0.294	8.2	8.2	5.9	7.1	15.3	1.2	100	108	116	113	111	117	114	0.4	-0.06	-0.7	6.7	6.7	7.7	6.8	5.3	4.05	3.57	3.1	4.1	12.53	10.69	9.46	10.24	8.62	7.47	6.6	7	0	138.2	139	0.0057	2.53	1.59	2.3	0.52	2.06	0.51	62	4.2	0.4084	0.2947	0.3997	-0.052	-0.17	-0.24	0.955	0.923	0.959	1.001	1.14	0.957	0.9	0.908	0.999	3.77	3.67	19.5	0.1782	139.1	138.4	1.131	1.13	1.99	-0.4	-0.46	-0.62	-0.65	0.32	1.34	0.825	0.7	56.12	0.899	5.39	1.15	3.98	8.2	3.31	-6.94	-1.34	-0.22	-0.29	-0.29	-0.23	-2.6	-3.5
22 | 


--------------------------------------------------------------------------------
/Lab5_PeptideSequencesCoding/ACEtriPeptidesSequencesActivities.txt:
--------------------------------------------------------------------------------
  1 | AAP	4.52
  2 | ADA	3.83
  3 | AEL	4.24
  4 | AFL	4.2
  5 | AGP	3.25
  6 | ALP	3.62
  7 | AQL	4.24
  8 | AVP	3.47
  9 | DLP	5.32
 10 | FAL	4.58
 11 | FCF	4.96
 12 | FDK	3.41
 13 | FEP	4.92
 14 | FFF	4.8
 15 | FFG	3.29
 16 | FFL	4.43
 17 | FFP	4.92
 18 | FGF	4.71
 19 | FGG	3.21
 20 | FGK	3.8
 21 | FIV	3.96
 22 | FNF	5.16
 23 | FPF	4.68
 24 | FPK	3.55
 25 | FPP	4.5
 26 | FQP	4.92
 27 | FWN	4.74
 28 | FYN	4.74
 29 | GEG	3.72
 30 | GFF	4.98
 31 | GFG	3.47
 32 | GGF	4.89
 33 | GGG	3.39
 34 | GGP	4.72
 35 | GKV	5.41
 36 | GLG	3.55
 37 | GLY	5.05
 38 | GPL	5.59
 39 | GPM	4.77
 40 | GPP	4.92
 41 | GQP	5.49
 42 | GRP	4.7
 43 | GSH	4.49
 44 | GVV	4.18
 45 | GYG	3.67
 46 | GYY	4.93
 47 | HIR	3.02
 48 | HLL	4.24
 49 | HQG	3.13
 50 | IAE	4.46
 51 | IAP	5.57
 52 | IAQ	4.46
 53 | IFL	4.35
 54 | IKP	5.68
 55 | IKY	6.68
 56 | ILP	4.49
 57 | IMY	5.74
 58 | IPA	3.85
 59 | IPP	5.3
 60 | IRA	5.01
 61 | IRP	5.74
 62 | ITF	4.31
 63 | IVQ	4.02
 64 | IVY	5.84
 65 | IWH	5.46
 66 | IYP	4.21
 67 | KPF	4.49
 68 | LAA	4.89
 69 | LAP	5.73
 70 | LAY	5.41
 71 | LDP	4.37
 72 | LEE	4
 73 | LEL	4.81
 74 | LEP	5.24
 75 | LGI	4.54
 76 | LGL	4.48
 77 | LIY	6.09
 78 | LKA	5.07
 79 | LKP	6.02
 80 | LKY	6.11
 81 | LLF	4.1
 82 | LLL	4.65
 83 | LLP	4.8
 84 | LPF	4.4
 85 | LPP	5.02
 86 | LRP	6.21
 87 | LQP	5.83
 88 | LQW	5.42
 89 | LSA	5.11
 90 | LSP	5.77
 91 | LTF	5.56
 92 | LVL	5.19
 93 | LVQ	4.85
 94 | LVR	4.85
 95 | LVY	5.74
 96 | LWA	4.9
 97 | LWY	5.3
 98 | LYP	5.18
 99 | MNP	4.18
100 | PFP	4.26
101 | PGI	3.77
102 | PGG	2.86
103 | PGL	4.86
104 | PGP	4.18
105 | PGR	3.33
106 | PIP	4.31
107 | PLW	4.44
108 | PPG	2.82
109 | PPP	4.14
110 | PSY	4.8
111 | PWP	3.66
112 | PYP	3.66
113 | RFH	3.48
114 | RGP	4.27
115 | RPG	2.91
116 | RPP	4.22
117 | RRR	4.23
118 | SVY	5.09
119 | TNP	3.68
120 | VAA	4.89
121 | VAF	4.45
122 | VGP	4.58
123 | VIY	5.12
124 | VLP	4.09
125 | VLY	4.51
126 | VPP	5.05
127 | VQV	5.06
128 | VRP	5.66
129 | VSP	5
130 | VSW	4.63
131 | VTR	3.87
132 | VVF	4.45
133 | VVV	4.37
134 | VWY	5.03
135 | VYP	3.83
136 | YPF	4.4
137 | YPR	4.78
138 | YYY	4.46
139 | AMY	5.26
140 | FAP	5.42
141 | GGY	5.89
142 | HHL	5.27
143 | IKW	6.68
144 | LRY	6.82
145 | MKY	5.14
146 | PRY	5.6
147 | RIY	4.55
148 | TVY	4.82
149 | VAP	5.7
150 | YEY	5.4
151 | 


--------------------------------------------------------------------------------
/Lab5_PeptideSequencesCoding/SchematicOfGeneralApproachInQSAR.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/a2a6f7feffb2b919a1c813fe4fef0c4f726aef92/Lab5_PeptideSequencesCoding/SchematicOfGeneralApproachInQSAR.jpg


--------------------------------------------------------------------------------
/Lab5_PeptideSequencesCoding/sequence_coding2.md:
--------------------------------------------------------------------------------
  1 | # Lab 5: Sequence Representation/Numerical Encoding 2 (Using QSAR Modeling as an Example)
  2 | 
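  3 | ## Objectives
  4 | * 1) Understand the research background of quantitative structure-activity relationship (QSAR) modeling
  5 | * 2) Implement AA531 (531 properties of Amino Acids) feature encoding of peptide sequences in code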
  6 | 
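  7 | ## 1. Quantitative Structure-Activity Relationship
  8 | * Molecules are the basic units of matter, and a molecule's structural properties determine its physiological activity.
  9 | * Using statistical and informatics methods to extract and summarize the information and regularities in molecular structures helps guide experimental work from a theoretical standpoint.
 10 | * Quantitative Structure-Activity Relationship (QSAR) modeling builds on the basic physicochemical properties of molecules and their corresponding physiological activities, and uses mathematical or statistical methods to quantitatively study the interactions between small organic molecules (e.g., inhibitors) and biological macromolecules (e.g., receptors and enzymes).
 11 | * QSAR can be used for efficient screening of bioactive compounds, environmental toxicity assessment of drugs, and the design and synthesis of new drugs [1-2].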
 12 | 
 13 | ![](./SchematicOfGeneralApproachInQSAR.jpg)
 14 | 
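 15 | ## 2. Current Research on ACE Inhibitors
 16 | * Angiotensin-converting enzyme (ACE) inhibitory peptides are peptides isolated from food-derived proteins that have antihypertensive activity. They have attracted wide attention because they lower blood pressure effectively without the toxic side effects of antihypertensive drugs.
 17 | * In recent years, the structure-activity relationship of ACE inhibitory peptides has become a research focus. Structural bioinformatics studies indicate that the ACE inhibitory capacity of these peptides is related not only to their molecular mass but is also highly correlated with their amino acid sequence and three-dimensional conformation.
 18 | * The inhibition type of an ACE inhibitory peptide is also correlated to some extent with its ACE inhibitory activity and structure-activity relationship. In-depth study of these relationships will help guide the development of highly active functional foods and antihypertensive drugs [3].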
 19 | 
 20 | ## 3. Dataset
 21 | * The peptide sequences and activities (lg(IC<sub>50</sub>)) in the ACE_tri-peptides_150 dataset come from reference [4].
 22 | * [ACEtriPeptidesSequencesActivities.txt](./ACEtriPeptidesSequencesActivities.txt)
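
The dataset file stores one tripeptide per line as a sequence and an activity value separated by a tab. As a minimal sketch of that layout, the snippet below parses two lines copied from the file; the helper name `parse_activity_line` is hypothetical and not part of the lab's reference program.

```python
# Two lines copied from ACEtriPeptidesSequencesActivities.txt (tab-separated).
sample_lines = ["AAP\t4.52\n", "ADA\t3.83\n"]


def parse_activity_line(line):
    """Split one 'SEQ<TAB>activity' line into (sequence, activity as float).

    Hypothetical helper for illustration only.
    """
    seq, activity = line.strip().split("\t")
    return seq, float(activity)


records = [parse_activity_line(line) for line in sample_lines]
print(records)
```

In the full file, the same parsing applied to every line would give 150 (sequence, activity) pairs.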
 23 | 
 24 | ## 4. Sequence Encoding Based on [AA531](http://www.genome.jp/aaindex)
 25 | * An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 544 indices.
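 26 | * Some amino acids are missing values for certain physicochemical properties. After preprocessing, 531 physicochemical and biochemical properties qualify (with the latest version of AAindex1, more than 531 properties would remain after preprocessing).
 27 | * For each ACE tripeptide, replacing the amino acid at each position, in order, with its 531 property values yields 531 x 3 = 1593 features.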
 28 | * [AA531properties.txt](./AA531properties.txt)
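
The position-by-position substitution described above can be sketched on a toy scale. The table below is an assumption made purely for illustration: it holds 2 made-up values per amino acid instead of the 531 real ones, and `encode_peptide` is a hypothetical helper name.

```python
# Hypothetical toy table: 2 made-up property values per amino acid.
# The real AA531 table holds 531 values per amino acid.
toy_props = {
    "A": [0.1, 0.2],
    "P": [0.3, 0.4],
    "L": [0.5, 0.6],
}


def encode_peptide(seq, props):
    """Concatenate the property vectors of the residues, in sequence order."""
    features = []
    for aa in seq:
        features.extend(props[aa])
    return features


vec = encode_peptide("AAP", toy_props)
print(vec)
print(len(vec))  # 3 positions x 2 properties
```

With the real table, each residue contributes 531 values, so a tripeptide gives a vector of length 531 x 3 = 1593.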
 29 | 
 30 | ## 5. Preparing the Working Directory
 31 | ```sh
 32 | # Create the lab_05 folder
 33 | $ mkdir lab_05
 34 | $ cd lab_05
 35 | 
 36 | # If python3 is unavailable on the cluster, activate the base environment first
 37 | $ source /opt/miniconda3/bin/activate
 38 | $ conda activate
 39 | 
 40 | # Start Python
 41 | $ python3
 42 | ```
 43 | 
 44 | ## 6. Sequence Encoding
 45 | * Reference program: AA531Coding.py
 46 | ```python3
 47 | import numpy as np
 48 | import sys
 49 | 
 50 | # 1. Build a dictionary from AA531properties.txt
 51 | # (as a function: def makeAA531Dict(filename):)
 52 | AA531FileName = 'AA531properties.txt'
 53 | fr = open(AA531FileName) # open the file
 54 | arrayOLines = fr.readlines() # read all lines
 55 | del(arrayOLines[0]) # drop the header line
 56 | fr.close() # close the file promptly
 57 | 
 58 | AA531Dict = {}
 59 | for line in arrayOLines:
 60 |     line = line.strip()
 61 |     listFromLine = line.split('\t')
 62 |     AA = listFromLine[0]
 63 |     properties = [float(i) for i in listFromLine[1:]] # values read from a file are strings; convert them to float
 64 |     AA531Dict[AA] = properties
 65 | # (as a function: return AA531Dict)
 66 | 
 67 | # 2. Encode the peptide sequences
 68 | # (as a function: def file2matrix(filename, seqLength, AA531Dict):)
 69 | AASeqFileName = 'ACEtriPeptidesSequencesActivities.txt'
 70 | seqLength = 3
 71 | fr = open(AASeqFileName) # open the file
 72 | arrayOLines = fr.readlines() # read all lines
 73 | fr.close() # close the file promptly
 74 | 
 75 | numberOfLines = len(arrayOLines) # number of lines in the file
 76 | returnMat = np.zeros((numberOfLines, 531*seqLength)) # allocate the result matrix
 77 | Y = np.zeros((numberOfLines, 1))
 78 | lineNum = 0
 79 | 
 80 | for line in arrayOLines:
 81 |     line = line.strip() # strip whitespace, including the trailing newline
 82 |     listFromLine = line.split('\t') # split on tabs
 83 |     AASeq = listFromLine[0] # amino acid sequence
 84 |     Y[lineNum] = float(listFromLine[1]) # activity value Y
 85 |     # feature vector of the current sequence
 86 |     feaVec = []
 87 |     for AA in AASeq: # scan the sequence, replacing each amino acid by its 531 properties
 88 |         if AA in AA531Dict: # substitute only when the amino acid is a key of AA531Dict
 89 |             feaVec.extend(AA531Dict[AA])
 90 |         else: # otherwise substitute zeros
 91 |             print('Warning: nonregular amino acid found! Coding "%s" in "%s"(seqId: %d) with 531 zeros.' % (AA, AASeq, lineNum))
 92 |             feaVec.extend([0.0]*531)
 93 |             Y[lineNum] = -1
 94 |     # store the encoded sequence as one row of the matrix
 95 |     returnMat[lineNum,:] = np.array(feaVec)
 96 |     lineNum += 1
 97 | # (as a function: return Y, returnMat, lineNum)
 98 | 
 99 | # 3. Write the result to a file
100 | #if __name__ == '__main__':
101 | #    AASeqFileName = sys.argv[1]
102 | #    AA531FileName = sys.argv[2]
103 | #    seqLength = int(sys.argv[3])
104 | #    outputFileName = sys.argv[4]
105 | #    AA531Dict = makeAA531Dict(AA531FileName)
106 | #    Y, AA531Mat, SeqNum = file2matrix(AASeqFileName, seqLength, AA531Dict)
107 | 
108 | outputFileName = 'result.txt'
109 | np.savetxt(outputFileName, np.hstack((Y, returnMat)), fmt='%g', delimiter='\t')
110 | print('The number of sequences is %d. Matrix of features is saved in %s' % (lineNum, outputFileName))
111 | 
112 | ```
113 | ```sh
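114 | # Optional (not required here): the function-based form of AA531Coding.py takes the ACE-inhibitor tripeptide sequence file, the AA531 property file, the sequence length and the output file name on the command line.
115 | $ python3 AA531Coding.py ACEtriPeptidesSequencesActivities.txt AA531properties.txt 3 result.txt  # not required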
116 | ```
117 | 
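118 | ## Homework
119 | > 1. Write the sequence-encoding program on your own.  
120 | > 2. Convert the amino acid physicochemical properties in the [AAindex](https://www.genome.jp/ftp/db/community/aaindex/aaindex1) file into the format of 'AA531properties.txt' (Python or R are both fine; Python is recommended).  
121 | Don't be afraid of error messages: the more mistakes you make, the faster you improve!  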
122 | 
123 | ## References
124 | [1] 代志军. 特征选择与样本选择用于癌分类与药物构效关系研究(湖南农业大学博士学位论文). 2014. <br>
125 | [2] Nongonierma AB, FitzGerald RJ. Learnings from quantitative structure–activity relationship (QSAR) studies with respect to food protein-derived bioactive peptides: a review. RSC advances. 2016, 6(79): 75400-75413. <br>
126 | [3] 王晓丹, 薛璐, 胡志和, 等. ACE抑制肽构效关系研究进展[J]. 食品科学, 2017, 38(5): 305-310. <br>
127 | [4] 刘静, 彭剑秋, 管骁. 基于多元线性回归的血管紧张素转化酶抑制肽定量构效关系建模研究[J]. 分析科学学报, 2012, 28(1): 16-22. <br>
128 | 
129 | 


--------------------------------------------------------------------------------
/Lab6_Regression_MLR-PLSR-SVR/regress1.md:
--------------------------------------------------------------------------------
  1 | # Lab 6: Regression models (MLR, PLSR, SVR)
  2 | 
  3 | ## Objectives
  4 | * 1) Build training and test sets from the AA531 features of the ACE-inhibitor peptide sequences encoded in [Lab 5](https://github.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/blob/master/Lab5_PeptideSequencesCoding/sequence_coding2.md)
  5 | * 2) Predict ACE-inhibitor activity with Multiple Linear Regression (MLR)
  6 | * 3) Predict ACE-inhibitor activity with Partial Least Squares Regression (PLSR)
  7 | * 4) Predict ACE-inhibitor activity with Support Vector Regression (SVR)
  8 | 
  9 | ## Preparing the working directory
 10 | ```
 11 | $ mkdir lab_06
 12 | $ cd lab_06
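 13 | # Create a symbolic link to the ACE-inhibitor peptide activities and AA531 sequence features (result.txt) in lab_05, named 'ACEtriPeptides_YandAA531.txt'
 14 | $ ln -s ../lab_05/result.txt ACEtriPeptides_YandAA531.txt
 15 | 
 16 | # On the cluster, activate the base environment first if python3 is unavailable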
 17 | $ source /opt/miniconda3/bin/activate
 18 | $ conda activate
 19 | ```
 20 | 
 21 | ## 1. Building the training and test sets
 22 | * Reference program: getTrainTest_regression.py
 23 | ```python3
 24 | import numpy as np
 25 | import sys
 26 | from sklearn.model_selection import train_test_split # splits the data into training and test sets
 27 | 
 28 | DataFileName = sys.argv[1]
 29 | Data = np.loadtxt(DataFileName, delimiter = '\t') # load the data
 30 | X = Data[:,1:]
 31 | Y = Data[:,0]
 32 | 
 33 | testSize = float(sys.argv[2]) # test-set fraction given on the command line
 34 | X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = testSize)
 35 | 
 36 | trainingSetFileName = sys.argv[3]
 37 | testSetFileName = sys.argv[4]
 38 | np.savetxt(trainingSetFileName, np.hstack((y_train.reshape(-1,1), X_train)), fmt='%g', delimiter='\t') # stack Y and X column-wise, then save to file
 39 | np.savetxt(testSetFileName, np.hstack((y_test.reshape(-1,1), X_test)), fmt='%g', delimiter='\t')
 40 | print('Generate training set(%g%%) and test set(%g%%): Done!' % ((1-testSize)*100, testSize*100))
 41 | ```
 42 | 
 43 | ```bash
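 44 | # Build the training and test sets: specify the encoded data file, the test-set fraction, the train file name and the test file name on the command line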
 45 | $ python3 getTrainTest_regression.py ACEtriPeptides_YandAA531.txt 0.2 ACE_train.txt ACE_test.txt
 46 | ```
 47 | 
 48 | ## 2. Predicting ACE-inhibitor activity with MLR
 49 | * Reference program: myMLR.py
 50 | ```python3
 51 | import numpy as np
 52 | from sklearn import linear_model # linear models, including MLR
 53 | import sys
 54 | import matplotlib.pyplot as plt
 55 | from sklearn import preprocessing
 56 | 
 57 | train = np.loadtxt(sys.argv[1], delimiter='\t') # load the training set
 58 | test = np.loadtxt(sys.argv[2], delimiter='\t') # load the test set
 59 | isNormalizeX = bool(int(sys.argv[3])) # whether to standardize each feature (0 or 1)
 60 | modelName = 'MLR'
 61 | 
 62 | trX = train[:,1:]
 63 | trY = train[:,0]
 64 | teX = test[:,1:]
 65 | teY = test[:,0]
 66 | 
 67 | if isNormalizeX:
 68 |     scaler = preprocessing.StandardScaler()
 69 |     trX = scaler.fit_transform(trX)
 70 |     teX = scaler.transform(teX)
 71 | 
 72 | reg = linear_model.LinearRegression() # create an MLR instance
 73 | reg.fit(trX, trY) # train the model
 74 | predY = reg.predict(teX) # predict the test set
 75 | 
 76 | R2 = 1- sum((teY - predY) ** 2) / sum((teY - teY.mean()) ** 2)
 77 | RMSE = np.sqrt(sum((teY - predY) ** 2)/len(teY))
 78 | print('Predicted R2(coefficient of determination) of %s: %g' % (modelName, R2))
 79 | print('Predicted RMSE(root mean squared error) of %s: %g' % (modelName, RMSE))
 80 | 
 81 | # Plot outputs
 82 | plotFileName = sys.argv[4]
 83 | plt.scatter(teY, predY,  color='black') # scatter plot of observed vs. predicted Y on the test set
 84 | parameter = np.polyfit(teY, predY, 1) # fit a straight line through the points
 85 | f = np.poly1d(parameter)
 86 | plt.plot(teY, f(teY), color='blue', linewidth=3)
 87 | plt.xlabel('Observed Y')
 88 | plt.ylabel('Predicted Y')
 89 | plt.title('Prediction performance using %s' % modelName)
 90 | r2text = 'Predicted R2: %g' % R2
 91 | textPosX = min(teY) + 0.2*(max(teY)-min(teY))
 92 | textPosY = max(predY) - 0.2*(max(predY)-min(predY))
 93 | plt.text(textPosX, textPosY, r2text, bbox=dict(edgecolor='red', fill=False, alpha=0.5))
 94 | plt.savefig(plotFileName)
 95 | ```
 96 | 
 97 | ```bash
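 98 | # MLR model: specify the training set, the test set, whether to standardize the features, and the plot file name on the command line
 99 | $ python3 myMLR.py ACE_train.txt ACE_test.txt 0 ObsdYvsPredY_MLR.pdf
100 | # Try again with standardized data
101 | $ python3 myMLR.py ACE_train.txt ACE_test.txt 1 ObsdYvsPredY_MLR1.pdf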
102 | ```
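
The two scores printed by the reference programs can be checked by hand on a few numbers. The block below is a minimal standard-library sketch of the same formulas (R2 = 1 - SS_res / SS_tot, RMSE = sqrt(SS_res / n)); the helper name `r2_and_rmse` and the four data points are made up for illustration.

```python
import math

def r2_and_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
r2, rmse = r2_and_rmse(obs, pred)
print('R2 = %g, RMSE = %g' % (r2, rmse))
```

With these numbers SS_res = 0.5 and SS_tot = 5, so R2 is 0.9. A model that always predicts the mean of the observed values scores R2 = 0, and a worse model scores below zero, which is why a test-set R2 can be negative.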
103 | 
104 | ## 3. Predicting ACE-inhibitor activity with PLSR
105 | * Reference program: myPLSR.py
106 | ```python3
107 | import numpy as np
108 | from sklearn.cross_decomposition import PLSRegression # PLSR estimator
109 | import sys
110 | import matplotlib.pyplot as plt
111 | from sklearn.model_selection import cross_val_predict # cross-validated predictions
112 | 
113 | def optimise_pls_cv(X, y, nCompMax): # cross-validated MSE for each number of latent variables
114 |     MSEVec = []
115 |     nCompVec = np.arange(1, nCompMax + 1) # candidate counts 1..nCompMax (upper limit included)
116 |     for n_comp in nCompVec:
117 |         pls = PLSRegression(n_components=n_comp) # create a PLSR instance
118 |         y_cv = cross_val_predict(pls, X, y, cv=10) # predictions from 10-fold cross-validation
119 |         mse = sum((y - y_cv.ravel()) ** 2)/len(y) # compute the MSE. NOTE: y_cv may be 2-D, so flatten it to a vector
120 |         MSEVec.append(mse)
121 |     bestNComp = np.argmin(MSEVec) # index of the smallest MSE; the best number of latent variables is this index + 1
122 |     
123 |     with plt.style.context('ggplot'): # plot MSE (y-axis) against the number of latent variables (x-axis)
124 |         plt.plot(nCompVec, np.array(MSEVec), '-v', color='blue', mfc='blue') # line plot with markers
125 |         plt.plot(nCompVec[bestNComp], np.array(MSEVec)[bestNComp], 'P', ms=10, mfc='red') # mark the point with the smallest MSE
126 |         plt.xlabel('Number of PLS components')
127 |         plt.xticks(nCompVec)
128 |         plt.ylabel('MSE')
129 |         plt.title('Optimise the number of PLS components')
130 |         plt.savefig('optimizePLSComponents.pdf')
131 |         
132 |     return bestNComp
133 | 
134 | if __name__ == '__main__':
135 |     train = np.loadtxt(sys.argv[1], delimiter='\t') # load the training set
136 |     test = np.loadtxt(sys.argv[2], delimiter='\t') # load the test set
137 |     nCompMax = int(sys.argv[3]) # upper limit on the number of latent variables
138 |     modelName = 'PLSR'
139 | 
140 |     trX = train[:,1:]
141 |     trY = train[:,0]
142 |     bestNComp = optimise_pls_cv(trX, trY, nCompMax)+1 # best number of latent variables (returned index + 1); the idea is dimensionality reduction
143 |     print('The best number of PLS components: %d' % bestNComp)
144 |     reg = PLSRegression(n_components = bestNComp) # create a PLSR instance
145 |     reg.fit(trX, trY) # train the model
146 | 
147 |     teX = test[:,1:]
148 |     teY = test[:,0]
149 |     predY = reg.predict(teX) # predict the test set
150 |     predY = predY.ravel() # NOTE: predY may be 2-D, so flatten it to a vector
151 | 
152 |     R2 = 1- sum((teY - predY) ** 2) / sum((teY - teY.mean()) ** 2)
153 |     RMSE = np.sqrt(sum((teY - predY) ** 2)/len(teY))
154 |     print('Predicted R2(coefficient of determination) of %s: %g' % (modelName, R2))
155 |     print('Predicted RMSE(root mean squared error) of %s: %g' % (modelName, RMSE))
156 | 
157 |     # Plot outputs
158 |     plotFileName = sys.argv[4]
159 |     plt.figure()
160 |     plt.scatter(teY, predY,  color='black') # scatter plot of observed vs. predicted Y on the test set
161 |     parameter = np.polyfit(teY, predY, 1) # fit a straight line through the points
162 |     f = np.poly1d(parameter)
163 |     plt.plot(teY, f(teY), color='blue', linewidth=3)
164 |     plt.xlabel('Observed Y')
165 |     plt.ylabel('Predicted Y')
166 |     plt.title('Prediction performance using %s' % modelName)
167 |     r2text = 'Predicted R2: %g' % R2
168 |     textPosX = min(teY) + 0.2*(max(teY)-min(teY))
169 |     textPosY = max(predY) - 0.2*(max(predY)-min(predY))
170 |     plt.text(textPosX, textPosY, r2text, bbox=dict(edgecolor='red', fill=False, alpha=0.5))
171 |     plt.savefig(plotFileName)
172 | ```
173 | 
174 | ```bash
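175 | # PLSR model: specify the training set, the test set, the maximum number of latent variables (e.g., 20), and the plot file name on the command line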
176 | $ python3 myPLSR.py ACE_train.txt ACE_test.txt 20 ObsdYvsPredY_PLSR.pdf
177 | ```
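
The `+1` applied to the return value of `optimise_pls_cv` converts a 0-based position in the MSE list into a component count. The sketch below shows that conversion with the standard library only; `best_n_components` is a hypothetical helper and the MSE values are made up.

```python
def best_n_components(mse_by_count):
    """mse_by_count[i] is the cross-validated MSE obtained with i + 1 components."""
    best_index = min(range(len(mse_by_count)), key=lambda i: mse_by_count[i])
    return best_index + 1  # convert the 0-based index into a component count

mse = [0.90, 0.62, 0.55, 0.58, 0.71]  # made-up values for 1..5 components
print(best_n_components(mse))
```

The smallest value (0.55) sits at index 2, so the sketch reports 3 components.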
178 | 
179 | ## 4. Predicting ACE-inhibitor activity with SVR
180 | ```sh
181 | # Install the pytictoc timing package
182 | $ pip3 install --user pytictoc -i https://pypi.tuna.tsinghua.edu.cn/simple
183 | ```
184 | * Reference program: mySVR.py
185 | ```python3
186 | import numpy as np
187 | from sklearn import svm # support vector machines
188 | import sys
189 | from sklearn import preprocessing # data preprocessing
190 | from sklearn.model_selection import GridSearchCV # grid search for parameter tuning
191 | import matplotlib.pyplot as plt
192 | from random import sample
193 | from pytictoc import TicToc
194 | 
195 | def optimise_svm_cv(X, y, kernelFunction, numOfFolds):
196 |     C_range = np.power(2, np.arange(-1, 6, 1.0)) # range of C
197 |     gamma_range = np.power(2, np.arange(0, -8, -1.0)) # range of gamma (g)
198 |     epsilon_range = np.power(2, np.arange(-8, -1, 1.0)) # range of epsilon (p)
199 |     parameters = dict(gamma=gamma_range, C=C_range, epsilon=epsilon_range) # dictionary of C, gamma, epsilon for the grid search
200 |     
201 |     reg = svm.SVR(kernel=kernelFunction) # create an SVR instance
202 |     grid = GridSearchCV(reg, param_grid=parameters, cv=numOfFolds) # create a GridSearchCV instance
203 |     grid.fit(X, y) # grid search over C, gamma, epsilon
204 |     print("The best parameters are %s with a score of %g" % (grid.best_params_, grid.best_score_))
205 |     return grid
206 | 
207 | def do_plot(teY, predY, modelName, plotFileName):
208 |     R2 = 1- sum((teY - predY) ** 2) / sum((teY - teY.mean()) ** 2)
209 |     plt.figure()
210 |     plt.scatter(teY, predY,  color='black') # scatter plot of observed vs. predicted Y on the test set
211 |     parameter = np.polyfit(teY, predY, 1) # fit a straight line through the points
212 |     f = np.poly1d(parameter)
213 |     plt.plot(teY, f(teY), color='blue', linewidth=3)
214 |     plt.xlabel('Observed Y')
215 |     plt.ylabel('Predicted Y')
216 |     plt.title('Prediction performance using %s' % modelName)
217 |     r2text = 'Predicted R2: %g' % R2
218 |     textPosX = min(teY) + 0.2*(max(teY)-min(teY))
219 |     textPosY = max(predY) - 0.2*(max(predY)-min(predY))
220 |     plt.text(textPosX, textPosY, r2text, bbox=dict(edgecolor='red', fill=False, alpha=0.5))
221 |     plt.savefig(plotFileName)
222 |     print('Plot of prediction performance is saved to %s' % plotFileName)
223 |     
224 | if __name__ == '__main__':
225 |     train = np.loadtxt(sys.argv[1], delimiter='\t') # load the training set
226 |     test = np.loadtxt(sys.argv[2], delimiter='\t') # load the test set
227 |     modelName = 'SVR'
228 | 
229 |     trY = train[:,0]
230 |     trX = train[:,1:]
231 |     teX = test[:,1:]
232 |     teY = test[:,0]
233 |     numX = trX.shape[1]
234 |     if numX > 200:
235 |         randVec = np.array(sample(range(numX), 100)) + 1 # SVM is slow with many features, so 100 features are sampled at random (+1 skips the Y column)
236 |         print('Note: 100 features are randomly selected to speed up modeling')
237 |         trX = train[:,randVec]
238 |         teX = test[:,randVec]
239 |     
240 |     isScale = int(sys.argv[3]) # whether to scale each feature to [-1,1] before modeling
241 |     kernelFunction = sys.argv[4] # {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf'
242 |     numOfFolds = int(sys.argv[5]) # number of CV folds; a value > 2 triggers the search for the best C, gamma, epsilon
243 | 
244 |     if isScale:
245 |         min_max_scaler = preprocessing.MinMaxScaler(feature_range=(-1,1))
246 |         trX = min_max_scaler.fit_transform(trX)
247 |         teX = min_max_scaler.transform(teX)
248 |     
249 |     t = TicToc() # create a TicToc instance
250 |     t.tic()
251 |     if numOfFolds > 2: # with more than 2 folds, tune the parameters
252 |         grid = optimise_svm_cv(trX, trY, kernelFunction, numOfFolds)
253 |         print('Time cost in optimising c-g-p: %gs' % t.tocvalue(restart=True))
254 |         bestC = grid.best_params_['C']
255 |         bestGamma = grid.best_params_['gamma']
256 |         bestEpsilon = grid.best_params_['epsilon']
257 |         reg = svm.SVR(kernel=kernelFunction, C=bestC, gamma=bestGamma, epsilon=bestEpsilon)
258 |     else: # otherwise skip tuning and use the SVR default parameters
259 |         reg = svm.SVR(kernel=kernelFunction)
260 |         
261 |     reg.fit(trX, trY) # train the model
262 |     print('Time cost in building model: %gs' % t.tocvalue(restart=True))
263 |     predY = reg.predict(teX) # predict the test set
264 |     print('Time cost in predicting Y of test set: %gs\n' % t.tocvalue(restart=True))
265 | 
266 |     R2 = 1- sum((teY - predY) ** 2) / sum((teY - teY.mean()) ** 2)
267 |     RMSE = np.sqrt(sum((teY - predY) ** 2)/len(teY))
268 |     print('Predicted R2(coefficient of determination) of %s: %g' % (modelName, R2))
269 |     print('Predicted RMSE(root mean squared error) of %s: %g' % (modelName, RMSE))
270 | 
271 |     # Plot outputs
272 |     if len(sys.argv) > 6:
273 |         plotFileName = sys.argv[6]
274 |         do_plot(teY, predY, modelName, plotFileName)
275 |     
276 | ```
277 | 
278 | * SVM takes a long time to run, so write the commands into a script and submit it as a job with qsub
279 | * work_mySVR.sh
280 | ```bash
281 | #!/bin/bash
282 | #$ -S /bin/bash
283 | #$ -N mySVR
284 | #$ -j y
285 | #$ -cwd
286 | 
287 | # Activate the base environment so that python3 works on the compute nodes
288 | source /opt/miniconda3/bin/activate
289 | conda activate
290 | 
291 | # SVR: training set, test set, scaling, linear kernel, 10-fold cross-validated parameter search, plot file name
292 | echo '------ scale: 1; kernel: linear; numOfCV: 10 --------'
293 | python3 mySVR.py ACE_train.txt ACE_test.txt 1 linear 10 ObsdYvsPredY_SVR_1-linear-cv10.pdf
294 | echo
295 | 
296 | # Scaling, linear kernel, no parameter search
297 | echo '------ scale: 1; kernel: linear; numOfCV: 0 --------'
298 | python3 mySVR.py ACE_train.txt ACE_test.txt 1 linear 0 ObsdYvsPredY_SVR_1-linear-noCV.pdf
299 | echo
300 | 
301 | # Scaling, radial basis function (rbf) kernel, 10-fold cross-validated parameter search
302 | echo '------ scale: 1; kernel: rbf; numOfCV: 10 --------'
303 | python3 mySVR.py ACE_train.txt ACE_test.txt 1 rbf 10 ObsdYvsPredY_SVR_1-rbf-cv10.pdf
304 | echo
305 | 
306 | # Scaling, radial basis function (rbf) kernel, no parameter search
307 | echo '------ scale: 1; kernel: rbf; numOfCV: 0 --------'
308 | python3 mySVR.py ACE_train.txt ACE_test.txt 1 rbf 0 ObsdYvsPredY_SVR_1-rbf-noCV.pdf
309 | echo
310 | ```
311 | ```
312 | # Submit the job with qsub
313 | $ qsub work_mySVR.sh
314 | ```
315 | 
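316 | * Try more option combinations and watch how the accuracy changes, e.g., unscaled data, different kernel functions, with or without parameter search, and different numbers of cross-validation folds.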
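
Much of the SVR running time comes from the size of the parameter grid. The sketch below rebuilds the three ranges used in `optimise_svm_cv` with plain Python lists (instead of NumPy) and counts the models a 10-fold search has to fit; the final refit on the full training set is not counted.

```python
from itertools import product

C_range = [2.0 ** k for k in range(-1, 6)]          # 2^-1 ... 2^5  (7 values)
gamma_range = [2.0 ** k for k in range(0, -8, -1)]  # 2^0 ... 2^-7  (8 values)
epsilon_range = [2.0 ** k for k in range(-8, -1)]   # 2^-8 ... 2^-2 (7 values)

grid = list(product(C_range, gamma_range, epsilon_range))
n_folds = 10
n_fits = len(grid) * n_folds  # one model per parameter combination and fold
print(len(grid), n_fits)
```

This gives 392 parameter combinations and 3920 fits. For the linear kernel, scikit-learn's SVR does not use `gamma`, so the 8 gamma values would only repeat equivalent fits; restricting the grid to C and epsilon in that case is one possible way to save time.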
317 | 
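318 | ## Homework
319 | 1. Try to understand every line of the `reference programs`. <br>
320 | 2. Become proficient with the three regression models from sklearn used in this lab. <br>
321 | 3. Try changing the test-set fraction (e.g., 0.4 or 0.1), observe how each model's prediction accuracy changes, and summarize the results of every model at the different fractions in a table.  
322 | * (MLR should include runs with and without standardization; for PLSR the upper limit of latent variables can be set to 10 or 20; SVR can include results without scaling.)  
323 | Don't be afraid of error messages: the more mistakes you make, the faster you improve!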
324 | 
325 | ## References
326 | * MLR guide: [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares)
327 | * PLSR guide: [sklearn.cross_decomposition.PLSRegression](https://scikit-learn.org/stable/modules/cross_decomposition.html)
328 | * SVR guide: [sklearn.svm.SVR](https://scikit-learn.org/stable/modules/svm.html#regression)
329 | 


--------------------------------------------------------------------------------
/Lab7_FeatureReduction/dimReduction.md:
--------------------------------------------------------------------------------
  1 | # Lab 7: Feature reduction and selection (PCA, MI-filter, SVM-RFE, RF)
  2 | 
  3 | ## Objectives
  4 | * 1) Data: the ACE-inhibitor training and test sets from [Lab 6](https://github.com/ZhijunBioinf/Pattern-Recognition-and-Prediction/blob/master/Lab6_Regression_MLR-PLSR-SVR/regress1.md); base model: SVR
  5 | * 2) Compress the features with Principal Component Analysis (PCA), then model and predict with SVR, and compare with the predictions from `Lab 6`.
  6 | * 3) Filter: select features with a univariate Mutual Information-based Filter, then proceed as in (2).
  7 | * 4) Wrapper: select features with SVM Recursive Feature Elimination (SVM-RFE), then proceed as in (2).
  8 | * 5) Embedded: select features with Random Forest (RF) for splice-site recognition, then model and predict with SVC.
  9 | 
 10 | ## Preparing the working directory and data
 11 | ```bash
 12 | $ mkdir lab_07
 13 | $ cd lab_07
 14 | # Create symbolic links to the ACE-inhibitor training and test sets in lab_06
 15 | $ ln -s ../lab_06/ACE_train.txt
 16 | $ ln -s ../lab_06/ACE_test.txt
 17 | 
 18 | # Activate the base environment first if python3 is unavailable
 19 | $ source /opt/miniconda3/bin/activate
 20 | $ conda activate
 21 | ```
 22 | 
 23 | ## 1. Compressing features with PCA, then building an SVR model on the retained principal components
 24 | * Reference program: myPCA.py
 25 | ```python3
 26 | import sys
 27 | import numpy as np
 28 | from sklearn.decomposition import PCA # PCA transformer
 29 | 
 30 | trainFile = sys.argv[1]
 31 | testFile = sys.argv[2]
 32 | train = np.loadtxt(trainFile, delimiter='\t') # load the training set
 33 | test = np.loadtxt(testFile, delimiter='\t') # load the test set
 34 | trX = train[:,1:]
 35 | trY = train[:,0]
 36 | teX = test[:,1:]
 37 | teY = test[:,0]
 38 | 
 39 | percentVar = float(sys.argv[3]) # fraction of variance the retained components must explain cumulatively (e.g., 0.95)
 40 | pca = PCA(n_components = percentVar, svd_solver = 'full') # create a PCA instance
 41 | trX = pca.fit_transform(trX)
 42 | teX = pca.transform(teX)
 43 | 
 44 | print('Number of principal components: %d' % trX.shape[1])
 45 | newTrainFile = sys.argv[4]
 46 | newTestFile = sys.argv[5]
 47 | np.savetxt(newTrainFile, np.hstack((trY.reshape(-1,1), trX)), fmt='%g', delimiter='\t') # stack Y and X column-wise, then save to file
 48 | np.savetxt(newTestFile, np.hstack((teY.reshape(-1,1), teX)), fmt='%g', delimiter='\t')
 49 | print('New training set is saved into: %s\nNew test set is saved into: %s' % (newTrainFile, newTestFile))
 50 | ```
 51 | 
 52 | ```bash
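 53 | # Keep enough principal components to explain 95% of the variance cumulatively
 54 | $ python3 myPCA.py ACE_train.txt ACE_test.txt 0.95 train_pca.txt test_pca.txt
 55 | # Model and predict with SVR: scaling, rbf kernel, 10-fold cross-validated parameter search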
 56 | $ python3 ../lab_06/mySVR.py train_pca.txt test_pca.txt 1 rbf 10 ObsdYvsPredY_PCA_SVR.pdf
 57 | ```
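
Passing a fraction such as 0.95 as `n_components` means "keep as many leading components as needed to explain that share of the variance". The selection rule can be sketched without NumPy as below; `n_components_for` is a hypothetical helper, the ratios are made up, and the handling of exact ties at the threshold is an assumption rather than a statement about scikit-learn's internals.

```python
def n_components_for(variance_ratios, target):
    """Smallest number of leading components whose cumulative ratio exceeds target."""
    total = 0.0
    for count, ratio in enumerate(variance_ratios, start=1):
        total += ratio
        if total > target:
            return count
    return len(variance_ratios)  # target not reached: keep everything

ratios = [0.50, 0.25, 0.125, 0.0625, 0.0625]  # made-up explained-variance ratios
print(n_components_for(ratios, 0.90))
```

The cumulative sums are 0.5, 0.75, 0.875 and 0.9375, so four components are needed to pass 0.90.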
 58 | 
 59 | ## 2. Selecting features with the MI-filter, then building an SVR model on the retained variables
 60 | * Reference program: myMIFilter.py
 61 | ```python3
 62 | import sys
 63 | import numpy as np
 64 | from sklearn.feature_selection import SelectPercentile, mutual_info_regression # percentile selector and mutual information for regression
 65 | 
 66 | trainFile = sys.argv[1]
 67 | testFile = sys.argv[2]
 68 | train = np.loadtxt(trainFile, delimiter='\t') # load the training set
 69 | test = np.loadtxt(testFile, delimiter='\t') # load the test set
 70 | trX = train[:,1:]
 71 | trY = train[:,0]
 72 | teX = test[:,1:]
 73 | teY = test[:,0]
 74 | 
 75 | percentile = int(sys.argv[3]) # Percent of features to keep
 76 | selector = SelectPercentile(mutual_info_regression, percentile=percentile) # create an MI-based SelectPercentile instance
 77 | trX = selector.fit_transform(trX, trY)
 78 | teX = selector.transform(teX)
 79 | 
 80 | newTrainFile = sys.argv[4]
 81 | newTestFile = sys.argv[5]
 82 | np.savetxt(newTrainFile, np.hstack((trY.reshape(-1,1), trX)), fmt='%g', delimiter='\t') # stack Y and X column-wise, then save to file
 83 | np.savetxt(newTestFile, np.hstack((teY.reshape(-1,1), teX)), fmt='%g', delimiter='\t')
 84 | print('%d features are selected.' % trX.shape[1])
 85 | print('New training set is saved into: %s\nNew test set is saved into: %s' % (newTrainFile, newTestFile))
 86 | ```
 87 | 
 88 | ```bash
 89 | # Keep 5% of the features
 90 | $ python3 myMIFilter.py ACE_train.txt ACE_test.txt 5 train_mi.txt test_mi.txt
 91 | # Model and predict with SVR: scaling, rbf kernel, 10-fold cross-validated parameter search
 92 | $ python3 ../lab_06/mySVR.py train_mi.txt test_mi.txt 1 rbf 10 ObsdYvsPredY_MI_SVR.pdf
 93 | ```
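
A univariate filter scores every feature on its own and keeps the best-scoring share. The sketch below is a simplified standard-library stand-in for that idea, not scikit-learn's implementation: `top_percent` is a hypothetical helper, the scores are made up, and the way ties and rounding are handled here is an assumption.

```python
def top_percent(scores, percent):
    """Return the indices of the highest-scoring features, in ascending order."""
    k = int(len(scores) * percent / 100)  # number of features to keep (rounded down)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

scores = [0.02, 0.40, 0.00, 0.15, 0.33, 0.01, 0.09, 0.27, 0.05, 0.11]  # made-up MI scores
print(top_percent(scores, 30))
```

With 10 features and 30%, the three highest scores (0.40, 0.33, 0.27) are kept, i.e. indices 1, 4 and 7. Under the same rounding rule, 5% of 1593 features would be 79 features; the count reported by the reference program may differ slightly.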
 94 | 
 95 | ## 3. Selecting features with SVM-RFE, then building an SVR model on the retained variables
 96 | * Reference program: mySVMRFE.py
 97 | ```python3
 98 | import sys
 99 | import numpy as np
100 | from sklearn.feature_selection import RFE # recursive feature elimination
101 | from sklearn.svm import SVR # support vector regression
102 | from pytictoc import TicToc
103 | from random import sample
104 | 
105 | trainFile = sys.argv[1]
106 | testFile = sys.argv[2]
107 | train = np.loadtxt(trainFile, delimiter='\t') # load the training set
108 | test = np.loadtxt(testFile, delimiter='\t') # load the test set
109 | trX = train[:,1:]
110 | trY = train[:,0]
111 | teX = test[:,1:]
112 | teY = test[:,0]
113 | 
114 | numX = trX.shape[1]
115 | if numX > 200:
116 |     randVec = np.array(sample(range(numX), 100)) + 1 # SVM is slow with many features, so 100 features are sampled at random (+1 skips the Y column)
117 |     print('Note: 100 features are randomly selected to speed up modeling')
118 |     trX = train[:,randVec]
119 |     teX = test[:,randVec]
120 |     
121 | n_features = int(sys.argv[3])
122 | estimator = SVR(kernel="linear")
123 | t = TicToc()
124 | t.tic()
125 | selector = RFE(estimator, n_features_to_select=n_features, step=1) # create an RFE instance that removes 1 feature per iteration
126 | trX = selector.fit_transform(trX, trY)
127 | teX = selector.transform(teX)
128 | print('Time cost in selecting features with SVM-RFE: %gs' % t.tocvalue())
129 | 
130 | newTrainFile = sys.argv[4]
131 | newTestFile = sys.argv[5]
132 | np.savetxt(newTrainFile, np.hstack((trY.reshape(-1,1), trX)), fmt='%g', delimiter='\t') # stack Y and X column-wise, then save to file
133 | np.savetxt(newTestFile, np.hstack((teY.reshape(-1,1), teX)), fmt='%g', delimiter='\t')
134 | print('%d features are selected.' % trX.shape[1])
135 | print('New training set is saved into: %s\nNew test set is saved into: %s' % (newTrainFile, newTestFile))
136 | ```
137 | 
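138 | * SVM-RFE takes a long time to run, so write the command into a script and submit it as a job with qsub. Note: the script must also activate the base environment before python3 can be used
139 | * work_mySVMRFE.sh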
140 | ```bash
141 | #!/bin/bash
142 | #$ -S /bin/bash
143 | #$ -N mySVMRFE
144 | #$ -j y
145 | #$ -cwd
146 | 
147 | # Activate the base environment so that python3 works on the compute nodes
148 | source /opt/miniconda3/bin/activate
149 | conda activate
150 | 
151 | # Keep 10 features
152 | python3 mySVMRFE.py ACE_train.txt ACE_test.txt 10 train_svmrfe.txt test_svmrfe.txt
153 | ```
154 | ```bash
155 | # Submit the job
156 | $ qsub work_mySVMRFE.sh
157 | ```
158 | ```bash
159 | # Model and predict with SVR: scaling, rbf kernel, 10-fold cross-validated parameter search
160 | $ python3 ../lab_06/mySVR.py train_svmrfe.txt test_svmrfe.txt 1 rbf 10 ObsdYvsPredY_RFE_SVR.pdf
161 | ```
162 | 
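163 | ## 4. Selecting features with Random Forest, then building an SVC model on the retained variables
164 | ```bash
165 | # RandomForestClassifier is used here, which needs a classification task, so link the splice-site training and test sets from lab_03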
166 | $ ln -s ../lab_03/EI_train.txt
167 | $ ln -s ../lab_03/EI_test.txt
168 | ```
169 | 
170 | * Reference program: myRandomForest.py
171 | ```python3
172 | import sys
173 | import numpy as np
174 | from sklearn.feature_selection import SelectFromModel # model-based feature selector
175 | from sklearn.ensemble import RandomForestClassifier # random forest classifier
176 | from pytictoc import TicToc
177 | 
178 | trainFile = sys.argv[1]
179 | testFile = sys.argv[2]
180 | train = np.loadtxt(trainFile, delimiter=',') # load the training set
181 | test = np.loadtxt(testFile, delimiter=',') # load the test set
182 | trX = train[:,1:]
183 | trY = train[:,0]
184 | teX = test[:,1:]
185 | teY = test[:,0]
186 | 
187 | n_features = int(sys.argv[3]) if sys.argv[3].isdigit() else sys.argv[3] # max_features: an integer, or a string such as 'sqrt' or 'log2'
188 | t = TicToc()
189 | t.tic()
190 | clf = RandomForestClassifier(max_depth=2, random_state=0, max_features=n_features) # create an RF instance
191 | clf = clf.fit(trX, trY)
192 | selector = SelectFromModel(clf, prefit=True) # create a SelectFromModel instance
193 | trX = selector.transform(trX)
194 | teX = selector.transform(teX)
195 | print('Time cost in selecting features with Random-Forest: %gs' % t.tocvalue())
196 | 
197 | newTrainFile = sys.argv[4]
198 | newTestFile = sys.argv[5]
199 | np.savetxt(newTrainFile, np.hstack((trY.reshape(-1,1), trX)), fmt='%g', delimiter=',') # stack Y and X column-wise, then save to file
200 | np.savetxt(newTestFile, np.hstack((teY.reshape(-1,1), teX)), fmt='%g', delimiter=',')
201 | print('%d features are selected.' % trX.shape[1])
202 | print('New training set is saved into: %s\nNew test set is saved into: %s' % (newTrainFile, newTestFile))
203 | ```
204 | 
205 | ```bash
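206 | # max_features (number of features considered when looking for the best split): set to 'sqrt' ('auto' has been removed from recent scikit-learn releases)
207 | $ python3 myRandomForest.py EI_train.txt EI_test.txt sqrt EI_train_rf.txt EI_test_rf.txt
208 | # Model and predict with SVC: scaling, rbf kernel, 10-fold cross-validated parameter search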
209 | $ python3 ../lab_04/mySVC.py EI_train_rf.txt EI_test_rf.txt 1 rbf 1 10
210 | ```
211 | 
212 | 
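213 | ## Homework
214 | 1. Try to understand every line of the `reference programs`. <br>
215 | 2. Become proficient with the different feature reduction and selection methods in sklearn. <br>
216 | 3. Combine different feature selection methods with different regression models (or classifiers) to model and predict the ACE inhibitory peptide data (Lab 5) and the splice-site data (Lab 3).  
217 | Don't be afraid of error messages: the more mistakes you make, the faster you improve!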
218 | 
219 | ## References
220 | * PCA guide: [sklearn.decomposition.PCA](https://scikit-learn.org/stable/modules/decomposition.html#principal-component-analysis-pca)
221 | * MI guide: [sklearn.feature_selection.mutual_info_regression](https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection)
222 | * RFE guide: [sklearn.feature_selection.RFE](https://scikit-learn.org/stable/modules/feature_selection.html#recursive-feature-elimination)
223 | * RandomForest: [sklearn.ensemble.RandomForestClassifier](https://scikit-learn.org/stable/modules/feature_selection.html#tree-based-feature-selection)
224 | 


--------------------------------------------------------------------------------
/Lab8_UnsupervisedLearning/Clustering.md:
--------------------------------------------------------------------------------
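  1 | # Lab 8: Unsupervised learning: cluster analysis (K-Means, Hierarchical clustering)
  2 | 
  3 | ## Objectives
  4 | * 1) Perform cluster analysis with K-means.
  5 | * 2) Perform cluster analysis with Hierarchical clustering.
  6 | * 3) Understand how each clustering method behaves under different parameters.
  7 | 
  8 | ## Preparing the working directory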
  9 | ```bash
 10 | $ mkdir lab_08
 11 | $ cd lab_08
 12 | 
 13 | # Activate the base environment first if python3 is unavailable
 14 | $ source /opt/miniconda3/bin/activate
 15 | $ conda activate
 16 | ```
 17 | 
 18 | ## Background
 19 | * [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning): is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. In contrast to supervised learning, which usually makes use of human-labeled data, unsupervised learning, also known as self-organization, allows for modeling of probability densities over inputs. It forms one of the three main categories of machine learning, along with supervised and reinforcement learning. Semi-supervised learning, a related variant, makes use of supervised and unsupervised techniques.
 20 | * [Cluster analysis](https://en.wikipedia.org/wiki/Cluster_analysis): is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
 21 | * Classic clustering models include, but are not limited to:
 22 | > * Centroid models: [K-means](https://en.wikipedia.org/wiki/K-means_clustering), k-means++, [Mean Shift](https://scikit-learn.org/stable/modules/clustering.html#mean-shift), etc.
 23 | > * Connectivity models: [hierarchical clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering), [BIRCH(Balanced Iterative Reducing and Clustering using Hierarchies)](https://en.wikipedia.org/wiki/BIRCH), etc.
 24 | > * Density models: [DBSCAN(Density-Based Spatial Clustering of Applications with Noise)](https://en.wikipedia.org/wiki/DBSCAN), [OPTICS(Ordering Points To Identify the Clustering Structure)](https://en.wikipedia.org/wiki/OPTICS), etc.
 25 | 
 26 | ## 1. Clustering `handwritten digits` with K-means
 27 | * Reference program: myKMeansDigits.py
 28 | ```python3
 29 | import sys
 30 | from time import time # timer function
 31 | import numpy as np
 32 | import matplotlib.pyplot as plt
 33 | from sklearn import metrics # clustering quality metrics
 34 | from sklearn.cluster import KMeans # KMeans estimator
 35 | from sklearn.datasets import load_digits # size: 1797x64, 10 classes; each sample is an 8x8 image of one handwritten digit
 36 | from sklearn.decomposition import PCA
 37 | from sklearn.preprocessing import scale
 38 | 
 39 | """
 40 | Cluster quality metrics evaluated (see :ref:`clustering_evaluation` for
 41 | definitions and discussions of the metrics):
 42 | Shorthand    full name
 43 | =========== ========================================================
 44 | homo         homogeneity score
 45 | compl        completeness score
 46 | ARI          adjusted Rand index
 47 | =========== ========================================================
 48 | """
 49 | 
 50 | def bench_k_means(estimator, name, data, labels):
 51 |     t0 = time()
 52 |     estimator.fit(data)
 53 |     t1 = time() - t0
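 54 |     inertia_ =  estimator.inertia_ # sum of squared distances from each sample to its nearest cluster center (centroid)
 55 |     homo = metrics.homogeneity_score(labels, estimator.labels_) # homogeneity computed from the true and the estimated labels; likewise below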
 56 |     compl = metrics.completeness_score(labels, estimator.labels_)
 57 |     ARI = metrics.adjusted_rand_score(labels, estimator.labels_)
 58 |     print('%-9s\t%.2fs\t%i\t%.3f\t%.3f\t%.3f' % (name, t1, inertia_, homo, compl, ARI))
 59 |     
 60 | def do_plot(data, n_digits, plotFileName): # plot based on the PCA result
 61 |     reduced_data = PCA(n_components=2).fit_transform(data) # keep 2 principal components
 62 |     kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10) # create a KMeans instance
 63 |     kmeans.fit(reduced_data)
 64 | 
 65 |     x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
 66 |     y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
 67 |     xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02)) # mesh grid
 68 |     Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape) # cluster label of every grid point
 69 | 
 70 |     plt.figure(1)
 71 |     plt.clf()
 72 |     plt.imshow(Z, interpolation='nearest', extent=(xx.min(), xx.max(), yy.min(), yy.max()),
 73 |                cmap=plt.cm.Paired, aspect='auto', origin='lower')
 74 |     plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
 75 |     centroids = kmeans.cluster_centers_     # get the centroids and plot them as white crosses
 76 |     plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=169, linewidths=3, color='w', zorder=10)
 77 |     plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
 78 |               'Centroids are marked with white cross')
 79 |     plt.xlim(x_min, x_max)
 80 |     plt.ylim(y_min, y_max)
 81 |     plt.xticks(())
 82 |     plt.yticks(())
 83 |     plt.savefig(plotFileName)
 84 |     print('Plot of K-means clustering performance is saved to %s' % plotFileName)
 85 | 
 86 | if __name__ == '__main__':
 87 |     np.random.seed(42)
 88 |     X_digits, y_digits = load_digits(return_X_y=True)
 89 |     data = scale(X_digits) # standardize the data
 90 |     n_samples, n_features = data.shape
 91 |     n_digits = len(np.unique(y_digits))
 92 |     labels = y_digits
 93 |     
 94 |     print("n_digits: %d, \t n_samples: %d, \t n_features: %d" % (n_digits, n_samples, n_features))
 95 |     print(82 * '_') # Print a horizontal rule
 96 |     print('init\t\ttime\tinertia\thomo\tcompl\tARI')
 97 |     estimator = KMeans(init='k-means++', n_clusters=n_digits, n_init=10) # Create a KMeans instance with k-means++ initialization
 98 |     bench_k_means(estimator, name="k-means++", data=data, labels=labels) # Fit the data and print the scores
 99 |     
100 |     estimator = KMeans(init='random', n_clusters=n_digits, n_init=10) # Create a classic KMeans instance with random initialization
101 |     bench_k_means(estimator, name="random", data=data, labels=labels) # Fit the data and print the scores
102 |     
103 |     pca = PCA(n_components=n_digits).fit(data) # Principal component analysis
104 |     estimator = KMeans(init=pca.components_, n_clusters=n_digits, n_init=1) # Use the principal components as initial centroids; the start is deterministic, so n_init is set to 1
105 |     bench_k_means(estimator, name="PCA-based", data=data, labels=labels) # Fit the data and print the scores
106 |     print(82 * '_') # Print a horizontal rule
107 |     
108 |     do_plot(data, n_digits, 'K-means_clustering.pdf')
109 |     
110 | ```
111 | 
112 | ```bash
113 | $ python3 myKMeansDigits.py
114 | ```
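
The benchmark table printed by `myKMeansDigits.py` reports homogeneity, completeness and ARI. As a rough aid for reading those columns, here is a minimal sketch on made-up toy labels (the names `true_labels`, `clustering_scores` and `candidates` are illustrative and not part of the reference program). It uses the same three `sklearn.metrics` functions as the script.

```python
from sklearn import metrics

# Toy ground truth: two classes with three samples each (made-up data).
true_labels = [0, 0, 0, 1, 1, 1]

def clustering_scores(pred_labels):
    """Return (homogeneity, completeness, ARI) of pred_labels against true_labels."""
    homo = metrics.homogeneity_score(true_labels, pred_labels)
    compl = metrics.completeness_score(true_labels, pred_labels)
    ari = metrics.adjusted_rand_score(true_labels, pred_labels)
    return homo, compl, ari

# Three hypothetical clusterings of the same six samples.
candidates = {
    "renamed":     [1, 1, 1, 0, 0, 0],  # correct grouping, cluster ids swapped
    "over-split":  [0, 0, 1, 2, 2, 3],  # every cluster is pure, but classes are split
    "one-cluster": [0, 0, 0, 0, 0, 0],  # everything lumped into a single cluster
}
for name, pred in candidates.items():
    print('%-12s\thomo=%.3f\tcompl=%.3f\tARI=%.3f' % ((name,) + clustering_scores(pred)))
```

All three scores ignore how the cluster ids are numbered, so the "renamed" case scores perfectly. Homogeneity stays at 1 when each cluster contains a single class (over-splitting is not penalized), while completeness stays at 1 when each class ends up in a single cluster (lumping is not penalized). That is why the script reports both, together with ARI.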
115 | 
116 | ## 2. Clustering the `handwritten digits` with hierarchical clustering
117 | * Reference program: myHClusteringDigits.py
118 | ```python3
119 | from time import time
120 | import numpy as np
121 | from scipy import ndimage # Image-processing package
122 | from matplotlib import pyplot as plt
123 | from sklearn import manifold, datasets # manifold: dimensionality reduction; datasets: bundled datasets
124 | from sklearn.cluster import AgglomerativeClustering # Agglomerative (hierarchical) clustering
125 | from sklearn import metrics
126 | 
127 | def plot_clustering(X_red, y, labels, title, plotFileName): # Visualize the clustering result
128 |     x_min, x_max = np.min(X_red, axis=0), np.max(X_red, axis=0)
129 |     X_red = (X_red - x_min) / (x_max - x_min)
130 | 
131 |     plt.figure(figsize=(6, 4))
132 |     for i in range(X_red.shape[0]):
133 |         plt.text(X_red[i, 0], X_red[i, 1], str(y[i]),
134 |                  color=plt.cm.nipy_spectral(labels[i] / 10.),
135 |                  fontdict={'weight': 'bold', 'size': 9})
136 | 
137 |     plt.xticks([])
138 |     plt.yticks([])
139 |     plt.title(title, size=17)
140 |     plt.axis('off')
141 |     plt.tight_layout(rect=[0, 0.03, 1, 0.95])
142 |     plt.savefig(plotFileName)
143 |     print("Plot of hierarchical clustering performance using '%s' is saved to '%s'\n" % (title, plotFileName))
144 | 
145 | if __name__ == '__main__':
146 |     X, y = datasets.load_digits(return_X_y=True) # Load the handwritten digits dataset
147 |     n_samples, n_features = X.shape
148 |     np.random.seed(0)
149 | 
150 |     shift = lambda x: ndimage.shift(x.reshape((8, 8)), 0.3*np.random.normal(size=2)).ravel() # Shift an array
151 |     X = np.concatenate([X, np.apply_along_axis(shift, 1, X)]) # Double X with shifted copies so the clusters show up more clearly
152 |     y = np.concatenate([y, y], axis=0) # Double y to match
153 | 
154 |     print("Spectral embedding for non-linear dimensionality reduction")
155 |     X_red = manifold.SpectralEmbedding(n_components=2).fit_transform(X) # Non-linear dimensionality reduction to 2 embedding dimensions
156 |     print("Done!\n")
157 | 
158 |     for linkage in ('ward', 'average', 'complete', 'single'): # Try each linkage criterion of hierarchical clustering
159 |         estimator = AgglomerativeClustering(linkage=linkage, n_clusters=10) # Create an AgglomerativeClustering instance
160 |         t0 = time()
161 |         estimator.fit(X_red)
162 |         homo = metrics.homogeneity_score(y, estimator.labels_) # Compute homogeneity from the true and predicted labels; likewise below
163 |         compl = metrics.completeness_score(y, estimator.labels_)
164 |         ARI = metrics.adjusted_rand_score(y, estimator.labels_)
165 |         print("## Use '%s' linkage criterion, time = %.2fs. homoScore: %g, complScore: %g, ARI: %g" % 
166 |             (linkage, time()-t0, homo, compl, ARI))
167 | 
168 |         title = "%s linkage" % linkage
169 |         plotFileName = "hierarchicalClustering_%sLinkage.pdf" % linkage
170 |         plot_clustering(X_red, y, estimator.labels_, title, plotFileName)
171 |     
172 | ```
173 | 
174 | ```bash
175 | $ python3 myHClusteringDigits.py
176 | ```
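
Before running the four linkage criteria on the digits, it can help to try the same loop on data where the answer is obvious. The following is a minimal sketch under stated assumptions: the toy data (`X_toy`, `y_toy`) and the helper `ari_per_linkage` are hypothetical and not part of `myHClusteringDigits.py`. The two groups are placed far apart on purpose.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Made-up toy data: two tight 2-D groups centred near (0, 0) and (100, 100).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
                   rng.normal(100.0, 1.0, size=(20, 2))])
y_toy = np.array([0] * 20 + [1] * 20)

def ari_per_linkage(X, y, n_clusters):
    """Fit one AgglomerativeClustering per linkage criterion and return its ARI."""
    scores = {}
    for linkage in ('ward', 'average', 'complete', 'single'):
        labels = AgglomerativeClustering(linkage=linkage, n_clusters=n_clusters).fit_predict(X)
        scores[linkage] = adjusted_rand_score(y, labels)
    return scores

for linkage, ari in ari_per_linkage(X_toy, y_toy, n_clusters=2).items():
    print("linkage=%-8s ARI=%.3f" % (linkage, ari))
```

On groups this well separated every criterion should recover the two groups. The criteria only start to disagree on harder data such as the embedded digits, which is what the reference program compares.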
177 | 
178 | 
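179 | ## Homework
180 | 1. Try to understand every line of the `reference programs`. <br>
181 | 2. Become fluent in using K-means and hierarchical clustering for cluster analysis. <br>
182 | Don't be afraid of errors: the more mistakes you make, the faster you improve!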
183 | 
184 | ## References
185 | * K-means user guide: [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/clustering.html#k-means)
186 | * Hierarchical clustering user guide: [sklearn.cluster.AgglomerativeClustering](https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering)
187 | 
188 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
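  1 | # Machine Learning and Data Mining: Labs (Python-based)
  2 | 
  3 | ## Lab 1: [Python Quick Start](./Lab1_PythonLearning/PythonLearning.md)
  4 | 
  5 | ## Lab 2: [Sequence Representation/Encoding 1 (Splice Site Recognition as an Example)](./Lab2_SplicingSequencesCoding/sequence_coding.md)
  6 | 
  7 | ## Lab 3: [Classifiers: k-Nearest Neighbors, Logistic Regression, Decision Trees](./Lab3_Classifiers_KNN-LR-DT/classifiers1.md)
  8 | 
  9 | ## Lab 4: [Classifiers: Naive Bayes, Support Vector Classification](./Lab4_Classifiers_Bayes-SVM/classifiers2.md)
 10 | 
 11 | ## Lab 5: [Sequence Representation/Encoding 2 (QSAR Modeling as an Example)](./Lab5_PeptideSequencesCoding/sequence_coding2.md)
 12 | 
 13 | ## Lab 6: [Regression Models: Multiple Linear Regression, Partial Least Squares Regression, Support Vector Regression](./Lab6_Regression_MLR-PLSR-SVR/regress1.md)
 14 | 
 15 | ## Lab 7: [Feature Reduction/Selection](./Lab7_FeatureReduction/dimReduction.md)
 16 | 
 17 | ## Lab 8: [Unsupervised Learning: Cluster Analysis](./Lab8_UnsupervisedLearning/Clustering.md)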
18 | 
19 | 


--------------------------------------------------------------------------------