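├── readme.md
├── 第七章 时间序列分析
    ├── 7-1.xlsx
    ├── 7-3.xlsx
    ├── 7-4 .xlsx
    ├── 7-5.xlsx
    ├── 7-7.xlsx
    └── 第七章 时间序列分析.ipynb
├── 第三章 概率计算与随机抽样
    └── 第三章 概率计算与随机抽样.ipynb
├── 第二章 描述性统计分析
    ├── 2-1.xlsx
    ├── 2-10.xlsx
    ├── 2-2.xlsx
    ├── 2-7.xlsx
    ├── 2-8.xlsx
    └── 第二章 描述性统计分析.ipynb
├── 第五章 方差分析
    ├── 5-1.xlsx
    ├── 5-2.xlsx
    ├── 5-3.xlsx
    └── 第五章 方差分析.ipynb
├── 第六章 相关与回归分析
    ├── 6-1.xlsx
    ├── 6-5.xlsx
    ├── 6-6.xlsx
    └── 第六章 相关与回归分析.ipynb
└── 第四章 参数估计与假设检验
    ├── 4-10.xlsx
    ├── 4-13.xlsx
    ├── 4-14.xlsx
    ├── 4-15.xlsx
    ├── 4-16.xlsx
    ├── 4-17.xlsx
    ├── 4-18.xlsx
    ├── 4-19.xlsx
    ├── 4-2.xlsx
    ├── 4-6.xlsx
    ├── 4-7.xlsx
    ├── 4-9.xlsx
    └── 第四章 参数估计与假设检验.ipynb


/readme.md:
--------------------------------------------------------------------------------
# Code implementations for the book 《统计学原理实验教程(Python)》 (*Experimental Course in the Principles of Statistics (Python)*)

## Introduction

《统计学原理实验教程(Python)》 was published by Xiamen University Press in 2019. It mainly uses Python to implement the most basic tests and methods in statistics. In my view the content is fairly elementary and well suited to beginners. The book is systematically organized and progresses from simple to more advanced material, which makes it a good introductory textbook.

I think the book suits readers who already have basic statistical knowledge and use Python as their data analysis tool. It shows them how to implement the most fundamental statistical methods in Python.

The book is fairly slim. Its chapters on regression analysis and time series analysis are short and only introductory, so you may want to study those topics in more depth on your own.

Thanks to the book's authors and editors!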
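
## Notes

I may have changed some of the data slightly. This does not affect anything substantive.

Comments have been added wherever possible, aiming for code that is clear, readable, and reusable.

I suggest reviewing the relevant basic concepts before reading. That will make the material easier to follow.

## Contents
### Chapter 2 Descriptive Statistical Analysis

Section 1 Frequency Distributions

Experiment 2-1 Constructing a frequency distribution for a continuous variable

Experiment 2-2 Constructing a frequency distribution for a discrete variable

Section 2 Statistical Charts

Section 3 Descriptive Statistics

Experiment 2-7 Computing descriptive statistics

Experiment 2-8 Computing descriptive statistics with grouped subtotals

Experiment 2-9 Computing descriptive statistics with a pivot table

Experiment 2-10 Computing descriptive statistics for grouped data

### Chapter 3 Probability Calculation and Random Sampling

Section 1 Probability Calculation

Experiment 3-1 Binomial distribution probabilities

Experiment 3-2 Poisson distribution probabilities

Experiment 3-3 Hypergeometric distribution probabilities

Experiment 3-4 Normal distribution probabilities

Experiment 3-5 Chi-square distribution probabilities

Experiment 3-6 t-distribution probabilities

Experiment 3-7 F-distribution probabilities

Experiment 3-8 Computing probabilities with permutation, combination, and factorial functions

Experiment 3-9 Plotting probability density functions

Section 2 Random Sampling

Experiment 3-10 Creating random numbers with a random number generator

Experiment 3-11 Creating random numbers with random-number functions

Experiment 3-12 Simulating the normal distribution

Experiment 3-13 Random sampling

### Chapter 4 Parameter Estimation and Hypothesis Testing

Section 1 Parameter Estimation

Experiment 4-1 Interval estimation of a population mean: large sample

Experiment 4-2 Interval estimation of a population mean: small sample

Experiment 4-3 Estimation of a population proportion

Experiment 4-4 Estimation of a population variance

Section 2 Parametric Tests

Experiment 4-5 Test of a single population mean: large sample

Experiment 4-6 Test of a single population mean: normal population, variance known

Experiment 4-7 Test of a single population mean: normal population, variance unknown

Experiment 4-8 Test of two population means: population variances unknown, large samples

Experiment 4-9 Test of two population means: population variances unknown, small samples

Experiment 4-10 Paired-sample t-test

Experiment 4-11 Hypothesis test for a single population proportion

Experiment 4-12 Test of two population proportions

Experiment 4-13 Hypothesis test for a single population variance

Experiment 4-14 Test of two population variances

Section 3 Nonparametric Tests

Experiment 4-15 Chi-square test

Experiment 4-16 One-sample sign test

Experiment 4-17 Sign test for paired samples

Experiment 4-18 Rank-sum test

Experiment 4-19 Runs test

### Chapter 5 Analysis of Variance

Section 1 One-Way Analysis of Variance

Experiment 5-1 One-way analysis of variance

Section 2 Two-Way Analysis of Variance

Experiment 5-2 Two-way analysis of variance without interaction

Experiment 5-3 Two-way analysis of variance with interaction

### Chapter 6 Correlation and Regression Analysis

Section 1 Correlation Analysis

Experiment 6-1 Computing covariance

Experiment 6-2 Computing correlation coefficients

Experiment 6-3 Drawing correlation (scatter) charts

Section 2 Regression Analysis

Experiment 6-4 Simple linear regression analysis and prediction

Experiment 6-5 Multiple linear regression analysis and prediction

Experiment 6-6 Nonlinear regression analysis

### Chapter 7 Time Series Analysis

Section 1 Average Rate of Development

Experiment 7-1 Average rate of development by the geometric method

Experiment 7-2 Average rate of development by the equation method

Section 2 Measuring Long-Term Trends

Experiment 7-3 Measuring long-term trends with moving averages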

Experiment 7-4 Measuring long-term trends with trend models

Section 3 Measuring Seasonal and Cyclical Variation

Experiment 7-5 Measuring seasonal variation

Experiment 7-6 Measuring cyclical variation

Section 4 Time Series Forecasting

Experiment 7-7 Forecasting with time series models

--------------------------------------------------------------------------------
/第七章 时间序列分析/7-1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第七章 时间序列分析/7-1.xlsx
--------------------------------------------------------------------------------
/第七章 时间序列分析/7-3.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第七章 时间序列分析/7-3.xlsx
--------------------------------------------------------------------------------
/第七章 时间序列分析/7-4 .xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第七章 时间序列分析/7-4 .xlsx
--------------------------------------------------------------------------------
/第七章 时间序列分析/7-5.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第七章 时间序列分析/7-5.xlsx
--------------------------------------------------------------------------------
/第七章 时间序列分析/7-7.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第七章 时间序列分析/7-7.xlsx
--------------------------------------------------------------------------------
/第三章 概率计算与随机抽样/第三章 概率计算与随机抽样.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Section 1 Probability Calculation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section covers the common probability distributions: binomial, Poisson, normal, and others. The underlying concepts are not repeated here; we go straight to worked examples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-1 Binomial distribution probabilities"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Compute binomial probabilities with the scipy.stats module\n",
    "\n",
    "Suppose 6 customers enter a clothing store, and the probability that any one customer makes a purchase is 0.30. Compute the probability of each of the following events:\n",
    "1. exactly 4 customers make a purchase\n",
    "2. no more than half of the customers make a purchase\n",
    "3. at least 1 customer makes a purchase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 6\n",
    "p = 0.3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.05953499999999999"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. Exactly 4 customers buy: P(X = 4)\n",
    "k = 4\n",
    "prob = stats.binom.pmf(k,n,p)\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.92953"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2. No more than half (at most 3) of the customers buy: P(X <= 3)\n",
    "k = 3\n",
    "prob = stats.binom.cdf(k,n,p)\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.882351"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 3. At least 1 customer buys, which equals 1 - P(no customer buys)\n",
    "k = 0\n",
    "prob = 1 - stats.binom.cdf(k,n,p)\n",
    "prob"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-2 Poisson distribution probabilities"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute Poisson probabilities with the scipy.stats module\n",
    "\n",
    "An airline's booking office receives 48 calls every 60 minutes. Find the probability of receiving 3 calls within 5 minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.0"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = 3\n",
    "mu =(48/60)*5\n",
    "mu\n",
    "# Note how mu is computed: the rate per minute (48/60) times the 5-minute interval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.19536681481316454"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prob = stats.poisson.pmf(x,mu)\n",
    "prob"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-3 Hypergeometric distribution probabilities"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Compute hypergeometric probabilities with the scipy.stats module\n",
    "\n",
    "Of 10 people, 6 prefer Coca-Cola and 4 prefer Pepsi. A random sample of 3 is drawn from these people. Find\n",
    "1. the probability that exactly 2 people prefer Coca-Cola\n",
    "2. the probability that 2 or 3 people prefer Pepsi\n",
    "\n",
    "In short, the hypergeometric distribution describes the number of items of a specified type obtained when N items are drawn without replacement from a finite population of M items, n of which are of that type."
   ]
  },
  {
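   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cross-check (a sketch added here, not taken from the book): the hypergeometric pmf is C(n, k) C(M - n, N - k) / C(M, N). Evaluating it directly for question 1 (M = 10, n = 6, N = 3, k = 2) should reproduce the value that stats.hypergeom.pmf(k1, M, n, N) returns below. It also confirms how scipy's M, n, N parameters map onto the problem. scipy.special.comb with exact=True is used because this notebook's kernel is Python 3.7, which predates math.comb."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.special import comb\n",
    "\n",
    "M, n, N = 10, 6, 3  # population size, number who prefer Coca-Cola, sample size\n",
    "k = 2               # number of Coca-Cola fans in the sample\n",
    "\n",
    "# ways to choose k of the n Coca-Cola fans and N - k of the other M - n people,\n",
    "# divided by all ways to choose N people out of M\n",
    "prob_manual = comb(n, k, exact=True) * comb(M - n, N - k, exact=True) / comb(M, N, exact=True)\n",
    "prob_manual  # 15 * 4 / 120 = 0.5"
   ]
  },
  {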
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 3\n",
    "M = 10\n",
    "n = 6\n",
    "# n = 6 here means 6 people prefer Coca-Cola"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.4999999999999997"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. What is the probability that exactly 2 people prefer Coca-Cola?\n",
    "k1 = 2\n",
    "prob = stats.hypergeom.pmf(k1,M,n,N)\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.3333333333333335"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2. What is the probability that 2 or 3 people prefer Pepsi?\n",
    "k1 = 2\n",
    "k2 = 3\n",
    "n = 4\n",
    "# n = 4 here means 4 people prefer Pepsi\n",
    "prob = stats.hypergeom.pmf(k1,M,n,N) + stats.hypergeom.pmf(k2,M,n,N)\n",
    "prob"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-4 Normal distribution probabilities"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Compute normal probabilities with the scipy.stats module\n",
    "\n",
    "The average age at first marriage is 26. Assume that age at first marriage is normally distributed with a standard deviation of 4 years. Find\n",
    "1. the probability that a person is younger than 23 at first marriage\n",
    "2. the probability that a person is between 20 and 30 at first marriage\n",
    "3. the age by which 95% of people have married for the first time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats"
   ]
  },
  {
   "cell_type":
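   "markdown",
   "metadata": {},
   "source": [
    "Cross-check for question 1 (a sketch added here, not taken from the book): standardizing gives z = (23 - 26) / 4 = -0.75. The standard normal CDF can be written as Phi(z) = (1 + erf(z / sqrt(2))) / 2. Evaluating it with Python's built-in math.erf should agree with stats.norm.cdf(x1, mu, sigma) below, about 0.2266. The helper name std_normal_cdf is hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def std_normal_cdf(z):\n",
    "    # Phi(z) = (1 + erf(z / sqrt(2))) / 2\n",
    "    return 0.5 * (1 + math.erf(z / math.sqrt(2)))\n",
    "\n",
    "z = (23 - 26) / 4   # z = (x - mu) / sigma = -0.75\n",
    "std_normal_cdf(z)   # about 0.2266"
   ]
  },
  {
   "cell_type":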
   "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "mu = 26\n",
    "sigma = 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.2266273523768682"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. Probability that a person is younger than 23 at first marriage: P(X < 23)\n",
    "x1 = 23\n",
    "prob = stats.norm.cdf(x1,mu,sigma)\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7745375447996848"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2. Probability that a person is between 20 and 30 at first marriage: P(20 < X < 30)\n",
    "x2 = 20\n",
    "x3 = 30\n",
    "prob = stats.norm.cdf(x3,mu,sigma) - stats.norm.cdf(x2,mu,sigma)\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "32.579414507805886"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 3. Age by which 95% of people have married for the first time: the 0.95 quantile (ppf is the inverse of cdf)\n",
    "x4 = 0.95\n",
    "age = stats.norm.ppf(x4,mu,sigma)\n",
    "age"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-5 Chi-square distribution probabilities"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Compute with scipy.stats.chi2\n",
    "\n",
    "from scipy import stats\n",
    "stats.chi2.cdf(x,n)\n",
    "stats.chi2.pdf(x,n)\n",
    "\n",
    "cdf returns the cumulative probability that the random variable X is less than x, i.e. P(X0), n denotes the degrees of freedom"
   ]
  },
  {
   "cell_type":
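   "markdown",
   "metadata": {},
   "source": [
    "A generic sketch (added here, not the book's example): scipy's chi2 and t objects offer the same methods as norm. cdf(x, df) gives P(X <= x), sf(x, df) gives 1 - cdf(x, df), and ppf(q, df) inverts cdf, which yields critical values. The degrees of freedom below (df = 10) are chosen only for illustration. The familiar table values are about 18.307 for the upper 5% point of chi-square(10) and about 2.228 for the upper 2.5% point of t(10)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats\n",
    "\n",
    "df = 10  # illustrative degrees of freedom\n",
    "\n",
    "chi2_crit = stats.chi2.ppf(0.95, df)  # upper 5% critical value, about 18.307\n",
    "t_crit = stats.t.ppf(0.975, df)       # two-sided 5% critical value, about 2.228\n",
    "\n",
    "# sf evaluated at the critical value recovers the 5% upper-tail probability\n",
    "chi2_crit, stats.chi2.sf(chi2_crit, df), t_crit"
   ]
  },
  {
   "cell_type":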
   "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-7 F-distribution probabilities"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Compute with scipy.stats.f\n",
    "\n",
    "from scipy import stats\n",
    "stats.f.cdf(x,m,n)\n",
    "stats.f.pdf(x,m,n)\n",
    "\n",
    "x: the value at which the F-distribution probability is computed\n",
    "m: numerator degrees of freedom\n",
    "n: denominator degrees of freedom\n",
    "\n",
    "cdf returns the cumulative probability that the random variable X is less than x, i.e. P(X"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the two density curves\n",
    "x = np.linspace(1,25,100)\n",
    "y1 = norm_fun(x,mu1,sigma1)\n",
    "y2 = norm_fun(x,mu2,sigma2)\n",
    "\n",
    "fig,ax = plt.subplots()\n",
    "ax.plot(x,y1,\"r-\",linewidth=2,label='f(x1)')\n",
    "ax.plot(x,y2,\"b-\",linewidth=2,label='f(x2)')\n",
    "ax.set(ylabel='f(x)',xlabel='x')\n",
    "ax.legend()\n",
    "ax.grid(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Section 2 Random Sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-10 Creating random numbers with a random number generator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a 15-row by 8-column table of random numbers from the discrete distribution that takes the values 1, 2, 3, 4 with probabilities 0.3, 0.2, 0.1, 0.4 respectively"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[4, 3, 4, 4, 2, 1, 2, 1],\n",
       "       [3, 2, 4, 4, 4, 2, 1, 4],\n",
       "       [4, 2, 1, 1, 2, 3, 1, 1],\n",
       "       [4, 4, 1, 2, 4, 2, 1, 4],\n",
       "       [4, 2, 1, 1, 2, 1, 1, 1],\n",
       "       [1, 4, 2, 4, 4, 4, 1, 1],\n",
       "       [2, 2, 4, 1, 1, 4, 2, 4],\n",
       "       [1, 1, 2, 1, 1, 1, 1, 4],\n",
       "       [4, 1, 1, 4, 4, 4, 4, 1],\n",
       "       [4, 1, 2, 4, 1, 1, 4, 1],\n",
       "       [3, 1, 4, 4, 4, 2, 1, 4],\n",
       "       [1, 2, 2, 4, 2, 1, 2, 4],\n",
       "       [1, 1, 4, 3, 4, 1, 4, 3],\n",
       "       [4, 2, 1, 2, 4, 4, 4, 3],\n",
       "       [2, 4, 3, 4, 2, 2, 1, 1]])"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Draw from [1, 2, 3, 4] with probabilities p, filling a 15 x 8 array\n",
    "np.random.choice([1,2,3,4], size=(15,8), p=[0.3,0.2,0.1,0.4])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-11 Creating random numbers with random-number functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.96217097, 0.98366955, 0.74277225, 0.60416228],\n",
       "       [0.06235049, 0.74642095, 0.32462698, 0.69385062],\n",
       "       [0.85452821, 0.12790053, 0.69142201, 0.24191518],\n",
       "       [0.19062622, 0.65556837, 0.44151301, 0.7943343 ]])"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a 4 x 4 array of random numbers uniformly distributed on [0, 1)\n",
    "np.random.rand(4,4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([116.76319916, 135.19370917, 109.63489242, 144.82287433,\n",
       "       136.83623471, 105.72656478, 101.95467207, 130.47797445,\n",
       "       131.04471831, 124.04200185, 112.66708126, 134.27049649,\n",
       "       115.19391079, 143.9568004 , 115.83986507, 103.46374113,\n",
       "       131.50326788, 107.43072546, 132.84189222, 127.73643143,\n",
       "       112.44130568, 127.89727523, 110.67101175, 100.78853637,\n",
       "       118.5169953 , 118.54264792, 146.05817162, 115.50604347,\n",
       "       131.63951633, 113.06720324, 125.39585152, 114.22338753,\n",
       "       121.31933932, 147.92981049, 118.1308307 , 140.90040844,\n",
       "       146.10194269, 135.69761576, 128.46450351, 118.35484852,\n",
       "       101.41365076, 114.91153316, 110.17305378, 139.27232063,\n",
       "       116.49425751, 114.79169251, 131.29659205, 141.42651957])"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create 48 random numbers uniformly distributed on [100, 150)\n",
    "np.random.uniform(100,150,48)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-12 Simulating the normal distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-0.42170056, -0.44939553,  0.15702986, ...,  0.95759534,\n",
       "       -0.01364779,  0.27112559])"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# loc is the mean, scale is the standard deviation, size is the number of draws\n",
    "sample = np.random.normal(loc=0,scale=1,size=2000)\n",
    "sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 3., 3.,\n",
       "         4., 7., 11., 13., 10., 21., 17., 16., 23., 26., 32., 43., 47.,\n",
       "         48., 65., 66., 51., 62., 70., 74., 62., 78., 81., 90., 75., 74.,\n",
       "         88., 74., 64., 75., 62., 68., 56., 45., 43., 43., 34., 21., 25.,\n",
       "         28., 14., 20., 13., 7., 8., 7., 9., 8., 3., 4., 1., 1.,\n",
       "         1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
       "         0., 0., 0., 0., 0., 0., 0., 0.]),\n",
       " array([-5.00000000e+00, -4.90000000e+00, -4.80000000e+00, -4.70000000e+00,\n",
       "        -4.60000000e+00, -4.50000000e+00, -4.40000000e+00, -4.30000000e+00,\n",
       "        -4.20000000e+00, -4.10000000e+00, -4.00000000e+00, -3.90000000e+00,\n",
       "        -3.80000000e+00, -3.70000000e+00, -3.60000000e+00, -3.50000000e+00,\n",
       "        -3.40000000e+00, -3.30000000e+00, -3.20000000e+00, -3.10000000e+00,\n",
       "        -3.00000000e+00, -2.90000000e+00, -2.80000000e+00, -2.70000000e+00,\n",
       "        -2.60000000e+00, -2.50000000e+00, -2.40000000e+00, -2.30000000e+00,\n",
       "        -2.20000000e+00, -2.10000000e+00, -2.00000000e+00, -1.90000000e+00,\n",
       "        -1.80000000e+00, -1.70000000e+00, -1.60000000e+00, -1.50000000e+00,\n",
       "        -1.40000000e+00, -1.30000000e+00, -1.20000000e+00, -1.10000000e+00,\n",
       "        -1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,\n",
       "        -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,\n",
       "        -2.00000000e-01, -1.00000000e-01, -1.77635684e-14, 1.00000000e-01,\n",
       "         2.00000000e-01, 3.00000000e-01, 4.00000000e-01, 5.00000000e-01,\n",
       "         6.00000000e-01, 7.00000000e-01, 8.00000000e-01, 9.00000000e-01,\n",
       "         1.00000000e+00, 1.10000000e+00, 1.20000000e+00, 1.30000000e+00,\n",
       "         1.40000000e+00, 1.50000000e+00, 1.60000000e+00, 1.70000000e+00,\n",
       "         1.80000000e+00, 1.90000000e+00, 2.00000000e+00, 2.10000000e+00,\n",
       "         2.20000000e+00, 2.30000000e+00, 2.40000000e+00, 2.50000000e+00,\n",
       "         2.60000000e+00, 2.70000000e+00, 2.80000000e+00, 2.90000000e+00,\n",
       "         3.00000000e+00, 3.10000000e+00, 3.20000000e+00, 3.30000000e+00,\n",
       "         3.40000000e+00, 3.50000000e+00, 3.60000000e+00, 3.70000000e+00,\n",
       "         3.80000000e+00, 3.90000000e+00, 4.00000000e+00, 4.10000000e+00,\n",
       "         4.20000000e+00, 4.30000000e+00, 4.40000000e+00, 4.50000000e+00,\n",
       "         4.60000000e+00, 4.70000000e+00, 4.80000000e+00, 4.90000000e+00]),\n",
       " )"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "[histogram of sample; embedded PNG image omitted]"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "bins = np.arange(-5,5,0.1)\n",
    "plt.hist(sample,bins,color='blue',alpha=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 3-13 Random sampling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Simplified from the version in the book\n",
    "import numpy as np\n",
    "population = np.arange(1,20,2)  # the odd numbers 1, 3, ..., 19 (avoids shadowing the built-in all)\n",
    "population"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([17, 11,  7])"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Randomly draw 3 numbers from population\n",
    "# (np.random.choice samples with replacement by default; pass replace=False to sample without replacement)\n",
    "np.random.choice(population,size=3)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:root] *",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/第二章 描述性统计分析/2-1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第二章 描述性统计分析/2-1.xlsx
--------------------------------------------------------------------------------
/第二章 描述性统计分析/2-10.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第二章 描述性统计分析/2-10.xlsx
--------------------------------------------------------------------------------
/第二章 描述性统计分析/2-2.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第二章 描述性统计分析/2-2.xlsx
--------------------------------------------------------------------------------
/第二章 描述性统计分析/2-7.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第二章 描述性统计分析/2-7.xlsx
--------------------------------------------------------------------------------
/第二章 描述性统计分析/2-8.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第二章 描述性统计分析/2-8.xlsx
--------------------------------------------------------------------------------
/第二章 描述性统计分析/第二章 描述性统计分析.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Section 1 Frequency Distributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiment 2-1 Constructing a frequency distribution for a continuous variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lifetimes of 50 batteries are given below. Construct a frequency distribution of the lifetimes and compute the cumulative frequencies both upward (from the lowest class) and downward (from the highest class)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   Hours\n",
       "0    996\n",
       "1    942\n",
       "2    957\n",
       "3   1400\n",
       "4   1623"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load the 50 battery lifetimes (column Hours) from the Excel file\n",
    "life_df = pd.read_excel('2-1.xlsx')\n",
    "life_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "             Hours\n",
       "count    50.000000\n",
       "mean   1257.700000\n",
       "std     277.948553\n",
       "min     804.000000\n",
       "25%    1002.500000\n",
       "50%    1286.000000\n",
       "75%    1540.250000\n",
       "max    1689.000000"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "life_df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "              value_counts\n",
       "[800, 900)               6\n",
       "[900, 1000)              7\n",
       "[1000, 1100)             5\n",
       "[1100, 1200)             4\n",
       "[1200, 1300)             4\n",
       "[1300, 1400)             5\n",
       "[1400, 1500)             5\n",
       "[1500, 1600)             8\n",
       "[1600, 1700)             6"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Group the data into classes with pandas' cut function\n",
    "\n",
    "# The bins are chosen from the range of this data set (min 804, max 1689); adjust them for other data.\n",
    "# right=False makes each interval closed on the left and open on the right, so the upper limit is excluded\n",
    "bins = range(800,1701,100)\n",
    "life_bin_df = pd.cut(life_df['Hours'],bins,right=False)\n",
    "life_bin_df = life_bin_df.value_counts().to_frame(name='value_counts').sort_index()\n",
    "life_bin_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
\n", 302 | "\n", 315 | "\n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | "
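A self-contained sketch of the binning and cumulative-frequency steps. It uses the 20 exam scores from Experiment 2-7 as stand-in data, because the 50 lifetimes in 2-1.xlsx are not reproduced in full. The class edges 60, 70, ..., 100 are an illustrative choice. `right=False` gives left-closed classes [a, b). Reversing the counts before `cumsum()` accumulates from the highest class downward, which is what the `loc[::-1, ...]` assignment in the notebook achieves.

```python
import pandas as pd

# Exam scores from Experiment 2-7, used here only as sample data
scores = pd.Series([95, 63, 78, 94, 60, 96, 83, 68, 88, 90,
                    95, 93, 67, 83, 82, 72, 85, 81, 61, 87], name='Score')

# Illustrative class edges; right=False gives [60, 70), [70, 80), ...
classes = pd.cut(scores, bins=range(60, 101, 10), right=False)

# Frequency per class, ordered by interval rather than by count
table = classes.value_counts().sort_index().to_frame(name='value_counts')
counts = table['value_counts']

table['percentage'] = counts / counts.sum()   # relative frequency
table['cumsum_up'] = counts.cumsum()          # from the lowest class upward
# from the highest class downward: reverse, accumulate, reverse back
table['cumsum_down'] = counts.iloc[::-1].cumsum().iloc[::-1].to_numpy()
print(table)
```

The last class's `cumsum_up` and the first class's `cumsum_down` both equal the sample size, which is a quick sanity check on any such table.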
\n", 391 | "
" 392 | ], 393 | "text/plain": [ 394 | " value_counts percentage cumsum_up cumsum_down\n", 395 | "[800, 900) 6 0.12 6 50\n", 396 | "[900, 1000) 7 0.14 13 44\n", 397 | "[1000, 1100) 5 0.10 18 37\n", 398 | "[1100, 1200) 4 0.08 22 32\n", 399 | "[1200, 1300) 4 0.08 26 28\n", 400 | "[1300, 1400) 5 0.10 31 24\n", 401 | "[1400, 1500) 5 0.10 36 19\n", 402 | "[1500, 1600) 8 0.16 44 14\n", 403 | "[1600, 1700) 6 0.12 50 6" 404 | ] 405 | }, 406 | "execution_count": 14, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "# Relative frequency, cumulative count from the lowest class upward, and cumulative count from the highest class downward; the output makes the meaning clear\n", 413 | "\n", 414 | "life_bin_df.loc[:,'percentage'] = life_bin_df.loc[:,'value_counts']/life_bin_df.loc[:,'value_counts'].sum()\n", 415 | "life_bin_df.loc[:,'cumsum_up'] = life_bin_df.loc[::,'value_counts'].cumsum()\n", 416 | "life_bin_df.loc[:,'cumsum_down'] = life_bin_df.loc[::-1,'value_counts'].cumsum()\n", 417 | "life_bin_df" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "## Experiment 2-2 Frequency distribution of a discrete variable" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "Nothing tricky here; the code is self-explanatory." 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 15, 437 | "metadata": {}, 438 | "outputs": [ 439 | { 440 | "data": { 441 | "text/html": [ 442 | "
\n", 443 | "\n", 456 | "\n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | "
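A discrete variable needs no binning. `value_counts()` already gives the frequency of each category, and `normalize=True` gives the relative frequency. This minimal sketch re-enters the 20 ratings of 2-2.xlsx as listed in the notebook output, so it runs without the Excel file.

```python
import pandas as pd

# The Eval column of 2-2.xlsx, in row order
ratings = pd.Series(list('OOOVVVOOOAAAAGGVVAAA'), name='Eval')

freq = ratings.value_counts()                    # count per category, most frequent first
rel_freq = ratings.value_counts(normalize=True)  # share of the 20 ratings
dist = pd.DataFrame({'count': freq, 'percentage': rel_freq})
print(dist)
```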
\n", 546 | "
" 547 | ], 548 | "text/plain": [ 549 | " Eval\n", 550 | "0 O\n", 551 | "1 O\n", 552 | "2 O\n", 553 | "3 V\n", 554 | "4 V\n", 555 | "5 V\n", 556 | "6 O\n", 557 | "7 O\n", 558 | "8 O\n", 559 | "9 A\n", 560 | "10 A\n", 561 | "11 A\n", 562 | "12 A\n", 563 | "13 G\n", 564 | "14 G\n", 565 | "15 V\n", 566 | "16 V\n", 567 | "17 A\n", 568 | "18 A\n", 569 | "19 A" 570 | ] 571 | }, 572 | "execution_count": 15, 573 | "metadata": {}, 574 | "output_type": "execute_result" 575 | } 576 | ], 577 | "source": [ 578 | "eval_df = pd.read_excel('2-2.xlsx')\n", 579 | "eval_df" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 17, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "text/plain": [ 590 | "A 7\n", 591 | "O 6\n", 592 | "V 5\n", 593 | "G 2\n", 594 | "Name: Eval, dtype: int64" 595 | ] 596 | }, 597 | "execution_count": 17, 598 | "metadata": {}, 599 | "output_type": "execute_result" 600 | } 601 | ], 602 | "source": [ 603 | "eval_df['Eval'].value_counts()" 604 | ] 605 | }, 606 | { 607 | "cell_type": "markdown", 608 | "metadata": {}, 609 | "source": [ 610 | "Summary: this is essentially an application of value_counts()." 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "# Section 2 Statistical charts" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "Plotting tutorials are plentiful and the book treats this section only briefly, so it is omitted here. The one thing to watch out for is displaying Chinese characters in matplotlib; one way to handle it follows." 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 19, 630 | "metadata": {}, 631 | "outputs": [], 632 | "source": [ 633 | "%matplotlib inline\n", 634 | "import matplotlib.pyplot as plt\n", 635 | "plt.rcParams['font.sans-serif'] = ['SimHei'] # Step 1: use SimHei as the sans-serif font so Chinese characters render\n", 636 | "plt.rcParams['axes.unicode_minus'] = False # Step 2: draw minus signs on the axes as ASCII hyphens so negative tick labels display correctly\n", 637 | "plt.rcParams['savefig.dpi'] = 100 # resolution of saved figures" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "# Section 3 Descriptive statistics" 645 | ] 646 | }, 647 | { 648 | "cell_type": 
"markdown", 649 | "metadata": {}, 650 | "source": [ 651 | "The definitions of the descriptive statistics are not repeated here; look up any measure you have forgotten. This part is simple, and each statistic usually takes a single function call." 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "metadata": {}, 657 | "source": [ 658 | "## Experiment 2-7 Computing descriptive statistics" 659 | ] 660 | }, 661 | { 662 | "cell_type": "code", 663 | "execution_count": 21, 664 | "metadata": {}, 665 | "outputs": [], 666 | "source": [ 667 | "import numpy as np\n", 668 | "import pandas as pd " 669 | ] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": 22, 674 | "metadata": {}, 675 | "outputs": [ 676 | { 677 | "data": { 678 | "text/html": [ 679 | "
\n", 680 | "\n", 693 | "\n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | "
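pandas' `std()` and `var()` divide by n - 1, while `np.std` divides by n unless `ddof=1` is passed. The same n - 1 should carry into the 95% confidence interval for the mean. For these 20 scores, the two-sided critical value is t(0.975, 19) ≈ 2.093. SciPy's `stats.t.ppf` returns this value directly, so no hard-coded constant is needed.

The sketch below re-enters the Experiment 2-7 scores so that it runs without 2-7.xlsx. SciPy is an added dependency that the notebook does not itself use at this point.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Scores from 2-7.xlsx as listed in the notebook output
scores = pd.Series([95, 63, 78, 94, 60, 96, 83, 68, 88, 90,
                    95, 93, 67, 83, 82, 72, 85, 81, 61, 87], name='Score')

n = int(scores.count())
sample_std = scores.std()                    # pandas default: ddof=1
population_std = np.std(scores.to_numpy())   # NumPy default: ddof=0
se = sample_std / np.sqrt(n)                 # standard error of the mean

t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value, df = 19
half_width = t_crit * se                     # half-width of the 95% CI for the mean
ci = (scores.mean() - half_width, scores.mean() + half_width)
print(sample_std, population_std, t_crit, half_width, ci)
```

With n = 20 this gives a half-width of roughly 5.62, somewhat wider than the notebook's 5.39.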
\n", 783 | "
" 784 | ], 785 | "text/plain": [ 786 | " Score\n", 787 | "0 95\n", 788 | "1 63\n", 789 | "2 78\n", 790 | "3 94\n", 791 | "4 60\n", 792 | "5 96\n", 793 | "6 83\n", 794 | "7 68\n", 795 | "8 88\n", 796 | "9 90\n", 797 | "10 95\n", 798 | "11 93\n", 799 | "12 67\n", 800 | "13 83\n", 801 | "14 82\n", 802 | "15 72\n", 803 | "16 85\n", 804 | "17 81\n", 805 | "18 61\n", 806 | "19 87" 807 | ] 808 | }, 809 | "execution_count": 22, 810 | "metadata": {}, 811 | "output_type": "execute_result" 812 | } 813 | ], 814 | "source": [ 815 | "score_df = pd.read_excel('2-7.xlsx')\n", 816 | "score_df" 817 | ] 818 | }, 819 | { 820 | "cell_type": "code", 821 | "execution_count": 24, 822 | "metadata": {}, 823 | "outputs": [ 824 | { 825 | "data": { 826 | "text/plain": [ 827 | "81.05" 828 | ] 829 | }, 830 | "execution_count": 24, 831 | "metadata": {}, 832 | "output_type": "execute_result" 833 | } 834 | ], 835 | "source": [ 836 | "# Mean\n", 837 | "score_df['Score'].mean()" 838 | ] 839 | }, 840 | { 841 | "cell_type": "code", 842 | "execution_count": 25, 843 | "metadata": {}, 844 | "outputs": [ 845 | { 846 | "data": { 847 | "text/plain": [ 848 | "83.0" 849 | ] 850 | }, 851 | "execution_count": 25, 852 | "metadata": {}, 853 | "output_type": "execute_result" 854 | } 855 | ], 856 | "source": [ 857 | "# Median\n", 858 | "score_df['Score'].median()" 859 | ] 860 | }, 861 | { 862 | "cell_type": "code", 863 | "execution_count": 26, 864 | "metadata": {}, 865 | "outputs": [ 866 | { 867 | "data": { 868 | "text/plain": [ 869 | "0 83\n", 870 | "1 95\n", 871 | "dtype: int64" 872 | ] 873 | }, 874 | "execution_count": 26, 875 | "metadata": {}, 876 | "output_type": "execute_result" 877 | } 878 | ], 879 | "source": [ 880 | "# Mode (every value tied for the highest frequency is returned: here 83 and 95)\n", 881 | "score_df['Score'].mode()" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": 38, 887 | "metadata": {}, 888 | "outputs": [ 889 | { 890 | "data": { 891 | "text/plain": [ 892 | "12.010850357730277" 893 | ] 894 | }, 895 | "execution_count": 38, 896 | 
"metadata": {}, 897 | "output_type": "execute_result" 898 | } 899 | ], 900 | "source": [ 901 | "# Standard deviation (sample, divides by n - 1)\n", 902 | "source_std = score_df['Score'].std()\n", 903 | "source_std " 904 | ] 905 | }, 906 | { 907 | "cell_type": "code", 908 | "execution_count": 28, 909 | "metadata": {}, 910 | "outputs": [ 911 | { 912 | "data": { 913 | "text/plain": [ 914 | "144.26052631578952" 915 | ] 916 | }, 917 | "execution_count": 28, 918 | "metadata": {}, 919 | "output_type": "execute_result" 920 | } 921 | ], 922 | "source": [ 923 | "# Variance (sample, divides by n - 1)\n", 924 | "score_df['Score'].var()" 925 | ] 926 | }, 927 | { 928 | "cell_type": "code", 929 | "execution_count": 29, 930 | "metadata": {}, 931 | "outputs": [ 932 | { 933 | "data": { 934 | "text/plain": [ 935 | "-1.0283438185619334" 936 | ] 937 | }, 938 | "execution_count": 29, 939 | "metadata": {}, 940 | "output_type": "execute_result" 941 | } 942 | ], 943 | "source": [ 944 | "# Kurtosis (bias-corrected excess kurtosis; 0 for a normal distribution)\n", 945 | "score_df['Score'].kurt()" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": 30, 951 | "metadata": {}, 952 | "outputs": [ 953 | { 954 | "data": { 955 | "text/plain": [ 956 | "-0.5077502334495104" 957 | ] 958 | }, 959 | "execution_count": 30, 960 | "metadata": {}, 961 | "output_type": "execute_result" 962 | } 963 | ], 964 | "source": [ 965 | "# Skewness (bias-corrected)\n", 966 | "score_df['Score'].skew()" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": 31, 972 | "metadata": {}, 973 | "outputs": [ 974 | { 975 | "data": { 976 | "text/plain": [ 977 | "96" 978 | ] 979 | }, 980 | "execution_count": 31, 981 | "metadata": {}, 982 | "output_type": "execute_result" 983 | } 984 | ], 985 | "source": [ 986 | "# Maximum\n", 987 | "score_df['Score'].max()" 988 | ] 989 | }, 990 | { 991 | "cell_type": "code", 992 | "execution_count": 32, 993 | "metadata": {}, 994 | "outputs": [ 995 | { 996 | "data": { 997 | "text/plain": [ 998 | "60" 999 | ] 1000 | }, 1001 | "execution_count": 32, 1002 | "metadata": {}, 1003 | "output_type": "execute_result" 1004 | }
1005 | ], 1006 | "source": [ 1007 | "# Minimum\n", 1008 | "score_df['Score'].min()" 1009 | ] 1010 | }, 1011 | { 1012 | "cell_type": "code", 1013 | "execution_count": 34, 1014 | "metadata": {}, 1015 | "outputs": [ 1016 | { 1017 | "data": { 1018 | "text/plain": [ 1019 | "36" 1020 | ] 1021 | }, 1022 | "execution_count": 34, 1023 | "metadata": {}, 1024 | "output_type": "execute_result" 1025 | } 1026 | ], 1027 | "source": [ 1028 | "# Range (maximum - minimum)\n", 1029 | "score_area = score_df['Score'].max() - score_df['Score'].min()\n", 1030 | "score_area" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "code", 1035 | "execution_count": 35, 1036 | "metadata": {}, 1037 | "outputs": [ 1038 | { 1039 | "data": { 1040 | "text/plain": [ 1041 | "1621" 1042 | ] 1043 | }, 1044 | "execution_count": 35, 1045 | "metadata": {}, 1046 | "output_type": "execute_result" 1047 | } 1048 | ], 1049 | "source": [ 1050 | "# Sum\n", 1051 | "score_df['Score'].sum()" 1052 | ] 1053 | }, 1054 | { 1055 | "cell_type": "code", 1056 | "execution_count": 37, 1057 | "metadata": {}, 1058 | "outputs": [ 1059 | { 1060 | "data": { 1061 | "text/plain": [ 1062 | "20" 1063 | ] 1064 | }, 1065 | "execution_count": 37, 1066 | "metadata": {}, 1067 | "output_type": "execute_result" 1068 | } 1069 | ], 1070 | "source": [ 1071 | "# Number of observations\n", 1072 | "score_count = score_df['Score'].count()\n", 1073 | "score_count" 1074 | ] 1075 | }, 1076 | { 1077 | "cell_type": "code", 1078 | "execution_count": 41, 1079 | "metadata": {}, 1080 | "outputs": [ 1081 | { 1082 | "data": { 1083 | "text/plain": [ 1084 | "2.6857077867462564" 1085 | ] 1086 | }, 1087 | "execution_count": 41, 1088 | "metadata": {}, 1089 | "output_type": "execute_result" 1090 | } 1091 | ], 1092 | "source": [ 1093 | "# Standard error of the mean: s / sqrt(n)\n", 1094 | "score_se = source_std / (np.sqrt(score_count))\n", 1095 | "score_se" 1096 | ] 1097 | }, 1098 | { 1099 | "cell_type": "code", 1100 | "execution_count": 42, 1101 | "metadata": {}, 1102 | "outputs": [ 1103 | { 1104 | "data": { 1105 | "text/plain": [ 1106 | 
"5.386847637006618" 1107 | ] 1108 | }, 1109 | "execution_count": 42, 1110 | "metadata": {}, 1111 | "output_type": "execute_result" 1112 | } 1113 | ], 1114 | "source": [ 1115 | "# 95% confidence level, i.e. the half-width of the 95% CI for the mean (t critical value * standard error). Caution: for n = 20 the critical value is t(0.975, 19) ≈ 2.093, e.g. scipy.stats.t.ppf(0.975, score_count - 1); the hard-coded 2.005745995 corresponds to about 53 degrees of freedom, so this result understates the half-width\n", 1116 | "score_confidence = 2.005745995 * score_se\n", 1117 | "score_confidence" 1118 | ] 1119 | }, 1120 | { 1121 | "cell_type": "markdown", 1122 | "metadata": {}, 1123 | "source": [ 1124 | "## Experiment 2-8 Computing descriptive statistics with subtotals by category" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "markdown", 1129 | "metadata": {}, 1130 | "source": [ 1131 | "Compute the average sales volume of the employees in each subsidiary." 1132 | ] 1133 | }, 1134 | { 1135 | "cell_type": "code", 1136 | "execution_count": 44, 1137 | "metadata": {}, 1138 | "outputs": [ 1139 | { 1140 | "data": { 1141 | "text/html": [ 1142 | "
\n", 1143 | "\n", 1156 | "\n", 1157 | " \n", 1158 | " \n", 1159 | " \n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | "
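`groupby(...).agg()` can produce several subtotals in one call, which covers the sum and count variants mentioned in the notebook. This sketch rebuilds the ten rows of 2-8.xlsx from the notebook output. The 员工 column is omitted because the summary does not need it.

```python
import pandas as pd

# Rows of 2-8.xlsx as shown in the notebook output
sale_df = pd.DataFrame({
    '子公司': ['子公司1', '子公司1', '子公司2', '子公司2', '子公司2',
              '子公司3', '子公司3', '子公司4', '子公司4', '子公司4'],
    '性别': ['男', '男', '男', '女', '女', '男', '女', '女', '男', '男'],
    '销售量': [1009, 2125, 1157, 2045, 2964, 2769, 1665, 2745, 1415, 1306],
})

# Several subtotals per subsidiary in a single call
summary = sale_df.groupby('子公司')['销售量'].agg(['count', 'sum', 'mean'])
print(summary)
```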
\n", 1239 | "
" 1240 | ], 1241 | "text/plain": [ 1242 | " 子公司 员工 性别 销售量\n", 1243 | "0 子公司1 A1 男 1009\n", 1244 | "1 子公司1 A2 男 2125\n", 1245 | "2 子公司2 A3 男 1157\n", 1246 | "3 子公司2 A4 女 2045\n", 1247 | "4 子公司2 A5 女 2964\n", 1248 | "5 子公司3 A6 男 2769\n", 1249 | "6 子公司3 A7 女 1665\n", 1250 | "7 子公司4 A8 女 2745\n", 1251 | "8 子公司4 A9 男 1415\n", 1252 | "9 子公司4 A10 男 1306" 1253 | ] 1254 | }, 1255 | "execution_count": 44, 1256 | "metadata": {}, 1257 | "output_type": "execute_result" 1258 | } 1259 | ], 1260 | "source": [ 1261 | "sale_df = pd.read_excel('2-8.xlsx')\n", 1262 | "sale_df" 1263 | ] 1264 | }, 1265 | { 1266 | "cell_type": "code", 1267 | "execution_count": 47, 1268 | "metadata": {}, 1269 | "outputs": [ 1270 | { 1271 | "data": { 1272 | "text/plain": [ 1273 | "子公司\n", 1274 | "子公司1 1567.000000\n", 1275 | "子公司2 2055.333333\n", 1276 | "子公司3 2217.000000\n", 1277 | "子公司4 1822.000000\n", 1278 | "Name: 销售量, dtype: float64" 1279 | ] 1280 | }, 1281 | "execution_count": 47, 1282 | "metadata": {}, 1283 | "output_type": "execute_result" 1284 | } 1285 | ], 1286 | "source": [ 1287 | "# Group with pandas groupby and compute the mean of each group\n", 1288 | "sale_grouped = sale_df['销售量'].groupby(sale_df['子公司'])\n", 1289 | "sale_grouped.mean()\n", 1290 | "# sum, count and similar methods give other kinds of subtotals" 1291 | ] 1292 | }, 1293 | { 1294 | "cell_type": "markdown", 1295 | "metadata": {}, 1296 | "source": [ 1297 | "## Experiment 2-9 Computing descriptive statistics with a pivot table" 1298 | ] 1299 | }, 1300 | { 1301 | "cell_type": "markdown", 1302 | "metadata": {}, 1303 | "source": [ 1304 | "Using the data from Experiment 2-8, compute the average sales volume of the employees by subsidiary and gender." 1305 | ] 1306 | }, 1307 | { 1308 | "cell_type": "code", 1309 | "execution_count": 48, 1310 | "metadata": {}, 1311 | "outputs": [ 1312 | { 1313 | "data": { 1314 | "text/html": [ 1315 | "
\n", 1316 | "\n", 1329 | "\n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | "
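Passing `columns` spreads one key across the columns of the pivot table. Without `fill_value`, a subsidiary with no employees of one gender shows NaN rather than 0. This avoids reading 0.00 as an average. The sketch re-enters the rows of 2-8.xlsx and uses `aggfunc='mean'`, because recent pandas releases emit a FutureWarning when `np.mean` is passed.

```python
import pandas as pd

# Rows of 2-8.xlsx as shown in the notebook output
sale_df = pd.DataFrame({
    '子公司': ['子公司1', '子公司1', '子公司2', '子公司2', '子公司2',
              '子公司3', '子公司3', '子公司4', '子公司4', '子公司4'],
    '性别': ['男', '男', '男', '女', '女', '男', '女', '女', '男', '男'],
    '销售量': [1009, 2125, 1157, 2045, 2964, 2769, 1665, 2745, 1415, 1306],
})

# Mean sales by subsidiary (rows) and gender (columns), plus All margins
pivot = pd.pivot_table(sale_df, index='子公司', columns='性别',
                       values='销售量', aggfunc='mean', margins=True)
print(pivot)
```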
\n", 1377 | "
" 1378 | ], 1379 | "text/plain": [ 1380 | " 销售量\n", 1381 | "子公司 性别 \n", 1382 | "子公司1 男 1567.0\n", 1383 | "子公司2 女 2504.5\n", 1384 | " 男 1157.0\n", 1385 | "子公司3 女 1665.0\n", 1386 | " 男 2769.0\n", 1387 | "子公司4 女 2745.0\n", 1388 | " 男 1360.5" 1389 | ] 1390 | }, 1391 | "execution_count": 48, 1392 | "metadata": {}, 1393 | "output_type": "execute_result" 1394 | } 1395 | ], 1396 | "source": [ 1397 | "# pandas pivot_table with subsidiary and gender as a two-level row index\n", 1398 | "pd.pivot_table(sale_df,index=['子公司','性别'],values=['销售量'],aggfunc='mean')" 1399 | ] 1400 | }, 1401 | { 1402 | "cell_type": "code", 1403 | "execution_count": 50, 1404 | "metadata": {}, 1405 | "outputs": [ 1406 | { 1407 | "data": { 1408 | "text/html": [ 1409 | "
\n", 1500 | "
" 1501 | ], 1502 | "text/plain": [ 1503 | " sum mean len\n", 1504 | " 销售量 销售量 销售量\n", 1505 | "子公司 性别 \n", 1506 | "子公司1 男 3134 1567.0 2\n", 1507 | "子公司2 女 5009 2504.5 2\n", 1508 | " 男 1157 1157.0 1\n", 1509 | "子公司3 女 1665 1665.0 1\n", 1510 | " 男 2769 2769.0 1\n", 1511 | "子公司4 女 2745 2745.0 1\n", 1512 | " 男 2721 1360.5 2" 1513 | ] 1514 | }, 1515 | "execution_count": 50, 1516 | "metadata": {}, 1517 | "output_type": "execute_result" 1518 | } 1519 | ], 1520 | "source": [ 1521 | "# aggfunc also accepts a list of functions, giving one block of columns per statistic\n", 1522 | "pd.pivot_table(sale_df,index=['子公司','性别'],values=['销售量'],aggfunc=[np.sum,np.mean,len])" 1523 | ] 1524 | }, 1525 | { 1526 | "cell_type": "code", 1527 | "execution_count": 58, 1528 | "metadata": {}, 1529 | "outputs": [ 1530 | { 1531 | "data": { 1532 | "text/html": [ 1533 | "
\n", 1603 | "
" 1604 | ], 1605 | "text/plain": [ 1606 | " 销售量 \n", 1607 | "性别 女 男 All\n", 1608 | "子公司 \n", 1609 | "子公司1 0.00 1567.000000 1567.000000\n", 1610 | "子公司2 2504.50 1157.000000 2055.333333\n", 1611 | "子公司3 1665.00 2769.000000 2217.000000\n", 1612 | "子公司4 2745.00 1360.500000 1822.000000\n", 1613 | "All 2354.75 1630.166667 1920.000000" 1614 | ] 1615 | }, 1616 | "execution_count": 58, 1617 | "metadata": {}, 1618 | "output_type": "execute_result" 1619 | } 1620 | ], 1621 | "source": [ 1622 | "# columns spreads gender across the columns; fill_value=0 fills empty cells with 0 (so the 0.00 for 子公司1 means it has no female employees, not an average of 0); margins=True adds an All row and column of overall means\n", 1623 | "pd.pivot_table(sale_df,\n", 1624 | " index=['子公司'],\n", 1625 | " columns=['性别'],\n", 1626 | " values=['销售量'],\n", 1627 | " aggfunc='mean',\n", 1628 | " fill_value=0,\n", 1629 | " margins=True)" 1630 | ] 1631 | }, 1632 | { 1633 | "cell_type": "markdown", 1634 | "metadata": {}, 1635 | "source": [ 1636 | "## Experiment 2-10 Computing descriptive statistics for grouped data" 1637 | ] 1638 | }, 1639 | { 1640 | "cell_type": "markdown", 1641 | "metadata": {}, 1642 | "source": [ 1643 | "In my view this part of the book adds little.\n", 1644 | "In the table below, x is the class midpoint of each group and f is the number of observations in it.\n", 1645 | "The book asks for the mean, standard deviation, skewness and kurtosis to be computed from x and f,\n", 1646 | "which mainly tests whether you know the formulas." 1647 | ] 1648 | }, 1649 | { 1650 | "cell_type": "code", 1651 | "execution_count": 61, 1652 | "metadata": {}, 1653 | "outputs": [ 1654 | { 1655 | "data": { 1656 | "text/html": [ 1657 | "
\n", 1658 | "\n", 1671 | "\n", 1672 | " \n", 1673 | " \n", 1674 | " \n", 1675 | " \n", 1676 | " \n", 1677 | " \n", 1678 | " \n", 1679 | " \n", 1680 | " \n", 1681 | " \n", 1682 | " \n", 1683 | " \n", 1684 | " \n", 1685 | " \n", 1686 | " \n", 1687 | " \n", 1688 | " \n", 1689 | " \n", 1690 | " \n", 1691 | " \n", 1692 | " \n", 1693 | " \n", 1694 | " \n", 1695 | " \n", 1696 | " \n", 1697 | " \n", 1698 | " \n", 1699 | " \n", 1700 | " \n", 1701 | " \n", 1702 | " \n", 1703 | " \n", 1704 | " \n", 1705 | " \n", 1706 | " \n", 1707 | " \n", 1708 | " \n", 1709 | " \n", 1710 | " \n", 1711 | " \n", 1712 | "
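The grouped-data cells can be folded into one helper. In the sketch below, the function name `grouped_moments` is hypothetical. It takes class midpoints x and frequencies f and, like the notebook cells, uses divide-by-N moments. Skewness is therefore m3 / s^3 and excess kurtosis is m4 / s^4 - 3. The x and f values are those of 2-10.xlsx as shown in the notebook output.

```python
import numpy as np

def grouped_moments(x, f):
    # Hypothetical helper. x: class midpoints, f: class frequencies.
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    n = f.sum()
    mean = (x * f).sum() / n        # weighted mean
    dev = x - mean
    m2 = (dev**2 * f).sum() / n     # 2nd central moment (variance, divide by N)
    m3 = (dev**3 * f).sum() / n     # 3rd central moment
    m4 = (dev**4 * f).sum() / n     # 4th central moment
    std = np.sqrt(m2)
    skew = m3 / std**3
    kurt = m4 / std**4 - 3          # excess kurtosis, 0 for a normal distribution
    return mean, std, skew, kurt

mean, std, skew, kurt = grouped_moments([900, 1100, 1300, 1500, 1700],
                                        [2, 8, 16, 35, 23])
print(mean, std, skew, kurt)
```

The four returned values should agree with the notebook's outputs for this data (about 1464.29, 202.74, -0.735 and 0.004).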
\n", 1713 | "
" 1714 | ], 1715 | "text/plain": [ 1716 | " bins x f\n", 1717 | "0 1000以下 900 2\n", 1718 | "1 1000-1200 1100 8\n", 1719 | "2 1200-1400 1300 16\n", 1720 | "3 1400-1600 1500 35\n", 1721 | "4 1600-1800 1700 23" 1722 | ] 1723 | }, 1724 | "execution_count": 61, 1725 | "metadata": {}, 1726 | "output_type": "execute_result" 1727 | } 1728 | ], 1729 | "source": [ 1730 | "lamp_life_df = pd.read_excel('2-10.xlsx')\n", 1731 | "lamp_life_df" 1732 | ] 1733 | }, 1734 | { 1735 | "cell_type": "code", 1736 | "execution_count": 63, 1737 | "metadata": {}, 1738 | "outputs": [ 1739 | { 1740 | "data": { 1741 | "text/plain": [ 1742 | "1464.2857142857142" 1743 | ] 1744 | }, 1745 | "execution_count": 63, 1746 | "metadata": {}, 1747 | "output_type": "execute_result" 1748 | } 1749 | ], 1750 | "source": [ 1751 | "# Weighted mean: sum(x*f) / sum(f)\n", 1752 | "lamp_life_mean = (lamp_life_df['x']*lamp_life_df['f']).sum()/lamp_life_df['f'].sum()\n", 1753 | "lamp_life_mean" 1754 | ] 1755 | }, 1756 | { 1757 | "cell_type": "code", 1758 | "execution_count": 67, 1759 | "metadata": {}, 1760 | "outputs": [ 1761 | { 1762 | "data": { 1763 | "text/plain": [ 1764 | "202.7447710222652" 1765 | ] 1766 | }, 1767 | "execution_count": 67, 1768 | "metadata": {}, 1769 | "output_type": "execute_result" 1770 | } 1771 | ], 1772 | "source": [ 1773 | "# Standard deviation (divides by sum(f), i.e. the population form)\n", 1774 | "lamp_life_std = np.sqrt((((lamp_life_df['x']-lamp_life_mean)**2)*lamp_life_df['f']).sum()/lamp_life_df['f'].sum())\n", 1775 | "lamp_life_std" 1776 | ] 1777 | }, 1778 | { 1779 | "cell_type": "code", 1780 | "execution_count": 70, 1781 | "metadata": {}, 1782 | "outputs": [ 1783 | { 1784 | "data": { 1785 | "text/plain": [ 1786 | "-6121720.116618068" 1787 | ] 1788 | }, 1789 | "execution_count": 70, 1790 | "metadata": {}, 1791 | "output_type": "execute_result" 1792 | } 1793 | ], 1794 | "source": [ 1795 | "# Third central moment, needed for the skewness\n", 1796 | "lamp_life_three = (((lamp_life_df['x'] - lamp_life_mean)**3)*lamp_life_df['f']).sum()/lamp_life_df['f'].sum()\n", 1797 | "lamp_life_three" 1798 | ] 1799 | }, 
1800 | { 1801 | "cell_type": "code", 1802 | "execution_count": 71, 1803 | "metadata": {}, 1804 | "outputs": [ 1805 | { 1806 | "data": { 1807 | "text/plain": [ 1808 | "5075925829.515478" 1809 | ] 1810 | }, 1811 | "execution_count": 71, 1812 | "metadata": {}, 1813 | "output_type": "execute_result" 1814 | } 1815 | ], 1816 | "source": [ 1817 | "# Fourth central moment, needed for the kurtosis\n", 1818 | "lamp_life_four = (((lamp_life_df['x'] - lamp_life_mean)**4)*lamp_life_df['f']).sum()/lamp_life_df['f'].sum()\n", 1819 | "lamp_life_four" 1820 | ] 1821 | }, 1822 | { 1823 | "cell_type": "code", 1824 | "execution_count": 72, 1825 | "metadata": {}, 1826 | "outputs": [ 1827 | { 1828 | "data": { 1829 | "text/plain": [ 1830 | "-0.734555277612486" 1831 | ] 1832 | }, 1833 | "execution_count": 72, 1834 | "metadata": {}, 1835 | "output_type": "execute_result" 1836 | } 1837 | ], 1838 | "source": [ 1839 | "# Skewness: third central moment / std**3\n", 1840 | "lamp_life_skew = lamp_life_three/(lamp_life_std**3)\n", 1841 | "lamp_life_skew" 1842 | ] 1843 | }, 1844 | { 1845 | "cell_type": "code", 1846 | "execution_count": 75, 1847 | "metadata": {}, 1848 | "outputs": [ 1849 | { 1850 | "data": { 1851 | "text/plain": [ 1852 | "0.0041154496430850784" 1853 | ] 1854 | }, 1855 | "execution_count": 75, 1856 | "metadata": {}, 1857 | "output_type": "execute_result" 1858 | } 1859 | ], 1860 | "source": [ 1861 | "# Excess kurtosis: fourth central moment / std**4 - 3 (0 for a normal distribution)\n", 1862 | "lamp_life_kurt = lamp_life_four/lamp_life_std**4 - 3\n", 1863 | "lamp_life_kurt" 1864 | ] 1865 | }, 1866 | { 1867 | "cell_type": "markdown", 1868 | "metadata": {}, 1869 | "source": [ 1870 | "## Summary" 1871 | ] 1872 | }, 1873 | { 1874 | "cell_type": "markdown", 1875 | "metadata": {}, 1876 | "source": [ 1877 | "As the opening chapter, Chapter 2 is easy and requires little theory. These basic measures, however, lay the groundwork for the calculations in later chapters and should not be neglected." 1878 | ] 1879 | } 1880 | ], 1881 | "metadata": { 1882 | "kernelspec": { 1883 | "display_name": "Python [conda env:root] *", 1884 | "language": "python", 1885 | "name": "conda-root-py" 1886 | }, 1887 | "language_info": { 1888 | "codemirror_mode": { 1889 | "name": "ipython", 1890 | 
"version": 3 1891 | }, 1892 | "file_extension": ".py", 1893 | "mimetype": "text/x-python", 1894 | "name": "python", 1895 | "nbconvert_exporter": "python", 1896 | "pygments_lexer": "ipython3", 1897 | "version": "3.7.5" 1898 | } 1899 | }, 1900 | "nbformat": 4, 1901 | "nbformat_minor": 2 1902 | } 1903 | -------------------------------------------------------------------------------- /第五章 方差分析/5-1.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第五章 方差分析/5-1.xlsx -------------------------------------------------------------------------------- /第五章 方差分析/5-2.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第五章 方差分析/5-2.xlsx -------------------------------------------------------------------------------- /第五章 方差分析/5-3.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第五章 方差分析/5-3.xlsx -------------------------------------------------------------------------------- /第五章 方差分析/第五章 方差分析.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Section 1 One-way analysis of variance" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Experiment 5-1 One-way analysis of variance" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Test whether the machines at three factories need the same average time to complete production. The production times of the three factories are as follows.\n", 22 | "\n", 23 | "α=0.05\n", 24 | "\n", 25 | "H0: the mean production time is the same at all three factories\n", 26 | " \n", 27 | "H1: the mean production times of the three factories are not all equal" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 9, 33 | "metadata": {}, 34 | "outputs": 
[], 35 | "source": [ 36 | "import pandas as pd \n", 37 | "import statsmodels.api as sm \n", 38 | "from statsmodels.formula.api import ols" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 10, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/html": [ 49 | "
\n", 99 | "
" 100 | ], 101 | "text/plain": [ 102 | " 1 2 3\n", 103 | "0 20 28 20\n", 104 | "1 26 26 19\n", 105 | "2 24 31 23\n", 106 | "3 22 27 22" 107 | ] 108 | }, 109 | "execution_count": 10, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | } 113 | ], 114 | "source": [ 115 | "productivity_df = pd.read_excel('5-1.xlsx')\n", 116 | "productivity_df" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 11, 122 | "metadata": {}, 123 | "outputs": [ 124 | { 125 | "data": { 126 | "text/html": [ 127 | "
\n", 128 | "\n", 141 | "\n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | "
\n", 212 | "
" 213 | ], 214 | "text/plain": [ 215 | " factory time_spent\n", 216 | "0 1 20\n", 217 | "1 1 26\n", 218 | "2 1 24\n", 219 | "3 1 22\n", 220 | "4 2 28\n", 221 | "5 2 26\n", 222 | "6 2 31\n", 223 | "7 2 27\n", 224 | "8 3 20\n", 225 | "9 3 19\n", 226 | "10 3 23\n", 227 | "11 3 22" 228 | ] 229 | }, 230 | "execution_count": 11, 231 | "metadata": {}, 232 | "output_type": "execute_result" 233 | } 234 | ], 235 | "source": [ 236 | "# Reshape the wide table (one column per factory) into a long table (one row per observation) for analysis\n", 237 | "productivity_df_long = productivity_df.melt(var_name='factory' , value_name='time_spent')\n", 238 | "productivity_df_long" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 12, 244 | "metadata": {}, 245 | "outputs": [ 246 | { 247 | "data": { 248 | "text/html": [ 249 | "
\n", 250 | "\n", 267 | "\n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | "
\n", 332 | "
" 333 | ], 334 | "text/plain": [ 335 | " time_spent \n", 336 | " count mean std min 25% 50% 75% max\n", 337 | "factory \n", 338 | "1 4.0 23.0 2.581989 20.0 21.50 23.0 24.50 26.0\n", 339 | "2 4.0 28.0 2.160247 26.0 26.75 27.5 28.75 31.0\n", 340 | "3 4.0 21.0 1.825742 19.0 19.75 21.0 22.25 23.0" 341 | ] 342 | }, 343 | "execution_count": 12, 344 | "metadata": {}, 345 | "output_type": "execute_result" 346 | } 347 | ], 348 | "source": [ 349 | "# Group by factory with pandas groupby, then compute descriptive statistics for each group with describe\n", 350 | "productivity_df_long.groupby('factory').describe()" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 18, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/html": [ 361 | "
\n", 362 | "\n", 375 | "\n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | "
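As a cross-check on the `anova_lm` table for Experiment 5-1, the one-way decomposition can be reproduced by hand. The sketch below is an illustration, not the book's method: it uses only the Python standard library, hard-codes the completion times shown in the notebook output for `5-1.xlsx`, and splits the total variation into a between-factory part (SSA) and a within-factory part (SSE), with F = MSA / MSE. For the p-value it relies on a closed form that holds only when the numerator has exactly 2 degrees of freedom (three groups): P(F > f) = (1 + 2f/df2)^(-df2/2). In the general case `scipy.stats.f.sf(f, df1, df2)` would be the usual choice.

```python
from statistics import fmean

# Experiment 5-1 completion times, one list per factory
groups = {
    1: [20, 26, 24, 22],
    2: [28, 26, 31, 27],
    3: [20, 19, 23, 22],
}

values = [x for g in groups.values() for x in g]
grand_mean = fmean(values)
k, n = len(groups), len(values)  # number of groups, total observations

# Between-group SS: sum over groups of n_i * (group mean - grand mean)^2
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group (residual) SS: deviations from each group's own mean
ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)

df_between, df_within = k - 1, n - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Closed-form upper tail of F(2, df2); valid ONLY when df_between == 2.
# For other designs use scipy.stats.f.sf(f_stat, df_between, df_within).
if df_between != 2:
    raise ValueError("the closed-form p-value below assumes exactly 3 groups")
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"SSA={ss_between:.1f} SSE={ss_within:.1f} F={f_stat:.6f} p={p_value:.5f}")
```

With these numbers the script should reproduce the `C(factory)` and `Residual` rows of the statsmodels output (SS 104 and 44, F about 10.64, p about 0.00426).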
\n", 405 | "
" 406 | ], 407 | "text/plain": [ 408 | " df sum_sq mean_sq F PR(>F)\n", 409 | "C(factory) 2.0 104.0 52.000000 10.636364 0.00426\n", 410 | "Residual 9.0 44.0 4.888889 NaN NaN" 411 | ] 412 | }, 413 | "execution_count": 18, 414 | "metadata": {}, 415 | "output_type": "execute_result" 416 | } 417 | ], 418 | "source": [ 419 | "productivity_lm = ols('time_spent~C(factory)' , data=productivity_df_long).fit()\n", 420 | "sm.stats.anova_lm(productivity_lm)" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "Conclusion\n", 428 | "\n", 429 | "Since the p-value 0.00426 < 0.05, we reject the null hypothesis at the 5% significance level: the mean completion times of the three factories' machines differ significantly (they are not all equal)." 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "# Section 2: Two-Way ANOVA" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "## Experiment 5-2: Two-Way ANOVA without Interaction" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "Three brands of mobile phone are sold in four regions; the sales data are shown below. At the 5% significance level, analyze whether phone sales differ by brand and by region." 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 2, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "import pandas as pd \n", 460 | "import statsmodels.api as sm \n", 461 | "from statsmodels.formula.api import ols" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 3, 467 | "metadata": {}, 468 | "outputs": [ 469 | { 470 | "data": { 471 | "text/html": [ 472 | "
\n", 473 | "\n", 486 | "\n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | "
\n", 520 | "
" 521 | ], 522 | "text/plain": [ 523 | " 地区1 地区2 地区3 地区4\n", 524 | "品牌1 4.5 6.4 7.2 6.7\n", 525 | "品牌2 8.8 7.8 9.6 7.0\n", 526 | "品牌3 5.9 6.8 5.7 5.2" 527 | ] 528 | }, 529 | "execution_count": 3, 530 | "metadata": {}, 531 | "output_type": "execute_result" 532 | } 533 | ], 534 | "source": [ 535 | "sell_df = pd.read_excel('5-2.xlsx',index_col=0) # note: the first column (brand) is used as the index\n", 536 | "sell_df" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 4, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "text/html": [ 547 | "
\n", 548 | "\n", 561 | "\n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | "
\n", 599 | "
" 600 | ], 601 | "text/plain": [ 602 | " brand 地区1 地区2 地区3 地区4\n", 603 | "0 品牌1 4.5 6.4 7.2 6.7\n", 604 | "1 品牌2 8.8 7.8 9.6 7.0\n", 605 | "2 品牌3 5.9 6.8 5.7 5.2" 606 | ] 607 | }, 608 | "execution_count": 4, 609 | "metadata": {}, 610 | "output_type": "execute_result" 611 | } 612 | ], 613 | "source": [ 614 | "# Clean the data before reshaping to long format: move the brand labels out of the index into a 'brand' column\n", 615 | "sell_df_n = sell_df.reset_index()\n", 616 | "sell_df_n = sell_df_n.rename(columns={'index': 'brand'})\n", 617 | "sell_df_n" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 5, 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "data": { 627 | "text/html": [ 628 | "
\n", 629 | "\n", 642 | "\n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | "
\n", 726 | "
" 727 | ], 728 | "text/plain": [ 729 | " brand area sell\n", 730 | "0 品牌1 地区1 4.5\n", 731 | "1 品牌2 地区1 8.8\n", 732 | "2 品牌3 地区1 5.9\n", 733 | "3 品牌1 地区2 6.4\n", 734 | "4 品牌2 地区2 7.8\n", 735 | "5 品牌3 地区2 6.8\n", 736 | "6 品牌1 地区3 7.2\n", 737 | "7 品牌2 地区3 9.6\n", 738 | "8 品牌3 地区3 5.7\n", 739 | "9 品牌1 地区4 6.7\n", 740 | "10 品牌2 地区4 7.0\n", 741 | "11 品牌3 地区4 5.2" 742 | ] 743 | }, 744 | "execution_count": 5, 745 | "metadata": {}, 746 | "output_type": "execute_result" 747 | } 748 | ], 749 | "source": [ 750 | "sell_df_long = sell_df_n.melt(id_vars='brand' , var_name='area' , value_name='sell')\n", 751 | "sell_df_long" 752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": 6, 757 | "metadata": {}, 758 | "outputs": [ 759 | { 760 | "data": { 761 | "text/html": [ 762 | "
\n", 763 | "\n", 776 | "\n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | " \n", 796 | " \n", 797 | " \n", 798 | " \n", 799 | " \n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | "
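The two-way table without interaction can also be reproduced by hand. This is a minimal standard-library sketch, not the book's code, using the Experiment 5-2 sales typed in as a list of rows. Because each brand/region cell holds a single observation, there are no replicates from which to estimate an interaction, so the error sum of squares is whatever the two main effects leave unexplained: SSE = SST - SS_brand - SS_area, with (r - 1)(c - 1) degrees of freedom. p-values are omitted here; with SciPy installed, `scipy.stats.f.sf(F, df_effect, df_error)` would supply them.

```python
from statistics import fmean

# Experiment 5-2 sales: one row per brand, one column per region (地区1..地区4)
sales = [
    [4.5, 6.4, 7.2, 6.7],  # 品牌1
    [8.8, 7.8, 9.6, 7.0],  # 品牌2
    [5.9, 6.8, 5.7, 5.2],  # 品牌3
]
r, c = len(sales), len(sales[0])  # r brands x c regions, one observation per cell
grand = fmean(x for row in sales for x in row)

row_means = [fmean(row) for row in sales]
col_means = [fmean(row[j] for row in sales) for j in range(c)]

ss_brand = c * sum((m - grand) ** 2 for m in row_means)
ss_area = r * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((x - grand) ** 2 for row in sales for x in row)
# No replicates: interaction cannot be separated from error, so the
# remainder after the two main effects is treated as the residual.
ss_error = ss_total - ss_brand - ss_area

df_brand, df_area = r - 1, c - 1
df_error = df_brand * df_area
ms_error = ss_error / df_error
f_brand = (ss_brand / df_brand) / ms_error
f_area = (ss_area / df_area) / ms_error

print(f"brand: SS={ss_brand:.2f} F={f_brand:.6f}")
print(f"area:  SS={ss_area:.2f} F={f_area:.6f}")
print(f"error: SS={ss_error:.2f} df={df_error}")
```

The sums of squares and F values should line up with the `C(brand)`, `C(area)` and `Residual` rows of the statsmodels output.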
\n", 814 | "
" 815 | ], 816 | "text/plain": [ 817 | " df sum_sq mean_sq F PR(>F)\n", 818 | "C(brand) 2.0 13.68 6.840000 6.237082 0.034258\n", 819 | "C(area) 3.0 2.82 0.940000 0.857143 0.512185\n", 820 | "Residual 6.0 6.58 1.096667 NaN NaN" 821 | ] 822 | }, 823 | "execution_count": 6, 824 | "metadata": {}, 825 | "output_type": "execute_result" 826 | } 827 | ], 828 | "source": [ 829 | "sell_lm = ols('sell~C(brand)+C(area)' , data=sell_df_long).fit()\n", 830 | "sm.stats.anova_lm(sell_lm) " 831 | ] 832 | }, 833 | { 834 | "cell_type": "markdown", 835 | "metadata": {}, 836 | "source": [ 837 | "Conclusion\n", 838 | "\n", 839 | "Brand: p-value 0.034258 < 0.05, so we reject the null hypothesis; brand is a significant factor for sales.\n", 840 | "\n", 841 | "Region: p-value 0.512185 > 0.05, so we fail to reject the null hypothesis; region is not a significant factor for sales at the 5% level." 842 | ] 843 | }, 844 | { 845 | "cell_type": "markdown", 846 | "metadata": {}, 847 | "source": [ 848 | "## Experiment 5-3: Two-Way ANOVA with Interaction" 849 | ] 850 | }, 851 | { 852 | "cell_type": "markdown", 853 | "metadata": {}, 854 | "source": [ 855 | "Tourist numbers on four tour routes during the two Golden Week holidays, May Day (五一) and National Day (十一), are shown below. Significance level: 5%.\n", 856 | "\n", 857 | "Determine:\n", 858 | "\n", 859 | "Do the tour routes differ?\n", 860 | "\n", 861 | "Do the two Golden Weeks differ?\n", 862 | "\n", 863 | "Is there an interaction effect between route and Golden Week?" 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 7, 869 | "metadata": {}, 870 | "outputs": [], 871 | "source": [ 872 | "import pandas as pd \n", 873 | "import statsmodels.api as sm \n", 874 | "from statsmodels.formula.api import ols" 875 | ] 876 | }, 877 | { 878 | "cell_type": "code", 879 | "execution_count": 8, 880 | "metadata": {}, 881 | "outputs": [ 882 | { 883 | "data": { 884 | "text/html": [ 885 | "
\n", 886 | "\n", 899 | "\n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | " \n", 905 | " \n", 906 | " \n", 907 | " \n", 908 | " \n", 909 | " \n", 910 | " \n", 911 | " \n", 912 | " \n", 913 | " \n", 914 | " \n", 915 | " \n", 916 | " \n", 917 | " \n", 918 | " \n", 919 | " \n", 920 | " \n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | " \n", 928 | " \n", 929 | " \n", 930 | " \n", 931 | " \n", 932 | " \n", 933 | " \n", 934 | " \n", 935 | " \n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | " \n", 954 | " \n", 955 | " \n", 956 | " \n", 957 | " \n", 958 | " \n", 959 | " \n", 960 | " \n", 961 | " \n", 962 | " \n", 963 | " \n", 964 | " \n", 965 | " \n", 966 | " \n", 967 | " \n", 968 | " \n", 969 | " \n", 970 | " \n", 971 | " \n", 972 | " \n", 973 | " \n", 974 | " \n", 975 | " \n", 976 | " \n", 977 | " \n", 978 | " \n", 979 | " \n", 980 | " \n", 981 | "
\n", 982 | "
" 983 | ], 984 | "text/plain": [ 985 | " 线路A 线路B 线路C 线路D\n", 986 | "五一 31 22 14 8\n", 987 | "NaN 29 23 16 4\n", 988 | "NaN 32 26 20 6\n", 989 | "NaN 30 25 15 5\n", 990 | "NaN 30 24 18 5\n", 991 | "十一 25 21 16 5\n", 992 | "NaN 22 20 13 7\n", 993 | "NaN 27 16 15 8\n", 994 | "NaN 26 19 12 7\n", 995 | "NaN 22 15 10 8" 996 | ] 997 | }, 998 | "execution_count": 8, 999 | "metadata": {}, 1000 | "output_type": "execute_result" 1001 | } 1002 | ], 1003 | "source": [ 1004 | "tourist_df = pd.read_excel('5-3.xlsx' , index_col=0) # note: the first column (holiday) is the index; it is only filled on the first row of each holiday, hence the NaNs\n", 1005 | "tourist_df" 1006 | ] 1007 | }, 1008 | { 1009 | "cell_type": "code", 1010 | "execution_count": 9, 1011 | "metadata": {}, 1012 | "outputs": [ 1013 | { 1014 | "data": { 1015 | "text/html": [ 1016 | "
\n", 1017 | "\n", 1030 | "\n", 1031 | " \n", 1032 | " \n", 1033 | " \n", 1034 | " \n", 1035 | " \n", 1036 | " \n", 1037 | " \n", 1038 | " \n", 1039 | " \n", 1040 | " \n", 1041 | " \n", 1042 | " \n", 1043 | " \n", 1044 | " \n", 1045 | " \n", 1046 | " \n", 1047 | " \n", 1048 | " \n", 1049 | " \n", 1050 | " \n", 1051 | " \n", 1052 | " \n", 1053 | " \n", 1054 | " \n", 1055 | " \n", 1056 | " \n", 1057 | " \n", 1058 | " \n", 1059 | " \n", 1060 | " \n", 1061 | " \n", 1062 | " \n", 1063 | " \n", 1064 | " \n", 1065 | " \n", 1066 | " \n", 1067 | " \n", 1068 | " \n", 1069 | " \n", 1070 | " \n", 1071 | " \n", 1072 | " \n", 1073 | " \n", 1074 | " \n", 1075 | " \n", 1076 | " \n", 1077 | " \n", 1078 | " \n", 1079 | " \n", 1080 | " \n", 1081 | " \n", 1082 | " \n", 1083 | " \n", 1084 | " \n", 1085 | " \n", 1086 | " \n", 1087 | " \n", 1088 | " \n", 1089 | " \n", 1090 | " \n", 1091 | " \n", 1092 | " \n", 1093 | " \n", 1094 | " \n", 1095 | " \n", 1096 | " \n", 1097 | " \n", 1098 | " \n", 1099 | " \n", 1100 | " \n", 1101 | " \n", 1102 | " \n", 1103 | " \n", 1104 | " \n", 1105 | " \n", 1106 | " \n", 1107 | " \n", 1108 | " \n", 1109 | " \n", 1110 | " \n", 1111 | " \n", 1112 | " \n", 1113 | " \n", 1114 | " \n", 1115 | " \n", 1116 | " \n", 1117 | " \n", 1118 | " \n", 1119 | " \n", 1120 | " \n", 1121 | " \n", 1122 | " \n", 1123 | "
\n", 1124 | "
" 1125 | ], 1126 | "text/plain": [ 1127 | " period 线路A 线路B 线路C 线路D\n", 1128 | "0 五一 31 22 14 8\n", 1129 | "1 五一 29 23 16 4\n", 1130 | "2 五一 32 26 20 6\n", 1131 | "3 五一 30 25 15 5\n", 1132 | "4 五一 30 24 18 5\n", 1133 | "5 十一 25 21 16 5\n", 1134 | "6 十一 22 20 13 7\n", 1135 | "7 十一 27 16 15 8\n", 1136 | "8 十一 26 19 12 7\n", 1137 | "9 十一 22 15 10 8" 1138 | ] 1139 | }, 1140 | "execution_count": 9, 1141 | "metadata": {}, 1142 | "output_type": "execute_result" 1143 | } 1144 | ], 1145 | "source": [ 1146 | "# Data cleaning: move the holiday labels out of the index\n", 1147 | "tourist_df_n = tourist_df.reset_index()\n", 1148 | "\n", 1149 | "# reset_index names the former index column 'index'; rename it to 'period' (the Golden Week)\n", 1150 | "tourist_df_n = tourist_df_n.rename(columns={'index': 'period'})\n", 1151 | "\n", 1152 | "# Fill the missing period labels: the sheet gives each holiday only on the first row of its block,\n", 1153 | "# so forward-fill carries the label down to the rows below it\n", 1154 | "tourist_df_n['period'] = tourist_df_n['period'].ffill()\n", 1155 | "\n", 1156 | "tourist_df_n" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "code", 1161 | "execution_count": 10, 1162 | "metadata": {}, 1163 | "outputs": [ 1164 | { 1165 | "data": { 1166 | "text/html": [ 1167 | "
\n", 1168 | "\n", 1181 | "\n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | " \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | " \n", 1274 | " \n", 1275 | " \n", 1276 | " \n", 1277 | " \n", 1278 | " \n", 1279 | " \n", 1280 | " \n", 1281 | " \n", 1282 | " \n", 1283 | " \n", 1284 | " \n", 1285 | " \n", 1286 | " \n", 1287 | " \n", 1288 | " \n", 1289 | " \n", 1290 | " \n", 1291 | " \n", 1292 | " \n", 1293 | " \n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | " \n", 1298 | " \n", 1299 | " \n", 1300 | " \n", 1301 | " \n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | " \n", 1309 | " \n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | " 
\n", 1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | "
\n", 1433 | "
" 1434 | ], 1435 | "text/plain": [ 1436 | " period plan amount\n", 1437 | "0 五一 线路A 31\n", 1438 | "1 五一 线路A 29\n", 1439 | "2 五一 线路A 32\n", 1440 | "3 五一 线路A 30\n", 1441 | "4 五一 线路A 30\n", 1442 | "5 十一 线路A 25\n", 1443 | "6 十一 线路A 22\n", 1444 | "7 十一 线路A 27\n", 1445 | "8 十一 线路A 26\n", 1446 | "9 十一 线路A 22\n", 1447 | "10 五一 线路B 22\n", 1448 | "11 五一 线路B 23\n", 1449 | "12 五一 线路B 26\n", 1450 | "13 五一 线路B 25\n", 1451 | "14 五一 线路B 24\n", 1452 | "15 十一 线路B 21\n", 1453 | "16 十一 线路B 20\n", 1454 | "17 十一 线路B 16\n", 1455 | "18 十一 线路B 19\n", 1456 | "19 十一 线路B 15\n", 1457 | "20 五一 线路C 14\n", 1458 | "21 五一 线路C 16\n", 1459 | "22 五一 线路C 20\n", 1460 | "23 五一 线路C 15\n", 1461 | "24 五一 线路C 18\n", 1462 | "25 十一 线路C 16\n", 1463 | "26 十一 线路C 13\n", 1464 | "27 十一 线路C 15\n", 1465 | "28 十一 线路C 12\n", 1466 | "29 十一 线路C 10\n", 1467 | "30 五一 线路D 8\n", 1468 | "31 五一 线路D 4\n", 1469 | "32 五一 线路D 6\n", 1470 | "33 五一 线路D 5\n", 1471 | "34 五一 线路D 5\n", 1472 | "35 十一 线路D 5\n", 1473 | "36 十一 线路D 7\n", 1474 | "37 十一 线路D 8\n", 1475 | "38 十一 线路D 7\n", 1476 | "39 十一 线路D 8" 1477 | ] 1478 | }, 1479 | "execution_count": 10, 1480 | "metadata": {}, 1481 | "output_type": "execute_result" 1482 | } 1483 | ], 1484 | "source": [ 1485 | "# Reshape to long format: one row per (period, route) observation\n", 1486 | "tourist_df_long = tourist_df_n.melt(id_vars='period' , var_name='plan' , value_name='amount')\n", 1487 | "tourist_df_long" 1488 | ] 1489 | }, 1490 | { 1491 | "cell_type": "code", 1492 | "execution_count": 11, 1493 | "metadata": {}, 1494 | "outputs": [ 1495 | { 1496 | "data": { 1497 | "text/html": [ 1498 | "
\n", 1499 | "\n", 1512 | "\n", 1513 | " \n", 1514 | " \n", 1515 | " \n", 1516 | " \n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | "
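With 5 replicates in every period/route cell, the total variation splits into four parts: period, route, interaction and pure error. The sketch below uses only the standard library and is an illustration rather than the book's method; it hard-codes the Experiment 5-3 counts. The interaction SS is the spread of the cell means around the grand mean minus the two main-effect SS, and the error SS uses deviations from each cell's own mean. It assumes a balanced design, which is also why the sequential (Type I) sums of squares that `anova_lm` reports by default do not depend on the order of the terms here.

```python
from statistics import fmean

# Experiment 5-3 tourist counts: data[period][route] -> 5 replicates per cell
data = {
    "五一": {"A": [31, 29, 32, 30, 30], "B": [22, 23, 26, 25, 24],
             "C": [14, 16, 20, 15, 18], "D": [8, 4, 6, 5, 5]},
    "十一": {"A": [25, 22, 27, 26, 22], "B": [21, 20, 16, 19, 15],
             "C": [16, 13, 15, 12, 10], "D": [5, 7, 8, 7, 8]},
}
periods = list(data)
routes = list(data[periods[0]])
a, b = len(periods), len(routes)
m = len(data[periods[0]][routes[0]])  # replicates per cell (balanced design)

grand = fmean(x for p in periods for r in routes for x in data[p][r])
cell_mean = {(p, r): fmean(data[p][r]) for p in periods for r in routes}
period_mean = {p: fmean(x for r in routes for x in data[p][r]) for p in periods}
route_mean = {r: fmean(x for p in periods for x in data[p][r]) for r in routes}

ss_period = b * m * sum((period_mean[p] - grand) ** 2 for p in periods)
ss_route = a * m * sum((route_mean[r] - grand) ** 2 for r in routes)
# Spread of the cell means, minus the two main effects, is the interaction
ss_cells = m * sum((v - grand) ** 2 for v in cell_mean.values())
ss_inter = ss_cells - ss_period - ss_route
# Replicates allow a pure-error term: deviations from each cell's own mean
ss_error = sum((x - cell_mean[p, r]) ** 2
               for p in periods for r in routes for x in data[p][r])

ms_error = ss_error / (a * b * (m - 1))
f_period = ss_period / (a - 1) / ms_error
f_route = ss_route / (b - 1) / ms_error
f_inter = ss_inter / ((a - 1) * (b - 1)) / ms_error

print(f"period SS={ss_period:.3f} F={f_period:.6f}")
print(f"route  SS={ss_route:.3f} F={f_route:.6f}")
print(f"inter  SS={ss_inter:.3f} F={f_inter:.6f}")
print(f"error  SS={ss_error:.3f}")
```

The results should match the `C(period)`, `C(plan)`, `C(period):C(plan)` and `Residual` rows of the statsmodels table.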
\n", 1558 | "
" 1559 | ], 1560 | "text/plain": [ 1561 | " df sum_sq mean_sq F PR(>F)\n", 1562 | "C(period) 1.0 119.025 119.025000 30.617363 4.208408e-06\n", 1563 | "C(plan) 3.0 2431.475 810.491667 208.486602 4.475032e-21\n", 1564 | "C(period):C(plan) 3.0 88.875 29.625000 7.620579 5.564285e-04\n", 1565 | "Residual 32.0 124.400 3.887500 NaN NaN" 1566 | ] 1567 | }, 1568 | "execution_count": 11, 1569 | "metadata": {}, 1570 | "output_type": "execute_result" 1571 | } 1572 | ], 1573 | "source": [ 1574 | "tourist_lm = ols('amount~C(period)+C(plan)+C(period):C(plan)' , data=tourist_df_long).fit()\n", 1575 | "sm.stats.anova_lm(tourist_lm) " 1576 | ] 1577 | }, 1578 | { 1579 | "cell_type": "markdown", 1580 | "metadata": {}, 1581 | "source": [ 1582 | "Conclusion\n", 1583 | "\n", 1584 | "Period: p-value 4.208408e-06 < 0.05, so we reject the null hypothesis; the holiday period has a significant effect.\n", 1585 | "\n", 1586 | "Plan (tour route): p-value 4.475032e-21 < 0.05, so we reject the null hypothesis; the tour route has a significant effect.\n", 1587 | "\n", 1588 | "Interaction: p-value 5.564285e-04 < 0.05, so we reject the null hypothesis; the interaction between tour route and Golden Week has a significant effect on tourist numbers (tourist routes are somewhat seasonal, so the two factors act jointly on visitor counts)." 1589 | ] 1590 | }, 1591 | { 1592 | "cell_type": "code", 1593 | "execution_count": null, 1594 | "metadata": {}, 1595 | "outputs": [], 1596 | "source": [] 1597 | } 1598 | ], 1599 | "metadata": { 1600 | "kernelspec": { 1601 | "display_name": "Python [conda env:root] *", 1602 | "language": "python", 1603 | "name": "conda-root-py" 1604 | }, 1605 | "language_info": { 1606 | "codemirror_mode": { 1607 | "name": "ipython", 1608 | "version": 3 1609 | }, 1610 | "file_extension": ".py", 1611 | "mimetype": "text/x-python", 1612 | "name": "python", 1613 | "nbconvert_exporter": "python", 1614 | "pygments_lexer": "ipython3", 1615 | "version": "3.7.5" 1616 | } 1617 | }, 1618 | "nbformat": 4, 1619 | "nbformat_minor": 2 1620 | } 1621 | -------------------------------------------------------------------------------- /第六章 相关与回归分析/6-1.xlsx: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第六章 相关与回归分析/6-1.xlsx -------------------------------------------------------------------------------- /第六章 相关与回归分析/6-5.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第六章 相关与回归分析/6-5.xlsx -------------------------------------------------------------------------------- /第六章 相关与回归分析/6-6.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第六章 相关与回归分析/6-6.xlsx -------------------------------------------------------------------------------- /第六章 相关与回归分析/第六章 相关与回归分析.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Section 1: Correlation Analysis" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Experiment 6-1: Computing the Covariance" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np\n", 24 | "import pandas as pd " 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/html": [ 35 | "
\n", 36 | "\n", 49 | "\n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | "
\n", 115 | "
" 116 | ], 117 | "text/plain": [ 118 | " DJIA SPX\n", 119 | "Date \n", 120 | "1 7715 942\n", 121 | "2 7442 915\n", 122 | "3 7581 928\n", 123 | "4 7572 928\n", 124 | "5 7881 963\n", 125 | "6 7823 955\n", 126 | "7 8149 984\n", 127 | "8 7838 953\n", 128 | "9 7756 947\n", 129 | "10 7679 936" 130 | ] 131 | }, 132 | "execution_count": 2, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "index_df = pd.read_excel('6-1.xlsx', index_col=0) # note: the first column (Date) is used as the index\n", 139 | "index_df" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 3, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/html": [ 150 | "
\n", 151 | "\n", 164 | "\n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | "
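The two covariance matrices in Experiment 6-1 differ only in the divisor: `DataFrame.cov()` divides the sum of cross-products of deviations by n - 1 (ddof=1), while `np.cov(..., ddof=0)` divides by n. The small standard-library sketch below makes that relationship explicit on the same DJIA/SPX values; the helper `covariance` is hypothetical and written only for this illustration.

```python
# Experiment 6-1 values (copied from the data table)
djia = [7715, 7442, 7581, 7572, 7881, 7823, 8149, 7838, 7756, 7679]
spx = [942, 915, 928, 928, 963, 955, 984, 953, 947, 936]


def covariance(x, y, ddof=1):
    """Sum of cross-products of deviations from the means, divided by n - ddof."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - ddof)


cov_unbiased = covariance(djia, spx)        # ddof=1: same divisor as DataFrame.cov()
cov_biased = covariance(djia, spx, ddof=0)  # ddof=0: same divisor as np.cov(..., ddof=0)
n = len(djia)

print(round(cov_unbiased, 6), round(cov_biased, 6))
# The biased estimate is the unbiased one scaled by (n - 1) / n
print(abs(cov_biased - cov_unbiased * (n - 1) / n) < 1e-9)
```

With n = 10 the scaling factor is 0.9, which is why 3917.155556 becomes 3525.44 in the off-diagonal entries.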
\n", 185 | "
" 186 | ], 187 | "text/plain": [ 188 | " DJIA SPX\n", 189 | "DJIA 38937.377778 3917.155556\n", 190 | "SPX 3917.155556 397.877778" 191 | ] 192 | }, 193 | "execution_count": 3, 194 | "metadata": {}, 195 | "output_type": "execute_result" 196 | } 197 | ], 198 | "source": [ 199 | "# unbiased (sample) estimate: divides by n - 1\n", 200 | "index_df.cov()" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 4, 206 | "metadata": {}, 207 | "outputs": [ 208 | { 209 | "data": { 210 | "text/plain": [ 211 | "array([[35043.64, 3525.44],\n", 212 | " [ 3525.44, 358.09]])" 213 | ] 214 | }, 215 | "execution_count": 4, 216 | "metadata": {}, 217 | "output_type": "execute_result" 218 | } 219 | ], 220 | "source": [ 221 | "# biased (population) estimate: divides by n (ddof=0)\n", 222 | "np.cov(index_df.values , rowvar=False , ddof=0)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "## Experiment 6-2: Computing the Correlation Coefficient" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "Uses the data from Experiment 6-1." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 5, 242 | "metadata": {}, 243 | "outputs": [ 244 | { 245 | "data": { 246 | "text/html": [ 247 | "
\n", 248 | "\n", 261 | "\n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | "
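Both `DataFrame.corr()` (Pearson by default) and `np.corrcoef` return the Pearson coefficient r = s_xy / (s_x s_y). The minimal sketch below computes it directly from the Experiment 6-1 data; the helper `pearson_r` is hypothetical, not from the book. Because the same 1/(n - ddof) factor appears in the covariance and in both variances, it cancels, so r is the same whichever covariance convention is used.

```python
from math import sqrt

# Experiment 6-1 values (copied from the data table)
djia = [7715, 7442, 7581, 7572, 7881, 7823, 8149, 7838, 7756, 7679]
spx = [942, 915, 928, 928, 963, 955, 984, 953, 947, 936]


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n - ddof) factors of cov, var_x and var_y cancel out here
    return sxy / sqrt(sxx * syy)


r = pearson_r(djia, spx)
print(round(r, 6))
```

The value should agree with the off-diagonal entries of both correlation matrices (about 0.995205); r is symmetric, so `pearson_r(spx, djia)` gives the same result.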
\n", 282 | "
" 283 | ], 284 | "text/plain": [ 285 | " DJIA SPX\n", 286 | "DJIA 1.000000 0.995205\n", 287 | "SPX 0.995205 1.000000" 288 | ] 289 | }, 290 | "execution_count": 5, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "# Method 1: the pandas DataFrame.corr method\n", 297 | "index_df.corr()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 6, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "data": { 307 | "text/plain": [ 308 | "array([[1. , 0.99520518],\n", 309 | " [0.99520518, 1. ]])" 310 | ] 311 | }, 312 | "execution_count": 6, 313 | "metadata": {}, 314 | "output_type": "execute_result" 315 | } 316 | ], 317 | "source": [ 318 | "# Method 2: the numpy corrcoef function\n", 319 | "np.corrcoef(index_df.values , rowvar=False)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "## Experiment 6-3: Drawing a Correlation (Scatter) Plot" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 7, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "%matplotlib inline\n", 336 | "import matplotlib.pyplot as plt\n", 337 | "plt.rcParams['font.sans-serif'] = ['SimHei'] # step 1: use SimHei as the sans-serif font so Chinese labels render\n", 338 | "plt.rcParams['axes.unicode_minus'] = False # step 2: make minus signs on negative axis values display correctly\n", 339 | "plt.rcParams['savefig.dpi'] = 300 # resolution (DPI) of saved figures" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 9, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "" 351 | ] 352 | }, 353 | "execution_count": 9, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | }, 357 | { 358 | "data": { 359 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX8AAAECCAYAAAAW+Nd4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAV/klEQVR4nO3dfYxd9X3n8ff32pPxyGNgYs86wgMBhJE2m9huOyVYtpMhgmLa4ka4VXbTTaRUXjYRG9TtLri7ZWVRs5syWixFpYlw6u7SNI3WwWpp6tLQJEV4syR0jIyTTfYhbU2xN8jTYSDMYo+G3O/+cY/XY3PngXDug33eL2mUc3/33Hs/M8Gfe+7vnHtOZCaSpGqpdTqAJKn9LH9JqiDLX5IqyPKXpAqy/CWpgpZ2OsBirFq1Kq+66qpOx5CkC8rhw4f/PjMHm913QZT/VVddxdjYWKdjSNIFJSKen+s+p30kqYIsf0mqIMtfkirI8pekCrL8JamCLH9J6lITU9M898LLTExNl/7cF8ShnpJUNY8dOcHOA0fpqdWYqdcZ3b6ObRvWlPb8bvlLUpeZmJpm54GjnJ6p8+r065yeqXPPgaOlfgKw/CWpyxyfPEVP7dx67qnVOD55qrTXsPwlqcsMDfQxU6+fMzZTrzM00Ffaa1j+ktRlVvb3Mrp9Hct6aqzoXcqynhqj29exsr+3tNdwh68kdaFtG9aw6dpVHJ88xdBAX6nFD5a/JHWtlf29pZf+GU77SFIFWf6SVEGWvyRVkOUvSRVk+UtSBZVa/hFxdUQcjIhDEfFgRAxExJ9FxFhEPDxrvX0R8XRE3Fvm60uSFqfsLf8HgN2ZuQUYAj4CfCEzh4EVETEcEbcDSzJzI3BNRKwtOYMkaQFll/91wLPF8kngFeDdEXEZcAXwAjAC7C/WeQLY3OyJIuKO4hPD2Pj4eMkxJanayi7/R4FdEXEbsBV4EngncBfwPeAlYDlwolj/JWB1syfKzL2ZOZyZw4ODgyXHlKRqK/Ubvpl5f0RsBu4GHgF2AR/PzB9GxK8BHwOmgDNnJ+rHnc6S1HatKN4jwJXAHmAAeE9ELAHeCyRwmLNTPeuBYy3IIEmaRyvO7XM3sCczX4uITwH/icbUz9PAF2m84RyKiMuBW4EbWpBBkjSP0ss/M3fNWn4G+EfnrxMRI8DNwGhmvlJ2BknS/DpyVs/MnOTsET+SpDZzZ6skVZDlL0kVZPlLUgVZ/pJUQZa/JFWQ5S9JFWT5S1IFWf6SVEGWvyRVkOUvSRVk+UtSBVn+klRBlr8kVZDlL0kVZPlLUgVZ/pJUQZa/JFWQ5S9JFWT5S1IFlXoN34i4GngIuAR4Bvg+8KHi7suAb2XmP4+IfcC7gIOZeX+ZGSRJCyt7y/8BYHdmbgGGgO9l5khmjgCHgM9FxO3AkszcCFwTEWtLziBJWkDZ5X8d8GyxfBK4FCAi1gCrM3MMGAH2F+s8AWxu9kQRcUdEjEXE2Pj4eMkxJanayi7/R4FdEXEbsBX4WjF+J/DZYnk5cKJYfglY3eyJMnNvZg5n5vDg4GDJMSWp2kot/2L+/nFgB/BIZk5FRA24EXiyWG0K6CuW+8vOIElaWCuK9whwJbCnuL2Fxo7eLG4f5uxUz3rgWAsySJLmUerRPoW7gT2Z+Vpx+xbgqVn3/zFwKCIuB24FbmhBBknSPEov/8zcdd7tf3ve7R9GxAhwMzCama+UnUGSNL9WbPkvKDMnOXvEjySpzdzZKkkVZPlLUgVZ/pJUQZa/JFWQ5S9JFWT5S1IFWf6SVEGWvyRVkOUvSRVk+UtSBVn+0kVmYmqa5154mYmp6U5HURfryLl9JLXGY0dOsPPAUXpqNWbqdUa3r2PbhjWdjqUu5Ja/dJGYmJpm54GjnJ6p8+r065yeqXPPgaN+AlBTlr90kTg+eYqe2rn/pHtqNY5PnupQInUzy1+6SAwN9DFTr58zNlOvMzTQN8cjVGW
Wv3SRWNnfy+j2dSzrqbGidynLemqMbl/Hyv7eTkdTF3KHr3QR2bZhDZuuXcXxyVMMDfRZ/JqT5S9dZFb291r6WpDTPpJUQZa/JFVQqeUfEVdHxMGIOBQRD84a/0xE3Dbr9r6IeDoi7i3z9SVJi1P2lv8DwO7M3AIMRcRIRGwB3pGZXwaIiNuBJZm5EbgmItaWnEGStICyy/864Nli+SRwKfA54FhE/EIxPgLsL5afADY3e6KIuCMixiJibHx8vOSYklRtZZf/o8CuYopnK3AV8F1gFLg+Ij4JLAdOFOu/BKxu9kSZuTczhzNzeHBwsOSYklRtpZZ/Zt4PPA7sAB4B1gJ7M/NF4A+AG4Ep4MxXDvvLziBJWlgrivcIcCWwB/g+cE0xPgw8Dxzm7FTPeuBYCzJIkubRii953Q3syczXImIf8HsR8Y+BHuAXgVeBQxFxOXArcEMLMkiS5lF6+WfmrlnLrwK/dP46ETEC3AyMZuYrZWeQJM2vI6d3yMxJzh7xI0lqM3e2SlIFWf6SVEGWvyRVkOUvSRVk+UttNDE1zXMvvOxF1dVxXsxFapPHjpxg54Gj9NRqzNTrjG5fx7YNazodSxXllr/UBhNT0+w8cJTTM3VenX6d0zN17jlw1E8A6hjLX2qD45On6Kmd+8+tp1bj+OSpDiVS1Vn+UhsMDfQxU6+fMzZTrzM00DfHI6TWsvylNljZ38vo9nUs66mxoncpy3pqjG5f54XW1THu8JXaZNuGNWy6dhXHJ08xNNB3TvFPTE03HZdaxfKX2mhlf+8byt2jgNQJTvtIHeRRQOoUy1/qII8CUqdY/lIHeRSQOsXylzrIo4DUKe7wlTpsvqOApFax/KUu0OwoIKmV5p32iYj3NRnrjYi7WhdJktRqC8353xkRX42I90fEsoj4l8C3gOVtyCZJapF5p30y80MR8Q+Bg8Ag8CfAezOz6UHIEXE18BBwCfAMsBP4m+IH4JOZ+e2IuA/4WeCZzLyzlN9EkrRoC0377AZ+H/gtYAPwKvBERHxwjoc8AOzOzC3AEHAX8MXMHCl+vh0RPwVsBq4HTkbETSX9LpKkRVpo2ud5YGNm7s3Mv87MjwMfAW6ZY/3rgGeL5ZPAj4Cfj4hnImJfRCwF3g8cyMwEvgJsafZEEXFHRIxFxNj4+Pib/LUkSfNZqPwPAvdFxL0RsQIgM/8uMz8xx/qPArsi4jZgK403gpsy83qgh8ZUz3LgRLH+S8DqZk9UvOEMZ+bw4ODgm/qlJEnzW6j8fx/4LvAy8JmFniwz7wceB3YAj9CY0/9BcfcYsBaYAs58fbF/ERkkSSVbqHjflplfyMyHgCsW+ZxHgCuBPcDnI2J9RCwBPgg8BxymMecPsB449qZTS5LekoW+5DUYEf+ExpvEP4iID5+5IzP/cI7H3A3syczXIuI3gT8EAviTzPxqRNSAT0XEp2lMDW19y7+FJOlNWaj89wMfpjFl8yhwP9ALfJ5Gqb9BZu6atfwdYN1599eLI3x+Dvh0Zv7tj51eegu8gIqqbKHyvwb4QWbeV2ypfxP4DvDTb+VFM/MUjTcTqSO8gIqqbqHyvyIzb4yIa4EbgfWZmRHx9TZkk1pi9gVUTtM4nfI9B46y6dpVfgJQZSy0w3cyIv41jSN3dgPLI+KjrY8ltY4XUJEWLv9/SuNbvZ/KzC8B7wDeBfxyq4NJreIFVKQFyj8zX8vMhzPzT4vb38/MX5917L50wfECKpLn81dFeQEVVZ3lr8ryAiqqMk+tIEkVZPlLUgVZ/pJUQZa/JFWQ5S9JFWT5S1IFWf6SVEGWvy44E1PTPPfCy0xMTXc6inTB8kteuqB4KmapHG7564Ix+1TMr06/zumZOvccOOonAOnHYPnrguGpmKXyWP66YHgqZqk8lr8uGJ6KWSpPqTt8I+Jq4CHgEuCZzPxXxfhq4M8z8yeK2/toXBTmYGbeX2YGXdw8FbNUjrK3/B8AdmfmFmAoIkaK8f8I9AFExO3
AkszcCFwTEWtLzqCL3Mr+XtZfcZnFL70FZZf/dcCzxfJJ4NKI+ADwf4EXi/ERYH+x/ASwudkTRcQdETEWEWPj4+Mlx5Skaiu7/B8FdkXEbcBW4C+Bfwf8+qx1lgMniuWXgNXNnigz92bmcGYODw4OlhxTkqqt1PIv5u8fB3YAjwC/CnwmM1+etdoUxRQQ0F92BknSwlpRvEeAK4E9wE3AnRHxJLAhIn4XOMzZqZ71wLEWZJAkzaMVp3e4G9iTma8B7zszGBFPZuaOiLgEOBQRlwO3Aje0IIMkaR6ll39m7ppjfKT43x8WRwHdDIxm5itlZ5Akza8jJ3bLzEnOHvEjSWozd7ZKUgVZ/pJUQZa/OsILskid5cVc1HZekEXqPLf81VZekEXqDpa/2soLskjdwfJXW3lBFqk7WP5qKy/IInUHd/iq7bwgi9R5lr86YmV/r6UvdZDTPpJUQZa/JFWQ5S9JFWT5S1IFWf6SVEGWvyRVkOUvSRVk+UtSBVn+klRBLS//iHh7RNwcEata/VqSpMUptfwj4uqIOBgRhyLiwYgYAP4UuB74y4gYLNbbFxFPR8S9Zb6+JGlxyj63zwPA7sz8ZkT8F+Au4NeK2wPAT0bEcmBJZm6MiN+LiLWZ+b9LziFJmkfZ0z7XAc8WyyeBI0Xxv4/G1v/TwAiwv1jnCWBzsyeKiDsiYiwixsbHx0uOKUnVVnb5PwrsiojbgK3A1yIigA8Bk8AMsBw4Uaz/ErC62RNl5t7MHM7M4cHBwZJjSlK1lVr+mXk/8DiwA3gkM6ey4U7gKLANmALOXLapv+wMkqSFtaJ4jwBXAnsiYmdEfLQYvwx4GTjM2ame9cCxFmSojImpaZ574WUmpqbPWZak+bTiYi53A3sy87WI2Avsj4gdwHdozPGvAA5FxOXArcANLchQCY8dOcHOA0fpqdU4/fqPyEz6epYyU68zun0d2zas6XRESV0qMrP9L9o48udm4KnMfHGh9YeHh3NsbKz1wS4gE1PTbHrg65yeqTe9f1lPjW/s/IBXy5IqLCIOZ+Zws/s6Mt+emZOZuX8xxa/mjk+eoqc29/99PbUaxydPtTGRpAuJO1svUEMDfczUm2/1A8zU6wwN9M15v6Rqs/wvUCv7exndvo5lPTVW9C6lZ0mwtAYrepeyrKfG6PZ1TvlImlMrdviqTbZtWMOma1dxfPLU/9/KP7Ns8Uuaj+V/gVvZ33tO0Vv6khbDaR9JqiDLX5IqyPKXpAqy/CWpgix/Saogy1+SKsjyl6QKsvwlqYIsf0mqIMtfkirI8pekCrL8JamCLH9JqiDLX5IqyPKXpAoqtfwj4uqIOBgRhyLiwYi4NCIej4gnIuKPIuJtxXr7IuLpiLi3zNeXJC1O2Vv+DwC7M3MLMAT8MrAnM38GeBHYGhG3A0sycyNwTUSsLTmDJGkBZZf/dcCzxfJJ4ERm/kVxe7AYGwH2F2NPAJtLziBJWkDZ5f8osCsibgO2Al8DiIiNwEBmfhNYDpwo1n8JWN3siSLijogYi4ix8fHxkmNKUrWVWv6ZeT/wOLADeCQzpyLi7cBvA79SrDYF9BXL/XNlyMy9mTmcmcODg4NlxpSkymvF0T5HgCuBPcUO3i8B/yYzny/uP8zZqZ71wLEWZJAkzWNpC57zbho7eV+LiE8APwn8RkT8BvBZ4I+BQxFxOXArcEMLMkiS5hGZ2f4XjRgAbgaeyswXF1p/eHg4x8bGWh9Mki4iEXE4M4eb3deKLf8FZeYkZ4/4kSS1md/wlaQKsvwlqYIsf0mqIMtfkirI8pekCrroy39iaprnXniZianpTkeRpK7RkUM92+WxIyfYeeAoPbUaM/U6o9vXsW3Dmk7HkqSOu2i3/Cemptl54CinZ+q8Ov06p2fq3HPgqJ8AJImLuPyPT56ip3bur9dTq3F88lSHEklS97hoy39ooI+Zev2csZl6naGBvjkeIUnVcdGW/8r+Xka3r2NZT40VvUtZ1lNjdPs
6Vvb3djqaJHXcRb3Dd9uGNWy6dhXHJ08xNNBn8UtS4aIuf2h8ArD0JelcF+20jyRpbpa/JFWQ5S9JFWT5S1IFWf6SVEEduYbvmxUR48Dznc6xgFXA33c6xCKZtTXM2hpm/fG9MzMHm91xQZT/hSAixua6UHK3MWtrmLU1zNoaTvtIUgVZ/pJUQZZ/efZ2OsCbYNbWMGtrmLUFnPOXpApyy1+SKsjyl6QKsvwXEBGfiIgni58jEbEvIv5u1th7ivXui4i/iojfmfXYN4y1OevDxfhnIuK2Wevti4inI+Le+cbanHV6juzdmPXhiPiziBg7k7OLs34lIg5GxKGIeLDLsg6c/3dcbK4uybo6Ig7NWqcnIr4cEd+IiF+Za6xbWP4LyMzPZuZIZo4Ah4CHgS+eGcvMb0fETwGbgeuBkxFxU7OxDmT9XERsAd6RmV8GiIjbgSWZuRG4JiLWNhvrQNZNTbJ3a9YTwBeK47lXRMRwF2ftAXZn5hZgKCJGuiUr8BHO/Tves5hcXZL1p4FHgOWz1vkkcDgzNwG/GBEr5hjrCpb/IkXEGmA1MAz8fEQ8U2x9LAXeDxzIxt7zrwBb5hhrd9bngM8BxyLiF4q7R4D9xfITNN6gmo21NWtmjjW53ZVZgb8G3h0RlwFXAC90cda3A88WwyeBS+fI1Wys1SY49+949SJzdUPWk8CHgB/OWmd2rqdodEWzsa5g+S/encBngb8CbsrM62lsVf0sjXf/E8V6L9H4R9dsrN1ZPwp8FxgFro+IT3Zx1ma3uzXrfwXeCdwFfK/I0a1ZHwV2FdN+W4GvdVHW8/+Ob1tkrm7I+n8y85Xz1umWrIti+S9CRNSAG4EngaOZ+YPirjFgLTAFnLkyfD+Nv2uzsXZn/Qlgb2a+CPxBMd6tWd9wu4uz7gI+npm/CfwP4GPdmjUz7wceB3YAj2TmVBdlPf/v+OFF5uqGrB9rsk63ZF2UrgnS5bYA3yqmcD4fEesjYgnwQRpTK4c5+9FzPXBsjrF2Z/0+cE0xPkzj5HjdmrXZ7W7NOgC8p/hv4L1AdnFWgCPAlcCe4na3ZD3/7/hbi8zVDVmbfUGqW7IuTmb6s8AP8B+A24vldwNHgW8D/74YqwHfAD4N/E8ac5dvGOtA1hXAl2jMNT4NrAEuofGGtYfGx9dLm421O+sct7syK42d+P+dxlbdX9DYouvKrMXt+4CPdNvftcnfcVG5uiRrfzH+5Kx13lms82ka08NLmo2147+Bxfz4Dd+SREQf8HPAs5n5N3ONdYOIGABuBp7KxpRQ07FuYNbW6Nasi83VDVmbiYjLaWzpfyWLfQLNxrqB5S9JFeScvyRVkOUvSRVk+UtSBVn+0hwi4j8X58cZi4h/1uS+zcXyVRHx1fPu3xARf9vOvNKbsbTTAaQu9y9oHE74XER8KzOPLvJxt9A4l851mfm/WhdP+vG45S8tIDMngIPA+97Ew24BfofGKRWkrmP5S4szAVy2mBUjop/GCdV+l8abgNR1LH9pcd4OvBoR75o19qM51v0AsAp4CNgYEb2tDie9WZa/tIDiNL63AptonCwP4Coap3Ju5hbgrmycU/8gbTydt7RYlr80v98G/hzYCfwqsCMivgH8t8w8PsdjbubsmUm/jvP+6kKe3kGSKsgtf0mqIMtfkirI8pekCrL8JamCLH9JqiDLX5Iq6P8BujgOFymysqYAAAAASUVORK5CYII=\n", 360 | "text/plain": [ 361 | "
" 362 | ] 363 | }, 364 | "metadata": { 365 | "needs_background": "light" 366 | }, 367 | "output_type": "display_data" 368 | } 369 | ], 370 | "source": [ 371 | "index_df.plot.scatter(x='DJIA',y='SPX')" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "# 第二节 回归分析" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "## 实验6-4 一元线性回归分析与预测" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "根据实验6-1的数据,建立用DJIA预测SPX的回归模型,并预测DJIA=8300时,SPX=?" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 11, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "from sklearn.linear_model import LinearRegression\n", 402 | "import statsmodels.api as sm\n", 403 | "import statsmodels.formula.api as smf" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 16, 409 | "metadata": {}, 410 | "outputs": [ 411 | { 412 | "data": { 413 | "text/html": [ 414 | "
" 495 | ], 496 | "text/plain": [ 497 | " const DJIA\n", 498 | "Date \n", 499 | "1 1.0 7715\n", 500 | "2 1.0 7442\n", 501 | "3 1.0 7581\n", 502 | "4 1.0 7572\n", 503 | "5 1.0 7881\n", 504 | "6 1.0 7823\n", 505 | "7 1.0 8149\n", 506 | "8 1.0 7838\n", 507 | "9 1.0 7756\n", 508 | "10 1.0 7679" 509 | ] 510 | }, 511 | "execution_count": 16, 512 | "metadata": {}, 513 | "output_type": "execute_result" 514 | } 515 | ], 516 | "source": [ 517 | "x = index_df[['DJIA']]\n", 518 | "y = index_df[['SPX']]\n", 519 | "X = sm.add_constant(x) # 模型包含截距项,因而需要因变量矩阵增加值为1的常数列\n", 520 | "X" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 18, 526 | "metadata": {}, 527 | "outputs": [ 528 | { 529 | "data": { 530 | "text/plain": [ 531 | "const 166.082832\n", 532 | "DJIA 0.100601\n", 533 | "dtype: float64" 534 | ] 535 | }, 536 | "execution_count": 18, 537 | "metadata": {}, 538 | "output_type": "execute_result" 539 | } 540 | ], 541 | "source": [ 542 | "sm_model = sm.OLS(y, X)\n", 543 | "sm_result = sm_model.fit()\n", 544 | "sm_result.params" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "回归直线方程为 Y = 166.082832 + 0.100601X" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 19, 557 | "metadata": {}, 558 | "outputs": [ 559 | { 560 | "data": { 561 | "text/plain": [ 562 | "0.9904333423452636" 563 | ] 564 | }, 565 | "execution_count": 19, 566 | "metadata": {}, 567 | "output_type": "execute_result" 568 | } 569 | ], 570 | "source": [ 571 | "# 计算判定系数\n", 572 | "sm_result.rsquared" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "判定系数很大,说明模型拟合效果较好" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 20, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "text/plain": [ 590 | "array([1001.07463095])" 591 | ] 592 | }, 593 | "execution_count": 20, 594 | "metadata": {}, 595 | "output_type": 
"execute_result" 596 | } 597 | ], 598 | "source": [ 599 | "# 进行一元线性预测\n", 600 | "sm_result.predict([1,8300])" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "y预测当DJIA=8300时,SPX=1001.07463095" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 21, 613 | "metadata": {}, 614 | "outputs": [ 615 | { 616 | "data": { 617 | "text/plain": [ 618 | "Intercept 166.082832\n", 619 | "DJIA 0.100601\n", 620 | "dtype: float64" 621 | ] 622 | }, 623 | "execution_count": 21, 624 | "metadata": {}, 625 | "output_type": "execute_result" 626 | } 627 | ], 628 | "source": [ 629 | "# 第二种方法,用statsmodels\n", 630 | "sm_model = smf.ols(formula='SPX~DJIA' , data=index_df)\n", 631 | "sm_result = sm_model.fit()\n", 632 | "sm_result.params" 633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "execution_count": 22, 638 | "metadata": {}, 639 | "outputs": [ 640 | { 641 | "data": { 642 | "text/plain": [ 643 | "0.9904333423452636" 644 | ] 645 | }, 646 | "execution_count": 22, 647 | "metadata": {}, 648 | "output_type": "execute_result" 649 | } 650 | ], 651 | "source": [ 652 | "sm_result.rsquared" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": 23, 658 | "metadata": {}, 659 | "outputs": [ 660 | { 661 | "data": { 662 | "text/plain": [ 663 | "0 1001.074631\n", 664 | "dtype: float64" 665 | ] 666 | }, 667 | "execution_count": 23, 668 | "metadata": {}, 669 | "output_type": "execute_result" 670 | } 671 | ], 672 | "source": [ 673 | "sm_result.predict(pd.DataFrame([{'DJIA':8300}]))" 674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": 24, 679 | "metadata": {}, 680 | "outputs": [ 681 | { 682 | "data": { 683 | "text/plain": [ 684 | "166.08283214871528" 685 | ] 686 | }, 687 | "execution_count": 24, 688 | "metadata": {}, 689 | "output_type": "execute_result" 690 | } 691 | ], 692 | "source": [ 693 | "# 第三种方法,用sklearn\n", 694 | "sk_model = LinearRegression()\n", 695 | 
"sk_model.fit(x,y)\n", 696 | "sk_model.intercept_[0]" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "execution_count": 25, 702 | "metadata": {}, 703 | "outputs": [ 704 | { 705 | "data": { 706 | "text/plain": [ 707 | "0.10060142154182612" 708 | ] 709 | }, 710 | "execution_count": 25, 711 | "metadata": {}, 712 | "output_type": "execute_result" 713 | } 714 | ], 715 | "source": [ 716 | "sk_model.coef_[0][0]" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": 27, 722 | "metadata": {}, 723 | "outputs": [ 724 | { 725 | "data": { 726 | "text/plain": [ 727 | "0.9904333423452637" 728 | ] 729 | }, 730 | "execution_count": 27, 731 | "metadata": {}, 732 | "output_type": "execute_result" 733 | } 734 | ], 735 | "source": [ 736 | "# 判定系数\n", 737 | "sk_model.score(x,y)" 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 28, 743 | "metadata": {}, 744 | "outputs": [ 745 | { 746 | "data": { 747 | "text/plain": [ 748 | "array([[1001.07463095]])" 749 | ] 750 | }, 751 | "execution_count": 28, 752 | "metadata": {}, 753 | "output_type": "execute_result" 754 | } 755 | ], 756 | "source": [ 757 | "# 预测\n", 758 | "sk_model.predict([[8300]])" 759 | ] 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "metadata": {}, 764 | "source": [ 765 | "## 实验6-5 多元线性回归分析和预测" 766 | ] 767 | }, 768 | { 769 | "cell_type": "markdown", 770 | "metadata": {}, 771 | "source": [ 772 | "Q是因变量,其他是自变量" 773 | ] 774 | }, 775 | { 776 | "cell_type": "code", 777 | "execution_count": 29, 778 | "metadata": {}, 779 | "outputs": [ 780 | { 781 | "data": { 782 | "text/html": [ 783 | "
" 860 | ], 861 | "text/plain": [ 862 | " Q P M PAI PBMac\n", 863 | "Month \n", 864 | "1 1773 8.65 25500 10.55 1.25\n", 865 | "2 1863 8.65 25600 10.45 1.35\n", 866 | "3 1798 8.65 25700 10.35 1.55\n", 867 | "4 1775 8.65 25970 10.30 1.05\n", 868 | "5 1796 8.65 25970 10.30 0.95" 869 | ] 870 | }, 871 | "execution_count": 29, 872 | "metadata": {}, 873 | "output_type": "execute_result" 874 | } 875 | ], 876 | "source": [ 877 | "pi_df = pd.read_excel('6-5.xlsx', index_col=0)\n", 878 | "pi_df " 879 | ] 880 | }, 881 | { 882 | "cell_type": "code", 883 | "execution_count": 33, 884 | "metadata": {}, 885 | "outputs": [ 886 | { 887 | "data": { 888 | "text/plain": [ 889 | "Intercept 976.591186\n", 890 | "P 8447.513762\n", 891 | "M -1.709659\n", 892 | "PAI -2652.159091\n", 893 | "PBMac -545.000000\n", 894 | "dtype: float64" 895 | ] 896 | }, 897 | "execution_count": 33, 898 | "metadata": {}, 899 | "output_type": "execute_result" 900 | } 901 | ], 902 | "source": [ 903 | "pi_model = smf.ols(formula='Q ~ P + M + PAI + PBMac' , data=pi_df)\n", 904 | "pi_result = pi_model.fit()\n", 905 | "pi_result.params" 906 | ] 907 | }, 908 | { 909 | "cell_type": "code", 910 | "execution_count": 36, 911 | "metadata": {}, 912 | "outputs": [ 913 | { 914 | "name": "stderr", 915 | "output_type": "stream", 916 | "text": [ 917 | "D:\\Anaconda3\\lib\\site-packages\\statsmodels\\stats\\stattools.py:71: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.\n", 918 | " \"samples were given.\" % int(n), ValueWarning)\n" 919 | ] 920 | }, 921 | { 922 | "data": { 923 | "text/html": [ 924 | "\n", 925 | "\n", 926 | "\n", 927 | " \n", 928 | "\n", 929 | "\n", 930 | " \n", 931 | "\n", 932 | "\n", 933 | " \n", 934 | "\n", 935 | "\n", 936 | " \n", 937 | "\n", 938 | "\n", 939 | " \n", 940 | "\n", 941 | "\n", 942 | " \n", 943 | "\n", 944 | "\n", 945 | " \n", 946 | "\n", 947 | "\n", 948 | " \n", 949 | "\n", 950 | "\n", 951 | " \n", 952 | "\n", 953 | "
" 988 | ], 989 | "text/plain": [ 990 | "\n", 991 | "\"\"\"\n", 992 | " OLS Regression Results \n", 993 | "==============================================================================\n", 994 | "Dep. Variable: Q R-squared: 0.580\n", 995 | "Model: OLS Adj. R-squared: -0.682\n", 996 | "Method: Least Squares F-statistic: 0.4594\n", 997 | "Date: Thu, 13 Aug 2020 Prob (F-statistic): 0.763\n", 998 | "Time: 10:22:41 Log-Likelihood: -22.362\n", 999 | "No. Observations: 5 AIC: 52.72\n", 1000 | "Df Residuals: 1 BIC: 51.16\n", 1001 | "Df Model: 3 \n", 1002 | "Covariance Type: nonrobust \n", 1003 | "==============================================================================\n", 1004 | " coef std err t P>|t| [0.025 0.975]\n", 1005 | "------------------------------------------------------------------------------\n", 1006 | "Intercept 976.5912 911.512 1.071 0.478 -1.06e+04 1.26e+04\n", 1007 | "P 8447.5138 7884.577 1.071 0.478 -9.17e+04 1.09e+05\n", 1008 | "M -1.7097 1.632 -1.047 0.485 -22.453 19.033\n", 1009 | "PAI -2652.1591 2545.723 -1.042 0.487 -3.5e+04 2.97e+04\n", 1010 | "PBMac -545.0000 580.237 -0.939 0.520 -7917.610 6827.610\n", 1011 | "==============================================================================\n", 1012 | "Omnibus: nan Durbin-Watson: 3.250\n", 1013 | "Prob(Omnibus): nan Jarque-Bera (JB): 0.723\n", 1014 | "Skew: 0.593 Prob(JB): 0.696\n", 1015 | "Kurtosis: 1.562 Cond. No. 2.10e+20\n", 1016 | "==============================================================================\n", 1017 | "\n", 1018 | "Warnings:\n", 1019 | "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", 1020 | "[2] The smallest eigenvalue is 7.55e-32.
This might indicate that there are\n", 1021 | "strong multicollinearity problems or that the design matrix is singular.\n", 1022 | "\"\"\"" 1023 | ] 1024 | }, 1025 | "execution_count": 36, 1026 | "metadata": {}, 1027 | "output_type": "execute_result" 1028 | } 1029 | ], 1030 | "source": [ 1031 | "pi_result.summary()" 1032 | ] 1033 | }, 1034 | { 1035 | "cell_type": "raw", 1036 | "metadata": {}, 1037 | "source": [ 1038 | "Note: read the coef and t columns of the second table. Because P equals 8.65 in every month, it is perfectly collinear with the intercept, so those two coefficients are not separately identified (hence warning [2] and Cond. No. 2.10e+20); with only 5 observations (Df Residuals: 1), no coefficient is significant (every P>|t| exceeds 0.47). The regression equation, with t statistics in parentheses, is:\n", 1039 | "Q = 976.5912 + 8447.5138P - 1.7097M - 2652.1591PAI - 545.0000PBMac\n", 1040 | " (1.071) (1.071) (-1.047) (-1.042) (-0.939)" 1041 | ] 1042 | }, 1043 | { 1044 | "cell_type": "code", 1045 | "execution_count": 37, 1046 | "metadata": {}, 1047 | "outputs": [ 1048 | { 1049 | "data": { 1050 | "text/html": [ 1051 | "
" 1120 | ], 1121 | "text/plain": [ 1122 | " Q P M PAI PBMac\n", 1123 | "Q 1.000000 NaN -0.285999 0.135738 0.343973\n", 1124 | "P NaN NaN NaN NaN NaN\n", 1125 | "M -0.285999 NaN 1.000000 -0.930529 -0.683977\n", 1126 | "PAI 0.135738 NaN -0.930529 1.000000 0.376746\n", 1127 | "PBMac 0.343973 NaN -0.683977 0.376746 1.000000" 1128 | ] 1129 | }, 1130 | "execution_count": 37, 1131 | "metadata": {}, 1132 | "output_type": "execute_result" 1133 | } 1134 | ], 1135 | "source": [ 1136 | "# 单相关系数\n", 1137 | "pi_df.corr()" 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "code", 1142 | "execution_count": 38, 1143 | "metadata": {}, 1144 | "outputs": [ 1145 | { 1146 | "data": { 1147 | "text/plain": [ 1148 | "Intercept 976.591186\n", 1149 | "P 8447.513762\n", 1150 | "M -1.709659\n", 1151 | "PAI -2652.159091\n", 1152 | "PBMac -545.000000\n", 1153 | "dtype: float64" 1154 | ] 1155 | }, 1156 | "execution_count": 38, 1157 | "metadata": {}, 1158 | "output_type": "execute_result" 1159 | } 1160 | ], 1161 | "source": [ 1162 | "# 偏相关系数\n", 1163 | "pi_model_m = smf.ols(formula='Q ~ P + M + PAI + PBMac' , data=pi_df)\n", 1164 | "pi_result_m = pi_model_m.fit()\n", 1165 | "pi_result_m.params" 1166 | ] 1167 | }, 1168 | { 1169 | "cell_type": "code", 1170 | "execution_count": 39, 1171 | "metadata": {}, 1172 | "outputs": [ 1173 | { 1174 | "name": "stderr", 1175 | "output_type": "stream", 1176 | "text": [ 1177 | "D:\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in sqrt\n", 1178 | " \n" 1179 | ] 1180 | }, 1181 | { 1182 | "data": { 1183 | "text/plain": [ 1184 | "nan" 1185 | ] 1186 | }, 1187 | "execution_count": 39, 1188 | "metadata": {}, 1189 | "output_type": "execute_result" 1190 | } 1191 | ], 1192 | "source": [ 1193 | "# Q与M的偏相关系数为\n", 1194 | "np.sqrt(pi_result.params[2] * pi_result_m.params[1])" 1195 | ] 1196 | }, 1197 | { 1198 | "cell_type": "markdown", 1199 | "metadata": {}, 1200 | "source": [ 1201 | 
"这一部分由于数据不全以及未提及虚拟变量,多重共线性等较深入内容,这一部分暂写至此,日后进行补充整理" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "markdown", 1206 | "metadata": {}, 1207 | "source": [ 1208 | "## 实验6-6 非线性回归分析" 1209 | ] 1210 | }, 1211 | { 1212 | "cell_type": "code", 1213 | "execution_count": null, 1214 | "metadata": {}, 1215 | "outputs": [], 1216 | "source": [ 1217 | "建立指数回归模型 Y = β0 β1 ^ x" 1218 | ] 1219 | }, 1220 | { 1221 | "cell_type": "code", 1222 | "execution_count": 40, 1223 | "metadata": {}, 1224 | "outputs": [ 1225 | { 1226 | "data": { 1227 | "text/html": [ 1228 | "
" 1291 | ], 1292 | "text/plain": [ 1293 | " Q Month Q1\n", 1294 | "0 33100 1 10.407289\n", 1295 | "1 47300 2 10.764266\n", 1296 | "2 69000 3 11.141862\n", 1297 | "3 102000 4 11.532728\n", 1298 | "4 150000 5 11.918391\n", 1299 | "5 220000 6 12.301383" 1300 | ] 1301 | }, 1302 | "execution_count": 40, 1303 | "metadata": {}, 1304 | "output_type": "execute_result" 1305 | } 1306 | ], 1307 | "source": [ 1308 | "ma_df = pd.read_excel('6-6.xlsx')\n", 1309 | "ma_df" 1310 | ] 1311 | }, 1312 | { 1313 | "cell_type": "code", 1314 | "execution_count": 42, 1315 | "metadata": {}, 1316 | "outputs": [ 1317 | { 1318 | "data": { 1319 | "text/html": [ 1320 | "
" 1383 | ], 1384 | "text/plain": [ 1385 | " Q Month Q1\n", 1386 | "0 33100 1 10.407289\n", 1387 | "1 47300 2 10.764266\n", 1388 | "2 69000 3 11.141862\n", 1389 | "3 102000 4 11.532728\n", 1390 | "4 150000 5 11.918391\n", 1391 | "5 220000 6 12.301383" 1392 | ] 1393 | }, 1394 | "execution_count": 42, 1395 | "metadata": {}, 1396 | "output_type": "execute_result" 1397 | } 1398 | ], 1399 | "source": [ 1400 | "# 先进性线性化得到线性模型: lnY = lnβ0 + lnβ1 X\n", 1401 | "ma_df.loc[:,'Q1'] = np.log(ma_df['Q'])\n", 1402 | "ma_df" 1403 | ] 1404 | }, 1405 | { 1406 | "cell_type": "code", 1407 | "execution_count": 43, 1408 | "metadata": {}, 1409 | "outputs": [ 1410 | { 1411 | "data": { 1412 | "text/plain": [ 1413 | "Intercept 10.011948\n", 1414 | "Month 0.380678\n", 1415 | "dtype: float64" 1416 | ] 1417 | }, 1418 | "execution_count": 43, 1419 | "metadata": {}, 1420 | "output_type": "execute_result" 1421 | } 1422 | ], 1423 | "source": [ 1424 | "ma_model = smf.ols(formula='Q1 ~ Month' , data=ma_df)\n", 1425 | "ma_result = ma_model.fit()\n", 1426 | "ma_result.params" 1427 | ] 1428 | }, 1429 | { 1430 | "cell_type": "code", 1431 | "execution_count": 45, 1432 | "metadata": {}, 1433 | "outputs": [ 1434 | { 1435 | "data": { 1436 | "text/plain": [ 1437 | "22291.22329846538" 1438 | ] 1439 | }, 1440 | "execution_count": 45, 1441 | "metadata": {}, 1442 | "output_type": "execute_result" 1443 | } 1444 | ], 1445 | "source": [ 1446 | "np.power(np.e , ma_result.params[0])" 1447 | ] 1448 | }, 1449 | { 1450 | "cell_type": "code", 1451 | "execution_count": 46, 1452 | "metadata": {}, 1453 | "outputs": [ 1454 | { 1455 | "data": { 1456 | "text/plain": [ 1457 | "1.4632756281161763" 1458 | ] 1459 | }, 1460 | "execution_count": 46, 1461 | "metadata": {}, 1462 | "output_type": "execute_result" 1463 | } 1464 | ], 1465 | "source": [ 1466 | "np.power(np.e , ma_result.params[1])" 1467 | ] 1468 | }, 1469 | { 1470 | "cell_type": "markdown", 1471 | "metadata": {}, 1472 | "source": [ 1473 | "lnβ0 估计值为 10.011948\n", 1474 | 
"\n", 1475 | "lnβ1 估计值为 0.380678\n", 1476 | "\n", 1477 | "β0 估计值为 22291.22329846538\n", 1478 | "\n", 1479 | "β1 估计值为 1.4632756281161763\n", 1480 | "\n", 1481 | "模型为 Y = 22291.22329846538 × 1.4632756281161763^x" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "markdown", 1486 | "metadata": {}, 1487 | "source": [ 1488 | "## 总结" 1489 | ] 1490 | }, 1491 | { 1492 | "cell_type": "markdown", 1493 | "metadata": {}, 1494 | "source": [ 1495 | "这一章篇幅较短,还有很多详细的深入的内容可以介绍,日后慢慢完善。" 1496 | ] 1497 | } 1498 | ], 1499 | "metadata": { 1500 | "kernelspec": { 1501 | "display_name": "Python [conda env:root] *", 1502 | "language": "python", 1503 | "name": "conda-root-py" 1504 | }, 1505 | "language_info": { 1506 | "codemirror_mode": { 1507 | "name": "ipython", 1508 | "version": 3 1509 | }, 1510 | "file_extension": ".py", 1511 | "mimetype": "text/x-python", 1512 | "name": "python", 1513 | "nbconvert_exporter": "python", 1514 | "pygments_lexer": "ipython3", 1515 | "version": "3.7.5" 1516 | } 1517 | }, 1518 | "nbformat": 4, 1519 | "nbformat_minor": 2 1520 | } 1521 | -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-10.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-10.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-13.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-13.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-14.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 
参数估计与假设检验/4-14.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-15.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-15.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-16.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-16.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-17.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-17.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-18.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-18.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-19.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-19.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-2.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-2.xlsx 
-------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-6.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-6.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-7.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-7.xlsx -------------------------------------------------------------------------------- /第四章 参数估计与假设检验/4-9.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AirFin/Statistics_Python_Codes/684c651552e0012d085e72301c4fd210748e4a93/第四章 参数估计与假设检验/4-9.xlsx --------------------------------------------------------------------------------
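The partial correlation of Q and M in Experiment 6-5 (chapter 6 notebook above) can be cross-checked without statsmodels. This is a minimal NumPy sketch, not the book's method. It assumes that the five monthly rows printed for `6-5.xlsx` are the complete dataset. It drops P on purpose: P is 8.65 in every month, so it only duplicates the intercept. The helper names `design`, `ols_coefs` and `residuals` are hypothetical. The sketch computes the partial correlation two ways:

- by correlating the residuals of Q and of M after regressing each on PAI and PBMac;
- by the sign(b_QM)·sqrt(b_QM·b_MQ) formula used in the notebook.

```python
import numpy as np

# Experiment 6-5 data as printed for 6-5.xlsx (months 1-5). P is left out on purpose:
# it equals 8.65 in every month, so it duplicates the intercept column.
Q = np.array([1773, 1863, 1798, 1775, 1796], dtype=float)
M = np.array([25500, 25600, 25700, 25970, 25970], dtype=float)
PAI = np.array([10.55, 10.45, 10.35, 10.30, 10.30])
PBMac = np.array([1.25, 1.35, 1.55, 1.05, 0.95])


def design(n, *regressors):
    """Hypothetical helper: an intercept column followed by the given regressors."""
    return np.column_stack([np.ones(n), *regressors])


def ols_coefs(y, *regressors):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the regressors."""
    coefs, *_ = np.linalg.lstsq(design(len(y), *regressors), y, rcond=None)
    return coefs


def residuals(y, *regressors):
    """Part of y left unexplained by an intercept plus the given regressors."""
    return y - design(len(y), *regressors) @ ols_coefs(y, *regressors)


# Way 1: correlate the residuals of Q and of M after removing PAI and PBMac.
r_resid = np.corrcoef(residuals(Q, PAI, PBMac), residuals(M, PAI, PBMac))[0, 1]

# Way 2: sign(b_QM) * sqrt(b_QM * b_MQ); both slopes share the sign of the partial r.
b_qm = ols_coefs(Q, M, PAI, PBMac)[1]  # coefficient of M when Q is the response
b_mq = ols_coefs(M, Q, PAI, PBMac)[1]  # coefficient of Q when M is the response
r_coef = np.sign(b_qm) * np.sqrt(b_qm * b_mq)

print(f"b_QM = {b_qm:.6f}")
print(f"partial r(Q, M | PAI, PBMac): residuals {r_resid:.4f}, coefficients {r_coef:.4f}")
```

The two ways should agree up to floating-point error, at roughly -0.72 with these five rows. `b_qm` should also reproduce the M coefficient (-1.709659) in the notebook's `pi_result.params`, because dropping the constant P column leaves the column space, and therefore the fit, unchanged. With a single residual degree of freedom, the value is very imprecise.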