├── .gitignore ├── Code ├── Chapter02 Python基础知识.ipynb ├── Chapter03 Pandas数据结构.ipynb ├── Chapter04 获取数据源.ipynb ├── Chapter05 数据预处理.ipynb ├── Chapter06 数据选择.ipynb ├── Chapter07 数值操作.ipynb ├── Chapter08 数据运算.ipynb ├── Chapter09 时间序列.ipynb ├── Chapter10 数据分组 数据透视表.ipynb ├── Chapter11 多表拼接.ipynb ├── Chapter12 结果导出.ipynb ├── Chapter13 数据可视化.ipynb ├── Chapter14 典型数据分析案例.ipynb └── Chapter15 NumPy数组.ipynb ├── Data ├── Chapter04.1.csv ├── Chapter04.csv ├── Chapter04.txt ├── Chapter04.xlsx ├── Chapter05.xlsx ├── Chapter06.xlsx ├── Chapter07.xlsx ├── Chapter08.xlsx ├── Chapter10.xlsx ├── Chapter11.xlsx ├── Chapter12.xlsx ├── fillna.xlsx ├── loan.csv ├── order-14.1.csv ├── order-14.3.csv ├── train-pivot.csv └── 数据集使用说明.txt ├── Note ├── Git Fork开源项目如何同步更新.pdf ├── Markdown常用标签.pdf ├── jupyter notebook导出pdf并支持中文.md ├── pandas填充缺失值fillna()函数.ipynb ├── 如何给 github 的开源项目提交 pull request.pdf └── 常见的Python代码报错及解决方案.pdf ├── Other ├── 01 Pyecharts渲染图表 .ipynb ├── Pyecharts.xlsx └── html │ ├── Gauge01.html │ ├── Gauge02.html │ ├── WordCloud.html │ ├── bar01.html │ ├── dark.html │ ├── images │ ├── Gauge01.png │ ├── Gauge02.png │ ├── WordCloud.png │ ├── bar.png │ ├── dark.png │ ├── pie.png │ └── start.png │ ├── pie.html │ └── start.html └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | Code/.ipynb_checkpoints/ 2 | Note/.ipynb_checkpoints/ 3 | -------------------------------------------------------------------------------- /Code/Chapter03 Pandas数据结构.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**Pandas Data Structures** \n", 8 | "Data analysis in Python mainly relies on the Pandas, NumPy, and matplotlib modules, which must be imported before use" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "# Import the modules\n", 18 | "import pandas as pd\n", 19 | "import numpy as np\n", 20 | "import 
matplotlib.pyplot as plt" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## The Series data structure" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "### What is a Series \n", 35 | "A Series is an object similar to a one-dimensional array, made up of a set of data and an associated set of data labels (the index)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Creating a Series\n", 43 | "Create one with pd.Series(); the kind of object passed to Series() determines the result" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "0 a\n", 55 | "1 b\n", 56 | "2 c\n", 57 | "3 d\n", 58 | "dtype: object" 59 | ] 60 | }, 61 | "execution_count": 2, 62 | "metadata": {}, 63 | "output_type": "execute_result" 64 | } 65 | ], 66 | "source": [ 67 | "# Pass in a list\n", 68 | "import pandas as pd\n", 69 | "S1 = pd.Series([\"a\",\"b\",\"c\",\"d\"])\n", 70 | "S1" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/plain": [ 81 | "a 1\n", 82 | "b 2\n", 83 | "c 3\n", 84 | "d 4\n", 85 | "dtype: int64" 86 | ] 87 | }, 88 | "execution_count": 3, 89 | "metadata": {}, 90 | "output_type": "execute_result" 91 | } 92 | ], 93 | "source": [ 94 | "# Specify the index\n", 95 | "S2 = pd.Series([1,2,3,4],index = [\"a\",\"b\",\"c\",\"d\"])\n", 96 | "S2" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "a 1\n", 108 | "b 2\n", 109 | "c 3\n", 110 | "d 4\n", 111 | "dtype: int64" 112 | ] 113 | }, 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "output_type": "execute_result" 117 | } 118 | ], 119 | "source": [ 120 | "# Pass in a dict\n", 121 | "S3 = pd.Series({\"a\":1,\"b\":2,\"c\":3,\"d\":4})\n", 122 | "S3" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### Getting the index of a Series with the index attribute" 
130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 5, 135 | "metadata": {}, 136 | "outputs": [ 137 | { 138 | "data": { 139 | "text/plain": [ 140 | "RangeIndex(start=0, stop=4, step=1)" 141 | ] 142 | }, 143 | "execution_count": 5, 144 | "metadata": {}, 145 | "output_type": "execute_result" 146 | } 147 | ], 148 | "source": [ 149 | "S1.index" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "Index(['a', 'b', 'c', 'd'], dtype='object')" 161 | ] 162 | }, 163 | "execution_count": 6, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "S2.index" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Getting the values of a Series with the values attribute" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 10, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "data": { 186 | "text/plain": [ 187 | "array(['a', 'b', 'c', 'd'], dtype=object)" 188 | ] 189 | }, 190 | "execution_count": 10, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | } 194 | ], 195 | "source": [ 196 | "S1.values" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 11, 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "array([1, 2, 3, 4], dtype=int64)" 208 | ] 209 | }, 210 | "execution_count": 11, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "S2.values" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "## DataFrame: a tabular data structure" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### What is a DataFrame \n", 231 | "A DataFrame is a tabular data structure made up of a set of data and a pair of indexes (a row index and a column index)" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | 
"source": [ 238 | "### Creating a DataFrame \n", 239 | "Create one with pd.DataFrame() by passing in an object" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 14, 245 | "metadata": {}, 246 | "outputs": [ 247 | { 248 | "data": { 249 | "text/html": [ 250 | "
\n", 290 | "" 291 | ], 292 | "text/plain": [ 293 | " 0\n", 294 | "0 a\n", 295 | "1 b\n", 296 | "2 c\n", 297 | "3 d" 298 | ] 299 | }, 300 | "execution_count": 14, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "# Pass in a list\n", 307 | "import pandas as pd\n", 308 | "df1 = pd.DataFrame([\"a\",\"b\",\"c\",\"d\"])\n", 309 | "df1" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 36, 315 | "metadata": {}, 316 | "outputs": [ 317 | { 318 | "data": { 319 | "text/html": [ 320 | "
\n", 365 | "" 366 | ], 367 | "text/plain": [ 368 | " 0 1\n", 369 | "0 a A\n", 370 | "1 b B\n", 371 | "2 c C\n", 372 | "3 d D" 373 | ] 374 | }, 375 | "execution_count": 36, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "# Pass in a nested list\n", 382 | "df2 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]])\n", 383 | "df2" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "**Specifying row and column indexes** \n", 391 | "- the columns parameter sets a custom column index\n", 392 | "- the index parameter sets a custom row index" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 22, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/html": [ 403 | "
\n", 448 | "" 449 | ], 450 | "text/plain": [ 451 | " 小写 大写\n", 452 | "0 a A\n", 453 | "1 b B\n", 454 | "2 c C\n", 455 | "3 d D" 456 | ] 457 | }, 458 | "execution_count": 22, 459 | "metadata": {}, 460 | "output_type": "execute_result" 461 | } 462 | ], 463 | "source": [ 464 | "# Set the column index\n", 465 | "df31 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],columns = [\"小写\",\"大写\"])\n", 466 | "df31" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 24, 472 | "metadata": {}, 473 | "outputs": [ 474 | { 475 | "data": { 476 | "text/html": [ 477 | "
\n", 522 | "" 523 | ], 524 | "text/plain": [ 525 | " 0 1\n", 526 | "一 a A\n", 527 | "二 b B\n", 528 | "三 c C\n", 529 | "四 d D" 530 | ] 531 | }, 532 | "execution_count": 24, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "# Set the row index\n", 539 | "df32 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],index = [\"一\",\"二\",\"三\",\"四\"])\n", 540 | "df32" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 37, 546 | "metadata": {}, 547 | "outputs": [ 548 | { 549 | "data": { 550 | "text/html": [ 551 | "
\n", 596 | "" 597 | ], 598 | "text/plain": [ 599 | " 小写 大写\n", 600 | "一 a A\n", 601 | "二 b B\n", 602 | "三 c C\n", 603 | "四 d D" 604 | ] 605 | }, 606 | "execution_count": 37, 607 | "metadata": {}, 608 | "output_type": "execute_result" 609 | } 610 | ], 611 | "source": [ 612 | "# Set the row and column indexes together\n", 613 | "df33 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],columns = [\"小写\",\"大写\"],index = [\"一\",\"二\",\"三\",\"四\"])\n", 614 | "df33" 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": 38, 620 | "metadata": {}, 621 | "outputs": [ 622 | { 623 | "data": { 624 | "text/html": [ 625 | "
\n", 670 | "" 671 | ], 672 | "text/plain": [ 673 | " 小写 大写\n", 674 | "0 a A\n", 675 | "1 b B\n", 676 | "2 c C\n", 677 | "3 d D" 678 | ] 679 | }, 680 | "execution_count": 38, 681 | "metadata": {}, 682 | "output_type": "execute_result" 683 | } 684 | ], 685 | "source": [ 686 | "# Pass in a dict\n", 687 | "data = {\"小写\":[\"a\",\"b\",\"c\",\"d\"],\"大写\":[\"A\",\"B\",\"C\",\"D\"]}\n", 688 | "df41 = pd.DataFrame(data)\n", 689 | "df41" 690 | ] 691 | }, 692 | { 693 | "cell_type": "markdown", 694 | "metadata": {}, 695 | "source": [ 696 | "- When a dict is passed to DataFrame, its keys become the column index; if no row index is set, it defaults to integers starting from 0; to set a row index, use the index parameter" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "execution_count": 28, 702 | "metadata": {}, 703 | "outputs": [ 704 | { 705 | "data": { 706 | "text/html": [ 707 | "
\n", 752 | "" 753 | ], 754 | "text/plain": [ 755 | " 小写 大写\n", 756 | "一 a A\n", 757 | "二 b B\n", 758 | "三 c C\n", 759 | "四 d D" 760 | ] 761 | }, 762 | "execution_count": 28, 763 | "metadata": {}, 764 | "output_type": "execute_result" 765 | } 766 | ], 767 | "source": [ 768 | "# Set a row index for data passed in as a dict\n", 769 | "data = {\"小写\":[\"a\",\"b\",\"c\",\"d\"],\"大写\":[\"A\",\"B\",\"C\",\"D\"]}\n", 770 | "df42 = pd.DataFrame(data,index = [\"一\",\"二\",\"三\",\"四\"])\n", 771 | "df42" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "### Getting the row and column indexes of a DataFrame \n", 779 | "- Use the columns attribute to get the DataFrame's column index\n", 780 | "- Use the index attribute to get the DataFrame's row index" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": 29, 786 | "metadata": {}, 787 | "outputs": [ 788 | { 789 | "data": { 790 | "text/plain": [ 791 | "RangeIndex(start=0, stop=2, step=1)" 792 | ] 793 | }, 794 | "execution_count": 29, 795 | "metadata": {}, 796 | "output_type": "execute_result" 797 | } 798 | ], 799 | "source": [ 800 | "# Get the DataFrame's column index\n", 801 | "df2.columns" 802 | ] 803 | }, 804 | { 805 | "cell_type": "code", 806 | "execution_count": 33, 807 | "metadata": {}, 808 | "outputs": [ 809 | { 810 | "data": { 811 | "text/plain": [ 812 | "Index(['小写', '大写'], dtype='object')" 813 | ] 814 | }, 815 | "execution_count": 33, 816 | "metadata": {}, 817 | "output_type": "execute_result" 818 | } 819 | ], 820 | "source": [ 821 | "df33.columns" 822 | ] 823 | }, 824 | { 825 | "cell_type": "code", 826 | "execution_count": 34, 827 | "metadata": {}, 828 | "outputs": [ 829 | { 830 | "data": { 831 | "text/plain": [ 832 | "RangeIndex(start=0, stop=4, step=1)" 833 | ] 834 | }, 835 | "execution_count": 34, 836 | "metadata": {}, 837 | "output_type": "execute_result" 838 | } 839 | ], 840 | "source": [ 841 | "# Get the DataFrame's row index\n", 842 | "df2.index" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": 35, 848 | "metadata": {}, 849 | "outputs": [ 850 | { 851 | "data": { 852 | "text/plain": [ 
853 | "Index(['一', '二', '三', '四'], dtype='object')" 854 | ] 855 | }, 856 | "execution_count": 35, 857 | "metadata": {}, 858 | "output_type": "execute_result" 859 | } 860 | ], 861 | "source": [ 862 | "df33.index" 863 | ] 864 | }, 865 | { 866 | "cell_type": "markdown", 867 | "metadata": {}, 868 | "source": [ 869 | "## Getting the values of a DataFrame\n", 870 | "Covered in Chapter 6" 871 | ] 872 | } 873 | ], 874 | "metadata": { 875 | "kernelspec": { 876 | "display_name": "Python 3", 877 | "language": "python", 878 | "name": "python3" 879 | }, 880 | "language_info": { 881 | "codemirror_mode": { 882 | "name": "ipython", 883 | "version": 3 884 | }, 885 | "file_extension": ".py", 886 | "mimetype": "text/x-python", 887 | "name": "python", 888 | "nbconvert_exporter": "python", 889 | "pygments_lexer": "ipython3", 890 | "version": "3.7.0" 891 | }, 892 | "toc": { 893 | "base_numbering": 1, 894 | "nav_menu": {}, 895 | "number_sections": true, 896 | "sideBar": true, 897 | "skip_h1_title": false, 898 | "title_cell": "Table of Contents", 899 | "title_sidebar": "Chapter 3 Pandas Data Structures", 900 | "toc_cell": false, 901 | "toc_position": { 902 | "height": "calc(100% - 180px)", 903 | "left": "10px", 904 | "top": "150px", 905 | "width": "320px" 906 | }, 907 | "toc_section_display": true, 908 | "toc_window_display": true 909 | } 910 | }, 911 | "nbformat": 4, 912 | "nbformat_minor": 2 913 | } 914 | -------------------------------------------------------------------------------- /Code/Chapter05 数据预处理.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data processing" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Handling missing values\n", 15 | "### Inspecting missing values" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "<class 'pandas.core.frame.DataFrame'>\n", 28 | "RangeIndex: 5 
entries, 0 to 4\n", 29 | "Data columns (total 4 columns):\n", 30 | "编号 4 non-null object\n", 31 | "年龄 4 non-null float64\n", 32 | "性别 3 non-null object\n", 33 | "注册时间 4 non-null datetime64[ns]\n", 34 | "dtypes: datetime64[ns](1), float64(1), object(2)\n", 35 | "memory usage: 240.0+ bytes\n" 36 | ] 37 | } 38 | ], 39 | "source": [ 40 | "import pandas as pd\n", 41 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\")\n", 42 | "df.head(20).info()  # head() returns only the first 5 rows by default\n", 43 | "#df.info()  # info() reports each column's dtype and its count of non-null values" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Dropping missing values" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/html": [ 61 | "
\n", 116 | "" 117 | ], 118 | "text/plain": [ 119 | " 编号 年龄 性别 注册时间\n", 120 | "0 A1 54.0 男 2018-08-08\n", 121 | "1 A2 16.0 NaN 2018-08-09\n", 122 | "3 A3 47.0 女 2018-08-10\n", 123 | "4 A4 41.0 男 2018-08-11" 124 | ] 125 | }, 126 | "execution_count": 2, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "import pandas as pd\n", 133 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\")\n", 134 | "df.dropna()  # dropna() drops every row that contains a missing value\n", 135 | "df.dropna(how = \"all\")  # drop only rows in which every column is empty" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "### Filling missing values" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 5, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "data": { 152 | "text/html": [ 153 | "
\n", 208 | "" 209 | ], 210 | "text/plain": [ 211 | " 编号 年龄 性别 注册时间\n", 212 | "0 A1 54.0 男 2018-08-08\n", 213 | "1 A2 16.0 男 2018-08-09\n", 214 | "2 A3 30.0 女 2018-08-10\n", 215 | "3 A4 41.0 男 2018-08-11" 216 | ] 217 | }, 218 | "execution_count": 5, 219 | "metadata": {}, 220 | "output_type": "execute_result" 221 | } 222 | ], 223 | "source": [ 224 | "import pandas as pd\n", 225 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=1)\n", 226 | "df.fillna(0)  # fillna() fills every missing value with 0\n", 227 | "df.fillna({\"性别\":\"男\",\"年龄\":30})  # fill the 性别 and 年龄 columns with separate values\n" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "## Handling duplicate data" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 9, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "data": { 244 | "text/html": [ 245 | "
\n", 300 | "" 301 | ], 302 | "text/plain": [ 303 | " 订单编号 客户姓名 唯一识别码 成交时间\n", 304 | "0 A1 张通 101 2018-08-08\n", 305 | "1 A2 李谷 102 2018-08-09\n", 306 | "3 A3 孙凤 103 2018-08-10\n", 307 | "5 A5 赵恒 104 2018-08-11" 308 | ] 309 | }, 310 | "execution_count": 9, 311 | "metadata": {}, 312 | "output_type": "execute_result" 313 | } 314 | ], 315 | "source": [ 316 | "import pandas as pd\n", 317 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=2)\n", 318 | "df.drop_duplicates()  # drop duplicate rows, comparing all columns\n", 319 | "df.drop_duplicates(subset = \"唯一识别码\")  # compare only the given column\n", 320 | "df.drop_duplicates(subset = [\"客户姓名\",\"唯一识别码\"])\n", 321 | "df.drop_duplicates(subset = [\"客户姓名\",\"唯一识别码\"],keep = \"last\")  # keep (first or last) selects which duplicate to retain\n" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "## Detecting and handling outliers" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "Outliers are usually handled in one of the following ways:\n", 336 | "- The most common approach is to delete them.\n", 337 | "- Treat them as missing values and fill them in.\n", 338 | "- Treat them as special cases and investigate why they occurred" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "## Data type conversion" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "### Data types" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "Type | Description\n", 360 | "---|---\n", 361 | "int | Integer type, i.e. whole numbers\n", 362 | "float | Floating-point type, i.e. numbers with a decimal point\n", 363 | "object | Python object type, denoted by O\n", 364 | "string_ | String type, often denoted by S; S10 means a string of length 10\n", 365 | "unicode_ | Fixed-length Unicode type, defined the same way as strings\n", 366 | "datetime64[ns] | Date/time format" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 6, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "name": "stdout", 376 | "output_type": "stream", 377 | "text": [ 378 | "<class 'pandas.core.frame.DataFrame'>\n", 379 | "RangeIndex: 6 entries, 0 to 5\n", 380 | "Data columns (total 4 columns):\n", 381 | "订单编号 6 non-null object\n", 382 | "客户姓名 6 non-null object\n", 383 | "唯一识别码 6 
non-null int64\n", 384 | "成交时间 6 non-null datetime64[ns]\n", 385 | "dtypes: datetime64[ns](1), int64(1), object(2)\n", 386 | "memory usage: 272.0+ bytes\n" 387 | ] 388 | }, 389 | { 390 | "data": { 391 | "text/plain": [ 392 | "dtype('int64')" 393 | ] 394 | }, 395 | "execution_count": 6, 396 | "metadata": {}, 397 | "output_type": "execute_result" 398 | } 399 | ], 400 | "source": [ 401 | "import pandas as pd\n", 402 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n", 403 | "df.info()  # info() reports the data type of every column\n", 404 | "df[\"订单编号\"].dtype  # check the data type of the 订单编号 column\n", 405 | "df[\"唯一识别码\"].dtype  # check the data type of the 唯一识别码 column" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "### Type conversion" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 17, 418 | "metadata": {}, 419 | "outputs": [ 420 | { 421 | "data": { 422 | "text/plain": [ 423 | "0 101.0\n", 424 | "1 102.0\n", 425 | "2 103.0\n", 426 | "3 103.0\n", 427 | "4 104.0\n", 428 | "5 104.0\n", 429 | "Name: 唯一识别码, dtype: float64" 430 | ] 431 | }, 432 | "execution_count": 17, 433 | "metadata": {}, 434 | "output_type": "execute_result" 435 | } 436 | ], 437 | "source": [ 438 | "import pandas as pd\n", 439 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n", 440 | "df[\"唯一识别码\"].dtype  # check the current type\n", 441 | "df[\"唯一识别码\"].astype(\"float64\")  # convert 唯一识别码 from int to float" 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "## Setting the index" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "### Adding an index to a table that has none" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 49, 461 | "metadata": {}, 462 | "outputs": [ 463 | { 464 | "data": { 465 | "text/html": [ 466 | "
\n", 528 | "" 529 | ], 530 | "text/plain": [ 531 | " 订单编号 客户姓名 唯一识别码 成交时间\n", 532 | "1 A1 张通 101 2018-08-08\n", 533 | "2 A2 李谷 102 2018-08-09\n", 534 | "3 A3 孙凤 103 2018-08-10\n", 535 | "4 A4 赵恒 104 2018-08-11\n", 536 | "5 A5 赵恒 104 2018-08-11" 537 | ] 538 | }, 539 | "execution_count": 49, 540 | "metadata": {}, 541 | "output_type": "execute_result" 542 | } 543 | ], 544 | "source": [ 545 | "import pandas as pd\n", 546 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 3,header= None)\n", 547 | "df.columns = [\"订单编号\",\"客户姓名\",\"唯一识别码\",\"成交时间\"]  # header must be None when reading, otherwise the first data row is consumed as the header\n", 548 | "df.index = [1,2,3,4,5]\n", 549 | "df\n" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "### Setting a new index" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 66, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "data": { 566 | "text/html": [ 567 | "
\n", 635 | "" 636 | ], 637 | "text/plain": [ 638 | " 客户姓名 唯一识别码 成交时间\n", 639 | "订单编号 \n", 640 | "A1 张通 101 2018-08-08\n", 641 | "A2 李谷 102 2018-08-09\n", 642 | "A3 孙凤 103 2018-08-10\n", 643 | "A3 孙凤 103 2018-08-10\n", 644 | "A4 赵恒 104 2018-08-11\n", 645 | "A5 赵恒 104 2018-08-11" 646 | ] 647 | }, 648 | "execution_count": 66, 649 | "metadata": {}, 650 | "output_type": "execute_result" 651 | } 652 | ], 653 | "source": [ 654 | "import pandas as pd\n", 655 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n", 656 | "df.set_index(\"订单编号\")  # set_index() makes the given column the new index" 657 | ] 658 | }, 659 | { 660 | "cell_type": "markdown", 661 | "metadata": {}, 662 | "source": [ 663 | "### Renaming the index" 664 | ] 665 | }, 666 | { 667 | "cell_type": "code", 668 | "execution_count": 82, 669 | "metadata": {}, 670 | "outputs": [ 671 | { 672 | "data": { 673 | "text/html": [ 674 | "
\n", 736 | "" 737 | ], 738 | "text/plain": [ 739 | " 新订单编号 新客户姓名 唯一识别码 成交时间\n", 740 | "一 A1 张通 101 2018-08-08\n", 741 | "二 A2 李谷 102 2018-08-09\n", 742 | "三 A3 孙凤 103 2018-08-10\n", 743 | "四 A4 赵恒 104 2018-08-11\n", 744 | "5 A5 赵恒 104 2018-08-12" 745 | ] 746 | }, 747 | "execution_count": 82, 748 | "metadata": {}, 749 | "output_type": "execute_result" 750 | } 751 | ], 752 | "source": [ 753 | "import pandas as pd\n", 754 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 4)\n", 755 | "df.index = [1,2,3,4,5]  # add an index\n", 756 | "df.rename(columns={\"订单编号\":\"新订单编号\",\"客户姓名\":\"新客户姓名\"})  # rename column labels\n", 757 | "df.rename(index = {1:\"一\",2:\"二\",3:\"三\"})  # rename row labels\n", 758 | "df.rename(columns={\"订单编号\":\"新订单编号\",\"客户姓名\":\"新客户姓名\"},index = {1:\"一\",2:\"二\",3:\"三\",4:'四'})  # rename column and row labels together" 759 | ] 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "metadata": {}, 764 | "source": [ 765 | "### Resetting the index" 766 | ] 767 | }, 768 | { 769 | "cell_type": "code", 770 | "execution_count": 7, 771 | "metadata": {}, 772 | "outputs": [ 773 | { 774 | "data": { 775 | "text/html": [ 776 | "
\n", 838 | "" 839 | ], 840 | "text/plain": [ 841 | " level_0 level_1 C1 C2\n", 842 | "0 Z1 Z2 NaN NaN\n", 843 | "1 A a 1.0 2.0\n", 844 | "2 NaN b 3.0 4.0\n", 845 | "3 B a 5.0 6.0\n", 846 | "4 NaN b 7.0 8.0" 847 | ] 848 | }, 849 | "execution_count": 7, 850 | "metadata": {}, 851 | "output_type": "execute_result" 852 | } 853 | ], 854 | "source": [ 855 | "import pandas as pd\n", 856 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=5)\n", 857 | "df.reset_index()\n", 858 | "# See Chapter 10 for details" 859 | ] 860 | } 861 | ], 862 | "metadata": { 863 | "kernelspec": { 864 | "display_name": "Python 3", 865 | "language": "python", 866 | "name": "python3" 867 | }, 868 | "language_info": { 869 | "codemirror_mode": { 870 | "name": "ipython", 871 | "version": 3 872 | }, 873 | "file_extension": ".py", 874 | "mimetype": "text/x-python", 875 | "name": "python", 876 | "nbconvert_exporter": "python", 877 | "pygments_lexer": "ipython3", 878 | "version": "3.7.0" 879 | }, 880 | "toc": { 881 | "base_numbering": 1, 882 | "nav_menu": {}, 883 | "number_sections": true, 884 | "sideBar": true, 885 | "skip_h1_title": false, 886 | "title_cell": "Table of Contents", 887 | "title_sidebar": "Chapter 5 Data Preprocessing", 888 | "toc_cell": false, 889 | "toc_position": {}, 890 | "toc_section_display": true, 891 | "toc_window_display": true 892 | } 893 | }, 894 | "nbformat": 4, 895 | "nbformat_minor": 2 896 | } 897 | -------------------------------------------------------------------------------- /Code/Chapter06 数据选择.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Selecting columns" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### Selecting one or several columns" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 8, 20 | "metadata": { 21 | "scrolled": true 22 | }, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/html": [ 27 | "
\n", 28 | "\n", 41 | "\n", 42 | " \n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | "
\n", 82 | "
" 83 | ], 84 | "text/plain": [ 85 | " 订单编号 唯一识别码\n", 86 | "0 A1 101\n", 87 | "1 A2 102\n", 88 | "2 A3 103\n", 89 | "3 A3 103\n", 90 | "4 A4 104\n", 91 | "5 A5 104" 92 | ] 93 | }, 94 | "execution_count": 8, 95 | "metadata": {}, 96 | "output_type": "execute_result" 97 | } 98 | ], 99 | "source": [ 100 | "import pandas as pd\n", 101 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name = 0)\n", 102 | "#通过传入列名选择数据的方式称为普通索引\n", 103 | "df\n", 104 | "df['客户姓名']\n", 105 | "df[['订单编号','客户姓名']]\n", 106 | "#通过传入具体位置来选择数据的方式称为位置索引\n", 107 | "df.iloc[:,[0,2]] #获取第1和第3列的数值,:表示获取所有的行" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "### 连续选择某几列" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 11, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "data": { 124 | "text/html": [ 125 | "
\n", 126 | "\n", 139 | "\n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | "
\n", 187 | "
" 188 | ], 189 | "text/plain": [ 190 | " 订单编号 客户姓名 唯一识别码\n", 191 | "0 A1 张通 101\n", 192 | "1 A2 李谷 102\n", 193 | "2 A3 孙凤 103\n", 194 | "3 A3 孙凤 103\n", 195 | "4 A4 赵恒 104\n", 196 | "5 A5 赵恒 104" 197 | ] 198 | }, 199 | "execution_count": 11, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "#通过传入一个位置区间来获取数据的方式称为切片索引\n", 206 | "df.iloc[:,0:3] #选择第1列到第4列的之间的值(包含第1列但是不包含第4列)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "## 行选择" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### 选择某一行/某几行" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 18, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/html": [ 231 | "
\n", 232 | "\n", 245 | "\n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | "
\n", 272 | "
" 273 | ], 274 | "text/plain": [ 275 | " 订单编号 客户姓名 唯一识别码 成交时间\n", 276 | "一 A1 张通 101 2018-08-08\n", 277 | "二 A2 李谷 102 2018-08-09" 278 | ] 279 | }, 280 | "execution_count": 18, 281 | "metadata": {}, 282 | "output_type": "execute_result" 283 | } 284 | ], 285 | "source": [ 286 | "#利用loc()方法,普通索引\n", 287 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\",\"六\"]\n", 288 | "df.loc[\"一\"]\n", 289 | "df.loc[[\"一\",\"二\"]]\n", 290 | "#利用iloc方法,位置索引\n", 291 | "df.iloc[0]\n", 292 | "df.iloc[[0,1]] #选择第一和第二行" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "### 选择连续的某几行" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 19, 305 | "metadata": { 306 | "scrolled": true 307 | }, 308 | "outputs": [ 309 | { 310 | "data": { 311 | "text/html": [ 312 | "
\n", 313 | "\n", 326 | "\n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | "
\n", 360 | "
" 361 | ], 362 | "text/plain": [ 363 | " 订单编号 客户姓名 唯一识别码 成交时间\n", 364 | "一 A1 张通 101 2018-08-08\n", 365 | "二 A2 李谷 102 2018-08-09\n", 366 | "三 A3 孙凤 103 2018-08-10" 367 | ] 368 | }, 369 | "execution_count": 19, 370 | "metadata": {}, 371 | "output_type": "execute_result" 372 | } 373 | ], 374 | "source": [ 375 | "df.iloc[0:3]#选择第一行到第四行(不包含第四行)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "### 选择满足条件的行" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 21, 388 | "metadata": {}, 389 | "outputs": [ 390 | { 391 | "data": { 392 | "text/html": [ 393 | "
\n", 394 | "\n", 407 | "\n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | "
\n", 429 | "
" 430 | ], 431 | "text/plain": [ 432 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间\n", 433 | "0 A1 张通 101.0 31.0 2018-08-08" 434 | ] 435 | }, 436 | "execution_count": 21, 437 | "metadata": {}, 438 | "output_type": "execute_result" 439 | } 440 | ], 441 | "source": [ 442 | "import pandas as pd\n", 443 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=3)\n", 444 | "df\n", 445 | "#选择年龄小于200的数据\n", 446 | "df[df['年龄']<200]\n", 447 | "#选择年龄小于200并且唯一识别码小于200,条件用括号括起来\n", 448 | "df[(df['年龄']<200) & (df['唯一识别码']<102)]" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "## 行列同时选择" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "### 普通索引+普通索引选择指定的行和列" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 20, 468 | "metadata": {}, 469 | "outputs": [ 470 | { 471 | "data": { 472 | "text/html": [ 473 | "
\n", 474 | "\n", 487 | "\n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | "
\n", 511 | "
" 512 | ], 513 | "text/plain": [ 514 | " 订单编号 客户姓名 唯一识别码\n", 515 | "一 A1 张通 101\n", 516 | "二 A2 李谷 102" 517 | ] 518 | }, 519 | "execution_count": 20, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "import pandas as pd\n", 526 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=4)\n", 527 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\"]\n", 528 | "#用loc传入行列名称\n", 529 | "df.loc[[\"一\",\"二\"],[\"订单编号\",\"客户姓名\",\"唯一识别码\"]]" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "### 位置索引+位置索引选择指定的行和列" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 16, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "text/html": [ 547 | "
\n", 548 | "\n", 561 | "\n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | "
\n", 582 | "
" 583 | ], 584 | "text/plain": [ 585 | " 订单编号 唯一识别码\n", 586 | "一 A1 101\n", 587 | "二 A2 102" 588 | ] 589 | }, 590 | "execution_count": 16, 591 | "metadata": {}, 592 | "output_type": "execute_result" 593 | } 594 | ], 595 | "source": [ 596 | "#用iloc方法传入行列位置\n", 597 | "df.iloc[[0,1],[0,2]]" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "### 布尔索引+普通缩影选择指定的行和列" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": 12, 610 | "metadata": {}, 611 | "outputs": [ 612 | { 613 | "data": { 614 | "text/html": [ 615 | "
\n", 616 | "\n", 629 | "\n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | "
\n", 655 | "
" 656 | ], 657 | "text/plain": [ 658 | " 订单编号 年龄\n", 659 | "一 A1 31\n", 660 | "二 A2 45\n", 661 | "三 A3 23" 662 | ] 663 | }, 664 | "execution_count": 12, 665 | "metadata": {}, 666 | "output_type": "execute_result" 667 | } 668 | ], 669 | "source": [ 670 | "#先进行布尔选择,然后通过普通索引选择列\n", 671 | "df[df[\"年龄\"]<200][[\"订单编号\",\"年龄\"]]" 672 | ] 673 | }, 674 | { 675 | "cell_type": "markdown", 676 | "metadata": {}, 677 | "source": [ 678 | "### 切片索引+切片索引选择指定的行和列" 679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": 17, 684 | "metadata": {}, 685 | "outputs": [ 686 | { 687 | "data": { 688 | "text/html": [ 689 | "
\n", 690 | "\n", 703 | "\n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | "
\n", 733 | "
" 734 | ], 735 | "text/plain": [ 736 | " 客户姓名 唯一识别码 年龄\n", 737 | "一 张通 101 31\n", 738 | "二 李谷 102 45\n", 739 | "三 孙凤 103 23" 740 | ] 741 | }, 742 | "execution_count": 17, 743 | "metadata": {}, 744 | "output_type": "execute_result" 745 | } 746 | ], 747 | "source": [ 748 | "import pandas as pd\n", 749 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=4)\n", 750 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\"]\n", 751 | "#iloc第一个参数选择的是行区间,第二个参数选的是列的区间\n", 752 | "df.iloc[0:3,1:4]\n" 753 | ] 754 | }, 755 | { 756 | "cell_type": "markdown", 757 | "metadata": {}, 758 | "source": [ 759 | "### 切片索引+普通索引指定的行和列" 760 | ] 761 | }, 762 | { 763 | "cell_type": "code", 764 | "execution_count": 19, 765 | "metadata": {}, 766 | "outputs": [ 767 | { 768 | "name": "stderr", 769 | "output_type": "stream", 770 | "text": [ 771 | "D:\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: DeprecationWarning: \n", 772 | ".ix is deprecated. Please use\n", 773 | ".loc for label based indexing or\n", 774 | ".iloc for positional indexing\n", 775 | "\n", 776 | "See the documentation here:\n", 777 | "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n", 778 | " \n" 779 | ] 780 | }, 781 | { 782 | "data": { 783 | "text/html": [ 784 | "
\n", 785 | "\n", 798 | "\n", 799 | " \n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | "
\n", 824 | "
" 825 | ], 826 | "text/plain": [ 827 | " 客户姓名 唯一识别码\n", 828 | "一 张通 101\n", 829 | "二 李谷 102\n", 830 | "三 孙凤 103" 831 | ] 832 | }, 833 | "execution_count": 19, 834 | "metadata": {}, 835 | "output_type": "execute_result" 836 | } 837 | ], 838 | "source": [ 839 | "df\n", 840 | "df.ix[0:3,[\"客户姓名\",\"唯一识别码\"]]\n", 841 | "df.iloc[0:3][[\"客户姓名\",\"唯一识别码\"]]" 842 | ] 843 | } 844 | ], 845 | "metadata": { 846 | "kernelspec": { 847 | "display_name": "Python 3", 848 | "language": "python", 849 | "name": "python3" 850 | }, 851 | "language_info": { 852 | "codemirror_mode": { 853 | "name": "ipython", 854 | "version": 3 855 | }, 856 | "file_extension": ".py", 857 | "mimetype": "text/x-python", 858 | "name": "python", 859 | "nbconvert_exporter": "python", 860 | "pygments_lexer": "ipython3", 861 | "version": "3.7.0" 862 | }, 863 | "toc": { 864 | "base_numbering": 1, 865 | "nav_menu": {}, 866 | "number_sections": true, 867 | "sideBar": true, 868 | "skip_h1_title": false, 869 | "title_cell": "Table of Contents", 870 | "title_sidebar": "第6章 数据选择", 871 | "toc_cell": false, 872 | "toc_position": { 873 | "height": "calc(100% - 180px)", 874 | "left": "10px", 875 | "top": "150px", 876 | "width": "320px" 877 | }, 878 | "toc_section_display": true, 879 | "toc_window_display": true 880 | } 881 | }, 882 | "nbformat": 4, 883 | "nbformat_minor": 2 884 | } 885 | -------------------------------------------------------------------------------- /Code/Chapter08 数据运算.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 数据运算" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 算数运算" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [ 22 | { 23 | "data": { 24 | "text/plain": [ 25 | "S1 3\n", 26 | "S2 9\n", 27 | "S3 15\n", 28 | "dtype: int64" 29 | ] 30 | }, 31 | 
"execution_count": 1, 32 | "metadata": {}, 33 | "output_type": "execute_result" 34 | } 35 | ], 36 | "source": [ 37 | "#两列相加\n", 38 | "import pandas as pd\n", 39 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n", 40 | "#添加行索引\n", 41 | "df.index=[\"S1\",\"S2\",\"S3\"]\n", 42 | "df[\"C1\"]+df[\"C2\"]" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "data": { 52 | "text/plain": [ 53 | "S1 -1\n", 54 | "S2 -1\n", 55 | "S3 -1\n", 56 | "dtype: int64" 57 | ] 58 | }, 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "#两列相减\n", 66 | "df[\"C1\"]-df[\"C2\"]" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 15, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "S1 2\n", 78 | "S2 20\n", 79 | "S3 56\n", 80 | "dtype: int64" 81 | ] 82 | }, 83 | "execution_count": 15, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "#两列相乘\n", 90 | "df[\"C1\"]*df[\"C2\"]" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 4, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "S1 0.500\n", 102 | "S2 0.800\n", 103 | "S3 0.875\n", 104 | "dtype: float64" 105 | ] 106 | }, 107 | "execution_count": 4, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "#两列相除\n", 114 | "df[\"C1\"]/df[\"C2\"]" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 5, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "data": { 124 | "text/plain": [ 125 | "S1 0\n", 126 | "S2 3\n", 127 | "S3 6\n", 128 | "Name: C1, dtype: int64" 129 | ] 130 | }, 131 | "execution_count": 5, 132 | "metadata": {}, 133 | "output_type": "execute_result" 134 | } 135 | ], 136 | "source": [ 137 | "#任意一列加/减一个常数\n", 138 | "df[\"C1\"]+1\n", 139 | 
"df[\"C1\"]-1" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "## 比较运算符" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 8, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "data": { 156 | "text/plain": [ 157 | "S1 True\n", 158 | "S2 True\n", 159 | "S3 True\n", 160 | "dtype: bool" 161 | ] 162 | }, 163 | "execution_count": 8, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "import pandas as pd\n", 170 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n", 171 | "#添加行索引\n", 172 | "df.index=[\"S1\",\"S2\",\"S3\"]\n", 173 | "df\n", 174 | "df[\"C1\"] > df[\"C2\"]\n", 175 | "df[\"C1\"] < df[\"C2\"]\n", 176 | "df[\"C1\"] != df[\"C2\"]" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "## 汇总运算" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "**count()非空值计数** \n", 191 | "非空值计数就是计算摸一个区域中非空数值的个数 \n", 192 | "默认是求每一列非空值的个数 \n", 193 | "修改axis=1可以计算每一行的非空值个数" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 9, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "C1 3\n", 205 | "C2 3\n", 206 | "C3 3\n", 207 | "dtype: int64" 208 | ] 209 | }, 210 | "execution_count": 9, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "import pandas as pd\n", 217 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n", 218 | "#添加行索引\n", 219 | "df.index=[\"S1\",\"S2\",\"S3\"]\n", 220 | "#计算每一列的非空个数\n", 221 | "df.count()" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "S1 3\n", 233 | "S2 3\n", 234 | "S3 3\n", 235 | "dtype: int64" 236 | ] 237 | }, 238 | "execution_count": 10, 239 | 
"metadata": {}, 240 | "output_type": "execute_result" 241 | } 242 | ], 243 | "source": [ 244 | "#计算每一行的非空值个数\n", 245 | "df.count(axis =1)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "**sum()求和**" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 11, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "data": { 262 | "text/plain": [ 263 | "C1 12\n", 264 | "C2 15\n", 265 | "C3 18\n", 266 | "dtype: int64" 267 | ] 268 | }, 269 | "execution_count": 11, 270 | "metadata": {}, 271 | "output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "#默认对每一列求和\n", 276 | "df.sum()" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 12, 282 | "metadata": {}, 283 | "outputs": [ 284 | { 285 | "data": { 286 | "text/plain": [ 287 | "S1 6\n", 288 | "S2 15\n", 289 | "S3 24\n", 290 | "dtype: int64" 291 | ] 292 | }, 293 | "execution_count": 12, 294 | "metadata": {}, 295 | "output_type": "execute_result" 296 | } 297 | ], 298 | "source": [ 299 | "#添加参数axis对每一行求和\n", 300 | "df.sum(axis = 1)" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 13, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "data": { 310 | "text/plain": [ 311 | "12" 312 | ] 313 | }, 314 | "execution_count": 13, 315 | "metadata": {}, 316 | "output_type": "execute_result" 317 | } 318 | ], 319 | "source": [ 320 | "#对具体某一列求和\n", 321 | "df[\"C1\"].sum()" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "**mean()求均值** \n", 329 | "求均值就是对某一区域中的所有值进行算数平均值运算" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 14, 335 | "metadata": {}, 336 | "outputs": [ 337 | { 338 | "data": { 339 | "text/plain": [ 340 | "C1 4.0\n", 341 | "C2 5.0\n", 342 | "C3 6.0\n", 343 | "dtype: float64" 344 | ] 345 | }, 346 | "execution_count": 14, 347 | "metadata": {}, 348 | "output_type": "execute_result" 349 | } 350 | ], 351 
| "source": [ 352 | "# By default, take the mean of each column\n", 353 | "df.mean()" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 15, 359 | "metadata": {}, 360 | "outputs": [ 361 | { 362 | "data": { 363 | "text/plain": [ 364 | "S1 2.0\n", 365 | "S2 5.0\n", 366 | "S3 8.0\n", 367 | "dtype: float64" 368 | ] 369 | }, 370 | "execution_count": 15, 371 | "metadata": {}, 372 | "output_type": "execute_result" 373 | } 374 | ], 375 | "source": [ 376 | "# Take the mean of each row\n", 377 | "df.mean( axis =1)" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 16, 383 | "metadata": {}, 384 | "outputs": [ 385 | { 386 | "data": { 387 | "text/plain": [ 388 | "4.0" 389 | ] 390 | }, 391 | "execution_count": 16, 392 | "metadata": {}, 393 | "output_type": "execute_result" 394 | } 395 | ], 396 | "source": [ 397 | "# Take the mean of one specific column\n", 398 | "df[\"C1\"].mean()" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "**max(): maximum**" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": 17, 411 | "metadata": {}, 412 | "outputs": [ 413 | { 414 | "data": { 415 | "text/plain": [ 416 | "C1 7\n", 417 | "C2 8\n", 418 | "C3 9\n", 419 | "dtype: int64" 420 | ] 421 | }, 422 | "execution_count": 17, 423 | "metadata": {}, 424 | "output_type": "execute_result" 425 | } 426 | ], 427 | "source": [ 428 | "# By default, return the maximum of each column\n", 429 | "df.max()" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 18, 435 | "metadata": {}, 436 | "outputs": [ 437 | { 438 | "data": { 439 | "text/plain": [ 440 | "S1 3\n", 441 | "S2 6\n", 442 | "S3 9\n", 443 | "dtype: int64" 444 | ] 445 | }, 446 | "execution_count": 18, 447 | "metadata": {}, 448 | "output_type": "execute_result" 449 | } 450 | ], 451 | "source": [ 452 | "# Return the maximum of each row\n", 453 | "df.max( axis =1)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 19, 459 | "metadata": {}, 460 | "outputs": [ 461 | { 462 | "data": { 463 | "text/plain": [ 464 |
"7" 465 | ] 466 | }, 467 | "execution_count": 19, 468 | "metadata": {}, 469 | "output_type": "execute_result" 470 | } 471 | ], 472 | "source": [ 473 | "# 对某一列求最大值\n", 474 | "df[\"C1\"].max()" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "**min()求最小值使用方法和max()一致**" 482 | ] 483 | }, 484 | { 485 | "cell_type": "markdown", 486 | "metadata": {}, 487 | "source": [ 488 | "**median()求中位数** \n", 489 | "中位数就是将一组含有n个数据的序列X按照从小到大排列,位于中间位置的那个数,使用方法和其他函数一致" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 20, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "data": { 499 | "text/plain": [ 500 | "C1 4.0\n", 501 | "C2 5.0\n", 502 | "C3 6.0\n", 503 | "dtype: float64" 504 | ] 505 | }, 506 | "execution_count": 20, 507 | "metadata": {}, 508 | "output_type": "execute_result" 509 | } 510 | ], 511 | "source": [ 512 | "df.median()" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "**mode()求众数** \n", 520 | "众数就是在一组数据中出现次数最多的数,使用方法与其他函数一致" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 26, 526 | "metadata": {}, 527 | "outputs": [ 528 | { 529 | "data": { 530 | "text/html": [ 531 | "
\n", 532 | "\n", 545 | "\n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | "
\n", 563 | "
" 564 | ], 565 | "text/plain": [ 566 | " C1 C2 C3\n", 567 | "0 1 1 3" 568 | ] 569 | }, 570 | "execution_count": 26, 571 | "metadata": {}, 572 | "output_type": "execute_result" 573 | } 574 | ], 575 | "source": [ 576 | "import pandas as pd\n", 577 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name=1)\n", 578 | "df.index=[\"S1\",\"S2\",\"S3\"]\n", 579 | "df.mode()" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "**var()求方差✩** \n", 587 | "方差是用来衡量一组数据离散程度的,使用方法与其他函数一致 \n", 588 | "**std()求标准差✩** \n", 589 | "标准差是方差的平方根,二者都是用来表示数据的离散程度的,使用方法与其他函数一致" 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": {}, 595 | "source": [ 596 | "**quantile()求分数位** \n", 597 | "分数位是比中数位更加详细的基于位置的指标,有四分之一分数位、四分之二分数位、四分之三分数位,而四分之二分数位就是中数位。\n" 598 | ] 599 | }, 600 | { 601 | "cell_type": "code", 602 | "execution_count": 27, 603 | "metadata": {}, 604 | "outputs": [ 605 | { 606 | "data": { 607 | "text/plain": [ 608 | "C1 4.0\n", 609 | "C2 5.0\n", 610 | "C3 6.0\n", 611 | "Name: 0.25, dtype: float64" 612 | ] 613 | }, 614 | "execution_count": 27, 615 | "metadata": {}, 616 | "output_type": "execute_result" 617 | } 618 | ], 619 | "source": [ 620 | "import pandas as pd\n", 621 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name=2)\n", 622 | "df.index=[\"S1\",\"S2\",\"S3\",\"S4\",\"S5\"]\n", 623 | "df\n", 624 | "df.quantile(0.25)#求四分之一分数位" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 13, 630 | "metadata": {}, 631 | "outputs": [ 632 | { 633 | "data": { 634 | "text/plain": [ 635 | "C1 10.0\n", 636 | "C2 11.0\n", 637 | "C3 12.0\n", 638 | "Name: 0.75, dtype: float64" 639 | ] 640 | }, 641 | "execution_count": 13, 642 | "metadata": {}, 643 | "output_type": "execute_result" 644 | } 645 | ], 646 | "source": [ 647 | "df.quantile(0.75)#求四分之三分数位" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": 15, 653 | "metadata": {}, 654 | "outputs": [ 655 | { 656 | "data": { 
657 | "text/plain": [ 658 | "S1 1.5\n", 659 | "S2 4.5\n", 660 | "S3 7.5\n", 661 | "S4 10.5\n", 662 | "S5 13.5\n", 663 | "Name: 0.25, dtype: float64" 664 | ] 665 | }, 666 | "execution_count": 15, 667 | "metadata": {}, 668 | "output_type": "execute_result" 669 | } 670 | ], 671 | "source": [ 672 | "df.quantile(0.25,axis = 1) # first quartile of each row" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "metadata": {}, 678 | "source": [ 679 | "## Correlation operations✩\n", 680 | "Correlation is often used to measure how strongly two things are related; compute it with the corr() function." 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "execution_count": 17, 686 | "metadata": {}, 687 | "outputs": [ 688 | { 689 | "data": { 690 | "text/html": [ 691 | "
\n", 692 | "\n", 705 | "\n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | "
\n", 735 | "
" 736 | ], 737 | "text/plain": [ 738 | " C1 C2 C3\n", 739 | "C1 1.0 1.0 1.0\n", 740 | "C2 1.0 1.0 1.0\n", 741 | "C3 1.0 1.0 1.0" 742 | ] 743 | }, 744 | "execution_count": 17, 745 | "metadata": {}, 746 | "output_type": "execute_result" 747 | } 748 | ], 749 | "source": [ 750 | "df.corr()" 751 | ] 752 | } 753 | ], 754 | "metadata": { 755 | "kernelspec": { 756 | "display_name": "Python 3", 757 | "language": "python", 758 | "name": "python3" 759 | }, 760 | "language_info": { 761 | "codemirror_mode": { 762 | "name": "ipython", 763 | "version": 3 764 | }, 765 | "file_extension": ".py", 766 | "mimetype": "text/x-python", 767 | "name": "python", 768 | "nbconvert_exporter": "python", 769 | "pygments_lexer": "ipython3", 770 | "version": "3.7.0" 771 | }, 772 | "toc": { 773 | "base_numbering": 1, 774 | "nav_menu": {}, 775 | "number_sections": true, 776 | "sideBar": true, 777 | "skip_h1_title": false, 778 | "title_cell": "Table of Contents", 779 | "title_sidebar": "第8章 数据运算", 780 | "toc_cell": false, 781 | "toc_position": { 782 | "height": "calc(100% - 180px)", 783 | "left": "10px", 784 | "top": "150px", 785 | "width": "320px" 786 | }, 787 | "toc_section_display": true, 788 | "toc_window_display": true 789 | } 790 | }, 791 | "nbformat": 4, 792 | "nbformat_minor": 2 793 | } 794 | -------------------------------------------------------------------------------- /Code/Chapter09 时间序列.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 时间序列" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 获取当前时刻的时间" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "**返回当前时刻的日期和时间**" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 76, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/plain": [ 32 | "datetime.datetime(2019, 3, 14, 
15, 57, 43, 307645)" 33 | ] 34 | }, 35 | "execution_count": 76, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "from datetime import datetime\n", 42 | "datetime.now()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "**Return the current year, month, and day separately**" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 77, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "2019" 61 | ] 62 | }, 63 | "execution_count": 77, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "datetime.now().year " 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "3" 81 | ] 82 | }, 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "datetime.now().month" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 78, 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/plain": [ 100 | "14" 101 | ] 102 | }, 103 | "execution_count": 78, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "datetime.now().day" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "**Return the current weekday and week number**" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 6, 122 | "metadata": {}, 123 | "outputs": [ 124 | { 125 | "data": { 126 | "text/plain": [ 127 | "7" 128 | ] 129 | }, 130 | "execution_count": 6, 131 | "metadata": {}, 132 | "output_type": "execute_result" 133 | } 134 | ], 135 | "source": [ 136 | "# Return the day of the week; weekday() counts from 0 (Monday), so add 1\n", 137 | "datetime.now().weekday()+1" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 9, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "data": { 147 | "text/plain":
[ 148 | "(2019, 10, 7)" 149 | ] 150 | }, 151 | "execution_count": 9, 152 | "metadata": {}, 153 | "output_type": "execute_result" 154 | } 155 | ], 156 | "source": [ 157 | "# Return the ISO calendar tuple (year, week number, weekday)\n", 158 | "datetime.now().isocalendar()\n", 159 | "# i.e. day 7 of week 10 of 2019" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 11, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "10" 171 | ] 172 | }, 173 | "execution_count": 11, 174 | "metadata": {}, 175 | "output_type": "execute_result" 176 | } 177 | ], 178 | "source": [ 179 | "# Return only the week number\n", 180 | "datetime.now().isocalendar()[1]" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "**Specifying date and time formats** \n", 188 | "- date() returns only the date \n", 189 | "- time() returns only the time \n", 190 | "- strftime() formats the date and time with a custom pattern" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 12, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "data": { 200 | "text/plain": [ 201 | "datetime.date(2019, 3, 10)" 202 | ] 203 | }, 204 | "execution_count": 12, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "datetime.now().date()" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 13, 216 | "metadata": { 217 | "scrolled": true 218 | }, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/plain": [ 223 | "datetime.time(22, 11, 41, 36684)" 224 | ] 225 | }, 226 | "execution_count": 13, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "datetime.now().time()" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "Format codes accepted by strftime() \n", 240 | "\n", 241 | "Code | Meaning\n", 242 | "---|---\n", 243 | "%H | Hour (24-hour clock) [00,23]\n", 244 | "%I | Hour (12-hour clock) [01,12]\n", 245 | "%M | Two-digit minute [00,59]\n", 246 | "%S | Second \\[00,61] (60 and 61 account for leap seconds)\n", 247 | "%w | Weekday as an integer, starting from 0 (Sunday)\n", 248 | "%U | Week number of the year, with Sunday as the first day of the week\n", 249 |
"%U | 每年的第几周,周一被认为每周第一天\n", 250 | "%F | %Y-%m-%d的简写形式,例如2018-04-18\n", 251 | "%D | %m/%d/%y的简写形式,例如04/18/2018" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 14, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "data": { 261 | "text/plain": [ 262 | "'2019-03-10'" 263 | ] 264 | }, 265 | "execution_count": 14, 266 | "metadata": {}, 267 | "output_type": "execute_result" 268 | } 269 | ], 270 | "source": [ 271 | "datetime.now().strftime(\"%Y-%m-%d\")" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "## 字符串和时间格式相互转换" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "**将时间格式转换为字符串格式** \n", 286 | "使用str()函数" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 79, 292 | "metadata": {}, 293 | "outputs": [ 294 | { 295 | "data": { 296 | "text/plain": [ 297 | "str" 298 | ] 299 | }, 300 | "execution_count": 79, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "from datetime import datetime\n", 307 | "now = datetime.now()\n", 308 | "now\n", 309 | "type(now)\n", 310 | "type(str(now))" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "**将字符串格式转换为时间格式** \n", 318 | "使用parse()函数" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 11, 324 | "metadata": {}, 325 | "outputs": [ 326 | { 327 | "data": { 328 | "text/plain": [ 329 | "datetime.datetime" 330 | ] 331 | }, 332 | "execution_count": 11, 333 | "metadata": {}, 334 | "output_type": "execute_result" 335 | } 336 | ], 337 | "source": [ 338 | "from dateutil.parser import parse\n", 339 | "str_time = \"2019-03-11\"\n", 340 | "type(str_time)\n", 341 | "parse(str_time)\n", 342 | "type(parse(str_time))" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## 时间索引 \n", 350 | 
"时间索引就是根据时间来对时间格式的字段进行数据选取的一种索引方式。" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 4, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/html": [ 361 | "
" 426 | ], 427 | "text/plain": [ 428 | " num\n", 429 | "2018-01-01 1\n", 430 | "2018-01-02 2\n", 431 | "2018-01-03 3\n", 432 | "2018-01-04 4\n", 433 | "2018-01-05 5\n", 434 | "2018-01-06 6\n", 435 | "2018-01-07 7\n", 436 | "2018-01-08 8\n", 437 | "2018-01-09 9\n", 438 | "2018-01-10 10" 439 | ] 440 | }, 441 | "execution_count": 4, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "import pandas as pd\n", 448 | "import numpy as np\n", 449 | "index = pd.DatetimeIndex(['2018-01-01','2018-01-02','2018-01-03','2018-01-04','2018-01-05',\n", 450 | " '2018-01-06','2018-01-07','2018-01-08','2018-01-09','2018-01-10'])\n", 451 | "data = pd.DataFrame(np.arange(1,11),columns =[\"num\"],index = index)\n", 452 | "data" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 5, 458 | "metadata": {}, 459 | "outputs": [ 460 | { 461 | "data": { 462 | "text/html": [ 463 | "
" 528 | ], 529 | "text/plain": [ 530 | " num\n", 531 | "2018-01-01 1\n", 532 | "2018-01-02 2\n", 533 | "2018-01-03 3\n", 534 | "2018-01-04 4\n", 535 | "2018-01-05 5\n", 536 | "2018-01-06 6\n", 537 | "2018-01-07 7\n", 538 | "2018-01-08 8\n", 539 | "2018-01-09 9\n", 540 | "2018-01-10 10" 541 | ] 542 | }, 543 | "execution_count": 5, 544 | "metadata": {}, 545 | "output_type": "execute_result" 546 | } 547 | ], 548 | "source": [ 549 | "# Select the rows from 2018 (use .loc; recent pandas versions no longer accept data[\"2018\"] for row selection)\n", 550 | "data.loc[\"2018\"]" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": 6, 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "data": { 560 | "text/html": [ 561 | "
" 626 | ], 627 | "text/plain": [ 628 | " num\n", 629 | "2018-01-01 1\n", 630 | "2018-01-02 2\n", 631 | "2018-01-03 3\n", 632 | "2018-01-04 4\n", 633 | "2018-01-05 5\n", 634 | "2018-01-06 6\n", 635 | "2018-01-07 7\n", 636 | "2018-01-08 8\n", 637 | "2018-01-09 9\n", 638 | "2018-01-10 10" 639 | ] 640 | }, 641 | "execution_count": 6, 642 | "metadata": {}, 643 | "output_type": "execute_result" 644 | } 645 | ], 646 | "source": [ 647 | "# Select the rows from January 2018 (use .loc; recent pandas versions no longer accept data[\"2018-01\"] for row selection)\n", 648 | "data.loc[\"2018-01\"]" 649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": 8, 654 | "metadata": {}, 655 | "outputs": [ 656 | { 657 | "data": { 658 | "text/html": [ 659 | "
" 704 | ], 705 | "text/plain": [ 706 | " num\n", 707 | "2018-01-01 1\n", 708 | "2018-01-02 2\n", 709 | "2018-01-03 3\n", 710 | "2018-01-04 4\n", 711 | "2018-01-05 5" 712 | ] 713 | }, 714 | "execution_count": 8, 715 | "metadata": {}, 716 | "output_type": "execute_result" 717 | } 718 | ], 719 | "source": [ 720 | "# Select the rows from 2018-01-01 through 2018-01-05 (both endpoints included)\n", 721 | "data[\"2018-01-01\":\"2018-01-05\"]" 722 | ] 723 | }, 724 | { 725 | "cell_type": "code", 726 | "execution_count": 3, 727 | "metadata": {}, 728 | "outputs": [ 729 | { 730 | "data": { 731 | "text/html": [ 732 | "
" 798 | ], 799 | "text/plain": [ 800 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n", 801 | "1 A2 李谷 102 45 2018-08-09 2\n", 802 | "2 A3 孙凤 103 23 2018-08-10 1\n", 803 | "3 A4 赵恒 104 240 2018-08-11 2\n", 804 | "4 A5 王娜 105 21 2018-08-11 3" 805 | ] 806 | }, 807 | "execution_count": 3, 808 | "metadata": {}, 809 | "output_type": "execute_result" 810 | } 811 | ], 812 | "source": [ 813 | "import pandas as pd\n", 814 | "from datetime import datetime\n", 815 | "df = pd.read_excel(r\"../Data/Chapter06.xlsx\",sheet_name = 4)\n", 816 | "df[df[\"成交时间\"]>datetime(2018,8,8)]" 817 | ] 818 | }, 819 | { 820 | "cell_type": "code", 821 | "execution_count": 4, 822 | "metadata": {}, 823 | "outputs": [ 824 | { 825 | "data": { 826 | "text/html": [ 827 | "
" 866 | ], 867 | "text/plain": [ 868 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n", 869 | "0 A1 张通 101 31 2018-08-08 1" 870 | ] 871 | }, 872 | "execution_count": 4, 873 | "metadata": {}, 874 | "output_type": "execute_result" 875 | } 876 | ], 877 | "source": [ 878 | "df[df[\"成交时间\"] == datetime(2018,8,8)]" 879 | ] 880 | }, 881 | { 882 | "cell_type": "code", 883 | "execution_count": 26, 884 | "metadata": {}, 885 | "outputs": [ 886 | { 887 | "data": { 888 | "text/html": [ 889 | "
" 928 | ], 929 | "text/plain": [ 930 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n", 931 | "0 A1 张通 101 31 2018-08-08 1" 932 | ] 933 | }, 934 | "execution_count": 26, 935 | "metadata": {}, 936 | "output_type": "execute_result" 937 | } 938 | ], 939 | "source": [ 940 | "df[df[\"成交时间\"]\n", 952 | "\n", 965 | "\n", 966 | " \n", 967 | " \n", 968 | " \n", 969 | " \n", 970 | " \n", 971 | " \n", 972 | " \n", 973 | " \n", 974 | " \n", 975 | " \n", 976 | " \n", 977 | " \n", 978 | " \n", 979 | " \n", 980 | " \n", 981 | " \n", 982 | " \n", 983 | " \n", 984 | " \n", 985 | " \n", 986 | " \n", 987 | " \n", 988 | " \n", 989 | " \n", 990 | " \n", 991 | " \n", 992 | " \n", 993 | " \n", 994 | " \n", 995 | " \n", 996 | " \n", 997 | "
\n", 998 | "" 999 | ], 1000 | "text/plain": [ 1001 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n", 1002 | "1 A2 李谷 102 45 2018-08-09 2\n", 1003 | "2 A3 孙凤 103 23 2018-08-10 1" 1004 | ] 1005 | }, 1006 | "execution_count": 29, 1007 | "metadata": {}, 1008 | "output_type": "execute_result" 1009 | } 1010 | ], 1011 | "source": [ 1012 | "df[(df[\"成交时间\"]>datetime(2018,8,8))&(df[\"成交时间\"]< datetime(2018,8,11))]" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "markdown", 1017 | "metadata": {}, 1018 | "source": [ 1019 | "## Time arithmetic" 1020 | ] 1021 | }, 1022 | { 1023 | "cell_type": "markdown", 1024 | "metadata": {}, 1025 | "source": [ 1026 | "**Difference between two datetimes**" 1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "code", 1031 | "execution_count": 30, 1032 | "metadata": {}, 1033 | "outputs": [ 1034 | { 1035 | "data": { 1036 | "text/plain": [ 1037 | "datetime.timedelta(days=2, seconds=83880)" 1038 | ] 1039 | }, 1040 | "execution_count": 30, 1041 | "metadata": {}, 1042 | "output_type": "execute_result" 1043 | } 1044 | ], 1045 | "source": [ 1046 | "cha = datetime(2018,5,21,19,50)-datetime(2018,5,18,20,32)\n", 1047 | "cha" 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": 31, 1053 | "metadata": {}, 1054 | "outputs": [ 1055 | { 1056 | "data": { 1057 | "text/plain": [ 1058 | "2" 1059 | ] 1060 | }, 1061 | "execution_count": 31, 1062 | "metadata": {}, 1063 | "output_type": "execute_result" 1064 | } 1065 | ], 1066 | "source": [ 1067 | "# Whole days in the difference\n", 1068 | "cha.days" 1069 | ] 1070 | }, 1071 | { 1072 | "cell_type": "code", 1073 | "execution_count": 33, 1074 | "metadata": {}, 1075 | "outputs": [ 1076 | { 1077 | "data": { 1078 | "text/plain": [ 1079 | "83880" 1080 | ] 1081 | }, 1082 | "execution_count": 33, 1083 | "metadata": {}, 1084 | "output_type": "execute_result" 1085 | } 1086 | ], 1087 | "source": [ 1088 | "# Remaining seconds beyond the whole days\n", 1089 | "cha.seconds" 1090 | ] 1091 | }, 1092 | { 1093 | "cell_type": "code", 1094 | "execution_count": 35, 1095 | "metadata": {}, 1096 | "outputs": [ 1097 | { 1098 |
"data": { 1099 | "text/plain": [ 1100 | "23.3" 1101 | ] 1102 | }, 1103 | "execution_count": 35, 1104 | "metadata": {}, 1105 | "output_type": "execute_result" 1106 | } 1107 | ], 1108 | "source": [ 1109 | "# Convert the seconds component to hours (this excludes the 2 whole days; cha.total_seconds()/3600 gives the full difference)\n", 1110 | "cha.seconds/3600" 1111 | ] 1112 | }, 1113 | { 1114 | "cell_type": "markdown", 1115 | "metadata": {}, 1116 | "source": [ 1117 | "**Time offsets**\n", 1118 | "- timedelta stores an offset only as days, seconds, and microseconds (its constructor also accepts weeks, hours, minutes, and milliseconds and converts them)\n", 1119 | "- pandas date offsets can shift directly by days, hours, or minutes" 1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "markdown", 1124 | "metadata": {}, 1125 | "source": [ 1126 | "**timedelta**" 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": 43, 1132 | "metadata": {}, 1133 | "outputs": [ 1134 | { 1135 | "data": { 1136 | "text/plain": [ 1137 | "datetime.datetime(2019, 3, 14, 15, 39, 55, 130084)" 1138 | ] 1139 | }, 1140 | "execution_count": 43, 1141 | "metadata": {}, 1142 | "output_type": "execute_result" 1143 | } 1144 | ], 1145 | "source": [ 1146 | "from datetime import timedelta,datetime\n", 1147 | "date = datetime.now()\n", 1148 | "date" 1149 | ] 1150 | }, 1151 | { 1152 | "cell_type": "code", 1153 | "execution_count": 51, 1154 | "metadata": {}, 1155 | "outputs": [ 1156 | { 1157 | "data": { 1158 | "text/plain": [ 1159 | "datetime.datetime(2019, 3, 15, 15, 39, 55, 130084)" 1160 | ] 1161 | }, 1162 | "execution_count": 51, 1163 | "metadata": {}, 1164 | "output_type": "execute_result" 1165 | } 1166 | ], 1167 | "source": [ 1168 | "# Shift forward by one day\n", 1169 | "date+timedelta(days =1)" 1170 | ] 1171 | }, 1172 | { 1173 | "cell_type": "code", 1174 | "execution_count": 50, 1175 | "metadata": {}, 1176 | "outputs": [ 1177 | { 1178 | "data": { 1179 | "text/plain": [ 1180 | "datetime.datetime(2019, 3, 14, 15, 40, 55, 130084)" 1181 | ] 1182 | }, 1183 | "execution_count": 50, 1184 | "metadata": {}, 1185 | "output_type": "execute_result" 1186 | } 1187 | ], 1188 | "source": [ 1189 | "# Shift forward by 60 seconds\n", 1190 | "date+timedelta(seconds = 60)" 1191 | ] 1192 | }, 1193 | { 1194 | "cell_type":
"code", 1195 | "execution_count": 52, 1196 | "metadata": {}, 1197 | "outputs": [ 1198 | { 1199 | "data": { 1200 | "text/plain": [ 1201 | "datetime.datetime(2019, 3, 13, 15, 39, 55, 130084)" 1202 | ] 1203 | }, 1204 | "execution_count": 52, 1205 | "metadata": {}, 1206 | "output_type": "execute_result" 1207 | } 1208 | ], 1209 | "source": [ 1210 | "# Shift back by one day\n", 1211 | "date - timedelta(days =1)" 1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "markdown", 1216 | "metadata": {}, 1217 | "source": [ 1218 | "**date offset**" 1219 | ] 1220 | }, 1221 | { 1222 | "cell_type": "code", 1223 | "execution_count": 74, 1224 | "metadata": {}, 1225 | "outputs": [ 1226 | { 1227 | "data": { 1228 | "text/plain": [ 1229 | "datetime.datetime(2019, 3, 14, 15, 57, 32, 786664)" 1230 | ] 1231 | }, 1232 | "execution_count": 74, 1233 | "metadata": {}, 1234 | "output_type": "execute_result" 1235 | } 1236 | ], 1237 | "source": [ 1238 | "from pandas.tseries.offsets import Hour,Minute,Day,MonthEnd\n", 1239 | "date = datetime.now()\n", 1240 | "date" 1241 | ] 1242 | }, 1243 | { 1244 | "cell_type": "code", 1245 | "execution_count": 67, 1246 | "metadata": {}, 1247 | "outputs": [ 1248 | { 1249 | "data": { 1250 | "text/plain": [ 1251 | "Timestamp('2019-03-15 15:54:23.875623')" 1252 | ] 1253 | }, 1254 | "execution_count": 67, 1255 | "metadata": {}, 1256 | "output_type": "execute_result" 1257 | } 1258 | ], 1259 | "source": [ 1260 | "# Shift forward by one day\n", 1261 | "date+Day(1)" 1262 | ] 1263 | }, 1264 | { 1265 | "cell_type": "code", 1266 | "execution_count": 70, 1267 | "metadata": {}, 1268 | "outputs": [ 1269 | { 1270 | "data": { 1271 | "text/plain": [ 1272 | "Timestamp('2019-03-14 16:54:23.875623')" 1273 | ] 1274 | }, 1275 | "execution_count": 70, 1276 | "metadata": {}, 1277 | "output_type": "execute_result" 1278 | } 1279 | ], 1280 | "source": [ 1281 | "# Shift forward by 1 hour\n", 1282 | "date+Hour(1)" 1283 | ] 1284 | }, 1285 | { 1286 | "cell_type": "code", 1287 | "execution_count": 71, 1288 | "metadata": {}, 1289 | "outputs": [ 1290 | {
1291 | "data": { 1292 | "text/plain": [ 1293 | "Timestamp('2019-03-14 16:04:23.875623')" 1294 | ] 1295 | }, 1296 | "execution_count": 71, 1297 | "metadata": {}, 1298 | "output_type": "execute_result" 1299 | } 1300 | ], 1301 | "source": [ 1302 | "# Shift forward by 10 minutes\n", 1303 | "date+Minute(10)" 1304 | ] 1305 | }, 1306 | { 1307 | "cell_type": "code", 1308 | "execution_count": 75, 1309 | "metadata": {}, 1310 | "outputs": [ 1311 | { 1312 | "data": { 1313 | "text/plain": [ 1314 | "Timestamp('2019-03-31 15:57:32.786664')" 1315 | ] 1316 | }, 1317 | "execution_count": 75, 1318 | "metadata": {}, 1319 | "output_type": "execute_result" 1320 | } 1321 | ], 1322 | "source": [ 1323 | "# Roll forward to the end of the month\n", 1324 | "date+MonthEnd(1)" 1325 | ] 1326 | } 1327 | ], 1328 | "metadata": { 1329 | "kernelspec": { 1330 | "display_name": "Python 3", 1331 | "language": "python", 1332 | "name": "python3" 1333 | }, 1334 | "language_info": { 1335 | "codemirror_mode": { 1336 | "name": "ipython", 1337 | "version": 3 1338 | }, 1339 | "file_extension": ".py", 1340 | "mimetype": "text/x-python", 1341 | "name": "python", 1342 | "nbconvert_exporter": "python", 1343 | "pygments_lexer": "ipython3", 1344 | "version": "3.7.0" 1345 | }, 1346 | "toc": { 1347 | "base_numbering": 1, 1348 | "nav_menu": {}, 1349 | "number_sections": true, 1350 | "sideBar": true, 1351 | "skip_h1_title": false, 1352 | "title_cell": "Table of Contents", 1353 | "title_sidebar": "Chapter 9 Time Series", 1354 | "toc_cell": false, 1355 | "toc_position": {}, 1356 | "toc_section_display": true, 1357 | "toc_window_display": true 1358 | } 1359 | }, 1360 | "nbformat": 4, 1361 | "nbformat_minor": 2 1362 | } 1363 | -------------------------------------------------------------------------------- /Code/Chapter12 结果导出.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Exporting to an .xlsx file" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata":
{}, 13 | "source": [ 14 | "**Setting the export path**" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 47, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0 )\n", 25 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档01.xlsx\")" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "**Setting the sheet name**" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 48, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档02.xlsx\",\n", 42 | " sheet_name =\"测试\")" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "**Controlling whether the index is written**" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 46, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档03.xlsx\",\n", 59 | " index = False)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "**Choosing the columns to export**" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 45, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0 )\n", 76 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档04.xlsx\",\n", 77 | " sheet_name = \"测试文档\",\n", 78 | " index=False,columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"])" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "**Setting the encoding** (the encoding argument applies to older pandas; pandas 2.0 and later no longer accept it in to_excel, so drop it there)" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 43, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [ 94 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档05.xlsx\",\n", 95 | " sheet_name =
\"测试文档\",\n", 96 | " index = False,\n", 97 | " encoding = \"utf-8\")" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "**Handling missing values**" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 42, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n", 114 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档06.xlsx\",\n", 115 | " sheet_name=\"测试文档\",\n", 116 | " index = False,\n", 117 | " encoding = \"utf-8\",\n", 118 | " na_rep = 0 # write missing values as 0\n", 119 | " )" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "**Handling infinite values**" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 55, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =1)\n", 136 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档07.xlsx\",\n", 137 | " sheet_name = \"测试文档\",\n", 138 | " index = False,\n", 139 | " encoding = \"utf-8\",\n", 140 | " na_rep = 0,\n", 141 | " inf_rep = 0 # write infinite values as 0\n", 142 | " )" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "## Exporting to a .csv file" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "**Setting the export path**" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 82, 162 | "metadata": {}, 163 | "outputs": [], 164 | "source": [ 165 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0)\n", 166 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档01.csv\" )" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "**Controlling whether the index is written**" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 64, 179 | "metadata": {}, 180 |
"outputs": [], 181 | "source": [ 182 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档02.csv\",\n", 183 | " index = False )" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "**Choosing the columns to export**" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 83, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档03.csv\" ,\n", 200 | " index= False,\n", 201 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"])" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "**Setting the delimiter**" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 77, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档04.csv\" ,\n", 218 | " index= False,\n", 219 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n", 220 | " sep=\",\")" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "**Handling missing values**" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 75, 233 | "metadata": {}, 234 | "outputs": [], 235 | "source": [ 236 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n", 237 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档05.csv\" ,\n", 238 | " index= False,\n", 239 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n", 240 | " sep=\",\",\n", 241 | " na_rep = 0)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "**Setting the encoding**" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 81, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n", 258 | "df.to_csv(path_or_buf =
r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档06.csv\" ,\n", 259 | " index= False,\n", 260 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n", 261 | " sep=\",\",\n", 262 | " na_rep = 0,\n", 263 | " encoding = \"gbk\" # use gbk or utf-8-sig\n", 264 | " )" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "## Exporting to multiple sheets" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 80, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "df1 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0)\n", 281 | "df2 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =1)\n", 282 | "df3 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n", 283 | "# Create the writer object\n", 284 | "writer = pd.ExcelWriter(r\"C:\\Users\\Administrator\\Excel-Python\\Data\\test02.xlsx\",\n", 285 | " engine = \"xlsxwriter\")\n", 286 | "# Write df1, df2, and df3 to three separate sheets,\n", 287 | "# named 表1, 表2, and 表3\n", 288 | "df1.to_excel(writer,sheet_name =\"表1\")\n", 289 | "df2.to_excel(writer,sheet_name =\"表2\")\n", 290 | "df3.to_excel(writer,sheet_name =\"表3\")\n", 291 | "# Write the workbook to disk (close() also works on pandas 2.x, where save() was removed)\n", 292 | "writer.close()" 293 | ] 294 | } 295 | ], 296 | "metadata": { 297 | "kernelspec": { 298 | "display_name": "Python 3", 299 | "language": "python", 300 | "name": "python3" 301 | }, 302 | "language_info": { 303 | "codemirror_mode": { 304 | "name": "ipython", 305 | "version": 3 306 | }, 307 | "file_extension": ".py", 308 | "mimetype": "text/x-python", 309 | "name": "python", 310 | "nbconvert_exporter": "python", 311 | "pygments_lexer": "ipython3", 312 | "version": "3.7.0" 313 | }, 314 | "toc": { 315 | "base_numbering": 1, 316 | "nav_menu": {}, 317 | "number_sections": true, 318 | "sideBar": true, 319 | "skip_h1_title": false, 320 | "title_cell": "Table of Contents", 321 | "title_sidebar": "Chapter 12 Exporting Results", 322 | "toc_cell": false, 323 | "toc_position": {}, 324 | "toc_section_display": true, 325 | "toc_window_display": true 326 | } 327 |
}, 328 | "nbformat": 4, 329 | "nbformat_minor": 2 330 | } 331 | -------------------------------------------------------------------------------- /Data/Chapter04.1.csv: -------------------------------------------------------------------------------- 1 | 编号 年龄 性别 注册时间 2 | A1 54 男 2018/8/8 3 | A2 16 女 2018/8/9 4 | A3 47 女 2018/8/10 5 | A4 41 男 2018/8/11 6 | -------------------------------------------------------------------------------- /Data/Chapter04.csv: -------------------------------------------------------------------------------- 1 | 编号,年龄,性别,注册时间 2 | A1,54,男,2018/8/8 3 | A2,16,女,2018/8/9 4 | A3,47,女,2018/8/10 5 | A4,41,男,2018/8/11 6 | -------------------------------------------------------------------------------- /Data/Chapter04.txt: -------------------------------------------------------------------------------- 1 | 编号,年龄,性别,注册时间 2 | A1,54,男,2018/8/8 3 | A2,16,女,2018/8/9 4 | A3,47,女,2018/8/10 5 | A4,41,男,2018/8/11 6 | -------------------------------------------------------------------------------- /Data/Chapter04.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter04.xlsx -------------------------------------------------------------------------------- /Data/Chapter05.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter05.xlsx -------------------------------------------------------------------------------- /Data/Chapter06.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter06.xlsx -------------------------------------------------------------------------------- /Data/Chapter07.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter07.xlsx -------------------------------------------------------------------------------- /Data/Chapter08.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter08.xlsx -------------------------------------------------------------------------------- /Data/Chapter10.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter10.xlsx -------------------------------------------------------------------------------- /Data/Chapter11.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter11.xlsx -------------------------------------------------------------------------------- /Data/Chapter12.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter12.xlsx -------------------------------------------------------------------------------- /Data/fillna.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/fillna.xlsx -------------------------------------------------------------------------------- /Data/loan.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/loan.csv 
-------------------------------------------------------------------------------- /Data/order-14.1.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/order-14.1.csv -------------------------------------------------------------------------------- /Data/order-14.3.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/order-14.3.csv -------------------------------------------------------------------------------- /Data/train-pivot.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/train-pivot.csv -------------------------------------------------------------------------------- /Data/数据集使用说明.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/数据集使用说明.txt -------------------------------------------------------------------------------- /Note/Git Fork开源项目如何同步更新.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/Git Fork开源项目如何同步更新.pdf -------------------------------------------------------------------------------- /Note/Markdown常用标签.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/Markdown常用标签.pdf -------------------------------------------------------------------------------- /Note/jupyter notebook导出pdf并支持中文.md: 
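--------------------------------------------------------------------------------
**Jupyter Notebook** is a great environment for data science work. I use Jupyter Notebook for nearly all of my data analysis projects and small exercises (it used to be called IPython Notebook, which is why the file extension is still .ipynb). I rarely needed the PDF export feature before, and when I finally did, I ran into a lot of pitfalls. Jupyter can export to .py, .html, .md, .pdf, and other formats.

![Export formats supported by Jupyter Notebook](https://upload-images.jianshu.io/upload_images/2473543-b37f85b5584364b9.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

In terms of appearance, the notebook rendered in the browser looks best; the exported HTML distorts code and hyperlinks badly. When I clicked *Download as -> PDF via LaTeX* in the web interface, it first reported that the Pandoc library was missing, so I ran pip install pandoc. After that it no longer complained about the missing library, but failed with
nbconvert failed: pdflatex not found on PATH or nbconvert failed: PDF creating failed, captured latex output. After some reading I switched to the command line. To avoid the error *'xelatex' is not recognized as an internal or external command, operable program or batch file*, MiKTeX has to be installed first. Download it from the [official site](https://miktex.org/download); on Windows, clicking Next through the installer is enough. The installer is about 190 MB, so installation takes a while. Once it is installed, the steps are:

### 1, Compile the ipynb file to tex
On the command line, change to the directory of the Jupyter file you want to convert and enter
**jupyter nbconvert --to latex yourNotebookName.ipynb**

![Compiling the ipynb file to a LaTeX file](https://upload-images.jianshu.io/upload_images/2473543-3066970796a6043b.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
A LaTeX file named **yourNotebookName.tex** now appears in that directory.
### 2, Edit the LaTeX file by hand
To support Chinese output, the tex file needs a small change. Open the generated LaTeX file in an editor (I use Notepad++),
and insert the following after **\documentclass{article}** (if that line is absent, insert it after \documentclass[11pt]{ctexart} instead):
```latex
\usepackage{fontspec, xunicode, xltxtra}
\setmainfont{Microsoft YaHei}
\usepackage{ctex}
```
![Editing the LaTeX file](https://upload-images.jianshu.io/upload_images/2473543-898fdf8271689505.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

### 3, Convert LaTeX to PDF
Then enter the following on the command line (my demo file is GeoCluster.tex):
```
xelatex yourNotebookName.tex
```
![Converting LaTeX to PDF on the command line](https://upload-images.jianshu.io/upload_images/2473543-6624da52f9d4d9d1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
If xelatex has never been run before, the first run installs some dependencies and is slower. When it finishes: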
![After the xelatex command finishes](https://upload-images.jianshu.io/upload_images/2473543-192ac8f3fe434b96.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
The output files appear in the folder:
![Final contents of the folder](https://upload-images.jianshu.io/upload_images/2473543-c7f89da3bad6866f.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
- .ipynb is our Jupyter file
- .tex is generated from the Jupyter notebook file
- .pdf is the final target file, generated from the .tex file
- .log, .out, and .aux are outputs and logs that LaTeX writes while building the PDF

To sum up, generating a PDF from a Jupyter notebook needs quite a few dependencies; on Windows, MiKTeX must be installed before the xelatex command is available. The steps are: compile the ipynb file to LaTeX, edit the tex file to support Chinese, and finally convert it to PDF.

The final result is shown below. It still does not match the direct rendering of the .ipynb in the browser, but it works better as a presentation format than the exported HTML and other formats.
![The generated PDF](https://upload-images.jianshu.io/upload_images/2473543-036c476dcddbbca0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

ps:
- The download and installation part now seems a bit brief to me; I may expand it later;
- Original post: [Jianshu link](https://www.jianshu.com/p/6b84a9631f8a)
- [Solution for Chinese support in MiKTeX](https://jingyan.baidu.com/article/ff411625e229d512e482379c.html)
- [Exporting a PDF containing Chinese from IPython Notebook](https://blog.csdn.net/weixin_42114013/article/details/81106797)
--------------------------------------------------------------------------------
/Note/pandas填充缺失值fillna()函数.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Filling missing values in pandas with fillna()**  \n",
    "Filling missing values is very common in everyday data processing. fillna() is commonly used in 8 ways:  \n",
    "- Fill with a constant\n",
    "- Fill with a dictionary\n",
    "- Fill with a computed value\n",
    "- Fill using a specific column\n",
    "- Fill with the previous/next value\n",
    "- Limit the number of fills\n",
    "- Set the fill direction\n",
    "- Modify the source data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师    NaN   98.0  100\n",
       "3   4  欧阳锋    NaN   95.0   85\n",
       "4   5  洪七公   98.0    NaN   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_excel(\"../Data/fillna.xlsx\")\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill with a constant"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师    0.0   98.0  100\n",
       "3   4  欧阳锋    0.0   95.0   85\n",
       "4   5  洪七公   98.0    0.0   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill with a dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师   80.0   98.0  100\n",
       "3   4  欧阳锋   80.0   95.0   85\n",
       "4   5  洪七公   98.0   90.0   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna({\"语文\":80,\"数学\":90})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill with a computed value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师   94.0   98.0  100\n",
       "3   4  欧阳锋   94.0   95.0   85\n",
       "4   5  洪七公   98.0   92.8   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(df.mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师  376.0   98.0  100\n",
       "3   4  欧阳锋  376.0   95.0   85\n",
       "4   5  洪七公   98.0  464.0   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(df.sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill using a specific column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师   90.5   98.0  100\n",
       "3   4  欧阳锋   90.5   95.0   85\n",
       "4   5  洪七公   98.0   90.5   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(df.mean()['外语'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill with the previous/next value  \n",
    "Set this through the method parameter:  \n",
    "- method = \"ffill\" or \"pad\": fill each missing value with the previous non-missing value\n",
    "- method = \"bfill\" or \"backfill\": fill each missing value with the next non-missing value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师  100.0   98.0  100\n",
       "3   4  欧阳锋  100.0   95.0   85\n",
       "4   5  洪七公   98.0   95.0   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(method=\"ffill\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Limit the number of fills"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名     语文     数学   外语\n",
       "0   1   郭靖   90.0   80.0   76\n",
       "1   2   黄蓉  100.0  100.0   98\n",
       "2   3  黄药师    NaN   98.0  100\n",
       "3   4  欧阳锋   98.0   95.0   85\n",
       "4   5  洪七公   98.0   91.0   96\n",
       "5   5  周伯通   88.0   91.0   88"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(method='bfill', limit=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill from the left or right by specifying the axis parameter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of the table; same rows as the text/plain output below]"
      ],
      "text/plain": [
       "   名次   姓名   语文   数学   外语\n",
       "0   1   郭靖   90   80   76\n",
       "1   2   黄蓉  100  100   98\n",
       "2   3  黄药师   98   98  100\n",
       "3   4  欧阳锋   95   95   85\n",
       "4   5  洪七公   98   96   96\n",
       "5   5  周伯通   88   91   88"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(method='bfill', axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify the source data by adding inplace = True  \n",
    "None of the 7 approaches above changes the source data. To modify the source data, add the parameter inplace = True."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Note/如何给 github 的开源项目提交 pull request.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/如何给 github 的开源项目提交 pull request.pdf
--------------------------------------------------------------------------------
/Note/常见的Python代码报错及解决方案.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/常见的Python代码报错及解决方案.pdf
--------------------------------------------------------------------------------
/Other/01 Pyecharts渲染图表 .ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Basic usage of the pyecharts library**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install pyecharts  \n",
    "pip install pyecharts "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting started"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pyecharts import Bar\n",
    "\n",
    "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
    "brands = df['品牌'].values  # brand names\n",
    "solds = df['已售'].values  # units sold\n",
    "bar = Bar(\"汽车各品牌销量\", \"这里是测试数据\")  # title: car sales by brand; subtitle: this is test data\n",
    "bar.add(\"销量\", brands, solds)  # series name: sales\n",
    "# bar.print_echarts_options()  # prints the options only; handy when debugging\n",
    "bar.render(\"./html/start.html\")  # generate a local HTML file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](html/images/start.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- add(): the main method; adds the chart data and sets the configuration options\n",
    "- print_echarts_options(): prints all configuration options of the chart\n",
    "- render(): by default writes a render.html file in the root directory; accepts a path parameter to set where the file is saved, e.g. render(r\"e:\\my_first_chart.html\"); open the file in a browser.  \n",
    "**Note:** use the download button on the right to save the image locally; for more utility buttons, set is_more_utils to True in add()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using themes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pyecharts import Bar\n",
    "\n",
    "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
    "brands = df['品牌'].values\n",
    "solds = df['已售'].values\n",
    "bar = Bar(\"汽车各品牌销量\", \"这里是测试数据\")\n",
    "bar.use_theme('dark')\n",
    "bar.add(\"销量\", brands, solds)\n",
    "bar.render(\"./html/dark.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](html/images/dark.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the pyecharts-snapshot plugin  \n",
    "To save a chart directly as a png, pdf, or gif file, use pyecharts-snapshot. The plugin requires a Node.js environment on your system.  \n",
    "- Install phantomjs: \\$ npm install -g phantomjs-prebuilt  \n",
    "- Install pyecharts-snapshot: $ pip install pyecharts-snapshot  \n",
    "- Call the render method: bar.render(path='snapshot.png'). The file extension can be svg/jpeg/png/pdf/gif. Note that svg output requires setting renderer='svg' when initializing bar.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chart drawing workflow\n",
    "- Instantiate a chart object of a specific type: chart = FooChart()\n",
    "- Add general configuration such as a theme: chart.use_theme()\n",
    "- Add chart-specific configuration: geo.add_coordinate()\n",
    "- Add data and options: chart.add()\n",
    "- Generate a local file (html/svg/jpeg/png/pdf/gif): chart.render()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic charts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bar (column chart / bar chart)\n",
    ">A column or bar chart shows the magnitude of the data through the height of the columns or the width of the bars.  \n",
    "\n",
    "Bar.add() method signature  \n",
    "```python\n",
    "add(name, x_axis, y_axis,\n",
    "    is_stack=False,\n",
    "    bar_category_gap='20%', **kwargs)\n",
    "```  \n",
    "- name -> str  \n",
    "Legend name\n",
    "- x_axis -> list  \n",
    "Data for the x axis\n",
    "- y_axis -> list  \n",
    "Data for the y axis\n",
    "- is_stack -> bool  \n",
    "Whether to stack the series, defaults to False\n",
    "- bar_category_gap -> int/str  \n",
    "Gap between bars on the category axis, defaults to '20%'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pyecharts import Bar\n",
    "\n",
    "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
    "\n",
    "brands = df['品牌'].values  # brand names\n",
    "solds = df['已售'].values  # units sold\n",
    "schedules = df['已预订'].values  # units reserved\n",
    "bar = Bar(\"汽车各品牌销量\")\n",
    "bar.add(\"已售\", brands, solds, is_stack=True)\n",
    "bar.add(\"已预订\", brands, schedules, is_stack=True)\n",
    "bar.render(\"./html/bar01.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](html/images/bar.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pie (pie chart)\n",
    ">A pie chart shows the share of each category in the total. The arc of each slice represents its proportion of the data.  \n",
    "\n",
    "Pie.add() method signature\n",
    "```python\n",
    "add(name, attr, value,\n",
    "    radius=None,\n",
    "    center=None,\n",
    "    rosetype=None, **kwargs)\n",
    "```  \n",
    "- name -> str  \n",
    "Legend name\n",
    "- attr -> list  \n",
    "Attribute names\n",
    "- value -> list  \n",
    "Values for the attributes\n",
    "- radius -> list  \n",
    "Radius of the pie; the first item is the inner radius and the second the outer radius; defaults to [0, 75]  \n",
    "Interpreted as percentages by default, relative to half of the smaller of the container's height and width\n",
    "- center -> list  \n",
    "Center of the pie; the first item is the x coordinate and the second the y coordinate; defaults to [50, 50]  \n",
    "Interpreted as percentages by default; the first item is relative to the container width and the second to the container height\n",
    "- rosetype -> str  \n",
    "Whether to draw a Nightingale rose chart, which distinguishes data by radius; the modes are 'radius' and 'area'. Defaults to 'radius'  \n",
    "radius: the central angle of a sector shows the percentage and the radius shows the magnitude  \n",
    "area: all sectors share the same central angle; only the radius shows the magnitude"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyecharts import Pie\n",
    "import pandas as pd\n",
    "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
    "brands = df[\"品牌\"].values  # brand names\n",
    "Sales = df[\"总计\"].values  # totals\n",
    "pie = Pie(\"汽车各品牌销量\")\n",
    "pie.add(\"\", brands , Sales, is_label_show=True)\n",
    "pie.render(\"./html/pie.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![pie](./html/images/pie.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### WordCloud (word cloud)  \n",
    "WordCloud.add() method signature  \n",
    "```python\n",
    "add(name, attr, value,\n",
    "    shape=\"circle\",\n",
    "    word_gap=20,\n",
    "    word_size_range=None,\n",
    "    rotate_step=45)\n",
    "```\n",
    "- name -> str  \n",
    "Legend name\n",
    "- attr -> list  \n",
    "Attribute names\n",
    "- value -> list  \n",
    "Values for the attributes\n",
    "- shape -> list  \n",
    "Outline of the word cloud; options are 'circle', 'cardioid', 'diamond', 'triangle-forward', 'triangle', 'pentagon', 'star'\n",
    "- word_gap -> int  \n",
    "Gap between words, defaults to 20.\n",
    "- word_size_range -> list  \n",
    "Font size range of the words, defaults to [12, 60].\n",
    "- rotate_step -> int  \n",
    "Rotation angle of the words, defaults to 45"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyecharts import WordCloud\n",
    "import pandas as pd\n",
    "df = pd.read_excel(r\"./Pyecharts.xlsx\",sheet_name=1)\n",
    "brands = df[\"品牌\"].values\n",
    "sales = df[\"总计\"].values\n",
    "wordcloud = WordCloud(width=1300, height=620)\n",
    "wordcloud.add(\"\", brands, sales, word_size_range=[20, 100])\n",
    "wordcloud.render(\"./html/WordCloud.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![WordCloud](./html/images/WordCloud.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gauge (gauge chart)  \n",
    "Gauge.add() method signature  \n",
    "```python\n",
    "add(name, attr, value,\n",
    "    scale_range=None,\n",
    "    angle_range=None, **kwargs)\n",
    "```\n",
    "- name -> str  \n",
    "Legend name\n",
    "- attr -> list  \n",
    "Attribute names\n",
    "- value -> list  \n",
    "Values for the attributes  \n",
    "- scale_range -> list  \n",
    "Data range of the gauge. Defaults to [0, 100]\n",
    "- angle_range -> list  \n",
    "Angle range of the gauge. Defaults to [225, -45]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyecharts import Gauge\n",
    "\n",
    "gauge = Gauge(\"仪表盘示例\")  # title: gauge example\n",
    "gauge.add(\"业务指标\", \"完成率\", 66.66)  # series: business metric; label: completion rate\n",
    "gauge.render(\"./html/Gauge01.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Gauge01](./html/images/Gauge01.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "gauge = Gauge(\"仪表盘示例\")\n",
    "gauge.add(\n",
    "    \"业务指标\",\n",
    "    \"完成率\",\n",
    "    166.66,\n",
    "    angle_range=[180, 0],\n",
    "    scale_range=[0, 200],\n",
    "    is_legend_show=False,\n",
    ")\n",
    "gauge.render(\"./html/Gauge02.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Gauge02](./html/images/Gauge02.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Other/Pyecharts.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/Pyecharts.xlsx
-------------------------------------------------------------------------------- /Other/html/images/Gauge01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/Gauge01.png -------------------------------------------------------------------------------- /Other/html/images/Gauge02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/Gauge02.png -------------------------------------------------------------------------------- /Other/html/images/WordCloud.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/WordCloud.png -------------------------------------------------------------------------------- /Other/html/images/bar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/bar.png -------------------------------------------------------------------------------- /Other/html/images/dark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/dark.png -------------------------------------------------------------------------------- /Other/html/images/pie.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/pie.png -------------------------------------------------------------------------------- 
/Other/html/images/start.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/start.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
This repository holds my reading notes for the book 『对比Excel,轻松学习Python数据分析』 (roughly, "Learn Python Data Analysis Easily by Comparing with Excel").

[Detailed introduction to the book](https://github.com/junhongzhang/Excel-Python-DA/blob/master/%E6%9C%AC%E4%B9%A6%E8%AF%A6%E7%BB%86%E4%BB%8B%E7%BB%8D.md)

[Errata for the book](https://github.com/junhongzhang/Excel-Python-DA/blob/master/%E5%8B%98%E8%AF%AF%E8%A1%A8.md)

**Notes**
- The Code folder holds summaries of key points and the book's example code
- The Data folder holds the base data used by the book's code examples
- The Note folder holds articles I wrote and articles shared by other readers
- Personal WeChat: net3330. Feel free to get in touch to learn and discuss together

--------------------------------------------------------------------------------
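The notebook `/Note/pandas填充缺失值fillna()函数.ipynb` earlier in this dump calls `fillna(method=...)` and `df.mean()` on a frame that includes a text column. Below is a minimal sketch, assuming a recent pandas release in which `fillna(method=...)` is deprecated and `mean()` needs `numeric_only=True` to skip text columns; on the older pandas the notebook appears to use, the original calls work as written. The inline DataFrame is a stand-in for `fillna.xlsx`, copied from the notebook's displayed output, and the variable names (`by_mean`, `forward`, `backward_one`, `row_wise`) are illustrative only.

```python
import pandas as pd

nan = float("nan")

# Inline stand-in for ../Data/fillna.xlsx, copied from the notebook output.
# Columns: 名次 = rank, 姓名 = name, 语文 = Chinese, 数学 = math, 外语 = foreign language.
df = pd.DataFrame({
    "名次": [1, 2, 3, 4, 5, 5],
    "姓名": ["郭靖", "黄蓉", "黄药师", "欧阳锋", "洪七公", "周伯通"],
    "语文": [90.0, 100.0, nan, nan, 98.0, 88.0],
    "数学": [80.0, 100.0, 98.0, 95.0, nan, 91.0],
    "外语": [76, 98, 100, 85, 96, 88],
})

# Fill each column's gaps with that column's mean;
# numeric_only=True leaves the text column 姓名 out of the computation.
by_mean = df.fillna(df.mean(numeric_only=True))

# Forward fill: each gap takes the previous valid value in its column.
forward = df.ffill()

# Backward fill, filling at most one consecutive NaN per gap.
backward_one = df.bfill(limit=1)

# Row-wise backward fill. Assumption: restricted to the score columns here,
# whereas the notebook applied axis=1 to every column.
row_wise = df[["语文", "数学", "外语"]].bfill(axis=1)

print(by_mean[["语文", "数学"]])
print(backward_one["语文"].tolist())
```

None of these calls modifies `df`; each returns a new frame, which matches the notebook's remark that the source data only changes when `inplace = True` is passed.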