├── .gitignore
├── Code
│   ├── Chapter02 Python基础知识.ipynb
│   ├── Chapter03 Pandas数据结构.ipynb
│   ├── Chapter04 获取数据源.ipynb
│   ├── Chapter05 数据预处理.ipynb
│   ├── Chapter06 数据选择.ipynb
│   ├── Chapter07 数值操作.ipynb
│   ├── Chapter08 数据运算.ipynb
│   ├── Chapter09 时间序列.ipynb
│   ├── Chapter10 数据分组 数据透视表.ipynb
│   ├── Chapter11 多表拼接.ipynb
│   ├── Chapter12 结果导出.ipynb
│   ├── Chapter13 数据可视化.ipynb
│   ├── Chapter14 典型数据分析案例.ipynb
│   └── Chapter15 NumPy数组.ipynb
├── Data
│   ├── Chapter04.1.csv
│   ├── Chapter04.csv
│   ├── Chapter04.txt
│   ├── Chapter04.xlsx
│   ├── Chapter05.xlsx
│   ├── Chapter06.xlsx
│   ├── Chapter07.xlsx
│   ├── Chapter08.xlsx
│   ├── Chapter10.xlsx
│   ├── Chapter11.xlsx
│   ├── Chapter12.xlsx
│   ├── fillna.xlsx
│   ├── loan.csv
│   ├── order-14.1.csv
│   ├── order-14.3.csv
│   ├── train-pivot.csv
│   └── 数据集使用说明.txt
├── Note
│   ├── Git Fork开源项目如何同步更新.pdf
│   ├── Markdown常用标签.pdf
│   ├── jupyter notebook导出pdf并支持中文.md
│   ├── pandas填充缺失值fillna()函数.ipynb
│   ├── 如何给 github 的开源项目提交 pull request.pdf
│   └── 常见的Python代码报错及解决方案.pdf
├── Other
│   ├── 01 Pyecharts渲染图表 .ipynb
│   ├── Pyecharts.xlsx
│   └── html
│       ├── Gauge01.html
│       ├── Gauge02.html
│       ├── WordCloud.html
│       ├── bar01.html
│       ├── dark.html
│       ├── images
│       │   ├── Gauge01.png
│       │   ├── Gauge02.png
│       │   ├── WordCloud.png
│       │   ├── bar.png
│       │   ├── dark.png
│       │   ├── pie.png
│       │   └── start.png
│       ├── pie.html
│       └── start.html
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | Code/.ipynb_checkpoints/
2 | Note/.ipynb_checkpoints/
3 |
--------------------------------------------------------------------------------
/Code/Chapter03 Pandas数据结构.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "**Pandas data structures** \n",
8 | "Data analysis in Python mainly relies on the pandas, NumPy, and matplotlib modules, which must be imported before use"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "# Import the modules\n",
18 | "import pandas as pd\n",
19 | "import numpy as np\n",
20 | "import matplotlib.pyplot as plt"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "## The Series data structure"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "### What is a Series? \n",
35 | "A Series is a one-dimensional, array-like object made up of a set of data and an associated set of data labels (the index)"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "### Creating a Series\n",
43 | "Use pd.Series() and pass it different kinds of objects, such as a list or a dict"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 2,
49 | "metadata": {},
50 | "outputs": [
51 | {
52 | "data": {
53 | "text/plain": [
54 | "0 a\n",
55 | "1 b\n",
56 | "2 c\n",
57 | "3 d\n",
58 | "dtype: object"
59 | ]
60 | },
61 | "execution_count": 2,
62 | "metadata": {},
63 | "output_type": "execute_result"
64 | }
65 | ],
66 | "source": [
67 | "# Pass in a list\n",
68 | "import pandas as pd\n",
69 | "S1 = pd.Series([\"a\",\"b\",\"c\",\"d\"])\n",
70 | "S1"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 3,
76 | "metadata": {},
77 | "outputs": [
78 | {
79 | "data": {
80 | "text/plain": [
81 | "a 1\n",
82 | "b 2\n",
83 | "c 3\n",
84 | "d 4\n",
85 | "dtype: int64"
86 | ]
87 | },
88 | "execution_count": 3,
89 | "metadata": {},
90 | "output_type": "execute_result"
91 | }
92 | ],
93 | "source": [
94 | "# Specify the index\n",
95 | "S2 = pd.Series([1,2,3,4],index = [\"a\",\"b\",\"c\",\"d\"])\n",
96 | "S2"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 4,
102 | "metadata": {},
103 | "outputs": [
104 | {
105 | "data": {
106 | "text/plain": [
107 | "a 1\n",
108 | "b 2\n",
109 | "c 3\n",
110 | "d 4\n",
111 | "dtype: int64"
112 | ]
113 | },
114 | "execution_count": 4,
115 | "metadata": {},
116 | "output_type": "execute_result"
117 | }
118 | ],
119 | "source": [
120 | "# Pass in a dict\n",
121 | "S3 = pd.Series({\"a\":1,\"b\":2,\"c\":3,\"d\":4})\n",
122 | "S3"
123 | ]
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "metadata": {},
128 | "source": [
129 | "### Getting the index of a Series with the index attribute"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 5,
135 | "metadata": {},
136 | "outputs": [
137 | {
138 | "data": {
139 | "text/plain": [
140 | "RangeIndex(start=0, stop=4, step=1)"
141 | ]
142 | },
143 | "execution_count": 5,
144 | "metadata": {},
145 | "output_type": "execute_result"
146 | }
147 | ],
148 | "source": [
149 | "S1.index"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 6,
155 | "metadata": {},
156 | "outputs": [
157 | {
158 | "data": {
159 | "text/plain": [
160 | "Index(['a', 'b', 'c', 'd'], dtype='object')"
161 | ]
162 | },
163 | "execution_count": 6,
164 | "metadata": {},
165 | "output_type": "execute_result"
166 | }
167 | ],
168 | "source": [
169 | "S2.index"
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "### Getting the values of a Series with the values attribute"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 10,
182 | "metadata": {},
183 | "outputs": [
184 | {
185 | "data": {
186 | "text/plain": [
187 | "array(['a', 'b', 'c', 'd'], dtype=object)"
188 | ]
189 | },
190 | "execution_count": 10,
191 | "metadata": {},
192 | "output_type": "execute_result"
193 | }
194 | ],
195 | "source": [
196 | "S1.values"
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": 11,
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
206 | "text/plain": [
207 | "array([1, 2, 3, 4], dtype=int64)"
208 | ]
209 | },
210 | "execution_count": 11,
211 | "metadata": {},
212 | "output_type": "execute_result"
213 | }
214 | ],
215 | "source": [
216 | "S2.values"
217 | ]
218 | },
219 | {
220 | "cell_type": "markdown",
221 | "metadata": {},
222 | "source": [
223 | "## DataFrame: the tabular data structure"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {},
229 | "source": [
230 | "### What is a DataFrame? \n",
231 | "A DataFrame is a tabular data structure made up of a set of data and a pair of indexes (a row index and a column index)"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "### Creating a DataFrame \n",
239 | "Use pd.DataFrame() and pass in the object that holds the data"
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": 14,
245 | "metadata": {},
246 | "outputs": [
247 | {
248 | "data": {
292 | "text/plain": [
293 | " 0\n",
294 | "0 a\n",
295 | "1 b\n",
296 | "2 c\n",
297 | "3 d"
298 | ]
299 | },
300 | "execution_count": 14,
301 | "metadata": {},
302 | "output_type": "execute_result"
303 | }
304 | ],
305 | "source": [
306 | "# Pass in a list\n",
307 | "import pandas as pd\n",
308 | "df1 = pd.DataFrame([\"a\",\"b\",\"c\",\"d\"])\n",
309 | "df1"
310 | ]
311 | },
312 | {
313 | "cell_type": "code",
314 | "execution_count": 36,
315 | "metadata": {},
316 | "outputs": [
317 | {
318 | "data": {
319 | "text/html": [
320 | "\n",
321 | "\n",
334 | "
\n",
335 | " \n",
336 | " \n",
337 | " | \n",
338 | " 0 | \n",
339 | " 1 | \n",
340 | "
\n",
341 | " \n",
342 | " \n",
343 | " \n",
344 | " 0 | \n",
345 | " a | \n",
346 | " A | \n",
347 | "
\n",
348 | " \n",
349 | " 1 | \n",
350 | " b | \n",
351 | " B | \n",
352 | "
\n",
353 | " \n",
354 | " 2 | \n",
355 | " c | \n",
356 | " C | \n",
357 | "
\n",
358 | " \n",
359 | " 3 | \n",
360 | " d | \n",
361 | " D | \n",
362 | "
\n",
363 | " \n",
364 | "
\n",
365 | "
"
366 | ],
367 | "text/plain": [
368 | " 0 1\n",
369 | "0 a A\n",
370 | "1 b B\n",
371 | "2 c C\n",
372 | "3 d D"
373 | ]
374 | },
375 | "execution_count": 36,
376 | "metadata": {},
377 | "output_type": "execute_result"
378 | }
379 | ],
380 | "source": [
381 | "# Pass in a nested list\n",
382 | "df2 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]])\n",
383 | "df2"
384 | ]
385 | },
386 | {
387 | "cell_type": "markdown",
388 | "metadata": {},
389 | "source": [
390 | "**Specifying row and column indexes** \n",
391 | "- the columns parameter sets custom column labels\n",
392 | "- the index parameter sets custom row labels"
393 | ]
394 | },
395 | {
396 | "cell_type": "code",
397 | "execution_count": 22,
398 | "metadata": {},
399 | "outputs": [
400 | {
401 | "data": {
450 | "text/plain": [
451 | " 小写 大写\n",
452 | "0 a A\n",
453 | "1 b B\n",
454 | "2 c C\n",
455 | "3 d D"
456 | ]
457 | },
458 | "execution_count": 22,
459 | "metadata": {},
460 | "output_type": "execute_result"
461 | }
462 | ],
463 | "source": [
464 | "# Set the column index\n",
465 | "df31 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],columns = [\"小写\",\"大写\"])\n",
466 | "df31"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": 24,
472 | "metadata": {},
473 | "outputs": [
474 | {
475 | "data": {
524 | "text/plain": [
525 | " 0 1\n",
526 | "一 a A\n",
527 | "二 b B\n",
528 | "三 c C\n",
529 | "四 d D"
530 | ]
531 | },
532 | "execution_count": 24,
533 | "metadata": {},
534 | "output_type": "execute_result"
535 | }
536 | ],
537 | "source": [
538 | "# Set the row index\n",
539 | "df32 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],index = [\"一\",\"二\",\"三\",\"四\"])\n",
540 | "df32"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 37,
546 | "metadata": {},
547 | "outputs": [
548 | {
549 | "data": {
598 | "text/plain": [
599 | " 小写 大写\n",
600 | "一 a A\n",
601 | "二 b B\n",
602 | "三 c C\n",
603 | "四 d D"
604 | ]
605 | },
606 | "execution_count": 37,
607 | "metadata": {},
608 | "output_type": "execute_result"
609 | }
610 | ],
611 | "source": [
612 | "# Set the row and column indexes together\n",
613 | "df33 = pd.DataFrame([[\"a\",\"A\"],[\"b\",\"B\"],[\"c\",\"C\"],[\"d\",\"D\"]],columns = [\"小写\",\"大写\"],index = [\"一\",\"二\",\"三\",\"四\"])\n",
614 | "df33"
615 | ]
616 | },
617 | {
618 | "cell_type": "code",
619 | "execution_count": 38,
620 | "metadata": {},
621 | "outputs": [
622 | {
623 | "data": {
672 | "text/plain": [
673 | " 小写 大写\n",
674 | "0 a A\n",
675 | "1 b B\n",
676 | "2 c C\n",
677 | "3 d D"
678 | ]
679 | },
680 | "execution_count": 38,
681 | "metadata": {},
682 | "output_type": "execute_result"
683 | }
684 | ],
685 | "source": [
686 | "# Pass in a dict\n",
687 | "data = {\"小写\":[\"a\",\"b\",\"c\",\"d\"],\"大写\":[\"A\",\"B\",\"C\",\"D\"]}\n",
688 | "df41 = pd.DataFrame(data)\n",
689 | "df41"
690 | ]
691 | },
692 | {
693 | "cell_type": "markdown",
694 | "metadata": {},
695 | "source": [
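696 | "- When a dict is passed to DataFrame, its keys become the column index. If no row index is given, it defaults to integers starting at 0; to set a custom row index, use the index parameter"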
697 | ]
698 | },
699 | {
700 | "cell_type": "code",
701 | "execution_count": 28,
702 | "metadata": {},
703 | "outputs": [
704 | {
705 | "data": {
754 | "text/plain": [
755 | " 小写 大写\n",
756 | "一 a A\n",
757 | "二 b B\n",
758 | "三 c C\n",
759 | "四 d D"
760 | ]
761 | },
762 | "execution_count": 28,
763 | "metadata": {},
764 | "output_type": "execute_result"
765 | }
766 | ],
767 | "source": [
768 | "# Set a row index for data passed in as a dict\n",
769 | "data = {\"小写\":[\"a\",\"b\",\"c\",\"d\"],\"大写\":[\"A\",\"B\",\"C\",\"D\"]}\n",
770 | "df42 = pd.DataFrame(data,index = [\"一\",\"二\",\"三\",\"四\"])\n",
771 | "df42"
772 | ]
773 | },
774 | {
775 | "cell_type": "markdown",
776 | "metadata": {},
777 | "source": [
778 | "### Getting the row and column indexes of a DataFrame \n",
779 | "- use the columns attribute to get the column index\n",
780 | "- use the index attribute to get the row index"
781 | ]
782 | },
783 | {
784 | "cell_type": "code",
785 | "execution_count": 29,
786 | "metadata": {},
787 | "outputs": [
788 | {
789 | "data": {
790 | "text/plain": [
791 | "RangeIndex(start=0, stop=2, step=1)"
792 | ]
793 | },
794 | "execution_count": 29,
795 | "metadata": {},
796 | "output_type": "execute_result"
797 | }
798 | ],
799 | "source": [
800 | "# Get the DataFrame's column index\n",
801 | "df2.columns"
802 | ]
803 | },
804 | {
805 | "cell_type": "code",
806 | "execution_count": 33,
807 | "metadata": {},
808 | "outputs": [
809 | {
810 | "data": {
811 | "text/plain": [
812 | "Index(['小写', '大写'], dtype='object')"
813 | ]
814 | },
815 | "execution_count": 33,
816 | "metadata": {},
817 | "output_type": "execute_result"
818 | }
819 | ],
820 | "source": [
821 | "df33.columns"
822 | ]
823 | },
824 | {
825 | "cell_type": "code",
826 | "execution_count": 34,
827 | "metadata": {},
828 | "outputs": [
829 | {
830 | "data": {
831 | "text/plain": [
832 | "RangeIndex(start=0, stop=4, step=1)"
833 | ]
834 | },
835 | "execution_count": 34,
836 | "metadata": {},
837 | "output_type": "execute_result"
838 | }
839 | ],
840 | "source": [
841 | "# Get the DataFrame's row index\n",
842 | "df2.index"
843 | ]
844 | },
845 | {
846 | "cell_type": "code",
847 | "execution_count": 35,
848 | "metadata": {},
849 | "outputs": [
850 | {
851 | "data": {
852 | "text/plain": [
853 | "Index(['一', '二', '三', '四'], dtype='object')"
854 | ]
855 | },
856 | "execution_count": 35,
857 | "metadata": {},
858 | "output_type": "execute_result"
859 | }
860 | ],
861 | "source": [
862 | "df33.index"
863 | ]
864 | },
865 | {
866 | "cell_type": "markdown",
867 | "metadata": {},
868 | "source": [
869 | "## Getting the values of a DataFrame\n",
870 | "Covered in Chapter 6"
871 | ]
872 | }
873 | ],
874 | "metadata": {
875 | "kernelspec": {
876 | "display_name": "Python 3",
877 | "language": "python",
878 | "name": "python3"
879 | },
880 | "language_info": {
881 | "codemirror_mode": {
882 | "name": "ipython",
883 | "version": 3
884 | },
885 | "file_extension": ".py",
886 | "mimetype": "text/x-python",
887 | "name": "python",
888 | "nbconvert_exporter": "python",
889 | "pygments_lexer": "ipython3",
890 | "version": "3.7.0"
891 | },
892 | "toc": {
893 | "base_numbering": 1,
894 | "nav_menu": {},
895 | "number_sections": true,
896 | "sideBar": true,
897 | "skip_h1_title": false,
898 | "title_cell": "Table of Contents",
899 | "title_sidebar": "Chapter 3: Pandas data structures",
900 | "toc_cell": false,
901 | "toc_position": {
902 | "height": "calc(100% - 180px)",
903 | "left": "10px",
904 | "top": "150px",
905 | "width": "320px"
906 | },
907 | "toc_section_display": true,
908 | "toc_window_display": true
909 | }
910 | },
911 | "nbformat": 4,
912 | "nbformat_minor": 2
913 | }
914 |
--------------------------------------------------------------------------------
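The Chapter03 notebook above builds Series and DataFrame objects one cell at a time. As a compact recap, the following is a minimal sketch of the same ideas in one script. The labels `lower`, `upper`, and `r1` to `r4` are illustrative assumptions, not names used in the notebook.

```python
import pandas as pd

# A Series built from a list gets a default integer index (0, 1, 2, ...).
s1 = pd.Series(["a", "b", "c", "d"])

# A Series built from a dict uses the dict keys as its index.
s3 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4})

# A DataFrame built from a dict of equal-length lists: keys become columns.
# The column and row labels here are hypothetical, chosen for illustration.
data = {"lower": ["a", "b", "c", "d"], "upper": ["A", "B", "C", "D"]}
df = pd.DataFrame(data, index=["r1", "r2", "r3", "r4"])

# index, columns and values are attributes, so they take no parentheses.
row_labels = list(df.index)
col_labels = list(df.columns)
first_row = df.loc["r1"].tolist()
print(row_labels, col_labels, first_row)
```

Note that `index`, `columns`, and `values` are attributes rather than methods, which is why the notebook cells read `S1.index` and not `S1.index()`.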
/Code/Chapter05 数据预处理.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Data preprocessing"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Handling missing values\n",
15 | "### Inspecting missing values"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {},
22 | "outputs": [
23 | {
24 | "name": "stdout",
25 | "output_type": "stream",
26 | "text": [
27 | "\n",
28 | "RangeIndex: 5 entries, 0 to 4\n",
29 | "Data columns (total 4 columns):\n",
30 | "编号 4 non-null object\n",
31 | "年龄 4 non-null float64\n",
32 | "性别 3 non-null object\n",
33 | "注册时间 4 non-null datetime64[ns]\n",
34 | "dtypes: datetime64[ns](1), float64(1), object(2)\n",
35 | "memory usage: 240.0+ bytes\n"
36 | ]
37 | }
38 | ],
39 | "source": [
40 | "import pandas as pd\n",
41 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\")\n",
42 | "df.head(20).info()  # head() returns the first 5 rows by default; head(20) asks for up to 20\n",
43 | "# df.info()  # info() reports each column's dtype and non-null count"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "### Dropping missing values"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 2,
56 | "metadata": {},
57 | "outputs": [
58 | {
59 | "data": {
118 | "text/plain": [
119 | " 编号 年龄 性别 注册时间\n",
120 | "0 A1 54.0 男 2018-08-08\n",
121 | "1 A2 16.0 NaN 2018-08-09\n",
122 | "3 A3 47.0 女 2018-08-10\n",
123 | "4 A4 41.0 男 2018-08-11"
124 | ]
125 | },
126 | "execution_count": 2,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "import pandas as pd\n",
133 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\")\n",
134 | "df.dropna()  # dropna() drops every row that contains a missing value\n",
135 | "df.dropna(how = \"all\")  # drop only the rows in which every column is missing"
136 | ]
137 | },
138 | {
139 | "cell_type": "markdown",
140 | "metadata": {},
141 | "source": [
142 | "### Filling missing values"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 5,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "data": {
210 | "text/plain": [
211 | " 编号 年龄 性别 注册时间\n",
212 | "0 A1 54.0 男 2018-08-08\n",
213 | "1 A2 16.0 男 2018-08-09\n",
214 | "2 A3 30.0 女 2018-08-10\n",
215 | "3 A4 41.0 男 2018-08-11"
216 | ]
217 | },
218 | "execution_count": 5,
219 | "metadata": {},
220 | "output_type": "execute_result"
221 | }
222 | ],
223 | "source": [
224 | "import pandas as pd\n",
225 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=1)\n",
226 | "df.fillna(0)  # fillna(0) fills every missing value with 0\n",
227 | "df.fillna({\"性别\":\"男\",\"年龄\":30})  # fill the 性别 and 年龄 columns with separate values\n"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "## Handling duplicate data"
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": 9,
240 | "metadata": {},
241 | "outputs": [
242 | {
243 | "data": {
244 | "text/html": [
245 | "\n",
246 | "\n",
259 | "
\n",
260 | " \n",
261 | " \n",
262 | " | \n",
263 | " 订单编号 | \n",
264 | " 客户姓名 | \n",
265 | " 唯一识别码 | \n",
266 | " 成交时间 | \n",
267 | "
\n",
268 | " \n",
269 | " \n",
270 | " \n",
271 | " 0 | \n",
272 | " A1 | \n",
273 | " 张通 | \n",
274 | " 101 | \n",
275 | " 2018-08-08 | \n",
276 | "
\n",
277 | " \n",
278 | " 1 | \n",
279 | " A2 | \n",
280 | " 李谷 | \n",
281 | " 102 | \n",
282 | " 2018-08-09 | \n",
283 | "
\n",
284 | " \n",
285 | " 3 | \n",
286 | " A3 | \n",
287 | " 孙凤 | \n",
288 | " 103 | \n",
289 | " 2018-08-10 | \n",
290 | "
\n",
291 | " \n",
292 | " 5 | \n",
293 | " A5 | \n",
294 | " 赵恒 | \n",
295 | " 104 | \n",
296 | " 2018-08-11 | \n",
297 | "
\n",
298 | " \n",
299 | "
\n",
300 | "
"
301 | ],
302 | "text/plain": [
303 | " 订单编号 客户姓名 唯一识别码 成交时间\n",
304 | "0 A1 张通 101 2018-08-08\n",
305 | "1 A2 李谷 102 2018-08-09\n",
306 | "3 A3 孙凤 103 2018-08-10\n",
307 | "5 A5 赵恒 104 2018-08-11"
308 | ]
309 | },
310 | "execution_count": 9,
311 | "metadata": {},
312 | "output_type": "execute_result"
313 | }
314 | ],
315 | "source": [
316 | "import pandas as pd\n",
317 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=2)\n",
318 | "df.drop_duplicates()  # drop duplicate rows\n",
319 | "df.drop_duplicates(subset = \"唯一识别码\")  # compare only the given column\n",
320 | "df.drop_duplicates(subset = [\"客户姓名\",\"唯一识别码\"])\n",
321 | "df.drop_duplicates(subset = [\"客户姓名\",\"唯一识别码\"],keep = \"last\")  # keep (first or last) selects which duplicate is kept\n"
322 | ]
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": [
328 | "## Detecting and handling outliers"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
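335 | "Outliers are generally handled in one of the following ways:\n",
336 | "- The most common approach is to delete them.\n",
337 | "- Treat them as missing values and fill them.\n",
338 | "- Treat them as special cases and investigate why they occurred"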
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "metadata": {},
344 | "source": [
345 | "## Data type conversion"
346 | ]
347 | },
348 | {
349 | "cell_type": "markdown",
350 | "metadata": {},
351 | "source": [
352 | "### Data types"
353 | ]
354 | },
355 | {
356 | "cell_type": "markdown",
357 | "metadata": {},
358 | "source": [
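359 | "Type | Description\n",
360 | "---|---\n",
361 | "int | Integer\n",
362 | "float | Floating-point number, i.e. a number with a decimal point\n",
363 | "object | Python object type, denoted by O\n",
364 | "string_ | String type, usually denoted by S; S10 is a string of length 10\n",
365 | "unicode_ | Fixed-length Unicode type, specified the same way as strings\n",
366 | "datetime64[ns] | Date and time type"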
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": 6,
372 | "metadata": {},
373 | "outputs": [
374 | {
375 | "name": "stdout",
376 | "output_type": "stream",
377 | "text": [
378 | "\n",
379 | "RangeIndex: 6 entries, 0 to 5\n",
380 | "Data columns (total 4 columns):\n",
381 | "订单编号 6 non-null object\n",
382 | "客户姓名 6 non-null object\n",
383 | "唯一识别码 6 non-null int64\n",
384 | "成交时间 6 non-null datetime64[ns]\n",
385 | "dtypes: datetime64[ns](1), int64(1), object(2)\n",
386 | "memory usage: 272.0+ bytes\n"
387 | ]
388 | },
389 | {
390 | "data": {
391 | "text/plain": [
392 | "dtype('int64')"
393 | ]
394 | },
395 | "execution_count": 6,
396 | "metadata": {},
397 | "output_type": "execute_result"
398 | }
399 | ],
400 | "source": [
401 | "import pandas as pd\n",
402 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n",
403 | "df.info()  # info() shows the data type of each column\n",
404 | "df[\"订单编号\"].dtype  # data type of the 订单编号 column\n",
405 | "df[\"唯一识别码\"].dtype  # data type of the 唯一识别码 column"
406 | ]
407 | },
408 | {
409 | "cell_type": "markdown",
410 | "metadata": {},
411 | "source": [
412 | "### Converting types"
413 | ]
414 | },
415 | {
416 | "cell_type": "code",
417 | "execution_count": 17,
418 | "metadata": {},
419 | "outputs": [
420 | {
421 | "data": {
422 | "text/plain": [
423 | "0 101.0\n",
424 | "1 102.0\n",
425 | "2 103.0\n",
426 | "3 103.0\n",
427 | "4 104.0\n",
428 | "5 104.0\n",
429 | "Name: 唯一识别码, dtype: float64"
430 | ]
431 | },
432 | "execution_count": 17,
433 | "metadata": {},
434 | "output_type": "execute_result"
435 | }
436 | ],
437 | "source": [
438 | "import pandas as pd\n",
439 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n",
440 | "df[\"唯一识别码\"].dtype  # check the data type\n",
441 | "df[\"唯一识别码\"].astype(\"float64\")  # convert 唯一识别码 from int to float"
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {},
447 | "source": [
448 | "## Setting indexes"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {},
454 | "source": [
455 | "### Adding an index to a table that has none"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 49,
461 | "metadata": {},
462 | "outputs": [
463 | {
464 | "data": {
530 | "text/plain": [
531 | " 订单编号 客户姓名 唯一识别码 成交时间\n",
532 | "1 A1 张通 101 2018-08-08\n",
533 | "2 A2 李谷 102 2018-08-09\n",
534 | "3 A3 孙凤 103 2018-08-10\n",
535 | "4 A4 赵恒 104 2018-08-11\n",
536 | "5 A5 赵恒 104 2018-08-11"
537 | ]
538 | },
539 | "execution_count": 49,
540 | "metadata": {},
541 | "output_type": "execute_result"
542 | }
543 | ],
544 | "source": [
545 | "import pandas as pd\n",
546 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 3,header= None)\n",
547 | "df.columns = [\"订单编号\",\"客户姓名\",\"唯一识别码\",\"成交时间\"]  # header=None is required in read_excel; otherwise the first data row is used as the header\n",
548 | "df.index = [1,2,3,4,5]\n",
549 | "df\n"
550 | ]
551 | },
552 | {
553 | "cell_type": "markdown",
554 | "metadata": {},
555 | "source": [
556 | "### Setting a new index"
557 | ]
558 | },
559 | {
560 | "cell_type": "code",
561 | "execution_count": 66,
562 | "metadata": {},
563 | "outputs": [
564 | {
565 | "data": {
637 | "text/plain": [
638 | " 客户姓名 唯一识别码 成交时间\n",
639 | "订单编号 \n",
640 | "A1 张通 101 2018-08-08\n",
641 | "A2 李谷 102 2018-08-09\n",
642 | "A3 孙凤 103 2018-08-10\n",
643 | "A3 孙凤 103 2018-08-10\n",
644 | "A4 赵恒 104 2018-08-11\n",
645 | "A5 赵恒 104 2018-08-11"
646 | ]
647 | },
648 | "execution_count": 66,
649 | "metadata": {},
650 | "output_type": "execute_result"
651 | }
652 | ],
653 | "source": [
654 | "import pandas as pd\n",
655 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 2)\n",
656 | "df.set_index(\"订单编号\")  # set_index() uses the given column as the new index"
657 | ]
658 | },
659 | {
660 | "cell_type": "markdown",
661 | "metadata": {},
662 | "source": [
663 | "### Renaming indexes"
664 | ]
665 | },
666 | {
667 | "cell_type": "code",
668 | "execution_count": 82,
669 | "metadata": {},
670 | "outputs": [
671 | {
672 | "data": {
738 | "text/plain": [
739 | " 新订单编号 新客户姓名 唯一识别码 成交时间\n",
740 | "一 A1 张通 101 2018-08-08\n",
741 | "二 A2 李谷 102 2018-08-09\n",
742 | "三 A3 孙凤 103 2018-08-10\n",
743 | "四 A4 赵恒 104 2018-08-11\n",
744 | "5 A5 赵恒 104 2018-08-12"
745 | ]
746 | },
747 | "execution_count": 82,
748 | "metadata": {},
749 | "output_type": "execute_result"
750 | }
751 | ],
752 | "source": [
753 | "import pandas as pd\n",
754 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name = 4)\n",
755 | "df.index = [1,2,3,4,5]  # add an index\n",
756 | "df.rename(columns={\"订单编号\":\"新订单编号\",\"客户姓名\":\"新客户姓名\"})  # rename column labels\n",
757 | "df.rename(index = {1:\"一\",2:\"二\",3:\"三\"})  # rename row labels\n",
758 | "df.rename(columns={\"订单编号\":\"新订单编号\",\"客户姓名\":\"新客户姓名\"},index = {1:\"一\",2:\"二\",3:\"三\",4:'四'})  # rename column and row labels together"
759 | ]
760 | },
761 | {
762 | "cell_type": "markdown",
763 | "metadata": {},
764 | "source": [
765 | "### Resetting the index"
766 | ]
767 | },
768 | {
769 | "cell_type": "code",
770 | "execution_count": 7,
771 | "metadata": {},
772 | "outputs": [
773 | {
774 | "data": {
840 | "text/plain": [
841 | " level_0 level_1 C1 C2\n",
842 | "0 Z1 Z2 NaN NaN\n",
843 | "1 A a 1.0 2.0\n",
844 | "2 NaN b 3.0 4.0\n",
845 | "3 B a 5.0 6.0\n",
846 | "4 NaN b 7.0 8.0"
847 | ]
848 | },
849 | "execution_count": 7,
850 | "metadata": {},
851 | "output_type": "execute_result"
852 | }
853 | ],
854 | "source": [
855 | "import pandas as pd\n",
856 | "df = pd.read_excel(r\"..\\Data\\Chapter05.xlsx\",sheet_name=5)\n",
857 | "df.reset_index()\n",
858 | "# see Chapter 10 for details"
859 | ]
860 | }
861 | ],
862 | "metadata": {
863 | "kernelspec": {
864 | "display_name": "Python 3",
865 | "language": "python",
866 | "name": "python3"
867 | },
868 | "language_info": {
869 | "codemirror_mode": {
870 | "name": "ipython",
871 | "version": 3
872 | },
873 | "file_extension": ".py",
874 | "mimetype": "text/x-python",
875 | "name": "python",
876 | "nbconvert_exporter": "python",
877 | "pygments_lexer": "ipython3",
878 | "version": "3.7.0"
879 | },
880 | "toc": {
881 | "base_numbering": 1,
882 | "nav_menu": {},
883 | "number_sections": true,
884 | "sideBar": true,
885 | "skip_h1_title": false,
886 | "title_cell": "Table of Contents",
887 | "title_sidebar": "Chapter 5: Data Preprocessing",
888 | "toc_cell": false,
889 | "toc_position": {},
890 | "toc_section_display": true,
891 | "toc_window_display": true
892 | }
893 | },
894 | "nbformat": 4,
895 | "nbformat_minor": 2
896 | }
897 |
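A note on the renaming and index-reset cells in the Chapter05 notebook above: `rename()` and `reset_index()` return new DataFrames rather than modifying `df`, so the result of each call is lost unless it is assigned. The following is a minimal sketch under stated assumptions: the small inline frame only stands in for sheet 4 of Chapter05.xlsx (three rows, values copied from the displayed output), and the names `renamed` and `flat` are illustrative, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for sheet 4 of Chapter05.xlsx (three rows only).
df = pd.DataFrame(
    {"订单编号": ["A1", "A2", "A3"], "客户姓名": ["张通", "李谷", "孙凤"]},
    index=[1, 2, 3],
)

# rename() returns a new DataFrame; df keeps its original labels
# unless the result is assigned back.
renamed = df.rename(
    columns={"订单编号": "新订单编号", "客户姓名": "新客户姓名"},
    index={1: "一", 2: "二", 3: "三"},
)

# reset_index() moves the current row labels into an ordinary column
# (named "index" when the index has no name) and installs 0..n-1 as the new index.
flat = renamed.reset_index()

print(df.columns.tolist())
print(renamed.index.tolist())
print(flat.columns.tolist())
```

Assigning the result (or chaining the calls) is usually preferable to `inplace=True`, since it keeps each step visible in the cell.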
--------------------------------------------------------------------------------
/Code/Chapter06 数据选择.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Selecting columns"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "### Selecting one or several columns"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 8,
20 | "metadata": {
21 | "scrolled": true
22 | },
23 | "outputs": [
24 | {
25 | "data": {
84 | "text/plain": [
85 | " 订单编号 唯一识别码\n",
86 | "0 A1 101\n",
87 | "1 A2 102\n",
88 | "2 A3 103\n",
89 | "3 A3 103\n",
90 | "4 A4 104\n",
91 | "5 A5 104"
92 | ]
93 | },
94 | "execution_count": 8,
95 | "metadata": {},
96 | "output_type": "execute_result"
97 | }
98 | ],
99 | "source": [
100 | "import pandas as pd\n",
101 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name = 0)\n",
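102 | "# selecting data by passing column names is called ordinary (label) indexing\n",
103 | "df\n",
104 | "df['客户姓名']\n",
105 | "df[['订单编号','客户姓名']]\n",
106 | "# selecting data by passing integer positions is called positional indexing\n",
107 | "df.iloc[:,[0,2]] # take the 1st and 3rd columns; ':' selects all rows"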
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "### Selecting a contiguous range of columns"
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 11,
120 | "metadata": {},
121 | "outputs": [
122 | {
123 | "data": {
189 | "text/plain": [
190 | " 订单编号 客户姓名 唯一识别码\n",
191 | "0 A1 张通 101\n",
192 | "1 A2 李谷 102\n",
193 | "2 A3 孙凤 103\n",
194 | "3 A3 孙凤 103\n",
195 | "4 A4 赵恒 104\n",
196 | "5 A5 赵恒 104"
197 | ]
198 | },
199 | "execution_count": 11,
200 | "metadata": {},
201 | "output_type": "execute_result"
202 | }
203 | ],
204 | "source": [
205 | "# selecting data by passing a range of positions is called slice indexing\n",
206 | "df.iloc[:,0:3] # select columns 1 through 3 (the slice includes column 1 but excludes column 4)"
207 | ]
208 | },
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "## Selecting rows"
214 | ]
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "metadata": {},
219 | "source": [
220 | "### Selecting one or several rows"
221 | ]
222 | },
223 | {
224 | "cell_type": "code",
225 | "execution_count": 18,
226 | "metadata": {},
227 | "outputs": [
228 | {
229 | "data": {
230 | "text/html": [
231 | "\n",
232 | "\n",
245 | "
\n",
246 | " \n",
247 | " \n",
248 | " | \n",
249 | " 订单编号 | \n",
250 | " 客户姓名 | \n",
251 | " 唯一识别码 | \n",
252 | " 成交时间 | \n",
253 | "
\n",
254 | " \n",
255 | " \n",
256 | " \n",
257 | " 一 | \n",
258 | " A1 | \n",
259 | " 张通 | \n",
260 | " 101 | \n",
261 | " 2018-08-08 | \n",
262 | "
\n",
263 | " \n",
264 | " 二 | \n",
265 | " A2 | \n",
266 | " 李谷 | \n",
267 | " 102 | \n",
268 | " 2018-08-09 | \n",
269 | "
\n",
270 | " \n",
271 | "
\n",
272 | "
"
273 | ],
274 | "text/plain": [
275 | " 订单编号 客户姓名 唯一识别码 成交时间\n",
276 | "一 A1 张通 101 2018-08-08\n",
277 | "二 A2 李谷 102 2018-08-09"
278 | ]
279 | },
280 | "execution_count": 18,
281 | "metadata": {},
282 | "output_type": "execute_result"
283 | }
284 | ],
285 | "source": [
286 | "# ordinary (label) indexing with .loc\n",
287 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\",\"六\"]\n",
288 | "df.loc[\"一\"]\n",
289 | "df.loc[[\"一\",\"二\"]]\n",
290 | "# positional indexing with .iloc\n",
291 | "df.iloc[0]\n",
292 | "df.iloc[[0,1]] # select the first and second rows"
293 | ]
294 | },
295 | {
296 | "cell_type": "markdown",
297 | "metadata": {},
298 | "source": [
299 | "### Selecting a contiguous range of rows"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 19,
305 | "metadata": {
306 | "scrolled": true
307 | },
308 | "outputs": [
309 | {
310 | "data": {
362 | "text/plain": [
363 | " 订单编号 客户姓名 唯一识别码 成交时间\n",
364 | "一 A1 张通 101 2018-08-08\n",
365 | "二 A2 李谷 102 2018-08-09\n",
366 | "三 A3 孙凤 103 2018-08-10"
367 | ]
368 | },
369 | "execution_count": 19,
370 | "metadata": {},
371 | "output_type": "execute_result"
372 | }
373 | ],
374 | "source": [
375 | "df.iloc[0:3] # select rows 1 through 3 (row 4 is excluded)"
376 | ]
377 | },
378 | {
379 | "cell_type": "markdown",
380 | "metadata": {},
381 | "source": [
382 | "### Selecting rows that satisfy a condition"
383 | ]
384 | },
385 | {
386 | "cell_type": "code",
387 | "execution_count": 21,
388 | "metadata": {},
389 | "outputs": [
390 | {
391 | "data": {
431 | "text/plain": [
432 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间\n",
433 | "0 A1 张通 101.0 31.0 2018-08-08"
434 | ]
435 | },
436 | "execution_count": 21,
437 | "metadata": {},
438 | "output_type": "execute_result"
439 | }
440 | ],
441 | "source": [
442 | "import pandas as pd\n",
443 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=3)\n",
444 | "df\n",
445 | "# select rows where 年龄 (age) is less than 200\n",
446 | "df[df['年龄']<200]\n",
447 | "# select rows where 年龄 < 200 and 唯一识别码 (unique ID) < 102; wrap each condition in parentheses\n",
448 | "df[(df['年龄']<200) & (df['唯一识别码']<102)]"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {},
454 | "source": [
455 | "## Selecting rows and columns together"
456 | ]
457 | },
458 | {
459 | "cell_type": "markdown",
460 | "metadata": {},
461 | "source": [
462 | "### Label indexing + label indexing to select given rows and columns"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": 20,
468 | "metadata": {},
469 | "outputs": [
470 | {
471 | "data": {
513 | "text/plain": [
514 | " 订单编号 客户姓名 唯一识别码\n",
515 | "一 A1 张通 101\n",
516 | "二 A2 李谷 102"
517 | ]
518 | },
519 | "execution_count": 20,
520 | "metadata": {},
521 | "output_type": "execute_result"
522 | }
523 | ],
524 | "source": [
525 | "import pandas as pd\n",
526 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=4)\n",
527 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\"]\n",
528 | "# pass row and column labels to .loc\n",
529 | "df.loc[[\"一\",\"二\"],[\"订单编号\",\"客户姓名\",\"唯一识别码\"]]"
530 | ]
531 | },
532 | {
533 | "cell_type": "markdown",
534 | "metadata": {},
535 | "source": [
536 | "### Positional indexing + positional indexing to select given rows and columns"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": 16,
542 | "metadata": {},
543 | "outputs": [
544 | {
545 | "data": {
584 | "text/plain": [
585 | " 订单编号 唯一识别码\n",
586 | "一 A1 101\n",
587 | "二 A2 102"
588 | ]
589 | },
590 | "execution_count": 16,
591 | "metadata": {},
592 | "output_type": "execute_result"
593 | }
594 | ],
595 | "source": [
596 | "# pass row and column positions to .iloc\n",
597 | "df.iloc[[0,1],[0,2]]"
598 | ]
599 | },
600 | {
601 | "cell_type": "markdown",
602 | "metadata": {},
603 | "source": [
604 | "### Boolean indexing + label indexing to select given rows and columns"
605 | ]
606 | },
607 | {
608 | "cell_type": "code",
609 | "execution_count": 12,
610 | "metadata": {},
611 | "outputs": [
612 | {
613 | "data": {
657 | "text/plain": [
658 | " 订单编号 年龄\n",
659 | "一 A1 31\n",
660 | "二 A2 45\n",
661 | "三 A3 23"
662 | ]
663 | },
664 | "execution_count": 12,
665 | "metadata": {},
666 | "output_type": "execute_result"
667 | }
668 | ],
669 | "source": [
670 | "# filter rows with a boolean condition first, then pick columns by label\n",
671 | "df[df[\"年龄\"]<200][[\"订单编号\",\"年龄\"]]"
672 | ]
673 | },
674 | {
675 | "cell_type": "markdown",
676 | "metadata": {},
677 | "source": [
678 | "### Slice indexing + slice indexing to select given rows and columns"
679 | ]
680 | },
681 | {
682 | "cell_type": "code",
683 | "execution_count": 17,
684 | "metadata": {},
685 | "outputs": [
686 | {
687 | "data": {
735 | "text/plain": [
736 | " 客户姓名 唯一识别码 年龄\n",
737 | "一 张通 101 31\n",
738 | "二 李谷 102 45\n",
739 | "三 孙凤 103 23"
740 | ]
741 | },
742 | "execution_count": 17,
743 | "metadata": {},
744 | "output_type": "execute_result"
745 | }
746 | ],
747 | "source": [
748 | "import pandas as pd\n",
749 | "df = pd.read_excel(r\"..\\Data\\Chapter06.xlsx\",sheet_name=4)\n",
750 | "df.index = [\"一\",\"二\",\"三\",\"四\",\"五\"]\n",
751 | "# the first argument to .iloc selects the row range, the second selects the column range\n",
752 | "df.iloc[0:3,1:4]\n"
753 | ]
754 | },
755 | {
756 | "cell_type": "markdown",
757 | "metadata": {},
758 | "source": [
759 | "### Slice indexing + label indexing to select given rows and columns"
760 | ]
761 | },
762 | {
763 | "cell_type": "code",
764 | "execution_count": 19,
765 | "metadata": {},
766 | "outputs": [
767 | {
768 | "name": "stderr",
769 | "output_type": "stream",
770 | "text": [
771 | "D:\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: DeprecationWarning: \n",
772 | ".ix is deprecated. Please use\n",
773 | ".loc for label based indexing or\n",
774 | ".iloc for positional indexing\n",
775 | "\n",
776 | "See the documentation here:\n",
777 | "http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated\n",
778 | " \n"
779 | ]
780 | },
781 | {
782 | "data": {
826 | "text/plain": [
827 | " 客户姓名 唯一识别码\n",
828 | "一 张通 101\n",
829 | "二 李谷 102\n",
830 | "三 孙凤 103"
831 | ]
832 | },
833 | "execution_count": 19,
834 | "metadata": {},
835 | "output_type": "execute_result"
836 | }
837 | ],
838 | "source": [
839 | "df\n",
840 | "df.ix[0:3,[\"客户姓名\",\"唯一识别码\"]] # .ix is deprecated and was removed in pandas 1.0; prefer the .iloc form on the next line\n",
841 | "df.iloc[0:3][[\"客户姓名\",\"唯一识别码\"]]"
842 | ]
843 | }
844 | ],
845 | "metadata": {
846 | "kernelspec": {
847 | "display_name": "Python 3",
848 | "language": "python",
849 | "name": "python3"
850 | },
851 | "language_info": {
852 | "codemirror_mode": {
853 | "name": "ipython",
854 | "version": 3
855 | },
856 | "file_extension": ".py",
857 | "mimetype": "text/x-python",
858 | "name": "python",
859 | "nbconvert_exporter": "python",
860 | "pygments_lexer": "ipython3",
861 | "version": "3.7.0"
862 | },
863 | "toc": {
864 | "base_numbering": 1,
865 | "nav_menu": {},
866 | "number_sections": true,
867 | "sideBar": true,
868 | "skip_h1_title": false,
869 | "title_cell": "Table of Contents",
870 | "title_sidebar": "Chapter 6: Data Selection",
871 | "toc_cell": false,
872 | "toc_position": {
873 | "height": "calc(100% - 180px)",
874 | "left": "10px",
875 | "top": "150px",
876 | "width": "320px"
877 | },
878 | "toc_section_display": true,
879 | "toc_window_display": true
880 | }
881 | },
882 | "nbformat": 4,
883 | "nbformat_minor": 2
884 | }
885 |
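The last cell of the Chapter06 notebook above mixes a positional row slice with column labels through `.ix`, which current pandas no longer provides. Below is a hedged sketch of three equivalent spellings that rely only on `.loc` and `.iloc`. The inline frame is an assumption standing in for sheet 4 of Chapter06.xlsx: the first three rows follow the displayed output, while the values in rows 四 and 五 are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for sheet 4 of Chapter06.xlsx.
# Rows "四" and "五" use made-up values; only the first three rows mirror the notebook output.
df = pd.DataFrame(
    {
        "订单编号": ["A1", "A2", "A3", "A4", "A5"],
        "客户姓名": ["张通", "李谷", "孙凤", "赵恒", "赵恒"],
        "唯一识别码": [101, 102, 103, 104, 104],
        "年龄": [31, 45, 23, 240, 240],
    },
    index=["一", "二", "三", "四", "五"],
)
cols = ["客户姓名", "唯一识别码"]

# 1) Positional row slice, then label-based column selection (the notebook's own replacement).
by_position = df.iloc[0:3][cols]

# 2) Pure label indexing: unlike iloc, a .loc slice includes its end label.
by_label = df.loc["一":"三", cols]

# 3) Pure positional indexing: translate column labels to positions first.
mixed = df.iloc[0:3, df.columns.get_indexer(cols)]

# Boolean row mask and column labels in a single .loc call.
under_200 = df.loc[df["年龄"] < 200, ["订单编号", "年龄"]]

print(by_label)
print(under_200)
```

The single `.loc[mask, columns]` call avoids the chained `df[mask][columns]` form, which matters mostly when the selection is later assigned to.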
--------------------------------------------------------------------------------
/Code/Chapter08 数据运算.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Data operations"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Arithmetic operations"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 1,
20 | "metadata": {},
21 | "outputs": [
22 | {
23 | "data": {
24 | "text/plain": [
25 | "S1 3\n",
26 | "S2 9\n",
27 | "S3 15\n",
28 | "dtype: int64"
29 | ]
30 | },
31 | "execution_count": 1,
32 | "metadata": {},
33 | "output_type": "execute_result"
34 | }
35 | ],
36 | "source": [
37 | "# add two columns\n",
38 | "import pandas as pd\n",
39 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n",
40 | "# set the row index\n",
41 | "df.index=[\"S1\",\"S2\",\"S3\"]\n",
42 | "df[\"C1\"]+df[\"C2\"]"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "data": {
52 | "text/plain": [
53 | "S1 -1\n",
54 | "S2 -1\n",
55 | "S3 -1\n",
56 | "dtype: int64"
57 | ]
58 | },
59 | "execution_count": 2,
60 | "metadata": {},
61 | "output_type": "execute_result"
62 | }
63 | ],
64 | "source": [
65 | "# subtract one column from another\n",
66 | "df[\"C1\"]-df[\"C2\"]"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 15,
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
76 | "text/plain": [
77 | "S1 2\n",
78 | "S2 20\n",
79 | "S3 56\n",
80 | "dtype: int64"
81 | ]
82 | },
83 | "execution_count": 15,
84 | "metadata": {},
85 | "output_type": "execute_result"
86 | }
87 | ],
88 | "source": [
89 | "# multiply two columns\n",
90 | "df[\"C1\"]*df[\"C2\"]"
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": 4,
96 | "metadata": {},
97 | "outputs": [
98 | {
99 | "data": {
100 | "text/plain": [
101 | "S1 0.500\n",
102 | "S2 0.800\n",
103 | "S3 0.875\n",
104 | "dtype: float64"
105 | ]
106 | },
107 | "execution_count": 4,
108 | "metadata": {},
109 | "output_type": "execute_result"
110 | }
111 | ],
112 | "source": [
113 | "# divide one column by another\n",
114 | "df[\"C1\"]/df[\"C2\"]"
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 5,
120 | "metadata": {},
121 | "outputs": [
122 | {
123 | "data": {
124 | "text/plain": [
125 | "S1 0\n",
126 | "S2 3\n",
127 | "S3 6\n",
128 | "Name: C1, dtype: int64"
129 | ]
130 | },
131 | "execution_count": 5,
132 | "metadata": {},
133 | "output_type": "execute_result"
134 | }
135 | ],
136 | "source": [
137 | "# add a constant to, or subtract a constant from, any column\n",
138 | "df[\"C1\"]+1\n",
139 | "df[\"C1\"]-1"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "## Comparison operators"
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": 8,
152 | "metadata": {},
153 | "outputs": [
154 | {
155 | "data": {
156 | "text/plain": [
157 | "S1 True\n",
158 | "S2 True\n",
159 | "S3 True\n",
160 | "dtype: bool"
161 | ]
162 | },
163 | "execution_count": 8,
164 | "metadata": {},
165 | "output_type": "execute_result"
166 | }
167 | ],
168 | "source": [
169 | "import pandas as pd\n",
170 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n",
171 | "# set the row index\n",
172 | "df.index=[\"S1\",\"S2\",\"S3\"]\n",
173 | "df\n",
174 | "df[\"C1\"] > df[\"C2\"]\n",
175 | "df[\"C1\"] < df[\"C2\"]\n",
176 | "df[\"C1\"] != df[\"C2\"]"
177 | ]
178 | },
179 | {
180 | "cell_type": "markdown",
181 | "metadata": {},
182 | "source": [
183 | "## Aggregation operations"
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
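190 | "**count(): counting non-null values** \n",
191 | "Non-null counting returns the number of non-null values in a region. \n",
192 | "By default, it counts the non-null values in each column. \n",
193 | "Pass axis=1 to count the non-null values in each row."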
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": 9,
199 | "metadata": {},
200 | "outputs": [
201 | {
202 | "data": {
203 | "text/plain": [
204 | "C1 3\n",
205 | "C2 3\n",
206 | "C3 3\n",
207 | "dtype: int64"
208 | ]
209 | },
210 | "execution_count": 9,
211 | "metadata": {},
212 | "output_type": "execute_result"
213 | }
214 | ],
215 | "source": [
216 | "import pandas as pd\n",
217 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name = 0)\n",
218 | "# set the row index\n",
219 | "df.index=[\"S1\",\"S2\",\"S3\"]\n",
220 | "# count the non-null values in each column\n",
221 | "df.count()"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": 10,
227 | "metadata": {},
228 | "outputs": [
229 | {
230 | "data": {
231 | "text/plain": [
232 | "S1 3\n",
233 | "S2 3\n",
234 | "S3 3\n",
235 | "dtype: int64"
236 | ]
237 | },
238 | "execution_count": 10,
239 | "metadata": {},
240 | "output_type": "execute_result"
241 | }
242 | ],
243 | "source": [
244 | "# count the non-null values in each row\n",
245 | "df.count(axis =1)"
246 | ]
247 | },
248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "**sum(): summation**"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 11,
258 | "metadata": {},
259 | "outputs": [
260 | {
261 | "data": {
262 | "text/plain": [
263 | "C1 12\n",
264 | "C2 15\n",
265 | "C3 18\n",
266 | "dtype: int64"
267 | ]
268 | },
269 | "execution_count": 11,
270 | "metadata": {},
271 | "output_type": "execute_result"
272 | }
273 | ],
274 | "source": [
275 | "# sums each column by default\n",
276 | "df.sum()"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 12,
282 | "metadata": {},
283 | "outputs": [
284 | {
285 | "data": {
286 | "text/plain": [
287 | "S1 6\n",
288 | "S2 15\n",
289 | "S3 24\n",
290 | "dtype: int64"
291 | ]
292 | },
293 | "execution_count": 12,
294 | "metadata": {},
295 | "output_type": "execute_result"
296 | }
297 | ],
298 | "source": [
299 | "# pass axis=1 to sum each row\n",
300 | "df.sum(axis = 1)"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 13,
306 | "metadata": {},
307 | "outputs": [
308 | {
309 | "data": {
310 | "text/plain": [
311 | "12"
312 | ]
313 | },
314 | "execution_count": 13,
315 | "metadata": {},
316 | "output_type": "execute_result"
317 | }
318 | ],
319 | "source": [
320 | "# sum one specific column\n",
321 | "df[\"C1\"].sum()"
322 | ]
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": [
328 | "**mean(): mean** \n",
329 | "mean() computes the arithmetic mean of all values in a region"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 14,
335 | "metadata": {},
336 | "outputs": [
337 | {
338 | "data": {
339 | "text/plain": [
340 | "C1 4.0\n",
341 | "C2 5.0\n",
342 | "C3 6.0\n",
343 | "dtype: float64"
344 | ]
345 | },
346 | "execution_count": 14,
347 | "metadata": {},
348 | "output_type": "execute_result"
349 | }
350 | ],
351 | "source": [
352 | "# averages each column by default\n",
353 | "df.mean()"
354 | ]
355 | },
356 | {
357 | "cell_type": "code",
358 | "execution_count": 15,
359 | "metadata": {},
360 | "outputs": [
361 | {
362 | "data": {
363 | "text/plain": [
364 | "S1 2.0\n",
365 | "S2 5.0\n",
366 | "S3 8.0\n",
367 | "dtype: float64"
368 | ]
369 | },
370 | "execution_count": 15,
371 | "metadata": {},
372 | "output_type": "execute_result"
373 | }
374 | ],
375 | "source": [
376 | "# average each row\n",
377 | "df.mean( axis =1)"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 16,
383 | "metadata": {},
384 | "outputs": [
385 | {
386 | "data": {
387 | "text/plain": [
388 | "4.0"
389 | ]
390 | },
391 | "execution_count": 16,
392 | "metadata": {},
393 | "output_type": "execute_result"
394 | }
395 | ],
396 | "source": [
397 | "# average one specific column\n",
398 | "df[\"C1\"].mean()"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {},
404 | "source": [
405 | "**max(): maximum**"
406 | ]
407 | },
408 | {
409 | "cell_type": "code",
410 | "execution_count": 17,
411 | "metadata": {},
412 | "outputs": [
413 | {
414 | "data": {
415 | "text/plain": [
416 | "C1 7\n",
417 | "C2 8\n",
418 | "C3 9\n",
419 | "dtype: int64"
420 | ]
421 | },
422 | "execution_count": 17,
423 | "metadata": {},
424 | "output_type": "execute_result"
425 | }
426 | ],
427 | "source": [
428 | "# returns the maximum of each column by default\n",
429 | "df.max()"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": 18,
435 | "metadata": {},
436 | "outputs": [
437 | {
438 | "data": {
439 | "text/plain": [
440 | "S1 3\n",
441 | "S2 6\n",
442 | "S3 9\n",
443 | "dtype: int64"
444 | ]
445 | },
446 | "execution_count": 18,
447 | "metadata": {},
448 | "output_type": "execute_result"
449 | }
450 | ],
451 | "source": [
452 | "# maximum of each row\n",
453 | "df.max( axis =1)"
454 | ]
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": 19,
459 | "metadata": {},
460 | "outputs": [
461 | {
462 | "data": {
463 | "text/plain": [
464 | "7"
465 | ]
466 | },
467 | "execution_count": 19,
468 | "metadata": {},
469 | "output_type": "execute_result"
470 | }
471 | ],
472 | "source": [
473 | "# maximum of one specific column\n",
474 | "df[\"C1\"].max()"
475 | ]
476 | },
477 | {
478 | "cell_type": "markdown",
479 | "metadata": {},
480 | "source": [
481 | "**min(): minimum; used the same way as max()**"
482 | ]
483 | },
484 | {
485 | "cell_type": "markdown",
486 | "metadata": {},
487 | "source": [
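488 | "**median(): median** \n",
489 | "The median is the value in the middle position after a sequence X of n values is sorted in ascending order (for even n, the mean of the two middle values); it is used the same way as the other functions"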
490 | ]
491 | },
492 | {
493 | "cell_type": "code",
494 | "execution_count": 20,
495 | "metadata": {},
496 | "outputs": [
497 | {
498 | "data": {
499 | "text/plain": [
500 | "C1 4.0\n",
501 | "C2 5.0\n",
502 | "C3 6.0\n",
503 | "dtype: float64"
504 | ]
505 | },
506 | "execution_count": 20,
507 | "metadata": {},
508 | "output_type": "execute_result"
509 | }
510 | ],
511 | "source": [
512 | "df.median()"
513 | ]
514 | },
515 | {
516 | "cell_type": "markdown",
517 | "metadata": {},
518 | "source": [
519 | "**mode(): mode** \n",
520 | "The mode is the value that occurs most often in a set of data; it is used the same way as the other functions"
521 | ]
522 | },
523 | {
524 | "cell_type": "code",
525 | "execution_count": 26,
526 | "metadata": {},
527 | "outputs": [
528 | {
529 | "data": {
565 | "text/plain": [
566 | " C1 C2 C3\n",
567 | "0 1 1 3"
568 | ]
569 | },
570 | "execution_count": 26,
571 | "metadata": {},
572 | "output_type": "execute_result"
573 | }
574 | ],
575 | "source": [
576 | "import pandas as pd\n",
577 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name=1)\n",
578 | "df.index=[\"S1\",\"S2\",\"S3\"]\n",
579 | "df.mode()"
580 | ]
581 | },
582 | {
583 | "cell_type": "markdown",
584 | "metadata": {},
585 | "source": [
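586 | "**var(): variance✩** \n",
587 | "Variance measures how dispersed a set of data is; it is used the same way as the other functions \n",
588 | "**std(): standard deviation✩** \n",
589 | "The standard deviation is the square root of the variance; both describe the dispersion of the data, and std() is used the same way as the other functions"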
590 | ]
591 | },
592 | {
593 | "cell_type": "markdown",
594 | "metadata": {},
595 | "source": [
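596 | "**quantile(): quantiles** \n",
597 | "Quantiles are position-based statistics that are finer-grained than the median: the common ones are the first quartile (0.25), the second quartile (0.5), and the third quartile (0.75), and the second quartile is the median.\n"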
598 | ]
599 | },
600 | {
601 | "cell_type": "code",
602 | "execution_count": 27,
603 | "metadata": {},
604 | "outputs": [
605 | {
606 | "data": {
607 | "text/plain": [
608 | "C1 4.0\n",
609 | "C2 5.0\n",
610 | "C3 6.0\n",
611 | "Name: 0.25, dtype: float64"
612 | ]
613 | },
614 | "execution_count": 27,
615 | "metadata": {},
616 | "output_type": "execute_result"
617 | }
618 | ],
619 | "source": [
620 | "import pandas as pd\n",
621 | "df = pd.read_excel(r\"../Data/Chapter08.xlsx\",sheet_name=2)\n",
622 | "df.index=[\"S1\",\"S2\",\"S3\",\"S4\",\"S5\"]\n",
623 | "df\n",
624 | "df.quantile(0.25) # first quartile (0.25 quantile) of each column"
625 | ]
626 | },
627 | {
628 | "cell_type": "code",
629 | "execution_count": 13,
630 | "metadata": {},
631 | "outputs": [
632 | {
633 | "data": {
634 | "text/plain": [
635 | "C1 10.0\n",
636 | "C2 11.0\n",
637 | "C3 12.0\n",
638 | "Name: 0.75, dtype: float64"
639 | ]
640 | },
641 | "execution_count": 13,
642 | "metadata": {},
643 | "output_type": "execute_result"
644 | }
645 | ],
646 | "source": [
647 | "df.quantile(0.75) # third quartile (0.75 quantile) of each column"
648 | ]
649 | },
650 | {
651 | "cell_type": "code",
652 | "execution_count": 15,
653 | "metadata": {},
654 | "outputs": [
655 | {
656 | "data": {
657 | "text/plain": [
658 | "S1 1.5\n",
659 | "S2 4.5\n",
660 | "S3 7.5\n",
661 | "S4 10.5\n",
662 | "S5 13.5\n",
663 | "Name: 0.25, dtype: float64"
664 | ]
665 | },
666 | "execution_count": 15,
667 | "metadata": {},
668 | "output_type": "execute_result"
669 | }
670 | ],
671 | "source": [
672 | "df.quantile(0.25,axis = 1) # first quartile of each row"
673 | ]
674 | },
675 | {
676 | "cell_type": "markdown",
677 | "metadata": {},
678 | "source": [
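679 | "## Correlation operations✩\n",
680 | "Correlation is commonly used to measure how strongly two things are related; compute it with the corr() function"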
681 | ]
682 | },
683 | {
684 | "cell_type": "code",
685 | "execution_count": 17,
686 | "metadata": {},
687 | "outputs": [
688 | {
689 | "data": {
737 | "text/plain": [
738 | " C1 C2 C3\n",
739 | "C1 1.0 1.0 1.0\n",
740 | "C2 1.0 1.0 1.0\n",
741 | "C3 1.0 1.0 1.0"
742 | ]
743 | },
744 | "execution_count": 17,
745 | "metadata": {},
746 | "output_type": "execute_result"
747 | }
748 | ],
749 | "source": [
750 | "df.corr()"
751 | ]
752 | }
753 | ],
754 | "metadata": {
755 | "kernelspec": {
756 | "display_name": "Python 3",
757 | "language": "python",
758 | "name": "python3"
759 | },
760 | "language_info": {
761 | "codemirror_mode": {
762 | "name": "ipython",
763 | "version": 3
764 | },
765 | "file_extension": ".py",
766 | "mimetype": "text/x-python",
767 | "name": "python",
768 | "nbconvert_exporter": "python",
769 | "pygments_lexer": "ipython3",
770 | "version": "3.7.0"
771 | },
772 | "toc": {
773 | "base_numbering": 1,
774 | "nav_menu": {},
775 | "number_sections": true,
776 | "sideBar": true,
777 | "skip_h1_title": false,
778 | "title_cell": "Table of Contents",
779 | "title_sidebar": "Chapter 8: Data Operations",
780 | "toc_cell": false,
781 | "toc_position": {
782 | "height": "calc(100% - 180px)",
783 | "left": "10px",
784 | "top": "150px",
785 | "width": "320px"
786 | },
787 | "toc_section_display": true,
788 | "toc_window_display": true
789 | }
790 | },
791 | "nbformat": 4,
792 | "nbformat_minor": 2
793 | }
794 |
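The aggregation cells in the Chapter08 notebook above all follow one pattern: the default `axis=0` yields one result per column, and `axis=1` yields one per row. The sketch below collects that pattern in a single place. It rests on assumptions: the 3x3 frame is rebuilt inline from the sums and maxima shown for sheet 0 rather than read from Chapter08.xlsx, and the variable names are illustrative. It also shows that pandas' `var()` is the sample variance (`ddof=1`) unless told otherwise, a point the notebook does not state.

```python
import pandas as pd

# Inline reconstruction of sheet 0 (values 1..9), inferred from the outputs above.
df = pd.DataFrame(
    {"C1": [1, 4, 7], "C2": [2, 5, 8], "C3": [3, 6, 9]},
    index=["S1", "S2", "S3"],
)

col_sum = df.sum()          # axis=0 (default): one value per column
row_sum = df.sum(axis=1)    # axis=1: one value per row

# Quantiles interpolate linearly between neighbouring values by default.
q1 = df.quantile(0.25)

# var() and std() divide by n-1 by default; pass ddof=0 for the population version.
sample_var = df["C1"].var()
population_var = df["C1"].var(ddof=0)

# corr() returns the pairwise Pearson correlation matrix of the columns.
corr = df.corr()

print(col_sum, row_sum, q1, sep="\n")
print(sample_var, population_var)
```

Because every column here is a linear function of the others, the correlation matrix is all ones, which matches the notebook's `corr()` output.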
--------------------------------------------------------------------------------
/Code/Chapter09 时间序列.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Time series"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Getting the current time"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "**Return the current date and time**"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 76,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "data": {
31 | "text/plain": [
32 | "datetime.datetime(2019, 3, 14, 15, 57, 43, 307645)"
33 | ]
34 | },
35 | "execution_count": 76,
36 | "metadata": {},
37 | "output_type": "execute_result"
38 | }
39 | ],
40 | "source": [
41 | "from datetime import datetime\n",
42 | "datetime.now()"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "**Return the current year, month, and day separately**"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 77,
55 | "metadata": {},
56 | "outputs": [
57 | {
58 | "data": {
59 | "text/plain": [
60 | "2019"
61 | ]
62 | },
63 | "execution_count": 77,
64 | "metadata": {},
65 | "output_type": "execute_result"
66 | }
67 | ],
68 | "source": [
69 | "datetime.now().year "
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 3,
75 | "metadata": {},
76 | "outputs": [
77 | {
78 | "data": {
79 | "text/plain": [
80 | "3"
81 | ]
82 | },
83 | "execution_count": 3,
84 | "metadata": {},
85 | "output_type": "execute_result"
86 | }
87 | ],
88 | "source": [
89 | "datetime.now().month"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 78,
95 | "metadata": {},
96 | "outputs": [
97 | {
98 | "data": {
99 | "text/plain": [
100 | "14"
101 | ]
102 | },
103 | "execution_count": 78,
104 | "metadata": {},
105 | "output_type": "execute_result"
106 | }
107 | ],
108 | "source": [
109 | "datetime.now().day"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "**Return the current weekday and week number**"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": 6,
122 | "metadata": {},
123 | "outputs": [
124 | {
125 | "data": {
126 | "text/plain": [
127 | "7"
128 | ]
129 | },
130 | "execution_count": 6,
131 | "metadata": {},
132 | "output_type": "execute_result"
133 | }
134 | ],
135 | "source": [
136 | "# Return the day of the week; weekday() counts from 0 (Monday), so add 1\n",
137 | "datetime.now().weekday()+1"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 9,
143 | "metadata": {},
144 | "outputs": [
145 | {
146 | "data": {
147 | "text/plain": [
148 | "(2019, 10, 7)"
149 | ]
150 | },
151 | "execution_count": 9,
152 | "metadata": {},
153 | "output_type": "execute_result"
154 | }
155 | ],
156 | "source": [
157 | "# Return the ISO year, ISO week number, and ISO weekday\n",
158 | "datetime.now().isocalendar()\n",
159 | "# e.g. (2019, 10, 7) is day 7 of week 10 of 2019"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": 11,
165 | "metadata": {},
166 | "outputs": [
167 | {
168 | "data": {
169 | "text/plain": [
170 | "10"
171 | ]
172 | },
173 | "execution_count": 11,
174 | "metadata": {},
175 | "output_type": "execute_result"
176 | }
177 | ],
178 | "source": [
179 | "# Return only the week number\n",
180 | "datetime.now().isocalendar()[1]"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
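187 | "**Choosing a date and time format** \n",
188 | "- date() returns only the date part \n",
189 | "- time() returns only the time part \n",
190 | "- strftime() formats the date and time with a custom format string"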
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 12,
196 | "metadata": {},
197 | "outputs": [
198 | {
199 | "data": {
200 | "text/plain": [
201 | "datetime.date(2019, 3, 10)"
202 | ]
203 | },
204 | "execution_count": 12,
205 | "metadata": {},
206 | "output_type": "execute_result"
207 | }
208 | ],
209 | "source": [
210 | "datetime.now().date()"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 13,
216 | "metadata": {
217 | "scrolled": true
218 | },
219 | "outputs": [
220 | {
221 | "data": {
222 | "text/plain": [
223 | "datetime.time(22, 11, 41, 36684)"
224 | ]
225 | },
226 | "execution_count": 13,
227 | "metadata": {},
228 | "output_type": "execute_result"
229 | }
230 | ],
231 | "source": [
232 | "datetime.now().time()"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
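239 | "Format codes accepted by strftime() \n",
240 | "\n",
241 | "Code | Meaning\n",
242 | "---|---\n",
243 | "%H | Hour (24-hour clock) [00,23]\n",
244 | "%I | Hour (12-hour clock) [01,12]\n",
245 | "%M | Two-digit minute [00,59]\n",
246 | "%S | Second \\[00,61] (60 and 61 allow for leap seconds)\n",
247 | "%w | Weekday as an integer, starting from 0 (Sunday)\n",
248 | "%U | Week number of the year, with Sunday as the first day of the week\n",
249 | "%W | Week number of the year, with Monday as the first day of the week\n",
250 | "%F | Shorthand for %Y-%m-%d, e.g. 2018-04-18\n",
251 | "%D | Shorthand for %m/%d/%y, e.g. 04/18/18"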
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": 14,
257 | "metadata": {},
258 | "outputs": [
259 | {
260 | "data": {
261 | "text/plain": [
262 | "'2019-03-10'"
263 | ]
264 | },
265 | "execution_count": 14,
266 | "metadata": {},
267 | "output_type": "execute_result"
268 | }
269 | ],
270 | "source": [
271 | "datetime.now().strftime(\"%Y-%m-%d\")"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "## Converting between strings and datetimes"
279 | ]
280 | },
281 | {
282 | "cell_type": "markdown",
283 | "metadata": {},
284 | "source": [
285 | "**Convert a datetime to a string** \n",
286 | "Use the str() function"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 79,
292 | "metadata": {},
293 | "outputs": [
294 | {
295 | "data": {
296 | "text/plain": [
297 | "str"
298 | ]
299 | },
300 | "execution_count": 79,
301 | "metadata": {},
302 | "output_type": "execute_result"
303 | }
304 | ],
305 | "source": [
306 | "from datetime import datetime\n",
307 | "now = datetime.now()\n",
308 | "now\n",
309 | "type(now)\n",
310 | "type(str(now))"
311 | ]
312 | },
313 | {
314 | "cell_type": "markdown",
315 | "metadata": {},
316 | "source": [
317 | "**Convert a string to a datetime** \n",
318 | "Use the parse() function from dateutil.parser"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": 11,
324 | "metadata": {},
325 | "outputs": [
326 | {
327 | "data": {
328 | "text/plain": [
329 | "datetime.datetime"
330 | ]
331 | },
332 | "execution_count": 11,
333 | "metadata": {},
334 | "output_type": "execute_result"
335 | }
336 | ],
337 | "source": [
338 | "from dateutil.parser import parse\n",
339 | "str_time = \"2019-03-11\"\n",
340 | "type(str_time)\n",
341 | "parse(str_time)\n",
342 | "type(parse(str_time))"
343 | ]
344 | },
345 | {
346 | "cell_type": "markdown",
347 | "metadata": {},
348 | "source": [
349 | "## Time-based indexing \n",
350 | "Time-based indexing selects data from a datetime-typed field according to time."
351 | ]
352 | },
353 | {
354 | "cell_type": "code",
355 | "execution_count": 4,
356 | "metadata": {},
357 | "outputs": [
358 | {
359 | "data": {
427 | "text/plain": [
428 | " num\n",
429 | "2018-01-01 1\n",
430 | "2018-01-02 2\n",
431 | "2018-01-03 3\n",
432 | "2018-01-04 4\n",
433 | "2018-01-05 5\n",
434 | "2018-01-06 6\n",
435 | "2018-01-07 7\n",
436 | "2018-01-08 8\n",
437 | "2018-01-09 9\n",
438 | "2018-01-10 10"
439 | ]
440 | },
441 | "execution_count": 4,
442 | "metadata": {},
443 | "output_type": "execute_result"
444 | }
445 | ],
446 | "source": [
447 | "import pandas as pd\n",
448 | "import numpy as np\n",
449 | "index = pd.DatetimeIndex(['2018-01-01','2018-01-02','2018-01-03','2018-01-04','2018-01-05',\n",
450 | " '2018-01-06','2018-01-07','2018-01-08','2018-01-09','2018-01-10'])\n",
451 | "data = pd.DataFrame(np.arange(1,11),columns =[\"num\"],index = index)\n",
452 | "data"
453 | ]
454 | },
455 | {
456 | "cell_type": "code",
457 | "execution_count": 5,
458 | "metadata": {},
459 | "outputs": [
460 | {
461 | "data": {
529 | "text/plain": [
530 | " num\n",
531 | "2018-01-01 1\n",
532 | "2018-01-02 2\n",
533 | "2018-01-03 3\n",
534 | "2018-01-04 4\n",
535 | "2018-01-05 5\n",
536 | "2018-01-06 6\n",
537 | "2018-01-07 7\n",
538 | "2018-01-08 8\n",
539 | "2018-01-09 9\n",
540 | "2018-01-10 10"
541 | ]
542 | },
543 | "execution_count": 5,
544 | "metadata": {},
545 | "output_type": "execute_result"
546 | }
547 | ],
548 | "source": [
549 | "# Select the rows from 2018 (.loc is required for row selection by date string in pandas 2.0+)\n",
550 | "data.loc[\"2018\"]"
551 | ]
552 | },
553 | {
554 | "cell_type": "code",
555 | "execution_count": 6,
556 | "metadata": {},
557 | "outputs": [
558 | {
559 | "data": {
627 | "text/plain": [
628 | " num\n",
629 | "2018-01-01 1\n",
630 | "2018-01-02 2\n",
631 | "2018-01-03 3\n",
632 | "2018-01-04 4\n",
633 | "2018-01-05 5\n",
634 | "2018-01-06 6\n",
635 | "2018-01-07 7\n",
636 | "2018-01-08 8\n",
637 | "2018-01-09 9\n",
638 | "2018-01-10 10"
639 | ]
640 | },
641 | "execution_count": 6,
642 | "metadata": {},
643 | "output_type": "execute_result"
644 | }
645 | ],
646 | "source": [
647 | "# Select the rows from January 2018\n",
648 | "data.loc[\"2018-01\"]"
649 | ]
650 | },
651 | {
652 | "cell_type": "code",
653 | "execution_count": 8,
654 | "metadata": {},
655 | "outputs": [
656 | {
657 | "data": {
705 | "text/plain": [
706 | " num\n",
707 | "2018-01-01 1\n",
708 | "2018-01-02 2\n",
709 | "2018-01-03 3\n",
710 | "2018-01-04 4\n",
711 | "2018-01-05 5"
712 | ]
713 | },
714 | "execution_count": 8,
715 | "metadata": {},
716 | "output_type": "execute_result"
717 | }
718 | ],
719 | "source": [
720 | "# Select the rows from 2018-01-01 through 2018-01-05 (both endpoints included)\n",
721 | "data[\"2018-01-01\":\"2018-01-05\"]"
722 | ]
723 | },
724 | {
725 | "cell_type": "code",
726 | "execution_count": 3,
727 | "metadata": {},
728 | "outputs": [
729 | {
730 | "data": {
799 | "text/plain": [
800 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n",
801 | "1 A2 李谷 102 45 2018-08-09 2\n",
802 | "2 A3 孙凤 103 23 2018-08-10 1\n",
803 | "3 A4 赵恒 104 240 2018-08-11 2\n",
804 | "4 A5 王娜 105 21 2018-08-11 3"
805 | ]
806 | },
807 | "execution_count": 3,
808 | "metadata": {},
809 | "output_type": "execute_result"
810 | }
811 | ],
812 | "source": [
813 | "import pandas as pd\n",
814 | "from datetime import datetime\n",
815 | "df = pd.read_excel(r\"../Data/Chapter06.xlsx\",sheet_name = 4)\n",
816 | "df[df[\"成交时间\"]>datetime(2018,8,8)]"
817 | ]
818 | },
819 | {
820 | "cell_type": "code",
821 | "execution_count": 4,
822 | "metadata": {},
823 | "outputs": [
824 | {
825 | "data": {
867 | "text/plain": [
868 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n",
869 | "0 A1 张通 101 31 2018-08-08 1"
870 | ]
871 | },
872 | "execution_count": 4,
873 | "metadata": {},
874 | "output_type": "execute_result"
875 | }
876 | ],
877 | "source": [
878 | "df[df[\"成交时间\"] == datetime(2018,8,8)]"
879 | ]
880 | },
881 | {
882 | "cell_type": "code",
883 | "execution_count": 26,
884 | "metadata": {},
885 | "outputs": [
886 | {
887 | "data": {
929 | "text/plain": [
930 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n",
931 | "0 A1 张通 101 31 2018-08-08 1"
932 | ]
933 | },
934 | "execution_count": 26,
935 | "metadata": {},
936 | "output_type": "execute_result"
937 | }
938 | ],
939 | "source": [
940 | "df[df[\"成交时间\"]"
941 | ]
942 | },
943 | {
944 | "cell_type": "code",
945 | "execution_count": 29,
946 | "metadata": {},
947 | "outputs": [
948 | {
949 | "data": {
1000 | "text/plain": [
1001 | " 订单编号 客户姓名 唯一识别码 年龄 成交时间 销售ID\n",
1002 | "1 A2 李谷 102 45 2018-08-09 2\n",
1003 | "2 A3 孙凤 103 23 2018-08-10 1"
1004 | ]
1005 | },
1006 | "execution_count": 29,
1007 | "metadata": {},
1008 | "output_type": "execute_result"
1009 | }
1010 | ],
1011 | "source": [
1012 | "df[(df[\"成交时间\"]>datetime(2018,8,8))&(df[\"成交时间\"]< datetime(2018,8,11))]"
1013 | ]
1014 | },
1015 | {
1016 | "cell_type": "markdown",
1017 | "metadata": {},
1018 | "source": [
1019 | "## Time arithmetic"
1020 | ]
1021 | },
1022 | {
1023 | "cell_type": "markdown",
1024 | "metadata": {},
1025 | "source": [
1026 | "**Difference between two times**"
1027 | ]
1028 | },
1029 | {
1030 | "cell_type": "code",
1031 | "execution_count": 30,
1032 | "metadata": {},
1033 | "outputs": [
1034 | {
1035 | "data": {
1036 | "text/plain": [
1037 | "datetime.timedelta(days=2, seconds=83880)"
1038 | ]
1039 | },
1040 | "execution_count": 30,
1041 | "metadata": {},
1042 | "output_type": "execute_result"
1043 | }
1044 | ],
1045 | "source": [
1046 | "cha = datetime(2018,5,21,19,50)-datetime(2018,5,18,20,32)\n",
1047 | "cha"
1048 | ]
1049 | },
1050 | {
1051 | "cell_type": "code",
1052 | "execution_count": 31,
1053 | "metadata": {},
1054 | "outputs": [
1055 | {
1056 | "data": {
1057 | "text/plain": [
1058 | "2"
1059 | ]
1060 | },
1061 | "execution_count": 31,
1062 | "metadata": {},
1063 | "output_type": "execute_result"
1064 | }
1065 | ],
1066 | "source": [
1067 | "# Whole days in the difference\n",
1068 | "cha.days"
1069 | ]
1070 | },
1071 | {
1072 | "cell_type": "code",
1073 | "execution_count": 33,
1074 | "metadata": {},
1075 | "outputs": [
1076 | {
1077 | "data": {
1078 | "text/plain": [
1079 | "83880"
1080 | ]
1081 | },
1082 | "execution_count": 33,
1083 | "metadata": {},
1084 | "output_type": "execute_result"
1085 | }
1086 | ],
1087 | "source": [
1088 | "# Seconds left over after the whole days are removed\n",
1089 | "cha.seconds"
1090 | ]
1091 | },
1092 | {
1093 | "cell_type": "code",
1094 | "execution_count": 35,
1095 | "metadata": {},
1096 | "outputs": [
1097 | {
1098 | "data": {
1099 | "text/plain": [
1100 | "23.3"
1101 | ]
1102 | },
1103 | "execution_count": 35,
1104 | "metadata": {},
1105 | "output_type": "execute_result"
1106 | }
1107 | ],
1108 | "source": [
1109 | "# Leftover seconds in hours; cha.total_seconds()/3600 gives the full difference in hours\n",
1110 | "cha.seconds/3600"
1111 | ]
1112 | },
1113 | {
1114 | "cell_type": "markdown",
1115 | "metadata": {},
1116 | "source": [
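1117 | "**Shifting a time**\n",
1118 | "- timedelta stores an offset as days, seconds, and microseconds (other units passed to it are converted)\n",
1119 | "- A pandas date offset shifts directly by units such as days, hours, or minutes"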
1120 | ]
1121 | },
1122 | {
1123 | "cell_type": "markdown",
1124 | "metadata": {},
1125 | "source": [
1126 | "**timedelta**"
1127 | ]
1128 | },
1129 | {
1130 | "cell_type": "code",
1131 | "execution_count": 43,
1132 | "metadata": {},
1133 | "outputs": [
1134 | {
1135 | "data": {
1136 | "text/plain": [
1137 | "datetime.datetime(2019, 3, 14, 15, 39, 55, 130084)"
1138 | ]
1139 | },
1140 | "execution_count": 43,
1141 | "metadata": {},
1142 | "output_type": "execute_result"
1143 | }
1144 | ],
1145 | "source": [
1146 | "from datetime import timedelta,datetime\n",
1147 | "date = datetime.now()\n",
1148 | "date"
1149 | ]
1150 | },
1151 | {
1152 | "cell_type": "code",
1153 | "execution_count": 51,
1154 | "metadata": {},
1155 | "outputs": [
1156 | {
1157 | "data": {
1158 | "text/plain": [
1159 | "datetime.datetime(2019, 3, 15, 15, 39, 55, 130084)"
1160 | ]
1161 | },
1162 | "execution_count": 51,
1163 | "metadata": {},
1164 | "output_type": "execute_result"
1165 | }
1166 | ],
1167 | "source": [
1168 | "# Move forward one day\n",
1169 | "date+timedelta(days =1)"
1170 | ]
1171 | },
1172 | {
1173 | "cell_type": "code",
1174 | "execution_count": 50,
1175 | "metadata": {},
1176 | "outputs": [
1177 | {
1178 | "data": {
1179 | "text/plain": [
1180 | "datetime.datetime(2019, 3, 14, 15, 40, 55, 130084)"
1181 | ]
1182 | },
1183 | "execution_count": 50,
1184 | "metadata": {},
1185 | "output_type": "execute_result"
1186 | }
1187 | ],
1188 | "source": [
1189 | "# Move forward 60 seconds\n",
1190 | "date+timedelta(seconds = 60)"
1191 | ]
1192 | },
1193 | {
1194 | "cell_type": "code",
1195 | "execution_count": 52,
1196 | "metadata": {},
1197 | "outputs": [
1198 | {
1199 | "data": {
1200 | "text/plain": [
1201 | "datetime.datetime(2019, 3, 13, 15, 39, 55, 130084)"
1202 | ]
1203 | },
1204 | "execution_count": 52,
1205 | "metadata": {},
1206 | "output_type": "execute_result"
1207 | }
1208 | ],
1209 | "source": [
1210 | "# Move back one day\n",
1211 | "date - timedelta(days =1)"
1212 | ]
1213 | },
1214 | {
1215 | "cell_type": "markdown",
1216 | "metadata": {},
1217 | "source": [
1218 | "**date offset**"
1219 | ]
1220 | },
1221 | {
1222 | "cell_type": "code",
1223 | "execution_count": 74,
1224 | "metadata": {},
1225 | "outputs": [
1226 | {
1227 | "data": {
1228 | "text/plain": [
1229 | "datetime.datetime(2019, 3, 14, 15, 57, 32, 786664)"
1230 | ]
1231 | },
1232 | "execution_count": 74,
1233 | "metadata": {},
1234 | "output_type": "execute_result"
1235 | }
1236 | ],
1237 | "source": [
1238 | "from pandas.tseries.offsets import Hour,Minute,Day,MonthEnd\n",
1239 | "date = datetime.now()\n",
1240 | "date"
1241 | ]
1242 | },
1243 | {
1244 | "cell_type": "code",
1245 | "execution_count": 67,
1246 | "metadata": {},
1247 | "outputs": [
1248 | {
1249 | "data": {
1250 | "text/plain": [
1251 | "Timestamp('2019-03-15 15:54:23.875623')"
1252 | ]
1253 | },
1254 | "execution_count": 67,
1255 | "metadata": {},
1256 | "output_type": "execute_result"
1257 | }
1258 | ],
1259 | "source": [
1260 | "# Move forward one day\n",
1261 | "date+Day(1)"
1262 | ]
1263 | },
1264 | {
1265 | "cell_type": "code",
1266 | "execution_count": 70,
1267 | "metadata": {},
1268 | "outputs": [
1269 | {
1270 | "data": {
1271 | "text/plain": [
1272 | "Timestamp('2019-03-14 16:54:23.875623')"
1273 | ]
1274 | },
1275 | "execution_count": 70,
1276 | "metadata": {},
1277 | "output_type": "execute_result"
1278 | }
1279 | ],
1280 | "source": [
1281 | "# Move forward 1 hour\n",
1282 | "date+Hour(1)"
1283 | ]
1284 | },
1285 | {
1286 | "cell_type": "code",
1287 | "execution_count": 71,
1288 | "metadata": {},
1289 | "outputs": [
1290 | {
1291 | "data": {
1292 | "text/plain": [
1293 | "Timestamp('2019-03-14 16:04:23.875623')"
1294 | ]
1295 | },
1296 | "execution_count": 71,
1297 | "metadata": {},
1298 | "output_type": "execute_result"
1299 | }
1300 | ],
1301 | "source": [
1302 | "# Move forward 10 minutes\n",
1303 | "date+Minute(10)"
1304 | ]
1305 | },
1306 | {
1307 | "cell_type": "code",
1308 | "execution_count": 75,
1309 | "metadata": {},
1310 | "outputs": [
1311 | {
1312 | "data": {
1313 | "text/plain": [
1314 | "Timestamp('2019-03-31 15:57:32.786664')"
1315 | ]
1316 | },
1317 | "execution_count": 75,
1318 | "metadata": {},
1319 | "output_type": "execute_result"
1320 | }
1321 | ],
1322 | "source": [
1323 | "# Roll forward to the end of the month\n",
1324 | "date+MonthEnd(1)"
1325 | ]
1326 | }
1327 | ],
1328 | "metadata": {
1329 | "kernelspec": {
1330 | "display_name": "Python 3",
1331 | "language": "python",
1332 | "name": "python3"
1333 | },
1334 | "language_info": {
1335 | "codemirror_mode": {
1336 | "name": "ipython",
1337 | "version": 3
1338 | },
1339 | "file_extension": ".py",
1340 | "mimetype": "text/x-python",
1341 | "name": "python",
1342 | "nbconvert_exporter": "python",
1343 | "pygments_lexer": "ipython3",
1344 | "version": "3.7.0"
1345 | },
1346 | "toc": {
1347 | "base_numbering": 1,
1348 | "nav_menu": {},
1349 | "number_sections": true,
1350 | "sideBar": true,
1351 | "skip_h1_title": false,
1352 | "title_cell": "Table of Contents",
1353 | "title_sidebar": "Chapter 9: Time Series",
1354 | "toc_cell": false,
1355 | "toc_position": {},
1356 | "toc_section_display": true,
1357 | "toc_window_display": true
1358 | }
1359 | },
1360 | "nbformat": 4,
1361 | "nbformat_minor": 2
1362 | }
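The format codes tabulated in the Chapter09 notebook can be checked against a fixed timestamp instead of `datetime.now()`, so the results are reproducible. This is a minimal sketch using only the standard library. The sample timestamp is an assumption chosen to match a date shown in the notebook outputs (2019-03-10, a Sunday), and `strptime()` stands in here for the `dateutil` `parse()` call used in the notebook.

```python
from datetime import datetime

# Fixed sample timestamp (assumption): Sunday, 10 March 2019, 22:11:41.
moment = datetime(2019, 3, 10, 22, 11, 41)

formatted = {
    "date": moment.strftime("%Y-%m-%d"),   # year-month-day
    "hour24": moment.strftime("%H"),       # 24-hour clock
    "hour12": moment.strftime("%I"),       # 12-hour clock
    "weekday": moment.strftime("%w"),      # 0 = Sunday
    "week_sun": moment.strftime("%U"),     # weeks start on Sunday
    "week_mon": moment.strftime("%W"),     # weeks start on Monday
}

# isocalendar() yields (ISO year, ISO week, ISO weekday), with Monday = 1.
iso_year, iso_week, iso_weekday = moment.isocalendar()

# strptime() reverses strftime() when given the same format string.
parsed = datetime.strptime(formatted["date"], "%Y-%m-%d")
```

For this date `%U` and `%W` differ because a Sunday opens a new week under `%U` but closes the current one under `%W`.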
1363 |
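The time-difference cells in the Chapter09 notebook read `.days` and `.seconds` separately, which is easy to misread as the total difference. The sketch below reuses the notebook's two timestamps to contrast the leftover seconds with `total_seconds()`. The variable names and the 25.5-hour shift are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Same two timestamps as the notebook's difference example.
diff = datetime(2018, 5, 21, 19, 50) - datetime(2018, 5, 18, 20, 32)

whole_days = diff.days                       # whole days in the difference
leftover_seconds = diff.seconds              # seconds beyond the whole days
leftover_hours = leftover_seconds / 3600     # hours in the partial day only
total_hours = diff.total_seconds() / 3600    # full difference in hours

# Hours and minutes are accepted by the constructor and normalised
# into the days/seconds/microseconds fields.
shift = timedelta(hours=25, minutes=30)
shifted = datetime(2019, 3, 14, 15, 39) + shift
```

`total_seconds()` is usually the safer choice when a single number for the whole interval is wanted, since `.seconds` never exceeds one day.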
--------------------------------------------------------------------------------
/Code/Chapter12 结果导出.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exporting to an .xlsx file"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "**Set the export path**"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 47,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0 )\n",
25 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档01.xlsx\")"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "**Set the sheet name**"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 48,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档02.xlsx\",\n",
42 | " sheet_name =\"测试\")"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "**Control whether the index is written**"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 46,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档03.xlsx\",\n",
59 | " index = False)"
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "**Choose the columns to export**"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 45,
72 | "metadata": {},
73 | "outputs": [],
74 | "source": [
75 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0 )\n",
76 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档04.xlsx\",\n",
77 | " sheet_name = \"测试文档\",\n",
78 | " index=False,columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"])"
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "**Set the encoding** (the encoding argument of to_excel() was removed in pandas 2.0; omit it there)"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 43,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档05.xlsx\",\n",
95 | " sheet_name = \"测试文档\",\n",
96 | " index = False,\n",
97 | " encoding = \"utf-8\")"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "**Handle missing values**"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 42,
110 | "metadata": {},
111 | "outputs": [],
112 | "source": [
113 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n",
114 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档06.xlsx\",\n",
115 | " sheet_name=\"测试文档\",\n",
116 | " index = False,\n",
117 | " encoding = \"utf-8\",\n",
118 | " na_rep = 0 # write missing values as 0\n",
119 | " )"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "**Handle infinite values**"
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": 55,
132 | "metadata": {},
133 | "outputs": [],
134 | "source": [
135 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =1)\n",
136 | "df.to_excel(excel_writer = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档07.xlsx\",\n",
137 | " sheet_name = \"测试文档\",\n",
138 | " index = False,\n",
139 | " encoding = \"utf-8\",\n",
140 | " na_rep = 0,\n",
141 | " inf_rep = 0 # write infinite values as 0\n",
142 | " )"
143 | ]
144 | },
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {},
148 | "source": [
149 | "## Exporting to a .csv file"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "**Set the export path**"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 82,
162 | "metadata": {},
163 | "outputs": [],
164 | "source": [
165 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0)\n",
166 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档01.csv\" )"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {},
172 | "source": [
173 | "**Control whether the index is written**"
174 | ]
175 | },
176 | {
177 | "cell_type": "code",
178 | "execution_count": 64,
179 | "metadata": {},
180 | "outputs": [],
181 | "source": [
182 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档02.csv\",\n",
183 | " index = False )"
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
190 | "**Choose the columns to export**"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 83,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档03.csv\" ,\n",
200 | " index= False,\n",
201 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"])"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "**Set the delimiter**"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": 77,
214 | "metadata": {},
215 | "outputs": [],
216 | "source": [
217 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档04.csv\" ,\n",
218 | " index= False,\n",
219 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n",
220 | " sep=\",\")"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "**Handle missing values**"
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "execution_count": 75,
233 | "metadata": {},
234 | "outputs": [],
235 | "source": [
236 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n",
237 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档05.csv\" ,\n",
238 | " index= False,\n",
239 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n",
240 | " sep=\",\",\n",
241 | " na_rep = 0)"
242 | ]
243 | },
244 | {
245 | "cell_type": "markdown",
246 | "metadata": {},
247 | "source": [
248 | "**Set the encoding**"
249 | ]
250 | },
251 | {
252 | "cell_type": "code",
253 | "execution_count": 81,
254 | "metadata": {},
255 | "outputs": [],
256 | "source": [
257 | "df = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n",
258 | "df.to_csv(path_or_buf = r\"C:\\Users\\Administrator\\Excel-Python\\Data\\测试文档06.csv\" ,\n",
259 | " index= False,\n",
260 | " columns = [\"用户ID\",\"7月销量\",\"8月销量\",\"9月销量\"],\n",
261 | " sep=\",\",\n",
262 | " na_rep = 0,\n",
263 | " encoding = \"gbk\" # use gbk or utf-8-sig\n",
264 | " )"
265 | ]
266 | },
267 | {
268 | "cell_type": "markdown",
269 | "metadata": {},
270 | "source": [
271 | "## Exporting to multiple sheets"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": 80,
277 | "metadata": {},
278 | "outputs": [],
279 | "source": [
280 | "df1 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =0)\n",
281 | "df2 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =1)\n",
282 | "df3 = pd.read_excel(r\"../Data/Chapter12.xlsx\",sheet_name =2)\n",
283 | "# Create an ExcelWriter object\n",
284 | "writer = pd.ExcelWriter(r\"C:\\Users\\Administrator\\Excel-Python\\Data\\test02.xlsx\",\n",
285 | " engine = \"xlsxwriter\")\n",
286 | "# Write df1, df2, and df3 to three separate sheets,\n",
287 | "# named 表1, 表2, and 表3\n",
288 | "df1.to_excel(writer,sheet_name =\"表1\")\n",
289 | "df2.to_excel(writer,sheet_name =\"表2\")\n",
290 | "df3.to_excel(writer,sheet_name =\"表3\")\n",
291 | "# Save and close the workbook (ExcelWriter.save() was removed in pandas 2.0)\n",
292 | "writer.close()"
293 | ]
294 | }
295 | ],
296 | "metadata": {
297 | "kernelspec": {
298 | "display_name": "Python 3",
299 | "language": "python",
300 | "name": "python3"
301 | },
302 | "language_info": {
303 | "codemirror_mode": {
304 | "name": "ipython",
305 | "version": 3
306 | },
307 | "file_extension": ".py",
308 | "mimetype": "text/x-python",
309 | "name": "python",
310 | "nbconvert_exporter": "python",
311 | "pygments_lexer": "ipython3",
312 | "version": "3.7.0"
313 | },
314 | "toc": {
315 | "base_numbering": 1,
316 | "nav_menu": {},
317 | "number_sections": true,
318 | "sideBar": true,
319 | "skip_h1_title": false,
320 | "title_cell": "Table of Contents",
321 | "title_sidebar": "Chapter 12: Exporting Results",
322 | "toc_cell": false,
323 | "toc_position": {},
324 | "toc_section_display": true,
325 | "toc_window_display": true
326 | }
327 | },
328 | "nbformat": 4,
329 | "nbformat_minor": 2
330 | }
331 |
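The encoding comment in the CSV export cell of the Chapter12 notebook mentions `gbk` and `utf-8-sig`. The following standard-library sketch shows what those choices change in the bytes that get written. The two sample rows are hypothetical, and `to_csv_bytes` is a helper introduced here for illustration, not a pandas function.

```python
import csv
import io

# Hypothetical sample rows using two of the notebook's column names.
rows = [["用户ID", "7月销量"], ["A1", 10]]


def to_csv_bytes(rows, encoding):
    """Serialize rows to CSV text and return it encoded as bytes."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)


plain = to_csv_bytes(rows, "utf-8")         # UTF-8 without a byte-order mark
with_bom = to_csv_bytes(rows, "utf-8-sig")  # UTF-8 prefixed with a byte-order mark
gbk = to_csv_bytes(rows, "gbk")             # legacy Chinese code page

# The only difference between the two UTF-8 variants is the 3-byte prefix.
bom = with_bom[:3]
```

The byte-order mark is a hint that some spreadsheet applications use to recognise UTF-8, which is a common reason for choosing `utf-8-sig` when a CSV with Chinese text will be opened outside Python.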
--------------------------------------------------------------------------------
/Data/Chapter04.1.csv:
--------------------------------------------------------------------------------
1 | 编号 年龄 性别 注册时间
2 | A1 54 男 2018/8/8
3 | A2 16 女 2018/8/9
4 | A3 47 女 2018/8/10
5 | A4 41 男 2018/8/11
6 |
--------------------------------------------------------------------------------
/Data/Chapter04.csv:
--------------------------------------------------------------------------------
1 | 编号,年龄,性别,注册时间
2 | A1,54,男,2018/8/8
3 | A2,16,女,2018/8/9
4 | A3,47,女,2018/8/10
5 | A4,41,男,2018/8/11
6 |
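Rows like the ones in this file can also be loaded without pandas. This is a minimal sketch with the standard-library `csv` module, assuming an inline copy of the rows shown above in place of the file on disk. The names `records` and `oldest` are illustrative.

```python
import csv
import io
from datetime import datetime

# Inline copy of the rows shown above (assumption: stands in for the file).
raw = """编号,年龄,性别,注册时间
A1,54,男,2018/8/8
A2,16,女,2018/8/9
A3,47,女,2018/8/10
A4,41,男,2018/8/11
"""

records = []
for row in csv.DictReader(io.StringIO(raw)):
    records.append({
        "编号": row["编号"],
        "年龄": int(row["年龄"]),  # convert the age to an integer
        "性别": row["性别"],
        # Parse the year/month/day registration date into a date object.
        "注册时间": datetime.strptime(row["注册时间"], "%Y/%m/%d").date(),
    })

# Record with the largest age value.
oldest = max(records, key=lambda r: r["年龄"])
```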
--------------------------------------------------------------------------------
/Data/Chapter04.txt:
--------------------------------------------------------------------------------
1 | 编号,年龄,性别,注册时间
2 | A1,54,男,2018/8/8
3 | A2,16,女,2018/8/9
4 | A3,47,女,2018/8/10
5 | A4,41,男,2018/8/11
6 |
--------------------------------------------------------------------------------
/Data/Chapter04.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter04.xlsx
--------------------------------------------------------------------------------
/Data/Chapter05.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter05.xlsx
--------------------------------------------------------------------------------
/Data/Chapter06.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter06.xlsx
--------------------------------------------------------------------------------
/Data/Chapter07.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter07.xlsx
--------------------------------------------------------------------------------
/Data/Chapter08.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter08.xlsx
--------------------------------------------------------------------------------
/Data/Chapter10.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter10.xlsx
--------------------------------------------------------------------------------
/Data/Chapter11.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter11.xlsx
--------------------------------------------------------------------------------
/Data/Chapter12.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/Chapter12.xlsx
--------------------------------------------------------------------------------
/Data/fillna.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/fillna.xlsx
--------------------------------------------------------------------------------
/Data/loan.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/loan.csv
--------------------------------------------------------------------------------
/Data/order-14.1.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/order-14.1.csv
--------------------------------------------------------------------------------
/Data/order-14.3.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/order-14.3.csv
--------------------------------------------------------------------------------
/Data/train-pivot.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/train-pivot.csv
--------------------------------------------------------------------------------
/Data/数据集使用说明.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Data/数据集使用说明.txt
--------------------------------------------------------------------------------
/Note/Git Fork开源项目如何同步更新.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/Git Fork开源项目如何同步更新.pdf
--------------------------------------------------------------------------------
/Note/Markdown常用标签.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/Markdown常用标签.pdf
--------------------------------------------------------------------------------
/Note/jupyter notebook导出pdf并支持中文.md:
--------------------------------------------------------------------------------
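1 | **Jupyter Notebook** is a great environment for data science work. I use it for nearly all of my data analysis projects and small exercises (it used to be called IPython Notebook, which is why the file extension is still .ipynb). I rarely needed to export to PDF before, and I ran into many pitfalls when I finally did. Jupyter can export to .py, .html, .md, .pdf, and other formats.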
2 |
3 | 
4 |
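5 | In terms of appearance, the notebook rendered in the browser looks best, while the exported HTML distorts code and hyperlinks badly. When I clicked *Download as -> PDF via LaTeX* in the web interface, it first complained that Pandoc was missing, so I ran pip install pandoc. After that it no longer complained about Pandoc, but instead reported
6 | nbconvert failed: pdflatex not found on PATH or nbconvert failed: PDF creating failed, captured latex output. After some research I switched to the command line. To avoid the error *'xelatex' is not recognized as an internal or external command, operable program or batch file*, you need to install MiKTeX first. Download it from the [official site](https://miktex.org/download); on Windows, just click Next through the installer. The installer is about 190 MB, so installation takes a while. Once it is installed, the steps are: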
7 |
8 | ### 1. Convert the ipynb file to tex
9 | In a terminal, change to the directory that contains the notebook you want to convert and run
10 | **jupyter nbconvert --to latex yourNotebookName.ipynb**
11 |
12 | 
13 | A LaTeX file named **yourNotebookName.tex** then appears in that directory.
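
The conversion step above can also be scripted. The following is a minimal sketch, assuming `jupyter` is available on the PATH; the helper names `nbconvert_latex_command` and `convert_to_latex` are made up for illustration, and only the command line itself comes from this note.

```python
from pathlib import Path
import subprocess


def nbconvert_latex_command(notebook):
    """Return the nbconvert argv and the .tex path it is expected to produce.

    Hypothetical helper: it only mirrors the command shown above,
    `jupyter nbconvert --to latex yourNotebookName.ipynb`.
    """
    notebook = Path(notebook)
    argv = ["jupyter", "nbconvert", "--to", "latex", notebook.name]
    # Same base name as the notebook, with the extension swapped to .tex.
    tex_path = notebook.with_suffix(".tex")
    return argv, tex_path


def convert_to_latex(notebook):
    """Run the conversion from inside the notebook's own directory."""
    argv, tex_path = nbconvert_latex_command(notebook)
    # check=True raises CalledProcessError if nbconvert exits with an error.
    subprocess.run(argv, cwd=Path(notebook).resolve().parent, check=True)
    return tex_path
```

Calling `convert_to_latex("yourNotebookName.ipynb")` would then stand in for typing the command by hand.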
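14 | ### 2. Edit the LaTeX file by hand
15 | To support Chinese output, the tex file needs a small change. Open the generated LaTeX file in an editor (I use Notepad++),
16 | and insert the following lines after **\documentclass{article}** (if that line is absent, insert them after \documentclass[11pt]{ctexart} instead):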
17 | ```latex
18 | \usepackage{fontspec, xunicode, xltxtra}
19 | \setmainfont{Microsoft YaHei}
20 | \usepackage{ctex}
21 | ```
22 | 
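
The manual edit above can be automated. This is a sketch under stated assumptions, not part of the original workflow: the function name `add_cjk_preamble` is hypothetical, and it simply inserts the three lines shown above after the first `\documentclass` line, whatever class or options that line uses.

```python
import re

# The three preamble lines from this note, in the same order.
CJK_PREAMBLE = (
    "\\usepackage{fontspec, xunicode, xltxtra}\n"
    "\\setmainfont{Microsoft YaHei}\n"
    "\\usepackage{ctex}"
)

# Matches a whole \documentclass line, with or without [options].
DOCUMENTCLASS_LINE = re.compile(
    r"^[ \t]*\\documentclass(\[[^\]]*\])?\{[^}]*\}.*$", re.MULTILINE
)


def add_cjk_preamble(tex_source):
    """Insert the CJK preamble right after the first \\documentclass line."""
    if "\\usepackage{ctex}" in tex_source:
        return tex_source  # already patched; do not insert twice
    match = DOCUMENTCLASS_LINE.search(tex_source)
    if match is None:
        raise ValueError("no \\documentclass line found")
    end = match.end()  # position just before the line's newline
    return tex_source[:end] + "\n" + CJK_PREAMBLE + tex_source[end:]
```

To patch a file in place, read it with `Path(name).read_text(encoding="utf-8")`, pass the text through `add_cjk_preamble`, and write the result back with the same encoding.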
23 |
24 | ### 3. Convert the LaTeX file to PDF
25 | Then run the following on the command line (my demo file is GeoCluster.tex):
26 | ```
27 | xelatex yourNotebookName.tex
28 | ```
29 | 
30 | If xelatex has never been run before, the first run installs some dependencies and is slower. Eventually it finishes:
31 | 
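
The xelatex step can be wrapped so that a missing installation produces a clear message instead of the shell's "not recognized" error. This is only a sketch: the names `xelatex_command` and `compile_pdf` are hypothetical, and the injectable `which` argument exists purely so the check can be tried without a TeX installation.

```python
from pathlib import Path
import shutil
import subprocess


def xelatex_command(tex_path, which=shutil.which):
    """Return the xelatex argv, or raise if xelatex is not on the PATH."""
    if which("xelatex") is None:
        raise FileNotFoundError(
            "xelatex not found on PATH; install a TeX distribution such as MiKTeX first"
        )
    return ["xelatex", Path(tex_path).name]


def compile_pdf(tex_path):
    """Run xelatex in the .tex file's directory and return the expected .pdf path."""
    tex_path = Path(tex_path).resolve()
    # check=True raises CalledProcessError if xelatex exits with an error.
    subprocess.run(xelatex_command(tex_path), cwd=tex_path.parent, check=True)
    return tex_path.with_suffix(".pdf")
```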
32 | The output files appear in the folder:
33 | 
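34 | - .ipynb is our Jupyter notebook file
35 | - .tex is generated from the Jupyter notebook file
36 | - .pdf is the final target file, generated from the .tex file
37 | - .log, .out, and .aux are output and log files that LaTeX produces while generating the PDF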
38 |
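39 | To summarize, generating a PDF from a Jupyter notebook requires quite a few dependencies; on Windows, MiKTeX must be installed before the xelatex command is available. The steps are: convert the ipynb file to LaTeX, edit the tex file to support Chinese, and finally convert it to PDF.
40 |
41 | The final result is shown below. It still does not match the browser's direct rendering of the .ipynb file, but it works better as a presentation format than the exported HTML and other formats.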
42 | 
43 |
44 | P.S.:
45 | - The download and installation part is a bit brief as written; I may expand it later.
46 | - Original post: [Jianshu link](https://www.jianshu.com/p/6b84a9631f8a)
47 | - [A solution for Chinese support in MiKTeX](https://jingyan.baidu.com/article/ff411625e229d512e482379c.html)
48 | - [Exporting a PDF containing Chinese from IPython Notebook](https://blog.csdn.net/weixin_42114013/article/details/81106797)
--------------------------------------------------------------------------------
/Note/pandas填充缺失值fillna()函数.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
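7 | "## **Filling missing values in pandas with fillna()** \n",
8 | "Filling missing values is very common in everyday data processing. The fillna() function is commonly used in 8 ways: \n",
9 | "- Fill with a constant\n",
10 | "- Fill with a dictionary\n",
11 | "- Fill with a computed value\n",
12 | "- Fill with a value from a specific column\n",
13 | "- Set a missing value to the previous/next value\n",
14 | "- Limit the number of filled values\n",
15 | "- Set the fill direction\n",
16 | "- Modify the source data"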
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 12,
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "data": {
105 | "text/plain": [
106 | " 名次 姓名 语文 数学 外语\n",
107 | "0 1 郭靖 90.0 80.0 76\n",
108 | "1 2 黄蓉 100.0 100.0 98\n",
109 | "2 3 黄药师 NaN 98.0 100\n",
110 | "3 4 欧阳锋 NaN 95.0 85\n",
111 | "4 5 洪七公 98.0 NaN 96\n",
112 | "5 5 周伯通 88.0 91.0 88"
113 | ]
114 | },
115 | "execution_count": 12,
116 | "metadata": {},
117 | "output_type": "execute_result"
118 | }
119 | ],
120 | "source": [
121 | "import pandas as pd\n",
122 | "df = pd.read_excel(\"../Data/fillna.xlsx\")\n",
123 | "df"
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {},
129 | "source": [
130 | "### Fill with a constant"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 13,
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "data": {
219 | "text/plain": [
220 | " 名次 姓名 语文 数学 外语\n",
221 | "0 1 郭靖 90.0 80.0 76\n",
222 | "1 2 黄蓉 100.0 100.0 98\n",
223 | "2 3 黄药师 0.0 98.0 100\n",
224 | "3 4 欧阳锋 0.0 95.0 85\n",
225 | "4 5 洪七公 98.0 0.0 96\n",
226 | "5 5 周伯通 88.0 91.0 88"
227 | ]
228 | },
229 | "execution_count": 13,
230 | "metadata": {},
231 | "output_type": "execute_result"
232 | }
233 | ],
234 | "source": [
235 | "df.fillna(0)"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {},
241 | "source": [
242 | "### Fill with a dictionary"
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 23,
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
331 | "text/plain": [
332 | " 名次 姓名 语文 数学 外语\n",
333 | "0 1 郭靖 90.0 80.0 76\n",
334 | "1 2 黄蓉 100.0 100.0 98\n",
335 | "2 3 黄药师 80.0 98.0 100\n",
336 | "3 4 欧阳锋 80.0 95.0 85\n",
337 | "4 5 洪七公 98.0 90.0 96\n",
338 | "5 5 周伯通 88.0 91.0 88"
339 | ]
340 | },
341 | "execution_count": 23,
342 | "metadata": {},
343 | "output_type": "execute_result"
344 | }
345 | ],
346 | "source": [
347 | "df.fillna({\"语文\":80,\"数学\":90})"
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {},
353 | "source": [
354 | "### Fill with a computed value"
355 | ]
356 | },
357 | {
358 | "cell_type": "code",
359 | "execution_count": 24,
360 | "metadata": {},
361 | "outputs": [
362 | {
363 | "data": {
443 | "text/plain": [
444 | " 名次 姓名 语文 数学 外语\n",
445 | "0 1 郭靖 90.0 80.0 76\n",
446 | "1 2 黄蓉 100.0 100.0 98\n",
447 | "2 3 黄药师 94.0 98.0 100\n",
448 | "3 4 欧阳锋 94.0 95.0 85\n",
449 | "4 5 洪七公 98.0 92.8 96\n",
450 | "5 5 周伯通 88.0 91.0 88"
451 | ]
452 | },
453 | "execution_count": 24,
454 | "metadata": {},
455 | "output_type": "execute_result"
456 | }
457 | ],
458 | "source": [
459 | "df.fillna(df.mean(numeric_only=True))"
460 | ]
461 | },
462 | {
463 | "cell_type": "code",
464 | "execution_count": 25,
465 | "metadata": {},
466 | "outputs": [
467 | {
468 | "data": {
548 | "text/plain": [
549 | " 名次 姓名 语文 数学 外语\n",
550 | "0 1 郭靖 90.0 80.0 76\n",
551 | "1 2 黄蓉 100.0 100.0 98\n",
552 | "2 3 黄药师 376.0 98.0 100\n",
553 | "3 4 欧阳锋 376.0 95.0 85\n",
554 | "4 5 洪七公 98.0 464.0 96\n",
555 | "5 5 周伯通 88.0 91.0 88"
556 | ]
557 | },
558 | "execution_count": 25,
559 | "metadata": {},
560 | "output_type": "execute_result"
561 | }
562 | ],
563 | "source": [
564 | "df.fillna(df.sum(numeric_only=True))"
565 | ]
566 | },
567 | {
568 | "cell_type": "markdown",
569 | "metadata": {},
570 | "source": [
571 | "### Fill with a value from a specific column"
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": 17,
577 | "metadata": {},
578 | "outputs": [
579 | {
580 | "data": {
660 | "text/plain": [
661 | " 名次 姓名 语文 数学 外语\n",
662 | "0 1 郭靖 90.0 80.0 76\n",
663 | "1 2 黄蓉 100.0 100.0 98\n",
664 | "2 3 黄药师 90.5 98.0 100\n",
665 | "3 4 欧阳锋 90.5 95.0 85\n",
666 | "4 5 洪七公 98.0 90.5 96\n",
667 | "5 5 周伯通 88.0 91.0 88"
668 | ]
669 | },
670 | "execution_count": 17,
671 | "metadata": {},
672 | "output_type": "execute_result"
673 | }
674 | ],
675 | "source": [
676 | "df.fillna(df['外语'].mean())"
677 | ]
678 | },
679 | {
680 | "cell_type": "markdown",
681 | "metadata": {},
682 | "source": [
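683 | "### Set a missing value to the previous/next value \n",
684 | "Set this with the method parameter: \n",
685 | "- method = \"ffill\" or \"pad\": fill each missing value with the previous non-missing value\n",
686 | "- method = \"bfill\" or \"backfill\": fill each missing value with the next non-missing value"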
687 | ]
688 | },
689 | {
690 | "cell_type": "code",
691 | "execution_count": 18,
692 | "metadata": {},
693 | "outputs": [
694 | {
695 | "data": {
775 | "text/plain": [
776 | " 名次 姓名 语文 数学 外语\n",
777 | "0 1 郭靖 90.0 80.0 76\n",
778 | "1 2 黄蓉 100.0 100.0 98\n",
779 | "2 3 黄药师 100.0 98.0 100\n",
780 | "3 4 欧阳锋 100.0 95.0 85\n",
781 | "4 5 洪七公 98.0 95.0 96\n",
782 | "5 5 周伯通 88.0 91.0 88"
783 | ]
784 | },
785 | "execution_count": 18,
786 | "metadata": {},
787 | "output_type": "execute_result"
788 | }
789 | ],
790 | "source": [
791 | "df.fillna(method=\"ffill\")"
792 | ]
793 | },
794 | {
795 | "cell_type": "markdown",
796 | "metadata": {},
797 | "source": [
798 | "### Limit the number of filled values"
799 | ]
800 | },
801 | {
802 | "cell_type": "code",
803 | "execution_count": 26,
804 | "metadata": {},
805 | "outputs": [
806 | {
807 | "data": {
887 | "text/plain": [
888 | " 名次 姓名 语文 数学 外语\n",
889 | "0 1 郭靖 90.0 80.0 76\n",
890 | "1 2 黄蓉 100.0 100.0 98\n",
891 | "2 3 黄药师 NaN 98.0 100\n",
892 | "3 4 欧阳锋 98.0 95.0 85\n",
893 | "4 5 洪七公 98.0 91.0 96\n",
894 | "5 5 周伯通 88.0 91.0 88"
895 | ]
896 | },
897 | "execution_count": 26,
898 | "metadata": {},
899 | "output_type": "execute_result"
900 | }
901 | ],
902 | "source": [
903 | "df.fillna(method='bfill', limit=1)"
904 | ]
905 | },
906 | {
907 | "cell_type": "markdown",
908 | "metadata": {},
909 | "source": [
910 | "### Fill from the left or right by specifying the axis parameter"
911 | ]
912 | },
913 | {
914 | "cell_type": "code",
915 | "execution_count": 21,
916 | "metadata": {},
917 | "outputs": [
918 | {
919 | "data": {
999 | "text/plain": [
1000 | " 名次 姓名 语文 数学 外语\n",
1001 | "0 1 郭靖 90 80 76\n",
1002 | "1 2 黄蓉 100 100 98\n",
1003 | "2 3 黄药师 98 98 100\n",
1004 | "3 4 欧阳锋 95 95 85\n",
1005 | "4 5 洪七公 98 96 96\n",
1006 | "5 5 周伯通 88 91 88"
1007 | ]
1008 | },
1009 | "execution_count": 21,
1010 | "metadata": {},
1011 | "output_type": "execute_result"
1012 | }
1013 | ],
1014 | "source": [
1015 | "df.fillna(method='bfill', axis=1)"
1016 | ]
1017 | },
1018 | {
1019 | "cell_type": "markdown",
1020 | "metadata": {},
1021 | "source": [
1022 | "### Modify the source data with inplace = True \n",
1023 | "None of the 7 approaches above changes the source data. To change the source data, add the parameter inplace = True."
1024 | ]
1025 | }
1026 | ],
1027 | "metadata": {
1028 | "kernelspec": {
1029 | "display_name": "Python 3",
1030 | "language": "python",
1031 | "name": "python3"
1032 | },
1033 | "language_info": {
1034 | "codemirror_mode": {
1035 | "name": "ipython",
1036 | "version": 3
1037 | },
1038 | "file_extension": ".py",
1039 | "mimetype": "text/x-python",
1040 | "name": "python",
1041 | "nbconvert_exporter": "python",
1042 | "pygments_lexer": "ipython3",
1043 | "version": "3.7.1"
1044 | },
1045 | "toc": {
1046 | "base_numbering": 1,
1047 | "nav_menu": {},
1048 | "number_sections": true,
1049 | "sideBar": true,
1050 | "skip_h1_title": false,
1051 | "title_cell": "Table of Contents",
1052 | "title_sidebar": "Contents",
1053 | "toc_cell": false,
1054 | "toc_position": {},
1055 | "toc_section_display": true,
1056 | "toc_window_display": false
1057 | }
1058 | },
1059 | "nbformat": 4,
1060 | "nbformat_minor": 2
1061 | }
1062 |
--------------------------------------------------------------------------------
/Note/如何给 github 的开源项目提交 pull request.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/如何给 github 的开源项目提交 pull request.pdf
--------------------------------------------------------------------------------
/Note/常见的Python代码报错及解决方案.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Note/常见的Python代码报错及解决方案.pdf
--------------------------------------------------------------------------------
/Other/01 Pyecharts渲染图表 .ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "**Basic usage of the pyecharts library**"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
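14 | "## Install pyecharts \n",
15 | "pip install pyecharts (the examples below use the 0.5.x interface, e.g. `from pyecharts import Bar`; later major versions changed the import paths and chart API, so an older release such as `pip install pyecharts==0.5.11` may be needed) "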
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": [
22 | "## Getting started"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 64,
28 | "metadata": {},
29 | "outputs": [],
30 | "source": [
31 | "import pandas as pd\n",
32 | "from pyecharts import Bar\n",
33 | "\n",
34 | "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
35 | "brands = df['品牌'].values\n",
36 | "solds = df['已售'].values\n",
37 | "bar = Bar(\"汽车各品牌销量\", \"这里是测试数据\")\n",
38 | "bar.add(\"销量\", brands, solds)\n",
39 | "# bar.print_echarts_options() # prints the chart options; only useful when debugging\n",
40 | "bar.render(\"./html/start.html\") # generate a local HTML file"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | ""
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
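54 | "- add(): the main method, used to add the chart's data and set its configuration options\n",
55 | "- print_echarts_options(): prints all of the chart's configuration options\n",
56 | "- render(): by default generates a render.html file in the root directory; it accepts a path parameter that sets where the file is saved, e.g. render(r\"e:\\my_first_chart.html\"). Open the file in a browser. \n",
57 | "**Note:** Use the download button on the right to save the chart image locally. To show more utility buttons, set is_more_utils to True in add()"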
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## Using themes"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 31,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": [
73 | "import pandas as pd\n",
74 | "from pyecharts import Bar\n",
75 | "\n",
76 | "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
77 | "brands = df['品牌'].values\n",
78 | "solds = df['已售'].values\n",
79 | "bar = Bar(\"汽车各品牌销量\", \"这里是测试数据\")\n",
80 | "bar.use_theme('dark')\n",
81 | "bar.add(\"销量\", brands, solds)\n",
82 | "bar.render(\"./html/dark.html\")"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | ""
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "## Using the pyecharts-snapshot plugin \n",
97 | "To save a chart directly as a png, pdf, or gif file, use pyecharts-snapshot. The plugin requires a Node.js environment on your system. \n",
98 | "- Install phantomjs: `$ npm install -g phantomjs-prebuilt` \n",
99 | "- Install pyecharts-snapshot: `$ pip install pyecharts-snapshot` \n",
100 | "- Call the render method: `bar.render(path='snapshot.png')`. The file extension can be svg/jpeg/png/pdf/gif. Note that svg output requires setting renderer='svg' when initializing bar.\n"
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
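107 | "## Chart drawing workflow\n",
108 | "- Instantiate a chart object of a specific type: chart = FooChart()\n",
109 | "- Add general configuration to the chart, such as a theme: chart.use_theme()\n",
110 | "- Add chart-specific configuration: geo.add_coordinate()\n",
111 | "- Add data and configuration options: chart.add()\n",
112 | "- Generate a local file (html/svg/jpeg/png/pdf/gif): chart.render()"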
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "## Basic charts"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "### Bar (column chart / bar chart)\n",
127 | ">A column/bar chart shows the magnitude of the data through the height of the columns or the width of the bars. \n",
128 | "\n",
129 | "Bar.add() method signature \n",
130 | "```python\n",
131 | "add(name, x_axis, y_axis,\n",
132 | " is_stack=False,\n",
133 | " bar_category_gap='20%', **kwargs)\n",
134 | "``` \n",
135 | "- name -> str \n",
136 | "Legend name\n",
137 | "- x_axis -> list \n",
138 | "Data for the x axis\n",
139 | "- y_axis -> list \n",
140 | "Data for the y axis\n",
141 | "- is_stack -> bool \n",
142 | "Whether to stack this series with other stacked series on the same category axis. Defaults to False\n",
143 | "- bar_category_gap -> int/str \n",
144 | "Gap between bars on the category axis. Defaults to '20%'"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": 26,
154 | "metadata": {},
155 | "outputs": [],
156 | "source": [
157 | "import pandas as pd\n",
158 | "from pyecharts import Bar\n",
159 | "\n",
160 | "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
161 | "\n",
162 | "brands = df['品牌'].values\n",
163 | "solds = df['已售'].values\n",
164 | "schedules = df['已预订'].values\n",
165 | "bar = Bar(\"汽车各品牌销量\")\n",
166 | "bar.add(\"已售\", brands, solds, is_stack=True)\n",
167 | "bar.add(\"已预订\", brands, schedules, is_stack=True)\n",
168 | "bar.render(\"./html/bar01.html\")"
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
175 | ""
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
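182 | "### Pie (pie chart)\n",
183 | ">A pie chart shows the share of each category in the total. The arc of each slice represents its proportion of the data. \n",
184 | "\n",
185 | "Pie.add() method signature\n",
186 | "```python\n",
187 | "add(name, attr, value,\n",
188 | " radius=None,\n",
189 | " center=None,\n",
190 | " rosetype=None, **kwargs)\n",
191 | "``` \n",
192 | "- name -> str \n",
193 | "Legend name\n",
194 | "- attr -> list \n",
195 | "Attribute names\n",
196 | "- value -> list \n",
197 | "Values corresponding to the attributes\n",
198 | "- radius -> list \n",
199 | "Radius of the pie: the first item is the inner radius and the second is the outer radius. Defaults to [0, 75] \n",
200 | "Interpreted as percentages by default, relative to half of the smaller of the container's height and width\n",
201 | "- center -> list \n",
202 | "Coordinates of the pie's center: the first item is the x coordinate and the second is the y coordinate. Defaults to [50, 50] \n",
203 | "Interpreted as percentages by default; the first item is relative to the container width and the second to the container height\n",
204 | "- rosetype -> str \n",
205 | "Whether to draw a Nightingale rose chart, which distinguishes data magnitude by radius. The two modes are 'radius' and 'area'. Defaults to 'radius' \n",
206 | "radius: the central angle of each sector shows the data's percentage, and the radius shows its magnitude \n",
207 | "area: all sectors share the same central angle, and only the radius shows the data's magnitude"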
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 54,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "from pyecharts import Pie\n",
217 | "import pandas as pd\n",
218 | "df = pd.read_excel(r\"./Pyecharts.xlsx\")\n",
219 | "brands = df[\"品牌\"].values\n",
220 | "Sales = df[\"总计\"].values\n",
221 | "pie = Pie(\"汽车各品牌销量\")\n",
222 | "pie.add(\"\", brands , Sales, is_label_show=True)\n",
223 | "pie.render(\"./html/pie.html\")"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {},
229 | "source": [
230 | ""
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "### WordCloud (word cloud) \n",
238 | "WordCloud.add() method signature \n",
239 | "```python\n",
240 | "add(name, attr, value,\n",
241 | " shape=\"circle\",\n",
242 | " word_gap=20,\n",
243 | " word_size_range=None,\n",
244 | " rotate_step=45)\n",
245 | "```\n",
246 | "- name -> str \n",
247 | "Legend name\n",
248 | "- attr -> list \n",
249 | "Attribute names\n",
250 | "- value -> list \n",
251 | "Values corresponding to the attributes\n",
252 | "- shape -> str \n",
253 | "Outline of the word cloud; one of 'circle', 'cardioid', 'diamond', 'triangle-forward', 'triangle', 'pentagon', 'star'\n",
254 | "- word_gap -> int \n",
255 | "Gap between words. Defaults to 20.\n",
256 | "- word_size_range -> list \n",
257 | "Range of word font sizes. Defaults to [12, 60].\n",
258 | "- rotate_step -> int \n",
259 | "Rotation angle of the words. Defaults to 45"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": 52,
265 | "metadata": {},
266 | "outputs": [],
267 | "source": [
268 | "from pyecharts import WordCloud  # pyecharts 0.5.x-style import\n",
269 | "import pandas as pd\n",
270 | "df = pd.read_excel(r\"./Pyecharts.xlsx\", sheet_name=1)  # read the second sheet\n",
271 | "brands = df[\"品牌\"].values  # brand column\n",
272 | "sales = df[\"总计\"].values  # total column\n",
273 | "wordcloud = WordCloud(width=1300, height=620)  # canvas size in pixels\n",
274 | "wordcloud.add(\"\", brands, sales, word_size_range=[20, 100])  # font sizes from 20 to 100\n",
275 | "wordcloud.render(\"./html/WordCloud.html\")  # write the chart to an HTML file"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | ""
283 | ]
284 | },
285 | {
286 | "cell_type": "markdown",
287 | "metadata": {},
288 | "source": [
289 | "### Gauge (gauge chart)  \n",
290 | "Signature of the Gauge.add() method  \n",
291 | "```python\n",
292 | "add(name, attr, value,\n",
293 | " scale_range=None,\n",
294 | " angle_range=None, **kwargs)\n",
295 | "```\n",
296 | "- name -> str  \n",
297 | "Legend name\n",
298 | "- attr -> str  \n",
299 | "Attribute name\n",
300 | "- value -> int or float  \n",
301 | "Value corresponding to the attribute  \n",
302 | "- scale_range -> list  \n",
303 | "Data range of the gauge. Defaults to [0, 100]\n",
304 | "- angle_range -> list  \n",
305 | "Angle range of the gauge. Defaults to [225, -45]"
306 | ]
307 | },
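{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how scale_range and angle_range work together, the needle position can be worked out by hand. The sketch below assumes the value is mapped linearly from the data range onto the angle range; `needle_angle` is a hypothetical helper written for this note, not part of pyecharts.\n",
"```python\n",
"# Hypothetical helper, assuming a linear mapping from scale_range to angle_range.\n",
"def needle_angle(value, scale_range=(0, 100), angle_range=(225, -45)):\n",
"    low, high = scale_range\n",
"    start, end = angle_range\n",
"    # fraction of the way from the low end to the high end of the scale\n",
"    fraction = (value - low) / (high - low)\n",
"    return start + fraction * (end - start)\n",
"\n",
"print(needle_angle(0))    # start of the default scale\n",
"print(needle_angle(50))   # middle of the default scale\n",
"print(needle_angle(100))  # end of the default scale\n",
"```\n",
"With the default ranges the needle sweeps 270 degrees, from 225 down to -45."
]
},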
308 | {
309 | "cell_type": "code",
310 | "execution_count": 62,
311 | "metadata": {},
312 | "outputs": [],
313 | "source": [
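314 | "from pyecharts import Gauge  # pyecharts 0.5.x-style import\n",
315 | "\n",
316 | "gauge = Gauge(\"仪表盘示例\")  # chart title: gauge example\n",
317 | "gauge.add(\"业务指标\", \"完成率\", 66.66)  # legend: business metric, attribute: completion rate\n",
318 | "gauge.render(\"./html/Gauge01.html\")  # write the chart to an HTML file"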
319 | ]
320 | },
321 | {
322 | "cell_type": "markdown",
323 | "metadata": {},
324 | "source": [
325 | ""
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": 56,
331 | "metadata": {},
332 | "outputs": [],
333 | "source": [
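334 | "gauge = Gauge(\"仪表盘示例\")  # chart title: gauge example\n",
335 | "gauge.add(\n",
336 | " \"业务指标\",  # legend name: business metric\n",
337 | " \"完成率\",  # attribute name: completion rate\n",
338 | " 166.66,\n",
339 | " angle_range=[180, 0],  # half-circle dial\n",
340 | " scale_range=[0, 200],  # data range from 0 to 200\n",
341 | " is_legend_show=False,  # hide the legend\n",
342 | ")\n",
343 | "gauge.render(\"./html/Gauge02.html\")  # write the chart to an HTML file"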
344 | ]
345 | },
346 | {
347 | "cell_type": "markdown",
348 | "metadata": {},
349 | "source": [
350 | ""
351 | ]
352 | }
353 | ],
354 | "metadata": {
355 | "kernelspec": {
356 | "display_name": "Python 3",
357 | "language": "python",
358 | "name": "python3"
359 | },
360 | "language_info": {
361 | "codemirror_mode": {
362 | "name": "ipython",
363 | "version": 3
364 | },
365 | "file_extension": ".py",
366 | "mimetype": "text/x-python",
367 | "name": "python",
368 | "nbconvert_exporter": "python",
369 | "pygments_lexer": "ipython3",
370 | "version": "3.7.1"
371 | },
372 | "toc": {
373 | "base_numbering": 1,
374 | "nav_menu": {},
375 | "number_sections": true,
376 | "sideBar": true,
377 | "skip_h1_title": false,
378 | "title_cell": "Table of Contents",
379 | "title_sidebar": "Contents",
380 | "toc_cell": false,
381 | "toc_position": {},
382 | "toc_section_display": true,
383 | "toc_window_display": false
384 | }
385 | },
386 | "nbformat": 4,
387 | "nbformat_minor": 2
388 | }
389 |
--------------------------------------------------------------------------------
/Other/Pyecharts.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/Pyecharts.xlsx
--------------------------------------------------------------------------------
/Other/html/images/Gauge01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/Gauge01.png
--------------------------------------------------------------------------------
/Other/html/images/Gauge02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/Gauge02.png
--------------------------------------------------------------------------------
/Other/html/images/WordCloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/WordCloud.png
--------------------------------------------------------------------------------
/Other/html/images/bar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/bar.png
--------------------------------------------------------------------------------
/Other/html/images/dark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/dark.png
--------------------------------------------------------------------------------
/Other/html/images/pie.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/pie.png
--------------------------------------------------------------------------------
/Other/html/images/start.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xmaniu/Excel-Python/26408c8a29d6eafb0bb83ac7532e6fd58140af00/Other/html/images/start.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This repository contains reading notes for the book 『对比Excel,轻松学习Python数据分析』 (learning Python data analysis by comparison with Excel)
2 |
3 | [Detailed introduction to the book](https://github.com/junhongzhang/Excel-Python-DA/blob/master/%E6%9C%AC%E4%B9%A6%E8%AF%A6%E7%BB%86%E4%BB%8B%E7%BB%8D.md)
4 |
5 | [Errata for the book](https://github.com/junhongzhang/Excel-Python-DA/blob/master/%E5%8B%98%E8%AF%AF%E8%A1%A8.md)
6 |
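7 | **Notes**
8 | - The Code folder holds summaries of the key concepts and the book's example code
9 | - The Data folder holds the base data used by the book's code examples
10 | - The Note folder holds articles I wrote and articles shared by other readers
11 | - Personal WeChat: net3330. Feel free to get in touch to study and exchange ideas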
12 |
13 |
--------------------------------------------------------------------------------