
├── README.md
├── Task01-文件处理与邮件自动化
│   ├── Task01 文件处理与邮件自动化.ipynb
│   ├── Task01 文件自动化与邮件处理.md
│   └── png
│       ├── 1.png
│       ├── 2.png
│       ├── 3.png
│       ├── 4.png
│       ├── 5.png
│       ├── 6.png
│       ├── 7.png
│       ├── 8.png
│       ├── 9.png
│       └── os.png
├── Task02-Python与Excel
│   ├── OpenPyXL_test
│   │   ├── creat_sheet_test.xlsx
│   │   ├── new_test.xlsx
│   │   ├── test.xlsx
│   │   ├── 业务经理信息表_pd.xlsx
│   │   ├── 业务经理信息表_xl.xlsx
│   │   ├── 业务联系表.xlsx
│   │   ├── 业务联系表_pd.xlsx
│   │   ├── 客户信息表_pd.xlsx
│   │   ├── 客户信息表_xl.xlsx
│   │   ├── 用户行为偏好.xlsx
│   │   └── 用户行为偏好_1.xlsx
│   ├── Python_Excel_OpenPyXL.ipynb
│   ├── Python_Excel_OpenPyXL.md
│   ├── Python_Excel_OpenPyXL.pdf
│   ├── Python_Excel_XLWings.ipynb
│   ├── Python_Excel_XLWings.md
│   ├── Python_Excel_XLWings.pdf
│   ├── XLWings_test
│   │   └── xlwings_wb.xlsx
│   └── imgs
│       ├── 2.3.png
│       ├── Python_Excel_XLWings
│       │   ├── code-result.png
│       │   ├── xlwings-border.png
│       │   ├── xlwings-charts.png
│       │   ├── xlwings-format.png
│       │   ├── xlwings-line.png
│       │   ├── xlwings-local.png
│       │   ├── xlwings-matplotlib.png
│       │   ├── xlwings-practice1.png
│       │   ├── xlwings-practice2.png
│       │   ├── xlwings-principle.png
│       │   └── xlwings-write.png
│       ├── logo.png
│       ├── output_30_0.png
│       └── output_36_0.png
├── Task03-Python与Word和PDF
│   ├── excel到word.xlsx
│   ├── python与pdf.ipynb
│   ├── python与pdf.md
│   ├── python与word.ipynb
│   ├── python与word.md
│   ├── watermark.pdf
│   └── 易方达中小盘混合型证券投资基金2020年中期报告.pdf
├── Task04-简单的python爬虫
│   ├── 爬虫参考结果
│   │   ├── datawhale.png
│   │   ├── wuhan_ziru.csv
│   │   └── 鲁迅文章.txt
│   ├── 简单的python爬虫.ipynb
│   └── 简单的python爬虫.md
├── Task05-Python操作钉钉自动化
│   ├── Python操作钉钉自动化.md
│   └── asset
│       ├── 01.png
│       ├── 02.png
│       ├── 03.png
│       ├── 04.png
│       ├── 05.png
│       ├── 06.png
│       ├── 07.png
│       ├── 08.png
│       ├── 09.png
│       └── 10.png
├── Task06-其它推荐软件和网页
│   ├── asset
│   │   ├── 01.png
│   │   ├── 02.png
│   │   ├── 03.png
│   │   ├── 04.png
│   │   ├── 05.png
│   │   ├── 06.png
│   │   ├── Desktopcal.png
│   │   ├── clippingmagic.png
│   │   └── geek.png
│   └── 其它优秀的小工具.md
└── 自动化办公.pptx


/README.md:
--------------------------------------------------------------------------------
# office-automation

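🎉 This course is now available on 智海 (a national-level AI science and education platform): https://aiplusx.momodel.cn/classroom/class/664bf764599277c8d81d326d

Basic course information

- Study period: 14 days, averaging 1 to 3 hours per day depending on how quickly you absorb the material.
- Format: theory + exercises
- Audience: learners who have a basic knowledge of Python programming and need to automate office work.
- Prerequisite: the Python programming language
- Related course: data collection
- Pilot course: Datawhale team learning, office automation

### Course Outline

**Task01 File Handling and Email Automation**
- Theory: recognizing and handling file paths, and working with folders
- Practice: automated file handling
- Theory: sending email automatically

**Task02 Python and Excel**
- Reading and writing Excel files
- Adjusting Excel styles
- Comprehensive exercises

**Task03 Python with Word and PDF**
- Theory: working with Word from Python
- Theory: working with PDF from Python

**Task04 A Simple Python Web Crawler**
- The requests library in theory and practice
- Parsing HTML pages and extracting data
- Scraping 自如 (Ziroom) apartment data
- Scraping 36kr articles and sending them by email

**Task05 Automating DingTalk with Python**
- Background knowledge for operating DingTalk from Python

**Task06 Other Recommended Software and Websites**
- A collection of handy tools and websites

### Acknowledgments

Thanks to the following members for their contributions to the project:
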

| 牧小熊 | 老表 | 小一 | 赵信达 | 于鸿飞 |
| --- | --- | --- | --- | --- |
| Task04 & Task06 | Task02 | Task05 & Task03 | Task01 | Task03 |

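About Datawhale: Datawhale is an open-source organization focused on data science and AI. It brings together outstanding learners from universities and well-known companies across many fields, along with team members who share an open-source and exploratory spirit. With the vision "for the learner, growing together with learners", Datawhale encourages authentic self-expression, openness and inclusiveness, mutual trust and help, the courage to experiment, and the willingness to take responsibility. Datawhale also applies open-source ideas to open content, open learning, and open solutions, supporting the training and growth of talent and building connections between people, between people and knowledge, between people and companies, and between people and the future.

![logo.png](https://camo.githubusercontent.com/8578ee173c78b587d5058439bbd0b98fa39c173def229a8c3d957e62aac0b649/68747470733a2f2f696d672d626c6f672e6373646e696d672e636e2f323032303039313330313032323639382e706e67237069635f63656e746572)

--------------------------------------------------------------------------------
/Task01-文件处理与邮件自动化/Task01 文件自动化与邮件处理.md:
--------------------------------------------------------------------------------
---
typora-copy-images-to: png
---

# Task 01 Automated File Handling

- [Task 01 Automated File Handling](#task-01-automated-file-handling)
  - [1.1 File Handling](#11-file-handling)
    - [1.1.1 Files and File Paths](#111-files-and-file-paths)
    - [1.1.2 The Current Working Directory](#112-the-current-working-directory)
    - [1.1.3 Working with Paths](#113-working-with-paths)
      - [1.1.3.1 Absolute and Relative Paths](#1131-absolute-and-relative-paths)
      - [1.1.3.2 Path Operations](#1132-path-operations)
      - [1.1.3.3 Checking Path Validity](#1133-checking-path-validity)
    - [1.1.4 File and Folder Operations](#114-file-and-folder-operations)
      - [1.1.4.1 Creating New Folders with os.makedirs()](#1141-creating-new-folders-with-osmakedirs)
      - [1.1.4.2 Viewing File Sizes and Folder Contents](#1142-viewing-file-sizes-and-folder-contents)
    - [1.1.5 Reading and Writing Files](#115-reading-and-writing-files)
      - [1.1.5.1 Opening Files with the open() Function](#1151-opening-files-with-the-open-function)
      - [1.1.5.2 Reading File Contents](#1152-reading-file-contents)
      - [1.1.5.3 Writing to Files](#1153-writing-to-files)
      - [1.1.5.4 Saving Variables](#1154-saving-variables)
    - [1.1.6 Exercises](#116-exercises)
    - [1.1.7 Organizing Files](#117-organizing-files)
      - [1.1.7.1 The shutil Module](#1171-the-shutil-module)
      - [1.1.7.2 Copying Files and Folders](#1172-copying-files-and-folders)
      - [1.1.7.3 Moving and Renaming Files and Folders](#1173-moving-and-renaming-files-and-folders)
      - [1.1.7.4 Permanently Deleting Files and Folders](#1174-permanently-deleting-files-and-folders)
      - [1.1.7.5 Safe Deletion with the send2trash Module](#1175-safe-deletion-with-the-send2trash-module)
    - [1.1.8 Walking a Directory Tree](#118-walking-a-directory-tree)
    - [1.1.9 Compressing Files with the zipfile Module](#119-compressing-files-with-the-zipfile-module)
      - [1.1.9.1 Creating and Adding to ZIP Files](#1191-creating-and-adding-to-zip-files)
      - [1.1.9.2 Reading ZIP Files](#1192-reading-zip-files)
      - [1.1.9.3 Extracting from ZIP Files](#1193-extracting-from-zip-files)
    - [1.1.10 Finding Files](#1110-finding-files)
      - [1.1.10.1 glob](#11101-glob)
      - [1.1.10.2 The fnmatch Module](#11102-the-fnmatch-module)
      - [1.1.10.3 The hashlib Module](#11103-the-hashlib-module)
    - [1.1.11 Exercises](#1111-exercises)
  - [1.2 Sending Email Automatically](#12-sending-email-automatically)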
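
**Task01 Automated File Handling & Batch Email Processing (3 days)**

- Theory: recognizing and handling file paths, and working with folders
- Practice: automated file handling
- Theory: sending email automatically, including sending an email with an Excel attachment from Python

While a program runs, variables can hold its results. If you want to inspect those results after the program has exited, you need to save the data to a file. You can think of a file's contents as a single string value, possibly several gigabytes in size. This section shows how to create, read, and save files on disk with Python.

## 1.1 File Handling
### 1.1.1 Files and File Paths

A file has two key properties: a path and a filename. The path specifies where the file is located on the computer, and the filename is the name of the file at that location. For example, my computer has a file named Datawhale - 开源发展理论研究.pdf whose path is D:\Datawhale. On Windows, the D:\ part of the path is the "root folder", and Datawhale is a folder name. Note: folder names and filenames are not case-sensitive on Windows.

On Windows, paths are written with backslashes (`\`) as the separator between folders, while OS X and Linux use forward slashes (`/`). We usually build path strings with `os.path.join()`, which inserts the correct separator for the current platform.

The commonly used os functions are shown in the figure below.

![os](.\png\os.png)

```python
import os
os.path.join('Datawhale','docu')
```

On Windows the result is displayed as `'Datawhale\\docu'`. There are two backslashes because the first one escapes the second in the displayed form; the string itself contains a single backslash. On OS X or Linux, the same call returns `'Datawhale/docu'`.

### 1.1.2 The Current Working Directory

Every program running on a computer has a "current working directory". `os.getcwd()` returns the current working directory as a string, and `os.chdir()` changes it.

![3](.\png\3.png)

### 1.1.3 Working with Paths

#### 1.1.3.1 Absolute and Relative Paths

An "absolute path" always begins at the root folder.

A "relative path" is relative to the program's current working directory.

In a relative path, a single dot (`.`) is shorthand for the current directory, and two dots (`..`) refer to the parent folder.

![1](.\png\1.png)

Several commonly used functions for absolute and relative paths:

- os.path.abspath(path): converts a relative path to an absolute one and returns the absolute path as a string.
- os.path.isabs(path): checks whether path is absolute; returns True if it is and False otherwise.

![4](.\png\4.png)



#### 1.1.3.2 Path Operations

- `os.path.relpath(path, start)`: returns a string with the relative path from start to path. If start is omitted, the current working directory is used as the starting point.
- `os.path.dirname(path)`: returns the directory part of path, that is, everything before the last separator.
- `os.path.basename(path)`: returns the last component of path, that is, everything after the last separator.

![5](.\png\5.png)

If you need both the directory name and the base name of a path, call `os.path.split()` to get a tuple containing the two strings.

```python
caFilePath = 'D:\\Datawhale\\python办公自动化\\python课程画图.pptx'
os.path.split(caFilePath) #('D:\\Datawhale\\python办公自动化', 'python课程画图.pptx')
```
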
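The results quoted above assume Windows path rules. As a minimal sketch, the standard-library `ntpath` module (the Windows flavour of `os.path`) applies those rules on any operating system, so the examples can also be tried on OS X or Linux. The variable names below are illustrative only.

```python
import ntpath  # Windows implementation of os.path, importable on every platform

ca_file_path = 'D:\\Datawhale\\python办公自动化\\python课程画图.pptx'

# split() returns a (directory name, base name) tuple
folder, name = ntpath.split(ca_file_path)

# splitext() separates the extension from the rest of the file name
stem, ext = ntpath.splitext(name)

print(folder)
print(name)
print(stem, ext)

# A path that starts at a drive root is absolute; one without a root is relative
print(ntpath.isabs(ca_file_path))
print(ntpath.isabs('png\\os.png'))
```

On Windows itself, `os.path` already behaves this way, so `ntpath` is only needed when trying Windows-style paths on another system.
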
We can also call `os.path.dirname()` and `os.path.basename()` and put their return values in a tuple to get the same result.

```python
(os.path.dirname(caFilePath),os.path.basename(caFilePath)) #('D:\\Datawhale\\python办公自动化', 'python课程画图.pptx')
```

`os.path.split()` does not return a list with one string per folder. To get such a list, use the string method `split()` and split on the string stored in `os.path.sep`, which holds the correct folder separator for the current platform.

```python
caFilePath.split(os.path.sep) #['D:', 'Datawhale', 'python办公自动化', 'python课程画图.pptx']
```

#### 1.1.3.3 Checking Path Validity

Many Python functions raise an error if the path you pass them does not exist. The `os.path` module provides functions that check whether a given path exists and whether it is a file or a folder.

`os.path.exists(path)`: returns True if the file or folder that path refers to exists, and False otherwise.

`os.path.isfile(path)`: returns True if path exists and is a file, and False otherwise.

`os.path.isdir(path)`: returns True if path exists and is a folder, and False otherwise.

```python
os.path.exists('C:\\Windows')
```

```python
os.path.exists('C:\\else')
```

```python
os.path.isfile('D:\\Datawhale\\python办公自动化\\python课程画图.pptx')
```

```python
os.path.isdir('D:\\Datawhale\\python办公自动化\\python课程画图.pptx')
```

### 1.1.4 File and Folder Operations

#### 1.1.4.1 Creating New Folders with os.makedirs()

Note: `os.makedirs()` creates all necessary intermediate folders.

```python
import os
os.makedirs('D:\\Datawhale\\practice') # Check the directory: the folder has been created. If it already exists, it is not overwritten and an error is raised instead
```

#### 1.1.4.2 Viewing File Sizes and Folder Contents

Being able to handle file paths is the basis for working with files and folders. Next, we can gather information about specific files and folders. The `os` and `os.path` modules provide functions for finding the size of a file in bytes and for listing the files and subfolders inside a given folder.

- `os.path.getsize(path)`: returns the size in bytes of the file at path.
- `os.listdir(path)`: returns a list of name strings, one for each file and subfolder in path.

```python
"""
Note: you can create the folders along this path yourself, put any pptx file there,
and rename it to python课程画图.pptx. If the file does not exist, an error is raised rather than 0 bytes being returned.
"""
os.path.getsize('D:\\Datawhale\\python办公自动化\\python课程画图.pptx')
```

```python
os.listdir('D:\\Datawhale\\python办公自动化')
```

To find the total number of bytes of all files in a directory, use `os.path.getsize()` and `os.listdir()` together.

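```python
totalSize = 0
for filename in os.listdir('D:\\Datawhale\\python办公自动化'):
    totalSize = totalSize + os.path.getsize(os.path.join('D:\\Datawhale\\python办公自动化', filename))
print(totalSize)
```

### 1.1.5 Reading and Writing Files

Reading or writing a file takes three steps:

1. Call the `open()` function, which returns a File object.

2. Call the `read()` or `write()` method of the File object.

3. Call the `close()` method of the File object to close the file.

Common file object methods and what they do:

![6](.\png\6.png)

#### 1.1.5.1 Opening Files with the open() Function

To open a file with `open()`, pass it a string path that identifies the file you want to open. The path can be absolute or relative. `open()` returns a File object.
First, use a text editor such as TextEdit to create a text file named hello.txt. Type Hello World! as its content and save it in the folder used by the examples below (D:\Datawhale\python办公自动化).

![7](.\png\7.png)

File objects are obtained through Python's built-in open function. Its full signature is:

`open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)`

The open function takes 8 parameters, of which the first 4 are the most commonly used. Every parameter except file has a default value. file names the file to open and should include its path; without a path, the file is looked up in the current working directory, which is usually the folder containing the running .py script. buffering sets the buffering policy used for the file; the default of -1 selects the system's default buffering. Reading and writing files involves the disk, and buffering reduces the number of physical disk operations. encoding sets the file's text encoding, such as GBK or UTF-8. When it is omitted, the default depends on the platform (on Chinese-language Windows it is often GBK rather than UTF-8), so it is safest to pass it explicitly. If a file opens as garbled text, the encoding argument does not match the encoding the file was created with.

mode sets how the file is opened. The basic modes are r, w, and a, for reading, writing, and appending. The additional modes are b, t, and +, for binary mode, text mode, and read-write mode. Additional modes must be combined with a basic mode: for example, "rb" opens the file read-only in binary mode, and "rb+" opens it for reading and writing in binary mode.

Be very careful with any mode that contains w: it empties the original file first, without any warning. For any mode that contains r, the file must already exist; otherwise an error is raised because the file cannot be found.



```python
helloFile = open('D:\\Datawhale\\python办公自动化\\hello.txt')
print(helloFile)
```

As you can see, calling `open()` returns a File object. Whenever you need to read from or write to the file, call the methods of the File object stored in the helloFile variable.

#### 1.1.5.2 Reading File Contents

With a File object, we can start reading from it.

`read()`: reads the entire contents of the file as one string.

`readlines()`: reads the file line by line and returns a list of strings, one per line of text, each ending with \n (except possibly the last).

```python
helloContent = helloFile.read()
helloContent
```

```python
sonnetFile = open('D:\\Datawhale\\python办公自动化\\hello.txt')
sonnetFile.readlines()
```

#### 1.1.5.3 Writing to Files
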
To write to a file, open it in write mode ('w') or append mode ('a'); a file opened in read mode cannot be written to.
Write mode overwrites the existing file and starts from scratch. Append mode adds text to the end of the existing file.

```python
baconFile = open('bacon.txt','w')
baconFile.write('Hello world!\n')
```

```python
baconFile.close() # Closing flushes the buffer; only then is the written content guaranteed to appear in the txt file.
```

```python
baconFile = open('bacon.txt','a')
baconFile.write('Bacon is not a vegetable.')
```

```python
baconFile.close()
```

```python
baconFile = open('bacon.txt')
content = baconFile.read()
baconFile.close()
print(content)
```

Note that, unlike the print() function, the `write()` method does not automatically add a newline character to the end of the string. You have to add it yourself.

- Example: counting character frequencies

File objects implement `__iter__` and `__next__`, so they are iterable and can be traversed with a for loop. We can iterate over the file to get each line, iterate over each line to get each character, collect the characters in a list, and then count how often each one occurs.

```python
from collections import Counter
my_list = []
# Characters to ignore (punctuation, a backslash, and the newline)
punctuation=',.!?\\，。！？、()【】<>《》=：+-*“”...\n'
with open('bacon.txt','r') as f:
    for line in f:
        for word in line:  # iterating over a string yields single characters
            if word not in punctuation:
                my_list.append(word)

counter = Counter(my_list)
counter
```



#### 1.1.5.4 Saving Variables

1) The shelve module

With the `shelve` module, you can save Python variables to a binary `shelf` file so that the program can later restore the data from disk.

```python
import shelve
shelfFile = shelve.open('mydata')
cats = ['Zophie','Pooka','Simon']
shelfFile['cats'] = cats
shelfFile.close()
```

After running the code above on Windows, three new files appear in the current working directory: mydata.bak, mydata.dat, and mydata.dir. On OS X, only a single mydata.db file is created.

Reopen these files to retrieve the data. Note: a `shelf` does not need to be opened in read mode or write mode, because once opened it can be both read and written.

![8](.\png\8.png)

```python
shelfFile = shelve.open('mydata')
type(shelfFile)
```

```python
shelve.DbfilenameShelf
```

```python
shelfFile['cats']
```

```python
shelfFile.close()
```

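The open/close pairs above can also be written with a `with` statement, which closes the shelf even if an error occurs in between. The following is a minimal sketch under a few assumptions: the helper name `save_and_load` is hypothetical, and the temporary folder is used only to keep the example self-contained.

```python
import os
import shelve
import tempfile


def save_and_load(name, value, folder):
    """Store value under name in a shelf inside folder, then read it back."""
    path = os.path.join(folder, 'mydata')
    # The shelf is closed automatically when the with block ends
    with shelve.open(path) as shelf:
        shelf[name] = value
    # Reopen the shelf to show that the data was restored from disk
    with shelve.open(path) as shelf:
        return shelf[name]


with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_load('cats', ['Zophie', 'Pooka', 'Simon'], tmp)
    print(restored)
```
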
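Like dictionaries, `shelf` values have `keys()` and `values()` methods, which return the keys and values stored in the shelf. These methods return list-like values rather than true lists, so pass them to `list()` to get actual lists.

```python
shelfFile = shelve.open('mydata')
list(shelfFile.keys())
```

```python
list(shelfFile.values())
```

```python
shelfFile.close()
```

2) Saving variables with the `pprint.pformat()` function

`pprint.pformat()` returns the text that would be printed as a string. This string is both easy to read and syntactically valid Python code.

Suppose you have a dictionary stored in a variable and want to save the variable and its contents for later use. `pprint.pformat()` gives you a string that you can write to a .py file. That file becomes a module of your own, and you can import it whenever you need the variable stored in it.

```python
import pprint
cats = [{'name':'Zophie','desc':'chubby'},{'name':'Pooka','desc':'fluffy'}]
pprint.pformat(cats)
```

```python
fileObj = open('myCats.py','w')
fileObj.write('cats = '+pprint.pformat(cats)+'\n')
```

```python
fileObj.close()
```

Modules imported with the import statement are themselves just Python scripts. If the string from pprint.pformat() is saved to a .py file, that file is a module that can be imported.

```python
import myCats
myCats.cats
```

```python
myCats.cats[0]
```

```python
myCats.cats[0]['name']
```

### 1.1.6 Exercises

1. What happens if an existing file is opened in write mode?

Hint:

```
Opening in write mode

r : read-only mode; raises an error if the file does not exist; the default mode (the file pointer starts at the beginning of the file)

w : write mode; the file is created automatically if it does not exist; opening overwrites the original contents; while the file stays open you can write to it several times (the contents are only cleared when the file is opened)
```

2. What is the difference between the `read()` and `readlines()` methods?

Hint:

read(): returns the entire text in its original format

readline(): returns one line of text per call (the first line on the first call)

readlines(): returns the entire text as a list, where the n-th line of the text is the n-th element of the list

Comprehensive exercise:

I. Generating random quiz files

Suppose you are a geography teacher with 35 students in your class, and you want to give a short quiz on US state capitals. Unfortunately, there are a few bad apples in the class, and you cannot be sure the students won't cheat. You want to randomize the order of the questions so that every quiz is unique and nobody can copy answers from anyone else. Doing this by hand would be slow and tedious. Fortunately, you know some Python.

Here is what the program does:

• Creates 35 different quizzes.

• Creates 50 multiple-choice questions for each quiz, in random order.

• Provides one correct answer and 3 random wrong answers for each question, in random order.

• Writes the quizzes to 35 text files.

• Writes the answer keys to 35 text files.

This means the code needs to do the following:

• Store the states and their capitals in a dictionary.

• Call open(), write(), and close() for the quiz and answer key text files.

• Use random.shuffle() to randomize the order of the questions and of the multiple-choice options.



Hint:
https://blog.csdn.net/liying_tt/article/details/117968373

### 1.1.7 Organizing Files

The previous section showed how to create and write new files with Python. This section shows how to organize files that already exist on disk. You may have faced a folder containing dozens, hundreds, or even thousands of files that had to be copied, renamed, moved, or compressed by hand. Consider tasks such as these:

• Copying all pdf files (and only pdf files) in a folder and all of its subfolders

• Removing the leading zeros from the names of all files in a folder that contains hundreds of files named spam001.txt, spam002.txt, spam003.txt, and so on

• Compressing the contents of several folders into one ZIP file (which could serve as a simple backup system)

Tedious tasks like these are crying out to be automated with Python. By programming your computer to do them, you turn it into a fast file clerk that never makes mistakes.

#### 1.1.7.1 The shutil Module

The `shutil` (shell utilities) module contains functions for copying, moving, renaming, and deleting files from Python programs. To use its functions, first run `import shutil`.

![9](.\png\9.png)

#### 1.1.7.2 Copying Files and Folders

`shutil.copy(source, destination)`: copies the file at the path source to the path destination (source and destination are both strings) and returns a string with the path of the newly copied file.

destination can be:

1) The name of a file: the source file is copied to destination under the new name.

2) An existing folder: the source file is copied into destination.

3) The name of a folder that does not exist: the contents of the source file are copied to a new file named destination. (Use with care: the result is a file without an extension rather than a folder, which is usually not what we want.)

```python
"""
If bacon.txt is not at this path, take the bacon.txt created in the folder
of the current code file and move it to the path below to follow along
"""

import shutil
import os
shutil.copy('D:\\Datawhale\\python办公自动化\\bacon.txt', 'D:\\Datawhale\\practice')
```

- shutil.copytree(source, destination): copies the folder at the path source, including all the folders and files it contains, to the folder at the path destination, and returns a string with the path of the newly copied folder.

Note: the destination folder is created by the call; if it already exists, an error is raised.

```python
import shutil
# D:\Datawhale\practice already exists, so this call raises an error
shutil.copytree('D:\\Datawhale\\python办公自动化','D:\\Datawhale\\practice')
```

```python
import shutil
# The destination does not exist yet, so the copy succeeds
shutil.copytree('D:\\Datawhale\\python办公自动化','D:\\Datawhale\\practice_unexist')
```



#### 1.1.7.3 Moving and Renaming Files and Folders

`shutil.move(source, destination)`: moves the file or folder at the path source to the path destination and returns a string with the path of the new location.

1) If source and destination are both folders and destination already exists, the source folder is moved into the destination folder (move).

2) If source is a folder and destination does not exist, the source folder is moved to the destination path and takes destination's name (move + rename).

3) If source and destination are both file paths, the file at source is moved to destination and takes destination's filename (move + rename).

Note: if a file with the same name already exists at destination, it may be overwritten by the move, so take particular care.

```python
import shutil
shutil.move('D:\\Datawhale\\practice','D:\\Datawhale\\docu')
```

#### 1.1.7.4 Permanently Deleting Files and Folders

`os.unlink(path)`: deletes the file at path.

`os.rmdir(path)`: deletes the folder at path. The folder must be empty, containing no files or folders.

`shutil.rmtree(path)`: deletes the folder at path together with all the files and folders it contains.

Note: be very careful when using these functions so that you do not delete the wrong files. On the first run, it is a good idea to comment out the deletion calls and add `print()` calls to check that the files are the ones you intend to delete.

```python
# It is best to switch to the target folder first and check where you are
os.chdir('D:\\Datawhale\\docu')
os.getcwd()
```

```python
import os
for filename in os.listdir():
    print(filename)
    os.unlink(filename)

# bacon.txt has now been deleted

for filename in os.listdir():
    print(filename)
```

#### 1.1.7.5 Safe Deletion with the send2trash Module

`shutil.rmtree(path)` deletes files and folders irreversibly, which makes it dangerous to use. The third-party `send2trash` module instead sends files or folders to the computer's trash or recycle bin rather than deleting them permanently. If a bug in your program uses send2trash to delete something you wanted to keep, you can restore it from the trash later.

Note: be very careful when using these functions so that you do not delete the wrong files. On the first run, it is a good idea to comment out the deletion calls and add `print()` calls to check that the files are the ones you intend to delete.

```python
!pip install send2trash # install the send2trash module
```

```python
import send2trash
send2trash.send2trash('bacon.txt')
```

### 1.1.8 Walking a Directory Tree

`os.walk(path)`: takes the path of a folder. Use `os.walk()` in a for loop to walk a directory tree, much as range() is used to walk over a range of numbers. Unlike range(), `os.walk()` returns three values on each iteration of the loop:

1) A string with the name of the current folder.

2) A list of strings with the names of the subfolders in the current folder.

3) A list of strings with the names of the files in the current folder.
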
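As a rough sketch of how these three values are typically used, the function below collects every file with a given extension under a root folder. The function name `find_by_extension` and the small temporary tree it is tried on are assumptions made for illustration; they are not part of the original course material.

```python
import os
import tempfile


def find_by_extension(root, extension):
    """Return sorted paths, relative to root, of files ending in extension."""
    matches = []
    # Each iteration yields (current folder, its subfolders, its files)
    for folder_name, sub_folders, file_names in os.walk(root):
        for file_name in file_names:
            if file_name.lower().endswith(extension.lower()):
                full_path = os.path.join(folder_name, file_name)
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)


# Build a tiny throwaway tree so the example runs anywhere
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'cats'))
    os.makedirs(os.path.join(root, 'dogs'))
    demo_files = (
        'Miki.txt',
        os.path.join('cats', 'catNames.txt'),
        os.path.join('dogs', 'photo.jpg'),
    )
    for relative in demo_files:
        with open(os.path.join(root, relative), 'w') as f:
            f.write('demo')
    found = find_by_extension(root, '.txt')
    print(found)
```

Returning relative paths keeps the result independent of where the root folder lives; `os.path.join(root, p)` turns each entry back into a full path when needed.
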
Note: "current folder" means the folder for the current iteration of the for loop. The program's current working directory is not changed by `os.walk()`.

![2](.\png\2.png)



Create the corresponding files and folders following the directory tree shown in the figure.


```python
import os
for folderName, subFolders, fileNames in os.walk('D:\\animals'):
    print('The current folder is ' + folderName)
    for subFolder in subFolders:
        print('Subfolder of ' + folderName+':'+subFolder)
    for filename in fileNames:
        print('File Inside ' + folderName+':'+filename)
    print('')
```

### 1.1.9 Compressing Files with the zipfile Module

To make files easier to transfer, they are often packaged into .zip files. With the functions in the zipfile module, Python programs can create and open (or extract) zip files.

#### 1.1.9.1 Creating and Adding to ZIP Files

Here we compress the animals folder from the previous section: we create a zip file named example.zip and add files to it.

`zipfile.ZipFile('filename.zip', 'w')`: creates a compressed file in write mode.

The `write(filename, compress_type=zipfile.ZIP_DEFLATED)` method of a `ZipFile` object: the first argument is a string with the path of the file to add; Python compresses that file and adds it to the ZIP file. The second argument is the compression type, which tells the computer which algorithm to use to compress the file. You can always set it to `zipfile.ZIP_DEFLATED` (this selects the deflate compression algorithm, which works well for all kinds of data).

Note: write mode erases all existing contents of the zip file. If you only want to add files to an existing zip file, pass 'a' as the second argument to `zipfile.ZipFile()` to open the ZIP file in append mode.

```python
## 1 Create a compressed file new.zip and add a file to it
import zipfile
newZip = zipfile.ZipFile('new.zip','w')
newZip.write('Miki.txt',compress_type=zipfile.ZIP_DEFLATED)
newZip.close()
```

```python
# 'w' recreates new.zip, so the Miki.txt entry added above is discarded; use 'a' to keep it
newZip = zipfile.ZipFile('new.zip','w')
newZip.write('D:\\animals\\dogs\\Taidi.txt',compress_type=zipfile.ZIP_DEFLATED)
newZip.close()
```

```python
## 2 Create a compressed file example.zip containing every file under the animals folder
import zipfile
import os
newZip = zipfile.ZipFile('example.zip','w')
for folderName, subFolders, fileNames in os.walk('D:\\animals'):
    for filename in fileNames:
        newZip.write(os.path.join(folderName,filename),compress_type=zipfile.ZIP_DEFLATED)
newZip.close()
```

#### 1.1.9.2 Reading ZIP Files

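Call `zipfile.ZipFile(filename)` to create a `ZipFile` object (note the capital Z and F), where filename is the name of the zip file to read.

Two commonly used methods of `ZipFile` objects:

`namelist()`: returns a list of strings for all the files and folders contained in the zip file.

`getinfo()`: returns a `ZipInfo` object for a particular file.

Two attributes of `ZipInfo` objects: `file_size` and `compress_size`, which give the original file size and the compressed file size respectively.

```python
import zipfile,os
exampleZip = zipfile.ZipFile('example.zip')
exampleZip.namelist()
```

```python
catInfo = exampleZip.getinfo('animals/Miki.txt')
```

```python
catInfo.file_size
```

```python
catInfo.compress_size
```

```python
print('Compressed file is %s x smaller!' %(round(catInfo.file_size/catInfo.compress_size,2)))
```

```python
exampleZip.close()
```

#### 1.1.9.3 Extracting from ZIP Files

The `extractall()` method of a `ZipFile` object: extracts all files and folders from the zip file into the current working directory. You can also pass a folder name to `extractall()`, and it extracts the files into that folder instead of the current working directory. If the folder does not exist, it is created.

The `extract()` method of a `ZipFile` object: extracts a single file from the zip file. You can pass a second argument to extract() to extract the file into a specified folder instead of the current working directory. If the folder given by the second argument does not exist, Python creates it. The return value of extract() is the path of the extracted file.

```python
import zipfile, os
exampleZip = zipfile.ZipFile('example.zip')
exampleZip.extractall('.\\zip')
exampleZip.close()
```

```python
exampleZip = zipfile.ZipFile('example.zip')
exampleZip.extract('animals/Miki.txt')
exampleZip.extract('animals/Miki.txt', 'D:\\animals\\folders')
exampleZip.close()
```

### 1.1.10 Finding Files

When working with files, the skill most worth mastering is finding them. The sections above showed how os.listdir and os.walk list the files in a directory in bulk; the modules below are commonly used to find specific files.

#### 1.1.10.1 glob

glob is a file-handling module that ships with Python and finds files matching a pattern. For example, the following code finds all .txt documents in the current directory.

```python
import glob
glob.glob('*.txt')
```

The work here is mainly in writing the pattern: "*" matches any number of characters, "?" matches a single character, and "[]" matches characters in a given range, such as [0-9] for digits.


- glob.glob('*[0-9]*.*') matches files in the current directory whose names contain a digit.
- glob.glob(r'G:\*') returns all files and folders at the top level of drive G, but it does not list the files inside those folders. In other words, the returned names only cover the given directory itself, not the files in its subfolders.

#### 1.1.10.2 The fnmatch Module

fnmatch is also part of the Python standard library. It is a module dedicated to filename matching and can handle more complex matching tasks. It has 4 functions: fnmatch, fnmatchcase, filter, and translate. The most commonly used is fnmatch, with the following syntax.

- fnmatch.fnmatch(filename,pattern)

pattern is the matching condition; the function tests whether the filename filename matches it.

The following code finds all files in the target folder whose names end with a digit (before the extension).

```python
import os,fnmatch
path = os.getcwd() # get the current working directory
for foldname, subfolders, filenames in os.walk(path):
    for filename in filenames:
        if fnmatch.fnmatch(filename,'*[0-9].*'):
            print(filename)
```

fnmatchcase is similar to fnmatch, except that it always treats the comparison as case-sensitive.

Both functions return True or False. The filter function instead returns the list of matching filenames, with the following syntax:

- fnmatch.filter(filelist,pattern)

#### 1.1.10.3 The hashlib Module

As files accumulate on a computer, we need a way to find duplicates. Duplicate files may have different names, so filenames and file sizes alone are not enough to identify them. A simple and reliable approach is to compare the MD5 values of two files.

Python's built-in hashlib library provides a way to compute the MD5 value of a file.

```python
import hashlib
m = hashlib.md5()
f = open('bacon.txt','rb')  # read in binary mode so the raw bytes are hashed
m.update(f.read())
f.close()
md5_value = m.hexdigest()
print(md5_value)
```

Electronic files are easy to alter or forge. If a dispute arises, how can you provide strong evidence that a file is authentic? One workable approach is to compute the MD5 value of the whole file once it has been produced. After the value has been recorded, virtually any later modification of the file changes its MD5 value, so comparing the values shows whether the file has been altered. Note that MD5 does not withstand deliberately crafted collisions; `hashlib.sha256()` is used in the same way when stronger guarantees are needed.

### 1.1.11 Exercises

1) Write a program that walks a directory tree and looks for files with a particular extension (such as .pdf or .jpg). Wherever these files are located, copy them to a new folder.

2) It is not unusual for a few unneeded but very large files or folders to take up most of the space on a hard drive. If you are trying to free up space on your computer, deleting unwanted large files gives the best results, but first you have to find them. Write a program that walks a directory tree and looks for especially large files or folders, say, files larger than 100MB (recall that `os.path.getsize()` in the os module returns a file's size). Print the absolute paths of these files to the screen.

3) Write a program that finds all files with a given prefix in a folder, such as spam001.txt, spam002.txt, and so on, and locates any gaps in the numbering (for example, spam001.txt and spam003.txt exist but spam002.txt does not). Have the program rename all the later files to close the gap. As an added challenge, write another program that opens up gaps in a series of consecutively numbered files so that new files can be added.

## 1.2 Sending Email Automatically

Automating email with Python frees you from tedious, repetitive work and saves a great deal of time.
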
Python has two built-in libraries that provide email functionality, `smtplib` and `email`: `smtplib` sends the message, and `email` builds its format and content.

Sending email follows the **SMTP** protocol. Python has built-in SMTP support and can send plain-text messages, HTML messages, and messages with attachments.

```python
#1 Import the required libraries and classes
import smtplib # import the library
from smtplib import SMTP_SSL # SMTP over SSL: encrypts the connection so the message cannot be intercepted in transit
from email.mime.text import MIMEText # builds the text body of the message
from email.mime.image import MIMEImage # builds an image part of the message
from email.mime.multipart import MIMEMultipart # container that holds all parts of the message
from email.header import Header # message headers: subject, recipient
```

```python
#2 Set the mail server, sender address, mailbox authorization code, and recipient address
host_server = 'smtp.163.com' # address of the 163 mailbox SMTP server
sender_163 = 'pythonauto_emai@163.com' # sender_163 is the sender's address
pwd = 'DYEPOGLZDZYLOMRI' # pwd is the mailbox authorization code
# You can also register your own mailbox; to obtain an authorization code, see http://help.163.com/14/0923/22/A6S1FMJD00754KNP.html

# Set the recipient address; replace it with your own
receiver = '1121091694@qq.com'
```

```python
#3 Build a MIMEMultipart object that represents the message itself; text, images, attachments, and so on can be added to it
msg = MIMEMultipart() # the message container
```

```python
#4 Set the message headers
mail_title = 'python办公自动化邮件' # message subject
msg["Subject"] = Header(mail_title,'utf-8') # subject header
msg["From"] = sender_163 # sender
msg["To"] = Header("测试邮箱",'utf-8') # recipient display name
```

```python
#5 Add the body text
mail_content = "您好，这是使用python登录163邮箱发送邮件的测试" # body text of the message
message_text = MIMEText(mail_content,'plain','utf-8') # build the text part; argument 1: body content, argument 2: text format, argument 3: encoding
msg.attach(message_text) # add the text part to the MIMEMultipart object
```

```python
#6 Add an image
image_data = open('D:\\animals\\cats\\zophie.jpg','rb') # read the image in binary mode
message_image = MIMEImage(image_data.read()) # build the image part from the binary data
image_data.close() # close the file opened above
msg.attach(message_image) # add the image to the message
```

```python
# 7 Add an attachment (Excel workbook)
atta = MIMEText(open('D:\\animals\\cats\\cat.xlsx', 'rb').read(), 'base64', 'utf-8') # build the attachment
atta["Content-Disposition"] = 'attachment; filename="cat.xlsx"' # set the attachment information
msg.attach(atta) # add the attachment to the message
```

```python
#8 Send the message
smtp = SMTP_SSL(host_server) # create the SMTP object over SSL
smtp.login(sender_163,pwd) # log in to the mailbox; argument 1: mailbox address, argument 2: mailbox authorization code
smtp.sendmail(sender_163,receiver,msg.as_string()) # send the message; argument 1: sender address, argument 2: recipient address, argument 3: the message converted to str
print("邮件发送成功")
smtp.quit() # close the SMTP connection
```

**References**:

https://github.com/datawhalechina/team-learning-program/blob/master/OfficeAutomation/Task01%20%E6%96%87%E4%BB%B6%E8%87%AA%E5%8A%A8%E5%8C%96%E4%B8%8E%E9%82%AE%E4%BB%B6%E5%A4%84%E7%90%86.md

and

Chapter 3, "高效办公文件管理" (efficient office file management), of the book 《学Python，不加班——轻松实现办公自动化》 by 何华平

--------------------------------------------------------------------------------
/Task01-文件处理与邮件自动化/png/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/1.png

--------------------------------------------------------------------------------
/Task01-文件处理与邮件自动化/png/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/2.png

--------------------------------------------------------------------------------
/Task01-文件处理与邮件自动化/png/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/3.png

--------------------------------------------------------------------------------
/Task01-文件处理与邮件自动化/png/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/4.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/5.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/6.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/7.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/8.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/9.png -------------------------------------------------------------------------------- /Task01-文件处理与邮件自动化/png/os.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task01-文件处理与邮件自动化/png/os.png -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/creat_sheet_test.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/creat_sheet_test.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/new_test.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/new_test.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/test.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/test.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/业务经理信息表_pd.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/业务经理信息表_pd.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/业务经理信息表_xl.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/业务经理信息表_xl.xlsx 
-------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/业务联系表.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/业务联系表.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/业务联系表_pd.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/业务联系表_pd.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/客户信息表_pd.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/客户信息表_pd.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/客户信息表_xl.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/客户信息表_xl.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/用户行为偏好.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/用户行为偏好.xlsx -------------------------------------------------------------------------------- /Task02-Python与Excel/OpenPyXL_test/用户行为偏好_1.xlsx: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/OpenPyXL_test/用户行为偏好_1.xlsx
--------------------------------------------------------------------------------
/Task02-Python与Excel/Python_Excel_OpenPyXL.md:
--------------------------------------------------------------------------------
# Task 02 Python Excel Automation with OpenPyXL

- [Task 02 Python Excel Automation with OpenPyXL](#task-02-python-excel-自动化之-openpyxl)
  - [2.0 Installing the package](#20-包的安装)
  - [2.1 Reading Excel](#21-excel读取)
    - [2.1.1 Reading the worksheets in an Excel file](#211-读取excel中的工作表)
      - [1. Read the Excel file `用户行为偏好.xlsx` and inspect the returned value](#1-读取excel文件-用户行为偏好xlsx--查看返回值属性)
      - [2. List the sheet (worksheet) names in the workbook and read the active sheet](#2-查看对应工作簿包含的-sheet工作表-的名称读取活动表)
      - [3. Inspect a specific sheet](#3-查看指定sheet信息)
    - [2.1.2 Reading cells in a worksheet](#212-读取工作表中的单元格)
    - [2.1.3 Reading the values of multiple cells](#213-读取多个单元格的值)
    - [2.1.4 Exercise](#214-练习题)
  - [2.2 Writing Excel](#22-excel写入)
    - [2.2.1 Writing data and saving](#221-写入数据并保存)
      - [1. Modify data in an existing workbook and save](#1-原有工作簿中修改数据并保存)
      - [2. Create a new workbook, write data, and save](#2-创建新的表格写入数据并保存)
    - [2.2.2 Writing a formula to a cell and saving](#222-将公式写入单元格保存)
    - [2.2.3 Inserting empty columns/rows](#223-插入空列行)
    - [2.2.4 Deleting](#224-删除)
    - [2.2.5 Moving](#225-移动)
  - [2.3 Excel styles](#23-excel-样式)
    - [2.3.1 Setting font styles](#231设置字体样式)
      - [1. Set the font style of a single cell](#1-设置单个-cell单元格-字体样式)
      - [2. Set the font style of multiple cells](#2-设置多个-cell-的字体样式)
    - [2.3.2 Setting border styles](#232-设置边框样式)
      - [1. Set a cell's border style](#1-设置单元格边框样式)
    - [2.3.3 Setting other cell styles](#233-设置单元格其他样式)
      - [1. Set the cell background color](#1-设置单元格背景色)
      - [2. Center horizontally](#2设置水平居中)
      - [3. 
Set row height and column width](#3-设置行高与列宽)
    - [2.3.4 Merging and unmerging cells](#233-合并取消合并单元格)
    - [2.3.5 Exercise](#235-练习题)
  - [2.4 Comprehensive exercises](#24-综合练习)
    - [2.4.1 Split 业务联系表.xlsx into the following two Excel files:](#241-将-业务联系表xlsx-拆分成以下两个-excel)
    - [2.4.2 Merge 客户信息表.xlsx and 客户关系表.xlsx into one Excel file](#242-将-客户信息表xlsx-和-客户关系表xlsx-合并成一个excel)
  - [2.5 Afterword](#25-后记)

## 2.0 Installing the package

Difficulty: ⭐

Open CMD/Terminal, activate your environment, and run the following command to install the `openpyxl` module.
```bash
pip3 install openpyxl
```

Note: openpyxl can read and write .xlsx, .xlsm, .xltx, and .xltm files, but it cannot read the .xls format. To read .xls files, install the **xlrd** module with `pip3 install xlrd`. This chapter focuses on the .xlsx format.

## 2.1 Reading Excel

Difficulty: ⭐

- Excel's full name is Microsoft Office Excel. The 2003 version uses the xls format; version 2007 and later use the xlsx format.
- xlsx files are opened with the `openpyxl` module; xls files are written with the `xlwt` module and read with the `xlrd` module.
- This article uses the xlsx format as its example.

### 2.1.1 Reading the worksheets in an Excel file

**About paths:**

Relative paths are resolved against the current working directory, so a file can be referenced by its bare name only when it sits in that directory. Import `os` and call `os.getcwd()` to find out what the current working directory is, and use `os.chdir()` to change it; see Chapter 1 for details. (Relative paths are used here.)


```python
# Get the current working directory
import os
print(os.getcwd())

import warnings
warnings.filterwarnings('ignore')
root_path = './OpenPyXL_test/'
```

#### 1. 
Read the Excel file `用户行为偏好.xlsx` and inspect the returned value


```python
# Import the module and check the type of the returned object
import openpyxl

wb = openpyxl.load_workbook(root_path+'用户行为偏好.xlsx')
type(wb)
```




    openpyxl.workbook.workbook.Workbook



[Code explanation]

Here we use the load_workbook function from openpyxl to load the specified xlsx file.
- openpyxl.load_workbook(
  filename,
  read_only=False,
  keep_vba=False,
  data_only=False,
  keep_links=True,
  )

The load_workbook signature shown here has five parameters. All of them except filename have default values. Their meanings are:

- `filename`: str, the relative or absolute path of the file to open;
- `read_only`: bool, whether to open the file in read-only mode. Defaults to False (readable and writable);
- `keep_vba`: bool, whether to preserve the VBA content in the file (preserving it does not mean it can be used from code). Defaults to False (not preserved);
- `data_only`: bool, for cells that contain an Excel formula, whether to return the computed value or the formula text. Defaults to False (the formula text is returned);
- `keep_links`: bool, whether to preserve external links in cells. Defaults to True (links are preserved);

- Return type: `openpyxl.workbook.Workbook`

Unless you have special requirements, specifying only the `filename` argument is enough.


[Tip]

**import ... and from ... import ...**

The difference between `import ...` and `from ... import ...`:

- `import ...` imports a whole module, much like referring to a folder: its contents are reached through the module name, e.g. `openpyxl.load_workbook(...)`.
- `from ... import ...` imports a single function (or class) from a module, much like picking one file out of that folder: it can then be called directly, e.g. `load_workbook(...)`.

#### 2. 
List the sheet (worksheet) names in the workbook and read the active sheet


```python
# List the names of the sheets in the loaded workbook
print(wb.sheetnames)
```

    ['订单时长分布', 'Sheet3']


[Code explanation]

Here we use the `sheetnames` attribute of the `openpyxl.workbook.Workbook` object to get the names of the sheets (worksheets) in the loaded workbook.

From the output above we can see that `用户行为偏好.xlsx` contains two sheets: 订单时长分布 and Sheet3.


```python
# Read the workbook's active sheet
# The active sheet is the one shown when the workbook is opened in Excel.
# Once you have the Worksheet object, its name is available through the title attribute.
active_sheet = wb.active
print(f'active_sheet object: {active_sheet}')
print(f'active_sheet title: {active_sheet.title}')
```

    active_sheet object: <Worksheet "订单时长分布">
    active_sheet title: 订单时长分布


[Tip]

The active sheet can change. When you open a workbook in Excel, edit it, and save it, the sheet that is displayed just before Excel closes becomes the active sheet.

#### 3. Inspect a specific sheet


```python
# Get a sheet by passing its name as a string, then print its type, title, and the range its content occupies
sheet = wb.get_sheet_by_name('Sheet3')
print(f'sheet: {sheet}')
print(f'type(sheet): {type(sheet)}')
print(f'sheet.title: {sheet.title}')
print(f'sheet.dimensions: {sheet.dimensions}')
```

    sheet: <Worksheet "Sheet3">
    type(sheet): <class 'openpyxl.worksheet.worksheet.Worksheet'>
    sheet.title: Sheet3
    sheet.dimensions: A1:I17


[Code explanation]

Here we use the `get_sheet_by_name` method of the `openpyxl.workbook.Workbook` object to get a specific sheet (worksheet) object from the loaded workbook by its sheet name.

We then use attributes of the `openpyxl.worksheet.worksheet.Worksheet` object to get basic information about the sheet: `Worksheet.title` returns the sheet name, and `Worksheet.dimensions` returns the range that contains values.


Workbook.get_sheet_by_name(name) takes a single argument, the sheet name, and returns the matching Worksheet object. This method is deprecated in recent openpyxl releases; the same Worksheet object can be obtained by indexing the workbook, for example:
```python
wb['Sheet3']
```

### 2.1.2 Reading cells in a worksheet

![image-20211110131533928](./imgs/2.3.png)

**Cell (Excel cell)**

- A Cell object has a value attribute that holds the value stored in the cell.
- A Cell object also has row, column, and coordinate attributes that give the cell's position.
- Excel labels columns with letters, and after column Z it switches to two letters: AA, AB, and so on. For that reason, when calling the cell() method you can also pass integers as the row and 
column keyword arguments to get a cell.
- Note: the first row and the first column are numbered 1, not 0.


```python
# Get cells from a sheet. The workbook was already loaded in 2.1.1 and stored in the variable wb
## Get the sheet names
print(f'sheetnames: {wb.sheetnames}')
```

    sheetnames: ['订单时长分布', 'Sheet3']



```python
# Get the specified sheet
sheet = wb.get_sheet_by_name('订单时长分布')

# Get a cell object by its position, e.g. B1
a = sheet['B1']
print(f"sheet['B1']: {a}")

# Get and print the text content of cell B1
print(f"sheet['B1'].value: {a.value}")

# Get and print the row and column of cell B1
print(f'Row: {a.row}, Column: {a.column}')

# Get and print the coordinate and value of cell B1
print(f'Cell {a.coordinate} is {a.value}')
```

    sheet['B1']: <Cell '订单时长分布'.B1>
    sheet['B1'].value: 日期
    Row: 1, Column: 2
    Cell B1 is 日期



```python
# Print the values of the odd-numbered rows among the first 8 rows of column B
for i in range(1,8,2):
    print(i, sheet.cell(row=i,column=2).value)
```

    1 日期
    3 2020-07-24 00:00:00
    5 2020-07-24 00:00:00
    7 2020-07-24 00:00:00



```python
# Determine the maximum row and column numbers, i.e. the size of the sheet
print(f'sheet.max_row: {sheet.max_row}')
print(f'sheet.max_column: {sheet.max_column}')
```

    sheet.max_row: 14
    sheet.max_column: 4


### 2.1.3 Reading the values of multiple cells


```python
# Method 1: index the sheet directly to get the values in the range A1 to C8
cells = sheet['A1:C8']
print(f'type(cells): {type(cells)} \n')

# Iterate over the tuple and print each cell value
for rows in cells:
    for cell in rows:
        print(cell.value, end=" |")
    print("\n")
```

    type(cells): <class 'tuple'>

    编号 |日期 |行为时长 |

    71401.30952380953 |2020-07-24 00:00:00 |a |

    71401.30952380953 |2020-07-24 00:00:00 |b |

    71401.30952380953 |2020-07-24 00:00:00 |c |

    71401.30952380953 |2020-07-24 00:00:00 |d |

    71401.30952380953 |2020-07-24 00:00:00 |e |

    71401.30952380953 |2020-07-24 00:00:00 |f |

    71401.30952380953 
|2020-07-24 00:00:00 |g |




```python
# Method 2: use sheet.iter_rows to get the data row by row
rows = sheet.iter_rows(min_row=1, max_row=8, min_col=1, max_col=3)
# Iterate over the rows and print each cell value
for row in rows:
    for cell in row:
        print(cell.value, end=" |")
    print("\n")
```

    编号 |日期 |行为时长 |

    71401.30952380953 |2020-07-24 00:00:00 |a |

    71401.30952380953 |2020-07-24 00:00:00 |b |

    71401.30952380953 |2020-07-24 00:00:00 |c |

    71401.30952380953 |2020-07-24 00:00:00 |d |

    71401.30952380953 |2020-07-24 00:00:00 |e |

    71401.30952380953 |2020-07-24 00:00:00 |f |

    71401.30952380953 |2020-07-24 00:00:00 |g |




```python
# Method 3: use sheet.iter_cols to get the data column by column
cols = sheet.iter_cols(min_row=1, max_row=4, min_col=1, max_col=3)
# Iterate over the columns and print each cell value
for col in cols:
    for cell in col:
        print(cell.value, end=" |")
    print("\n")
```

    编号 |71401.30952380953 |71401.30952380953 |71401.30952380953 |

    日期 |2020-07-24 00:00:00 |2020-07-24 00:00:00 |2020-07-24 00:00:00 |

    行为时长 |a |b |c |



### 2.1.4 Exercise

Find the empty cells in sheet Sheet3 of `用户行为偏好.xlsx` and print their coordinates.


```python
from openpyxl import load_workbook

exl = load_workbook(root_path+'用户行为偏好.xlsx')
sheet3 = exl.get_sheet_by_name('Sheet3')
```


```python
sheet3.dimensions
```




    'A1:I17'




```python
# Index the sheet directly, using sheet3.dimensions as the data range
cells = sheet3[sheet3.dimensions]

# Iterate over the tuple and check whether each cell is empty.
# Comparing with None avoids reporting cells that hold 0 or an empty string.
for rows in cells:
    for cell in rows:
        if cell.value is None:
            print(f'{cell.coordinate} is None \n')
```

    D3 is None

    D8 is None

    G10 is None



## 2.2 Writing Excel

Difficulty: ⭐


### 2.2.1 Writing data and saving

#### 1. Modify data in an existing workbook and save


```python
# 1) Import the load_workbook function from openpyxl
from openpyxl import load_workbook

# 2) Get the Workbook object for the specified Excel file
exl = load_workbook(filename=root_path+'用户行为偏好.xlsx')
# 3) Get the Worksheet object from the Workbook by sheet name
sheet = exl.get_sheet_by_name('Sheet3')
# 4) Get the value of a cell by indexing, then assign a new value
print(f"Before the change, sheet['A1']: {sheet['A1'].value}")
sheet['A1'].value = 'hello world'
print(f"After the change, sheet['A1']: {sheet['A1'].value}")
# 5) Save the modified content
# If filename is the same as the original file, the original file is overwritten;
# otherwise a new Excel file is created with the content
exl.save(filename=root_path+'用户行为偏好_1.xlsx') # Save to a new file named 用户行为偏好_1.xlsx
```

    Before the change, sheet['A1']: 1
    After the change, sheet['A1']: hello world



```python
# Verify that the change was saved
exl_1 = load_workbook(filename=root_path+'用户行为偏好_1.xlsx')
# We changed A1 in Sheet3 of the original workbook to 'hello world',
# so reading the saved file and checking that value is enough
a1 = exl_1['Sheet3']['A1'].value
if a1 == 'hello world':
    print(f"Saved successfully, exl_1['Sheet3']['A1'].value = {a1}")
else:
    print(f"Something went wrong when saving, exl_1['Sheet3']['A1'].value is now {a1}")
```

    Saved successfully, exl_1['Sheet3']['A1'].value = hello world


[Code explanation]

As shown here, once we have a cell object from the sheet, we can change the content of that cell by assigning to cell.value, and then persist the modified workbook with the Workbook object's save function.

#### 2. 
Create a new workbook, write data, and save


```python
# 1) Import the Workbook class from openpyxl
from openpyxl import Workbook

# 2) Initialize a Workbook object
wb = Workbook()
print(f'Default sheets: {wb.sheetnames}')

# 3) Create a sheet with the Workbook object's create_sheet function
# title  the sheet name
# index  the sheet position, counted from 0
sheet = wb.create_sheet(title='mysheet', index=0)
print(f'Sheets after adding: {wb.sheetnames}')

# 4) Write data into the new sheet
# For example, write 'this is test' into cell A1
sheet['A1'].value = 'this is test'

print(f"sheet['A1'].value = {sheet['A1'].value}")

# Save
wb.save(root_path+'creat_sheet_test.xlsx')
```

    Default sheets: ['Sheet']
    Sheets after adding: ['mysheet', 'Sheet']
    sheet['A1'].value = this is test


### 2.2.2 Writing a formula to a cell and saving


```python
# 1) Import the load_workbook function from openpyxl
from openpyxl import load_workbook

# 2) Get the Workbook object for the specified Excel file
exl_1 = load_workbook(filename=root_path+'用户行为偏好_1.xlsx')
# 3) Get the Worksheet object from the Workbook by sheet name
sheet = exl_1['订单时长分布']

print(f'Value range of 订单时长分布: {sheet.dimensions}') # Check the existing cell range first so that existing data is not overwritten
```

    Value range of 订单时长分布: A1:D14



```python
# Write 合计 (total) into cell A15
sheet['A15'].value = '合计'
```


```python
# Write the sum formula SUM(D2:D14) into cell D15
sheet['D15'] = '=SUM(D2:D14)'
exl_1.save(filename='用户行为偏好_1.xlsx')
```


```python
# Open the Excel file with xlwings and save it so that the written formula is evaluated
import xlwings as xw
# Open the workbook
app = xw.App(visible=False, add_book=False)
wb = app.books.open('用户行为偏好_1.xlsx')
wb.save()
# Close the workbook
wb.close()
app.quit()
```


```python
# Verify that the write succeeded
# 1) Get the Workbook object for the specified Excel file,
# with data_only=True so that cells containing formulas return the computed value
exl_2 = load_workbook(filename = '用户行为偏好_1.xlsx', data_only=True)
# 2) Print the relevant information
sheet = exl_2['订单时长分布']
print(f"sheet['A15']={sheet['A15'].value}，sheet['D15']={sheet['D15'].value}")
print(f"Sum of {sheet['D1'].value}: SUM(D2:D14)={sheet['D15'].value}")
```

    sheet['A15']=合计，sheet['D15']=4004.7261561561563
    Sum of 次数: SUM(D2:D14)=4004.7261561561563


[Note]

Even with data_only=True, the computed result of a formula you have just written is not available immediately. openpyxl does not evaluate formulas itself; it only reads the value that Excel cached when the file was last saved. You therefore need to open the workbook, either manually or from code, save it (Ctrl+S), and then run the code above again to get the computed value.

You can use the code below to open the specified Excel file and save it automatically so that the written formula is evaluated. It requires xlwings, which can be installed with `pip3 install xlwings`; this module is covered later in the course.

```python
# Open the Excel file with xlwings and save it so that the written formula is evaluated
import xlwings as xw
# Open the workbook
app = xw.App(visible=False, add_book=False)
wb = app.books.open('用户行为偏好_1.xlsx')
wb.save()
# Close the workbook
wb.close()
app.quit()
```

### 2.2.3 Inserting empty columns/rows


```python
# Get the specified sheet
sheet = exl_1['Sheet3']

# Insert columns: insert_cols(idx, amount=1)
# idx is the insert position, amount is the number of columns to insert (default 1)
# idx=2 means column 2: insert one column before column 2
sheet.insert_cols(idx=2)
# Insert 5 columns before column 2
# sheet.insert_cols(idx=2, amount=5)

# Insert rows: insert_rows(idx, amount=1)
# idx is the insert position, amount is the number of rows to insert (default 1)
# Insert one row before row 2
sheet.insert_rows(idx=2)
# Insert 5 rows before row 2
# sheet.insert_rows(idx=2, amount=5)

exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.2.4 Deleting


```python
# Delete multiple columns
sheet.delete_cols(idx=5, amount=2)
# Delete multiple rows
sheet.delete_rows(idx=2, amount=5)

exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.2.5 Moving

A positive number moves the range down or to the right; a negative number moves it up or to the left.


```python
# Move a range
# A positive number moves down or right; a negative number moves up or left
sheet.move_range('B3:E16',rows=1,cols=-1)
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

## 2.3 Excel styles

Difficulty: ⭐⭐

### 2.3.1 Setting font styles

#### 1. 
Set the font style of a single cell

`Font(name=font name, size=size, bold=bold, italic=italic, color=color)`


```python
# 1) Import the load_workbook function from openpyxl
# Import the Font class from the openpyxl styles module
from openpyxl import load_workbook
from openpyxl.styles import Font

# 2) Get the Workbook object for the specified Excel file
exl_1 = load_workbook(filename=root_path+'用户行为偏好_1.xlsx')
# 3) Get the Worksheet object from the Workbook by sheet name
sheet = exl_1['订单时长分布']
```


```python
# 4) After getting the target cell, inspect its font attributes
cell = sheet['A1']
cell.font
```




    <openpyxl.styles.fonts.Font object>
    Parameters:
    name='宋体', charset=134, family=3.0, b=True, i=False, strike=None, outline=None, shadow=None, condense=None, color=<openpyxl.styles.colors.Color object>
    Parameters:
    rgb=None, indexed=None, auto=None, theme=1, tint=0.0, type='theme', extend=None, sz=11.0, u=None, vertAlign=None, scheme='minor'




```python
# 5) Instantiate a Font object to set the font style
# Font: 黑体 (SimHei), size: 20, style: bold, italic, red
font = Font(name='黑体', size=20, bold=True, italic=True, color='FF0000')
cell.font = font
# 6) Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

#### 2. Set the font style of multiple cells


```python
# Above we already obtained the 订单时长分布 worksheet of '用户行为偏好_1.xlsx'
# and styled the font of cell A1. We can also set font styles in bulk by iterating over cells

# 1) Get the cells to process
# Index the sheet to get the cells of row 2
# Columns can be indexed by letter, e.g. sheet['A'] returns the cells of the first column
cells = sheet[2]
# 2) Instantiate a Font object to set the font style
# Font: 黑体 (SimHei), size: 10, style: bold, italic, red
font = Font(name='黑体', size=10, bold=True, italic=True, color='FF0000')
# 3) Iterate and apply the font style to every cell
for cell in cells:
    cell.font = font
# 4) Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.3.2 Setting border styles

#### 1. 
Set a cell's border style

`Side`: the class that defines an edge line, such as its line style and color

Side(style=None, color=None, border_style=None)

- style: the line style of the edge. Available values: double, mediumDashDotDot, slantDashDot, dashDotDot, dotted, hair, mediumDashed, dashed, dashDot, thin, mediumDashDot, medium, thick
- color: the color of the edge
- border_style: an alias for style. A line style must be given for the edge to be drawn, and setting border_style alone is usually enough; style does not need to be set as well

`Border`: the class that positions the edges: left, right, top, and bottom

Common Border parameters:

- top, bottom, left, right, diagonal: the edge styles for the top, bottom, left, right, and diagonal lines, each a Side object
- diagonalDown: draw the diagonal from the top-left corner to the bottom-right corner. Defaults to False
- diagonalUp: draw the diagonal from the top-right corner to the bottom-left corner. Defaults to False


```python
# Above we already obtained the 订单时长分布 worksheet of '用户行为偏好_1.xlsx' as sheet
# 1) Import the Side and Border classes from the openpyxl styles module
from openpyxl.styles import Side, Border
# 2) First initialize an edge object (several can be created)
side = Side(border_style='double', color='FF000000')
# 3) Use Border to define the border style of the whole cell
border = Border(left=side, right=side, top=side, bottom=side, diagonal=side, diagonalDown=True, diagonalUp=True)
```


```python
# 4) Inspect the current border style of the cells
# Get the cells of the first row
cells = sheet[1]
# Take one cell and look at its border style
cells[0].border
```




    <openpyxl.styles.borders.Border object>
    Parameters:
    outline=True, diagonalUp=False, diagonalDown=False, start=None, end=None, left=<openpyxl.styles.borders.Side object>
    Parameters:
    style=None, color=None, right=<openpyxl.styles.borders.Side object>
    Parameters:
    style=None, color=None, top=<openpyxl.styles.borders.Side object>
    Parameters:
    style=None, color=None, bottom=<openpyxl.styles.borders.Side object>
    Parameters:
    style=None, color=None, diagonal=<openpyxl.styles.borders.Side object>
    Parameters:
    style=None, color=None, vertical=None, horizontal=None




```python
# 5) Change the border style and save the changes
for cell in cells:
    cell.border = border
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.3.3 Setting other cell styles

#### 1. 
Set the cell background color


```python
# Above we already obtained the 订单时长分布 worksheet of '用户行为偏好_1.xlsx' as sheet
# 1) Import the background fill classes PatternFill and GradientFill from openpyxl.styles
from openpyxl.styles import PatternFill, GradientFill

# 2) Instantiate a PatternFill object; the fill_type argument must be specified
pattern_fill = PatternFill(fill_type='solid',fgColor="DDDDDD")
# 3) Instantiate a GradientFill object; the fill type defaults to linear
gradient_fill = GradientFill(stop=('FFFFFF', '99ccff','000000'))
```


```python
# 4) Get the target cells and fill them one by one
# Set the background of row 3 with the PatternFill
cells = sheet[3]
for cell in cells:
    cell.fill = pattern_fill

# Set the background of row 4 with the GradientFill
cells = sheet[4]
for cell in cells:
    cell.fill = gradient_fill

# 5) Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

#### 2. Center horizontally

Common parameters of the Alignment class in openpyxl.styles:

- horizontal: horizontal alignment. Common values: `distributed, justify, center, left, fill, centerContinuous, right, general`
- vertical: vertical alignment. Common values: `bottom, distributed, justify, center, top`
- textRotation: text rotation angle, a number from 0 to 180
- wrapText: whether to wrap text automatically, a bool that defaults to False


```python
# Above we already obtained the 订单时长分布 worksheet of '用户行为偏好_1.xlsx' as sheet
# 1) Import the alignment class Alignment from openpyxl.styles
from openpyxl.styles import Alignment

# 2) Instantiate an Alignment object that centers horizontally and vertically
alignment = Alignment(horizontal='center', vertical='center')

# 3) Get the target cells and apply the alignment one by one
# Apply the alignment above to the data in row 5
cells = sheet[5]
for cell in cells:
    cell.alignment = alignment
# 4) Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

#### 3. 
Set row height and column width


```python
# 1) Row and column objects are obtained through row_dimensions and column_dimensions
# 2) Set the height of row 1 to 30
sheet.row_dimensions[1].height = 30
# 3) Set the width of column 3 (column C) to 24
sheet.column_dimensions['C'].width = 24
# 4) Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.3.4 Merging and unmerging cells


```python
# Note: a merged cell only shows the value of the top-left cell of the merged range; the content of the other cells is lost
# Above we already obtained the '用户行为偏好_1.xlsx' object exl_1, which can be indexed to get any sheet we want
# 1) Get the Sheet3 worksheet
sheet = exl_1['Sheet3']

# Merge the cells in the specified range
sheet.merge_cells('A1:B2')

# sheet.merge_cells(start_row=1, start_column=3,
#                   end_row=2, end_column=4)

# Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```


```python
# Unmerge
sheet.unmerge_cells('A1:B2')

# sheet.unmerge_cells(start_row=1, start_column=3,
#                     end_row=2, end_column=4)

# Save the changes
exl_1.save(filename=root_path+'用户行为偏好_1.xlsx')
```

### 2.3.5 Exercise

Open the test.xlsx file, find the cells whose purchase quantity `buy_mount` is greater than 5, and make them red and bold with a red border.


```python
# 1) Import the required openpyxl functions and classes
from openpyxl import load_workbook
from openpyxl.styles import Font, Side, Border

# 2) Read test.xlsx and select the buy_mount column
workbook = load_workbook(root_path+'test.xlsx')
sheet = workbook.active
buy_mount = sheet['B']
```


```python
# 3) Define the border and font styles
side = Side(style='thin', color='FF0000')
border = Border(left=side, right=side, top=side, bottom=side)
font = Font(bold=True, color='FF0000')
```


```python
# 4) Iterate and check whether each cell value meets the condition
# Accept both int and float so that whole-number quantities are not skipped
for cell in buy_mount:
    if isinstance(cell.value, (int, float)) and cell.value > 5:
        cell.font = font
        cell.border = border
# 5) Save the modified content as new_test.xlsx
workbook.save(root_path+'new_test.xlsx')
```

## 2.4 Comprehensive exercises

### 2.4.1 Split 业务联系表.xlsx into the following two Excel files:
- 客户信息表 (customer information): 客户名称, 客户地址, 客户方负责人, 性别, 联系电话, 对接业务经理编号
- 业务经理信息表 (account manager information): 业务经理编号, 所在分区, 所在区域, 业务经理姓名


```python
# 1) Import the required openpyxl functions and classes
from openpyxl import load_workbook, Workbook

# 2) Read the data of the original workbook
wb = load_workbook(root_path+'业务联系表.xlsx')
# 3) Get the worksheet
sheet = wb.active
```


```python
# Scratch work
# We know that the real column names of our table are in the second row
# Get the coordinate and value of the second-row cell of every column
for i in sheet[2]:
    print(i.coordinate, i.value)
```

    A2 业务经理编号
    B2 分区
    C2 区域
    D2 业务经理
    E2 客户名称
    F2 客户地址
    G2 客户方负责人
    H2 性别
    I2 联系电话
    J2 备注



```python
sheet.max_column, sheet.max_row
```




    (10, 57)




```python
# 4) Select the required columns
# 4.1) Customer information sheet: 客户名称, 客户地址, 客户方负责人, 性别, 联系电话, 备注, 对接业务经理编号
# The dict maps each source header to its target column letter in the new sheet
cust_info = {'业务经理编号': 'A', '客户名称': 'B', '客户地址': 'C', '客户方负责人': 'D', '性别': 'E', '联系电话': 'F', '备注': 'G'}

# 4.2) Create a new workbook and rename the default sheet to 客户信息
cust_info_excel = Workbook()
cust_info_sh = cust_info_excel.active
cust_info_sh.title = '客户信息'
```


```python
# 4.3) Iterate over the headers; if a header is needed, copy that column's values into the 客户信息 sheet of the new workbook
for i in sheet[2]:
    if i.value in cust_info:
        # Copy every cell in this column except the first one to the new sheet
        for cell in sheet[i.coordinate[0]]:
            if cell.row == 1:
                continue
            cust_info_sh[f'{cust_info[i.value]}{cell.row-1}'].value = cell.value
```


```python
# 5) Select the required columns
# 5.1) Account manager information sheet: 业务经理编号, 所在分区, 所在区域, 业务经理姓名
manager_info = {'业务经理编号': 'A', '分区': 'B', '区域': 'C', '业务经理': 'D'}

# 5.2) Create a new workbook and rename the default sheet to 业务经理信息
manager_info_excel = Workbook()
manager_info_sh = manager_info_excel.active
manager_info_sh.title = '业务经理信息'
```


```python
# 5.3) Iterate over the headers; if a header is needed, copy that column's values into the 业务经理信息 sheet of the new workbook
for i in sheet[2]:
    if i.value in manager_info:
        # Copy every cell in this column except the first one to the new sheet
        for cell in sheet[i.coordinate[0]]:
            if cell.row == 1:
                continue
            manager_info_sh[f'{manager_info[i.value]}{cell.row-1}'].value = cell.value
```


```python
# 6.1) Save the customer information workbook
cust_info_excel.save(root_path+'客户信息表_xl.xlsx')
# 6.2) Save the account manager information workbook
manager_info_excel.save(root_path+'业务经理信息表_xl.xlsx')
```

The steps above do split the data, but openpyxl is not very convenient for further processing such as removing duplicates or filtering. Next, we show how pandas can handle Excel data more conveniently.


```python
import pandas as pd

# 1) Read the data (header=1: the column names are in the second row)
data = pd.read_excel(root_path+'业务联系表.xlsx', header=1)
```


```python
# 2) Select and process the data
# 2.1) Customer information table
# Select the columns needed for the customer information table
cust_info_pd = data[['业务经理编号', '客户名称', '客户地址', '客户方负责人', '性别', '联系电话', '备注']]
# Remove duplicate rows
cust_info_pd.drop_duplicates(inplace=True)
# Show the first three rows
cust_info_pd.head(3)
```


|   | 业务经理编号 | 客户名称 | 客户地址 | 客户方负责人 | 性别 | 联系电话 | 备注 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 尹承望 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
| 1 | 1 | 何茂材 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
| 2 | 1 | 徐新霁 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
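
Deduplicating a column selection is easiest to reason about when the result is assigned to a new variable instead of being modified in place, because some pandas versions emit a `SettingWithCopyWarning` for in-place changes on a selection. The sketch below illustrates that pattern; the small `data` frame and the name `cust_demo` are made up for illustration and are not the contents of 业务联系表.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for the data read from 业务联系表.xlsx
data = pd.DataFrame({
    '业务经理编号': [1, 1, 2, 2],
    '客户名称': ['A', 'A', 'B', 'C'],
    '分区': ['南区', '南区', '南区', '南区'],
})

# Select the columns, then drop duplicates without inplace=True.
# drop_duplicates returns a new DataFrame, so `data` itself is left untouched.
cust_demo = data[['业务经理编号', '客户名称']].drop_duplicates()

print(len(data), len(cust_demo))
```

Chaining the two calls keeps the selection and the deduplication in one expression, which is one possible design choice rather than the only valid one.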



```python
# 2.2) Account manager information table
# Select the columns needed for the account manager information table
manager_info_pd = data[['业务经理编号', '分区', '区域', '业务经理']]
# Remove duplicate rows
manager_info_pd.drop_duplicates(inplace=True)
# Show the first three rows
manager_info_pd.head(3)
```


|   | 业务经理编号 | 分区 | 区域 | 业务经理 |
|---|---|---|---|---|
| 0 | 1 | 南区 | 贵州 | 占亮 |
| 5 | 2 | 南区 | 贵州 | 李朝华 |
| 11 | 3 | 北区 | 河北 | 王一磊 |
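
The index labels 0, 5, 11 in the output above are the original row labels of the first occurrence of each manager: `drop_duplicates` keeps the first matching row and does not renumber the index. A minimal sketch of this behavior, using a made-up `managers` frame (hypothetical rows, not the real workbook), with `reset_index(drop=True)` as one way to renumber:

```python
import pandas as pd

# Hypothetical rows: manager 1 appears twice, manager 2 three times
managers = pd.DataFrame({
    '业务经理编号': [1, 1, 2, 2, 2],
    '业务经理': ['占亮', '占亮', '李朝华', '李朝华', '李朝华'],
})

# The first occurrence of each duplicate group is kept, with its original label
deduped = managers.drop_duplicates()
print(list(deduped.index))

# Renumber the rows from 0 if the original labels are not needed
renumbered = deduped.reset_index(drop=True)
print(list(renumbered.index))
```

Since the tutorial writes the table with `index=None`, the gaps in the index do not reach the saved Excel file; renumbering only matters if the index is used later.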



```python
# 3) Save the data
cust_info_pd.to_excel(root_path+'客户信息表_pd.xlsx', index=None)
manager_info_pd.to_excel(root_path+'业务经理信息表_pd.xlsx', index=None)
```

### 2.4.2 Merge 客户信息表.xlsx and 客户关系表.xlsx into one Excel file


```python
# Continuing from above, merge 客户信息表.xlsx and 客户关系表.xlsx into one Excel file
# We still use pandas here
business_contact = pd.merge(manager_info_pd, cust_info_pd, on='业务经理编号')
# Show basic information about the merged data
business_contact.info()
```

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 55 entries, 0 to 54
    Data columns (total 10 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   业务经理编号  55 non-null     int64
     1   分区      55 non-null     object
     2   区域      55 non-null     object
     3   业务经理    55 non-null     object
     4   客户名称    55 non-null     object
     5   客户地址    55 non-null     object
     6   客户方负责人  55 non-null     object
     7   性别      55 non-null     object
     8   联系电话    55 non-null     object
     9   备注      0 non-null      float64
    dtypes: float64(1), int64(1), object(8)
    memory usage: 4.7+ KB



```python
# Show the first 10 rows
business_contact.head(10)
```


|   | 业务经理编号 | 分区 | 区域 | 业务经理 | 客户名称 | 客户地址 | 客户方负责人 | 性别 | 联系电话 | 备注 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 南区 | 贵州 | 占亮 | 尹承望 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
| 1 | 1 | 南区 | 贵州 | 占亮 | 何茂材 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
| 2 | 1 | 南区 | 贵州 | 占亮 | 徐新霁 | `***-*-**` | 孙康适 | 男 | `*--*` | NaN |
| 3 | 1 | 南区 | 贵州 | 占亮 | 郭承悦 | `***-*-**` | 邓翰翮 | 男 | `*--*` | NaN |
| 4 | 1 | 南区 | 贵州 | 占亮 | 梁浩思 | `***-*-**` | 邓翰翮 | 男 | `*--*` | NaN |
| 5 | 2 | 南区 | 贵州 | 李朝华 | 毛英朗 | `***-*-**` | 邓翰翮 | 男 | `*--*` | NaN |
| 6 | 2 | 南区 | 贵州 | 李朝华 | 侯俊美 | `***-*-**` | 任敏智 | 女 | `*--*` | NaN |
| 7 | 2 | 南区 | 贵州 | 李朝华 | 许高轩 | `***-*-**` | 任敏智 | 女 | `*--*` | NaN |
| 8 | 2 | 南区 | 贵州 | 李朝华 | 段英豪 | `***-*-**` | 任敏智 | 女 | `*--*` | NaN |
| 9 | 2 | 南区 | 贵州 | 李朝华 | 汤承福 | `***-*-**` | 任敏智 | 女 | `*--*` | NaN |
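
The merge above attaches each manager's details to every customer row that shares the same 业务经理编号. As a rough sketch of that join, the example below uses two tiny hand-written frames (illustrative values only) and the optional `validate` argument of `pd.merge`, which raises an error if the manager keys on the left side are not unique.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
managers = pd.DataFrame({
    '业务经理编号': [1, 2],
    '业务经理': ['占亮', '李朝华'],
})
customers = pd.DataFrame({
    '业务经理编号': [1, 1, 2],
    '客户名称': ['尹承望', '何茂材', '毛英朗'],
})

# Default inner join on the shared key; validate='one_to_many' checks that
# each manager id appears only once in the left table
merged = pd.merge(managers, customers, on='业务经理编号', validate='one_to_many')

print(merged.shape)
```

Adding `validate` is an assumption about what is useful here, not something the original exercise requires; it simply catches duplicated manager rows before they multiply customer rows.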



```python
# Save the merged data
business_contact.to_excel(root_path+'业务联系表_pd.xlsx', index=None)
```

## 2.5 Afterword

- There is a lot to Python and Excel automation. This chapter focuses on the basics and is meant as a starting point for further study.
- The later part also introduced examples of processing Excel with pandas. They are fairly simple; in real work and study, pick whichever framework suits your needs.

--------------------------------------------------------------------------------
/Task02-Python与Excel/Python_Excel_OpenPyXL.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/office-automation/cabf0171cefe33aa53c547cc40e9910f752dbf06/Task02-Python与Excel/Python_Excel_OpenPyXL.pdf
--------------------------------------------------------------------------------
/Task02-Python与Excel/Python_Excel_XLWings.md:
--------------------------------------------------------------------------------
# Task 02 Python Excel Automation with XLWings


- [Task 02 Python Excel Automation with XLWings](#task-02-python-excel-自动化之-xlwings)
  - [2.0 Module introduction and basic usage](#20-模块基本介绍与使用)
  - [2.1 xlwings in practice](#21-xlwings模块实战)
    - [2.1.1 Basic syntax at a glance](#211-基础语法一览)
    - [2.1.2 Setting cell styles](#212-单元格样式设置)
    - [2.1.3 Generating charts or inserting pictures in Excel](#213-excel中生成统计图或者插入图片)
  - [2.2 Hands-on exercises](#22-实战练习)
    - [2.2.1 Visualize consumption data as a trend chart with an average line and store it in Excel](#221-将消费数据可视化生成带平均线的趋势图存入excel)
    - [2.2.2 Store stock data in Excel in a specified format and generate a stock trend chart](#222-将股票数据以指定的格式存储到excel并生成股票走势图)

## 2.0 Module introduction and basic usage

**xlwings**

Introduction: xlwings is used for interaction between Python and Excel. It makes it easy to call Python from Excel, and it can also automate Excel from Python and call VBA.

Project page: https://github.com/xlwings/xlwings

![xlwings-principle](./imgs/Python_Excel_XLWings/xlwings-principle.png)
Basic usage: create a new Excel file named xlwings_wb.xlsx, add a new sheet named first_sht, and write the string `Datawhale` into its cell A1.

Open CMD/Terminal, activate your environment, and run the following command to install the xlwings module.
```bash
pip3 install xlwings
```


```python
root_path = './XLWings_test/'
```


```python
# 
导入xlwings，并起一个别名 xw，方便操作 38 | import xlwings as xw 39 | 40 | # 1、创建一个app应用，打开Excel程序 41 | # visible=True 表示打开操作Excel过程可见初次接触可以设置为True，了解其过程 42 | # add_book=False 表示启动app后不用新建个工作簿 43 | app = xw.App(visible=True, add_book=False) 44 | 45 | # 2、新建一个工作簿 46 | wb = app.books.add() 47 | 48 | # 3、新建一个sheet，并操作 49 | # 3.1 新建sheet 起名为first_sht 50 | sht = wb.sheets.add('first_sht') 51 | # 3.2 在新建的sheet表中A1位置插入一个值：Datawhale 52 | sht.range('A1').value = 'Datawhale' 53 | # 3.3 保存新建的工作簿，并起一个名字 54 | wb.save(root_path+'xlwings_wb.xlsx') 55 | 56 | # 4、关闭工作簿 57 | wb.close() 58 | 59 | # 5、程序运行结束，退出Excel程序 60 | app.quit() 61 | ``` 62 | 63 | 通过简单五步，我们就可以完成新建一个excel，并向其中指定sheet中的指定位置输入值了。 64 | ![code-result](./imgs/Python_Excel_XLWings/code-result.png) 65 | 66 | ## 2.1 xlwings模块实战 67 | 68 | ### 2.1.1 基础语法一览 69 | 70 | - 导包 71 | 72 | 73 | ```python 74 | # 基础导入包 75 | import xlwings as xw # 程序第一步 76 | ``` 77 | 78 | - 打开关闭Excel程序（理解成excel软件打开、关闭） 79 | 80 | 81 | ```python 82 | # visible=True 表示打开操作Excel过程可见初次接触可以设置为True，了解其过程 83 | # add_book=False 表示启动app后不用新建个工作簿 84 | app = xw.App(visible=True, add_book=False) # 程序第二步 85 | 86 | # 关闭excel程序 87 | # app.quit() # 程序最后一步 88 | ``` 89 | 90 | - 工作簿相关操作（理解成excel文件） 91 | 92 | 93 | ```python 94 | # 1、新建一个工作簿 95 | # wb = app.books.add() # 程序第三步 96 | 97 | # 2、保存新建的工作簿，并起一个名字 98 | # 程序倒数第三步，非常关键，保存操作数据结果 99 | # wb.save(root_path+'xlwings_wb.xlsx') 100 | 101 | 102 | # 3、打开一个已经存在的工作簿 103 | wb = app.books.open(root_path+'xlwings_wb.xlsx') # 程序第三步 104 | 105 | # 4、关闭工作簿 106 | # wb.close() # 程序倒数第二步 107 | ``` 108 | 109 | - sheet相关操作（理解成工作表） 110 | 111 | 112 | ```python 113 | # 在工作簿中新建一个sheet，起名为 second_sht 114 | sht1 = wb.sheets.add('second_sht') 115 | print('sht1:', sht1) 116 | 117 | # 选中已经存在的sheet 118 | sht2 = wb.sheets('first_sht') 119 | print('sht2:', sht2) 120 | 121 | # 也可以通过索引选择已存在的sheet 122 | sht3 = wb.sheets[0] # 选中工作簿中的第一个sheet 123 | print('sht3:', sht3) 124 | 125 | # 获取工作簿中工作表的个数 126 | sht_nums = wb.sheets.count 127 | print('工作簿中的sheet个数为：%d'% sht_nums) 
128 | 129 | # 当前工作表名字 130 | print('sht1.name:', sht1.name) 131 | 132 | # 获取指定sheet中数据的行数 133 | print('sht1.used_range.last_cell.row:', sht1.used_range.last_cell.row) 134 | 135 | # 获取指定sheet中数据的列数 136 | print('sht1.used_range.last_cell.column:', sht1.used_range.last_cell.column) 137 | 138 | # 删除指定的sheet 比如删除：first_sht 139 | wb.sheets('first_sht').delete() 140 | ``` 141 | 142 | sht1: 143 | sht2: 144 | sht3: 145 | 工作簿中的sheet个数为：3 146 | sht1.name: second_sht 147 | sht1.used_range.last_cell.row: 1 148 | sht1.used_range.last_cell.column: 1 149 | 150 | 151 | - 单元格相关操作（就是excel单元格子） 152 | 153 | 154 | ```python 155 | ''' 156 | 写入 157 | ''' 158 | # 在工作表中指定位置插入数据 159 | sht1.range('B1').value = 'Datawhale' 160 | 161 | # 在工作表指定位置插入多个数据默认是横向插入 162 | sht1.range('B2').value = ['DATAWHALE', 'FOR', 'THE', 'LEARNER'] 163 | 164 | # 在工作表指定位置竖向插入多个数据 165 | # 设置 options(transpose=True)，transpose=True 表示转置的意思 166 | sht1.range('B3').options(transpose=True).value = [1, 2, 3, 4] 167 | 168 | # 在工作表指定位置开始插入多行数据 169 | sht1.range('B7').value = [['a', 'b'], ['c', 'd']] 170 | 171 | # 在工作表指定位置开始插入多列数据 172 | sht1.range('B9').options(transpose=True).value = [['a', 'b'], ['c', 'd']] 173 | 174 | # 向单元格写入公式 175 | sht1.range('F2').formula = '=sum(B2:E2)' 176 | ``` 177 | 178 | 运行结果： 179 | ![xlwings-write](./imgs/Python_Excel_XLWings/xlwings-write.png) 180 | 181 | 182 | ```python 183 | ''' 184 | 读取 185 | ''' 186 | # 在工作表中读取指定位置数据 187 | print('单元格B1=', sht1.range('B1').value) 188 | 189 | # 在工作表中读取指定区域数据一行 190 | print('单元格B2:F2=', sht1.range('B2:F2').value) 191 | 192 | # 在工作表中读取指定区域数据一列 193 | print('单元格B3:B6=', sht1.range('B3:B6').value) 194 | 195 | # 在工作表中读取指定区域数据一个区域 196 | # 设置options(transpose=True)就可以按列读不设置就是按行读 197 | print('单元格B7:C10=', sht1.range('B7:C10').options(transpose=True).value) 198 | ``` 199 | 200 | 单元格B1= Datawhale 201 | 单元格B2:F2= ['DATAWHALE', 'FOR', 'THE', 'LEARNER', 0.0] 202 | 单元格B3:B6= [1.0, 2.0, 3.0, 4.0] 203 | 单元格B7:C10= [['a', 'c', 'a', 'b'], ['b', 'd', 'c', 'd']] 204 | 205 | 206 | 207 
| ```python 208 | ''' 209 | 删除 210 | ''' 211 | # 删除指定单元格中的数据 212 | sht1.range('B10').clear() 213 | 214 | # 删除指定范围内单元格数据 215 | sht1.range('B7:B9').clear() 216 | ``` 217 | 218 | ### 2.1.2 单元格样式设置 219 | 220 | 221 | ```python 222 | ''' 223 | 格式修改 224 | ''' 225 | # 选中已经存在的sheet 226 | sht1 = wb.sheets('second_sht') 227 | # 返回单元格绝对路径 228 | sht1.range('B3').get_address() 229 | # sht1.range('B3').address 230 | 231 | # 合并单元格B3 C3 232 | sht1.range('B3:C3').api.merge() 233 | 234 | # 解除合并单元格B3 C3 235 | # sht1.range('B3:C3').api.unmerge() 236 | 237 | # 向指定单元格添加带超链接文本 238 | # address- 超连接地址 239 | # text_to_display- 超链接文本内容 240 | # screen_tip- 鼠标放到超链接上后显示提示内容 241 | sht1.range('C2').add_hyperlink(address='https://datawhale.club', 242 | text_to_display='DATAWHALE 官网', 243 | screen_tip='点击查看 DATAWHALE 官网 ') 244 | 245 | # 获取指定单元格的超链接地址 246 | sht1.range('C2').hyperlink 247 | 248 | # 自动调试指定单元格高度和宽度 249 | sht1.range('B1').autofit() 250 | 251 | # 设置指定单元格背景颜色 252 | sht1.range('B1').color = (93,199,221) 253 | 254 | # 返回指定范围内的中第一列的编号数字，如：A-1 B-2 255 | sht1.range('A2:B2').column 256 | 257 | # 获取或者设置行高/列宽 258 | # row_height/column_width会返回行高/列宽，范围内行高/列宽不一致会返回None 259 | # 也可以设置一个新的行高/列宽 260 | sht1.range('A2').row_height = 25 261 | sht1.range('B2').column_width = 20 262 | ``` 263 | 264 | 运行结果： 265 | ![xlwings-format](./imgs/Python_Excel_XLWings/xlwings-format.png) 266 | 267 | - 在windows上可以使用以下方法设置单元格文字颜色等格式，如下： 268 | 269 | ```python 270 | # windows系统下字体设置在 sheet.range().api.Font下 271 | # 颜色 272 | sht1.range('A1').api.Font.Color = (255,0,124) 273 | # 字体名字 274 | sht1.range('A1').api.Font.Name = '宋体' 275 | # 字体大小 276 | sht1.range('A1').api.Font.Size = 28 277 | # 是否加粗 278 | sht1.range('A1').api.Font.Bold = True 279 | # 数字格式 280 | sht1.range('A1').api.NumberFormat = '0.0' 281 | # -4108 水平居中 282 | # -4131 靠左 283 | # -4152 靠右 284 | sht1.range('A1').api.HorizontalAlignment = -4108 285 | # -4108 垂直居中（默认) 286 | # -4160 靠上 287 | # -4107 靠下 288 | # -4130 自动换行对齐。 289 | sht1.range('A1').api.VerticalAlignment 
= -4130 290 | # 设置上边框线风格和粗细 291 | sht1.range('A1').api.Borders(8).LineStyle = 5 292 | sht1.range('A1').api.Borders(8).Weight = 3 293 | ``` 294 | 295 | - 在mac下可以通过以下方法设置字体格式 296 | 297 | ```python 298 | # 在mac下可以通过以下方法设置字体格式 299 | # 设置单元格的字体颜色 300 | rgb_tuple = (0, 10, 200) 301 | sht1.range('B1').api.font_object.color.set(rgb_tuple) 302 | 303 | # 获取指定单元格字体颜色 304 | sht1.range('B1').api.font_object.color.get() 305 | 306 | # 获取指定单元格字体名字可以使用set方法修改字体 set('宋体') 307 | sht1.range('B1').api.font_object.name.get() 308 | 309 | # 设置指定单元格字体格式可以用get方法查看单元格字体格式 310 | sht1.range('B3').api.font_object.font_style.set('加粗') 311 | 312 | # 设置指定单元格字体大小 313 | sht1.range('B3').api.font_object.font_size.set(20) 314 | 315 | # 设置边框线粗细 316 | sht1.range('B2').api.get_border(which_border=9).weight.set(4) 317 | 318 | # 设置边框线风格 319 | sht1.range('B2').api.get_border(which_border=9).line_style.set(8) 320 | ``` 321 | 322 | 样式值含义基本说明： 323 | 324 | ![xlwings-border](./imgs/Python_Excel_XLWings/xlwings-border.png) 325 | 326 | ![xlwings-line](./imgs/Python_Excel_XLWings/xlwings-line.png) 327 | 328 | 再次提醒，进行完所有操作后一定要记得执行以下三句： 329 | 330 | 331 | ```python 332 | # 保存新建的工作簿，并起一个名字（如果已存在有名字的excel文件，就直接save即可） 333 | wb.save() 334 | # 关闭工作簿（关闭Excel文件） 335 | wb.close() 336 | # 程序运行结束，退出Excel程序 337 | app.quit() 338 | ``` 339 | 340 | ### 2.1.3 Excel中生成统计图或者插入图片 341 | 342 | - 自动生成统计图 343 | 344 | 345 | ```python 346 | import xlwings as xw 347 | 348 | # 新建一个sheet 349 | app = xw.App(visible=True, add_book=False) 350 | wb = app.books.open(root_path+'xlwings_wb.xlsx') 351 | sht3 = wb.sheets.add('third_sht') 352 | 353 | import pandas as pd 354 | import numpy as np 355 | 356 | # 生成模拟数据 357 | df = pd.DataFrame({ 358 | 'money':np.random.randint(45, 50, size = [1, 20])[0], 359 | }, 360 | index=pd.date_range('2021-02-01', '2021-02-20'), # 行索引和时间相关 361 | ) 362 | df.index.name = '消费日期' # 设置索引名字 363 | 364 | sht3.range('A1').value = df 365 | 366 | # 生成图表 367 | chart1 = sht3.charts.add() # 创建一个图表对象 368 | 
chart1.set_source_data(sht3.range('A1').expand()) # 加载数据 369 | chart1.chart_type = 'line' # 设置图标类型 370 | chart1.top = sht3.range('D2').top 371 | chart1.left = sht3.range('D2').left # 设置图标开始位置 372 | ``` 373 | 374 | 运行结果： 375 | ![xlwings-charts](./imgs/Python_Excel_XLWings/xlwings-charts.png) 376 | 377 | 除了绘制折线图，我们还可以绘制其他类型的图，修改`chart_type`值即可。 378 | ```python 379 | # 查看其他chart_types值 380 | xw.constants.chart_types 381 | ``` 382 | 383 | 返回结果很长，这里选几个常见的图形列出来： 384 | ``` 385 | '3d_line', # 3D折线图 386 | '3d_pie', # 3D饼图 387 | 'area', # 面积图 388 | 'bar_clustered', # 柱状图相关 389 | 'bubble', # 气泡图 390 | 'column_clustered', # 条形图相关 391 | 'line', # 折线图 392 | 'stock_hlc', # 有意思股票K线图 393 | ``` 394 | 395 | - 将本地图片或者matplotlib图片保存到excel 396 | 397 | 398 | ```python 399 | ''' 400 | matplotlib 生成的图片存入excel 401 | ''' 402 | import matplotlib.pyplot as plt 403 | # 随便绘制一个饼图 404 | fig1 = plt.figure() # 先创建一个图像对象 405 | plt.pie([0.5, 0.3, 0.2], # 值 406 | labels=['a', 'b', 'c'], # 标签 407 | explode=(0, 0.2, 0), # （爆裂）距离 408 | autopct='%1.1f%%', # 显示百分数格式 409 | shadow=True) # 是否显示阴影 410 | plt.show() 411 | 412 | # 将饼图添加到excel指定位置 J17为图片开始位置 413 | sht3.pictures.add(fig1, name='matplotlib', 414 | left=sht3.range('J17').left, 415 | top=sht3.range('J17').top, update=True) 416 | ``` 417 | 418 | 419 | ![png](./imgs/output_30_0.png) 420 | 421 |
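The `autopct='%1.1f%%'` argument in the pie chart above can look cryptic. As a rough illustration in plain Python, and assuming the usual behavior where each wedge's share of the total (in percent) is substituted into the format string, the labels can be reproduced as follows. The helper name `autopct_labels` is made up for this sketch and is not part of matplotlib.

```python
def autopct_labels(values, fmt="%1.1f%%"):
    """Return the percentage labels a pie chart would show for `values`.

    Hypothetical helper for illustration only; it mimics how an autopct
    format string is applied to each wedge's share of the total.
    """
    total = sum(values)
    # Each value's share of the whole, in percent, goes through the format string.
    # "%1.1f" keeps one decimal place and "%%" is a literal percent sign.
    return [fmt % (100.0 * v / total) for v in values]


# Same values as the pie chart above
labels = autopct_labels([0.5, 0.3, 0.2])
print(labels)
```

Changing the format string to, say, `'%d%%'` would drop the decimal place in the same way.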

| | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 1990-12-19 | 96.050 | 99.980 | 95.790 | 99.980 | 126000 |
| 1 | 1990-12-20 | 104.300 | 104.390 | 99.980 | 104.390 | 19700 |
| 2 | 1990-12-21 | 109.070 | 109.130 | 103.730 | 109.130 | 2800 |
| 3 | 1990-12-24 | 113.570 | 114.550 | 109.130 | 114.550 | 3200 |
| 4 | 1990-12-25 | 120.090 | 120.250 | 114.550 | 120.250 | 1500 |
| ... | ... | ... | ... | ... | ... | ... |
| 7686 | 2022-06-08 | 3245.017 | 3266.630 | 3216.015 | 3263.793 | 43418327000 |
| 7687 | 2022-06-09 | 3259.490 | 3270.557 | 3223.475 | 3238.954 | 42272837200 |
| 7688 | 2022-06-10 | 3214.185 | 3286.620 | 3210.808 | 3284.834 | 43986573000 |
| 7689 | 2022-06-13 | 3256.275 | 3272.991 | 3229.309 | 3255.551 | 43857831200 |
| 7690 | 2022-06-14 | 3224.214 | 3289.134 | 3195.819 | 3288.907 | 45038818700 |
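Before rows like these are written to Excel, it can help to convert them into typed records. The following is a minimal sketch using only the standard library. It assumes the data is available as tab-separated text with the same column names as the table above; the `raw` string simply repeats the first three rows of that table, and `parse_rows` is a hypothetical helper name.

```python
import csv
import io

# First three rows of the table above as tab-separated text (assumed input format)
raw = (
    "date\topen\thigh\tlow\tclose\tvolume\n"
    "1990-12-19\t96.050\t99.980\t95.790\t99.980\t126000\n"
    "1990-12-20\t104.300\t104.390\t99.980\t104.390\t19700\n"
    "1990-12-21\t109.070\t109.130\t103.730\t109.130\t2800\n"
)


def parse_rows(text):
    """Parse tab-separated quote rows into dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "date": rec["date"],             # kept as an ISO date string
            "open": float(rec["open"]),
            "high": float(rec["high"]),
            "low": float(rec["low"]),
            "close": float(rec["close"]),
            "volume": int(rec["volume"]),
        })
    return rows


rows = parse_rows(raw)

# Daily trading range (high minus low), rounded to three decimals
ranges = [round(r["high"] - r["low"], 3) for r in rows]
print(ranges)
```

A list of lists built from `rows` could then be assigned to a range's `value`, in the same way as the multi-row write shown in section 2.1.1.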

自如友家·电建地产盛世江城·4居室-05卧