├── README.md
├── chap 3.pdf
├── chap 3.txt
├── pdf_translate.py
└── show
    ├── pdf_picture1.png
    └── txt_1_picture.png


/README.md:
--------------------------------------------------------------------------------
# PDF_translate
#### Translates English PDF files by calling a third-party library and online translation APIs from Python

### Screenshots:
#### File to translate:
![pdf_picture1](https://github.com/GDUT-Rp/PDF_translate/raw/master/show/pdf_picture1.png)

#### Translation saved to a txt file
![txt_picture1](https://github.com/GDUT-Rp/PDF_translate/raw/master/show/txt_1_picture.png)

#### Required third-party library
##### pdfminer3k 1.3.1

##### pip install pdfminer3k
##### Install it directly with pip, or download and unpack the package: https://pypi.org/project/pdfminer/

--------------------------------------------------------------------------------
/chap 3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GDUT-Rp/PDF_translate/5e2bb94bf74a49a3ba158fd4e4ca1a9eaef7f05b/chap 3.pdf

--------------------------------------------------------------------------------
/chap 3.txt:
--------------------------------------------------------------------------------
Chap 3 Lipid
第3章脂质
The early 21st century has seen the
21世纪初的
development of a global epidemic of
一个全球流行的的发展
obesity, which is caused by a number
肥胖,这是由一个数字
of factors including willpower,
的因素包括意志力,
lifestyle and genetics. The world
世界
organization estimates that there are
组织估计
over 300 million clinically obese
超过3亿个临床肥胖
adults, with 700 million more
成年人,7亿多
described as overweight.
描述为超重。
Severe obesity can make you dead, it
can make you sick, it can make you
sad, it can make you alone, it can
make you poor.
严重肥胖会让你死了,它
可以使你生病,它能让你吗
难过的时候,它可以让你孤独,
让你贫穷。
• Lipids are a diverse group of biomolecules.
Lipids
脂质
are defined as substances from living organisms that
被定义为物质生命体
dissolve in nonpolar solvents such as ether (乙醚),
溶于非极性溶剂,如醚(乙醚),
chloroform (氯仿), and acetone but not appreciably
丙酮氯仿(氯仿),但不明显
in water.
在水里。
• Lipids are ester compounds composed of fatty acids
•脂质脂肪酸组成的是酯化合物
and alcohols.
和醇。
Main contents
主要内容
• Biological roles that lipid has
•生物角色,脂质
• Structure and properties of fatty acids(脂肪酸)
• Structure and the properties of fatty acids (fatty acid)
• Classification of lipid
•脂类的分类
• Structure of triglyceride/fat(甘油三酯/脂肪)
• Structure of triglyceride/fat (triglycerides/fat)
Section 1 Biological roles
第一节生物角色
• Energy stores : 9000 cal/g fat; 4100 cal/g protein;
4200 cal/g monosaccharide
4100大卡/ g蛋白;
4200大卡/ g单糖
• To supply indispensable fatty acids
•提供不可或缺的脂肪酸
• Structural components of biological membranes
•结构组成的生物膜
• To facilitate the absorbance of liposoluble vitamins
(脂溶性维生素) by human bodies
• Extracellular and intracellular messengers
•促进脂溶的维生素的吸收
(脂溶性维生素)
•细胞外和细胞内信使
• Transporter
•运输
• Hormones
•激素
Why lipids are used for storage of energy?
为什么油脂用于存储的能量?
• The carbon in lipids (mostly CH2)
is almost
completely reduced (so its oxidation yields the
most energy possible).
• Lipids are not hydrated (as mono- and polysaccha
rides are not), so they can pack more closely in sto
rage tissues
•碳在脂质(主要是CH2)
几乎是
完全(所以它氧化收益率降低
大多数能源可能)。
•脂质不水化(mono -和polysaccha
骑没有),所以他们可以在中途包更密切
愤怒的组织
Result:
lipids have ~6 more energy of
corresponding amount of proteins or glycogen
结果:
脂质~ 6更多的能量
相应数量的蛋白质或糖原
the
的
体内脂肪含量,男性多还
是女性多?
Body fat content, men also
Are women more?
人类身体脂肪有保温功
能么?
The human body fat have heat preservation function
Can it?
为什么女生比男生怕冷?
Why do girls afraid of the cold than boys?
Section 2 Classification of lipid
第二节脂类的分类
Lipids may be classified in many different ways. Lipids can
be subdivided into the following classes according to the che
mical compositions:
脂质可以
被细分为以下类根据切
米卡尔成分:
Simple lipid: fatty acids
and glycerol
简单的脂质:脂肪酸
和甘油
Triacylglycerols; Wax ester
蜡酯
compound lipid
复合脂质
Phospholipids; Sphingolipids;
Sphingolipids;
lipoprotein; glycoprotein
糖蛋白
derived lipid
衍生脂质
Isoprenoids; steroids
类固醇
Section 3 Triacylglycerols
第三节甘油三酯
• Triacylglycerols are esters of glycerol with three fatty
甘油三酯是甘油三脂肪酸酯
acid molecules.
酸分子。
• Most triacylglycerol molecules contain fatty acids of
大多数三酰甘油分子含有脂肪酸
varying lengths; the fatty acids may be unsaturated,
可以不饱和脂肪酸,
saturated, or combination.
饱和,或组合。
• Depending on their fatty acid compositions,
取决于他们的脂肪酸组成,
triacylglycerol mixtures are referred to as fats or oils.
三酰甘油混合物被称为脂肪或油。
Simple triacylglycerol (单纯甘油酯) : identical fatty acids
Mixed triacylglycerol (混合甘油酯): different fatty acids
Simple triacylglycerol (pure glyceride) : identical fatty acids
Mixed triacylglycerol (Mixed glyceride) : the company fatty acids
Fatty acids
脂肪酸
• Fatty acids are monocarboxylic acids (单羧酸) that
typically contain hydrocarbon chains (烃链) of variable
lengths (between 12 and 20 or more carbons).
• Basic formula:CH3(CH2)nCOOH
• Fatty acids are numbered from the carboxylate end.
• Greek letters are used to designate certain carbon
脂肪酸是一元羧酸(单羧酸)
通常包含烃链(烃链)的变量
长度(12到20或更多个碳)。
基本公式:CH3(CH2)nCOOH
脂肪酸屈指可数的羧酸盐。
希腊字母是用来指定特定的碳
atoms.
原子。
• Saturated or unsaturated . Fatty acid chains that
脂肪酸链,
contain only carbon-carbon single bonds are referred
to as saturated(饱和脂肪酸). Those molecules that
contain one or more double bonds are said to be
只含有碳碳单键被称为
这些分子,
包含一个或多个双键说
unsaturated (mono- and polyunsaturated fatty acids).
不饱和(单链不饱和脂肪和多不饱和脂肪酸)。
• Most naturally occurring fatty acids have an even
•大多数天然脂肪酸有偶数
number of carbon atoms that form an unbranched
形成一个无支链的碳原子数
chain
链
• The properties of a fatty acid depend on the chain
•脂肪酸的性质取决于链
length and the number of double bonds.
长度和双键的数目。

--------------------------------------------------------------------------------
/pdf_translate.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.layout import LAParams, LTTextBoxHorizontal
from pdfminer.converter import PDFPageAggregator
from pdfminer.pdfinterp import PDFTextExtractionNotAllowed

import urllib.request
import urllib.parse
import json
import os


def get_pdf(filePath):
    '''Read the PDF's text, translate it, and write the result to a txt file.'''

    # Open the local PDF file in binary read mode
    fp = open(filePath, 'rb')
    # Create a PDF parser from the file object
    parser_pdf = PDFParser(fp)
    # Create a PDF document object
    doc_pdf = PDFDocument()
    # Connect the parser and the document object to each other
    parser_pdf.set_document(doc_pdf)
    doc_pdf.set_parser(parser_pdf)
    # Supply the initial password, e.g. doc.initialize("123456"); if there is no password, an empty string is used
    doc_pdf.initialize()

    # Check whether the document allows text extraction; if it does not, it cannot be translated
    if not doc_pdf.is_extractable:
        # Logger().write(self.fileName + ' no valid text could be extracted, translation stopped.')
        print("error1")
        return
    else:
        # Create a PDF resource manager to share resources
        rsrcmgr = PDFResourceManager()
        # Create the layout analysis parameters
        laparams = LAParams()
        # Create the page aggregator
        device = PDFPageAggregator(rsrcmgr, laparams=laparams)
        # Create a PDF page interpreter object
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        # Loop over the pages, processing one page at a time
        for page in doc_pdf.get_pages():
            # Read the page with the page interpreter
            interpreter.process_page(page)
            # Fetch the parsed result from the aggregator
            layout = device.get_result()

            # layout is an LTPage object holding the objects parsed from this page, typically
            # LTTextBox, LTFigure, LTImage, LTTextBoxHorizontal and so on; text comes from get_text().
            for out in layout:
                # Only text boxes have get_text(); images and similar objects do not
                if isinstance(out, LTTextBoxHorizontal):
                    content = out.get_text()
                    trans = youdao_translate(content)
                    write(content)
                    write(trans)
        # Logger().write(self.fileName + ' translation finished, new document: ' + self.new_fullPath)
        # Reached after all pages are processed, so report completion rather than an error
        print('Translation finished.')


def youdao_translate(content):
    '''Call the Youdao translation web endpoint.'''
    youdao_url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    data = {}

    data['i'] = content
    data['from'] = 'AUTO'
    data['to'] = 'AUTO'
    data['smartresult'] = 'dict'
    data['client'] = 'fanyideskweb'
    data['salt'] = '1525141473246'
    data['sign'] = '47ee728a4465ef98ac06510bf67f3023'
    data['doctype'] = 'json'
    data['version'] = '2.1'
    data['keyfrom'] = 'fanyi.web'
    data['action'] = 'FY_BY_CLICKBUTTION'
    data['typoResult'] = 'false'
    data = urllib.parse.urlencode(data).encode('utf-8')

    youdao_response = urllib.request.urlopen(youdao_url, data)
    youdao_html = youdao_response.read().decode('utf-8')
    target = json.loads(youdao_html)

    trans = target['translateResult']
    ret = ''
    for i in range(len(trans)):
        line = ''
        for j in range(len(trans[i])):
            # Append every segment of the line; plain assignment kept only the last one
            line += trans[i][j]['tgt']
        ret += line + '\n'

    return ret


def baidu_translate(content, type=1):
    '''Call the Baidu translation web endpoint.'''
    baidu_url = 'http://fanyi.baidu.com/basetrans'
    data = {}

    data['from'] = 'en'
    data['to'] = 'zh'
    data['query'] = content
    data['transtype'] = 'translang'
    data['simple_means_flag'] = '3'
    data['sign'] = '94582.365127'
    data['token'] = 'ec980ef090b173ebdff2eea5ffd9a778'
    data = urllib.parse.urlencode(data).encode('utf-8')

    headers = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 6 Build/LYZ28E) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Mobile Safari/537.36"}
    baidu_re = urllib.request.Request(baidu_url, data, headers)
    baidu_response = urllib.request.urlopen(baidu_re)
    baidu_html = baidu_response.read().decode('utf-8')
    target2 = json.loads(baidu_html)

    trans = target2['trans']
    ret = ''
    for i in range(len(trans)):
        ret += trans[i]['dst'] + '\n'

    return ret


def write(content):
    '''Append text to the output txt file.'''
    with open(r'C:\Users\Lenovo\Desktop\chap 3.txt', 'a+', encoding='utf-8') as f:
        # Store the text; the with block closes the file on exit
        f.write(content)
    print('Write complete!')


if __name__ == '__main__':
    filename = r'C:\Users\Lenovo\Desktop\chap 3.pdf'
    get_pdf(filename)
    # To run this program, change the two hard-coded paths and install the required package.

--------------------------------------------------------------------------------
/show/pdf_picture1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GDUT-Rp/PDF_translate/5e2bb94bf74a49a3ba158fd4e4ca1a9eaef7f05b/show/pdf_picture1.png

--------------------------------------------------------------------------------
/show/txt_1_picture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GDUT-Rp/PDF_translate/5e2bb94bf74a49a3ba158fd4e4ca1a9eaef7f05b/show/txt_1_picture.png

--------------------------------------------------------------------------------
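
A minimal sketch, not part of the repository, of two pieces of `pdf_translate.py` that can be exercised without network access or a PDF. It assumes the response shape that `youdao_translate` already indexes as `trans[i][j]['tgt']` (a list of lines, each a list of segment dicts), and that the output file should sit next to the input PDF, as the two hard-coded paths in the script suggest. The names `join_translations` and `output_path_for` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path


def join_translations(translate_result):
    """Flatten a nested translation result into one text line per input line.

    Assumed shape (taken from how the script indexes the response):
    [[{'tgt': ...}, {'tgt': ...}], [{'tgt': ...}], ...]
    """
    lines = []
    for segments in translate_result:
        # Concatenate all segments that belong to the same input line
        lines.append(''.join(segment['tgt'] for segment in segments))
    # Terminate every line with a newline, as the script does
    return ''.join(line + '\n' for line in lines)


def output_path_for(pdf_path):
    """Hypothetical helper: derive the .txt output path from the PDF path."""
    return Path(pdf_path).with_suffix('.txt')
```

With these helpers, `join_translations([[{'tgt': 'A'}, {'tgt': 'B'}], [{'tgt': 'C'}]])` gives two lines, `AB` and `C`, and `output_path_for('chap 3.pdf')` points at `chap 3.txt` in the same directory, so only the input path would need to be changed before running.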