├── .gitignore ├── HACKING.markdown ├── LICENSE ├── MANIFEST.in ├── Makefile ├── README.rst ├── SERVING_SUGGESTIONS.markdown ├── __init__.py ├── docx.py ├── example-extracttext.py ├── example-makedocument.py ├── image1.png ├── screenshot.png ├── setup.py ├── template ├── _rels │ └── .rels ├── docProps │ └── thumbnail.jpeg └── word │ ├── fontTable.xml │ ├── numbering.xml │ ├── settings.xml │ ├── styles.xml │ └── theme │ └── theme1.xml ├── tests ├── image1.png └── test_docx.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | .coverage 2 | dist 3 | *.docx 4 | /*.egg-info/ 5 | MANIFEST 6 | *.pyc 7 | README.html 8 | _scratch 9 | template/word/media 10 | .tox 11 | -------------------------------------------------------------------------------- /HACKING.markdown: -------------------------------------------------------------------------------- 1 | Adding Features 2 | =============== 3 | 4 | # Recommended reading 5 | 6 | - The [LXML tutorial](http://codespeak.net/lxml/tutorial.html) covers the basics of XML etrees, which we create, append and insert to make XML documents. LXML also provides XPath, which we use to specify locations in the document. 7 | - If you're stuck, check out the [OpenXML specs and videos](http://openxmldeveloper.org). In particular, the [OpenXML ECMA spec] [] is well worth a read. 8 | - Learning about [XML namespaces](http://www.w3schools.com/XML/xml_namespaces.asp) 9 | - The [Namespaces section of Dive into Python](http://diveintopython3.org/xml.html) 10 | - Microsoft's [introduction to the Office (2007) Open XML File Formats](http://msdn.microsoft.com/en-us/library/aa338205.aspx) 11 | 12 | # How can I contribute? 13 | 14 | Fork the project on github, then send the main project a [pull request](http://github.com/guides/pull-requests). 
The project will then accept your pull (in most cases), which will show your changes as part of the changelog for the main project, along with your name and picture. 15 | 16 | # A note about namespaces and LXML 17 | 18 | LXML doesn't use namespace prefixes. It just uses the actual namespaces, and wants you to set a namespace on each tag. For example, rather than making an element with the 'w' namespace prefix, you'd make an element with the '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' prefix. 19 | 20 | To make this easier: 21 | 22 | - The most common namespace, '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' (prefix 'w') is automatically added by makeelement() 23 | - You can specify other namespaces with 'nsprefix', which maps the prefixes Word files use to the actual namespaces, eg: 24 | 25 |
makeelement('coreProperties',nsprefix='cp')
26 | 27 | will generate: 28 | 29 | 30 | 31 | which is the same as what Word generates: 32 | 33 | 34 | 35 | The namespace prefixes are different, but that's irrelevant as the namespaces themselves are the same. 36 | 37 | There's also a cool side effect - you can ignore setting 'xmlns' attributes that aren't used directly in the current element, since there's no need. Eg, you can make the equivalent of this from a Word file: 38 | 39 | 45 | 46 | 47 | With the following code: 48 | 49 | docprops = makeelement('coreProperties',nsprefix='cp') 50 | 51 | We only need to specify the 'cp' prefix because that's what this element uses. The other 'xmlns' attributes are used to specify the prefixes for child elements. We don't need to specify them here because each child element will have its namespace specified when we make that child. 52 | 53 | # Coding Style 54 | 55 | Basically just look at what's there. But if you need something more specific: 56 | 57 | - Functional - every function should take some inputs, return something, and not use any globals. 58 | - [Google Python Style Guide style](http://code.google.com/p/soc/wiki/PythonStyleGuide) 59 | 60 | # Unit Testing 61 | 62 | After adding code, open **tests/test_docx.py** and add a test that calls your function and checks its output. 63 | 64 | - Use **easy_install** to fetch the **nose** and **coverage** modules 65 | - Run 66 | 67 |
nosetests --with-coverage
68 | 69 | to run all the tests. They should all pass. 70 | 71 | # Tips 72 | 73 | ## If Word complains about files: 74 | 75 | First, determine whether Word can recover the files: 76 | - If Word cannot recover the file, you most likely have a problem with your zip file 77 | - If Word can recover the file, you most likely have a problem with your XML 78 | 79 | ### Common Zipfile issues 80 | 81 | - Ensure the same file isn't included twice in your zip archive. Zip supports this, Word doesn't. 82 | - Ensure that all media files have an entry for their file type in [Content_Types].xml 83 | - Ensure that files in the zip file have leading '/'s removed. 84 | 85 | ### Common XML issues 86 | 87 | - Ensure the _rels, docProps, word, etc directories are in the top level of your zip file. 88 | - Check your namespaces - on both the tags, and the attributes 89 | - Check capitalization of tag names 90 | - Ensure you're not missing any attributes 91 | - If images or other embedded content is shown with a large red X, your relationships file is missing data. 92 | 93 | #### One common debugging technique we've used before 94 | 95 | - Re-saving the document in Word will produce a fixed version of the file 96 | - Unzip it and grab the serialized XML out of the fixed file 97 | - Use etree.fromstring() to turn it into an element, and include that in your code. 
98 | - Check that a correct file is generated 99 | - Remove an element from your string-created etree (including both opening and closing tags) 100 | - Use element.append(makeelement()) to add that element to your tree 101 | - Open the doc in Word and see if it still works 102 | - Repeat the last three steps until you discover which element is causing the problem 103 | 104 | [OpenXML ECMA spec]: http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(DOCX).zip -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2009-2010 Mike MacCana 2 | 3 | Permission is hereby granted, free of charge, to any person 4 | obtaining a copy of this software and associated documentation 5 | files (the "Software"), to deal in the Software without 6 | restriction, including without limitation the rights to use, 7 | copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the 9 | Software is furnished to do so, subject to the following 10 | conditions: 11 | 12 | The above copyright notice and this permission notice shall be 13 | included in all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include template/* 2 | include template/_rels/* 3 | include template/docProps/* 4 | include template/word/* 5 | include template/word/theme/* 6 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | PYTHON = $(shell test -x bin/python && echo bin/python || echo `which python`) 2 | SETUP = $(PYTHON) ./setup.py 3 | 4 | .PHONY: clean help coverage register sdist upload 5 | 6 | help: 7 | @echo "Please use \`make ' where is one or more of" 8 | @echo " clean delete intermediate work product and start fresh" 9 | @echo " coverage run nosetests with coverage" 10 | @echo " readme update README.html from README.rst" 11 | @echo " register update metadata (README.rst) on PyPI" 12 | @echo " sdist generate a source distribution into dist/" 13 | @echo " upload upload distribution tarball to PyPI" 14 | 15 | clean: 16 | find . -type f -name \*.pyc -exec rm {} \; 17 | rm -rf dist .coverage .DS_Store MANIFEST 18 | 19 | coverage: 20 | nosetests --with-coverage --cover-package=docx --cover-erase 21 | 22 | readme: 23 | rst2html README.rst >README.html 24 | open README.html 25 | 26 | register: 27 | $(SETUP) register 28 | 29 | sdist: 30 | $(SETUP) sdist 31 | 32 | upload: 33 | $(SETUP) sdist upload 34 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | ########### 2 | This Project Has Moved! 3 | ########### 4 | 5 | **Python DocX is now part of Python OpenXML**. There's all kinds of new stuff, including Python 3 support, sister libraries for doing Excel files, and more. Check out the `current Python DocX GitHub `_ and the `current Python DocX docs `_. 
6 | 7 | Info below is kept for archival purposes. **Go use the new stuff!** 8 | 9 | Introduction 10 | ============ 11 | 12 | The docx module creates, reads and writes Microsoft Office Word 2007 docx 13 | files. 14 | 15 | These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by 16 | Microsoft. 17 | 18 | These documents can be opened in Microsoft Office 2007 / 2010, Microsoft Mac 19 | Office 2008, Google Docs, OpenOffice.org 3, and Apple iWork 08. 20 | 21 | They also `validate as well formed XML `_. 22 | 23 | The module was created when I was looking for a Python support for MS Word 24 | .docx files, but could only find various hacks involving COM automation, 25 | calling .Net or Java, or automating OpenOffice or MS Office. 26 | 27 | The docx module has the following features: 28 | 29 | Making documents 30 | ---------------- 31 | 32 | Features for making documents include: 33 | 34 | - Paragraphs 35 | - Bullets 36 | - Numbered lists 37 | - Document properties (author, company, etc) 38 | - Multiple levels of headings 39 | - Tables 40 | - Section and page breaks 41 | - Images 42 | 43 | .. image:: http://github.com/mikemaccana/python-docx/raw/master/screenshot.png 44 | 45 | 46 | Editing documents 47 | ----------------- 48 | 49 | Thanks to the awesomeness of the lxml module, we can: 50 | 51 | - Search and replace 52 | - Extract plain text of document 53 | - Add and delete items anywhere within the document 54 | - Change document properties 55 | - Run xpath queries against particular locations in the document - useful for 56 | retrieving data from user-completed templates. 57 | 58 | 59 | Getting started 60 | =============== 61 | 62 | Making and Modifying Documents 63 | ------------------------------ 64 | 65 | - Just `download python docx `_. 66 | - Use **pip** or **easy_install** to fetch the **lxml** and **PIL** modules. 67 | - Then run:: 68 | 69 | example-makedocument.py 70 | 71 | 72 | Congratulations, you just made and then modified a Word document! 
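For a rough idea of what "making and then modifying" a document means at the XML level, here is a minimal sketch. It deliberately does not import the docx module: it uses only the standard library's ``xml.etree.ElementTree``, and the helpers ``w``, ``make_paragraph`` and ``replace_text`` are hypothetical names invented for this illustration, not part of the docx API. The element layout (document, body, p, r, t in the 'w' namespace) follows what docx.py itself builds.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML namespace (the 'w' prefix in Word's own output).
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def w(tag):
    """Hypothetical helper: qualify *tag* with the 'w' namespace ({uri}tag)."""
    return '{%s}%s' % (W_NS, tag)


def make_paragraph(text):
    """Hypothetical helper: build a w:p element holding one run of text."""
    para = ET.Element(w('p'))
    run = ET.SubElement(para, w('r'))
    ET.SubElement(run, w('t')).text = text
    return para


def replace_text(document, pattern, replacement):
    """Hypothetical helper: regex-replace inside every w:t (text) element."""
    for element in document.iter(w('t')):
        if element.text:
            element.text = re.sub(pattern, replacement, element.text)
    return document


# Make a document: w:document > w:body > w:p > w:r > w:t
document = ET.Element(w('document'))
body = ET.SubElement(document, w('body'))
body.append(make_paragraph('Hello from Python'))
body.append(make_paragraph('Python makes this easy'))

# Modify it in place, then collect the text of every w:t element.
replace_text(document, 'Python', 'docx')
texts = [t.text for t in document.iter(w('t'))]
print(texts)
```

The real module goes further than this sketch: it also adds paragraph and run properties, and a usable .docx file needs the surrounding zip package (content types, relationships, properties) as well as the document tree.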
73 | 74 | 75 | Extracting Text from a Document 76 | ------------------------------- 77 | 78 | If you just want to extract the text from a Word file, run:: 79 | 80 | example-extracttext.py 'Some word file.docx' 'new file.txt' 81 | 82 | 83 | Ideas & To Do List 84 | ~~~~~~~~~~~~~~~~~~ 85 | 86 | - Further improvements to image handling 87 | - Document health checks 88 | - Egg 89 | - Markdown conversion support 90 | 91 | 92 | We love forks, changes and pull requests! 93 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 94 | 95 | - Check out `HACKING <HACKING.markdown>`_ to add your own changes! 96 | - Fork this project on github 97 | - Send a pull request via github and we'll add your changes! 98 | 99 | Want to talk? Need help? 100 | ~~~~~~~~~~~~~~~~~~~~~~~~ 101 | 102 | Email python-docx@googlegroups.com 103 | 104 | 105 | License 106 | ~~~~~~~ 107 | 108 | Licensed under the `MIT license `_ 109 | 110 | Short version: this code is copyrighted to me (Mike MacCana), I give you 111 | permission to do what you want with it except remove my name from the credits. 112 | See the LICENSE file for specific terms. 113 | -------------------------------------------------------------------------------- /SERVING_SUGGESTIONS.markdown: -------------------------------------------------------------------------------- 1 | Serving Suggestions 2 | =================== 3 | 4 | # Mashing docx with other modules 5 | 6 | This is a list of interesting things you could do with Python docx when mashed up with other modules. 7 | 8 | - [LinkedIn Python API](http://code.google.com/p/python-linkedin/) - Auto-build a Word doc whenever some old recruiting dude asks for one. 9 | - [Python Natural Language Toolkit](http://www.nltk.org/) - can analyse text and extract meaning. 10 | - [Lamson](http://lamsonproject.org/) - transparently parse or modify email attachments. 11 | 12 | Any other ideas? Doing something cool you want to tell the world about? 
python.docx@librelist.com -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/__init__.py -------------------------------------------------------------------------------- /docx.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | """ 4 | Open and modify Microsoft Word 2007 docx files (called 'OpenXML' and 5 | 'Office OpenXML' by Microsoft) 6 | 7 | Part of Python's docx module - http://github.com/mikemaccana/python-docx 8 | See LICENSE for licensing information. 9 | """ 10 | 11 | import os 12 | import re 13 | import time 14 | import shutil 15 | import zipfile 16 | 17 | from lxml import etree 18 | from os.path import abspath, basename, join 19 | 20 | try: 21 | from PIL import Image 22 | except ImportError: 23 | import Image 24 | 25 | try: 26 | from PIL.ExifTags import TAGS 27 | except ImportError: 28 | TAGS = {} 29 | 30 | from exceptions import PendingDeprecationWarning 31 | from warnings import warn 32 | 33 | import logging 34 | 35 | 36 | log = logging.getLogger(__name__) 37 | 38 | # Record template directory's location which is just 'template' for a docx 39 | # developer or 'site-packages/docx-template' if you have installed docx 40 | template_dir = join(os.path.dirname(__file__), 'docx-template') # installed 41 | if not os.path.isdir(template_dir): 42 | template_dir = join(os.path.dirname(__file__), 'template') # dev 43 | 44 | # All Word prefixes / namespace matches used in document.xml & core.xml. 45 | # LXML doesn't actually use prefixes (just the real namespace) , but these 46 | # make it easier to copy Word output more easily. 
47 | nsprefixes = { 48 | 'mo': 'http://schemas.microsoft.com/office/mac/office/2008/main', 49 | 'o': 'urn:schemas-microsoft-com:office:office', 50 | 've': 'http://schemas.openxmlformats.org/markup-compatibility/2006', 51 | # Text Content 52 | 'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main', 53 | 'w10': 'urn:schemas-microsoft-com:office:word', 54 | 'wne': 'http://schemas.microsoft.com/office/word/2006/wordml', 55 | # Drawing 56 | 'a': 'http://schemas.openxmlformats.org/drawingml/2006/main', 57 | 'm': 'http://schemas.openxmlformats.org/officeDocument/2006/math', 58 | 'mv': 'urn:schemas-microsoft-com:mac:vml', 59 | 'pic': 'http://schemas.openxmlformats.org/drawingml/2006/picture', 60 | 'v': 'urn:schemas-microsoft-com:vml', 61 | 'wp': ('http://schemas.openxmlformats.org/drawingml/2006/wordprocessing' 62 | 'Drawing'), 63 | # Properties (core and extended) 64 | 'cp': ('http://schemas.openxmlformats.org/package/2006/metadata/core-pr' 65 | 'operties'), 66 | 'dc': 'http://purl.org/dc/elements/1.1/', 67 | 'ep': ('http://schemas.openxmlformats.org/officeDocument/2006/extended-' 68 | 'properties'), 69 | 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 70 | # Content Types 71 | 'ct': 'http://schemas.openxmlformats.org/package/2006/content-types', 72 | # Package Relationships 73 | 'r': ('http://schemas.openxmlformats.org/officeDocument/2006/relationsh' 74 | 'ips'), 75 | 'pr': 'http://schemas.openxmlformats.org/package/2006/relationships', 76 | # Dublin Core document properties 77 | 'dcmitype': 'http://purl.org/dc/dcmitype/', 78 | 'dcterms': 'http://purl.org/dc/terms/'} 79 | 80 | 81 | def opendocx(file): 82 | '''Open a docx file, return a document XML tree''' 83 | mydoc = zipfile.ZipFile(file) 84 | xmlcontent = mydoc.read('word/document.xml') 85 | document = etree.fromstring(xmlcontent) 86 | return document 87 | 88 | 89 | def newdocument(): 90 | document = makeelement('document') 91 | document.append(makeelement('body')) 92 | return document 93 | 94 | 95 | 
def makeelement(tagname, tagtext=None, nsprefix='w', attributes=None, 96 | attrnsprefix=None): 97 | '''Create an element & return it''' 98 | # Deal with list of nsprefix by making namespacemap 99 | namespacemap = None 100 | if isinstance(nsprefix, list): 101 | namespacemap = {} 102 | for prefix in nsprefix: 103 | namespacemap[prefix] = nsprefixes[prefix] 104 | # FIXME: rest of code below expects a single prefix 105 | nsprefix = nsprefix[0] 106 | if nsprefix: 107 | namespace = '{%s}' % nsprefixes[nsprefix] 108 | else: 109 | # For when namespace = None 110 | namespace = '' 111 | newelement = etree.Element(namespace+tagname, nsmap=namespacemap) 112 | # Add attributes with namespaces 113 | if attributes: 114 | # If they haven't bothered setting attribute namespace, use an empty 115 | # string (equivalent of no namespace) 116 | if not attrnsprefix: 117 | # Quick hack: it seems every element that has a 'w' nsprefix for 118 | # its tag uses the same prefix for its attributes 119 | if nsprefix == 'w': 120 | attributenamespace = namespace 121 | else: 122 | attributenamespace = '' 123 | else: 124 | attributenamespace = '{'+nsprefixes[attrnsprefix]+'}' 125 | 126 | for tagattribute in attributes: 127 | newelement.set(attributenamespace+tagattribute, 128 | attributes[tagattribute]) 129 | if tagtext: 130 | newelement.text = tagtext 131 | return newelement 132 | 133 | 134 | def pagebreak(type='page', orient='portrait'): 135 | '''Insert a break, default 'page'. 136 | See http://openxmldeveloper.org/forums/thread/4075.aspx 137 | Return our page break element.''' 138 | # Need to enumerate different types of page breaks. 139 | validtypes = ['page', 'section'] 140 | if type not in validtypes: 141 | tmpl = 'Page break style "%s" not implemented. Valid styles: %s.' 
142 | raise ValueError(tmpl % (type, validtypes)) 143 | pagebreak = makeelement('p') 144 | if type == 'page': 145 | run = makeelement('r') 146 | br = makeelement('br', attributes={'type': type}) 147 | run.append(br) 148 | pagebreak.append(run) 149 | elif type == 'section': 150 | pPr = makeelement('pPr') 151 | sectPr = makeelement('sectPr') 152 | if orient == 'portrait': 153 | pgSz = makeelement('pgSz', attributes={'w': '12240', 'h': '15840'}) 154 | elif orient == 'landscape': 155 | pgSz = makeelement('pgSz', attributes={'h': '12240', 'w': '15840', 156 | 'orient': 'landscape'}) 157 | sectPr.append(pgSz) 158 | pPr.append(sectPr) 159 | pagebreak.append(pPr) 160 | return pagebreak 161 | 162 | 163 | def paragraph(paratext, style='BodyText', breakbefore=False, jc='left'): 164 | """ 165 | Return a new paragraph element containing *paratext*. The paragraph's 166 | default style is 'Body Text', but a new style may be set using the 167 | *style* parameter. 168 | 169 | @param string jc: Paragraph alignment, possible values: 170 | left, center, right, both (justified), ... 171 | see http://www.schemacentral.com/sc/ooxml/t-w_ST_Jc.html 172 | for a full list 173 | 174 | If *paratext* is a list, add a run for each (text, char_format_str) 175 | 2-tuple in the list. char_format_str is a string containing one or more 176 | of the characters 'b', 'i', or 'u', meaning bold, italic, and underline 177 | respectively. 
For example: 178 | 179 | paratext = [ 180 | ('some bold text', 'b'), 181 | ('some normal text', ''), 182 | ('some italic underlined text', 'iu') 183 | ] 184 | """ 185 | # Make our elements 186 | paragraph = makeelement('p') 187 | 188 | if not isinstance(paratext, list): 189 | paratext = [(paratext, '')] 190 | text_tuples = [] 191 | for pt in paratext: 192 | text, char_styles_str = (pt if isinstance(pt, (list, tuple)) 193 | else (pt, '')) 194 | text_elm = makeelement('t', tagtext=text) 195 | if len(text.strip()) < len(text): 196 | text_elm.set('{http://www.w3.org/XML/1998/namespace}space', 197 | 'preserve') 198 | text_tuples.append([text_elm, char_styles_str]) 199 | pPr = makeelement('pPr') 200 | pStyle = makeelement('pStyle', attributes={'val': style}) 201 | pJc = makeelement('jc', attributes={'val': jc}) 202 | pPr.append(pStyle) 203 | pPr.append(pJc) 204 | 205 | # Add the text to the run, and the run to the paragraph 206 | paragraph.append(pPr) 207 | for text_elm, char_styles_str in text_tuples: 208 | run = makeelement('r') 209 | rPr = makeelement('rPr') 210 | # Apply styles 211 | if 'b' in char_styles_str: 212 | b = makeelement('b') 213 | rPr.append(b) 214 | if 'i' in char_styles_str: 215 | i = makeelement('i') 216 | rPr.append(i) 217 | if 'u' in char_styles_str: 218 | u = makeelement('u', attributes={'val': 'single'}) 219 | rPr.append(u) 220 | run.append(rPr) 221 | # Insert lastRenderedPageBreak for assistive technologies like 222 | # document narrators to know when a page break occurred. 
223 | if breakbefore: 224 | lastRenderedPageBreak = makeelement('lastRenderedPageBreak') 225 | run.append(lastRenderedPageBreak) 226 | run.append(text_elm) 227 | paragraph.append(run) 228 | # Return the combined paragraph 229 | return paragraph 230 | 231 | 232 | def contenttypes(): 233 | types = etree.fromstring( 234 | '') 236 | parts = { 237 | '/word/theme/theme1.xml': 'application/vnd.openxmlformats-officedocu' 238 | 'ment.theme+xml', 239 | '/word/fontTable.xml': 'application/vnd.openxmlformats-officedocu' 240 | 'ment.wordprocessingml.fontTable+xml', 241 | '/docProps/core.xml': 'application/vnd.openxmlformats-package.co' 242 | 're-properties+xml', 243 | '/docProps/app.xml': 'application/vnd.openxmlformats-officedocu' 244 | 'ment.extended-properties+xml', 245 | '/word/document.xml': 'application/vnd.openxmlformats-officedocu' 246 | 'ment.wordprocessingml.document.main+xml', 247 | '/word/settings.xml': 'application/vnd.openxmlformats-officedocu' 248 | 'ment.wordprocessingml.settings+xml', 249 | '/word/numbering.xml': 'application/vnd.openxmlformats-officedocu' 250 | 'ment.wordprocessingml.numbering+xml', 251 | '/word/styles.xml': 'application/vnd.openxmlformats-officedocu' 252 | 'ment.wordprocessingml.styles+xml', 253 | '/word/webSettings.xml': 'application/vnd.openxmlformats-officedocu' 254 | 'ment.wordprocessingml.webSettings+xml'} 255 | for part in parts: 256 | types.append(makeelement('Override', nsprefix=None, 257 | attributes={'PartName': part, 258 | 'ContentType': parts[part]})) 259 | # Add support for filetypes 260 | filetypes = { 261 | 'gif': 'image/gif', 262 | 'jpeg': 'image/jpeg', 263 | 'jpg': 'image/jpeg', 264 | 'png': 'image/png', 265 | 'rels': 'application/vnd.openxmlformats-package.relationships+xml', 266 | 'xml': 'application/xml' 267 | } 268 | for extension in filetypes: 269 | attrs = { 270 | 'Extension': extension, 271 | 'ContentType': filetypes[extension] 272 | } 273 | default_elm = makeelement('Default', nsprefix=None, attributes=attrs) 274 | 
types.append(default_elm) 275 | return types 276 | 277 | 278 | def heading(headingtext, headinglevel, lang='en'): 279 | '''Make a new heading, return the heading element''' 280 | lmap = {'en': 'Heading', 'it': 'Titolo'} 281 | # Make our elements 282 | paragraph = makeelement('p') 283 | pr = makeelement('pPr') 284 | pStyle = makeelement( 285 | 'pStyle', attributes={'val': lmap[lang]+str(headinglevel)}) 286 | run = makeelement('r') 287 | text = makeelement('t', tagtext=headingtext) 288 | # Add the text the run, and the run to the paragraph 289 | pr.append(pStyle) 290 | run.append(text) 291 | paragraph.append(pr) 292 | paragraph.append(run) 293 | # Return the combined paragraph 294 | return paragraph 295 | 296 | 297 | def table(contents, heading=True, colw=None, cwunit='dxa', tblw=0, 298 | twunit='auto', borders={}, celstyle=None): 299 | """ 300 | Return a table element based on specified parameters 301 | 302 | @param list contents: A list of lists describing contents. Every item in 303 | the list can be a string or a valid XML element 304 | itself. It can also be a list. In that case all the 305 | listed elements will be merged into the cell. 306 | @param bool heading: Tells whether first line should be treated as 307 | heading or not 308 | @param list colw: list of integer column widths specified in wunitS. 309 | @param str cwunit: Unit used for column width: 310 | 'pct' : fiftieths of a percent 311 | 'dxa' : twentieths of a point 312 | 'nil' : no width 313 | 'auto' : automagically determined 314 | @param int tblw: Table width 315 | @param str twunit: Unit used for table width. Same possible values as 316 | cwunit. 317 | @param dict borders: Dictionary defining table border. Supported keys 318 | are: 'top', 'left', 'bottom', 'right', 319 | 'insideH', 'insideV', 'all'. 320 | When specified, the 'all' key has precedence over 321 | others. 
Each key must define a dict of border 322 | attributes: 323 | color : The color of the border, in hex or 324 | 'auto' 325 | space : The space, measured in points 326 | sz : The size of the border, in eighths of 327 | a point 328 | val : The style of the border, see 329 | http://www.schemacentral.com/sc/ooxml/t-w_ST_Border.htm 330 | @param list celstyle: Specify the style for each colum, list of dicts. 331 | supported keys: 332 | 'align' : specify the alignment, see paragraph 333 | documentation. 334 | @return lxml.etree: Generated XML etree element 335 | """ 336 | table = makeelement('tbl') 337 | columns = len(contents[0]) 338 | # Table properties 339 | tableprops = makeelement('tblPr') 340 | tablestyle = makeelement('tblStyle', attributes={'val': ''}) 341 | tableprops.append(tablestyle) 342 | tablewidth = makeelement( 343 | 'tblW', attributes={'w': str(tblw), 'type': str(twunit)}) 344 | tableprops.append(tablewidth) 345 | if len(borders.keys()): 346 | tableborders = makeelement('tblBorders') 347 | for b in ['top', 'left', 'bottom', 'right', 'insideH', 'insideV']: 348 | if b in borders.keys() or 'all' in borders.keys(): 349 | k = 'all' if 'all' in borders.keys() else b 350 | attrs = {} 351 | for a in borders[k].keys(): 352 | attrs[a] = unicode(borders[k][a]) 353 | borderelem = makeelement(b, attributes=attrs) 354 | tableborders.append(borderelem) 355 | tableprops.append(tableborders) 356 | tablelook = makeelement('tblLook', attributes={'val': '0400'}) 357 | tableprops.append(tablelook) 358 | table.append(tableprops) 359 | # Table Grid 360 | tablegrid = makeelement('tblGrid') 361 | for i in range(columns): 362 | attrs = {'w': str(colw[i]) if colw else '2390'} 363 | tablegrid.append(makeelement('gridCol', attributes=attrs)) 364 | table.append(tablegrid) 365 | # Heading Row 366 | row = makeelement('tr') 367 | rowprops = makeelement('trPr') 368 | cnfStyle = makeelement('cnfStyle', attributes={'val': '000000100000'}) 369 | rowprops.append(cnfStyle) 370 | 
row.append(rowprops) 371 | if heading: 372 | i = 0 373 | for heading in contents[0]: 374 | cell = makeelement('tc') 375 | # Cell properties 376 | cellprops = makeelement('tcPr') 377 | if colw: 378 | wattr = {'w': str(colw[i]), 'type': cwunit} 379 | else: 380 | wattr = {'w': '0', 'type': 'auto'} 381 | cellwidth = makeelement('tcW', attributes=wattr) 382 | cellstyle = makeelement('shd', attributes={'val': 'clear', 383 | 'color': 'auto', 384 | 'fill': 'FFFFFF', 385 | 'themeFill': 'text2', 386 | 'themeFillTint': '99'}) 387 | cellprops.append(cellwidth) 388 | cellprops.append(cellstyle) 389 | cell.append(cellprops) 390 | # Paragraph (Content) 391 | if not isinstance(heading, (list, tuple)): 392 | heading = [heading] 393 | for h in heading: 394 | if isinstance(h, etree._Element): 395 | cell.append(h) 396 | else: 397 | cell.append(paragraph(h, jc='center')) 398 | row.append(cell) 399 | i += 1 400 | table.append(row) 401 | # Contents Rows 402 | for contentrow in contents[1 if heading else 0:]: 403 | row = makeelement('tr') 404 | i = 0 405 | for content in contentrow: 406 | cell = makeelement('tc') 407 | # Properties 408 | cellprops = makeelement('tcPr') 409 | if colw: 410 | wattr = {'w': str(colw[i]), 'type': cwunit} 411 | else: 412 | wattr = {'w': '0', 'type': 'auto'} 413 | cellwidth = makeelement('tcW', attributes=wattr) 414 | cellprops.append(cellwidth) 415 | cell.append(cellprops) 416 | # Paragraph (Content) 417 | if not isinstance(content, (list, tuple)): 418 | content = [content] 419 | for c in content: 420 | if isinstance(c, etree._Element): 421 | cell.append(c) 422 | else: 423 | if celstyle and 'align' in celstyle[i].keys(): 424 | align = celstyle[i]['align'] 425 | else: 426 | align = 'left' 427 | cell.append(paragraph(c, jc=align)) 428 | row.append(cell) 429 | i += 1 430 | table.append(row) 431 | return table 432 | 433 | 434 | def picture( 435 | relationshiplist, picname, picdescription, pixelwidth=None, 436 | pixelheight=None, nochangeaspect=True, 
nochangearrowheads=True, 437 | imagefiledict=None): 438 | """ 439 | Take a relationshiplist, picture file name, and return a paragraph 440 | containing the image and an updated relationshiplist 441 | """ 442 | if imagefiledict is None: 443 | warn( 444 | 'Using picture() without imagefiledict parameter will be depreca' 445 | 'ted in the future.', PendingDeprecationWarning 446 | ) 447 | 448 | # http://openxmldeveloper.org/articles/462.aspx 449 | # Create an image. Size may be specified, otherwise it will be based on the 450 | # pixel size of image. Return a paragraph containing the picture 451 | 452 | # Set relationship ID to that of the image or the first available one 453 | picid = '2' 454 | picpath = abspath(picname) 455 | 456 | if imagefiledict is not None: 457 | # Keep track of the image files in a separate dictionary so they don't 458 | # need to be copied into the template directory 459 | if picpath not in imagefiledict: 460 | picrelid = 'rId' + str(len(relationshiplist) + 1) 461 | imagefiledict[picpath] = picrelid 462 | 463 | relationshiplist.append([ 464 | 'http://schemas.openxmlformats.org/officeDocument/2006/relat' 465 | 'ionships/image', 466 | 'media/%s_%s' % (picrelid, basename(picpath)) 467 | ]) 468 | else: 469 | picrelid = imagefiledict[picpath] 470 | else: 471 | # Copy files into template directory for backwards compatibility 472 | # Images still accumulate in the template directory this way 473 | picrelid = 'rId' + str(len(relationshiplist) + 1) 474 | 475 | relationshiplist.append([ 476 | 'http://schemas.openxmlformats.org/officeDocument/2006/relations' 477 | 'hips/image', 'media/' + picname 478 | ]) 479 | 480 | media_dir = join(template_dir, 'word', 'media') 481 | if not os.path.isdir(media_dir): 482 | os.mkdir(media_dir) 483 | shutil.copyfile(picname, join(media_dir, picname)) 484 | 485 | image = Image.open(picpath) 486 | 487 | # Extract EXIF data, if available 488 | try: 489 | exif = image._getexif() 490 | exif = {} if exif is None else exif 491 | 
except Exception: 492 | exif = {} 493 | 494 | imageExif = {} 495 | for tag, value in exif.items(): 496 | imageExif[TAGS.get(tag, tag)] = value 497 | 498 | imageOrientation = imageExif.get('Orientation', 1) 499 | imageAngle = { 500 | 1: 0, 2: 0, 3: 180, 4: 0, 5: 90, 6: 90, 7: 270, 8: 270 501 | }[imageOrientation] 502 | imageFlipH = 'true' if imageOrientation in (2, 5, 7) else 'false' 503 | imageFlipV = 'true' if imageOrientation == 4 else 'false' 504 | 505 | # Check if the user has specified a size 506 | if not pixelwidth or not pixelheight: 507 | # If not, get info from the picture itself 508 | pixelwidth, pixelheight = image.size[0:2] 509 | 510 | # Swap width and height if necessary 511 | if imageOrientation in (5, 6, 7, 8): 512 | pixelwidth, pixelheight = pixelheight, pixelwidth 513 | 514 | # OpenXML measures on-screen objects in English Metric Units 515 | # 1cm = 360000 EMUs (12700 EMUs per pixel assumes 72 pixels per inch) 516 | emuperpixel = 12700 517 | width = str(pixelwidth * emuperpixel) 518 | height = str(pixelheight * emuperpixel) 519 | 520 | # There are 3 main elements inside a picture 521 | # 1. The Blipfill - specifies how the image fills the picture area 522 | # (stretch, tile, etc.) 523 | blipfill = makeelement('blipFill', nsprefix='pic') 524 | blipfill.append(makeelement('blip', nsprefix='a', attrnsprefix='r', 525 | attributes={'embed': picrelid})) 526 | stretch = makeelement('stretch', nsprefix='a') 527 | stretch.append(makeelement('fillRect', nsprefix='a')) 528 | blipfill.append(makeelement('srcRect', nsprefix='a')) 529 | blipfill.append(stretch) 530 | 531 | # 2. 
The non visual picture properties 532 | nvpicpr = makeelement('nvPicPr', nsprefix='pic') 533 | cnvpr = makeelement( 534 | 'cNvPr', nsprefix='pic', 535 | attributes={'id': '0', 'name': 'Picture 1', 'descr': picdescription} 536 | ) 537 | nvpicpr.append(cnvpr) 538 | cnvpicpr = makeelement('cNvPicPr', nsprefix='pic') 539 | cnvpicpr.append(makeelement( 540 | 'picLocks', nsprefix='a', 541 | attributes={'noChangeAspect': str(int(nochangeaspect)), 542 | 'noChangeArrowheads': str(int(nochangearrowheads))})) 543 | nvpicpr.append(cnvpicpr) 544 | 545 | # 3. The Shape properties 546 | sppr = makeelement('spPr', nsprefix='pic', attributes={'bwMode': 'auto'}) 547 | xfrm = makeelement( 548 | 'xfrm', nsprefix='a', attributes={ 549 | 'rot': str(imageAngle * 60000), 'flipH': imageFlipH, 550 | 'flipV': imageFlipV 551 | } 552 | ) 553 | xfrm.append( 554 | makeelement('off', nsprefix='a', attributes={'x': '0', 'y': '0'}) 555 | ) 556 | xfrm.append( 557 | makeelement( 558 | 'ext', nsprefix='a', attributes={'cx': width, 'cy': height} 559 | ) 560 | ) 561 | prstgeom = makeelement( 562 | 'prstGeom', nsprefix='a', attributes={'prst': 'rect'} 563 | ) 564 | prstgeom.append(makeelement('avLst', nsprefix='a')) 565 | sppr.append(xfrm) 566 | sppr.append(prstgeom) 567 | 568 | # Add our 3 parts to the picture element 569 | pic = makeelement('pic', nsprefix='pic') 570 | pic.append(nvpicpr) 571 | pic.append(blipfill) 572 | pic.append(sppr) 573 | 574 | # Now make the supporting elements 575 | # The following sequence is just: make element, then add its children 576 | graphicdata = makeelement( 577 | 'graphicData', nsprefix='a', 578 | attributes={'uri': ('http://schemas.openxmlformats.org/drawingml/200' 579 | '6/picture')}) 580 | graphicdata.append(pic) 581 | graphic = makeelement('graphic', nsprefix='a') 582 | graphic.append(graphicdata) 583 | 584 | framelocks = makeelement('graphicFrameLocks', nsprefix='a', 585 | attributes={'noChangeAspect': '1'}) 586 | framepr = makeelement('cNvGraphicFramePr', 
nsprefix='wp') 587 | framepr.append(framelocks) 588 | docpr = makeelement('docPr', nsprefix='wp', 589 | attributes={'id': picid, 'name': 'Picture 1', 590 | 'descr': picdescription}) 591 | effectextent = makeelement('effectExtent', nsprefix='wp', 592 | attributes={'l': '25400', 't': '0', 'r': '0', 593 | 'b': '0'}) 594 | extent = makeelement('extent', nsprefix='wp', 595 | attributes={'cx': width, 'cy': height}) 596 | inline = makeelement('inline', attributes={'distT': "0", 'distB': "0", 597 | 'distL': "0", 'distR': "0"}, 598 | nsprefix='wp') 599 | inline.append(extent) 600 | inline.append(effectextent) 601 | inline.append(docpr) 602 | inline.append(framepr) 603 | inline.append(graphic) 604 | drawing = makeelement('drawing') 605 | drawing.append(inline) 606 | run = makeelement('r') 607 | run.append(drawing) 608 | paragraph = makeelement('p') 609 | paragraph.append(run) 610 | 611 | if imagefiledict is not None: 612 | return relationshiplist, paragraph, imagefiledict 613 | else: 614 | return relationshiplist, paragraph 615 | 616 | 617 | def search(document, search): 618 | '''Search a document for a regex, return success / fail result''' 619 | result = False 620 | searchre = re.compile(search) 621 | for element in document.iter(): 622 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements 623 | if element.text: 624 | if searchre.search(element.text): 625 | result = True 626 | return result 627 | 628 | 629 | def replace(document, search, replace): 630 | """ 631 | Replace all occurrences of string with a different string, return updated 632 | document 633 | """ 634 | newdocument = document 635 | searchre = re.compile(search) 636 | for element in newdocument.iter(): 637 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements 638 | if element.text: 639 | if searchre.search(element.text): 640 | element.text = re.sub(search, replace, element.text) 641 | return newdocument 642 | 643 | 644 | def clean(document): 645 | """ Perform misc cleaning operations on 
documents. 646 | Returns cleaned document. 647 | """ 648 | 649 | newdocument = document 650 | 651 | # Clean empty text and r tags 652 | for t in ('t', 'r'): 653 | rmlist = [] 654 | for element in newdocument.iter(): 655 | if element.tag == '{%s}%s' % (nsprefixes['w'], t): 656 | if not element.text and not len(element): 657 | rmlist.append(element) 658 | for element in rmlist: 659 | element.getparent().remove(element) 660 | 661 | return newdocument 662 | 663 | 664 | def findTypeParent(element, tag): 665 | """ Find the first parent of element with the given tag 666 | 667 | @param object element: etree element 668 | @param string tag: the parent tag to search for 669 | 670 | @return object element: the found parent or None when not found 671 | """ 672 | 673 | p = element.getparent() 674 | while p is not None: 675 | if p.tag == tag: 676 | return p 677 | p = p.getparent() 678 | 679 | # Not found 680 | return None 681 | 682 | 683 | def AdvSearch(document, search, bs=3): 684 | '''Return set of all regex matches 685 | 686 | This is an advanced version of python-docx.search() that takes into 687 | account blocks of elements at a time. 688 | 689 | What it does: 690 | It searches the entire document body for text blocks. 691 | Since the text to search could be spread across multiple text blocks, 692 | we need to adopt some sort of algorithm to handle this situation. 693 | The smallest matching group of blocks (up to bs) is then adopted. 694 | If the matching group has more than one block, blocks other than first 695 | are cleared and all the replacement text is put on first block. 696 | 697 | Examples: 698 | original text blocks : [ 'Hel', 'lo,', ' world!' ] 699 | search : 'Hello,' 700 | output blocks : [ 'Hello,' ] 701 | 702 | original text blocks : [ 'Hel', 'lo', ' __', 'name', '__!' 
] 703 | search : '(__[a-z]+__)' 704 | output blocks : [ '__name__' ] 705 | 706 | @param instance document: The original document 707 | @param str search: The text to search for (regexp) 709 | @param int bs: See above 710 | 711 | @return set All occurrences of search string 712 | 713 | ''' 714 | 715 | # Compile the search regexp 716 | searchre = re.compile(search) 717 | 718 | matches = [] 719 | 720 | # Will match against searchels. Searchels is a list that contains last 721 | # n text elements found in the document. 1 <= n <= bs 722 | searchels = [] 723 | 724 | for element in document.iter(): 725 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements 726 | if element.text: 727 | # Add this element to searchels 728 | searchels.append(element) 729 | if len(searchels) > bs: 730 | # If searchels is too long, remove the first element 731 | searchels.pop(0) 732 | 733 | # Search all combinations of searchels, starting from 734 | # smaller up to bigger ones 735 | # l = search length 736 | # s = search start 737 | # e = element IDs to merge 738 | found = False 739 | for l in range(1, len(searchels)+1): 740 | if found: 741 | break 742 | for s in range(len(searchels)): 743 | if found: 744 | break 745 | if s+l <= len(searchels): 746 | e = range(s, s+l) 747 | txtsearch = '' 748 | for k in e: 749 | txtsearch += searchels[k].text 750 | 751 | # Search for the text in the whole txtsearch 752 | match = searchre.search(txtsearch) 753 | if match: 754 | matches.append(match.group()) 755 | found = True 756 | return set(matches) 757 | 758 | 759 | def advReplace(document, search, replace, bs=3): 760 | """ 761 | Replace all occurrences of string with a different string, return updated 762 | document 763 | 764 | This is a modified version of python-docx.replace() that takes into 765 | account blocks of elements at a time. The replace element can also 766 | be a string or an xml etree element. 
767 | 768 | What it does: 769 | It searches the entire document body for text blocks. 770 | Then it scans those text blocks for the search pattern. 771 | Since the text to search could be spread across multiple text blocks, 772 | we need to adopt some sort of algorithm to handle this situation. 773 | The smallest matching group of blocks (up to bs) is then adopted. 774 | If the matching group has more than one block, blocks other than first 775 | are cleared and all the replacement text is put on first block. 776 | 777 | Examples: 778 | original text blocks : [ 'Hel', 'lo,', ' world!' ] 779 | search / replace: 'Hello,' / 'Hi!' 780 | output blocks : [ 'Hi!', '', ' world!' ] 781 | 782 | original text blocks : [ 'Hel', 'lo,', ' world!' ] 783 | search / replace: 'Hello, world' / 'Hi!' 784 | output blocks : [ 'Hi!!', '', '' ] 785 | 786 | original text blocks : [ 'Hel', 'lo,', ' world!' ] 787 | search / replace: 'Hel' / 'Hal' 788 | output blocks : [ 'Hal', 'lo,', ' world!' ] 789 | 790 | @param instance document: The original document 791 | @param str search: The text to search for (regexp) 792 | @param mixed replace: The replacement text or lxml.etree element to 793 | append, or a list of etree elements 794 | @param int bs: See above 795 | 796 | @return instance The document with replacement applied 797 | 798 | """ 799 | # Enables debug output 800 | DEBUG = False 801 | 802 | newdocument = document 803 | 804 | # Compile the search regexp 805 | searchre = re.compile(search) 806 | 807 | # Will match against searchels. Searchels is a list that contains last 808 | # n text elements found in the document. 
1 <= n <= bs 809 | searchels = [] 810 | 811 | for element in newdocument.iter(): 812 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements 813 | if element.text: 814 | # Add this element to searchels 815 | searchels.append(element) 816 | if len(searchels) > bs: 817 | # If searchels is too long, remove the first element 818 | searchels.pop(0) 819 | 820 | # Search all combinations of searchels, starting from 821 | # smaller up to bigger ones 822 | # l = search length 823 | # s = search start 824 | # e = element IDs to merge 825 | found = False 826 | for l in range(1, len(searchels)+1): 827 | if found: 828 | break 829 | #print "slen:", l 830 | for s in range(len(searchels)): 831 | if found: 832 | break 833 | if s+l <= len(searchels): 834 | e = range(s, s+l) 835 | #print "elems:", e 836 | txtsearch = '' 837 | for k in e: 838 | txtsearch += searchels[k].text 839 | 840 | # Search for the text in the whole txtsearch 841 | match = searchre.search(txtsearch) 842 | if match: 843 | found = True 844 | 845 | # I've found something :) 846 | if DEBUG: 847 | log.debug("Found element!") 848 | log.debug("Search regexp: %s", 849 | searchre.pattern) 850 | log.debug("Requested replacement: %s", 851 | replace) 852 | log.debug("Matched text: %s", txtsearch) 853 | log.debug("Matched text (split): %s", 854 | map(lambda i: i.text, searchels)) 855 | log.debug("Matched at position: %s", 856 | match.start()) 857 | log.debug("matched in elements: %s", e) 858 | if isinstance(replace, etree._Element): 859 | log.debug("Will replace with XML CODE") 860 | elif isinstance(replace, (list, tuple)): 861 | log.debug("Will replace with LIST OF" 862 | " ELEMENTS") 863 | else: 864 | log.debug("Will replace with: %s", 865 | re.sub(search, replace, 866 | txtsearch)) 867 | 868 | curlen = 0 869 | replaced = False 870 | for i in e: 871 | curlen += len(searchels[i].text) 872 | if curlen > match.start() and not replaced: 873 | # The match occurred in THIS element. 
874 | # Put in the whole replaced text 875 | if isinstance(replace, etree._Element): 876 | # Convert to a list and process 877 | # it later 878 | replace = [replace] 879 | if isinstance(replace, (list, tuple)): 880 | # I'm replacing with a list of 881 | # etree elements 882 | # clear the text in the tag and 883 | # append the element after the 884 | # parent paragraph 885 | # (because t elements cannot have 886 | # children) 887 | p = findTypeParent( 888 | searchels[i], 889 | '{%s}p' % nsprefixes['w']) 890 | searchels[i].text = re.sub( 891 | search, '', txtsearch) 892 | insindex = p.getparent().index(p)+1 893 | for r in replace: 894 | p.getparent().insert( 895 | insindex, r) 896 | insindex += 1 897 | else: 898 | # Replacing with pure text 899 | searchels[i].text = re.sub( 900 | search, replace, txtsearch) 901 | replaced = True 902 | log.debug( 903 | "Replacing in element #: %s", i) 904 | else: 905 | # Clears the other text elements 906 | searchels[i].text = '' 907 | return newdocument 908 | 909 | 910 | def getdocumenttext(document): 911 | '''Return the raw text of a document, as a list of paragraphs.''' 912 | paratextlist = [] 913 | # Compile a list of all paragraph (p) elements 914 | paralist = [] 915 | for element in document.iter(): 916 | # Find p (paragraph) elements 917 | if element.tag == '{'+nsprefixes['w']+'}p': 918 | paralist.append(element) 919 | # Since a single sentence might be spread over multiple text elements, 920 | # iterate through each paragraph, appending all text (t) children to that 921 | # paragraph's text. 
922 | for para in paralist: 923 | paratext = u'' 924 | # Loop through each paragraph 925 | for element in para.iter(): 926 | # Find t (text) elements 927 | if element.tag == '{'+nsprefixes['w']+'}t': 928 | if element.text: 929 | paratext = paratext+element.text 930 | elif element.tag == '{'+nsprefixes['w']+'}tab': 931 | paratext = paratext + '\t' 932 | # Add our completed paragraph text to the list of paragraph text 933 | if not len(paratext) == 0: 934 | paratextlist.append(paratext) 935 | return paratextlist 936 | 937 | 938 | def coreproperties(title, subject, creator, keywords, lastmodifiedby=None): 939 | """ 940 | Create core properties (common document properties referred to in the 941 | 'Dublin Core' specification). See appproperties() for other stuff. 942 | """ 943 | coreprops = makeelement('coreProperties', nsprefix='cp') 944 | coreprops.append(makeelement('title', tagtext=title, nsprefix='dc')) 945 | coreprops.append(makeelement('subject', tagtext=subject, nsprefix='dc')) 946 | coreprops.append(makeelement('creator', tagtext=creator, nsprefix='dc')) 947 | coreprops.append(makeelement('keywords', tagtext=','.join(keywords), 948 | nsprefix='cp')) 949 | if not lastmodifiedby: 950 | lastmodifiedby = creator 951 | coreprops.append(makeelement('lastModifiedBy', tagtext=lastmodifiedby, 952 | nsprefix='cp')) 953 | coreprops.append(makeelement('revision', tagtext='1', nsprefix='cp')) 954 | coreprops.append( 955 | makeelement('category', tagtext='Examples', nsprefix='cp')) 956 | coreprops.append( 957 | makeelement('description', tagtext='Examples', nsprefix='dc')) 958 | currenttime = time.strftime('%Y-%m-%dT%H:%M:%SZ') 959 | # Document creation and modify times 960 | # Prob here: we have an attribute who name uses one namespace, and that 961 | # attribute's value uses another namespace. 962 | # We're creating the element from a string as a workaround... 
963 | for doctime in ['created', 'modified']: 964 | elm_str = ( 965 | '<dcterms:%s xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ' 966 | 'xmlns:dcterms="http://purl.org/dc/terms/" ' 967 | 'xsi:type="dcterms:W3CDTF">%s</dcterms:%s>' 968 | ) % (doctime, currenttime, doctime) 969 | coreprops.append(etree.fromstring(elm_str)) 970 | return coreprops 971 | 972 | 973 | def appproperties(): 974 | """ 975 | Create app-specific properties. See coreproperties() for more common 976 | document properties. 977 | 978 | """ 979 | appprops = makeelement('Properties', nsprefix='ep') 980 | appprops = etree.fromstring( 981 | '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/' 982 | '2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.or' 983 | 'g/officeDocument/2006/docPropsVTypes">' 984 | '</Properties>') 985 | props =\ 986 | {'Template': 'Normal.dotm', 987 | 'TotalTime': '6', 988 | 'Pages': '1', 989 | 'Words': '83', 990 | 'Characters': '475', 991 | 'Application': 'Microsoft Word 12.0.0', 992 | 'DocSecurity': '0', 993 | 'Lines': '12', 994 | 'Paragraphs': '8', 995 | 'ScaleCrop': 'false', 996 | 'LinksUpToDate': 'false', 997 | 'CharactersWithSpaces': '583', 998 | 'SharedDoc': 'false', 999 | 'HyperlinksChanged': 'false', 1000 | 'AppVersion': '12.0000'} 1001 | for prop in props: 1002 | appprops.append(makeelement(prop, tagtext=props[prop], nsprefix=None)) 1003 | return appprops 1004 | 1005 | 1006 | def websettings(): 1007 | '''Generate websettings''' 1008 | web = makeelement('webSettings') 1009 | web.append(makeelement('allowPNG')) 1010 | web.append(makeelement('doNotSaveAsSingleFile')) 1011 | return web 1012 | 1013 | 1014 | def relationshiplist(): 1015 | relationshiplist =\ 1016 | [['http://schemas.openxmlformats.org/officeDocument/2006/' 1017 | 'relationships/numbering', 'numbering.xml'], 1018 | ['http://schemas.openxmlformats.org/officeDocument/2006/' 1019 | 'relationships/styles', 'styles.xml'], 1020 | ['http://schemas.openxmlformats.org/officeDocument/2006/' 1021 | 'relationships/settings', 'settings.xml'], 1022 | ['http://schemas.openxmlformats.org/officeDocument/2006/' 1023 | 'relationships/webSettings', 'webSettings.xml'], 1024 | ['http://schemas.openxmlformats.org/officeDocument/2006/' 1025 | 'relationships/fontTable', 'fontTable.xml'], 1026 | ['http://schemas.openxmlformats.org/officeDocument/2006/' 1027 | 
'relationships/theme', 'theme/theme1.xml']] 1028 | return relationshiplist 1029 | 1030 | 1031 | def wordrelationships(relationshiplist): 1032 | '''Generate a Word relationships file''' 1033 | # Default list of relationships 1034 | # FIXME: using string hack instead of making element 1035 | #relationships = makeelement('Relationships', nsprefix='pr') 1036 | relationships = etree.fromstring( 1037 | '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/' 1038 | 'relationships"></Relationships>') 1039 | count = 0 1040 | for relationship in relationshiplist: 1041 | # Relationship IDs (rId) start at 1. 1042 | rel_elm = makeelement('Relationship', nsprefix=None, 1043 | attributes={'Id': 'rId'+str(count+1), 1044 | 'Type': relationship[0], 1045 | 'Target': relationship[1]} 1046 | ) 1047 | relationships.append(rel_elm) 1048 | count += 1 1049 | return relationships 1050 | 1051 | 1052 | def savedocx( 1053 | document, coreprops, appprops, contenttypes, websettings, 1054 | wordrelationships, output, imagefiledict=None): 1055 | """ 1056 | Save a modified document 1057 | """ 1058 | if imagefiledict is None: 1059 | warn( 1060 | 'Using savedocx() without imagefiledict parameter will be deprec' 1061 | 'ated in the future.', PendingDeprecationWarning 1062 | ) 1063 | 1064 | assert os.path.isdir(template_dir) 1065 | docxfile = zipfile.ZipFile( 1066 | output, mode='w', compression=zipfile.ZIP_DEFLATED) 1067 | 1068 | # Move to the template data path 1069 | prev_dir = os.path.abspath('.') # save previous working dir 1070 | os.chdir(template_dir) 1071 | 1072 | # Serialize our trees into our zip file 1073 | treesandfiles = { 1074 | document: 'word/document.xml', 1075 | coreprops: 'docProps/core.xml', 1076 | appprops: 'docProps/app.xml', 1077 | contenttypes: '[Content_Types].xml', 1078 | websettings: 'word/webSettings.xml', 1079 | wordrelationships: 'word/_rels/document.xml.rels' 1080 | } 1081 | for tree in treesandfiles: 1082 | log.info('Saving: %s' % treesandfiles[tree]) 1083 | treestring = etree.tostring(tree, pretty_print=True) 1084 | docxfile.writestr(treesandfiles[tree], 
treestring) 1085 | 1086 | # Add & compress images, if applicable 1087 | if imagefiledict is not None: 1088 | for imagepath, picrelid in imagefiledict.items(): 1089 | archivename = 'word/media/%s_%s' % (picrelid, basename(imagepath)) 1090 | log.info('Saving: %s', archivename) 1091 | docxfile.write(imagepath, archivename) 1092 | 1093 | # Add & compress support files 1094 | files_to_ignore = ['.DS_Store'] # nuisance from some os's 1095 | for dirpath, dirnames, filenames in os.walk('.'): 1096 | for filename in filenames: 1097 | if filename in files_to_ignore: 1098 | continue 1099 | templatefile = join(dirpath, filename) 1100 | archivename = templatefile[2:] 1101 | log.info('Saving: %s', archivename) 1102 | docxfile.write(templatefile, archivename) 1103 | 1104 | log.info('Saved new file to: %r', output) 1105 | docxfile.close() 1106 | os.chdir(prev_dir) # restore previous working dir 1107 | return 1108 | -------------------------------------------------------------------------------- /example-extracttext.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | This file opens a docx (Office 2007) file and dumps the text. 4 | 5 | If you need to extract text from documents, use this file as a basis for your 6 | work. 7 | 8 | Part of Python's docx module - http://github.com/mikemaccana/python-docx 9 | See LICENSE for licensing information. 10 | """ 11 | 12 | import sys 13 | 14 | from docx import opendocx, getdocumenttext 15 | 16 | if __name__ == '__main__': 17 | try: 18 | document = opendocx(sys.argv[1]) 19 | newfile = open(sys.argv[2], 'w') 20 | except: 21 | print( 22 | "Please supply an input and output file. 
For example:\n" 23 | " example-extracttext.py 'My Office 2007 document.docx' 'outp" 24 | "utfile.txt'" 25 | ) 26 | exit() 27 | 28 | # Fetch all the text out of the document we just opened 29 | paratextlist = getdocumenttext(document) 30 | 31 | # Encode each paragraph as UTF-8 32 | newparatextlist = [] 33 | for paratext in paratextlist: 34 | newparatextlist.append(paratext.encode("utf-8")) 35 | 36 | # Print out text of document with two newlines under each paragraph 37 | newfile.write('\n\n'.join(newparatextlist)) 38 | -------------------------------------------------------------------------------- /example-makedocument.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """ 4 | This file makes a .docx (Word 2007) file from scratch, showing off most of the 5 | features of python-docx. 6 | 7 | If you need to make documents from scratch, you can use this file as a basis 8 | for your work. 9 | 10 | Part of Python's docx module - http://github.com/mikemaccana/python-docx 11 | See LICENSE for licensing information. 12 | """ 13 | 14 | from docx import * 15 | 16 | if __name__ == '__main__': 17 | # Default set of relationships - the minimum components of a document 18 | relationships = relationshiplist() 19 | 20 | # Make a new document tree - this is the main part of a Word document 21 | document = newdocument() 22 | 23 | # This xpath location is where most interesting content lives 24 | body = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0] 25 | 26 | # Append two headings and a paragraph 27 | body.append(heading("Welcome to Python's docx module", 1)) 28 | body.append(heading('Make and edit docx in 200 lines of pure Python', 2)) 29 | body.append(paragraph('The module was created when I was looking for a ' 30 | 'Python support for MS Word .doc files on PyPI and Stackoverflow. 
' 31 | 'Unfortunately, the only solutions I could find used:')) 32 | 33 | # Add a numbered list 34 | points = [ 'COM automation' 35 | , '.net or Java' 36 | , 'Automating OpenOffice or MS Office' 37 | ] 38 | for point in points: 39 | body.append(paragraph(point, style='ListNumber')) 40 | body.append(paragraph([('For those of us who prefer something simpler, I ' 41 | 'made docx.', 'i')])) 42 | body.append(heading('Making documents', 2)) 43 | body.append(paragraph('The docx module has the following features:')) 44 | 45 | # Add some bullets 46 | points = ['Paragraphs', 'Bullets', 'Numbered lists', 47 | 'Multiple levels of headings', 'Tables', 'Document Properties'] 48 | for point in points: 49 | body.append(paragraph(point, style='ListBullet')) 50 | 51 | body.append(paragraph('Tables are just lists of lists, like this:')) 52 | # Append a table 53 | tbl_rows = [ ['A1', 'A2', 'A3'] 54 | , ['B1', 'B2', 'B3'] 55 | , ['C1', 'C2', 'C3'] 56 | ] 57 | body.append(table(tbl_rows)) 58 | 59 | body.append(heading('Editing documents', 2)) 60 | body.append(paragraph('Thanks to the awesomeness of the lxml module, ' 61 | 'we can:')) 62 | points = [ 'Search and replace' 63 | , 'Extract plain text of document' 64 | , 'Add and delete items anywhere within the document' 65 | ] 66 | for point in points: 67 | body.append(paragraph(point, style='ListBullet')) 68 | 69 | # Add an image 70 | relationships, picpara = picture(relationships, 'image1.png', 71 | 'This is a test description') 72 | body.append(picpara) 73 | 74 | # Search and replace 75 | print 'Searching for something in a paragraph ...', 76 | if search(body, 'the awesomeness'): 77 | print 'found it!' 78 | else: 79 | print 'nope.' 80 | 81 | print 'Searching for something in a heading ...', 82 | if search(body, '200 lines'): 83 | print 'found it!' 84 | else: 85 | print 'nope.' 86 | 87 | print 'Replacing ...', 88 | body = replace(body, 'the awesomeness', 'the goshdarned awesomeness') 89 | print 'done.' 
90 | 91 | # Add a pagebreak 92 | body.append(pagebreak(type='page', orient='portrait')) 93 | 94 | body.append(heading('Ideas? Questions? Want to contribute?', 2)) 95 | body.append(paragraph('Email ')) 96 | 97 | # Create our properties, contenttypes, and other support files 98 | title = 'Python docx demo' 99 | subject = 'A practical example of making docx from Python' 100 | creator = 'Mike MacCana' 101 | keywords = ['python', 'Office Open XML', 'Word'] 102 | 103 | coreprops = coreproperties(title=title, subject=subject, creator=creator, 104 | keywords=keywords) 105 | appprops = appproperties() 106 | contenttypes = contenttypes() 107 | websettings = websettings() 108 | wordrelationships = wordrelationships(relationships) 109 | 110 | # Save our document 111 | savedocx(document, coreprops, appprops, contenttypes, websettings, 112 | wordrelationships, 'Welcome to the Python docx module.docx') 113 | 114 | -------------------------------------------------------------------------------- /image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/image1.png -------------------------------------------------------------------------------- /screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/screenshot.png -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | try: 4 | from setuptools import setup 5 | except ImportError: 6 | from distutils.core import setup 7 | from glob import glob 8 | 9 | # Make data go into site-packages (http://tinyurl.com/site-pkg) 10 | from distutils.command.install import INSTALL_SCHEMES 11 | for scheme in INSTALL_SCHEMES.values(): 
12 | scheme['data'] = scheme['purelib'] 13 | 14 | DESCRIPTION = ( 15 | 'The docx module creates, reads and writes Microsoft Office Word 2007 do' 16 | 'cx files' 17 | ) 18 | 19 | setup( 20 | name='docx', 21 | version='0.2.4', 22 | install_requires=['lxml', 'Pillow>=2.0'], 23 | description=DESCRIPTION, 24 | author='Mike MacCana', 25 | author_email='python-docx@googlegroups.com', 26 | maintainer='Steve Canny', 27 | maintainer_email='python-docx@googlegroups.com', 28 | url='http://github.com/mikemaccana/python-docx', 29 | py_modules=['docx'], 30 | data_files=[ 31 | ('docx-template/_rels', glob('template/_rels/.*')), 32 | ('docx-template/docProps', glob('template/docProps/*.*')), 33 | ('docx-template/word', glob('template/word/*.xml')), 34 | ('docx-template/word/theme', glob('template/word/theme/*.*')), 35 | ], 36 | ) 37 | -------------------------------------------------------------------------------- /template/_rels/.rels: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /template/docProps/thumbnail.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/template/docProps/thumbnail.jpeg -------------------------------------------------------------------------------- /template/word/fontTable.xml: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /template/word/numbering.xml: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /template/word/settings.xml: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /template/word/styles.xml: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /template/word/theme/theme1.xml: -------------------------------------------------------------------------------- [XML markup not preserved] -------------------------------------------------------------------------------- /tests/image1.png: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/tests/image1.png
--------------------------------------------------------------------------------
/tests/test_docx.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

"""
Test docx module
"""

import os
# Import the submodule explicitly so lxml.etree is always available below.
import lxml.etree
from docx import (
    appproperties, contenttypes, coreproperties, getdocumenttext, heading,
    makeelement, newdocument, nsprefixes, opendocx, pagebreak, paragraph,
    picture, relationshiplist, replace, savedocx, search, table, websettings,
    wordrelationships
)

TEST_FILE = 'ShortTest.docx'
IMAGE1_FILE = 'image1.png'


# --- Setup & Support Functions ---
def setup_module():
    """Set up test fixtures"""
    import shutil
    if IMAGE1_FILE not in os.listdir('.'):
        shutil.copyfile(os.path.join(os.path.pardir, IMAGE1_FILE), IMAGE1_FILE)
    testnewdocument()


def teardown_module():
    """Tear down test fixtures"""
    if TEST_FILE in os.listdir('.'):
        os.remove(TEST_FILE)


def simpledoc(noimagecopy=False):
    """Make a docx (document, relationships) for use in other docx tests"""
    relationships = relationshiplist()
    imagefiledict = {}
    document = newdocument()
    docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
    docbody.append(heading('Heading 1', 1))
    docbody.append(heading('Heading 2', 2))
    docbody.append(paragraph('Paragraph 1'))
    for point in ['List Item 1', 'List Item 2', 'List Item 3']:
        docbody.append(paragraph(point, style='ListNumber'))
    docbody.append(pagebreak(type='page'))
    docbody.append(paragraph('Paragraph 2'))
    docbody.append(
        table(
            [
                ['A1', 'A2', 'A3'],
                ['B1', 'B2', 'B3'],
                ['C1', 'C2', 'C3']
            ]
        )
    )
    docbody.append(pagebreak(type='section', orient='portrait'))
    if noimagecopy:
        relationships, picpara, imagefiledict = picture(
            relationships, IMAGE1_FILE, 'This is a test description',
            imagefiledict=imagefiledict
        )
    else:
        relationships, picpara = picture(
            relationships, IMAGE1_FILE, 'This is a test description'
        )
    docbody.append(picpara)
    docbody.append(pagebreak(type='section', orient='landscape'))
    docbody.append(paragraph('Paragraph 3'))
    if noimagecopy:
        return (document, docbody, relationships, imagefiledict)
    else:
        return (document, docbody, relationships)


# --- Test Functions ---
def testsearchandreplace():
    """Ensure search and replace functions work"""
    document, docbody, relationships = simpledoc()
    docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
    assert search(docbody, 'ing 1')
    assert search(docbody, 'ing 2')
    assert search(docbody, 'graph 3')
    assert search(docbody, 'ist Item')
    assert search(docbody, 'A1')
    if search(docbody, 'Paragraph 2'):
        docbody = replace(docbody, 'Paragraph 2', 'Whacko 55')
    assert search(docbody, 'Whacko 55')


def testtextextraction():
    """Ensure text can be pulled out of a document"""
    document = opendocx(TEST_FILE)
    paratextlist = getdocumenttext(document)
    assert len(paratextlist) > 0


def testunsupportedpagebreak():
    """Ensure unsupported page break types are trapped"""
    document = newdocument()
    docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
    try:
        docbody.append(pagebreak(type='unsup'))
    except ValueError:
        return  # passed
    assert False  # failed


def testnewdocument():
    """Test that a new document can be created"""
    document, docbody, relationships = simpledoc()
    coreprops = coreproperties(
        'Python docx testnewdocument',
        'A short example of making docx from Python', 'Alan Brooks',
        ['python', 'Office Open XML', 'Word']
    )
    savedocx(
        document, coreprops, appproperties(), contenttypes(), websettings(),
        wordrelationships(relationships), TEST_FILE
    )


def testnewdocument_noimagecopy():
    """
    Test that a new document can be created
    """
    document, docbody, relationships, imagefiledict = simpledoc(
        noimagecopy=True
    )
    coreprops = coreproperties(
        'Python docx testnewdocument',
        'A short example of making docx from Python', 'Alan Brooks',
        ['python', 'Office Open XML', 'Word']
    )
    savedocx(
        document, coreprops, appproperties(), contenttypes(), websettings(),
        wordrelationships(relationships), TEST_FILE,
        imagefiledict=imagefiledict
    )


def testopendocx():
    """Ensure an etree element is returned"""
    assert isinstance(opendocx(TEST_FILE), lxml.etree._Element)


def testmakeelement():
    """Ensure custom elements get created"""
    testelement = makeelement(
        'testname',
        attributes={'testattribute': 'testvalue'},
        tagtext='testtagtext'
    )
    assert testelement.tag == (
        '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}testn'
        'ame'
    )
    assert testelement.attrib == {
        '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}testa'
        'ttribute': 'testvalue'
    }
    assert testelement.text == 'testtagtext'


def testparagraph():
    """Ensure paragraph creates p elements"""
    testpara = paragraph('paratext', style='BodyText')
    assert testpara.tag == (
        '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p'
    )


def testtable():
    """Ensure tables make sense"""
    testtable = table([['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2']])
    assert (
        testtable.xpath(
            '/ns0:tbl/ns0:tr[2]/ns0:tc[2]/ns0:p/ns0:r/ns0:t',
            namespaces={'ns0': ('http://schemas.openxmlformats.org/wordproce'
                                'ssingml/2006/main')}
        )[0].text == 'B2'
    )


if __name__ == '__main__':
    import nose
    nose.main()

--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
#
# tox.ini
#
# Copyright (C) 2012, 2013 Steve Canny scanny@cisco.com
#
# This module is part of python-docx and is released under the MIT License:
# http://www.opensource.org/licenses/mit-license.php
#
# Configuration for tox

[tox]
envlist = py26, py27

[testenv]
deps =
    lxml
    nose
    Pillow

commands =
    nosetests

--------------------------------------------------------------------------------