├── .gitignore
├── HACKING.markdown
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.rst
├── SERVING_SUGGESTIONS.markdown
├── __init__.py
├── docx.py
├── example-extracttext.py
├── example-makedocument.py
├── image1.png
├── screenshot.png
├── setup.py
├── template
│   ├── _rels
│   │   └── .rels
│   ├── docProps
│   │   └── thumbnail.jpeg
│   └── word
│       ├── fontTable.xml
│       ├── numbering.xml
│       ├── settings.xml
│       ├── styles.xml
│       └── theme
│           └── theme1.xml
├── tests
│   ├── image1.png
│   └── test_docx.py
└── tox.ini
/.gitignore:
--------------------------------------------------------------------------------
1 | .coverage
2 | dist
3 | *.docx
4 | /*.egg-info/
5 | MANIFEST
6 | *.pyc
7 | README.html
8 | _scratch
9 | template/word/media
10 | .tox
11 |
--------------------------------------------------------------------------------
/HACKING.markdown:
--------------------------------------------------------------------------------
1 | Adding Features
2 | ===============
3 |
4 | # Recommended reading
5 |
6 | - The [LXML tutorial](http://codespeak.net/lxml/tutorial.html) covers the basics of XML etrees, which we create, append and insert to make XML documents. LXML also provides XPath, which we use to specify locations in the document.
7 | - If you're stuck, check out the [OpenXML specs and videos](http://openxmldeveloper.org). In particular, the [OpenXML ECMA spec] [] is well worth a read.
8 | - Learning about [XML namespaces](http://www.w3schools.com/XML/xml_namespaces.asp)
9 | - The [Namespaces section of Dive into Python](http://diveintopython3.org/xml.html)
10 | - Microsoft's [introduction to the Office (2007) Open XML File Formats](http://msdn.microsoft.com/en-us/library/aa338205.aspx)
11 |
12 | # How can I contribute?
13 |
14 | Fork the project on GitHub, then send the main project a [pull request](http://github.com/guides/pull-requests). The project will then accept your pull request (in most cases), which will show your changes as part of the changelog for the main project, along with your name and picture.
15 |
16 | # A note about namespaces and LXML
17 |
18 | LXML doesn't use namespace prefixes. It just uses the actual namespaces, and wants you to set a namespace on each tag. For example, rather than making an element with the 'w' namespace prefix, you'd make an element with the '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' prefix.
19 |
20 | To make this easier:
21 |
22 | - The most common namespace, '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' (prefix 'w') is automatically added by makeelement()
23 | - You can specify other namespaces with 'nsprefix', which maps the prefixes Word files use to the actual namespaces, eg:
24 |
25 | makeelement('coreProperties',nsprefix='cp')
26 |
27 | will generate:
28 |
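29 | <ns0:coreProperties xmlns:ns0="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"/>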
30 |
31 | which is the same as what Word generates:
32 |
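33 | <cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"/>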
34 |
35 | The namespace prefixes are different, but that's irrelevant as the namespaces themselves are the same.
36 |
37 | There's also a cool side effect - you can ignore setting 'xmlns' attributes that aren't used directly in the current element, since there's no need. Eg, you can make the equivalent of this from a Word file:
38 |
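39 | <cp:coreProperties
40 | xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
41 | xmlns:dc="http://purl.org/dc/elements/1.1/"
42 | xmlns:dcterms="http://purl.org/dc/terms/"
43 | xmlns:dcmitype="http://purl.org/dc/dcmitype/"
44 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
45 | </cp:coreProperties>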
46 |
47 | With the following code:
48 |
49 | docprops = makeelement('coreProperties',nsprefix='cp')
50 |
51 | We only need to specify the 'cp' prefix because that's what this element uses. The other 'xmlns' attributes are used to specify the prefixes for child elements. We don't need to specify them here because each child element will have its namespace specified when we make that child.
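The namespace behaviour described above can be tried without docx.py at all. The following is a minimal sketch, assuming Python 3 and the standard library's `xml.etree.ElementTree`, which uses the same `{namespace}tag` naming as LXML. The `qualified()` helper is hypothetical and only stands in for what `makeelement()` does with `nsprefix`; the two namespace URIs are the ones docx.py lists for 'cp' and 'dc'.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the 'cp' and 'dc' prefixes, as listed in docx.py.
CP_NS = ('http://schemas.openxmlformats.org/package/2006/metadata/'
         'core-properties')
DC_NS = 'http://purl.org/dc/elements/1.1/'


def qualified(namespace, tagname):
    """Build a '{namespace}tag' name (hypothetical helper, not in docx.py)."""
    return '{%s}%s' % (namespace, tagname)


# The parent element only carries its own namespace ...
docprops = ET.Element(qualified(CP_NS, 'coreProperties'))
# ... and each child brings its namespace along when it is created.
title = ET.SubElement(docprops, qualified(DC_NS, 'title'))
title.text = 'Example title'

serialized = ET.tostring(docprops, encoding='unicode')
print(serialized)

# Parsing the output back gives the same qualified names, whatever
# prefixes the serializer happened to pick.
reparsed = ET.fromstring(serialized)
print(reparsed.tag)
print([child.tag for child in reparsed])
```

The prefixes in the printed XML may not match the ones Word writes, but the reparsed tag names show that the namespaces themselves are unchanged.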
52 |
53 | # Coding Style
54 |
55 | Basically just look at what's there. But if you need something more specific:
56 |
57 | - Functional - every function should take some inputs, return something, and not use any globals.
58 | - [Google Python Style Guide style](http://code.google.com/p/soc/wiki/PythonStyleGuide)
59 |
60 | # Unit Testing
61 |
62 | After adding code, open **tests/test_docx.py** and add a test that calls your function and checks its output.
63 |
64 | - Use **easy_install** to fetch the **nose** and **coverage** modules
65 | - Run
66 |
67 | nosetests --with-coverage
68 |
69 | to run all the tests. They should all pass.
70 |
71 | # Tips
72 |
73 | ## If Word complains about files:
74 |
75 | First, determine whether Word can recover the files:
76 | - If Word cannot recover the file, you most likely have a problem with your zip file
77 | - If Word can recover the file, you most likely have a problem with your XML
78 |
79 | ### Common Zipfile issues
80 |
81 | - Ensure the same file isn't included twice in your zip archive. Zip supports this, Word doesn't.
82 | - Ensure that all media files have an entry for their file type in [Content_Types].xml
83 | - Ensure that files in the zip file have leading '/'s removed.
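Two of the checks above (duplicate entries and leading slashes) are easy to automate. This is a rough sketch, assuming Python 3; `find_archive_name_problems()` is a hypothetical helper, not something docx.py provides, and it only looks at member names rather than file contents.

```python
from collections import Counter


def find_archive_name_problems(names):
    """Return a list of readable problems found in archive member names."""
    problems = []
    counts = Counter(names)
    for name in sorted(counts):
        if counts[name] > 1:
            problems.append('duplicate entry: %s (x%d)' % (name, counts[name]))
        if name.startswith('/'):
            problems.append('leading slash: %s' % name)
    return problems


good = ['[Content_Types].xml', '_rels/.rels', 'word/document.xml']
bad = ['word/document.xml', 'word/document.xml', '/word/styles.xml']

print(find_archive_name_problems(good))
print(find_archive_name_problems(bad))
```

For a real file, the list of names could come from `zipfile.ZipFile(path).namelist()`, which returns one entry per archive member.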
84 |
85 | ### Common XML issues
86 |
87 | - Ensure the _rels, docProps, word, etc directories are in the top level of your zip file.
88 | - Check your namespaces - on both the tags, and the attributes
89 | - Check capitalization of tag names
90 | - Ensure you're not missing any attributes
91 | - If images or other embedded content is shown with a large red X, your relationships file is missing data.
92 |
93 | #### One common debugging technique we've used before
94 |
95 | - Re-save the document in Word to produce a fixed version of the file
96 | - Unzip the fixed file and grab the serialized XML out of it
97 | - Use etree.fromstring() to turn it into an element, and include that in your code.
98 | - Check that a correct file is generated
99 | - Remove an element from your string-created etree (including both opening and closing tags)
100 | - Use element.append(makeelement()) to add that element to your tree
101 | - Open the doc in Word and see if it still works
102 | - Repeat the last three steps until you discover which element is causing the problem
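The first few steps of that technique can be sketched as follows. This is only an illustration, assuming Python 3 and the standard library's ElementTree in place of LXML; `read_document_root()` and `body_child_tags()` are hypothetical helpers, and the in-memory archive is a stand-in for the file Word re-saved.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def read_document_root(docx_file):
    """Parse word/document.xml out of a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        return ET.fromstring(archive.read('word/document.xml'))


def body_child_tags(root):
    """List the local tag names of the body's direct children, in order."""
    body = root.find('{%s}body' % W_NS)
    return [child.tag.split('}', 1)[-1] for child in body]


# Build a tiny stand-in archive in memory so the sketch runs on its own;
# in practice you would pass the path of the fixed file instead.
xml = ('<w:document xmlns:w="%s"><w:body>'
       '<w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:tbl/>'
       '</w:body></w:document>' % W_NS)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('word/document.xml', xml)
buffer.seek(0)

root = read_document_root(buffer)
print(body_child_tags(root))
```

Listing the body's children this way gives you the candidates to remove and rebuild one at a time.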
103 |
104 | [OpenXML ECMA spec]: http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(DOCX).zip
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2009-2010 Mike MacCana
2 |
3 | Permission is hereby granted, free of charge, to any person
4 | obtaining a copy of this software and associated documentation
5 | files (the "Software"), to deal in the Software without
6 | restriction, including without limitation the rights to use,
7 | copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the
9 | Software is furnished to do so, subject to the following
10 | conditions:
11 |
12 | The above copyright notice and this permission notice shall be
13 | included in all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include template/*
2 | include template/_rels/*
3 | include template/docProps/*
4 | include template/word/*
5 | include template/word/theme/*
6 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | PYTHON = $(shell test -x bin/python && echo bin/python || echo `which python`)
2 | SETUP = $(PYTHON) ./setup.py
3 |
4 | .PHONY: clean help coverage register sdist upload
5 |
6 | help:
7 | @echo "Please use \`make <target>' where <target> is one or more of"
8 | @echo " clean delete intermediate work product and start fresh"
9 | @echo " coverage run nosetests with coverage"
10 | @echo " readme update README.html from README.rst"
11 | @echo " register update metadata (README.rst) on PyPI"
12 | @echo " sdist generate a source distribution into dist/"
13 | @echo " upload upload distribution tarball to PyPI"
14 |
15 | clean:
16 | find . -type f -name \*.pyc -exec rm {} \;
17 | rm -rf dist .coverage .DS_Store MANIFEST
18 |
19 | coverage:
20 | nosetests --with-coverage --cover-package=docx --cover-erase
21 |
22 | readme:
23 | rst2html README.rst >README.html
24 | open README.html
25 |
26 | register:
27 | $(SETUP) register
28 |
29 | sdist:
30 | $(SETUP) sdist
31 |
32 | upload:
33 | $(SETUP) sdist upload
34 |
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
1 | #######################
2 | This Project Has Moved!
3 | #######################
4 |
5 | **Python DocX is now part of Python OpenXML**. There's all kinds of new stuff, including Python 3 support, sister libraries for doing Excel files, and more. Check out the `current Python DocX GitHub `_ and the `current Python DocX docs `_.
6 |
7 | Info below is kept for archival purposes. **Go use the new stuff!**
8 |
9 | Introduction
10 | ============
11 |
12 | The docx module creates, reads and writes Microsoft Office Word 2007 docx
13 | files.
14 |
15 | These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by
16 | Microsoft.
17 |
18 | These documents can be opened in Microsoft Office 2007 / 2010, Microsoft Mac
19 | Office 2008, Google Docs, OpenOffice.org 3, and Apple iWork 08.
20 |
21 | They also `validate as well formed XML `_.
22 |
23 | The module was created when I was looking for Python support for MS Word
24 | .docx files, but could only find various hacks involving COM automation,
25 | calling .Net or Java, or automating OpenOffice or MS Office.
26 |
27 | The docx module has the following features:
28 |
29 | Making documents
30 | ----------------
31 |
32 | Features for making documents include:
33 |
34 | - Paragraphs
35 | - Bullets
36 | - Numbered lists
37 | - Document properties (author, company, etc)
38 | - Multiple levels of headings
39 | - Tables
40 | - Section and page breaks
41 | - Images
42 |
43 | .. image:: http://github.com/mikemaccana/python-docx/raw/master/screenshot.png
44 |
45 |
46 | Editing documents
47 | -----------------
48 |
49 | Thanks to the awesomeness of the lxml module, we can:
50 |
51 | - Search and replace
52 | - Extract plain text of document
53 | - Add and delete items anywhere within the document
54 | - Change document properties
55 | - Run xpath queries against particular locations in the document - useful for
56 | retrieving data from user-completed templates.
57 |
58 |
59 | Getting started
60 | ===============
61 |
62 | Making and Modifying Documents
63 | ------------------------------
64 |
65 | - Just `download python docx `_.
66 | - Use **pip** or **easy_install** to fetch the **lxml** and **PIL** modules.
67 | - Then run::
68 |
69 | example-makedocument.py
70 |
71 |
72 | Congratulations, you just made and then modified a Word document!
73 |
74 |
75 | Extracting Text from a Document
76 | -------------------------------
77 |
78 | If you just want to extract the text from a Word file, run::
79 |
80 | example-extracttext.py 'Some word file.docx' 'new file.txt'
81 |
82 |
83 | Ideas & To Do List
84 | ~~~~~~~~~~~~~~~~~~
85 |
86 | - Further improvements to image handling
87 | - Document health checks
88 | - Egg
89 | - Markdown conversion support
90 |
91 |
92 | We love forks, changes and pull requests!
93 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
94 |
95 | - Check out `HACKING <HACKING.markdown>`_ to add your own changes!
96 | - Fork this project on GitHub
97 | - Send a pull request via github and we'll add your changes!
98 |
99 | Want to talk? Need help?
100 | ~~~~~~~~~~~~~~~~~~~~~~~~
101 |
102 | Email python-docx@googlegroups.com
103 |
104 |
105 | License
106 | ~~~~~~~
107 |
108 | Licensed under the `MIT license `_
109 |
110 | Short version: this code is copyrighted to me (Mike MacCana), I give you
111 | permission to do what you want with it except remove my name from the credits.
112 | See the LICENSE file for specific terms.
113 |
--------------------------------------------------------------------------------
/SERVING_SUGGESTIONS.markdown:
--------------------------------------------------------------------------------
1 | Serving Suggestions
2 | ===================
3 |
4 | # Mashing docx with other modules
5 |
6 | This is a list of interesting things you could do with Python docx when mashed up with other modules.
7 |
8 | - [LinkedIn Python API](http://code.google.com/p/python-linkedin/) - Auto-build a Word doc whenever some old recruiting dude asks for one.
9 | - [Python Natural Language Toolkit](http://www.nltk.org/) - can analyse text and extract meaning.
10 | - [Lamson](http://lamsonproject.org/) - transparently parse or modify email attachments.
11 |
12 | Any other ideas? Doing something cool you want to tell the world about? python.docx@librelist.com
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/__init__.py
--------------------------------------------------------------------------------
/docx.py:
--------------------------------------------------------------------------------
1 | # encoding: utf-8
2 |
3 | """
4 | Open and modify Microsoft Word 2007 docx files (called 'OpenXML' and
5 | 'Office OpenXML' by Microsoft)
6 |
7 | Part of Python's docx module - http://github.com/mikemaccana/python-docx
8 | See LICENSE for licensing information.
9 | """
10 |
11 | import os
12 | import re
13 | import time
14 | import shutil
15 | import zipfile
16 |
17 | from lxml import etree
18 | from os.path import abspath, basename, join
19 |
20 | try:
21 | from PIL import Image
22 | except ImportError:
23 | import Image
24 |
25 | try:
26 | from PIL.ExifTags import TAGS
27 | except ImportError:
28 | TAGS = {}
29 |
30 | from exceptions import PendingDeprecationWarning
31 | from warnings import warn
32 |
33 | import logging
34 |
35 |
36 | log = logging.getLogger(__name__)
37 |
38 | # Record template directory's location which is just 'template' for a docx
39 | # developer or 'site-packages/docx-template' if you have installed docx
40 | template_dir = join(os.path.dirname(__file__), 'docx-template') # installed
41 | if not os.path.isdir(template_dir):
42 | template_dir = join(os.path.dirname(__file__), 'template') # dev
43 |
44 | # All Word prefixes / namespace matches used in document.xml & core.xml.
45 | # LXML doesn't actually use prefixes (just the real namespace), but these
46 | # make it easier to copy Word output.
47 | nsprefixes = {
48 | 'mo': 'http://schemas.microsoft.com/office/mac/office/2008/main',
49 | 'o': 'urn:schemas-microsoft-com:office:office',
50 | 've': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
51 | # Text Content
52 | 'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main',
53 | 'w10': 'urn:schemas-microsoft-com:office:word',
54 | 'wne': 'http://schemas.microsoft.com/office/word/2006/wordml',
55 | # Drawing
56 | 'a': 'http://schemas.openxmlformats.org/drawingml/2006/main',
57 | 'm': 'http://schemas.openxmlformats.org/officeDocument/2006/math',
58 | 'mv': 'urn:schemas-microsoft-com:mac:vml',
59 | 'pic': 'http://schemas.openxmlformats.org/drawingml/2006/picture',
60 | 'v': 'urn:schemas-microsoft-com:vml',
61 | 'wp': ('http://schemas.openxmlformats.org/drawingml/2006/wordprocessing'
62 | 'Drawing'),
63 | # Properties (core and extended)
64 | 'cp': ('http://schemas.openxmlformats.org/package/2006/metadata/core-pr'
65 | 'operties'),
66 | 'dc': 'http://purl.org/dc/elements/1.1/',
67 | 'ep': ('http://schemas.openxmlformats.org/officeDocument/2006/extended-'
68 | 'properties'),
69 | 'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
70 | # Content Types
71 | 'ct': 'http://schemas.openxmlformats.org/package/2006/content-types',
72 | # Package Relationships
73 | 'r': ('http://schemas.openxmlformats.org/officeDocument/2006/relationsh'
74 | 'ips'),
75 | 'pr': 'http://schemas.openxmlformats.org/package/2006/relationships',
76 | # Dublin Core document properties
77 | 'dcmitype': 'http://purl.org/dc/dcmitype/',
78 | 'dcterms': 'http://purl.org/dc/terms/'}
79 |
80 |
81 | def opendocx(file):
82 | '''Open a docx file, return a document XML tree'''
83 | mydoc = zipfile.ZipFile(file)
84 | xmlcontent = mydoc.read('word/document.xml')
85 | document = etree.fromstring(xmlcontent)
86 | return document
87 |
88 |
89 | def newdocument():
90 | document = makeelement('document')
91 | document.append(makeelement('body'))
92 | return document
93 |
94 |
95 | def makeelement(tagname, tagtext=None, nsprefix='w', attributes=None,
96 | attrnsprefix=None):
97 | '''Create an element & return it'''
98 | # Deal with list of nsprefix by making namespacemap
99 | namespacemap = None
100 | if isinstance(nsprefix, list):
101 | namespacemap = {}
102 | for prefix in nsprefix:
103 | namespacemap[prefix] = nsprefixes[prefix]
104 | # FIXME: rest of code below expects a single prefix
105 | nsprefix = nsprefix[0]
106 | if nsprefix:
107 | namespace = '{%s}' % nsprefixes[nsprefix]
108 | else:
109 | # For when namespace = None
110 | namespace = ''
111 | newelement = etree.Element(namespace+tagname, nsmap=namespacemap)
112 | # Add attributes with namespaces
113 | if attributes:
114 | # If they haven't bothered setting attribute namespace, use an empty
115 | # string (equivalent of no namespace)
116 | if not attrnsprefix:
117 | # Quick hack: it seems every element that has a 'w' nsprefix for
118 | # its tag uses the same prefix for its attributes
119 | if nsprefix == 'w':
120 | attributenamespace = namespace
121 | else:
122 | attributenamespace = ''
123 | else:
124 | attributenamespace = '{'+nsprefixes[attrnsprefix]+'}'
125 |
126 | for tagattribute in attributes:
127 | newelement.set(attributenamespace+tagattribute,
128 | attributes[tagattribute])
129 | if tagtext:
130 | newelement.text = tagtext
131 | return newelement
132 |
133 |
134 | def pagebreak(type='page', orient='portrait'):
135 | '''Insert a break, default 'page'.
136 | See http://openxmldeveloper.org/forums/thread/4075.aspx
137 | Return our page break element.'''
138 | # Need to enumerate different types of page breaks.
139 | validtypes = ['page', 'section']
140 | if type not in validtypes:
141 | tmpl = 'Page break style "%s" not implemented. Valid styles: %s.'
142 | raise ValueError(tmpl % (type, validtypes))
143 | pagebreak = makeelement('p')
144 | if type == 'page':
145 | run = makeelement('r')
146 | br = makeelement('br', attributes={'type': type})
147 | run.append(br)
148 | pagebreak.append(run)
149 | elif type == 'section':
150 | pPr = makeelement('pPr')
151 | sectPr = makeelement('sectPr')
152 | if orient == 'portrait':
153 | pgSz = makeelement('pgSz', attributes={'w': '12240', 'h': '15840'})
154 | elif orient == 'landscape':
155 | pgSz = makeelement('pgSz', attributes={'h': '12240', 'w': '15840',
156 | 'orient': 'landscape'})
157 | sectPr.append(pgSz)
158 | pPr.append(sectPr)
159 | pagebreak.append(pPr)
160 | return pagebreak
161 |
162 |
163 | def paragraph(paratext, style='BodyText', breakbefore=False, jc='left'):
164 | """
165 | Return a new paragraph element containing *paratext*. The paragraph's
166 | default style is 'Body Text', but a new style may be set using the
167 | *style* parameter.
168 |
169 | @param string jc: Paragraph alignment, possible values:
170 | left, center, right, both (justified), ...
171 | see http://www.schemacentral.com/sc/ooxml/t-w_ST_Jc.html
172 | for a full list
173 |
174 | If *paratext* is a list, add a run for each (text, char_format_str)
175 | 2-tuple in the list. char_format_str is a string containing one or more
176 | of the characters 'b', 'i', or 'u', meaning bold, italic, and underline
177 | respectively. For example:
178 |
179 | paratext = [
180 | ('some bold text', 'b'),
181 | ('some normal text', ''),
182 | ('some italic underlined text', 'iu')
183 | ]
184 | """
185 | # Make our elements
186 | paragraph = makeelement('p')
187 |
188 | if not isinstance(paratext, list):
189 | paratext = [(paratext, '')]
190 | text_tuples = []
191 | for pt in paratext:
192 | text, char_styles_str = (pt if isinstance(pt, (list, tuple))
193 | else (pt, ''))
194 | text_elm = makeelement('t', tagtext=text)
195 | if len(text.strip()) < len(text):
196 | text_elm.set('{http://www.w3.org/XML/1998/namespace}space',
197 | 'preserve')
198 | text_tuples.append([text_elm, char_styles_str])
199 | pPr = makeelement('pPr')
200 | pStyle = makeelement('pStyle', attributes={'val': style})
201 | pJc = makeelement('jc', attributes={'val': jc})
202 | pPr.append(pStyle)
203 | pPr.append(pJc)
204 |
205 | # Add the text to the run, and the run to the paragraph
206 | paragraph.append(pPr)
207 | for text_elm, char_styles_str in text_tuples:
208 | run = makeelement('r')
209 | rPr = makeelement('rPr')
210 | # Apply styles
211 | if 'b' in char_styles_str:
212 | b = makeelement('b')
213 | rPr.append(b)
214 | if 'i' in char_styles_str:
215 | i = makeelement('i')
216 | rPr.append(i)
217 | if 'u' in char_styles_str:
218 | u = makeelement('u', attributes={'val': 'single'})
219 | rPr.append(u)
220 | run.append(rPr)
221 | # Insert lastRenderedPageBreak for assistive technologies like
222 | # document narrators to know when a page break occurred.
223 | if breakbefore:
224 | lastRenderedPageBreak = makeelement('lastRenderedPageBreak')
225 | run.append(lastRenderedPageBreak)
226 | run.append(text_elm)
227 | paragraph.append(run)
228 | # Return the combined paragraph
229 | return paragraph
230 |
231 |
232 | def contenttypes():
233 | types = etree.fromstring(
234 | '<Types xmlns="http://schemas.openxmlformats.org/package/2006/conten'
235 | 't-types"></Types>')
236 | parts = {
237 | '/word/theme/theme1.xml': 'application/vnd.openxmlformats-officedocu'
238 | 'ment.theme+xml',
239 | '/word/fontTable.xml': 'application/vnd.openxmlformats-officedocu'
240 | 'ment.wordprocessingml.fontTable+xml',
241 | '/docProps/core.xml': 'application/vnd.openxmlformats-package.co'
242 | 're-properties+xml',
243 | '/docProps/app.xml': 'application/vnd.openxmlformats-officedocu'
244 | 'ment.extended-properties+xml',
245 | '/word/document.xml': 'application/vnd.openxmlformats-officedocu'
246 | 'ment.wordprocessingml.document.main+xml',
247 | '/word/settings.xml': 'application/vnd.openxmlformats-officedocu'
248 | 'ment.wordprocessingml.settings+xml',
249 | '/word/numbering.xml': 'application/vnd.openxmlformats-officedocu'
250 | 'ment.wordprocessingml.numbering+xml',
251 | '/word/styles.xml': 'application/vnd.openxmlformats-officedocu'
252 | 'ment.wordprocessingml.styles+xml',
253 | '/word/webSettings.xml': 'application/vnd.openxmlformats-officedocu'
254 | 'ment.wordprocessingml.webSettings+xml'}
255 | for part in parts:
256 | types.append(makeelement('Override', nsprefix=None,
257 | attributes={'PartName': part,
258 | 'ContentType': parts[part]}))
259 | # Add support for filetypes
260 | filetypes = {
261 | 'gif': 'image/gif',
262 | 'jpeg': 'image/jpeg',
263 | 'jpg': 'image/jpeg',
264 | 'png': 'image/png',
265 | 'rels': 'application/vnd.openxmlformats-package.relationships+xml',
266 | 'xml': 'application/xml'
267 | }
268 | for extension in filetypes:
269 | attrs = {
270 | 'Extension': extension,
271 | 'ContentType': filetypes[extension]
272 | }
273 | default_elm = makeelement('Default', nsprefix=None, attributes=attrs)
274 | types.append(default_elm)
275 | return types
276 |
277 |
278 | def heading(headingtext, headinglevel, lang='en'):
279 | '''Make a new heading, return the heading element'''
280 | lmap = {'en': 'Heading', 'it': 'Titolo'}
281 | # Make our elements
282 | paragraph = makeelement('p')
283 | pr = makeelement('pPr')
284 | pStyle = makeelement(
285 | 'pStyle', attributes={'val': lmap[lang]+str(headinglevel)})
286 | run = makeelement('r')
287 | text = makeelement('t', tagtext=headingtext)
288 | # Add the text to the run, and the run to the paragraph
289 | pr.append(pStyle)
290 | run.append(text)
291 | paragraph.append(pr)
292 | paragraph.append(run)
293 | # Return the combined paragraph
294 | return paragraph
295 |
296 |
297 | def table(contents, heading=True, colw=None, cwunit='dxa', tblw=0,
298 | twunit='auto', borders={}, celstyle=None):
299 | """
300 | Return a table element based on specified parameters
301 |
302 | @param list contents: A list of lists describing contents. Every item in
303 | the list can be a string or a valid XML element
304 | itself. It can also be a list. In that case all the
305 | listed elements will be merged into the cell.
306 | @param bool heading: Tells whether first line should be treated as
307 | heading or not
308 | @param list colw: list of integer column widths, in cwunit units.
309 | @param str cwunit: Unit used for column width:
310 | 'pct' : fiftieths of a percent
311 | 'dxa' : twentieths of a point
312 | 'nil' : no width
313 | 'auto' : automagically determined
314 | @param int tblw: Table width
315 | @param str twunit: Unit used for table width. Same possible values as
316 | cwunit.
317 | @param dict borders: Dictionary defining table border. Supported keys
318 | are: 'top', 'left', 'bottom', 'right',
319 | 'insideH', 'insideV', 'all'.
320 | When specified, the 'all' key has precedence over
321 | others. Each key must define a dict of border
322 | attributes:
323 | color : The color of the border, in hex or
324 | 'auto'
325 | space : The space, measured in points
326 | sz : The size of the border, in eighths of
327 | a point
328 | val : The style of the border, see
329 | http://www.schemacentral.com/sc/ooxml/t-w_ST_Border.htm
330 | @param list celstyle: Specify the style for each column, list of dicts.
331 | supported keys:
332 | 'align' : specify the alignment, see paragraph
333 | documentation.
334 | @return lxml.etree: Generated XML etree element
335 | """
336 | table = makeelement('tbl')
337 | columns = len(contents[0])
338 | # Table properties
339 | tableprops = makeelement('tblPr')
340 | tablestyle = makeelement('tblStyle', attributes={'val': ''})
341 | tableprops.append(tablestyle)
342 | tablewidth = makeelement(
343 | 'tblW', attributes={'w': str(tblw), 'type': str(twunit)})
344 | tableprops.append(tablewidth)
345 | if len(borders.keys()):
346 | tableborders = makeelement('tblBorders')
347 | for b in ['top', 'left', 'bottom', 'right', 'insideH', 'insideV']:
348 | if b in borders.keys() or 'all' in borders.keys():
349 | k = 'all' if 'all' in borders.keys() else b
350 | attrs = {}
351 | for a in borders[k].keys():
352 | attrs[a] = unicode(borders[k][a])
353 | borderelem = makeelement(b, attributes=attrs)
354 | tableborders.append(borderelem)
355 | tableprops.append(tableborders)
356 | tablelook = makeelement('tblLook', attributes={'val': '0400'})
357 | tableprops.append(tablelook)
358 | table.append(tableprops)
359 | # Table Grid
360 | tablegrid = makeelement('tblGrid')
361 | for i in range(columns):
362 | attrs = {'w': str(colw[i]) if colw else '2390'}
363 | tablegrid.append(makeelement('gridCol', attributes=attrs))
364 | table.append(tablegrid)
365 | # Heading Row
366 | row = makeelement('tr')
367 | rowprops = makeelement('trPr')
368 | cnfStyle = makeelement('cnfStyle', attributes={'val': '000000100000'})
369 | rowprops.append(cnfStyle)
370 | row.append(rowprops)
371 | if heading:
372 | i = 0
373 | for heading in contents[0]:
374 | cell = makeelement('tc')
375 | # Cell properties
376 | cellprops = makeelement('tcPr')
377 | if colw:
378 | wattr = {'w': str(colw[i]), 'type': cwunit}
379 | else:
380 | wattr = {'w': '0', 'type': 'auto'}
381 | cellwidth = makeelement('tcW', attributes=wattr)
382 | cellstyle = makeelement('shd', attributes={'val': 'clear',
383 | 'color': 'auto',
384 | 'fill': 'FFFFFF',
385 | 'themeFill': 'text2',
386 | 'themeFillTint': '99'})
387 | cellprops.append(cellwidth)
388 | cellprops.append(cellstyle)
389 | cell.append(cellprops)
390 | # Paragraph (Content)
391 | if not isinstance(heading, (list, tuple)):
392 | heading = [heading]
393 | for h in heading:
394 | if isinstance(h, etree._Element):
395 | cell.append(h)
396 | else:
397 | cell.append(paragraph(h, jc='center'))
398 | row.append(cell)
399 | i += 1
400 | table.append(row)
401 | # Contents Rows
402 | for contentrow in contents[1 if heading else 0:]:
403 | row = makeelement('tr')
404 | i = 0
405 | for content in contentrow:
406 | cell = makeelement('tc')
407 | # Properties
408 | cellprops = makeelement('tcPr')
409 | if colw:
410 | wattr = {'w': str(colw[i]), 'type': cwunit}
411 | else:
412 | wattr = {'w': '0', 'type': 'auto'}
413 | cellwidth = makeelement('tcW', attributes=wattr)
414 | cellprops.append(cellwidth)
415 | cell.append(cellprops)
416 | # Paragraph (Content)
417 | if not isinstance(content, (list, tuple)):
418 | content = [content]
419 | for c in content:
420 | if isinstance(c, etree._Element):
421 | cell.append(c)
422 | else:
423 | if celstyle and 'align' in celstyle[i].keys():
424 | align = celstyle[i]['align']
425 | else:
426 | align = 'left'
427 | cell.append(paragraph(c, jc=align))
428 | row.append(cell)
429 | i += 1
430 | table.append(row)
431 | return table
432 |
433 |
434 | def picture(
435 | relationshiplist, picname, picdescription, pixelwidth=None,
436 | pixelheight=None, nochangeaspect=True, nochangearrowheads=True,
437 | imagefiledict=None):
438 | """
439 | Take a relationshiplist, picture file name, and return a paragraph
440 | containing the image and an updated relationshiplist
441 | """
442 | if imagefiledict is None:
443 | warn(
444 | 'Using picture() without imagefiledict parameter will be depreca'
445 | 'ted in the future.', PendingDeprecationWarning
446 | )
447 |
448 | # http://openxmldeveloper.org/articles/462.aspx
449 | # Create an image. Size may be specified, otherwise it will be based on the
450 | # pixel size of image. Return a paragraph containing the picture
451 |
452 | # Set relationship ID to that of the image or the first available one
453 | picid = '2'
454 | picpath = abspath(picname)
455 |
456 | if imagefiledict is not None:
457 | # Keep track of the image files in a separate dictionary so they don't
458 | # need to be copied into the template directory
459 | if picpath not in imagefiledict:
460 | picrelid = 'rId' + str(len(relationshiplist) + 1)
461 | imagefiledict[picpath] = picrelid
462 |
463 | relationshiplist.append([
464 | 'http://schemas.openxmlformats.org/officeDocument/2006/relat'
465 | 'ionships/image',
466 | 'media/%s_%s' % (picrelid, basename(picpath))
467 | ])
468 | else:
469 | picrelid = imagefiledict[picpath]
470 | else:
471 | # Copy files into template directory for backwards compatibility
472 | # Images still accumulate in the template directory this way
473 | picrelid = 'rId' + str(len(relationshiplist) + 1)
474 |
475 | relationshiplist.append([
476 | 'http://schemas.openxmlformats.org/officeDocument/2006/relations'
477 | 'hips/image', 'media/' + picname
478 | ])
479 |
480 | media_dir = join(template_dir, 'word', 'media')
481 | if not os.path.isdir(media_dir):
482 | os.mkdir(media_dir)
483 | shutil.copyfile(picname, join(media_dir, picname))
484 |
485 | image = Image.open(picpath)
486 |
487 | # Extract EXIF data, if available
488 | try:
489 | exif = image._getexif()
490 | exif = {} if exif is None else exif
491 | except Exception:
492 | exif = {}
493 |
494 | imageExif = {}
495 | for tag, value in exif.items():
496 | imageExif[TAGS.get(tag, tag)] = value
497 |
498 | imageOrientation = imageExif.get('Orientation', 1)
499 | imageAngle = {
500 | 1: 0, 2: 0, 3: 180, 4: 0, 5: 90, 6: 90, 7: 270, 8: 270
501 | }[imageOrientation]
502 | imageFlipH = 'true' if imageOrientation in (2, 5, 7) else 'false'
503 | imageFlipV = 'true' if imageOrientation == 4 else 'false'
504 |
505 | # Check if the user has specified a size
506 | if not pixelwidth or not pixelheight:
507 | # If not, get info from the picture itself
508 | pixelwidth, pixelheight = image.size[0:2]
509 |
510 | # Swap width and height if necessary
511 | if imageOrientation in (5, 6, 7, 8):
512 | pixelwidth, pixelheight = pixelheight, pixelwidth
513 |
514 | # OpenXML measures on-screen objects in English Metric Units
515 |     # 1 cm = 360000 EMUs (914400 EMUs per inch)
516 | emuperpixel = 12700
517 | width = str(pixelwidth * emuperpixel)
518 | height = str(pixelheight * emuperpixel)
519 |
520 | # There are 3 main elements inside a picture
521 | # 1. The Blipfill - specifies how the image fills the picture area
522 | # (stretch, tile, etc.)
523 | blipfill = makeelement('blipFill', nsprefix='pic')
524 | blipfill.append(makeelement('blip', nsprefix='a', attrnsprefix='r',
525 | attributes={'embed': picrelid}))
526 | stretch = makeelement('stretch', nsprefix='a')
527 | stretch.append(makeelement('fillRect', nsprefix='a'))
528 | blipfill.append(makeelement('srcRect', nsprefix='a'))
529 | blipfill.append(stretch)
530 |
531 | # 2. The non visual picture properties
532 | nvpicpr = makeelement('nvPicPr', nsprefix='pic')
533 | cnvpr = makeelement(
534 | 'cNvPr', nsprefix='pic',
535 | attributes={'id': '0', 'name': 'Picture 1', 'descr': picdescription}
536 | )
537 | nvpicpr.append(cnvpr)
538 | cnvpicpr = makeelement('cNvPicPr', nsprefix='pic')
539 | cnvpicpr.append(makeelement(
540 | 'picLocks', nsprefix='a',
541 | attributes={'noChangeAspect': str(int(nochangeaspect)),
542 | 'noChangeArrowheads': str(int(nochangearrowheads))}))
543 | nvpicpr.append(cnvpicpr)
544 |
545 | # 3. The Shape properties
546 | sppr = makeelement('spPr', nsprefix='pic', attributes={'bwMode': 'auto'})
547 | xfrm = makeelement(
548 | 'xfrm', nsprefix='a', attributes={
549 | 'rot': str(imageAngle * 60000), 'flipH': imageFlipH,
550 | 'flipV': imageFlipV
551 | }
552 | )
553 | xfrm.append(
554 | makeelement('off', nsprefix='a', attributes={'x': '0', 'y': '0'})
555 | )
556 | xfrm.append(
557 | makeelement(
558 | 'ext', nsprefix='a', attributes={'cx': width, 'cy': height}
559 | )
560 | )
561 | prstgeom = makeelement(
562 | 'prstGeom', nsprefix='a', attributes={'prst': 'rect'}
563 | )
564 | prstgeom.append(makeelement('avLst', nsprefix='a'))
565 | sppr.append(xfrm)
566 | sppr.append(prstgeom)
567 |
568 | # Add our 3 parts to the picture element
569 | pic = makeelement('pic', nsprefix='pic')
570 | pic.append(nvpicpr)
571 | pic.append(blipfill)
572 | pic.append(sppr)
573 |
574 | # Now make the supporting elements
575 | # The following sequence is just: make element, then add its children
576 | graphicdata = makeelement(
577 | 'graphicData', nsprefix='a',
578 | attributes={'uri': ('http://schemas.openxmlformats.org/drawingml/200'
579 | '6/picture')})
580 | graphicdata.append(pic)
581 | graphic = makeelement('graphic', nsprefix='a')
582 | graphic.append(graphicdata)
583 |
584 | framelocks = makeelement('graphicFrameLocks', nsprefix='a',
585 | attributes={'noChangeAspect': '1'})
586 | framepr = makeelement('cNvGraphicFramePr', nsprefix='wp')
587 | framepr.append(framelocks)
588 | docpr = makeelement('docPr', nsprefix='wp',
589 | attributes={'id': picid, 'name': 'Picture 1',
590 | 'descr': picdescription})
591 | effectextent = makeelement('effectExtent', nsprefix='wp',
592 | attributes={'l': '25400', 't': '0', 'r': '0',
593 | 'b': '0'})
594 | extent = makeelement('extent', nsprefix='wp',
595 | attributes={'cx': width, 'cy': height})
596 | inline = makeelement('inline', attributes={'distT': "0", 'distB': "0",
597 | 'distL': "0", 'distR': "0"},
598 | nsprefix='wp')
599 | inline.append(extent)
600 | inline.append(effectextent)
601 | inline.append(docpr)
602 | inline.append(framepr)
603 | inline.append(graphic)
604 | drawing = makeelement('drawing')
605 | drawing.append(inline)
606 | run = makeelement('r')
607 | run.append(drawing)
608 | paragraph = makeelement('p')
609 | paragraph.append(run)
610 |
611 | if imagefiledict is not None:
612 | return relationshiplist, paragraph, imagefiledict
613 | else:
614 | return relationshiplist, paragraph
615 |
616 |
617 | def search(document, search):
618 | '''Search a document for a regex, return success / fail result'''
619 | result = False
620 | searchre = re.compile(search)
621 | for element in document.iter():
622 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements
623 | if element.text:
624 | if searchre.search(element.text):
625 | result = True
626 | return result
627 |
628 |
629 | def replace(document, search, replace):
630 | """
631 |     Replace all occurrences of string with a different string, return updated
632 | document
633 | """
634 | newdocument = document
635 | searchre = re.compile(search)
636 | for element in newdocument.iter():
637 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements
638 | if element.text:
639 | if searchre.search(element.text):
640 | element.text = re.sub(search, replace, element.text)
641 | return newdocument
642 |
643 |
644 | def clean(document):
645 | """ Perform misc cleaning operations on documents.
646 | Returns cleaned document.
647 | """
648 |
649 | newdocument = document
650 |
651 | # Clean empty text and r tags
652 | for t in ('t', 'r'):
653 | rmlist = []
654 | for element in newdocument.iter():
655 | if element.tag == '{%s}%s' % (nsprefixes['w'], t):
656 | if not element.text and not len(element):
657 | rmlist.append(element)
658 | for element in rmlist:
659 | element.getparent().remove(element)
660 |
661 | return newdocument
662 |
663 |
664 | def findTypeParent(element, tag):
665 |     """ Finds first parent of element of the given type
666 |
667 |     @param object element: etree element
668 |     @param string tag: the tag of the parent to search for
669 |
670 |     @return object element: the found parent or None when not found
671 |     """
672 |
673 |     p = element.getparent()
674 |     while p is not None:
675 |         if p.tag == tag:
676 |             return p
677 |         p = p.getparent()
678 |
679 |     # Not found
680 |     return None
681 |
682 |
683 | def AdvSearch(document, search, bs=3):
684 | '''Return set of all regex matches
685 |
686 | This is an advanced version of python-docx.search() that takes into
687 | account blocks of elements at a time.
688 |
689 | What it does:
690 |     It searches the entire document body for text blocks.
691 |     Since the text to search for could be spread across multiple text blocks,
692 |     a sliding window of the last bs blocks is examined at each step.
693 |     The smallest matching group of adjacent blocks (up to bs) is adopted.
694 |     Unlike advReplace(), the document is not modified; only the matched
695 |     text is collected.
696 |
697 | Examples:
698 | original text blocks : [ 'Hel', 'lo,', ' world!' ]
699 | search : 'Hello,'
700 | output blocks : [ 'Hello,' ]
701 |
702 | original text blocks : [ 'Hel', 'lo', ' __', 'name', '__!' ]
703 | search : '(__[a-z]+__)'
704 | output blocks : [ '__name__' ]
705 |
706 | @param instance document: The original document
707 | @param str search: The text to search for (regexp)
708 |     @param int bs: Maximum number of adjacent text blocks to join when
709 |         matching (see above)
710 |
711 |     @return set All occurrences of search string
712 |
713 | '''
714 |
715 | # Compile the search regexp
716 | searchre = re.compile(search)
717 |
718 | matches = []
719 |
720 |     # Will match against searchels. Searchels is a list that contains the
721 |     # last n text elements found in the document, where 1 <= n <= bs
722 | searchels = []
723 |
724 | for element in document.iter():
725 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements
726 | if element.text:
727 | # Add this element to searchels
728 | searchels.append(element)
729 | if len(searchels) > bs:
730 |                     # If searchels is too long, remove the first element
731 | searchels.pop(0)
732 |
733 |                 # Search all combinations of searchels, starting from
734 |                 # smaller up to bigger ones
735 |                 # l = search length
736 | # s = search start
737 | # e = element IDs to merge
738 | found = False
739 | for l in range(1, len(searchels)+1):
740 | if found:
741 | break
742 | for s in range(len(searchels)):
743 | if found:
744 | break
745 | if s+l <= len(searchels):
746 | e = range(s, s+l)
747 | txtsearch = ''
748 | for k in e:
749 | txtsearch += searchels[k].text
750 |
751 |                             # Search for the text in the whole txtsearch
752 | match = searchre.search(txtsearch)
753 | if match:
754 | matches.append(match.group())
755 | found = True
756 | return set(matches)
757 |
758 |
759 | def advReplace(document, search, replace, bs=3):
760 | """
761 |     Replace all occurrences of string with a different string, return updated
762 | document
763 |
764 | This is a modified version of python-docx.replace() that takes into
765 |     account blocks of elements at a time. The replacement can be a string,
766 |     an lxml etree element, or a list of such elements.
767 |
768 | What it does:
769 |     It searches the entire document body for text blocks.
770 |     Then it scans those text blocks for the search pattern.
771 |     Since the text to search for could be spread across multiple text blocks,
772 |     a sliding window of the last bs blocks is examined at each step.
773 |     The smallest matching group of adjacent blocks (up to bs) is adopted.
774 |     If the matching group has more than one block, blocks other than the
775 |     first are cleared and all the replacement text is put in the first block.
776 |
777 | Examples:
778 | original text blocks : [ 'Hel', 'lo,', ' world!' ]
779 | search / replace: 'Hello,' / 'Hi!'
780 | output blocks : [ 'Hi!', '', ' world!' ]
781 |
782 | original text blocks : [ 'Hel', 'lo,', ' world!' ]
783 | search / replace: 'Hello, world' / 'Hi!'
784 | output blocks : [ 'Hi!!', '', '' ]
785 |
786 | original text blocks : [ 'Hel', 'lo,', ' world!' ]
787 | search / replace: 'Hel' / 'Hal'
788 | output blocks : [ 'Hal', 'lo,', ' world!' ]
789 |
790 | @param instance document: The original document
791 | @param str search: The text to search for (regexp)
792 | @param mixed replace: The replacement text or lxml.etree element to
793 | append, or a list of etree elements
794 | @param int bs: See above
795 |
796 | @return instance The document with replacement applied
797 |
798 | """
799 | # Enables debug output
800 | DEBUG = False
801 |
802 | newdocument = document
803 |
804 | # Compile the search regexp
805 | searchre = re.compile(search)
806 |
807 |     # Will match against searchels. Searchels is a list that contains the
808 |     # last n text elements found in the document, where 1 <= n <= bs
809 | searchels = []
810 |
811 | for element in newdocument.iter():
812 | if element.tag == '{%s}t' % nsprefixes['w']: # t (text) elements
813 | if element.text:
814 | # Add this element to searchels
815 | searchels.append(element)
816 | if len(searchels) > bs:
817 |                     # If searchels is too long, remove the first element
818 | searchels.pop(0)
819 |
820 |                 # Search all combinations of searchels, starting from
821 |                 # smaller up to bigger ones
822 |                 # l = search length
823 | # s = search start
824 | # e = element IDs to merge
825 | found = False
826 | for l in range(1, len(searchels)+1):
827 | if found:
828 | break
829 | #print "slen:", l
830 | for s in range(len(searchels)):
831 | if found:
832 | break
833 | if s+l <= len(searchels):
834 | e = range(s, s+l)
835 | #print "elems:", e
836 | txtsearch = ''
837 | for k in e:
838 | txtsearch += searchels[k].text
839 |
840 |                             # Search for the text in the whole txtsearch
841 | match = searchre.search(txtsearch)
842 | if match:
843 | found = True
844 |
845 | # I've found something :)
846 | if DEBUG:
847 | log.debug("Found element!")
848 | log.debug("Search regexp: %s",
849 | searchre.pattern)
850 | log.debug("Requested replacement: %s",
851 | replace)
852 | log.debug("Matched text: %s", txtsearch)
853 |                                     log.debug("Matched text (split): %s",
854 |                                               [i.text for i in searchels])
855 | log.debug("Matched at position: %s",
856 | match.start())
857 | log.debug("matched in elements: %s", e)
858 | if isinstance(replace, etree._Element):
859 | log.debug("Will replace with XML CODE")
860 |                                     elif isinstance(replace, (list, tuple)):
861 |                                         log.debug("Will replace with LIST OF"
862 |                                                   " ELEMENTS")
863 |                                     else:
864 |                                         log.debug("Will replace with: %s",
865 |                                                   re.sub(search, replace,
866 |                                                          txtsearch))
867 |
868 | curlen = 0
869 | replaced = False
870 | for i in e:
871 | curlen += len(searchels[i].text)
872 | if curlen > match.start() and not replaced:
873 |                                         # The match occurred in THIS element.
874 |                                         # Put in the whole replaced text
875 | if isinstance(replace, etree._Element):
876 | # Convert to a list and process
877 | # it later
878 | replace = [replace]
879 | if isinstance(replace, (list, tuple)):
880 | # I'm replacing with a list of
881 | # etree elements
882 | # clear the text in the tag and
883 | # append the element after the
884 | # parent paragraph
885 | # (because t elements cannot have
886 |                                             # children)
887 | p = findTypeParent(
888 | searchels[i],
889 | '{%s}p' % nsprefixes['w'])
890 | searchels[i].text = re.sub(
891 | search, '', txtsearch)
892 | insindex = p.getparent().index(p)+1
893 | for r in replace:
894 | p.getparent().insert(
895 | insindex, r)
896 | insindex += 1
897 | else:
898 | # Replacing with pure text
899 | searchels[i].text = re.sub(
900 | search, replace, txtsearch)
901 | replaced = True
902 | log.debug(
903 | "Replacing in element #: %s", i)
904 | else:
905 | # Clears the other text elements
906 | searchels[i].text = ''
907 | return newdocument
908 |
909 |
910 | def getdocumenttext(document):
911 | '''Return the raw text of a document, as a list of paragraphs.'''
912 | paratextlist = []
913 | # Compile a list of all paragraph (p) elements
914 | paralist = []
915 | for element in document.iter():
916 | # Find p (paragraph) elements
917 | if element.tag == '{'+nsprefixes['w']+'}p':
918 | paralist.append(element)
919 | # Since a single sentence might be spread over multiple text elements,
920 | # iterate through each paragraph, appending all text (t) children to that
921 |     # paragraph's text.
922 | for para in paralist:
923 | paratext = u''
924 | # Loop through each paragraph
925 | for element in para.iter():
926 | # Find t (text) elements
927 | if element.tag == '{'+nsprefixes['w']+'}t':
928 | if element.text:
929 | paratext = paratext+element.text
930 | elif element.tag == '{'+nsprefixes['w']+'}tab':
931 | paratext = paratext + '\t'
932 | # Add our completed paragraph text to the list of paragraph text
933 | if not len(paratext) == 0:
934 | paratextlist.append(paratext)
935 | return paratextlist
936 |
937 |
938 | def coreproperties(title, subject, creator, keywords, lastmodifiedby=None):
939 | """
940 | Create core properties (common document properties referred to in the
941 | 'Dublin Core' specification). See appproperties() for other stuff.
942 | """
943 | coreprops = makeelement('coreProperties', nsprefix='cp')
944 | coreprops.append(makeelement('title', tagtext=title, nsprefix='dc'))
945 | coreprops.append(makeelement('subject', tagtext=subject, nsprefix='dc'))
946 | coreprops.append(makeelement('creator', tagtext=creator, nsprefix='dc'))
947 | coreprops.append(makeelement('keywords', tagtext=','.join(keywords),
948 | nsprefix='cp'))
949 | if not lastmodifiedby:
950 | lastmodifiedby = creator
951 | coreprops.append(makeelement('lastModifiedBy', tagtext=lastmodifiedby,
952 | nsprefix='cp'))
953 | coreprops.append(makeelement('revision', tagtext='1', nsprefix='cp'))
954 | coreprops.append(
955 | makeelement('category', tagtext='Examples', nsprefix='cp'))
956 | coreprops.append(
957 | makeelement('description', tagtext='Examples', nsprefix='dc'))
958 |     currenttime = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())
959 | # Document creation and modify times
960 |     # Problem here: we have an attribute whose name uses one namespace, and
961 |     # that attribute's value uses another namespace.
962 | # We're creating the element from a string as a workaround...
963 | for doctime in ['created', 'modified']:
964 | elm_str = (
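965 |             '<dcterms:%s xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
966 |             ' xmlns:dcterms="http://purl.org/dc/terms/" xsi:type="dcterms:W3CD'
967 |             'TF">%s</dcterms:%s>'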
968 | ) % (doctime, currenttime, doctime)
969 | coreprops.append(etree.fromstring(elm_str))
970 | return coreprops
971 |
972 |
973 | def appproperties():
974 | """
975 |     Create app-specific properties. See coreproperties() for more common
976 | document properties.
977 |
978 | """
979 | appprops = makeelement('Properties', nsprefix='ep')
980 | appprops = etree.fromstring(
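981 |         '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/'
982 |         '2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.or'
983 |         'g/officeDocument/2006/docPropsVTypes">'
984 |         '</Properties>')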
985 | props =\
986 | {'Template': 'Normal.dotm',
987 | 'TotalTime': '6',
988 | 'Pages': '1',
989 | 'Words': '83',
990 | 'Characters': '475',
991 | 'Application': 'Microsoft Word 12.0.0',
992 | 'DocSecurity': '0',
993 | 'Lines': '12',
994 | 'Paragraphs': '8',
995 | 'ScaleCrop': 'false',
996 | 'LinksUpToDate': 'false',
997 | 'CharactersWithSpaces': '583',
998 | 'SharedDoc': 'false',
999 | 'HyperlinksChanged': 'false',
1000 | 'AppVersion': '12.0000'}
1001 | for prop in props:
1002 | appprops.append(makeelement(prop, tagtext=props[prop], nsprefix=None))
1003 | return appprops
1004 |
1005 |
1006 | def websettings():
1007 | '''Generate websettings'''
1008 | web = makeelement('webSettings')
1009 | web.append(makeelement('allowPNG'))
1010 | web.append(makeelement('doNotSaveAsSingleFile'))
1011 | return web
1012 |
1013 |
1014 | def relationshiplist():
1015 | relationshiplist =\
1016 | [['http://schemas.openxmlformats.org/officeDocument/2006/'
1017 | 'relationships/numbering', 'numbering.xml'],
1018 | ['http://schemas.openxmlformats.org/officeDocument/2006/'
1019 | 'relationships/styles', 'styles.xml'],
1020 | ['http://schemas.openxmlformats.org/officeDocument/2006/'
1021 | 'relationships/settings', 'settings.xml'],
1022 | ['http://schemas.openxmlformats.org/officeDocument/2006/'
1023 | 'relationships/webSettings', 'webSettings.xml'],
1024 | ['http://schemas.openxmlformats.org/officeDocument/2006/'
1025 | 'relationships/fontTable', 'fontTable.xml'],
1026 | ['http://schemas.openxmlformats.org/officeDocument/2006/'
1027 | 'relationships/theme', 'theme/theme1.xml']]
1028 | return relationshiplist
1029 |
1030 |
1031 | def wordrelationships(relationshiplist):
1032 | '''Generate a Word relationships file'''
1033 | # Default list of relationships
1034 | # FIXME: using string hack instead of making element
1035 | #relationships = makeelement('Relationships', nsprefix='pr')
1036 | relationships = etree.fromstring(
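1037 |         '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006'
1038 |         '/relationships"></Relationships>')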
1039 | count = 0
1040 | for relationship in relationshiplist:
1041 | # Relationship IDs (rId) start at 1.
1042 | rel_elm = makeelement('Relationship', nsprefix=None,
1043 | attributes={'Id': 'rId'+str(count+1),
1044 | 'Type': relationship[0],
1045 | 'Target': relationship[1]}
1046 | )
1047 | relationships.append(rel_elm)
1048 | count += 1
1049 | return relationships
1050 |
1051 |
1052 | def savedocx(
1053 | document, coreprops, appprops, contenttypes, websettings,
1054 | wordrelationships, output, imagefiledict=None):
1055 | """
1056 | Save a modified document
1057 | """
1058 | if imagefiledict is None:
1059 | warn(
1060 | 'Using savedocx() without imagefiledict parameter will be deprec'
1061 | 'ated in the future.', PendingDeprecationWarning
1062 | )
1063 |
1064 | assert os.path.isdir(template_dir)
1065 | docxfile = zipfile.ZipFile(
1066 | output, mode='w', compression=zipfile.ZIP_DEFLATED)
1067 |
1068 | # Move to the template data path
1069 | prev_dir = os.path.abspath('.') # save previous working dir
1070 | os.chdir(template_dir)
1071 |
1072 |     # Serialize our trees into our zip file
1073 | treesandfiles = {
1074 | document: 'word/document.xml',
1075 | coreprops: 'docProps/core.xml',
1076 | appprops: 'docProps/app.xml',
1077 | contenttypes: '[Content_Types].xml',
1078 | websettings: 'word/webSettings.xml',
1079 | wordrelationships: 'word/_rels/document.xml.rels'
1080 | }
1081 | for tree in treesandfiles:
1082 |         log.info('Saving: %s', treesandfiles[tree])
1083 | treestring = etree.tostring(tree, pretty_print=True)
1084 | docxfile.writestr(treesandfiles[tree], treestring)
1085 |
1086 | # Add & compress images, if applicable
1087 | if imagefiledict is not None:
1088 | for imagepath, picrelid in imagefiledict.items():
1089 | archivename = 'word/media/%s_%s' % (picrelid, basename(imagepath))
1090 | log.info('Saving: %s', archivename)
1091 | docxfile.write(imagepath, archivename)
1092 |
1093 | # Add & compress support files
1094 | files_to_ignore = ['.DS_Store'] # nuisance from some os's
1095 | for dirpath, dirnames, filenames in os.walk('.'):
1096 | for filename in filenames:
1097 | if filename in files_to_ignore:
1098 | continue
1099 | templatefile = join(dirpath, filename)
1100 | archivename = templatefile[2:]
1101 | log.info('Saving: %s', archivename)
1102 | docxfile.write(templatefile, archivename)
1103 |
1104 | log.info('Saved new file to: %r', output)
1105 | docxfile.close()
1106 | os.chdir(prev_dir) # restore previous working dir
1107 | return
1108 |
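
The packaging step in savedocx() above can be illustrated without the template directory or os.chdir(). What follows is only a minimal sketch under stated assumptions, not the module's implementation: `write_package` and `parts` are hypothetical names, the payloads are placeholder bytes rather than real OOXML parts, and the archive is written to an in-memory buffer instead of a file on disk.

```python
import io
import zipfile


def write_package(parts, output):
    """Write a mapping of archive names to bytes into a zip container.

    Hypothetical helper for illustration; `parts` maps names such as
    'word/document.xml' to already-serialized bytes.
    """
    with zipfile.ZipFile(output, mode='w',
                         compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort the names so the member order is deterministic
        for archivename, payload in sorted(parts.items()):
            archive.writestr(archivename, payload)


# Usage: build a tiny package in memory and read it back
buffer = io.BytesIO()
write_package({
    'word/document.xml': b'<document/>',
    '[Content_Types].xml': b'<Types/>',
}, buffer)
names = zipfile.ZipFile(buffer).namelist()
document_bytes = zipfile.ZipFile(buffer).read('word/document.xml')
```

Because every member is written with an explicit archive name, this approach does not depend on the current working directory.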
--------------------------------------------------------------------------------
/example-extracttext.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | This file opens a docx (Office 2007) file and dumps the text.
4 |
5 | If you need to extract text from documents, use this file as a basis for your
6 | work.
7 |
8 | Part of Python's docx module - http://github.com/mikemaccana/python-docx
9 | See LICENSE for licensing information.
10 | """
11 |
12 | import sys
13 |
14 | from docx import opendocx, getdocumenttext
15 |
16 | if __name__ == '__main__':
17 | try:
18 | document = opendocx(sys.argv[1])
19 | newfile = open(sys.argv[2], 'w')
20 |     except Exception:
21 | print(
22 | "Please supply an input and output file. For example:\n"
23 | " example-extracttext.py 'My Office 2007 document.docx' 'outp"
24 | "utfile.txt'"
25 | )
26 |         sys.exit(1)
27 |
28 |     # Fetch all the text out of the document we just opened
29 | paratextlist = getdocumenttext(document)
30 |
31 |     # Encode each paragraph as UTF-8 bytes for writing
32 | newparatextlist = []
33 | for paratext in paratextlist:
34 | newparatextlist.append(paratext.encode("utf-8"))
35 |
36 | # Print out text of document with two newlines under each paragraph
37 | newfile.write('\n\n'.join(newparatextlist))
38 |
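
The paragraph walk that getdocumenttext() performs can be tried on a small hand-written fragment. This is a rough sketch only: it swaps in the standard library's xml.etree.ElementTree for lxml, and `SAMPLE`, `W_NS` and `paragraph_texts` are hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the 'w' prefix)
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Hand-written sample: two paragraphs with text, one empty paragraph
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    '<w:p><w:r><w:t>Hel</w:t></w:r><w:r><w:t>lo</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>a</w:t><w:tab/><w:t>b</w:t></w:r></w:p>'
    '<w:p/>'
    '</w:body></w:document>'
) % W_NS


def paragraph_texts(root):
    """Return the text of each non-empty paragraph, tabs included."""
    texts = []
    for para in root.iter('{%s}p' % W_NS):
        pieces = []
        for element in para.iter():
            if element.tag == '{%s}t' % W_NS and element.text:
                pieces.append(element.text)
            elif element.tag == '{%s}tab' % W_NS:
                pieces.append('\t')
        # Skip paragraphs that produced no text at all
        if pieces:
            texts.append(''.join(pieces))
    return texts


paragraphs = paragraph_texts(ET.fromstring(SAMPLE))
```

The first paragraph shows why text is joined per paragraph: a single word may be split across several t elements.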
--------------------------------------------------------------------------------
/example-makedocument.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | """
4 | This file makes a .docx (Word 2007) file from scratch, showing off most of the
5 | features of python-docx.
6 |
7 | If you need to make documents from scratch, you can use this file as a basis
8 | for your work.
9 |
10 | Part of Python's docx module - http://github.com/mikemaccana/python-docx
11 | See LICENSE for licensing information.
12 | """
13 |
14 | from docx import *
15 |
16 | if __name__ == '__main__':
17 |     # Default set of relationships - the minimum components of a document
18 | relationships = relationshiplist()
19 |
20 | # Make a new document tree - this is the main part of a Word document
21 | document = newdocument()
22 |
23 | # This xpath location is where most interesting content lives
24 | body = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
25 |
26 | # Append two headings and a paragraph
27 | body.append(heading("Welcome to Python's docx module", 1))
28 | body.append(heading('Make and edit docx in 200 lines of pure Python', 2))
29 |     body.append(paragraph('The module was created when I was looking for '
30 | 'Python support for MS Word .doc files on PyPI and Stackoverflow. '
31 | 'Unfortunately, the only solutions I could find used:'))
32 |
33 | # Add a numbered list
34 | points = [ 'COM automation'
35 | , '.net or Java'
36 | , 'Automating OpenOffice or MS Office'
37 | ]
38 | for point in points:
39 | body.append(paragraph(point, style='ListNumber'))
40 | body.append(paragraph([('For those of us who prefer something simpler, I '
41 | 'made docx.', 'i')]))
42 | body.append(heading('Making documents', 2))
43 | body.append(paragraph('The docx module has the following features:'))
44 |
45 | # Add some bullets
46 | points = ['Paragraphs', 'Bullets', 'Numbered lists',
47 | 'Multiple levels of headings', 'Tables', 'Document Properties']
48 | for point in points:
49 | body.append(paragraph(point, style='ListBullet'))
50 |
51 | body.append(paragraph('Tables are just lists of lists, like this:'))
52 | # Append a table
53 | tbl_rows = [ ['A1', 'A2', 'A3']
54 | , ['B1', 'B2', 'B3']
55 | , ['C1', 'C2', 'C3']
56 | ]
57 | body.append(table(tbl_rows))
58 |
59 | body.append(heading('Editing documents', 2))
60 | body.append(paragraph('Thanks to the awesomeness of the lxml module, '
61 | 'we can:'))
62 | points = [ 'Search and replace'
63 | , 'Extract plain text of document'
64 | , 'Add and delete items anywhere within the document'
65 | ]
66 | for point in points:
67 | body.append(paragraph(point, style='ListBullet'))
68 |
69 | # Add an image
70 | relationships, picpara = picture(relationships, 'image1.png',
71 | 'This is a test description')
72 | body.append(picpara)
73 |
74 | # Search and replace
75 | print 'Searching for something in a paragraph ...',
76 | if search(body, 'the awesomeness'):
77 | print 'found it!'
78 | else:
79 | print 'nope.'
80 |
81 | print 'Searching for something in a heading ...',
82 | if search(body, '200 lines'):
83 | print 'found it!'
84 | else:
85 | print 'nope.'
86 |
87 | print 'Replacing ...',
88 | body = replace(body, 'the awesomeness', 'the goshdarned awesomeness')
89 | print 'done.'
90 |
91 | # Add a pagebreak
92 | body.append(pagebreak(type='page', orient='portrait'))
93 |
94 | body.append(heading('Ideas? Questions? Want to contribute?', 2))
95 | body.append(paragraph('Email '))
96 |
97 | # Create our properties, contenttypes, and other support files
98 | title = 'Python docx demo'
99 | subject = 'A practical example of making docx from Python'
100 | creator = 'Mike MacCana'
101 | keywords = ['python', 'Office Open XML', 'Word']
102 |
103 | coreprops = coreproperties(title=title, subject=subject, creator=creator,
104 | keywords=keywords)
105 | appprops = appproperties()
106 | contenttypes = contenttypes()
107 | websettings = websettings()
108 | wordrelationships = wordrelationships(relationships)
109 |
110 | # Save our document
111 | savedocx(document, coreprops, appprops, contenttypes, websettings,
112 | wordrelationships, 'Welcome to the Python docx module.docx')
113 |
114 |
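
The search() calls in this example only match text that sits inside a single text element. AdvSearch() in docx.py instead joins up to bs adjacent text blocks before matching. The sketch below mimics that windowing on plain strings rather than on lxml elements; `window_search` and `first_match` are hypothetical names, and the code is an illustration of the idea, not the module's own implementation.

```python
import re


def first_match(window, searchre):
    """Try the shortest runs of adjacent blocks first."""
    for length in range(1, len(window) + 1):
        for start in range(len(window) - length + 1):
            joined = ''.join(window[start:start + length])
            match = searchre.search(joined)
            if match:
                return match.group()
    return None


def window_search(blocks, pattern, bs=3):
    """Return the set of matches found in windows of up to bs blocks."""
    searchre = re.compile(pattern)
    matches = set()
    window = []
    for block in blocks:
        if not block:
            continue  # empty blocks are ignored, as in AdvSearch()
        window.append(block)
        if len(window) > bs:
            window.pop(0)  # keep only the last bs blocks
        match = first_match(window, searchre)
        if match is not None:
            matches.add(match)
    return matches


split_name = window_search(['Hel', 'lo', ' __', 'name', '__!'], r'__[a-z]+__')
split_word = window_search(['Hel', 'lo,', ' world!'], 'Hello,')
too_narrow = window_search(['Hel', 'lo,'], 'Hello,', bs=1)
```

With bs=1 the window never spans two blocks, so text split across elements is not found; that is the limitation of the plain search() that the wider window addresses.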
--------------------------------------------------------------------------------
/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/image1.png
--------------------------------------------------------------------------------
/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/screenshot.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | try:
4 | from setuptools import setup
5 | except ImportError:
6 | from distutils.core import setup
7 | from glob import glob
8 |
9 | # Make data go into site-packages (http://tinyurl.com/site-pkg)
10 | from distutils.command.install import INSTALL_SCHEMES
11 | for scheme in INSTALL_SCHEMES.values():
12 | scheme['data'] = scheme['purelib']
13 |
14 | DESCRIPTION = (
15 | 'The docx module creates, reads and writes Microsoft Office Word 2007 do'
16 | 'cx files'
17 | )
18 |
19 | setup(
20 | name='docx',
21 | version='0.2.4',
22 | install_requires=['lxml', 'Pillow>=2.0'],
23 | description=DESCRIPTION,
24 | author='Mike MacCana',
25 | author_email='python-docx@googlegroups.com',
26 | maintainer='Steve Canny',
27 | maintainer_email='python-docx@googlegroups.com',
28 | url='http://github.com/mikemaccana/python-docx',
29 | py_modules=['docx'],
30 | data_files=[
31 | ('docx-template/_rels', glob('template/_rels/.*')),
32 | ('docx-template/docProps', glob('template/docProps/*.*')),
33 | ('docx-template/word', glob('template/word/*.xml')),
34 | ('docx-template/word/theme', glob('template/word/theme/*.*')),
35 | ],
36 | )
37 |
--------------------------------------------------------------------------------
/template/_rels/.rels:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/template/docProps/thumbnail.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/template/docProps/thumbnail.jpeg
--------------------------------------------------------------------------------
/template/word/fontTable.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/template/word/numbering.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/template/word/settings.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/template/word/styles.xml:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/template/word/theme/theme1.xml:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/tests/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikemaccana/python-docx/4c9b46dbebe3d2a9b82dbcd35af36584a36fd9fe/tests/image1.png
--------------------------------------------------------------------------------
/tests/test_docx.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | """
4 | Test docx module
5 | """
6 |
7 | import os
8 | import lxml.etree
9 | from docx import (
10 |     appproperties, contenttypes, coreproperties, getdocumenttext, heading,
11 |     makeelement, newdocument, nsprefixes, opendocx, pagebreak, paragraph,
12 |     picture, relationshiplist, replace, savedocx, search, table, websettings,
13 |     wordrelationships
14 | )
15 |
16 | TEST_FILE = 'ShortTest.docx'
17 | IMAGE1_FILE = 'image1.png'
18 |
19 |
20 | # --- Setup & Support Functions ---
21 | def setup_module():
22 |     """Set up test fixtures"""
23 |     import shutil
24 |     if IMAGE1_FILE not in os.listdir('.'):
25 |         shutil.copyfile(os.path.join(os.path.pardir, IMAGE1_FILE), IMAGE1_FILE)
26 |     testnewdocument()
27 |
28 |
29 | def teardown_module():
30 |     """Tear down test fixtures"""
31 |     if TEST_FILE in os.listdir('.'):
32 |         os.remove(TEST_FILE)
33 |
34 |
35 | def simpledoc(noimagecopy=False):
36 |     """Make a docx (document, relationships) for use in other docx tests"""
37 |     relationships = relationshiplist()
38 |     imagefiledict = {}
39 |     document = newdocument()
40 |     docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
41 |     docbody.append(heading('Heading 1', 1))
42 |     docbody.append(heading('Heading 2', 2))
43 |     docbody.append(paragraph('Paragraph 1'))
44 |     for point in ['List Item 1', 'List Item 2', 'List Item 3']:
45 |         docbody.append(paragraph(point, style='ListNumber'))
46 |     docbody.append(pagebreak(type='page'))
47 |     docbody.append(paragraph('Paragraph 2'))
48 |     docbody.append(
49 |         table(
50 |             [
51 |                 ['A1', 'A2', 'A3'],
52 |                 ['B1', 'B2', 'B3'],
53 |                 ['C1', 'C2', 'C3']
54 |             ]
55 |         )
56 |     )
57 |     docbody.append(pagebreak(type='section', orient='portrait'))
58 |     if noimagecopy:
59 |         relationships, picpara, imagefiledict = picture(
60 |             relationships, IMAGE1_FILE, 'This is a test description',
61 |             imagefiledict=imagefiledict
62 |         )
63 |     else:
64 |         relationships, picpara = picture(
65 |             relationships, IMAGE1_FILE, 'This is a test description'
66 |         )
67 |     docbody.append(picpara)
68 |     docbody.append(pagebreak(type='section', orient='landscape'))
69 |     docbody.append(paragraph('Paragraph 3'))
70 |     if noimagecopy:
71 |         return (document, docbody, relationships, imagefiledict)
72 |     else:
73 |         return (document, docbody, relationships)
74 |
75 |
76 | # --- Test Functions ---
77 | def testsearchandreplace():
78 |     """Ensure search and replace functions work"""
79 |     document, docbody, relationships = simpledoc()
80 |     docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
81 |     assert search(docbody, 'ing 1')
82 |     assert search(docbody, 'ing 2')
83 |     assert search(docbody, 'graph 3')
84 |     assert search(docbody, 'ist Item')
85 |     assert search(docbody, 'A1')
86 |     if search(docbody, 'Paragraph 2'):
87 |         docbody = replace(docbody, 'Paragraph 2', 'Whacko 55')
88 |     assert search(docbody, 'Whacko 55')
89 |
90 |
91 | def testtextextraction():
92 |     """Ensure text can be pulled out of a document"""
93 |     document = opendocx(TEST_FILE)
94 |     paratextlist = getdocumenttext(document)
95 |     assert len(paratextlist) > 0
96 |
97 |
98 | def testunsupportedpagebreak():
99 |     """Ensure unsupported page break types are trapped"""
100 |     document = newdocument()
101 |     docbody = document.xpath('/w:document/w:body', namespaces=nsprefixes)[0]
102 |     try:
103 |         docbody.append(pagebreak(type='unsup'))
104 |     except ValueError:
105 |         return  # passed
106 |     assert False  # failed
107 |
108 |
109 | def testnewdocument():
110 |     """Test that a new document can be created"""
111 |     document, docbody, relationships = simpledoc()
112 |     coreprops = coreproperties(
113 |         'Python docx testnewdocument',
114 |         'A short example of making docx from Python', 'Alan Brooks',
115 |         ['python', 'Office Open XML', 'Word']
116 |     )
117 |     savedocx(
118 |         document, coreprops, appproperties(), contenttypes(), websettings(),
119 |         wordrelationships(relationships), TEST_FILE
120 |     )
121 |
122 |
123 | def testnewdocument_noimagecopy():
124 |     """
125 |     Test that a new document can be created when images are passed via
126 |     imagefiledict
127 |     """
128 |     document, docbody, relationships, imagefiledict = simpledoc(
129 |         noimagecopy=True
130 |     )
131 |     coreprops = coreproperties(
132 |         'Python docx testnewdocument',
133 |         'A short example of making docx from Python', 'Alan Brooks',
134 |         ['python', 'Office Open XML', 'Word']
135 |     )
136 |     savedocx(
137 |         document, coreprops, appproperties(), contenttypes(), websettings(),
138 |         wordrelationships(relationships), TEST_FILE,
139 |         imagefiledict=imagefiledict
140 |     )
140 |
141 |
142 | def testopendocx():
143 |     """Ensure an etree element is returned"""
144 |     if isinstance(opendocx(TEST_FILE), lxml.etree._Element):
145 |         pass
146 |     else:
147 |         assert False
148 |
149 |
150 | def testmakeelement():
151 |     """Ensure custom elements get created"""
152 |     testelement = makeelement(
153 |         'testname',
154 |         attributes={'testattribute': 'testvalue'},
155 |         tagtext='testtagtext'
156 |     )
157 |     assert testelement.tag == (
158 |         '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}testn'
159 |         'ame'
160 |     )
161 |     assert testelement.attrib == {
162 |         '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}testa'
163 |         'ttribute': 'testvalue'
164 |     }
165 |     assert testelement.text == 'testtagtext'
166 |
167 |
168 | def testparagraph():
169 |     """Ensure paragraph creates p elements"""
170 |     testpara = paragraph('paratext', style='BodyText')
171 |     assert testpara.tag == (
172 |         '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p'
173 |     )
174 |     pass
175 |
176 |
177 | def testtable():
178 |     """Ensure tables make sense"""
179 |     testtable = table([['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2']])
180 |     assert (
181 |         testtable.xpath(
182 |             '/ns0:tbl/ns0:tr[2]/ns0:tc[2]/ns0:p/ns0:r/ns0:t',
183 |             namespaces={'ns0': ('http://schemas.openxmlformats.org/wordproce'
184 |                                 'ssingml/2006/main')}
185 |         )[0].text == 'B2'
186 |     )
187 |
188 |
189 | if __name__ == '__main__':
190 |     import nose
191 |     nose.main()
192 |
--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | #
2 | # tox.ini
3 | #
4 | # Copyright (C) 2012, 2013 Steve Canny scanny@cisco.com
5 | #
6 | # This module is part of python-docx and is released under the MIT License:
7 | # http://www.opensource.org/licenses/mit-license.php
8 | #
9 | # Configuration for tox
10 |
11 | [tox]
12 | envlist = py26, py27
13 |
14 | [testenv]
15 | deps =
16 |     lxml
17 |     nose
18 |     Pillow
19 |
20 | commands =
21 |     nosetests
22 |
--------------------------------------------------------------------------------