├── .coveragerc
├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.rst
├── docs
    ├── advanced.rst
    ├── api.rst
    ├── builder.rst
    ├── conf.py
    ├── index.rst
    ├── nodes.rst
    ├── parser.rst
    └── writer.rst
├── requirements-dev.txt
├── setup.py
├── tests
    ├── __init__.py
    ├── data
    │   ├── example_doc.small.xml
    │   ├── example_doc.unicode.xml
    │   ├── monty_python_films.ns.xml
    │   └── monty_python_films.xml
    ├── test_builder.py
    ├── test_nodes.py
    ├── test_parser.py
    └── test_writer.py
├── tox.ini
└── xml4h
    ├── __init__.py
    ├── builder.py
    ├── exceptions.py
    ├── impls
        ├── __init__.py
        ├── interface.py
        ├── lxml_etree.py
        ├── xml_dom_minidom.py
        └── xml_etree_elementtree.py
    ├── nodes.py
    └── writer.py


/.coveragerc:
--------------------------------------------------------------------------------
1 | [report]
2 | show_missing = 1
3 | exclude_lines =
4 |     pragma: no cover
5 |     raise NotImplementedError
6 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Build artifacts
 2 | dist
 3 | build
 4 | xml4h.egg-info
 5 | 
 6 | # Sphinx documentation
 7 | docs/_*
 8 | docs/.*
 9 | 
10 | # Nosetests coverage report
11 | .coverage
12 | 
13 | # Tox virtualenvs
14 | .tox/
15 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | language: python
 2 | python:
 3 |   - "2.7"
 4 |   - "3.5"
 5 |   - "3.6"
 6 |   - "3.7"
 7 |   - "3.8"
 8 | install: pip install tox-travis
 9 | script: tox
10 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2013 James Murty.
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 6 | this software and associated documentation files (the "Software"), to deal in
 7 | the Software without restriction, including without limitation the rights to
 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 9 | the Software, and to permit persons to whom the Software is furnished to do so,
10 | subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
21 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.rst LICENSE requirements-dev.txt
2 | recursive-include tests *.py *.xml
3 | recursive-include docs *.rst
4 | 


--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
  1 | ===============================
  2 | xml4h: XML for Humans in Python
  3 | ===============================
  4 | 
  5 | *xml4h* is an MIT licensed library for Python to make it easier to work with XML.
  6 | 
  7 | This library exists because Python is awesome, XML is everywhere, and combining
  8 | the two should be a pleasure but often is not. With *xml4h*, it can be easy.
  9 | 
 10 | As of version 1.0 *xml4h* supports Python versions 2.7 and 3.5+.
 11 | 
 12 | 
 13 | Features
 14 | --------
 15 | 
 16 | *xml4h* is a simplification layer over existing Python XML processing libraries
 17 | such as *lxml*, *ElementTree* and the *minidom*. It provides:
 18 | 
 19 | - a rich pythonic API to traverse and manipulate the XML DOM.
 20 | - a document builder to simply and safely construct complex documents with
 21 |   minimal code.
 22 | - a writer that serialises XML documents with the structure and format that you
 23 |   expect, unlike the machine- but not human-friendly output you tend to get
 24 |   from other libraries.
 25 | 
 26 | The *xml4h* abstraction layer also offers some other benefits, beyond a nice
 27 | API and tool set:
 28 | 
 29 | - A common interface to different underlying XML libraries, so code written
 30 |   against *xml4h* need not be rewritten if you switch implementations.
 31 | - You can easily move between *xml4h* and the underlying implementation: parse
 32 |   your document using the fastest implementation, manipulate the DOM with
 33 |   human-friendly code using *xml4h*, then get back to the underlying
 34 |   implementation if you need to.
 35 | 
 36 | 
 37 | Installation
 38 | ------------
 39 | 
 40 | Install *xml4h* with pip::
 41 | 
 42 |     $ pip install xml4h
 43 | 
 44 | Or install the tarball manually with::
 45 | 
 46 |     $ python setup.py install
 47 | 
 48 | 
 49 | Links
 50 | -----
 51 | 
 52 | - GitHub for source code and issues: https://github.com/jmurty/xml4h
 53 | - ReadTheDocs for documentation: https://xml4h.readthedocs.org
 54 | - Install from the Python Package Index: https://pypi.python.org/pypi/xml4h
 55 | 
 56 | 
 57 | Introduction
 58 | ------------
 59 | 
 60 | With *xml4h* you can easily parse XML files and access their data.
 61 | 
 62 | Let's start with an example XML document::
 63 | 
 64 |     $ cat tests/data/monty_python_films.xml
 65 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
 66 |         <Film year="1971">
 67 |             <Title>And Now for Something Completely Different</Title>
 68 |             <Description>
 69 |                 A collection of sketches from the first and second TV series of
 70 |                 Monty Python's Flying Circus purposely re-enacted and shot for film.
 71 |             </Description>
 72 |         </Film>
 73 |         <Film year="1974">
 74 |             <Title>Monty Python and the Holy Grail</Title>
 75 |             <Description>
 76 |                 King Arthur and his knights embark on a low-budget search for
 77 |                 the Holy Grail, encountering humorous obstacles along the way.
 78 |                 Some of these turned into standalone sketches.
 79 |             </Description>
 80 |         </Film>
 81 |         <Film year="1979">
 82 |             <Title>Monty Python's Life of Brian</Title>
 83 |             <Description>
 84 |                 Brian is born on the first Christmas, in the stable next to
 85 |                 Jesus'. He spends his life being mistaken for a messiah.
 86 |             </Description>
 87 |         </Film>
 88 |         <... more Film elements here ...>
 89 |     </MontyPythonFilms>
 90 | 
 91 | With *xml4h* you can parse the XML file and use "magical" element and attribute
 92 | lookups to read data::
 93 | 
 94 |     >>> import xml4h
 95 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
 96 | 
 97 |     >>> for film in doc.MontyPythonFilms.Film[:3]:
 98 |     ...     print(film['year'] + ' : ' + film.Title.text)
 99 |     1971 : And Now for Something Completely Different
100 |     1974 : Monty Python and the Holy Grail
101 |     1979 : Monty Python's Life of Brian
102 | 
103 | You can also use more explicit (non-magical) methods to traverse the DOM::
104 | 
105 |     >>> for film in doc.child('MontyPythonFilms').children('Film')[:3]:
106 |     ...     print(film.attributes['year'] + ' : ' + film.children.first.text)
107 |     1971 : And Now for Something Completely Different
108 |     1974 : Monty Python and the Holy Grail
109 |     1979 : Monty Python's Life of Brian
110 | 
111 | The *xml4h* builder makes programmatic document creation simple, with a
112 | method-chaining feature that allows for expressive but sparse code that mirrors
113 | the document itself. Here is the code to build part of the above XML document::
114 | 
115 |     >>> b = (xml4h.build('MontyPythonFilms')
116 |     ...     .attributes({'source': 'http://en.wikipedia.org/wiki/Monty_Python'})
117 |     ...     .element('Film')
118 |     ...         .attributes({'year': 1971})
119 |     ...         .element('Title')
120 |     ...             .text('And Now for Something Completely Different')
121 |     ...             .up()
122 |     ...         .elem('Description').t(
123 |     ...             "A collection of sketches from the first and second TV"
124 |     ...             " series of Monty Python's Flying Circus purposely"
125 |     ...             " re-enacted and shot for film."
126 |     ...             ).up()
127 |     ...         .up()
128 |     ...     )
129 | 
130 |     >>> # A builder object can be re-used, and has short method aliases
131 |     >>> b = (b.e('Film')
132 |     ...     .attrs(year=1974)
133 |     ...     .e('Title').t('Monty Python and the Holy Grail').up()
134 |     ...     .e('Description').t(
135 |     ...         "King Arthur and his knights embark on a low-budget search"
136 |     ...         " for the Holy Grail, encountering humorous obstacles along"
137 |     ...         " the way. Some of these turned into standalone sketches."
138 |     ...         ).up()
139 |     ...     .up()
140 |     ... )
141 | 
142 | Pretty-print your XML document with *xml4h*'s writer implementation, which has
143 | methods to write content to a stream or to get the content as text, with
144 | flexible formatting options::
145 | 
146 |     >>> print(b.xml_doc(indent=4, newline=True)) # doctest: +ELLIPSIS
147 |     <?xml version="1.0" encoding="utf-8"?>
148 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
149 |         <Film year="1971">
150 |             <Title>And Now for Something Completely Different</Title>
151 |             <Description>A collection of sketches from ...</Description>
152 |         </Film>
153 |         <Film year="1974">
154 |             <Title>Monty Python and the Holy Grail</Title>
155 |             <Description>King Arthur and his knights embark ...</Description>
156 |         </Film>
157 |     </MontyPythonFilms>
158 |     <BLANKLINE>
159 | 
160 | 
161 | Why use *xml4h*?
162 | ----------------
163 | 
164 | Python has three popular libraries for working with XML, none of which are
165 | particularly easy to use:
166 | 
167 | - `xml.dom.minidom <https://docs.python.org/3/library/xml.dom.minidom.html>`_
168 |   is a light-weight, moderately-featured implementation of the W3C DOM
169 |   that is included in the standard library. Unfortunately the W3C DOM API is
170 |   verbose, clumsy, and not very pythonic, and the *minidom* does not support
171 |   XPath expressions.
172 | - `xml.etree.ElementTree <http://docs.python.org/3/library/xml.etree.elementtree.html>`_
173 |   is a fast hierarchical data container that is included in the standard
174 |   library and can be used to represent XML, mostly. The API is fairly pythonic
175 |   and supports some basic XPath features, but it lacks some DOM traversal
176 |   niceties you might expect (e.g. to get an element's parent) and when using it
177 |   you often feel like you're working with something subtly different from XML,
178 |   because you are.
179 | - `lxml <http://lxml.de/>`_ is a fast, full-featured XML library with an API
180 |   based on ElementTree but extended. It is your best choice for doing serious
181 |   work with XML in Python but it is not included in the standard library, it
182 |   can be difficult to install, and it gives you the same it's-XML-but-not-quite
183 |   feeling as its ElementTree forebear.
184 | 
185 | Given these three options it can be difficult to choose which library to use,
186 | especially if you're new to XML processing in Python and haven't already
187 | used (struggled with) any of them.
188 | 
189 | In the past your best bet would have been to go with *lxml* for the most
190 | flexibility, even though it might be overkill, because at least then you
191 | wouldn't have to rewrite your code if you later find you need XPath support or
192 | powerful DOM traversal methods.
193 | 
194 | This is where *xml4h* comes in. It provides an abstraction layer over
195 | the existing XML libraries, taking advantage of their power while offering an
196 | improved API and tool set.
197 | 
198 | 
199 | Development Status: beta
200 | ------------------------
201 | 
202 | Currently *xml4h* includes adapter implementations for three of the main XML
203 | processing Python libraries.
204 | 
205 | If you have *lxml* available (highly recommended) it will use that, otherwise
206 | it will fall back to use the *(c)ElementTree* then the *minidom* libraries.
207 | 
208 | 
209 | 
210 | History
211 | -------
212 | 
213 | 1.0
214 | ...
215 | 
216 | - Add support for Python 3 (3.5+).
217 | - Drop support for Python versions before 2.7.
218 | - Fix node namespace prefix values for lxml adapter.
219 | - Improve builder's ``up()`` method to accept and distinguish between a count
220 |   of parents to step up, or the name of a target ancestor node.
221 | - Add ``xml()`` and ``xml_doc()`` methods to document builder to more easily
222 |   get string content from it, without resorting to the write methods.
223 | - The ``write()`` and ``write_doc()`` methods no longer send output to
224 |   ``sys.stdout`` by default. The user must explicitly provide a target writer
225 |   object, and hopefully be more mindful of the need to set up encoding correctly
226 |   when providing a text stream object.
227 | - Handling of redundant Element namespace prefixes is now more consistent: we
228 |   always strip the prefix when the element has an `xmlns` attribute defining
229 |   the same namespace URI.
230 | 
231 | 0.2.0
232 | .....
233 | 
234 | - Add adapter for the *(c)ElementTree* library versions included as standard
235 |   with Python 2.7+.
236 | - Improved "magical" node traversal to work with lowercase tag names without
237 |   always needing a trailing underscore. See also improved docs.
238 | - Fixes for: potential errors ASCII-encoding nodes as strings; default XPath
239 |   namespace from document node; lookup precedence of xmlns attributes.
240 | 
241 | 
242 | 0.1.0
243 | .....
244 | 
245 | - Initial alpha release with support for *lxml* and *minidom* libraries.
246 | 


--------------------------------------------------------------------------------
/docs/advanced.rst:
--------------------------------------------------------------------------------
  1 | ========
  2 | Advanced
  3 | ========
  4 | 
  5 | 
  6 | .. _xml4h-namespaces:
  7 | 
  8 | Namespaces
  9 | ==========
 10 | 
 11 | *xml4h* supports using XML namespaces in a number of ways, and tries to make
 12 | this sometimes complex and fiddly aspect of XML a little easier to deal with.
 13 | 
 14 | Namespace URIs
 15 | --------------
 16 | 
 17 | XML document nodes can be associated with a *namespace URI* which uniquely
 18 | identifies the namespace.  At bottom a URI is really just a name to identify
 19 | the namespace, which may or may not point at an actual resource.
 20 | 
 21 | Namespace URIs are the core piece of the namespacing puzzle; everything else is
 22 | extras.
 23 | 
 24 | Namespace URI values are assigned to a node in one of three ways:
 25 | 
 26 | - an ``xmlns`` attribute on an element assigns a *namespace URI* to that
 27 |   element, and may also define a shorthand *prefix* for the namespace::
 28 | 
 29 |       <AnElement xmlns:my-prefix="urn:example-uri">
 30 | 
 31 |   .. note::
 32 |      Technically the ``xmlns`` attribute must itself also be in the special XML
 33 |      namespacing namespace http://www.w3.org/2000/xmlns/. You needn't care
 34 |      about this.
 35 | 
 36 | - a tag or attribute name includes a *prefix* alias portion that specifies the
 37 |   namespace the item belongs to::
 38 | 
 39 |       <my-prefix:AnotherElement attr1="x" my-prefix:attr2="i am namespaced">
 40 | 
 41 |   A prefix alias can be defined using an "xmlns" attribute as described above,
 42 |   or by using the Builder :meth:`~xml4h.Builder.ns_prefix` or Node
 43 |   :meth:`~xml4h.nodes.Node.set_ns_prefix` methods.
 44 | 
 45 | - in an apparent effort to reduce confusion around namespace URIs and prefixes,
 46 |   some XML libraries avoid prefix aliases altogether and instead require you to
 47 |   specify the full *namespace URI* as a prefix to tag and attribute names
 48 |   using a special syntax with braces::
 49 | 
 50 |       >>> tagname = '{urn:example-uri}YetAnotherWayToNamespace'
 51 | 
 52 |   .. note::
 53 |      In the author's opinion, using a non-standard way to define namespaces
 54 |      does not reduce confusion. *xml4h* supports this approach technically but
 55 |      not philosophically.
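
To see where the brace syntax comes from, here is a minimal sketch that uses only the standard library's ``xml.etree.ElementTree`` (not *xml4h* itself), which reports namespaced tag names in this form. The document text and the ``Item`` element are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one prefix alias bound to one namespace URI
xml_text = (
    '<Doc xmlns:my-prefix="urn:example-uri">'
    '<my-prefix:Item/>'
    '</Doc>'
)
root = ET.fromstring(xml_text)
item = root[0]

# ElementTree drops the prefix alias and embeds the full URI in braces
print(item.tag)
```

The prefix alias ``my-prefix`` does not appear in the tag name at all; only the namespace URI survives, which is why such libraries expect the braces form when you look elements up.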
 56 | 
 57 | *xml4h* allows you to assign namespace URIs to document nodes when using the
 58 | Builder::
 59 | 
 60 |     >>> # Assign a default namespace with ns_uri
 61 |     >>> import xml4h
 62 |     >>> b = xml4h.build('Doc', ns_uri='ns-uri')
 63 |     >>> root = b.root
 64 | 
 65 |     >>> # Descendants without a namespace inherit their ancestor's default one
 66 |     >>> elem1 = b.elem('Elem1').dom_element
 67 |     >>> elem1.namespace_uri
 68 |     'ns-uri'
 69 | 
 70 |     >>> # Define a prefix alias to assign a new or existing namespace URI
 71 |     >>> elem2 = b.ns_prefix('my-ns', 'second-ns-uri') \
 72 |     ...     .elem('my-ns:Elem2').dom_element
 73 |     >>> print(root.xml())
 74 |     <Doc xmlns="ns-uri" xmlns:my-ns="second-ns-uri">
 75 |         <Elem1/>
 76 |         <my-ns:Elem2/>
 77 |     </Doc>
 78 | 
 79 |     >>> # Or use the explicit URI prefix approach, if you must
 80 |     >>> elem3 = b.elem('{third-ns-uri}Elem3').dom_element
 81 |     >>> elem3.namespace_uri
 82 |     'third-ns-uri'
 83 | 
 84 | And when adding nodes with the API::
 85 | 
 86 |     >>> # Define the ns_uri argument when creating a new element
 87 |     >>> elem4 = root.add_element('Elem4', ns_uri='fourth-ns-uri')
 88 | 
 89 |     >>> # Attributes can be namespaced too
 90 |     >>> elem4.set_attributes({'my-ns:attr1': 'value'})
 91 | 
 92 |     >>> print(elem4.xml())
 93 |     <Elem4 my-ns:attr1="value" xmlns="fourth-ns-uri"/>
 94 | 
 95 | 
 96 | Filtering by Namespace
 97 | ----------------------
 98 | 
 99 | *xml4h* allows you to find and filter nodes based on their namespace.
100 | 
101 | The :meth:`~xml4h.nodes.Node.find` method takes a ``ns_uri`` keyword argument to
102 | return only elements in that namespace::
103 | 
104 |     >>> # By default, find ignores namespaces...
105 |     >>> [n.local_name for n in root.find()]
106 |     ['Elem1', 'Elem2', 'Elem3', 'Elem4']
107 |     >>> # ...but will filter by namespace URI if you wish
108 |     >>> [n.local_name for n in root.find(ns_uri='fourth-ns-uri')]
109 |     ['Elem4']
110 | 
111 | Similarly, a node's children listing can be filtered::
112 | 
113 |     >>> len(root.children)
114 |     4
115 |     >>> root.children(ns_uri='ns-uri')
116 |     [<xml4h.nodes.Element: "Elem1">]
117 | 
118 | XPath queries can also filter by namespace, but the
119 | :meth:`~xml4h.nodes.Node.xpath` method needs to be given a dictionary mapping
120 | of prefix aliases to URIs::
121 | 
122 |     >>> root.xpath('//ns4:*', namespaces={'ns4': 'fourth-ns-uri'})
123 |     [<xml4h.nodes.Element: "Elem4">]
124 | 
125 | .. note::
126 |    Normally, because XPath queries rely on namespace prefix aliases, they
127 |    cannot find namespaced nodes in the default namespace which has an "empty"
128 |    prefix name. *xml4h* works around this limitation by providing the special
129 |    empty/default prefix alias '_'.
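
The limitation described in the note can be reproduced with the standard library's ElementTree alone. The sketch below does not use *xml4h*; the alias ``d`` is an arbitrary name chosen for this example, and the document mirrors the ``ns-uri`` default namespace used above:

```python
import xml.etree.ElementTree as ET

# Elem1 has no prefix, so it belongs to the default namespace 'ns-uri'
root = ET.fromstring('<Doc xmlns="ns-uri"><Elem1/></Doc>')

# An unprefixed path finds nothing, because the path names no namespace
unprefixed = root.findall('.//Elem1')

# Binding any alias (here the made-up 'd') to the URI makes the query match
prefixed = root.findall('.//d:Elem1', namespaces={'d': 'ns-uri'})

print(len(unprefixed), len(prefixed))
```

The query only matches once some alias is mapped to the default namespace's URI, which is the role the ``_`` alias plays in *xml4h*.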
130 | 
131 | 
132 | Element Names: Local and Prefix Components
133 | ------------------------------------------
134 | 
135 | When you use a namespace prefix alias to define the namespace an element or
136 | attribute belongs to, the name of that node will be made up of two components:
137 | 
138 | - *prefix* - the namespace alias.
139 | - *local* - the real name of the node, without the namespace alias.
140 | 
141 | *xml4h* makes the full (qualified) name, and the two components, available as
142 | node attributes::
143 | 
144 |     >>> # Elem2's namespace was defined earlier using a prefix alias
145 |     >>> elem2
146 |     <xml4h.nodes.Element: "my-ns:Elem2">
147 | 
148 |     >>> # The full node name...
149 |     >>> elem2.name
150 |     'my-ns:Elem2'
151 |     >>> # ...comprises a prefix...
152 |     >>> elem2.prefix
153 |     'my-ns'
154 |     >>> # ...and a local name component
155 |     >>> elem2.local_name
156 |     'Elem2'
157 | 
158 |     >>> # Here is an element without a prefix alias
159 |     >>> elem1.name
160 |     'Elem1'
161 |     >>> elem1.prefix is None
162 |     True
163 |     >>> elem1.local_name
164 |     'Elem1'
165 | 
166 | 
167 | .. _xml-lib-architecture:
168 | 
169 | *xml4h* Architecture
170 | ====================
171 | 
172 | To best understand the *xml4h* library and to use it appropriately in demanding
173 | situations, you should appreciate what the library is not.
174 | 
175 | *xml4h* is not a full-fledged XML library in its own right, far from it.
176 | Instead of implementing low-level document parsing and manipulation tools, it
177 | operates as an abstraction layer on top of the pre-existing XML processing
178 | libraries you already know.
179 | 
180 | This means the improved API and tool suite provided by *xml4h* work by
181 | mediating operations you perform, asking the underlying XML library to do the
182 | work, and packaging up the results of this work as wrapped *xml4h* objects.
183 | 
184 | This approach has a number of implications, good and bad.
185 | 
186 | On the good side:
187 | 
188 | - you can start using and benefiting from *xml4h* in existing projects that
189 |   already use a supported XML library without any impact; it can fit right in.
190 | - *xml4h* can take advantage of the existing powerful and fast XML libraries to
191 |   do its work.
192 | - by providing an abstraction layer over multiple libraries, *xml4h* can make
193 |   it (relatively) easy to switch the underlying library without you needing to
194 |   rewrite your own XML handling code.
195 | - by building on the shoulders of giants, *xml4h* itself can remain relatively
196 |   lightweight and focussed on simplicity and usability.
197 | - the author of *xml4h* does not have to write XML-handling code in C...
198 | 
199 | On the bad side:
200 | 
201 | - if the underlying XML libraries available in the Python environment do not
202 |   support a feature (like XPath querying) then that feature will not be
203 |   available in *xml4h*.
204 | - *xml4h* cannot provide radical new XML processing features, since the bulk of
205 |   its work must be done by the underlying library.
206 | - the abstraction layer *xml4h* uses to do its work requires more resources
207 |   than it would to use the underlying library directly, so if you absolutely
208 |   need maximal speed or minimal memory use the library might prove too
209 |   expensive.
210 | - *xml4h* sometimes needs to jump through some hoops to maintain the shared
211 |   abstraction interface over multiple libraries, which means extra work is
212 |   done in Python instead of by the underlying library code in C.
213 | 
214 | The author believes the benefits of using *xml4h* outweigh the drawbacks in
215 | the majority of real-world situations, or he wouldn't have created the library
216 | in the first place, but ultimately it is up to you to decide where you should
217 | or should not use it.
218 | 
219 | 
220 | .. _xml-lib-adapters:
221 | 
222 | Library Adapters
223 | ----------------
224 | 
225 | To provide an abstraction layer over multiple underlying XML libraries, *xml4h*
226 | uses an "adapter" mechanism to mediate operations on documents. There is an
227 | adapter implementation for each library *xml4h* can work with, each of which
228 | extends the :class:`~xml4h.impls.interface.XmlImplAdapter` class. This base
229 | class includes some standard behaviour, and defines the interface for adapter
230 | implementations (to the extent you can define such interfaces in Python).
231 | 
232 | The current version of *xml4h* includes adapter implementations for the three
233 | main XML processing libraries for Python:
234 | 
235 | - :class:`~xml4h.impls.lxml_etree.LXMLAdapter` works with the excellent
236 |   `lxml <http://lxml.de>`_ library which is very full-featured and fast, but
237 |   which is not included in the standard library.
238 | - :class:`~xml4h.impls.xml_etree_elementtree.cElementTreeAdapter` and
239 |   :class:`~xml4h.impls.xml_etree_elementtree.ElementTreeAdapter` work with the
240 |   *ElementTree* libraries included with the standard library of Python versions
241 |   2.7 and later. *ElementTree* is fast and includes support for some basic
242 |   XPath expressions. If the C-based version of ElementTree is available, the
243 |   former adapter is made available and should be used for best performance.
244 | - :class:`~xml4h.impls.xml_dom_minidom.XmlDomImplAdapter` works with the
245 |   `minidom <http://docs.python.org/2/library/xml.dom.minidom.html>`_ W3C-style
246 |   XML library included with the standard library. This library is always
247 |   available but is slower and has fewer features than alternative libraries
248 |   (e.g. no support for XPath).
249 | 
250 | The adapter layer allows the rest of the *xml4h* library code to remain almost
251 | entirely oblivious to the underlying XML library that happens to be available
252 | at the time. The *xml4h* Builder, Node objects, writer etc. call adapter
253 | methods to perform document operations, and the adapter is responsible for
254 | doing the necessary work with the underlying library.
255 | 
256 | 
257 | .. _best-adapter:
258 | 
259 | "Best" Adapter
260 | --------------
261 | 
262 | While *xml4h* can work with multiple underlying XML libraries, some of these
263 | libraries are better (faster, more fully-featured) than others so it would be
264 | smart to use the best of the libraries available.
265 | 
266 | *xml4h* does exactly that: unless you explicitly choose an adapter (see below)
267 | *xml4h* will find the supported libraries in the Python environment and choose
268 | the "best" adapter for you in the circumstances.
269 | 
270 | Here is the list of libraries *xml4h* will choose from, best to least-best:
271 | 
272 | - *lxml*
273 | - *cElementTree*
274 | - *ElementTree*
275 | - *minidom*
276 | 
277 | The :attr:`xml4h.best_adapter` attribute stores the adapter class that *xml4h*
278 | considers to be the best.
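
The general idea of preferring the best available library can be sketched with a simple import fallback. This is an illustration of the pattern only, not the actual *xml4h* selection code; ``CANDIDATES`` and ``pick_best_library`` are hypothetical names invented for this example:

```python
import importlib

# Module names in order of preference, best first (hypothetical list)
CANDIDATES = ['lxml.etree', 'xml.etree.ElementTree', 'xml.dom.minidom']


def pick_best_library(candidates=CANDIDATES):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed in this environment, try the next candidate
            continue
    raise ImportError('No supported XML library is available')


best = pick_best_library()
print(best.__name__)
```

Because the two standard library modules are always importable, the fallback only ever matters for the optional *lxml* dependency.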
279 | 
280 | .. note::
281 |    You cannot always rely on *xml4h* to choose the right underlying XML library
282 |    for your needs. For cases where you need to use a specific library, such as
283 |    when you have a pre-parsed document object, see `wrap-unwrap-nodes`_.
284 | 
285 | 
286 | Choose Your Own Adapter
287 | -----------------------
288 | 
289 | By default, *xml4h* will choose an adapter and underlying XML library
290 | implementation that it considers the best available. However, in some cases you
291 | may need to have full control over which underlying implementation *xml4h*
292 | uses, perhaps because you will use features of the underlying XML
293 | implementation later on, or because you need the performance characteristics
294 | only available in a particular library.
295 | 
296 | For these situations it is possible to tell *xml4h* which adapter
297 | implementation, and therefore which underlying XML library, it should use.
298 | 
299 | To use a specific adapter implementation when parsing a document, or when
300 | creating a new document using the builder, simply provide the optional
301 | ``adapter`` keyword argument to the relevant method:
302 | 
303 | - Parsing::
304 | 
305 |     >>> # Explicitly use the minidom adapter to parse a document
306 |     >>> minidom_doc = xml4h.parse('tests/data/monty_python_films.xml',
307 |     ...                           adapter=xml4h.XmlDomImplAdapter)
308 |     >>> minidom_doc.root.impl_node  #doctest:+ELLIPSIS
309 |     <DOM Element: MontyPythonFilms at ...
310 | 
311 | - Building::
312 | 
313 |     >>> # Explicitly use the lxml adapter to build a document
314 |     >>> lxml_b = xml4h.build('MyDoc', adapter=xml4h.LXMLAdapter)
315 |     >>> lxml_b.root.impl_node  #doctest:+ELLIPSIS
316 |     <Element {http://www.w3.org/2000/xmlns/}MyDoc at ...
317 | 
318 | - Manipulating::
319 | 
320 |     >>> # Use xml4h with a cElementTree document object
321 |     >>> import xml.etree.ElementTree as ET
322 |     >>> et_doc = ET.parse('tests/data/monty_python_films.xml')
323 |     >>> et_doc  #doctest:+ELLIPSIS
324 |     <xml.etree.ElementTree.ElementTree object ...
325 |     >>> doc = xml4h.cElementTreeAdapter.wrap_document(et_doc)
326 |     >>> doc.root
327 |     <xml4h.nodes.Element: "MontyPythonFilms">
328 | 
329 | 
330 | Check Feature Support
331 | .....................
332 | 
333 | Because not all underlying XML libraries support all the features exposed by
334 | *xml4h*, the library includes a simple mechanism to check whether a given
335 | feature is available in the current Python environment or with the current
336 | adapter.
337 | 
338 | To check for feature support call the :meth:`~xml4h.nodes.Node.has_feature`
339 | method on a document node, or
340 | :meth:`~xml4h.impls.interface.XmlImplAdapter.has_feature` on an adapter class.
341 | 
342 | List of features that are not available in all adapters:
343 | 
344 | - ``xpath`` - Can perform XPath queries using the
345 |   :meth:`~xml4h.nodes.Node.xpath` method.
346 | - More to come later, probably...
347 | 
348 | For example, here is how you would test for XPath support in the *minidom*
349 | adapter, which doesn't include it::
350 | 
351 |     >>> minidom_doc.root.has_feature('xpath')
352 |     False
353 | 
354 | If you forget to check for a feature and use it anyway, you will get
355 | a :class:`~xml4h.exceptions.FeatureUnavailableException`::
356 | 
357 |     >>> try:
358 |     ...     minidom_doc.root.xpath('//*')
359 |     ... except Exception as e:
360 |     ...     e  #doctest:+ELLIPSIS
361 |     FeatureUnavailableException('xpath'...
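
The check-before-use pattern can be sketched without *xml4h* installed. Every class and function below (``BaseAdapter``, ``NoXPathAdapter``, ``XPathAdapter``, ``FeatureUnavailableError``, ``safe_xpath``) is a hypothetical stand-in written for this example, and the sketch makes no claim about how the real adapters are implemented:

```python
class FeatureUnavailableError(Exception):
    """Hypothetical stand-in for a 'feature unavailable' exception."""


class BaseAdapter:
    # Hypothetical: each adapter declares which optional features it supports
    SUPPORTED_FEATURES = {'xpath': False}

    @classmethod
    def has_feature(cls, name):
        return cls.SUPPORTED_FEATURES.get(name.lower(), False)

    def xpath(self, expr):
        # Fail loudly if the feature is used without being supported
        if not self.has_feature('xpath'):
            raise FeatureUnavailableError('xpath')
        return self._do_xpath(expr)


class NoXPathAdapter(BaseAdapter):
    """Adapter for a library without XPath support."""


class XPathAdapter(BaseAdapter):
    SUPPORTED_FEATURES = {'xpath': True}

    def _do_xpath(self, expr):
        return []  # placeholder result; a real adapter would run the query


def safe_xpath(adapter, expr):
    """Run an XPath query only when the adapter supports it."""
    if adapter.has_feature('xpath'):
        return adapter.xpath(expr)
    return None


print(safe_xpath(NoXPathAdapter(), '//*'))
print(safe_xpath(XPathAdapter(), '//*'))
```

Checking first lets calling code choose a fallback strategy, such as a manual traversal, instead of handling an exception after the fact.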
362 | 
363 | 
364 | Adapter & Implementation Quirks
365 | -------------------------------
366 | 
367 | Although *xml4h* aims to provide a seamless abstraction over underlying XML
368 | library implementations this isn't always possible, or is only possible by
369 | performing lots of extra work that affects performance. This section describes
370 | some implementation-specific quirks or differences you may encounter.
371 | 
372 | .. note::
373 |    This set of quirks is almost certainly incomplete; please report issues you
374 |    find so they can either be fixed (in the best case) or captured here as
375 |    known trouble-spots.
376 | 
377 | LXMLAdapter - *lxml*
378 | ....................
379 | 
380 | - *lxml* does not have full support for CDATA nodes, which devolve into plain
381 |   text node values when written (by *xml4h* or by *lxml*'s writer).
382 | - Namespaces defined by adding ``xmlns`` element attributes are not properly
383 |   represented in the underlying implementation due to the *lxml* library's
384 |   immutable ``nsmap`` namespace map. Such namespaces are written correctly
385 |   by the *xml4h* writer, but to avoid quirks it is best to specify the namespace
386 |   when creating nodes by setting the ``ns_uri`` keyword argument.
387 | - When *xml4h* writes *lxml*-based documents with namespaces, some node tag
388 |   names may have unnecessary namespace prefix aliases.
389 | 
390 | (c)ElementTreeAdapter - *ElementTree*
391 | .....................................
392 | 
393 | - Only the versions of (c)ElementTree included with Python version 2.7 and
394 |   later are supported.
395 | - *ElementTree* supports only a very limited subset of XPath for querying, so
396 |   although the ``has_feature('xpath')`` check returns ``True``, don't expect to
397 |   get the full power of XPath when you use this adapter.
398 | - *ElementTree* does not have full support for CDATA nodes, which devolve into
399 |   plain text node values when written (by *xml4h* or by *ElementTree*'s writer).
400 | - Because *ElementTree* doesn't retain information about a node's parent,
401 |   *xml4h* needs to build and maintain its own records of which nodes are
402 |   parents of which children. This extra overhead might harm performance or
403 |   memory usage.
404 | - *ElementTree* doesn't normally remember explicit namespace definition
405 |   directives when parsing a document. *xml4h* works around this when it is
406 |   asked to parse XML data, but if you parse data outside of *xml4h* and then
407 |   use the library on the resulting document, the namespace definitions will
408 |   not be represented correctly.
409 | 
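The parent-tracking point above can be illustrated with the standard library alone. This is a minimal sketch of the general technique (a child-to-parent map), not a copy of how *xml4h* implements it internally. The sample XML and the ``parent_of`` name are made up for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<Films><Film><Title>Life of Brian</Title></Film></Films>')

# ElementTree elements have no parent pointer, so record one per child.
# Every element in the tree is visited once to build the map.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find('./Film/Title')
print(parent_of[title].tag)             # Film
print(parent_of[parent_of[title]].tag)  # Films
print(root in parent_of)                # False, the root has no parent
```

A map like this has to be kept up to date whenever nodes are added or removed, which is the kind of extra bookkeeping that can cost time and memory.
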
410 | XmlImplAdapter - *minidom*
411 | ..........................
412 | 
413 | - No support for performing XPath queries.
414 | - Slower than alternative C-based implementations.
415 | 


--------------------------------------------------------------------------------
/docs/api.rst:
--------------------------------------------------------------------------------
 1 | ===
 2 | API
 3 | ===
 4 | 
 5 | 
 6 | Main Interface
 7 | --------------
 8 | 
 9 | .. automodule:: xml4h
10 |    :members: parse, build, best_adapter
11 | 
12 | 
13 | Builder
14 | -------
15 | 
16 | .. automodule:: xml4h.builder
17 |    :members:
18 | 
19 | 
20 | Writer
21 | ------
22 | 
23 | .. automodule:: xml4h.writer
24 |    :members:
25 | 
26 | 
27 | .. _api-nodes:
28 | 
29 | DOM Nodes API
30 | -------------
31 | 
32 | .. automodule:: xml4h.nodes
33 |    :members:
34 |    :special-members:
35 |    :private-members:
36 | 
37 | 
38 | XML Library Adapters
39 | ---------------------
40 | 
41 | .. automodule:: xml4h.impls.interface
42 |    :members:
43 | 
44 | .. automodule:: xml4h.impls.lxml_etree
45 |    :members:
46 | 
47 | .. automodule:: xml4h.impls.xml_etree_elementtree
48 |    :members:
49 | 
50 | .. automodule:: xml4h.impls.xml_dom_minidom
51 |    :members:
52 | 
53 | 
54 | Custom Exceptions
55 | -----------------
56 | 
57 | .. automodule:: xml4h.exceptions
58 |    :members:
59 | 


--------------------------------------------------------------------------------
/docs/builder.rst:
--------------------------------------------------------------------------------
  1 | .. _builder:
  2 | 
  3 | =======
  4 | Builder
  5 | =======
  6 | 
  7 | *xml4h* includes a document builder tool that makes it easy to create valid,
  8 | well-formed XML documents using relatively sparse Python code. It makes it so
  9 | easy to create XML that you will no longer be tempted to cobble together
 10 | documents with error-prone methods like manual string concatenation or a
 11 | templating library.
 12 | 
 13 | Internally, the builder uses the DOM-building features of an underlying XML
 14 | library which means it is (almost) impossible to construct an invalid document.
 15 | 
 16 | Here is some example code to build a document about Monty Python films::
 17 | 
 18 |     >>> import xml4h
 19 |     >>> xmlb = (xml4h.build('MontyPythonFilms')
 20 |     ...     .attributes({'source': 'http://en.wikipedia.org/wiki/Monty_Python'})
 21 |     ...     .element('Film')
 22 |     ...         .attributes({'year': 1971})
 23 |     ...         .element('Title')
 24 |     ...             .text('And Now for Something Completely Different')
 25 |     ...             .up()
 26 |     ...         .elem('Description').t(
 27 |     ...             "A collection of sketches from the first and second TV"
 28 |     ...             " series of Monty Python's Flying Circus purposely"
 29 |     ...             " re-enacted and shot for film.")
 30 |     ...             .up()
 31 |     ...         .up()
 32 |     ...     .elem('Film')
 33 |     ...         .attrs(year=1974)
 34 |     ...         .e('Title')
 35 |     ...             .t('Monty Python and the Holy Grail')
 36 |     ...             .up()
 37 |     ...         .e('Description').t(
 38 |     ...             "King Arthur and his knights embark on a low-budget search"
 39 |     ...             " for the Holy Grail, encountering humorous obstacles along"
 40 |     ...             " the way. Some of these turned into standalone sketches."
 41 |     ...             ).up()
 42 |     ...     )
 43 | 
 44 | The code above produces the following XML document (abbreviated)::
 45 | 
 46 |     >>> print(xmlb.xml_doc(indent=True))  # doctest:+ELLIPSIS
 47 |     <?xml version="1.0" encoding="utf-8"?>
 48 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
 49 |         <Film year="1971">
 50 |             <Title>And Now for Something Completely Different</Title>
 51 |             <Description>A collection of sketches from the first and second...
 52 |         </Film>
 53 |         <Film year="1974">
 54 |             <Title>Monty Python and the Holy Grail</Title>
 55 |             <Description>King Arthur and his knights embark on a low-budget...
 56 |         </Film>
 57 |     </MontyPythonFilms>
 58 |     <BLANKLINE>
 59 | 
 60 | 
 61 | Getting Started
 62 | ---------------
 63 | 
 64 | You typically create a new XML document builder by calling the
 65 | :func:`xml4h.build` function with the name of the root element::
 66 | 
 67 |     >>> root_b = xml4h.build('RootElement')
 68 | 
 69 | The function returns a :class:`~xml4h.builder.Builder` object that represents
 70 | the *RootElement* and allows you to manipulate this element's attributes
 71 | or to add child elements.
 72 | 
 73 | Once you have the first builder instance, every action you perform to add
 74 | content to the XML document will return another instance of the Builder class::
 75 | 
 76 |     >>> # Add attributes to the root element's Builder
 77 |     >>> root_b = root_b.attributes({'a': 1, 'b': 2}, c=3)
 78 | 
 79 |     >>> root_b  #doctest:+ELLIPSIS
 80 |     <xml4h.builder.Builder object ...
 81 | 
 82 | The Builder class always represents an underlying element in the DOM. The
 83 | :attr:`~xml4h.builder.Builder.dom_element` attribute returns the element node:: 
 84 | 
 85 |     >>> root_b.dom_element
 86 |     <xml4h.nodes.Element: "RootElement">
 87 | 
 88 |     >>> root_b.dom_element.attributes
 89 |     <xml4h.nodes.AttributeDict: [('a', '1'), ('b', '2'), ('c', '3')]>
 90 | 
 91 | When you add a new child element, the result is a builder instance representing
 92 | that child element, *not the original element*::
 93 | 
 94 |     >>> child1_b = root_b.element('ChildElement1')
 95 |     >>> child2_b = root_b.element('ChildElement2')
 96 | 
 97 |     >>> # The element method returns a Builder wrapping the new child element
 98 |     >>> child2_b.dom_element
 99 |     <xml4h.nodes.Element: "ChildElement2">
100 |     >>> child2_b.dom_element.parent
101 |     <xml4h.nodes.Element: "RootElement">
102 | 
103 | This feature of the builder can be a little confusing, but it allows for the
104 | very convenient method-chaining feature that gives the builder its power.
105 | 
106 | 
107 | .. _builder-method-chaining:
108 | 
109 | Method Chaining
110 | ---------------
111 | 
112 | Because every builder method that adds content to the XML document returns
113 | a builder instance representing the nearest (or newest) element, you can
114 | chain together many method calls to construct your document without any
115 | need for intermediate variables.
116 | 
117 | For example, the code in the previous section used the variables
118 | ``root_b``, ``child1_b`` and ``child2_b`` to represent builder instances but
119 | this is not necessary. Here is how you can use method-chaining to build the
120 | same document with less code::
121 | 
122 |     >>> b = (xml4h
123 |     ...     .build('RootElement').attributes({'a': 1, 'b': 2}, c=3)
124 |     ...         .element('ChildElement1').up()  # NOTE the up() method
125 |     ...         .element('ChildElement2')
126 |     ...     )
127 | 
128 |     >>> print(b.xml_doc(indent=4))
129 |     <?xml version="1.0" encoding="utf-8"?>
130 |     <RootElement a="1" b="2" c="3">
131 |         <ChildElement1/>
132 |         <ChildElement2/>
133 |     </RootElement>
134 |     <BLANKLINE>
135 | 
136 | Notice how you can use chained method calls to write code with a structure
137 | that mirrors that of the XML document you want to produce? This makes it
138 | much easier to spot errors in your code than it would be if you were to
139 | concatenate strings.
140 | 
141 | .. note::
142 | 
143 |    It is a good idea to wrap the :func:`~xml4h.build` function call and all
144 |    following chained methods in parentheses, so you don't need to put
145 |    backslash (\\) characters at the end of every line.
146 | 
147 | The code above introduces a very important builder method:
148 | :meth:`~xml4h.builder.Builder.up`. This method returns a builder instance
149 | representing the current element's parent, or indeed any ancestor.
150 | 
151 | Without the ``up()`` method, every time you created a child element with the
152 | builder you would end up deeper in the document structure with no way to return
153 | to prior elements to add sibling nodes or hierarchies.
154 | 
155 | To help reduce the number of ``up()`` method calls you need to include in
156 | your code, this method can also jump up multiple levels or to a named ancestor
157 | element::
158 | 
159 |     >>> # A builder that references a deeply-nested element:
160 |     >>> deep_b = (xml4h.build('Root')
161 |     ...     .element('Deep')
162 |     ...         .element('AndDeeper')
163 |     ...             .element('AndDeeperStill')
164 |     ...                 .element('UntilWeGetThere')
165 |     ...     )
166 |     >>> deep_b.dom_element
167 |     <xml4h.nodes.Element: "UntilWeGetThere">
168 | 
169 |     >>> # Jump up 4 levels, back to the root element
170 |     >>> deep_b.up(4).dom_element
171 |     <xml4h.nodes.Element: "Root">
172 | 
173 |     >>> # Jump up to a named ancestor element
174 |     >>> deep_b.up('Root').dom_element
175 |     <xml4h.nodes.Element: "Root">
176 | 
177 | .. note::
178 |    To avoid making subtle errors in your document's structure, we recommend you
179 |    use :meth:`~xml4h.builder.Builder.up` calls to return up one level for every
180 |    :meth:`~xml4h.builder.Builder.element` method (or alias) you call.
181 | 
182 | 
183 | Shorthand Methods
184 | -----------------
185 | 
186 | To make your XML-producing code even less verbose and quicker to type, the
187 | builder has shorthand "alias" methods corresponding to the full names.
188 | 
189 | For example, instead of calling ``element()`` to create a new
190 | child element, you can instead use the equivalent ``elem()`` or ``e()``
191 | methods. Similarly, instead of typing ``attributes()`` you can use ``attrs()``
192 | or ``a()``.
193 | 
194 | Here are the methods and method aliases for adding content to an XML document:
195 | 
196 | ======================  ==========================  ================
197 | XML Node Created        Builder method              Aliases
198 | ======================  ==========================  ================
199 | Element                 ``element``                 ``elem``, ``e``
200 | Attribute               ``attributes``              ``attrs``, ``a``
201 | Text                    ``text``                    ``t``
202 | CDATA                   ``cdata``                   ``data``, ``d``
203 | Comment                 ``comment``                 ``c``
204 | Processing Instruction  ``processing_instruction``  ``inst``, ``i``
205 | ======================  ==========================  ================
206 | 
207 | These shorthand method aliases are convenient and lead to even less cruft
208 | around the actual XML content you are interested in. But on the other hand
209 | they are much less explicit than the longer versions, so use them judiciously.
210 | 
211 | 
212 | Access the DOM
213 | --------------
214 | 
215 | The XML builder is merely a layer of convenience methods that sits on the
216 | :mod:`xml4h.nodes` DOM API. This means you can quickly access the underlying
217 | nodes from a builder if you need to inspect them or manipulate them in a
218 | way the builder doesn't allow:
219 | 
220 | - The :attr:`~xml4h.builder.Builder.dom_element` attribute returns a builder's
221 |   underlying :class:`~xml4h.nodes.Element`
222 | - The :attr:`~xml4h.builder.Builder.root` attribute returns the document's
223 |   root element.
224 | - The :attr:`~xml4h.builder.Builder.document` attribute returns a builder's
225 |   underlying :class:`~xml4h.nodes.Document`.
226 | 
227 | See the :ref:`api-nodes` documentation to find out how to work with DOM
228 | element nodes once you get them.
229 | 
230 | 
231 | Building on an Existing DOM
232 | ---------------------------
233 | 
234 | When you are building an XML document from scratch you will generally use
235 | the :func:`~xml4h.build` function described in `Getting Started`_. However,
236 | what if you want to add content to the DOM of a document you already parsed?
237 | 
238 | To wrap an :class:`~xml4h.nodes.Element` DOM node with a builder you simply
239 | provide the element node to the same :func:`~xml4h.build` function used
240 | previously, and it will return a builder that wraps that element.
241 | 
242 | Here is an example of parsing an existing XML document, locating an element
243 | of interest, constructing a builder from that element, and adding some new
244 | content. Luckily, the code is simpler than that description...
245 | 
246 | ::
247 | 
248 |     >>> # Parse an XML document
249 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
250 | 
251 |     >>> # Find an Element node of interest
252 |     >>> lob_film_elem = doc.MontyPythonFilms.Film[2]
253 |     >>> lob_film_elem.Title.text
254 |     "Monty Python's Life of Brian"
255 | 
256 |     >>> # Construct a builder from the element
257 |     >>> lob_builder = xml4h.build(lob_film_elem)
258 | 
259 |     >>> # Add content
260 |     >>> b = (lob_builder.attrs(stars=5)
261 |     ...     .elem('Review').t('One of my favourite films!').up())
262 | 
263 |     >>> # See the results
264 |     >>> print(lob_builder.xml())  # doctest:+ELLIPSIS
265 |     <Film stars="5" year="1979">
266 |         <Title>Monty Python's Life of Brian</Title>
267 |         <Description>Brian is born on the first Christmas, in the stable...
268 |         <Review>One of my favourite films!</Review>
269 |     </Film>
270 | 
271 | 
272 | Hydra-Builder
273 | -------------
274 | 
275 | Because each builder class instance is independent, an advanced technique for
276 | constructing complex documents is to use multiple builders anchored at
277 | different places in the DOM. In some situations, the ability to add content
278 | to different places in the same document can be very handy.
279 | 
280 | Here is a trivial example of this technique::
281 | 
282 |     >>> # Create two Elements in a doc to store even or odd numbers
283 |     >>> odd_b = xml4h.build('EvenAndOdd').elem('Odd')
284 |     >>> even_b = odd_b.up().elem('Even')
285 | 
286 |     >>> # Populate the numbers from a loop
287 |     >>> for i in range(1, 11):  # doctest:+ELLIPSIS
288 |     ...     if i % 2 == 0:
289 |     ...         even_b.elem('Number').text(i)
290 |     ...     else:
291 |     ...         odd_b.elem('Number').text(i)
292 |     <...
293 | 
294 |     >>> # Check the final document
295 |     >>> print(odd_b.xml_doc(indent=True))
296 |     <?xml version="1.0" encoding="utf-8"?>
297 |     <EvenAndOdd>
298 |         <Odd>
299 |             <Number>1</Number>
300 |             <Number>3</Number>
301 |             <Number>5</Number>
302 |             <Number>7</Number>
303 |             <Number>9</Number>
304 |         </Odd>
305 |         <Even>
306 |             <Number>2</Number>
307 |             <Number>4</Number>
308 |             <Number>6</Number>
309 |             <Number>8</Number>
310 |             <Number>10</Number>
311 |         </Even>
312 |     </EvenAndOdd>
313 |     <BLANKLINE>
314 | 


--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | #
  3 | # xml4h documentation build configuration file, created by
  4 | # sphinx-quickstart on Thu Aug 30 22:29:54 2012.
  5 | #
  6 | # This file is execfile()d with the current directory set to its containing dir.
  7 | #
  8 | # Note that not all possible configuration values are present in this
  9 | # autogenerated file.
 10 | #
 11 | # All configuration values have a default; values that are commented out
 12 | # serve to show the default.
 13 | 
 14 | import sys, os
 15 | from xml4h import __version__
 16 | 
 17 | # If extensions (or modules to document with autodoc) are in another directory,
 18 | # add these directories to sys.path here. If the directory is relative to the
 19 | # documentation root, use os.path.abspath to make it absolute, like shown here.
 20 | #sys.path.insert(0, os.path.abspath('.'))
 21 | 
 22 | # -- General configuration -----------------------------------------------------
 23 | 
 24 | # If your documentation needs a minimal Sphinx version, state it here.
 25 | #needs_sphinx = '1.0'
 26 | 
 27 | # Add any Sphinx extension module names here, as strings. They can be extensions
 28 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 29 | extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode']
 30 | 
 31 | # Add any paths that contain templates here, relative to this directory.
 32 | templates_path = ['_templates']
 33 | 
 34 | # The suffix of source filenames.
 35 | source_suffix = '.rst'
 36 | 
 37 | # The encoding of source files.
 38 | #source_encoding = 'utf-8-sig'
 39 | 
 40 | # The master toctree document.
 41 | master_doc = 'index'
 42 | 
 43 | # General information about the project.
 44 | project = 'xml4h'
 45 | copyright = '2020, James Murty'
 46 | 
 47 | # The version info for the project you're documenting, acts as replacement for
 48 | # |version| and |release|, also used in various other places throughout the
 49 | # built documents.
 50 | #
 51 | # The short X.Y version.
 52 | version = __version__
 53 | # The full version, including alpha/beta/rc tags.
 54 | release = version
 55 | 
 56 | # The language for content autogenerated by Sphinx. Refer to documentation
 57 | # for a list of supported languages.
 58 | #language = None
 59 | 
 60 | # There are two options for replacing |today|: either, you set today to some
 61 | # non-false value, then it is used:
 62 | #today = ''
 63 | # Else, today_fmt is used as the format for a strftime call.
 64 | #today_fmt = '%B %d, %Y'
 65 | 
 66 | # List of patterns, relative to source directory, that match files and
 67 | # directories to ignore when looking for source files.
 68 | exclude_patterns = ['_build']
 69 | 
 70 | # The reST default role (used for this markup: `text`) to use for all documents.
 71 | #default_role = None
 72 | 
 73 | # If true, '()' will be appended to :func: etc. cross-reference text.
 74 | #add_function_parentheses = True
 75 | 
 76 | # If true, the current module name will be prepended to all description
 77 | # unit titles (such as .. function::).
 78 | #add_module_names = True
 79 | 
 80 | # If true, sectionauthor and moduleauthor directives will be shown in the
 81 | # output. They are ignored by default.
 82 | #show_authors = False
 83 | 
 84 | # The name of the Pygments (syntax highlighting) style to use.
 85 | pygments_style = 'sphinx'
 86 | 
 87 | # A list of ignored prefixes for module index sorting.
 88 | #modindex_common_prefix = []
 89 | 
 90 | 
 91 | # -- Options for HTML output ---------------------------------------------------
 92 | 
 93 | # The theme to use for HTML and HTML Help pages.  See the documentation for
 94 | # a list of builtin themes.
 95 | html_theme = 'default'
 96 | 
 97 | # Theme options are theme-specific and customize the look and feel of a theme
 98 | # further.  For a list of options available for each theme, see the
 99 | # documentation.
100 | #html_theme_options = {}
101 | 
102 | # Add any paths that contain custom themes here, relative to this directory.
103 | #html_theme_path = []
104 | 
105 | # The name for this set of Sphinx documents.  If None, it defaults to
106 | # "<project> v<release> documentation".
107 | #html_title = None
108 | 
109 | # A shorter title for the navigation bar.  Default is the same as html_title.
110 | #html_short_title = None
111 | 
112 | # The name of an image file (relative to this directory) to place at the top
113 | # of the sidebar.
114 | #html_logo = None
115 | 
116 | # The name of an image file (within the static path) to use as favicon of the
117 | # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
118 | # pixels large.
119 | #html_favicon = None
120 | 
121 | # Add any paths that contain custom static files (such as style sheets) here,
122 | # relative to this directory. They are copied after the builtin static files,
123 | # so a file named "default.css" will overwrite the builtin "default.css".
124 | html_static_path = ['_static']
125 | 
126 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
127 | # using the given strftime format.
128 | #html_last_updated_fmt = '%b %d, %Y'
129 | 
130 | # If true, SmartyPants will be used to convert quotes and dashes to
131 | # typographically correct entities.
132 | #html_use_smartypants = True
133 | 
134 | # Custom sidebar templates, maps document names to template names.
135 | #html_sidebars = {}
136 | 
137 | # Additional templates that should be rendered to pages, maps page names to
138 | # template names.
139 | #html_additional_pages = {}
140 | 
141 | # If false, no module index is generated.
142 | #html_domain_indices = True
143 | 
144 | # If false, no index is generated.
145 | #html_use_index = True
146 | 
147 | # If true, the index is split into individual pages for each letter.
148 | #html_split_index = False
149 | 
150 | # If true, links to the reST sources are added to the pages.
151 | #html_show_sourcelink = True
152 | 
153 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
154 | #html_show_sphinx = True
155 | 
156 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
157 | #html_show_copyright = True
158 | 
159 | # If true, an OpenSearch description file will be output, and all pages will
160 | # contain a <link> tag referring to it.  The value of this option must be the
161 | # base URL from which the finished HTML is served.
162 | #html_use_opensearch = ''
163 | 
164 | # This is the file name suffix for HTML files (e.g. ".xhtml").
165 | #html_file_suffix = None
166 | 
167 | # Output file base name for HTML help builder.
168 | htmlhelp_basename = 'xml4hdoc'
169 | 
170 | 
171 | # -- Options for LaTeX output --------------------------------------------------
172 | 
173 | latex_elements = {
174 | # The paper size ('letterpaper' or 'a4paper').
175 | #'papersize': 'letterpaper',
176 | 
177 | # The font size ('10pt', '11pt' or '12pt').
178 | #'pointsize': '10pt',
179 | 
180 | # Additional stuff for the LaTeX preamble.
181 | #'preamble': '',
182 | }
183 | 
184 | # Grouping the document tree into LaTeX files. List of tuples
185 | # (source start file, target name, title, author, documentclass [howto/manual]).
186 | latex_documents = [
187 |   ('index', 'xml4h.tex', 'xml4h Documentation',
188 |    'James Murty', 'manual'),
189 | ]
190 | 
191 | # The name of an image file (relative to this directory) to place at the top of
192 | # the title page.
193 | #latex_logo = None
194 | 
195 | # For "manual" documents, if this is true, then toplevel headings are parts,
196 | # not chapters.
197 | #latex_use_parts = False
198 | 
199 | # If true, show page references after internal links.
200 | #latex_show_pagerefs = False
201 | 
202 | # If true, show URL addresses after external links.
203 | #latex_show_urls = False
204 | 
205 | # Documents to append as an appendix to all manuals.
206 | #latex_appendices = []
207 | 
208 | # If false, no module index is generated.
209 | #latex_domain_indices = True
210 | 
211 | 
212 | # -- Options for manual page output --------------------------------------------
213 | 
214 | # One entry per manual page. List of tuples
215 | # (source start file, name, description, authors, manual section).
216 | man_pages = [
217 |     ('index', 'xml4h', 'xml4h Documentation',
218 |      ['James Murty'], 1)
219 | ]
220 | 
221 | # If true, show URL addresses after external links.
222 | #man_show_urls = False
223 | 
224 | 
225 | # -- Options for Texinfo output ------------------------------------------------
226 | 
227 | # Grouping the document tree into Texinfo files. List of tuples
228 | # (source start file, target name, title, author,
229 | #  dir menu entry, description, category)
230 | texinfo_documents = [
231 |   ('index', 'xml4h', 'xml4h Documentation',
232 |    'James Murty', 'xml4h', 'One line description of project.',
233 |    'Miscellaneous'),
234 | ]
235 | 
236 | # Documents to append as an appendix to all manuals.
237 | #texinfo_appendices = []
238 | 
239 | # If false, no module index is generated.
240 | #texinfo_domain_indices = True
241 | 
242 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
243 | #texinfo_show_urls = 'footnote'
244 | 


--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
 1 | .. xml4h documentation master file, created by
 2 |    sphinx-quickstart on Thu Aug 30 22:29:54 2012.
 3 |    You can adapt this file completely to your liking, but it should at least
 4 |    contain the root `toctree` directive.
 5 | 
 6 | .. include:: ../README.rst
 7 | 
 8 | 
 9 | ==========
10 | User Guide
11 | ==========
12 | 
13 | .. toctree::
14 |    :maxdepth: 3
15 | 
16 |    parser
17 |    builder
18 |    writer
19 |    nodes
20 |    advanced
21 |    api
22 | 
23 | 
24 | ==================
25 | Indices and tables
26 | ==================
27 | 
28 | * :ref:`genindex`
29 | * :ref:`modindex`
30 | * :ref:`search`
31 | 
32 | 


--------------------------------------------------------------------------------
/docs/nodes.rst:
--------------------------------------------------------------------------------
  1 | =========
  2 | DOM Nodes
  3 | =========
  4 | 
  5 | *xml4h* provides node objects and convenience methods that make it easier to
  6 | work with an in-memory XML document object model (DOM).
  7 | 
  8 | This section of the document covers the main features of *xml4h* nodes.
  9 | For the full API-level documentation see :ref:`api-nodes`.
 10 | 
 11 | .. _node-traversal:
 12 | 
 13 | Traversing Nodes
 14 | ----------------
 15 | 
 16 | *xml4h* aims to provide a simple and intuitive API for traversing and
 17 | manipulating the XML DOM. To that end it includes a number of convenience
 18 | methods for performing common tasks:
 19 | 
 20 | - Get the :class:`~xml4h.nodes.Document` or root :class:`~xml4h.nodes.Element`
 21 |   from any node via the ``document`` and ``root`` attributes respectively.
 22 | - You can get the ``name`` attribute of nodes that have a name, or look up
 23 |   the different name components with ``prefix`` to get the namespace prefix
 24 |   (if any) and ``local_name`` to get the name portion without the prefix.
 25 | - Nodes that have a value expose it via the ``value`` attribute.
 26 | - A node's ``parent`` attribute returns its parent, while the ``ancestors``
 27 |   attribute returns a list containing its parent, grand-parent,
 28 |   great-grand-parent etc.
 29 | - A node's ``children`` attribute returns the child nodes that belong to it,
 30 |   while the ``siblings`` attribute returns all other nodes that belong to its
 31 |   parent. You can also get the ``siblings_before`` or ``siblings_after`` the
 32 |   current node.
 33 | - Look up a node's namespace URI with ``namespace_uri`` or the alias
 34 |   ``ns_uri``.
 35 | - Check what type of :class:`~xml4h.nodes.Node` you have with Boolean
 36 |   attributes like ``is_element``, ``is_text``, ``is_entity`` etc.
 37 | 
 38 | 
 39 | .. _magical-node-traversal:
 40 | 
 41 | "Magical" Node Traversal
 42 | ------------------------
 43 | 
 44 | To make it easy to traverse XML documents with a known structure *xml4h*
 45 | performs some minor magic when you look up attributes or keys on Document
 46 | and Element nodes.  If you like, you can take advantage of magical traversal
 47 | to avoid peppering your code with ``find`` and ``xpath`` searches, or with
 48 | ``child`` and ``children`` node attribute lookups.
 49 | 
 50 | The principle is simple:
 51 | 
 52 | - Child elements are available as Python attributes of the parent element.
 53 | - XML element attributes are available through dict-style key lookups on the
 54 |   owning element.
 55 | 
 56 | Here is an example of retrieving information from our Monty Python films
 57 | document using element names as Python attributes (``MontyPythonFilms``,
 58 | ``Film``, ``Title``) and XML attribute names as Python keys (``year``)::
 59 | 
 60 |     >>> # Parse an example XML document about Monty Python films
 61 |     >>> import xml4h
 62 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
 63 | 
 64 |     >>> for film in doc.MontyPythonFilms.Film:
 65 |     ...     print(film['year'] + ' : ' + film.Title.text)  # doctest:+ELLIPSIS
 66 |     1971 : And Now for Something Completely Different
 67 |     1974 : Monty Python and the Holy Grail
 68 |     ...
 69 | 
 70 | Python class attribute lookups of child elements work very well when your XML
 71 | document contains only camel-case tag names ``LikeThisOne`` or ``LikeThat``.
 72 | However, if your document contains lower-case tag names there is a chance the
 73 | element names will clash with existing Python attribute or method names in the
 74 | *xml4h* classes.
 75 | 
 76 | To work around this potential issue you can add an underscore (``_``)
 77 | character at the end of a magical attribute lookup to avoid the naming clash;
 78 | *xml4h* will remove that character before looking for a child element. For
 79 | example, to look up a child of the element ``elem1`` which is named ``child``,
 80 | the code ``elem1.child_`` will return the child element whereas ``elem1.child``
 81 | would access the :meth:`~xml4h.nodes.Node.child` Node method instead.
 82 | 
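The trailing-underscore rule can be sketched with a small wrapper around a standard library element. This is only an illustration of the idea under simplifying assumptions: ``MagicElement`` is a hypothetical class, not *xml4h*'s real node class, and the actual lookup logic in the library is more involved.

```python
import xml.etree.ElementTree as ET


class MagicElement(object):
    """Hypothetical wrapper that mimics magical child-element lookups."""

    def __init__(self, etree_element):
        self._elem = etree_element

    def child(self):
        """A real method whose name could clash with a <child> tag."""
        return 'method called'

    @property
    def text(self):
        return self._elem.text

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup fails, so real
        # methods such as child() always win over same-named child elements.
        if name.startswith('_'):
            raise AttributeError(name)
        # Strip one trailing underscore before searching for a child tag.
        tag = name[:-1] if name.endswith('_') else name
        found = self._elem.find(tag)
        if found is None:
            raise AttributeError(name)
        return MagicElement(found)


elem1 = MagicElement(ET.fromstring('<parent><child>hello</child></parent>'))
print(elem1.child())      # method called
print(elem1.child_.text)  # hello
```

Because the fallback only runs when ordinary attribute lookup fails, the underscore suffix is what lets a caller reach a child element whose tag name matches an existing method.
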
 83 | .. note::
 84 |    Not all XML child element tag names are accessible using magical traversal.
 85 |    Names with leading underscore characters will not work, nor will names
 86 |    containing hyphens because they are not valid Python attribute names. If you
 87 |    have to deal with XML names like this, use the full API methods like
 88 |    :meth:`~xml4h.nodes.Node.child` and :meth:`~xml4h.nodes.Node.children`
 89 |    instead.
 90 | 
 91 | All the gory details about how magical traversal works are documented at
 92 | :class:`~xml4h.nodes.NodeAttrAndChildElementLookupsMixin`.  Depending on how
 93 | you feel about magical behaviour this feature might feel like a great
 94 | convenience, or black magic that makes you wary. The right attitude probably
 95 | lies somewhere in the middle...
 96 | 
 97 | .. warning::
 98 |    The behaviour of namespaced XML elements and attributes is inconsistent.
 99 |    You can do magical traversal of elements regardless of what namespace the
100 |    elements are in, but to look up XML attributes with a namespace prefix
101 |    you must include that prefix in the name e.g. ``prefix:attribute-name``.
102 | 
103 | 
104 | Searching with Find and XPath
105 | -----------------------------
106 | 
107 | There are two ways to search for elements within an *xml4h* document: ``find``
108 | and ``xpath``.
109 | 
110 | The find methods provided by the library are easy to use but can only perform
111 | relatively simple searches that return :class:`~xml4h.nodes.Element` results.
112 | The ``xpath`` method requires familiarity with XPath query syntax, but in
113 | return it lets you perform more complex searches and get results other than
114 | just elements.
115 | 
116 | Find Methods
117 | ............
118 | 
119 | *xml4h* provides three different find methods:
120 | 
121 | - :meth:`~xml4h.nodes.Node.find` searches descendants of the current node for
122 |   elements matching the given constraints. You can search by element name,
123 |   by namespace URI, or with no constraints at all::
124 | 
125 |       >>> # Find ALL elements in the document
126 |       >>> elems = doc.find()
127 |       >>> [e.name for e in elems]  # doctest:+ELLIPSIS
128 |       ['MontyPythonFilms', 'Film', 'Title', 'Description', 'Film', 'Title', 'Description',...
129 | 
130 |       >>> # Find the seven <Film> elements in the XML document
131 |       >>> film_elems = doc.find('Film')
132 |       >>> [e.Title.text for e in film_elems]  # doctest:+ELLIPSIS
133 |       ['And Now for Something Completely Different', 'Monty Python and the Holy Grail',...
134 | 
135 |   Note that the :meth:`~xml4h.nodes.Node.find` method only finds descendants
136 |   of the node you run it on::
137 | 
138 |       >>> # Find <Title> elements in a single <Film> element; there's only one
139 |       >>> film_elem = doc.find('Film', first_only=True)
140 |       >>> film_elem.find('Title')
141 |       [<xml4h.nodes.Element: "Title">]
142 | 
143 | - :meth:`~xml4h.nodes.Node.find_first` searches descendants of the current
144 |   node but only returns the first result element, not a list. If there are no
145 |   matching element results this method returns *None*::
146 | 
147 |       >>> # Find the first <Film> element in the document
148 |       >>> doc.find_first('Film')
149 |       <xml4h.nodes.Element: "Film">
150 | 
151 |       >>> # Search for an element that does not exist
152 |       >>> print(doc.find_first('OopsWrongName'))
153 |       None
154 | 
155 |   If you were paying attention you may have noticed in the example above that
156 |   you can make the :meth:`~xml4h.nodes.Node.find` method do exactly the same
157 |   thing as :meth:`~xml4h.nodes.Node.find_first` by passing the keyword argument
158 |   ``first_only=True``.
159 | 
160 | - :meth:`~xml4h.nodes.Node.find_doc` is a convenience method that searches the
161 |   entire document no matter which node you run it on::
162 | 
163 |       >>> # Normal find only searches descendants of the current node
164 |       >>> len(film_elem.find('Title'))
165 |       1
166 | 
167 |       >>> # find_doc searches the entire document
168 |       >>> len(film_elem.find_doc('Title'))
169 |       7
170 | 
171 |   This method is exactly like calling ``xml4h_node.document.find()``, which is
172 |   actually what happens behind the scenes.
173 | 
174 | XPath Querying
175 | ..............
176 | 
177 | *xml4h* provides a single XPath search method which is available on
178 | :class:`~xml4h.nodes.Document` and :class:`~xml4h.nodes.Element` nodes:
179 | 
180 | :meth:`~xml4h.nodes.XPathMixin.xpath` takes an XPath query string and returns
181 | the result which may be a list of elements, a list of attributes, a list of
182 | values, or a single value. The result depends entirely on the kind of query you
183 | perform.
184 | 
185 | .. note::
186 |    XPath querying is currently only available if you use the *lxml* or
187 |    *ElementTree* implementation libraries. You can check whether the XPath
188 |    feature is available with :meth:`~xml4h.nodes.Node.has_feature`.
189 | 
190 | .. note::
191 |    Although *ElementTree* supports XPath queries, this support is
192 |    `very limited <http://effbot.org/zone/element-xpath.htm>`_ and most of the
193 |    example XPath queries below **will not work**. If you want to use XPath, you
194 |    should install *lxml* for better support.
195 | 
196 | XPath queries are powerful and complex so we cannot describe them in detail
197 | here, but we can at least present some useful examples. Here are queries that
198 | perform the same work as the find queries we saw above::
199 | 
200 |       >>> # Query for ALL elements in the document
201 |       >>> elems = doc.xpath('//*')  # doctest:+ELLIPSIS
202 |       >>> [e.name for e in elems]  # doctest:+ELLIPSIS
203 |       ['MontyPythonFilms', 'Film', 'Title', 'Description', 'Film', 'Title', 'Description',...
204 | 
205 |       >>> # Query for the seven <Film> elements in the XML document
206 |       >>> film_elems = doc.xpath('//Film')
207 |       >>> [e.Title.text for e in film_elems]  # doctest:+ELLIPSIS
208 |       ['And Now for Something Completely Different', 'Monty Python and the Holy Grail',...
209 | 
210 |       >>> # Query for the first <Film> element in the document (returns list)
211 |       >>> doc.xpath('//Film[1]')
212 |       [<xml4h.nodes.Element: "Film">]
213 | 
214 |       >>> # Query for <Title> elements in a single <Film> element; there's only one
215 |       >>> film_elem = doc.xpath('Film[1]')[0]
216 |       >>> film_elem.xpath('Title')
217 |       [<xml4h.nodes.Element: "Title">]
218 | 
219 | You can also do things with XPath queries that you simply cannot with the
220 | *find* method, such as find all the attributes of a certain name or apply
221 | rich constraints to the query::
222 | 
223 |       >>> # Query for all year attributes
224 |       >>> doc.xpath('//@year')
225 |       ['1971', '1974', '1979', '1982', '1983', '2009', '2012']
226 | 
227 |       >>> # Query for the title of the film released in 1982
228 |       >>> doc.xpath('//Film[@year="1982"]/Title/text()')
229 |       ['Monty Python Live at the Hollywood Bowl']
230 | 
231 | 
232 | Namespaces and XPath
233 | ....................
234 | 
235 | Finally, let's discuss how you can run XPath queries on documents with
236 | namespaces, because unfortunately this is not a simple subject.
237 | 
238 | First, you need to understand that if you are working with a namespaced
239 | document your XPath queries must refer to those namespaces or they will not
240 | find anything::
241 | 
242 |     >>> # Parse a namespaced version of the Monty Python Films doc
243 |     >>> ns_doc = xml4h.parse('tests/data/monty_python_films.ns.xml')
244 |     >>> print(ns_doc.xml())  #doctest:+ELLIPSIS
245 |     <?xml version="1.0" encoding="utf-8"?>
246 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python" xmlns="uri:monty-python" xmlns:work="uri:artistic-work">
247 |         <work:Film year="1971">
248 |             <Title>And Now for Something Completely Different</Title>
249 |             ...
250 | 
251 |     >>> # XPath queries without prefixes won't find namespaced elements
252 |     >>> ns_doc.xpath('//Film')
253 |     []
254 | 
255 | To refer to namespaced nodes in your query the namespace must have a prefix
256 | alias assigned to it. You can specify prefixes when you call the *xpath* method
257 | by providing a ``namespaces`` keyword argument with a dictionary of
258 | alias-to-URI mappings::
259 | 
260 |     >>> # Specify explicit prefix alias mappings
261 |     >>> films = ns_doc.xpath('//x:Film', namespaces={'x': 'uri:artistic-work'})
262 |     >>> len(films)
263 |     7
264 | 
265 | Or, preferably, if your document node already has prefix mappings you can use
266 | them directly::
267 | 
268 |     >>> # Our root node already has a 'work' prefix defined...
269 |     >>> ns_doc.root['xmlns:work']
270 |     'uri:artistic-work'
271 | 
272 |     >>> # ...so we can use this prefix directly
273 |     >>> films = ns_doc.root.xpath('//work:Film')
274 |     >>> len(films)
275 |     7
276 | 
277 | Another gotcha is when a document has a default namespace. The default
278 | namespace applies to every descendant node without its own namespace, but XPath
279 | doesn't have a good way of dealing with this since there is no such thing as
280 | a "default namespace" prefix alias.
281 | 
282 | *xml4h* helps out by providing just such an alias: the underscore (``_``)::
283 | 
284 |     >>> # Our document root has a default namespace
285 |     >>> ns_doc.root.ns_uri
286 |     'uri:monty-python'
287 | 
288 |     >>> # You need a prefix alias that refers to the default namespace
289 |     >>> ns_doc.xpath('//Title')
290 |     []
291 | 
292 |     >>> # You could specify it explicitly...
293 |     >>> titles = ns_doc.xpath('//x:Title',
294 |     ...                       namespaces={'x': ns_doc.root.ns_uri})
295 |     >>> len(titles)
296 |     7
297 | 
298 |     >>> # ...or use xml4h's special default namespace prefix: _
299 |     >>> titles = ns_doc.xpath('//_:Title')
300 |     >>> len(titles)
301 |     7
302 | 
303 | 
304 | Filtering Node Lists
305 | --------------------
306 | 
307 | Many *xml4h* node attributes return a list of nodes as a
308 | :class:`~xml4h.nodes.NodeList` object which confers some special filtering
309 | powers.  You get this special node list object from attributes like
310 | ``children``, ``ancestors``, and ``siblings``, and from the ``find`` search
311 | method if it has element results.
312 | 
313 | Here are some examples of how you can easily filter a
314 | :class:`~xml4h.nodes.NodeList` to get just the
315 | nodes you need:
316 | 
317 | - Get the first child node using the ``filter`` method::
318 | 
319 |       >>> # Filter to get just the first child
320 |       >>> doc.root.children.filter(first_only=True)
321 |       <xml4h.nodes.Element: "Film">
322 | 
323 |       >>> # The document has 7 <Film> element children of the root
324 |       >>> len(doc.root.children)
325 |       7
326 | 
327 | - Get the first child node by treating ``children`` as a callable::
328 | 
329 |       >>> doc.root.children(first_only=True)
330 |       <xml4h.nodes.Element: "Film">
331 | 
332 |   When you treat the node list as a callable it calls the ``filter`` method
333 |   behind the scenes, but since doing it the callable way is quicker and
334 |   clearer in code we will use that approach from now on.
335 | 
336 | - Get the first child node with the ``child`` filtering method, which accepts
337 |   the same constraints as the ``filter`` method::
338 | 
339 |       >>> doc.root.child()
340 |       <xml4h.nodes.Element: "Film">
341 | 
342 |       >>> # Apply filtering with child
343 |       >>> print(doc.root.child('WrongName'))
344 |       None
345 | 
346 | - Get the first of a set of children with the ``first`` attribute::
347 | 
348 |       >>> doc.root.children.first
349 |       <xml4h.nodes.Element: "Film">
350 | 
351 | 
352 | - Filter the node list by name::
353 | 
354 |       >>> for n in doc.root.children('Film'):
355 |       ...     print(n.Title.text)
356 |       And Now for Something Completely Different
357 |       Monty Python and the Holy Grail
358 |       Monty Python's Life of Brian
359 |       Monty Python Live at the Hollywood Bowl
360 |       Monty Python's The Meaning of Life
361 |       Monty Python: Almost the Truth (The Lawyer's Cut)
362 |       A Liar's Autobiography: Volume IV
363 | 
364 |       >>> len(doc.root.children('WrongName'))
365 |       0
366 | 
367 |   .. note::
368 |      Passing a node name as the first argument will match the *local* name of
369 |      a node. You can match the full node name, which might include a prefix
370 |      for example, with a call like: ``.children(name='SomeName')``.
371 | 
372 | - Filter with a custom function::
373 | 
374 |       >>> # Filter to films released in the year 1979
375 |       >>> for n in doc.root.children('Film',
376 |       ...         filter_fn=lambda node: node.attributes['year'] == '1979'):
377 |       ...     print(n.Title.text)
378 |       Monty Python's Life of Brian
379 | 
380 | 
381 | Manipulating Nodes and Elements
382 | -------------------------------
383 | 
384 | *xml4h* provides simple methods to manipulate the structure and content of an
385 | XML DOM. The methods available depend on the kind of node you are interacting
386 | with, and by far the majority are for working with
387 | :class:`~xml4h.nodes.Element` nodes.
388 | 
389 | 
390 | Delete a Node
391 | .............
392 | 
393 | Any node can be removed from its owner document with
394 | :meth:`~xml4h.nodes.Node.delete`::
395 | 
396 |     >>> # Before deleting a Film element there are 7 films
397 |     >>> len(doc.MontyPythonFilms.Film)
398 |     7
399 | 
400 |     >>> doc.MontyPythonFilms.children('Film')[-1].delete()
401 |     >>> len(doc.MontyPythonFilms.Film)
402 |     6
403 | 
404 | .. note::
405 |    By default deleting a node also destroys it, but it can optionally be left
406 |    intact after removal from the document by including the ``destroy=False``
407 |    option.
408 | 
409 | Name and Value Attributes
410 | .........................
411 | 
412 | Many nodes have low-level name and value properties that can be read from and
413 | written to.  Nodes with names and values include Text, CDATA, Comment,
414 | ProcessingInstruction, Attribute, and Element nodes.
415 | 
416 | Here is an example of accessing the low-level name and value properties of a
417 | Text node::
418 | 
419 |     >>> text_node = doc.MontyPythonFilms.child('Film').child('Title').child()
420 |     >>> text_node.is_text
421 |     True
422 | 
423 |     >>> text_node.name
424 |     '#text'
425 |     >>> text_node.value
426 |     'And Now for Something Completely Different'
427 | 
428 | And here is the same for an Attribute node::
429 | 
430 |     >>> # Access the name/value properties of an Attribute node
431 |     >>> year_attr = doc.MontyPythonFilms.child('Film').attribute_node('year')
432 |     >>> year_attr.is_attribute
433 |     True
434 | 
435 |     >>> year_attr.name
436 |     'year'
437 |     >>> year_attr.value
438 |     '1971'
439 | 
440 | The name attribute of a node is not necessarily a plain string; in the case of
441 | nodes within a defined namespace the ``name`` attribute may comprise two
442 | components: a ``prefix`` that represents the namespace, and a ``local_name``
443 | which is the plain name of the node ignoring the namespace. For more
444 | information on namespaces see :ref:`xml4h-namespaces`.
445 | 
446 | Import a Node and its Descendants
447 | .................................
448 | 
449 | In addition to manipulating nodes in a single XML document directly, you can
450 | also import a node (and all its descendants) from another document using a node
451 | clone or transplant operation.
452 | 
453 | There are two ways to import a node and its descendants:
454 | 
455 | - Use the :meth:`~xml4h.nodes.Node.clone_node` Node method or
456 |   :meth:`~xml4h.builder.Builder.clone` Builder method to copy a node into your
457 |   document without removing it from its original document.
458 | - Use the :meth:`~xml4h.nodes.Node.transplant_node` Node method or
459 |   :meth:`~xml4h.builder.Builder.transplant` Builder method to transplant a node
460 |   into your document and remove it from its original document.
461 | 
462 | Here is an example of transplanting a node into a document (which also happens
463 | to undo the damage we did to our example DOM in the ``delete()`` example
464 | above)::
465 | 
466 |     >>> # Build a new document containing a Film element
467 |     >>> film_builder = (xml4h.build('DeletedFilm')
468 |     ...     .element('Film').attrs(year='1971')
469 |     ...         .element('Title')
470 |     ...             .text('And Now for Something Completely Different').up()
471 |     ...         .element('Description').text(
472 |     ...             "A collection of sketches from the first and second TV"
473 |     ...             " series of Monty Python's Flying Circus purposely"
474 |     ...             " re-enacted and shot for film.")
475 |     ...     )
476 | 
477 |     >>> # Transplant the Film element from the new document
478 |     >>> node_to_transplant = film_builder.root.child('Film')
479 |     >>> doc.MontyPythonFilms.transplant_node(node_to_transplant)
480 |     >>> len(doc.MontyPythonFilms.Film)
481 |     7
482 | 
483 | When you transplant a node from another document it is removed from that
484 | document::
485 | 
486 |     >>> # After transplanting the Film node it is no longer in the original doc
487 |     >>> len(film_builder.root.find('Film'))
488 |     0
489 | 
490 | If you need to leave the original document unchanged when importing a node use
491 | the clone methods instead.
492 | 
493 | Working with Elements
494 | .....................
495 | 
496 | Element nodes have the most methods to access and manipulate their content,
497 | which is fitting since this is the most useful type of node and you will deal
498 | with elements regularly.
499 | 
500 | The leaf elements in XML documents often have one or more
501 | :class:`~xml4h.nodes.Text` node children that contain the element's data
502 | content. While you could iterate over such text nodes as child nodes, *xml4h*
503 | provides the more convenient text accessors you would expect::
504 | 
505 |     >>> title_elem = doc.MontyPythonFilms.Film[0].Title
506 |     >>> orig_title = title_elem.text
507 |     >>> orig_title
508 |     'And Now for Something Completely Different'
509 | 
510 |     >>> title_elem.text = 'A new, and wrong, title'
511 |     >>> title_elem.text
512 |     'A new, and wrong, title'
513 | 
514 |     >>> # Let's put it back the way it was...
515 |     >>> title_elem.text = orig_title
516 | 
517 | Elements also have attributes that can be manipulated in a number of ways.
518 | 
519 | Look up an element's attributes with:
520 | 
521 | - the :attr:`~xml4h.nodes.Element.attributes` attribute (or aliases ``attrib``
522 |   and ``attrs``), which returns an ordered dictionary of attribute names and
523 |   values::
524 | 
525 |       >>> film_elem = doc.MontyPythonFilms.Film[0]
526 |       >>> film_elem.attributes
527 |       <xml4h.nodes.AttributeDict: [('year', '1971')]>
528 | 
529 | - or by obtaining an element's attributes as :class:`~xml4h.nodes.Attribute`
530 |   nodes, though that is only likely to be useful in unusual circumstances::
531 | 
532 |       >>> film_elem.attribute_nodes
533 |       [<xml4h.nodes.Attribute: "year">]
534 | 
535 |       >>> # Get a specific attribute node by name or namespace URI
536 |       >>> film_elem.attribute_node('year')
537 |       <xml4h.nodes.Attribute: "year">
538 | 
539 | - and there's also the "magical" keyword lookup technique discussed in
540 |   :ref:`magical-node-traversal` for quickly grabbing attribute values.
541 | 
542 | Set attribute values with:
543 | 
544 | - the :meth:`~xml4h.nodes.Element.set_attributes` method, which allows you to
545 |   add attributes without replacing existing ones. This method also supports
546 |   defining XML attributes as a dictionary, list of name/value pairs, or
547 |   keyword arguments::
548 | 
549 |       >>> # Set/add attributes as a dictionary
550 |       >>> film_elem.set_attributes({'a1': 'v1'})
551 | 
552 |       >>> # Set/add attributes as a list of name/value pairs
553 |       >>> film_elem.set_attributes([('a2', 'v2')])
554 | 
555 |       >>> # Set/add attributes as keyword arguments
556 |       >>> film_elem.set_attributes(a3='v3', a4=4)
557 | 
558 |       >>> film_elem.attributes
559 |       <xml4h.nodes.AttributeDict: [('a1', 'v1'), ('a2', 'v2'), ('a3', 'v3'), ('a4', '4'), ('year', '1971')]>
560 | 
561 | - the setter version of the :attr:`~xml4h.nodes.Element.attributes` attribute,
562 |   which replaces any existing attributes with the new set::
563 | 
564 |       >>> film_elem.attributes = {'year': '1971', 'note': 'funny'}
565 |       >>> film_elem.attributes
566 |       <xml4h.nodes.AttributeDict: [('note', 'funny'), ('year', '1971')]>
567 | 
568 | Delete attributes from an element by:
569 | 
570 | - using Python's delete-in-dict technique::
571 | 
572 |       >>> del(film_elem.attributes['note'])
573 |       >>> film_elem.attributes
574 |       <xml4h.nodes.AttributeDict: [('year', '1971')]>
575 | 
576 | - or by calling the ``delete()`` method on an :class:`~xml4h.nodes.Attribute`
577 |   node.
578 | 
579 | Finally, the :class:`~xml4h.nodes.Element` class provides a number of methods
580 | for programmatically adding child nodes, for cases where you would rather work
581 | directly with nodes instead of using a :ref:`builder`.
582 | 
583 | The most complex of these methods is :meth:`~xml4h.nodes.Element.add_element`
584 | which allows you to add a named child element, and optionally to set the new
585 | element's namespace, text content, and attributes all at the same time. Let's
586 | try an example::
587 | 
588 |     >>> # Add a Film element with an attribute
589 |     >>> new_film_elem = doc.MontyPythonFilms.add_element(
590 |     ...     'Film', attributes={'year': 'never'})
591 | 
592 |     >>> # Add a Description element with text content
593 |     >>> desc_elem = new_film_elem.add_element(
594 |     ...     'Description', text='Just testing...')
595 | 
596 |     >>> # Add a Title element with text *before* the description element
597 |     >>> title_elem = desc_elem.add_element(
598 |     ...     'Title', text='The Film that Never Was', before_this_element=True)
599 | 
600 |     >>> print(doc.MontyPythonFilms.Film[-1].xml())
601 |     <Film year="never">
602 |         <Title>The Film that Never Was</Title>
603 |         <Description>Just testing...</Description>
604 |     </Film>
605 | 
606 | There are similar methods for handling simpler cases like adding text nodes,
607 | comments etc. Here is an example of adding text nodes::
608 | 
609 |     >>> # Add a text node
610 |     >>> title_elem = doc.MontyPythonFilms.Film[-1].Title
611 |     >>> title_elem.add_text(', and Never Will Be')
612 | 
613 |     >>> title_elem.text
614 |     'The Film that Never Was, and Never Will Be'
615 | 
616 | Refer to the :class:`~xml4h.nodes.Element` documentation for more information
617 | about the other methods for adding nodes.
618 | 
619 | 
620 | .. _wrap-unwrap-nodes:
621 | 
622 | Wrapping and Unwrapping *xml4h* Nodes
623 | -------------------------------------
624 | 
625 | You can easily convert to or from *xml4h*'s wrapped version of an
626 | implementation node. For example, if you prefer the *lxml* library's
627 | `ElementMaker <http://lxml.de/tutorial.html#the-e-factory>`_ document builder
628 | approach to the :ref:`xml4h Builder <builder>`, you can create a document
629 | in *lxml*...
630 | 
631 | ::
632 | 
633 |     >>> from lxml.builder import ElementMaker
634 |     >>> E = ElementMaker()
635 |     >>> lxml_doc = E.DocRoot(
636 |     ...     E.Item(
637 |     ...         E.Name('Item 1'),
638 |     ...         E.Value('Value 1')
639 |     ...     ),
640 |     ...     E.Item(
641 |     ...         E.Name('Item 2'),
642 |     ...         E.Value('Value 2')
643 |     ...     )
644 |     ... )
645 |     >>> lxml_doc  # doctest:+ELLIPSIS
646 |     <Element DocRoot at ...
647 | 
648 | ...and then convert (or, more accurately, wrap) the *lxml* nodes with the
649 | appropriate adapter to make them *xml4h* versions::
650 | 
651 |     >>> # Convert lxml Document to xml4h version
652 |     >>> xml4h_doc = xml4h.LXMLAdapter.wrap_document(lxml_doc)
653 |     >>> xml4h_doc.children
654 |     [<xml4h.nodes.Element: "Item">, <xml4h.nodes.Element: "Item">]
655 | 
656 |     >>> # Get an element within the lxml document
657 |     >>> lxml_elem = list(lxml_doc)[0]
658 |     >>> lxml_elem  # doctest:+ELLIPSIS
659 |     <Element Item at ...
660 | 
661 |     >>> # Convert lxml Element to xml4h version
662 |     >>> xml4h_elem = xml4h.LXMLAdapter.wrap_node(lxml_elem, lxml_doc)
663 |     >>> xml4h_elem  # doctest:+ELLIPSIS
664 |     <xml4h.nodes.Element: "Item">
665 | 
666 | You can reach the underlying XML implementation document or node at any time
667 | from an *xml4h* node::
668 | 
669 |     >>> # Get an xml4h node's underlying implementation node
670 |     >>> xml4h_elem.impl_node  # doctest:+ELLIPSIS
671 |     <Element Item at ...
672 |     >>> xml4h_elem.impl_node == lxml_elem
673 |     True
674 | 
675 |     >>> # Get the underlying implementation document from any node
676 |     >>> xml4h_elem.impl_document  # doctest:+ELLIPSIS
677 |     <Element DocRoot at ...
678 |     >>> xml4h_elem.impl_document == lxml_doc
679 |     True
680 | 
681 | 
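682 | The same wrapping approach should apply to the other implementation adapters.
683 | The following is a minimal sketch rather than a tested recipe: it assumes the
684 | standard library's *minidom* adapter is exposed as ``xml4h.XmlDomImplAdapter``
685 | and that it offers the same ``wrap_document`` class method used with
686 | ``LXMLAdapter`` above::
687 | 
688 |     >>> # Assumption: XmlDomImplAdapter is the adapter for xml.dom.minidom
689 |     >>> from xml.dom import minidom
690 |     >>> dom_doc = minidom.parseString('<DocRoot><Item>Item 1</Item></DocRoot>')
691 | 
692 |     >>> # Wrap the minidom Document to get an xml4h Document
693 |     >>> wrapped_doc = xml4h.XmlDomImplAdapter.wrap_document(dom_doc)
694 | 
695 |     >>> # Compare the wrapped root's impl_document with the original document
696 |     >>> same_doc = wrapped_doc.root.impl_document is dom_doc
697 | 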


--------------------------------------------------------------------------------
/docs/parser.rst:
--------------------------------------------------------------------------------
 1 | ======
 2 | Parser
 3 | ======
 4 | 
 5 | The *xml4h* parser is a simple wrapper around the parser provided by an
 6 | underlying :ref:`XML library implementation <xml-lib-adapters>`.
 7 | 
 8 | .. _parser-parse:
 9 | 
10 | Parse function
11 | --------------
12 | 
13 | To parse XML documents with *xml4h* you feed the :func:`xml4h.parse` function
14 | an XML text document in one of three forms:
15 | 
16 | - A file-like object::
17 | 
18 |     >>> import xml4h
19 | 
20 |     >>> xml_file = open('tests/data/monty_python_films.xml', 'rb')
21 |     >>> doc = xml4h.parse(xml_file)
22 | 
23 |     >>> doc.MontyPythonFilms
24 |     <xml4h.nodes.Element: "MontyPythonFilms">
25 | 
26 | - A file path string::
27 | 
28 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
29 | 
30 |     >>> doc.root['source']
31 |     'http://en.wikipedia.org/wiki/Monty_Python'
32 | 
33 | - A string containing literal XML content::
34 | 
35 |     >>> xml_file = open('tests/data/monty_python_films.xml', 'rb')
36 |     >>> xml_text = xml_file.read()
37 |     >>> doc = xml4h.parse(xml_text)
38 | 
39 |     >>> len(doc.find('Film'))
40 |     7
41 | 
42 | .. note:: The :func:`~xml4h.parse` function distinguishes between a file path
43 |           string and an XML text string by looking for a ``<`` character
44 |           in the value.
45 | 
46 | 
47 | Stripping of Whitespace Nodes
48 | -----------------------------
49 | 
50 | By default the *parse* function ignores whitespace nodes in the XML document
51 | -- or more accurately, it does extra work to remove these nodes after the
52 | document has been parsed by the underlying XML library.
53 | 
54 | Whitespace nodes are rarely interesting, since they are usually the result of
55 | XML content that has been serialized with extra whitespace to make it more
56 | readable to humans.
57 | 
58 | However if you need to keep these nodes, or if you want to avoid the extra
59 | processing overhead when parsing large documents, you can disable this
60 | feature by passing in the ``ignore_whitespace_text_nodes=False`` flag::
61 | 
62 |     >>> # Strip whitespace nodes from document
63 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
64 | 
65 |     >>> # No excess text nodes (XML doc lists 7 films)
66 |     >>> len(doc.MontyPythonFilms.children)
67 |     7
68 |     >>> doc.MontyPythonFilms.children[0]
69 |     <xml4h.nodes.Element: "Film">
70 | 
71 | 
72 |     >>> # Don't strip whitespace nodes
73 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml',
74 |     ...                   ignore_whitespace_text_nodes=False)
75 | 
76 |     >>> # An extra text node is present
77 |     >>> len(doc.MontyPythonFilms.children)
78 |     8
79 |     >>> doc.MontyPythonFilms.children[0]
80 |     <xml4h.nodes.Text: "#text">
81 | 


--------------------------------------------------------------------------------
/docs/writer.rst:
--------------------------------------------------------------------------------
  1 | ======
  2 | Writer
  3 | ======
  4 | 
  5 | The *xml4h* writer produces serialized XML text documents formatted more
  6 | traditionally – and in our opinion more correctly – than the other Python XML
  7 | libraries.
  8 | 
  9 | .. _writer-write-methods:
 10 | 
 11 | Write methods
 12 | -------------
 13 | 
 14 | To write out an XML document with *xml4h* you will generally use the
 15 | :meth:`~xml4h.nodes.Node.write` or :meth:`~xml4h.nodes.Node.write_doc` methods
 16 | available on any *xml4h* node.
 17 | 
 18 | The writer methods require a file or any IO stream object as the first
 19 | argument, and will automatically handle text or binary IO streams.
 20 | 
 21 | The :meth:`~xml4h.nodes.Node.write` method outputs the current node and any
 22 | descendants::
 23 | 
 24 |     >>> import xml4h
 25 |     >>> doc = xml4h.parse('tests/data/monty_python_films.xml')
 26 |     >>> first_film_elem = doc.find('Film')[0]
 27 | 
 28 |     >>> # Write XML node to stdout
 29 |     >>> import sys
 30 |     >>> first_film_elem.write(sys.stdout, indent=True)  # doctest:+ELLIPSIS
 31 |     <Film year="1971">
 32 |         <Title>And Now for Something Completely Different</Title>
 33 |         <Description>A collection of sketches from the first and second...
 34 |     </Film>
 35 | 
 36 | The :meth:`~xml4h.nodes.Node.write_doc` method outputs the entire document no
 37 | matter which node you call it on::
 38 | 
 39 |     >>> first_film_elem.write_doc(sys.stdout, indent=True)  # doctest:+ELLIPSIS
 40 |     <?xml version="1.0" encoding="utf-8"?>
 41 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
 42 |         <Film year="1971">
 43 |             <Title>And Now for Something Completely Different</Title>
 44 |             <Description>A collection of sketches from the first and second...
 45 |         </Film>
 46 |      ...
 47 | 
 48 | To send output to a file::
 49 | 
 50 |     >>> # Write to a file
 51 |     >>> with open('/tmp/example.xml', 'wb') as f:
 52 |     ...     first_film_elem.write_doc(f)
 53 | 
 54 | .. _writer-xml-methods:
 55 | 
 56 | Get XML as a string
 57 | -------------------
 58 | 
 59 | Because you will often want to generate a string of XML content directly,
 60 | *xml4h* includes the convenience methods :meth:`~xml4h.nodes.Node.xml`
 61 | and :meth:`~xml4h.nodes.Node.xml_doc` to do this easily.
 62 | 
 63 | The :meth:`~xml4h.nodes.Node.xml` method works like the *write* method and
 64 | will return a string of XML content including the current node and its
 65 | descendants::
 66 | 
 67 |     >>> print(first_film_elem.xml())  # doctest:+ELLIPSIS
 68 |     <Film year="1971">
 69 |         <Title>And Now for Something Completely...
 70 | 
 71 | The :meth:`~xml4h.nodes.Node.xml_doc` method works like the *write_doc*
 72 | method and returns a string for the whole document::
 73 | 
 74 |     >>> print(first_film_elem.xml_doc())  # doctest:+ELLIPSIS
 75 |     <?xml version="1.0" encoding="utf-8"?>
 76 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
 77 |         <Film year="1971">
 78 |             <Title>And Now for Something Completely Different</Title>
 79 |             <Description>A collection of sketches from the first and second...
 80 |         </Film>
 81 |         ...
 82 | 
 83 | .. note::
 84 |    *xml4h* assumes that when you directly generate an XML string with these
 85 |    methods it is intended for human consumption, so it applies pretty-print
 86 |    formatting by default.
 87 | 
 88 | 
 89 | .. _writer-formatting:
 90 | 
 91 | Format Output
 92 | -------------
 93 | 
 94 | The *write* and *xml* methods accept a range of formatting options to control
 95 | how XML content is serialized. These are useful if you expect a human to read
 96 | the resulting data.
 97 | 
 98 | For the full range of formatting options see the code documentation for
 99 | :meth:`~xml4h.nodes.Node.write` and :meth:`~xml4h.nodes.Node.xml` et al.,
100 | but here are some pointers to get you started:
101 | 
102 | - Set ``indent=True`` to write a pretty-printed XML document with four space
103 |   characters for indentation and ``\n`` for newlines.
104 | - To use a tab character for indenting and ``\r\n`` for newlines:
105 |   ``indent='\t', newline='\r\n'``.
106 | - *xml4h* writes *utf-8*-encoded documents by default; to write with a
107 |   different encoding: ``encoding='iso-8859-1'``.
108 | - To avoid outputting the XML declaration when writing a document:
109 |   ``omit_declaration=True``.
110 | 
111 | 
112 | Write using the underlying implementation
113 | -----------------------------------------
114 | 
115 | Because *xml4h* sits on top of an underlying
116 | :ref:`XML library implementation <xml-lib-adapters>` you can use that
117 | library's serialization methods if you prefer, and if you don't mind having
118 | some implementation-specific code.
119 | 
120 | For example, if you are using *lxml* as the underlying library you can use
121 | its serialization methods by accessing the implementation node::
122 | 
123 |     >>> # Get the implementation root node, in this case an lxml node
124 |     >>> lxml_root_node = first_film_elem.root.impl_node
125 |     >>> type(lxml_root_node)  # doctest:+ELLIPSIS
126 |     <... 'lxml.etree._Element'>
127 | 
128 |     >>> # Use lxml features as normal; xml4h is no longer in the picture
129 |     >>> from lxml import etree
130 |     >>> xml_bytes = etree.tostring(
131 |     ...     lxml_root_node, encoding='utf-8', xml_declaration=True, pretty_print=True)
132 |     >>> print(xml_bytes.decode('utf-8'))  # doctest:+ELLIPSIS
133 |     <?xml version='1.0' encoding='utf-8'?>
134 |     <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python"><Film year="1971"><Title>And Now for Something Completely Different</Title>
135 |             <Description>A collection of sketches from the first and second...
136 |         </Film>
137 |         <Film year="1974"><Title>Monty Python and the Holy Grail</Title>
138 |             <Description>King Arthur and his knights embark on a low-budget...
139 |         </Film>
140 |         ...
141 | 
142 | .. note::
143 |    The output from *lxml* is a little quirky, at least on the author's machine.
144 |    Note for example the single-quote characters in the XML declaration, and
145 |    the missing newline and indent before the first ``<Film>`` element. But
146 |    don't worry, that's why you have *xml4h* ;)
147 | 
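The same approach should apply to the other underlying libraries. As a rough
sketch of what implementation-specific serialization looks like with the
standard library's ``xml.dom.minidom``, the example below parses a shortened
copy of the films document directly with *minidom* so that it runs on its own
without *xml4h*. Reaching the equivalent *minidom* node through ``impl_node``
is an assumption here and is not demonstrated.

```python
from xml.dom import minidom

# A shortened, single-line copy of the films document (illustrative only)
source = (
    '<MontyPythonFilms><Film year="1971">'
    '<Title>And Now for Something Completely Different</Title>'
    '</Film></MontyPythonFilms>')
doc = minidom.parseString(source)

# Serialize just the root element, with no added whitespace
compact = doc.documentElement.toxml()
print(compact)

# Serialize the whole document as pretty-printed, utf-8 encoded bytes
pretty = doc.toprettyxml(indent='    ', newl='\n', encoding='utf-8')
print(pretty.decode('utf-8'))
```

Code like this is tied to *minidom* and would need rewriting if you switched
to another underlying library.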


--------------------------------------------------------------------------------
/requirements-dev.txt:
--------------------------------------------------------------------------------
1 | # Nose for running tests
2 | six
3 | nose
4 | coverage
5 | tox
6 | sphinx
7 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | import xml4h
 5 | 
 6 | try:
 7 |     from setuptools import setup
 8 | except ImportError:
 9 |     from distutils.core import setup
10 | 
11 | setup(
12 |     name=xml4h.__title__,
13 |     version=xml4h.__version__,
14 |     description='XML for Humans in Python',
15 |     long_description=open('README.rst').read(),
16 |     long_description_content_type='text/x-rst',
17 |     author='James Murty',
18 |     author_email='james@murty.co',
19 |     url='https://github.com/jmurty/xml4h',
20 |     packages=[
21 |         'xml4h',
22 |         'xml4h.impls',
23 |     ],
24 |     package_dir={'xml4h': 'xml4h'},
25 |     package_data={'': ['README.rst', 'LICENSE']},
26 |     include_package_data=True,
27 |     install_requires=[
28 |         'six',
29 |     ],
30 |     license='MIT License',
31 |     # http://pypi.python.org/pypi?%3Aaction=list_classifiers
32 |     classifiers=[
33 |         'Development Status :: 4 - Beta',
34 |         'Intended Audience :: Developers',
35 |         'Topic :: Text Processing :: Markup :: XML',
36 |         'Natural Language :: English',
37 |         'License :: OSI Approved :: MIT License',
38 |         'Programming Language :: Python',
39 |         'Programming Language :: Python :: 2.7',
40 |         'Programming Language :: Python :: 3.5',
41 |         'Programming Language :: Python :: 3.6',
42 |         'Programming Language :: Python :: 3.7',
43 |         'Programming Language :: Python :: 3.8',
44 |     ],
45 |     test_suite='tests',
46 | )
47 | 


--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jmurty/xml4h/83bc0a91afe5d6e17d6c99ec43dc0aec9593cc06/tests/__init__.py


--------------------------------------------------------------------------------
/tests/data/example_doc.small.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <DocRoot xmlns="urn:default" xmlns:myns="urn:custom">
 3 |     <NSDefaultImplicit/>
 4 |     <NSDefaultExplicit xmlns="urn:default"/>
 5 |     <NSCustomExplicit xmlns="urn:custom"/>
 6 |     <myns:NSCustomWithPrefixImplicit/>
 7 |     <myns:NSCustomWithPrefixExplicit xmlns="urn:custom"/>
 8 |     <Attrs1 default-ns-implicit="1"/>
 9 |     <Attrs2 myns:custom-ns-prefix-explicit="1"/>
10 | </DocRoot>
11 | 


--------------------------------------------------------------------------------
/tests/data/example_doc.unicode.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <جذر xmlns="urn:default" xmlns:důl="urn:custom">
 3 |     <NSDefaultImplicit/>
 4 |     <NSDefaultExplicit xmlns="urn:default"/>
 5 |     <NSCustomExplicit xmlns="urn:custom"/>
 6 |     <důl:NSCustomWithPrefixImplicit/>
 7 |     <důl:NSCustomWithPrefixExplicit xmlns="urn:custom"/>
 8 |     <yếutố1 תכונה="1"/>
 9 |     <yếutố2 důl:עודתכונה="tvö"/>
10 | </جذر>
11 | 


--------------------------------------------------------------------------------
/tests/data/monty_python_films.ns.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python"
 3 |                   xmlns="uri:monty-python" xmlns:work="uri:artistic-work">
 4 |     <work:Film year="1971">
 5 |         <Title>And Now for Something Completely Different</Title>
 6 |         <Description>A collection of sketches from the first and second TV series of Monty Python's Flying Circus purposely re-enacted and shot for film.</Description>
 7 |     </work:Film>
 8 |     <work:Film year="1974">
 9 |         <Title>Monty Python and the Holy Grail</Title>
10 |         <Description>King Arthur and his knights embark on a low-budget search for the Holy Grail, encountering humorous obstacles along the way. Some of these turned into standalone sketches.</Description>
11 |     </work:Film>
12 |     <work:Film year="1979">
13 |         <Title>Monty Python's Life of Brian</Title>
14 |         <Description>Brian is born on the first Christmas, in the stable next to Jesus'. He spends his life being mistaken for a messiah.</Description>
15 |     </work:Film>
16 |     <work:Film year="1982">
17 |         <Title>Monty Python Live at the Hollywood Bowl</Title>
18 |         <Description>A videotape recording directed by Ian MacNaughton of a live performance of sketches. Originally intended for a TV/video special. Transferred to 35mm and given a limited theatrical release in the US.</Description>
19 |     </work:Film>
20 |     <work:Film year="1983">
21 |         <Title>Monty Python's The Meaning of Life</Title>
22 |         <Description>An examination of the meaning of life in a series of sketches from conception to death and beyond.</Description>
23 |     </work:Film>
24 |     <work:Film year="2009">
25 |         <Title>Monty Python: Almost the Truth (The Lawyer's Cut)</Title>
26 |         <Description>This film features interviews with all the surviving Python members, along with archive representation for the late Graham Chapman.</Description>
27 |     </work:Film>
28 |     <work:Film year="2012">
29 |         <Title>A Liar's Autobiography: Volume IV</Title>
30 |         <Description>This is an animated film which is based on the memoir of the late Monty Python member, Graham Chapman.</Description>
31 |     </work:Film>
32 | </MontyPythonFilms>
33 | 


--------------------------------------------------------------------------------
/tests/data/monty_python_films.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
 3 |     <Film year="1971">
 4 |         <Title>And Now for Something Completely Different</Title>
 5 |         <Description>A collection of sketches from the first and second TV series of Monty Python's Flying Circus purposely re-enacted and shot for film.</Description>
 6 |     </Film>
 7 |     <Film year="1974">
 8 |         <Title>Monty Python and the Holy Grail</Title>
 9 |         <Description>King Arthur and his knights embark on a low-budget search for the Holy Grail, encountering humorous obstacles along the way. Some of these turned into standalone sketches.</Description>
10 |     </Film>
11 |     <Film year="1979">
12 |         <Title>Monty Python's Life of Brian</Title>
13 |         <Description>Brian is born on the first Christmas, in the stable next to Jesus'. He spends his life being mistaken for a messiah.</Description>
14 |     </Film>
15 |     <Film year="1982">
16 |         <Title>Monty Python Live at the Hollywood Bowl</Title>
17 |         <Description>A videotape recording directed by Ian MacNaughton of a live performance of sketches. Originally intended for a TV/video special. Transferred to 35mm and given a limited theatrical release in the US.</Description>
18 |     </Film>
19 |     <Film year="1983">
20 |         <Title>Monty Python's The Meaning of Life</Title>
21 |         <Description>An examination of the meaning of life in a series of sketches from conception to death and beyond.</Description>
22 |     </Film>
23 |     <Film year="2009">
24 |         <Title>Monty Python: Almost the Truth (The Lawyer's Cut)</Title>
25 |         <Description>This film features interviews with all the surviving Python members, along with archive representation for the late Graham Chapman.</Description>
26 |     </Film>
27 |     <Film year="2012">
28 |         <Title>A Liar's Autobiography: Volume IV</Title>
29 |         <Description>This is an animated film which is based on the memoir of the late Monty Python member, Graham Chapman.</Description>
30 |     </Film>
31 | </MontyPythonFilms>
32 | 


--------------------------------------------------------------------------------
/tests/test_parser.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | import six
  3 | import unittest
  4 | import os
  5 | import re
  6 | 
  7 | import xml4h
  8 | 
  9 | 
 10 | class TestParserBasics(unittest.TestCase):
 11 | 
 12 |     @property
 13 |     def small_xml_file_path(self):
 14 |         return os.path.join(
 15 |             os.path.dirname(__file__), 'data/example_doc.small.xml')
 16 | 
 17 |     def test_parse_with_default_parser(self):
 18 |         # Explicit use of default/best adapter
 19 |         dom = xml4h.parse(self.small_xml_file_path, adapter=xml4h.best_adapter)
 20 |         self.assertEqual(8, len(dom.find()))
 21 |         # Implicit use of default/best adapter
 22 |         dom = xml4h.parse(self.small_xml_file_path)
 23 |         self.assertEqual(8, len(dom.find()))
 24 |         self.assertEqual(xml4h.best_adapter, dom.adapter_class)
 25 | 
 26 | 
 27 | class BaseParserTest(object):
 28 |     """
 29 |     Tests to exercise parsing across all xml4h implementations.
 30 |     """
 31 | 
 32 |     @property
 33 |     def small_xml_file_path(self):
 34 |         return os.path.join(
 35 |             os.path.dirname(__file__), 'data/example_doc.small.xml')
 36 | 
 37 |     @property
 38 |     def unicode_xml_file_path(self):
 39 |         return os.path.join(
 40 |             os.path.dirname(__file__), 'data/example_doc.unicode.xml')
 41 | 
 42 |     def parse(self, xml_str):
 43 |         return xml4h.parse(xml_str, adapter=self.adapter)
 44 | 
 45 |     def test_auto_detect_filename_or_xml_data(self):
 46 |         # String with a '<' is parsed as literal XML data
 47 |         dom = self.parse('\n\n\t<MyDoc><Elem1>content</Elem1></MyDoc>')
 48 |         self.assertEqual(2, len(dom.find()))
 49 |         # String without a '<' is treated as a file path -- invalid path
 50 |         self.assertRaises(IOError, self.parse, 'not/a/real/file/path')
 51 |         # String without a '<' is treated as a file path -- valid path
 52 |         self.parse(self.small_xml_file_path)
 53 | 
 54 |     def test_parse_file(self):
 55 |         wrapped_doc = self.parse(self.small_xml_file_path)
 56 |         self.assertIsInstance(wrapped_doc, xml4h.nodes.Document)
 57 |         self.assertEqual(8, len(wrapped_doc.find()))
 58 |         # Check element namespaces
 59 |         self.assertEqual(
 60 |             ['DocRoot', 'NSDefaultImplicit', 'NSDefaultExplicit',
 61 |              'Attrs1', 'Attrs2'],
 62 |             [n.name for n in wrapped_doc.find(ns_uri='urn:default')])
 63 |         self.assertEqual(
 64 |             ['urn:custom', 'urn:custom', 'urn:custom'],
 65 |             [n.namespace_uri for n in wrapped_doc.find(ns_uri='urn:custom')])
 66 |         # We test the local name, not the full name, here because XML
 67 |         # libraries differ in whether they retain literal element prefixes.
 68 |         self.assertEqual(
 69 |             ['NSCustomExplicit',
 70 |              'NSCustomWithPrefixImplicit',
 71 |              'NSCustomWithPrefixExplicit'],
 72 |             [n.local_name for n in wrapped_doc.find(ns_uri='urn:custom')])
 73 |         # Check namespace attributes
 74 |         self.assertEqual(
 75 |             [xml4h.nodes.Node.XMLNS_URI, xml4h.nodes.Node.XMLNS_URI],
 76 |             [n.namespace_uri for n in wrapped_doc.root.attribute_nodes])
 77 |         attrs1_elem = wrapped_doc.find_first('Attrs1')
 78 |         self.assertNotEqual(None, attrs1_elem)
 79 |         self.assertEqual([None],
 80 |             [n.namespace_uri for n in attrs1_elem.attribute_nodes])
 81 |         attrs2_elem = wrapped_doc.find_first('Attrs2')
 82 |         self.assertEqual(['urn:custom'],
 83 |             [n.namespace_uri for n in attrs2_elem.attribute_nodes])
 84 | 
 85 |     def test_roundtrip(self):
 86 |         orig_xml = open(self.small_xml_file_path).read()
 87 |         # We discard semantically unnecessary namespace prefixes on
 88 |         # element names.
 89 |         orig_xml = re.sub(
 90 |             '<myns:NSCustomWithPrefixExplicit xmlns="urn:custom"/>',
 91 |             '<NSCustomWithPrefixExplicit xmlns="urn:custom"/>', orig_xml)
 92 |         if self.adapter == xml4h.LXMLAdapter:
 93 |             # lxml parser does not make it possible to retain semantically
 94 |             # unnecessary 'xmlns' namespace definitions in all elements.
 95 |             # It's not worth failing the roundtrip test just for this
 96 |             orig_xml = re.sub(
 97 |                 '<NSDefaultExplicit xmlns="urn:default"/>',
 98 |                 '<NSDefaultExplicit/>', orig_xml)
 99 |         doc = self.parse(self.small_xml_file_path)
100 |         roundtrip_xml = doc.xml_doc()
101 |         self.assertEqual(six.text_type(orig_xml), roundtrip_xml)
102 | 
103 |     def test_unicode(self):
104 |         # NOTE lxml doesn't support unicode namespace URIs?
105 |         doc = self.parse(self.unicode_xml_file_path)
106 |         self.assertEqual(u'جذر', doc.root.name)
107 |         self.assertEqual(u'urn:default', doc.root.attributes['xmlns'])
108 |         self.assertEqual(u'urn:custom', doc.root.attributes[u'xmlns:důl'])
109 |         self.assertEqual(5, len(doc.find(ns_uri=u'urn:default')))
110 |         self.assertEqual(3, len(doc.find(ns_uri=u'urn:custom')))
111 |         self.assertEqual(u'1', doc.find_first(u'yếutố1').attributes[u'תכונה'])
112 |         self.assertEqual(u'tvö',
113 |             doc.find_first(u'yếutố2').attributes[u'důl:עודתכונה'])
114 | 
115 | 
116 | class TestXmlDomParser(unittest.TestCase, BaseParserTest):
117 | 
118 |     @property
119 |     def adapter(self):
120 |         return xml4h.XmlDomImplAdapter
121 | 
122 | 
123 | class TestLXMLEtreeParser(unittest.TestCase, BaseParserTest):
124 | 
125 |     @property
126 |     def adapter(self):
127 |         if not xml4h.LXMLAdapter.is_available():
128 |             self.skipTest("lxml library is not installed")
129 |         return xml4h.LXMLAdapter
130 | 
131 | 
132 | class TestElementTreeEtreeParser(unittest.TestCase, BaseParserTest):
133 | 
134 |     @property
135 |     def adapter(self):
136 |         if not xml4h.ElementTreeAdapter.is_available():
137 |             self.skipTest(
138 |                 "ElementTree library is not installed or is outdated")
139 |         return xml4h.ElementTreeAdapter
140 | 
141 | 
142 | class TestcElementTreeEtreeParser(unittest.TestCase, BaseParserTest):
143 | 
144 |     @property
145 |     def adapter(self):
146 |         if not xml4h.cElementTreeAdapter.is_available():
147 |             self.skipTest(
148 |                 "cElementTree library is not installed or is outdated")
149 |         return xml4h.cElementTreeAdapter
150 | 


--------------------------------------------------------------------------------
/tests/test_writer.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | import unittest
  3 | import functools
  4 | import six
  5 | 
  6 | import xml4h
  7 | 
  8 | 
  9 | class BaseWriterTest(object):
 10 | 
 11 |     @property
 12 |     def my_builder(self):
 13 |         return functools.partial(xml4h.build, adapter=self.adapter)
 14 | 
 15 |     def setUp(self):
 16 |         # Create test document
 17 |         self.builder = (
 18 |             self.my_builder('DocRoot')
 19 |                 .element('Elem1').text(u'默认جذ').up()
 20 |                 .element('Elem2'))
 21 |         # Handy IO writer
 22 |         self.iobytes = six.BytesIO()
 23 | 
 24 |     def test_write_defaults(self):
 25 |         """ Default write output is utf-8 with no pretty-printing """
 26 |         xml = (
 27 |             u'<?xml version="1.0" encoding="utf-8"?>'
 28 |             u'<DocRoot>'
 29 |             u'<Elem1>默认جذ</Elem1>'
 30 |             u'<Elem2/>'
 31 |             u'</DocRoot>'
 32 |             )
 33 |         io_string = six.StringIO()
 34 |         self.builder.write_doc(io_string)
 35 |         if six.PY2:
 36 |             self.assertEqual(xml.encode('utf-8'), io_string.getvalue())
 37 |         else:
 38 |             self.assertEqual(xml, io_string.getvalue())
 39 | 
 40 |     def test_write_current_node_and_descendants(self):
 41 |         self.builder.dom_element.write(self.iobytes)
 42 |         self.assertEqual(b'<Elem2/>', self.iobytes.getvalue())
 43 | 
 44 |     def test_write_utf8_by_default(self):
 45 |         # Default write output is utf-8, with no pretty-printing
 46 |         xml = (
 47 |             u'<?xml version="1.0" encoding="utf-8"?>'
 48 |             u'<DocRoot>'
 49 |             u'<Elem1>默认جذ</Elem1>'
 50 |             u'<Elem2/>'
 51 |             u'</DocRoot>'
 52 |             )
 53 |         self.builder.dom_element.write_doc(self.iobytes)
 54 |         self.assertEqual(xml.encode('utf-8'), self.iobytes.getvalue())
 55 | 
 56 |     def test_write_utf16(self):
 57 |         xml = (
 58 |             u'<?xml version="1.0" encoding="utf-16"?>'
 59 |             u'<DocRoot>'
 60 |             u'<Elem1>默认جذ</Elem1>'
 61 |             u'<Elem2/>'
 62 |             u'</DocRoot>'
 63 |             )
 64 |         self.builder.dom_element.write_doc(self.iobytes, encoding='utf-16')
 65 |         self.assertEqual(xml.encode('utf-16'), self.iobytes.getvalue())
 66 | 
 67 |     def test_write_latin1_with_illegal_characters(self):
 68 |         self.assertRaises(UnicodeEncodeError,
 69 |             self.builder.dom_element.write_doc,
 70 |                 self.iobytes, encoding='latin1', indent=2)
 71 | 
 72 |     def test_write_latin1(self):
 73 |         # Create latin1-friendly test document
 74 |         self.builder = (
 75 |             self.my_builder('DocRoot')
 76 |                 .element('Elem1').text(u'Tést çæsè').up()
 77 |                 .element('Elem2'))
 78 |         self.builder.dom_element.write_doc(self.iobytes, encoding='latin1')
 79 |         self.assertEqual(
 80 |             u'<?xml version="1.0" encoding="latin1"?>'
 81 |             u'<DocRoot>'
 82 |             u'<Elem1>Tést çæsè</Elem1>'
 83 |             u'<Elem2/>'
 84 |             u'</DocRoot>'.encode('latin1'),
 85 |             self.iobytes.getvalue())
 86 | 
 87 |     def test_with_no_encoding(self):
 88 |         """No encoding writes python unicode"""
 89 |         xml = (
 90 |             u'<?xml version="1.0"?>'
 91 |             u'<DocRoot>'
 92 |             u'<Elem1>默认جذ</Elem1>'
 93 |             u'<Elem2/>'
 94 |             u'</DocRoot>'
 95 |             )
 96 |         io_string = six.StringIO()
 97 |         self.builder.dom_element.write_doc(io_string, encoding=None)
 98 |         # NOTE Exact test, no encoding of comparison XML doc string
 99 |         self.assertEqual(xml, io_string.getvalue())
100 | 
101 |     def test_omit_declaration(self):
102 |         self.builder.dom_element.write_doc(self.iobytes,
103 |                 omit_declaration=True)
104 |         self.assertEqual(
105 |             u'<DocRoot>'
106 |             u'<Elem1>默认جذ</Elem1>'
107 |             u'<Elem2/>'
108 |             u'</DocRoot>'.encode('utf-8'),
109 |             self.iobytes.getvalue())
110 | 
111 |     def test_default_indent_and_newline(self):
112 |         """Default indent of 4 spaces with newlines when indent=True"""
113 |         self.builder.dom_element.write_doc(self.iobytes, indent=True)
114 |         self.assertEqual(
115 |             u'<?xml version="1.0" encoding="utf-8"?>\n'
116 |             u'<DocRoot>\n'
117 |             u'    <Elem1>默认جذ</Elem1>\n'
118 |             u'    <Elem2/>\n'
119 |             u'</DocRoot>\n'.encode('utf-8'),
120 |             self.iobytes.getvalue())
121 | 
122 |     def test_custom_indent_and_newline(self):
123 |         self.builder.dom_element.write_doc(self.iobytes,
124 |             indent=8, newline='\t')
125 |         self.assertEqual(
126 |             u'<?xml version="1.0" encoding="utf-8"?>\t'
127 |             u'<DocRoot>\t'
128 |             u'        <Elem1>默认جذ</Elem1>\t'
129 |             u'        <Elem2/>\t'
130 |             u'</DocRoot>\t'.encode('utf-8'),
131 |             self.iobytes.getvalue())
132 | 
133 | 
134 | class TestXmlDomBuilder(BaseWriterTest, unittest.TestCase):
135 |     """
136 |     Tests building with the standard library xml.dom module, or with any
137 |     library that augments/clobbers this module.
138 |     """
139 | 
140 |     @property
141 |     def adapter(self):
142 |         return xml4h.XmlDomImplAdapter
143 | 
144 | 
145 | class TestLXMLEtreeBuilder(BaseWriterTest, unittest.TestCase):
146 |     """
147 |     Tests building with the lxml (lxml.etree) library.
148 |     """
149 | 
150 |     @property
151 |     def adapter(self):
152 |         if not xml4h.LXMLAdapter.is_available():
153 |             self.skipTest("lxml library is not installed")
154 |         return xml4h.LXMLAdapter
155 | 
156 | 
157 | class TestElementTreeBuilder(BaseWriterTest, unittest.TestCase):
158 |     """
159 |     Tests building with the xml.etree.ElementTree library.
160 |     """
161 | 
162 |     @property
163 |     def adapter(self):
164 |         if not xml4h.ElementTreeAdapter.is_available():
165 |             self.skipTest(
166 |                 "ElementTree library is not installed or is outdated")
167 |         return xml4h.ElementTreeAdapter
168 | 
169 | 
170 | class TestcElementTreeBuilder(BaseWriterTest, unittest.TestCase):
171 |     """
172 |     Tests building with the xml.etree.cElementTree library.
173 |     """
174 | 
175 |     @property
176 |     def adapter(self):
177 |         if not xml4h.cElementTreeAdapter.is_available():
178 |             self.skipTest(
179 |                 "cElementTree library is not installed or is outdated")
180 |         return xml4h.cElementTreeAdapter
181 | 


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
 1 | [tox]
 2 | envlist=py27,py35,py36,py37,py38,without-lxml
 3 | 
 4 | [testenv]
 5 | deps=
 6 |     six
 7 |     nose
 8 |     coverage
 9 |     lxml
10 | commands=
11 |     python -m nose --with-coverage --cover-package=xml4h --with-doctest --include=docs --doctest-extension=.rst
12 | 
13 | ; Run reduced tests to ensure xml4h works when lxml isn't installed
14 | [testenv:without-lxml]
15 | deps=
16 |     six
17 |     nose
18 |     coverage
19 | commands=
20 |     python -m nose
21 | 


--------------------------------------------------------------------------------
/xml4h/__init__.py:
--------------------------------------------------------------------------------
  1 | import six
  2 | 
  3 | import xml4h
  4 | 
  5 | # Make commonly-used classes and functions available in xml4h module
  6 | from xml4h.impls.xml_dom_minidom import XmlDomImplAdapter
  7 | from xml4h.impls.xml_etree_elementtree import (
  8 |     ElementTreeAdapter, cElementTreeAdapter)
  9 | from xml4h.impls.lxml_etree import LXMLAdapter
 10 | from xml4h.builder import Builder
 11 | from xml4h.writer import write_node
 12 | 
 13 | 
 14 | __title__ = 'xml4h'
 15 | __version__ = '1.0'
 16 | 
 17 | 
 18 | # List of xml4h adapter classes, in order of preference
 19 | _ADAPTER_CLASSES = [
 20 |     LXMLAdapter,
 21 |     cElementTreeAdapter,
 22 |     ElementTreeAdapter,
 23 |     XmlDomImplAdapter]
 24 | 
 25 | _ADAPTERS_AVAILABLE = []
 26 | _ADAPTERS_UNAVAILABLE = []
 27 | 
 28 | for impl_class in _ADAPTER_CLASSES:
 29 |     if impl_class.is_available():
 30 |         _ADAPTERS_AVAILABLE.append(impl_class)
 31 |     else:
 32 |         _ADAPTERS_UNAVAILABLE.append(impl_class)
 33 | 
 34 | 
 35 | best_adapter = _ADAPTERS_AVAILABLE[0]
 36 | """
 37 | The :ref:`best adapter available <best-adapter>` in the Python environment.
 38 | This adapter is the default when parsing or creating XML documents,
 39 | unless overridden by passing a specific adapter class.
 40 | """
 41 | 
 42 | 
 43 | def parse(
 44 |     to_parse, ignore_whitespace_text_nodes=True, adapter=None
 45 | ):
 46 |     """
 47 |     Parse an XML document into an *xml4h*-wrapped DOM representation
 48 |     using an underlying XML library implementation.
 49 | 
 50 |     :param to_parse: an XML document file, document text or bytes, or the
 51 |         path to an XML file. If a string or bytes value contains a ``<``
 52 |         character it is treated as literal XML data, otherwise it is
 53 |         treated as a file path.
 54 |     :type to_parse: a file-like object, string, or bytes
 55 |     :param bool ignore_whitespace_text_nodes: if ``True`` pure whitespace
 56 |         nodes are stripped from the parsed document, since these are
 57 |         usually noise introduced by XML docs serialized to be human-friendly.
 58 |     :param adapter: the *xml4h* implementation adapter class used to parse
 59 |         the document and to interact with the resulting nodes.
 60 |         If None, :attr:`best_adapter` will be used.
 61 |     :type adapter: adapter class or None
 62 | 
 63 |     :return: an :class:`xml4h.nodes.Document` node representing the
 64 |         parsed document.
 65 | 
 66 |     Delegates to an adapter's :meth:`~xml4h.impls.interface.parse_string` or
 67 |     :meth:`~xml4h.impls.interface.parse_file` implementation.
 68 |     """
 69 |     if adapter is None:
 70 |         adapter = best_adapter
 71 |     if isinstance(to_parse, six.binary_type) and b'<' in to_parse:
 72 |         return adapter.parse_bytes(to_parse, ignore_whitespace_text_nodes)
 73 |     elif isinstance(to_parse, six.string_types) and '<' in to_parse:
 74 |         return adapter.parse_string(to_parse, ignore_whitespace_text_nodes)
 75 |     else:
 76 |         return adapter.parse_file(to_parse, ignore_whitespace_text_nodes)
 77 | 
 78 | 
 79 | def build(tagname_or_element, ns_uri=None, adapter=None):
 80 |     """
 81 |     Return a :class:`~xml4h.builder.Builder` that represents an element in
 82 |     a new or existing XML DOM and provides "chainable" methods focussed
 83 |     specifically on adding XML content.
 84 | 
 85 |     :param tagname_or_element: a string name for the root node of a
 86 |         new XML document, or an :class:`~xml4h.nodes.Element` node in an
 87 |         existing document.
 88 |     :type tagname_or_element: string or :class:`~xml4h.nodes.Element` node
 89 |     :param ns_uri: a namespace URI to apply to the new root node. This
 90 |         argument has no effect when acting on an existing element.
 91 |     :type ns_uri: string or None
 92 |     :param adapter: the *xml4h* implementation adapter class used to
 93 |         interact with the document DOM nodes.
 94 |         If None, :attr:`best_adapter` will be used.
 95 |     :type adapter: adapter class or None
 96 | 
 97 |     :return: a :class:`~xml4h.builder.Builder` instance that represents an
 98 |         :class:`~xml4h.nodes.Element` node in an XML DOM.
 99 |     """
100 |     if adapter is None:
101 |         adapter = best_adapter
102 |     if isinstance(tagname_or_element, six.string_types):
103 |         doc = adapter.create_document(
104 |             tagname_or_element, ns_uri=ns_uri)
105 |         element = doc.root
106 |     elif isinstance(tagname_or_element, xml4h.nodes.Element):
107 |         element = tagname_or_element
108 |     else:
109 |         raise xml4h.exceptions.IncorrectArgumentTypeException(
110 |             tagname_or_element, [str, xml4h.nodes.Element])
111 |     return Builder(element)
112 | 


--------------------------------------------------------------------------------
/xml4h/builder.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Builder is a utility class that makes it easy to create valid, well-formed
  3 | XML documents using relatively sparse python code.  The builder class works
  4 | by wrapping an :class:`xml4h.nodes.Element` node to provide "chainable"
  5 | methods focussed specifically on adding XML content.
  6 | 
  7 | Each method that adds content returns a Builder instance representing the
  8 | current or the newly-added element. Behind the scenes, the builder uses the
  9 | :mod:`xml4h.nodes` node traversal and manipulation methods to add content
 10 | directly to the underlying DOM.
 11 | 
 12 | You will not generally create Builder instances directly, but will instead
 13 | call the :func:`xml4h.build` function with the name for a new root element
 14 | or with an existing :class:`xml4h.nodes.Element` node.
 15 | """
 16 | import xml4h
 17 | 
 18 | 
 19 | class Builder(object):
 20 |     """
 21 |     Builder class that wraps an :class:`xml4h.nodes.Element` node with methods
 22 |     for adding XML content to an underlying DOM.
 23 |     """
 24 | 
 25 |     def __init__(self, element):
 26 |         """
 27 |         Create a Builder representing an xml4h Element node.
 28 | 
 29 |         :param element: Element node to represent
 30 |         :type element: :class:`xml4h.nodes.Element`
 31 |         """
 32 |         if not isinstance(element, xml4h.nodes.Element):
 33 |             raise ValueError(
 34 |                 "Builder can only be created with an %s.%s instance, not %s"
 35 |                 % (xml4h.nodes.Element.__module__,
 36 |                    xml4h.nodes.Element.__name__,
 37 |                    element))
 38 |         self._element = element
 39 | 
 40 |     @property
 41 |     def dom_element(self):
 42 |         """
 43 |         :return: the :class:`xml4h.nodes.Element` node represented by this
 44 |                  Builder.
 45 |         """
 46 |         return self._element
 47 | 
 48 |     @property
 49 |     def document(self):
 50 |         """
 51 |         :return: the :class:`xml4h.nodes.Document` node that contains the
 52 |                  element represented by this Builder.
 53 |         """
 54 |         return self._element.document
 55 | 
 56 |     @property
 57 |     def root(self):
 58 |         """
 59 |         :return: the :class:`xml4h.nodes.Element` root node ancestor of the
 60 |                  element represented by this Builder
 61 |         """
 62 |         return self._element.root
 63 | 
 64 |     def find(self, **kwargs):
 65 |         """
 66 |         Find descendants of the element represented by this builder that
 67 |         match the given constraints.
 68 | 
 69 |         :return: a list of :class:`xml4h.nodes.Element` nodes
 70 | 
 71 |         Delegates to :meth:`xml4h.nodes.Node.find`
 72 |         """
 73 |         return self._element.find(**kwargs)
 74 | 
 75 |     def find_doc(self, **kwargs):
 76 |         """
 77 |         Find nodes in this element's owning :class:`xml4h.nodes.Document`
 78 |         that match the given constraints.
 79 | 
 80 |         :return: a list of :class:`xml4h.nodes.Element` nodes
 81 | 
 82 |         Delegates to :meth:`xml4h.nodes.Node.find_doc`.
 83 |         """
 84 |         return self._element.find_doc(**kwargs)
 85 | 
 86 |     def write(self, *args, **kwargs):
 87 |         """
 88 |         Write XML bytes for the element represented by this builder.
 89 | 
 90 |         Delegates to :meth:`xml4h.nodes.Node.write`.
 91 |         """
 92 |         self.dom_element.write(*args, **kwargs)
 93 | 
 94 |     def write_doc(self, *args, **kwargs):
 95 |         """
 96 |         Write XML bytes for the Document containing the element
 97 |         represented by this builder.
 98 | 
 99 |         Delegates to :meth:`xml4h.nodes.Node.write_doc`.
100 |         """
101 |         self.dom_element.write_doc(*args, **kwargs)
102 | 
103 |     def xml(self, **kwargs):
104 |         """
105 |         :return: XML string for the element represented by this builder.
106 | 
107 |         Delegates to :meth:`xml4h.nodes.Node.xml`.
108 |         """
109 |         return self.dom_element.xml(**kwargs)
110 | 
111 |     def xml_doc(self, **kwargs):
112 |         """
113 |         :return: XML string for the Document containing the element represented
114 |                  by this builder.
115 | 
116 |         Delegates to :meth:`xml4h.nodes.Node.xml_doc`.
117 |         """
118 |         return self.dom_element.xml_doc(**kwargs)
119 | 
120 |     def up(self, count_or_element_name=1):
121 |         """
122 |         :return: a builder representing an ancestor of the current element,
123 |                  by default the parent element.
124 | 
125 |         :param count_or_element_name:
126 |             When an integer, return the n'th ancestor element, going no
127 |             higher than the document's root element.
128 |             When a string, return the nearest ancestor element with that name,
129 |             or the document's root element if there are no matching ancestors.
130 |             Defaults to integer value 1 which means the immediate parent.
131 |         :type count_or_element_name: integer or string
132 |         """
133 |         elem = self._element
134 |         to_count = to_name = None
135 |         if isinstance(count_or_element_name, int):
136 |             to_count = count_or_element_name
137 |         else:
138 |             to_name = count_or_element_name
139 |         up_count = 0
140 |         while True:
141 |             # Don't go up beyond the document root
142 |             if elem.is_root or elem.parent is None:
143 |                 break
144 |             # Go up to element's parent
145 |             elem = elem.parent
146 |             # If we have a name to match and it matches, stop
147 |             if to_name:
148 |                 if elem.name == to_name:
149 |                     break
150 |                 continue
151 |             # If we have a count to reach and have reached it, stop
152 |             up_count += 1
153 |             if up_count >= to_count:
154 |                 break
155 |         return Builder(elem)
156 | 
157 |     def transplant(self, node):
158 |         """
159 |         Transplant a node from another document to become a child of
160 |         the :class:`xml4h.nodes.Element` node represented by this Builder.
161 | 
162 |         :return: the current Builder, which still represents the current \
163 |                  element (not the transplanted node).
164 | 
165 |         Delegates to :meth:`xml4h.nodes.Node.transplant_node`.
166 |         """
167 |         self._element.transplant_node(node)
168 |         return self
169 | 
170 |     def clone(self, node):
171 |         """
172 |         Clone a node from another document to become a child of
173 |         the :class:`xml4h.nodes.Element` node represented by this Builder.
174 | 
175 |         :return: the current Builder, which still represents the current \
176 |                  element (not the cloned node).
177 | 
178 |         Delegates to :meth:`xml4h.nodes.Node.clone_node`.
179 |         """
180 |         self._element.clone_node(node)
181 |         return self
182 | 
183 |     def element(self, *args, **kwargs):
184 |         """
185 |         Add a child element to the :class:`xml4h.nodes.Element` node
186 |         represented by this Builder.
187 | 
188 |         :return: a new Builder that represents the child element.
189 | 
190 |         Delegates to :meth:`xml4h.nodes.Element.add_element`.
191 |         """
192 |         child_element = self._element.add_element(*args, **kwargs)
193 |         return Builder(child_element)
194 | 
195 |     elem = element  # Alias
196 |     """Alias of :meth:`element`"""
197 | 
198 |     e = element  # Alias
199 |     """Alias of :meth:`element`"""
200 | 
201 |     def attributes(self, *args, **kwargs):
202 |         """
203 |         Add one or more attributes to the :class:`xml4h.nodes.Element` node
204 |         represented by this Builder.
205 | 
206 |         :return: the current Builder.
207 | 
208 |         Delegates to :meth:`xml4h.nodes.Element.set_attributes`.
209 |         """
210 |         self._element.set_attributes(*args, **kwargs)
211 |         return self
212 | 
213 |     attrs = attributes  # Alias
214 |     """Alias of :meth:`attributes`"""
215 | 
216 |     a = attributes  # Alias
217 |     """Alias of :meth:`attributes`"""
218 | 
219 |     def text(self, text):
220 |         """
221 |         Add a text node to the :class:`xml4h.nodes.Element` node
222 |         represented by this Builder.
223 | 
224 |         :return: the current Builder.
225 | 
226 |         Delegates to :meth:`xml4h.nodes.Element.add_text`.
227 |         """
228 |         self._element.add_text(text)
229 |         return self
230 | 
231 |     t = text  # Alias
232 |     """Alias of :meth:`text`"""
233 | 
234 |     def comment(self, text):
235 |         """
236 |         Add a comment node to the :class:`xml4h.nodes.Element` node
237 |         represented by this Builder.
238 | 
239 |         :return: the current Builder.
240 | 
241 |         Delegates to :meth:`xml4h.nodes.Element.add_comment`.
242 |         """
243 |         self._element.add_comment(text)
244 |         return self
245 | 
246 |     c = comment  # Alias
247 |     """Alias of :meth:`comment`"""
248 | 
249 |     def processing_instruction(self, target, data):
250 |         """
251 |         Add a processing instruction node to the :class:`xml4h.nodes.Element`
252 |         node represented by this Builder.
253 | 
254 |         :return: the current Builder.
255 | 
256 |         Delegates to :meth:`xml4h.nodes.Element.add_instruction`.
257 |         """
258 |         self._element.add_instruction(target, data)
259 |         return self
260 | 
261 |     instruction = processing_instruction  # Alias
262 |     """Alias of :meth:`processing_instruction`"""
263 | 
264 |     i = instruction  # Alias
265 |     """Alias of :meth:`processing_instruction`"""
266 | 
267 |     def cdata(self, text):
268 |         """
269 |         Add a CDATA node to the :class:`xml4h.nodes.Element` node
270 |         represented by this Builder.
271 | 
272 |         :return: the current Builder.
273 | 
274 |         Delegates to :meth:`xml4h.nodes.Element.add_cdata`.
275 |         """
276 |         self._element.add_cdata(text)
277 |         return self
278 | 
279 |     data = cdata  # Alias
280 |     """Alias of :meth:`cdata`"""
281 | 
282 |     d = cdata  # Alias
283 |     """Alias of :meth:`cdata`"""
284 | 
285 |     def ns_prefix(self, prefix, ns_uri):
286 |         """
287 |         Set the namespace prefix of the :class:`xml4h.nodes.Element` node
288 |         represented by this Builder.
289 | 
290 |         :return: the current Builder.
291 | 
292 |         Delegates to :meth:`xml4h.nodes.Element.set_ns_prefix`.
293 |         """
294 |         self._element.set_ns_prefix(prefix, ns_uri)
295 |         return self
296 | 


--------------------------------------------------------------------------------
/xml4h/exceptions.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Custom *xml4h* exceptions.
 3 | """
 4 | 
 5 | 
 6 | class Xml4hException(Exception):
 7 |     """
 8 |     Base exception class for all non-standard exceptions raised by *xml4h*.
 9 |     """
10 |     pass
11 | 
12 | 
13 | class Xml4hImplementationBug(Xml4hException):
14 |     """
15 |     *xml4h* implementation has a bug, probably.
16 |     """
17 |     pass
18 | 
19 | 
20 | class FeatureUnavailableException(Xml4hException):
21 |     """
22 |     User has attempted to use a feature that is available in some *xml4h*
23 |     implementations/adapters, but is not available in the current one.
24 |     """
25 |     pass
26 | 
27 | 
28 | class IncorrectArgumentTypeException(ValueError, Xml4hException):
29 |     """
30 |     Richer flavour of a ValueError that describes exactly what argument
31 |     types are expected.
32 |     """
33 | 
34 |     def __init__(self, arg, expected_types):
35 |         msg = ('Argument %s is not one of the expected types: %s'
36 |             % (arg, expected_types))
37 |         super(IncorrectArgumentTypeException, self).__init__(msg)
38 | 
39 | 
40 | class UnknownNamespaceException(ValueError, Xml4hException):
41 |     """
42 |     User has attempted to refer to an unknown or undeclared namespace by
43 |     prefix or URI.
44 |     """
45 |     pass
46 | 
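The double inheritance used by ``IncorrectArgumentTypeException`` and ``UnknownNamespaceException`` above means callers can catch these errors either as a plain ``ValueError`` or via the library-wide base class. A minimal sketch of that effect follows; the two classes are re-declared locally as stand-ins (copied in shape from the listing above) so the example is self-contained and does not import *xml4h*.

```python
class Xml4hException(Exception):
    """Local stand-in for the library's base exception class."""


class IncorrectArgumentTypeException(ValueError, Xml4hException):
    """Local stand-in mirroring the class in the listing above."""

    def __init__(self, arg, expected_types):
        msg = ('Argument %s is not one of the expected types: %s'
               % (arg, expected_types))
        super(IncorrectArgumentTypeException, self).__init__(msg)


caught = []
# The same exception is caught by either handler type
for handler_type in (ValueError, Xml4hException):
    try:
        raise IncorrectArgumentTypeException(42, [str])
    except handler_type as exc:
        caught.append((handler_type.__name__, str(exc)))
```

Existing code that already handles ``ValueError`` therefore keeps working, while code that wants to treat all library errors uniformly can catch ``Xml4hException`` instead.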


--------------------------------------------------------------------------------
/xml4h/impls/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jmurty/xml4h/83bc0a91afe5d6e17d6c99ec43dc0aec9593cc06/xml4h/impls/__init__.py


--------------------------------------------------------------------------------
/xml4h/impls/interface.py:
--------------------------------------------------------------------------------
  1 | import abc
  2 | import six
  3 | 
  4 | from xml4h import nodes, exceptions
  5 | 
  6 | 
  7 | @six.add_metaclass(abc.ABCMeta)
  8 | class XmlImplAdapter(object):
  9 |     """
 10 |     Base class that defines how *xml4h* interacts with an underlying XML
 11 |     library that the adaptor "wraps" to provide additional (or at least
 12 |     different) functionality.
 13 | 
 14 |     This class should be treated as an abstract class. It provides some
 15 |     common implementation code used by all *xml4h* adapter implementations,
 16 |     but mostly it sketches out the methods the real implementation subclasses
 17 |     must provide.
 18 |     """
 19 | 
 20 |     # List of extra features supported (or not) by an adapter implementation
 21 |     SUPPORTED_FEATURES = {
 22 |         'xpath': False,
 23 |         }
 24 | 
 25 |     @classmethod
 26 |     def has_feature(cls, feature_name):
 27 |         """
 28 |         :return: *True* if a named feature is supported by this adapter.
 29 |         """
 30 |         return cls.SUPPORTED_FEATURES.get(feature_name.lower(), False)
 31 | 
 32 |     @classmethod
 33 |     def ignore_whitespace_text_nodes(cls, wrapped_node):
 34 |         """
 35 |         Find and delete any text nodes containing nothing but whitespace in
 36 |         the given node and its descendants.
 37 | 
 38 |         This is useful for cleaning up excess low-value text nodes in a
 39 |         document DOM after parsing a pretty-printed XML document.
 40 |         """
 41 |         for child in wrapped_node.children:
 42 |             if child.is_text and child.value.strip() == '':
 43 |                 child.delete()
 44 |             else:
 45 |                 cls.ignore_whitespace_text_nodes(child)
 46 | 
 47 |     @classmethod
 48 |     def create_document(cls, root_tagname, ns_uri=None, **kwargs):
 49 |         # Use implementation's method to create base document and root element
 50 |         impl_doc = cls.new_impl_document(root_tagname, ns_uri, **kwargs)
 51 |         adapter = cls(impl_doc)
 52 |         wrapped_doc = nodes.Document(impl_doc, adapter)
 53 |         # Automatically add namespace URI to root Element as attribute
 54 |         if ns_uri is not None:
 55 |             adapter.set_node_attribute_value(wrapped_doc.root.impl_node,
 56 |                 'xmlns', ns_uri, ns_uri=nodes.Node.XMLNS_URI)
 57 |         return wrapped_doc
 58 | 
 59 |     @classmethod
 60 |     def wrap_document(cls, document_node):
 61 |         adapter = cls(document_node)
 62 |         return nodes.Document(document_node, adapter)
 63 | 
 64 |     @classmethod
 65 |     def wrap_node(cls, node, document, adapter=None):
 66 |         if node is None:
 67 |             return None
 68 |         if adapter is None:
 69 |             adapter = cls(document)
 70 |         impl_class = adapter.map_node_to_class(node)
 71 |         return impl_class(node, adapter)
 72 | 
 73 |     @classmethod
 74 |     @abc.abstractmethod
 75 |     def is_available(cls):
 76 |         """
 77 |         :return: *True* if this adapter's underlying XML library is available \
 78 |             in the Python environment.
 79 |         """
 80 |         raise NotImplementedError("Implementation missing for %s" % cls)
 81 | 
 82 |     @classmethod
 83 |     @abc.abstractmethod
 84 |     def parse_string(cls, xml_str, ignore_whitespace_text_nodes=True):
 85 |         raise NotImplementedError("Implementation missing for %s" % cls)
 86 | 
 87 |     @classmethod
 88 |     @abc.abstractmethod
 89 |     def parse_bytes(cls, xml_bytes, ignore_whitespace_text_nodes=True):
 90 |         raise NotImplementedError("Implementation missing for %s" % cls)
 91 | 
 92 |     @classmethod
 93 |     @abc.abstractmethod
 94 |     def parse_file(cls, xml_file, ignore_whitespace_text_nodes=True):
 95 |         raise NotImplementedError("Implementation missing for %s" % cls)
 96 | 
 97 |     def __init__(self, document):
 98 |         if not isinstance(document, object):
 99 |             raise exceptions.IncorrectArgumentTypeException(
100 |                 document, [object])
101 |         self._impl_document = document
102 |         self._auto_ns_prefix_count = 0
103 |         self.clear_caches()
104 | 
105 |     def clear_caches(self):
106 |         """
107 |         Clear any in-adapter cached data, for cases where cached data could
108 |         become outdated e.g. by making DOM changes directly outside of *xml4h*.
109 | 
110 |         This is a no-op if the implementing adapter has no cached data.
111 |         """
112 |         pass
113 | 
114 |     @property
115 |     def impl_document(self):
116 |         return self._impl_document
117 | 
118 |     @property
119 |     def impl_root_element(self):
120 |         return self.get_impl_root(self.impl_document)
121 | 
122 |     def get_ns_uri_for_prefix(self, node, prefix):
123 |         if prefix == 'xmlns':
124 |             return nodes.Node.XMLNS_URI
125 |         elif prefix is None:
126 |             attr_name = 'xmlns'
127 |         else:
128 |             attr_name = 'xmlns:%s' % prefix
129 |         uri = self.lookup_ns_uri_by_attr_name(node, attr_name)
130 |         if uri is None:
131 |             if attr_name == 'xmlns':
132 |                 # Default namespace URI
133 |                 return nodes.Node.XMLNS_URI
134 |             raise exceptions.UnknownNamespaceException(
135 |                 "Unknown namespace URI for attribute name '%s'" % attr_name)
136 |         return uri
137 | 
138 |     def get_ns_prefix_for_uri(self, node, uri, auto_generate_prefix=False):
139 |         if uri == nodes.Node.XMLNS_URI:
140 |             return 'xmlns'
141 |         prefix = self.lookup_ns_prefix_for_uri(node, uri)
142 |         if not prefix and auto_generate_prefix:
143 |             prefix = 'autoprefix%d' % self._auto_ns_prefix_count
144 |             self._auto_ns_prefix_count += 1
145 |         return prefix
146 | 
147 |     def get_ns_info_from_node_name(self, name, impl_node):
148 |         """
149 |         Return a three-element tuple with the prefix, local name, and namespace
150 |         URI for the given element/attribute name (in the context of the given
151 |         node's hierarchy). If the name has no associated prefix or namespace
152 |     information, None is returned for those tuple members.
153 |         """
154 |         if '}' in name:
155 |             ns_uri, name = name.split('}')
156 |             ns_uri = ns_uri[1:]
157 |             prefix = self.get_ns_prefix_for_uri(impl_node, ns_uri)
158 |         elif ':' in name:
159 |             prefix, name = name.split(':')
160 |             ns_uri = self.get_ns_uri_for_prefix(impl_node, prefix)
161 |             if ns_uri is None:
162 |                 raise exceptions.UnknownNamespaceException(
163 |                     "Prefix '%s' does not have a defined namespace URI"
164 |                     % prefix)
165 |         else:
166 |             prefix, ns_uri = None, None
167 |         return prefix, name, ns_uri
168 | 
169 |     # Utility implementation methods
170 | 
171 |     @classmethod
172 |     @abc.abstractmethod
173 |     def new_impl_document(cls, root_tagname, ns_uri=None, **kwargs):
174 |         raise NotImplementedError("Implementation missing for %s" % cls)
175 | 
176 |     @abc.abstractmethod
177 |     def map_node_to_class(self, node):
178 |         raise NotImplementedError("Implementation missing for %s" % self)
179 | 
180 |     @abc.abstractmethod
181 |     def get_impl_root(self, node):
182 |         raise NotImplementedError("Implementation missing for %s" % self)
183 | 
184 |     # Document implementation methods
185 | 
186 |     @abc.abstractmethod
187 |     def new_impl_element(self, tagname, ns_uri=None, parent=None):
188 |         raise NotImplementedError("Implementation missing for %s" % self)
189 | 
190 |     @abc.abstractmethod
191 |     def new_impl_text(self, text):
192 |         raise NotImplementedError("Implementation missing for %s" % self)
193 | 
194 |     @abc.abstractmethod
195 |     def new_impl_comment(self, text):
196 |         raise NotImplementedError("Implementation missing for %s" % self)
197 | 
198 |     @abc.abstractmethod
199 |     def new_impl_instruction(self, target, data):
200 |         raise NotImplementedError("Implementation missing for %s" % self)
201 | 
202 |     @abc.abstractmethod
203 |     def new_impl_cdata(self, text):
204 |         raise NotImplementedError("Implementation missing for %s" % self)
205 | 
206 |     @abc.abstractmethod
207 |     def find_node_elements(self, node, name='*', ns_uri='*'):
208 |         """
209 |         :return: element node descendants of the given node that match the \
210 |             search constraints.
211 | 
212 |         :param node: a node object from the underlying XML library.
213 |         :param string name: only elements with a matching name will be
214 |             returned. If the value is ``*`` all names will match.
215 |         :param string ns_uri: only elements with a matching namespace URI
216 |             will be returned. If the value is ``*`` all namespaces will match.
217 |         """
218 |         raise NotImplementedError("Implementation missing for %s" % self)
219 | 
220 |     def xpath_on_node(self, node, xpath, **kwargs):
221 |         if not self.has_feature('xpath'):
222 |             raise exceptions.FeatureUnavailableException('xpath')
223 | 
224 |     # Node implementation methods
225 | 
226 |     @abc.abstractmethod
227 |     def get_node_namespace_uri(self, node):
228 |         raise NotImplementedError("Implementation missing for %s" % self)
229 | 
230 |     @abc.abstractmethod
231 |     def set_node_namespace_uri(self, node, ns_uri):
232 |         raise NotImplementedError("Implementation missing for %s" % self)
233 | 
234 |     @abc.abstractmethod
235 |     def get_node_parent(self, node):
236 |         raise NotImplementedError("Implementation missing for %s" % self)
237 | 
238 |     @abc.abstractmethod
239 |     def get_node_children(self, node):
240 |         raise NotImplementedError("Implementation missing for %s" % self)
241 | 
242 |     @abc.abstractmethod
243 |     def get_node_name(self, node):
244 |         raise NotImplementedError("Implementation missing for %s" % self)
245 | 
246 |     @abc.abstractmethod
247 |     def get_node_local_name(self, node):
248 |         raise NotImplementedError("Implementation missing for %s" % self)
249 | 
250 |     @abc.abstractmethod
251 |     def get_node_name_prefix(self, node):
252 |         raise NotImplementedError("Implementation missing for %s" % self)
253 | 
254 |     @abc.abstractmethod
255 |     def get_node_value(self, node):
256 |         raise NotImplementedError("Implementation missing for %s" % self)
257 | 
258 |     @abc.abstractmethod
259 |     def set_node_value(self, node, value):
260 |         raise NotImplementedError("Implementation missing for %s" % self)
261 | 
262 |     @abc.abstractmethod
263 |     def get_node_text(self, node):
264 |         raise NotImplementedError("Implementation missing for %s" % self)
265 | 
266 |     @abc.abstractmethod
267 |     def set_node_text(self, node, text):
268 |         raise NotImplementedError("Implementation missing for %s" % self)
269 | 
270 |     @abc.abstractmethod
271 |     def get_node_attributes(self, element, ns_uri=None):
272 |         raise NotImplementedError("Implementation missing for %s" % self)
273 | 
274 |     @abc.abstractmethod
275 |     def has_node_attribute(self, element, name, ns_uri=None):
276 |         raise NotImplementedError("Implementation missing for %s" % self)
277 | 
278 |     @abc.abstractmethod
279 |     def get_node_attribute_node(self, element, name, ns_uri=None):
280 |         raise NotImplementedError("Implementation missing for %s" % self)
281 | 
282 |     @abc.abstractmethod
283 |     def get_node_attribute_value(self, element, name, ns_uri=None):
284 |         raise NotImplementedError("Implementation missing for %s" % self)
285 | 
286 |     @abc.abstractmethod
287 |     def set_node_attribute_value(self, element, name, value, ns_uri=None):
288 |         raise NotImplementedError("Implementation missing for %s" % self)
289 | 
290 |     @abc.abstractmethod
291 |     def remove_node_attribute(self, element, name, ns_uri=None):
292 |         raise NotImplementedError("Implementation missing for %s" % self)
293 | 
294 |     @abc.abstractmethod
295 |     def add_node_child(self, parent, child, before_sibling=None):
296 |         raise NotImplementedError("Implementation missing for %s" % self)
297 | 
298 |     @abc.abstractmethod
299 |     def import_node(self, parent, node, original_parent=None, clone=False):
300 |         raise NotImplementedError("Implementation missing for %s" % self)
301 | 
302 |     @abc.abstractmethod
303 |     def clone_node(self, node, deep=True):
304 |         raise NotImplementedError("Implementation missing for %s" % self)
305 | 
306 |     @abc.abstractmethod
307 |     def remove_node_child(self, parent, child, destroy_node=True):
308 |         raise NotImplementedError("Implementation missing for %s" % self)
309 | 
310 |     @abc.abstractmethod
311 |     def lookup_ns_uri_by_attr_name(self, node, name):
312 |         raise NotImplementedError("Implementation missing for %s" % self)
313 | 
314 |     @abc.abstractmethod
315 |     def lookup_ns_prefix_for_uri(self, node, uri):
316 |         raise NotImplementedError("Implementation missing for %s" % self)
317 | 
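The name handling in ``get_ns_info_from_node_name`` above distinguishes three spellings: ``{uri}local``, ``prefix:local``, and a bare local name. The sketch below shows that branching in isolation. It is a simplified illustration, not the adapter's code: ``split_qualified_name`` is a hypothetical function, and the plain ``prefix_to_uri`` dictionary is an assumed stand-in for the adapter's per-node namespace lookups.

```python
def split_qualified_name(name, prefix_to_uri):
    """
    Return a (prefix, local_name, ns_uri) tuple for the given name.

    ``prefix_to_uri`` is a hypothetical dict standing in for the
    namespace declarations an adapter would look up on a node.
    """
    if '}' in name:
        # '{uri}local' form: the URI is explicit, the prefix is looked up
        ns_uri, local_name = name.split('}')
        ns_uri = ns_uri[1:]  # Drop the leading '{'
        uri_to_prefix = dict(
            (uri, prefix) for prefix, uri in prefix_to_uri.items())
        prefix = uri_to_prefix.get(ns_uri)
    elif ':' in name:
        # 'prefix:local' form: the prefix is explicit, the URI is looked up
        prefix, local_name = name.split(':')
        if prefix not in prefix_to_uri:
            raise ValueError(
                "Prefix '%s' does not have a defined namespace URI" % prefix)
        ns_uri = prefix_to_uri[prefix]
    else:
        # Bare name: no namespace information at all
        prefix, local_name, ns_uri = None, name, None
    return prefix, local_name, ns_uri


namespaces = {'f': 'urn:films'}
from_uri_form = split_qualified_name('{urn:films}Film', namespaces)
from_prefix_form = split_qualified_name('f:Film', namespaces)
from_bare_name = split_qualified_name('Film', namespaces)
```

Checking for ``}`` before ``:`` matters because namespace URIs usually contain colons themselves, so the ``{uri}local`` form has to be recognised first.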


--------------------------------------------------------------------------------
/xml4h/impls/lxml_etree.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | import copy
  3 | 
  4 | from xml4h.impls.interface import XmlImplAdapter
  5 | from xml4h import nodes, exceptions
  6 | 
  7 | try:
  8 |     from lxml import etree
  9 | except ImportError:
 10 |     pass
 11 | 
 12 | 
 13 | class LXMLAdapter(XmlImplAdapter):
 14 |     """
 15 |     Adapter to the `lxml <http://lxml.de>`_ XML library implementation.
 16 |     """
 17 | 
 18 |     SUPPORTED_FEATURES = {
 19 |         'xpath': True,
 20 |         }
 21 | 
 22 |     @classmethod
 23 |     def is_available(cls):
 24 |         try:
 25 |             etree.Element
 26 |             return True
 27 |         except NameError:
 28 |             return False
 29 | 
 30 |     @classmethod
 31 |     def parse_string(cls, xml_str, ignore_whitespace_text_nodes=True):
 32 |         impl_root_elem = etree.fromstring(xml_str)
 33 |         wrapped_doc = LXMLAdapter.wrap_document(impl_root_elem.getroottree())
 34 |         if ignore_whitespace_text_nodes:
 35 |             cls.ignore_whitespace_text_nodes(wrapped_doc)
 36 |         return wrapped_doc
 37 | 
 38 |     @classmethod
 39 |     def parse_bytes(cls, xml_bytes, ignore_whitespace_text_nodes=True):
 40 |         return LXMLAdapter.parse_string(xml_bytes, ignore_whitespace_text_nodes)
 41 | 
 42 |     @classmethod
 43 |     def parse_file(cls, xml_file, ignore_whitespace_text_nodes=True):
 44 |         impl_doc = etree.parse(xml_file)
 45 |         wrapped_doc = LXMLAdapter.wrap_document(impl_doc)
 46 |         if ignore_whitespace_text_nodes:
 47 |             cls.ignore_whitespace_text_nodes(wrapped_doc)
 48 |         return wrapped_doc
 49 | 
 50 |     @classmethod
 51 |     def new_impl_document(cls, root_tagname, ns_uri=None, **kwargs):
 52 |         root_nsmap = {}
 53 |         if ns_uri is not None:
 54 |             root_nsmap[None] = ns_uri
 55 |         else:
 56 |             ns_uri = nodes.Node.XMLNS_URI
 57 |             root_nsmap[None] = ns_uri
 58 |         root_elem = etree.Element('{%s}%s' % (ns_uri, root_tagname),
 59 |             nsmap=root_nsmap)
 60 |         doc = etree.ElementTree(root_elem)
 61 |         return doc
 62 | 
 63 |     def map_node_to_class(self, node):
 64 |         if isinstance(node, etree._ProcessingInstruction):
 65 |             return nodes.ProcessingInstruction
 66 |         elif isinstance(node, etree._Comment):
 67 |             return nodes.Comment
 68 |         elif isinstance(node, etree._ElementTree):
 69 |             return nodes.Document
 70 |         elif isinstance(node, etree._Element):
 71 |             return nodes.Element
 72 |         elif isinstance(node, LXMLAttribute):
 73 |             return nodes.Attribute
 74 |         elif isinstance(node, LXMLText):
 75 |             if node.is_cdata:
 76 |                 return nodes.CDATA
 77 |             else:
 78 |                 return nodes.Text
 79 |         raise exceptions.Xml4hImplementationBug(
 80 |             'Unrecognized type for implementation node: %s' % node)
 81 | 
 82 |     def get_impl_root(self, node):
 83 |         return self._impl_document.getroot()
 84 | 
 85 |     # Document implementation methods
 86 | 
 87 |     def new_impl_element(self, tagname, ns_uri=None, parent=None):
 88 |         if ns_uri is not None:
 89 |             if ':' in tagname:
 90 |                 tagname = tagname.split(':')[1]
 91 |             my_nsmap = {None: ns_uri}
 92 |             # Add any xmlns attribute prefix mappings from parent's document
 93 |             # TODO This doesn't seem to help
 94 |             curr_node = parent
 95 |             while curr_node.__class__ == etree._Element:
 96 |                 for n, v in list(curr_node.attrib.items()):
 97 |                     if '{%s}' % nodes.Node.XMLNS_URI in n:
 98 |                         _, prefix = n.split('}')
 99 |                         my_nsmap[prefix] = v
100 |                 curr_node = self.get_node_parent(curr_node)
101 |             return etree.Element('{%s}%s' % (ns_uri, tagname), nsmap=my_nsmap)
102 |         else:
103 |             return etree.Element(tagname)
104 | 
105 |     def new_impl_text(self, text):
106 |         return LXMLText(text)
107 | 
108 |     def new_impl_comment(self, text):
109 |         return etree.Comment(text)
110 | 
111 |     def new_impl_instruction(self, target, data):
112 |         return etree.ProcessingInstruction(target, data)
113 | 
114 |     def new_impl_cdata(self, text):
115 |         return LXMLText(text, is_cdata=True)
116 | 
117 |     def find_node_elements(self, node, name='*', ns_uri='*'):
118 |         # TODO Any proper way to find namespaced elements by name?
119 |         name_match_nodes = node.iter()
120 |         # Filter nodes by name and ns_uri if necessary
121 |         results = []
122 |         for n in name_match_nodes:
123 |             # Ignore the current node
124 |             if n == node:
125 |                 continue
126 |             # Ignore non-Elements
127 |             if not n.__class__ == etree._Element:
128 |                 continue
129 |             if ns_uri != '*' and self.get_node_namespace_uri(n) != ns_uri:
130 |                 continue
131 |             if name != '*' and self.get_node_local_name(n) != name:
132 |                 continue
133 |             results.append(n)
134 |         return results
135 |     find_node_elements.__doc__ = XmlImplAdapter.find_node_elements.__doc__
136 | 
137 |     def xpath_on_node(self, node, xpath, **kwargs):
138 |         """
139 |         Return result of performing the given XPath query on the given node.
140 | 
141 |         All known namespace prefix-to-URI mappings in the document are
142 |         automatically included in the XPath invocation.
143 | 
144 |         If an empty/default namespace (i.e. None) is defined, this is
145 |         converted to the prefix name '_' so it can be used despite empty
146 |         namespace prefixes being unsupported by XPath.
147 |         """
148 |         if isinstance(node, etree._ElementTree):
149 |             # Document node lxml.etree._ElementTree has no nsmap, lookup root
150 |             root = self.get_impl_root(node)
151 |             namespaces_dict = root.nsmap.copy()
152 |         else:
153 |             namespaces_dict = node.nsmap.copy()
154 |         if 'namespaces' in kwargs:
155 |             namespaces_dict.update(kwargs['namespaces'])
156 |         # Empty namespace prefix is not supported, convert to '_' prefix
157 |         if None in namespaces_dict:
158 |             default_ns_uri = namespaces_dict.pop(None)
159 |             namespaces_dict['_'] = default_ns_uri
160 |         # Include XMLNS namespace if it's not already defined
161 |         if 'xmlns' not in namespaces_dict:
162 |             namespaces_dict['xmlns'] = nodes.Node.XMLNS_URI
163 |         return node.xpath(xpath, namespaces=namespaces_dict)
164 | 
165 |     # Node implementation methods
166 | 
167 |     def get_node_namespace_uri(self, node):
168 |         if '}' in node.tag:
169 |             return node.tag.split('}')[0][1:]
170 |         elif isinstance(node, LXMLAttribute):
171 |             return node.namespace_uri
172 |         elif isinstance(node, etree._ElementTree):
173 |             return None
174 |         elif isinstance(node, etree._Element):
175 |             qname, ns_uri = self._unpack_name(node.tag, node)[:2]
176 |             return ns_uri
177 |         else:
178 |             return None
179 | 
180 |     def set_node_namespace_uri(self, node, ns_uri):
181 |         node.nsmap[None] = ns_uri
182 | 
183 |     def get_node_parent(self, node):
184 |         if isinstance(node, etree._ElementTree):
185 |             return None
186 |         else:
187 |             parent = node.getparent()
188 |             # Return ElementTree as root element's parent
189 |             if parent is None:
190 |                 return self.impl_document
191 |             return parent
192 | 
193 |     def get_node_children(self, node):
194 |         if isinstance(node, etree._ElementTree):
195 |             children = [node.getroot()]
196 |         else:
197 |             if not hasattr(node, 'getchildren'):
198 |                 return []
199 |             children = node.getchildren()
200 |             # Hack to treat text attribute as child text nodes
201 |             if node.text is not None:
202 |                 children.insert(0, LXMLText(node.text, parent=node))
203 |         return children
204 | 
205 |     def get_node_name(self, node):
206 |         if isinstance(node, etree._Comment):
207 |             return '#comment'
208 |         elif isinstance(node, etree._ProcessingInstruction):
209 |             return node.target
210 |         prefix = self.get_node_name_prefix(node)
211 |         local_name = self.get_node_local_name(node)
212 |         if prefix is not None:
213 |             return '%s:%s' % (prefix, local_name)
214 |         else:
215 |             return local_name
216 | 
217 |     def get_node_local_name(self, node):
218 |         return re.sub('{.*}', '', node.tag)
219 | 
220 |     def get_node_name_prefix(self, node):
221 |         # Believe non-Element nodes that have a prefix set (e.g. LXMLAttribute)
222 |         if node.prefix and not isinstance(node, etree._Element):
223 |             return node.prefix
224 |         # Derive prefix by unpacking node name
225 |         qname, ns_uri, prefix, local_name = self._unpack_name(node.tag, node)
226 |         if prefix:
227 |             # Don't add unnecessary excess namespace prefixes for elements
228 |             # with a local default namespace declaration
229 |             xmlns_val = self.get_node_attribute_value(node, 'xmlns')
230 |             if xmlns_val == ns_uri:
231 |                 return None
232 |             # Don't add unnecessary excess namespace prefixes for default ns
233 |             if prefix == 'xmlns':
234 |                 return None
235 |             else:
236 |                 return prefix
237 |         else:
238 |             return None
239 | 
240 |     def get_node_value(self, node):
241 |         if isinstance(node, (etree._ProcessingInstruction, etree._Comment)):
242 |             return node.text
243 |         elif hasattr(node, 'value'):
244 |             return node.value
245 |         else:
246 |             return node.text
247 | 
248 |     def set_node_value(self, node, value):
249 |         if hasattr(node, 'value'):
250 |             node.value = value
251 |         else:
252 |             self.set_node_text(node, value)
253 | 
254 |     def get_node_text(self, node):
255 |         return node.text
256 | 
257 |     def set_node_text(self, node, text):
258 |         node.text = text
259 | 
260 |     def get_node_attributes(self, element, ns_uri=None):
261 |         # TODO: Filter by ns_uri
262 |         attribs_by_qname = {}
263 |         for n, v in list(element.attrib.items()):
264 |             qname, ns_uri, prefix, local_name = self._unpack_name(n, element)
265 |             attribs_by_qname[qname] = LXMLAttribute(
266 |                 qname, ns_uri, prefix, local_name, v, element)
267 |         # Include namespace declarations, which we also treat as attributes
268 |         if element.nsmap:
269 |             for n, v in list(element.nsmap.items()):
270 |                 # Only add namespace as attribute if not defined in ancestors
271 |                 # and not the global xmlns namespace
272 |                 if (self._is_ns_in_ancestor(element, n, v)
273 |                         or v == nodes.Node.XMLNS_URI):
274 |                     continue
275 |                 if n is None:
276 |                     ns_attr_name = 'xmlns'
277 |                 else:
278 |                     ns_attr_name = 'xmlns:%s' % n
279 |                 qname, ns_uri, prefix, local_name = self._unpack_name(
280 |                     ns_attr_name, element)
281 |                 attribs_by_qname[qname] = LXMLAttribute(
282 |                     qname, ns_uri, prefix, local_name, v, element)
283 |         return list(attribs_by_qname.values())
284 | 
285 |     def has_node_attribute(self, element, name, ns_uri=None):
286 |         return name in [a.qname for a
287 |                         in self.get_node_attributes(element, ns_uri)]
288 | 
289 |     def get_node_attribute_node(self, element, name, ns_uri=None):
290 |         for attr in self.get_node_attributes(element, ns_uri):
291 |             if attr.qname == name:
292 |                 return attr
293 |         return None
294 | 
295 |     def get_node_attribute_value(self, element, name, ns_uri=None):
296 |         if ns_uri is not None:
297 |             prefix = self.lookup_ns_prefix_for_uri(element, ns_uri)
298 |             name = '%s:%s' % (prefix, name)
299 |         for attr in self.get_node_attributes(element, ns_uri):
300 |             if attr.qname == name:
301 |                 return attr.value
302 |         return None
303 | 
304 |     def set_node_attribute_value(self, element, name, value, ns_uri=None):
305 |         prefix = None
306 |         if ':' in name:
307 |             prefix, name = name.split(':')
308 |         if ns_uri is None and prefix is not None:
309 |             ns_uri = self.lookup_ns_uri_by_attr_name(element, prefix)
310 |         if ns_uri is not None:
311 |             name = '{%s}%s' % (ns_uri, name)
312 |         if name.startswith('{%s}' % nodes.Node.XMLNS_URI):
313 |             if element.nsmap.get(name) != value:
314 |                 # Ideally we would apply namespace (xmlns) attributes to the
315 |                 # element's `nsmap` only, but the lxml/etree nsmap attribute
316 |                 # is immutable and there's no non-hacky way around this.
317 |                 # TODO Is there a better way?
318 |                 pass
319 |             if name.split('}')[1] == 'xmlns':
320 |                 # Hack to remove namespace URI from 'xmlns' attributes so
321 |                 # the name is just a simple string
322 |                 name = 'xmlns'
323 |             element.attrib[name] = value
324 |         else:
325 |             element.attrib[name] = value
326 | 
327 |     def remove_node_attribute(self, element, name, ns_uri=None):
328 |         if ns_uri is not None:
329 |             name = '{%s}%s' % (ns_uri, name)
330 |         elif ':' in name:
331 |             prefix, name = name.split(':')
332 |             if prefix == 'xmlns':
333 |                 name = '{%s}%s' % (nodes.Node.XMLNS_URI, name)
334 |             else:
335 |                 name = '{%s}%s' % (element.nsmap[prefix], name)
336 |         if name in element.attrib:
337 |             del element.attrib[name]
338 | 
339 |     def add_node_child(self, parent, child, before_sibling=None):
340 |         if isinstance(child, LXMLText):
341 |             # Add text values directly to parent's 'text' attribute
342 |             if parent.text is not None:
343 |                 parent.text = parent.text + child.text
344 |             else:
345 |                 parent.text = child.text
346 |             return None
347 |         else:
348 |             if before_sibling is not None:
349 |                 offset = 0
350 |                 for c in parent.getchildren():
351 |                     if c == before_sibling:
352 |                         break
353 |                     offset += 1
354 |                 parent.insert(offset, child)
355 |             else:
356 |                 parent.append(child)
357 |             return child
358 | 
359 |     def import_node(self, parent, node, original_parent=None, clone=False):
360 |         original_node = node
361 |         if clone:
362 |             node = self.clone_node(node)
363 |         self.add_node_child(parent, node)
364 |         # Hack to remove text node content from original parent by manually
365 |         # deleting matching text content
366 |         if not clone and isinstance(original_node, LXMLText):
367 |             original_parent = self.get_node_parent(original_node)
368 |             if original_parent.text == original_node.text:
369 |                 # Must set to None if there would be no remaining text,
370 |                 # otherwise parent element won't realise it's empty
371 |                 original_parent.text = None
372 |             else:
373 |                 original_parent.text = \
374 |                     original_parent.text.replace(original_node.text, '', 1)
375 | 
376 |     def clone_node(self, node, deep=True):
377 |         if deep:
378 |             return copy.deepcopy(node)
379 |         else:
380 |             return copy.copy(node)
381 | 
382 |     def remove_node_child(self, parent, child, destroy_node=True):
383 |         if isinstance(child, LXMLText):
384 |             parent.text = None
385 |             return
386 |         parent.remove(child)
387 |         if destroy_node:
388 |             child.clear()
389 |             return None
390 |         else:
391 |             return child
392 | 
393 |     def lookup_ns_uri_by_attr_name(self, node, name):
394 |         ns_name = None
395 |         if name == 'xmlns':
396 |             ns_name = None
397 |         elif name.startswith('xmlns:'):
398 |             _, ns_name = name.split(':')
399 |         if ns_name in node.nsmap:
400 |             return node.nsmap[ns_name]
401 |         # If namespace is not in `nsmap` it may be in an XML DOM attribute
402 |         # TODO Generalize this block
403 |         curr_node = node
404 |         while (curr_node is not None
405 |                 and curr_node.__class__ != etree._ElementTree):
406 |             uri = self.get_node_attribute_value(curr_node, name)
407 |             if uri is not None:
408 |                 return uri
409 |             curr_node = self.get_node_parent(curr_node)
410 |         return None
411 | 
412 |     def lookup_ns_prefix_for_uri(self, node, uri):
413 |         if uri == nodes.Node.XMLNS_URI:
414 |             return 'xmlns'
415 |         result = None
416 |         if hasattr(node, 'nsmap') and uri in list(node.nsmap.values()):
417 |             for n, v in list(node.nsmap.items()):
418 |                 if v == uri:
419 |                     result = n
420 |                     break
421 |         # TODO This is a slow hack necessary due to lxml's immutable nsmap
422 |         if result is None or re.match(r'ns\d', result):
423 |             # We either have no namespace prefix in the nsmap, in which case we
424 |             # will try looking for a matching xmlns attribute, or we have
425 |             # a namespace prefix that was probably assigned automatically by
426 |             # lxml and we'd rather use a human-assigned prefix if available.
427 |             curr_node = node  # self.get_node_parent(node)
428 |             while curr_node.__class__ == etree._Element:
429 |                 for n, v in list(curr_node.attrib.items()):
430 |                     if v == uri and ('{%s}' % nodes.Node.XMLNS_URI) in n:
431 |                         result = n.split('}')[1]
432 |                         return result
433 |                 curr_node = self.get_node_parent(curr_node)
434 |         return result
435 | 
436 |     def _unpack_name(self, name, node):
437 |         qname = prefix = local_name = ns_uri = None
438 |         if name == 'xmlns':
439 |             # Namespace URI of 'xmlns' is a constant
440 |             ns_uri = nodes.Node.XMLNS_URI
441 |         elif '}' in name:
442 |             # Namespace URI is contained in {}, find URI's defined prefix
443 |             ns_uri, local_name = name.split('}')
444 |             ns_uri = ns_uri[1:]
445 |             prefix = self.lookup_ns_prefix_for_uri(node, ns_uri)
446 |         elif ':' in name:
447 |             # Namespace prefix is before ':', find prefix's defined URI
448 |             prefix, local_name = name.split(':')
449 |             if prefix == 'xmlns':
450 |                 # All 'xmlns' attributes are in XMLNS URI by definition
451 |                 ns_uri = nodes.Node.XMLNS_URI
452 |             else:
453 |                 ns_uri = self.lookup_ns_uri_by_attr_name(node, prefix)
454 |         # Catch case where a prefix other than 'xmlns' points at XMLNS URI
455 |         if name != 'xmlns' and ns_uri == nodes.Node.XMLNS_URI:
456 |             prefix = 'xmlns'
457 |         # Construct fully-qualified name from prefix + local names
458 |         if prefix is not None:
459 |             qname = '%s:%s' % (prefix, local_name)
460 |         else:
461 |             qname = local_name = name
462 |         return (qname, ns_uri, prefix, local_name)
463 | 
464 |     def _is_ns_in_ancestor(self, node, name, value):
465 |         """
466 |         Return True if the given namespace name/value is defined in an
467 |         ancestor of the given node, meaning that the given node need not
468 |         have its own attributes to apply that namespacing.
469 |         """
470 |         curr_node = self.get_node_parent(node)
471 |         while curr_node.__class__ == etree._Element:
472 |             if (hasattr(curr_node, 'nsmap')
473 |                     and curr_node.nsmap.get(name) == value):
474 |                 return True
475 |             for n, v in list(curr_node.attrib.items()):
476 |                 if v == value and '{%s}' % nodes.Node.XMLNS_URI in n:
477 |                     return True
478 |             curr_node = self.get_node_parent(curr_node)
479 |         return False
480 | 
481 | 
482 | class LXMLText(object):
483 | 
484 |     def __init__(self, text, parent=None, is_cdata=False):
485 |         self._text = text
486 |         self._parent = parent
487 |         self._is_cdata = is_cdata
488 | 
489 |     @property
490 |     def is_cdata(self):
491 |         return self._is_cdata
492 | 
493 |     @property
494 |     def value(self):
495 |         return self._text
496 | 
497 |     text = value  # Alias
498 | 
499 |     def getparent(self):
500 |         return self._parent
501 | 
502 |     @property
503 |     def prefix(self):
504 |         return None
505 | 
506 |     @property
507 |     def tag(self):
508 |         if self.is_cdata:
509 |             return "#cdata-section"
510 |         else:
511 |             return "#text"
512 | 
513 | 
514 | class LXMLAttribute(object):
515 | 
516 |     def __init__(self, qname, ns_uri, prefix, local_name, value, element):
517 |         self._qname, self._ns_uri, self._prefix, self._local_name = (
518 |             qname, ns_uri, prefix, local_name)
519 |         self._value, self._element = (value, element)
520 | 
521 |     def getroottree(self):
522 |         return self._element.getroottree()
523 | 
524 |     @property
525 |     def qname(self):
526 |         return self._qname
527 | 
528 |     @property
529 |     def namespace_uri(self):
530 |         return self._ns_uri
531 | 
532 |     @property
533 |     def prefix(self):
534 |         return self._prefix
535 | 
536 |     @property
537 |     def local_name(self):
538 |         return self._local_name
539 | 
540 |     @property
541 |     def value(self):
542 |         return self._value
543 | 
544 |     name = tag = local_name  # Alias
545 | 


--------------------------------------------------------------------------------
/xml4h/impls/xml_dom_minidom.py:
--------------------------------------------------------------------------------
  1 | from six import StringIO, BytesIO
  2 | 
  3 | from xml4h.impls.interface import XmlImplAdapter
  4 | from xml4h import nodes, exceptions
  5 | 
  6 | import xml.dom
  7 | import xml.dom.minidom
  8 | 
  9 | 
 10 | class XmlDomImplAdapter(XmlImplAdapter):
 11 |     """
 12 |     Adapter to the
 13 |     `minidom <http://docs.python.org/2/library/xml.dom.minidom.html>`_ XML
 14 |     library implementation.
 15 |     """
 16 | 
 17 |     @classmethod
 18 |     def is_available(cls):
 19 |         try:
 20 |             xml.dom.Node
 21 |             return True
 22 |         except Exception:
 23 |             return False
 24 | 
 25 |     @classmethod
 26 |     def parse_string(cls, xml_str, ignore_whitespace_text_nodes=True):
 27 |         return cls.parse_file(StringIO(xml_str), ignore_whitespace_text_nodes)
 28 | 
 29 |     @classmethod
 30 |     def parse_bytes(cls, xml_bytes, ignore_whitespace_text_nodes=True):
 31 |         return cls.parse_file(BytesIO(xml_bytes), ignore_whitespace_text_nodes)
 32 | 
 33 |     @classmethod
 34 |     def parse_file(cls, xml_file, ignore_whitespace_text_nodes=True):
 35 |         impl_doc = xml.dom.minidom.parse(xml_file)
 36 |         wrapped_doc = XmlDomImplAdapter.wrap_document(impl_doc)
 37 |         if ignore_whitespace_text_nodes:
 38 |             cls.ignore_whitespace_text_nodes(wrapped_doc)
 39 |         return wrapped_doc
 40 | 
 41 |     @classmethod
 42 |     def new_impl_document(cls, root_tagname, ns_uri=None,
 43 |             doctype=None, impl_features=None):
 44 |         # Create DOM implementation factory
 45 |         if impl_features is None:
 46 |             impl_features = []
 47 |         factory = xml.dom.getDOMImplementation('minidom', impl_features)
 48 |         # Create Document from factory
 49 |         doc = factory.createDocument(ns_uri, root_tagname, doctype)
 50 |         return doc
 51 | 
 52 |     def map_node_to_class(self, impl_node):
 53 |         try:
 54 |             return {
 55 |                 xml.dom.Node.ELEMENT_NODE: nodes.Element,
 56 |                 xml.dom.Node.ATTRIBUTE_NODE: nodes.Attribute,
 57 |                 xml.dom.Node.TEXT_NODE: nodes.Text,
 58 |                 xml.dom.Node.CDATA_SECTION_NODE: nodes.CDATA,
 59 |                 # EntityReference not supported by minidom
 60 |                 #xml.dom.Node.ENTITY_REFERENCE: nodes.EntityReference,
 61 |                 xml.dom.Node.ENTITY_NODE: nodes.Entity,
 62 |                 xml.dom.Node.PROCESSING_INSTRUCTION_NODE:
 63 |                     nodes.ProcessingInstruction,
 64 |                 xml.dom.Node.COMMENT_NODE: nodes.Comment,
 65 |                 xml.dom.Node.DOCUMENT_NODE: nodes.Document,
 66 |                 xml.dom.Node.DOCUMENT_TYPE_NODE: nodes.DocumentType,
 67 |                 xml.dom.Node.DOCUMENT_FRAGMENT_NODE: nodes.DocumentFragment,
 68 |                 xml.dom.Node.NOTATION_NODE: nodes.Notation,
 69 |                 }[impl_node.nodeType]
 70 |         except KeyError:
 71 |             raise exceptions.Xml4hImplementationBug(
 72 |                 'Unrecognized type for implementation node: %s' % impl_node)
 73 | 
 74 |     def get_impl_root(self, node):
 75 |         return node.documentElement
 76 | 
 77 |     def new_impl_element(self, tagname, ns_uri=None, parent=None):
 78 |         return self.impl_document.createElementNS(ns_uri, tagname)
 79 | 
 80 |     def new_impl_text(self, text):
 81 |         return self.impl_document.createTextNode(text)
 82 | 
 83 |     def new_impl_comment(self, text):
 84 |         return self.impl_document.createComment(text)
 85 | 
 86 |     def new_impl_instruction(self, target, data):
 87 |         return self.impl_document.createProcessingInstruction(target, data)
 88 | 
 89 |     def new_impl_cdata(self, text):
 90 |         return self.impl_document.createCDATASection(text)
 91 | 
 92 |     def find_node_elements(self, node, name='*', ns_uri='*'):
 93 |         return node.getElementsByTagNameNS(ns_uri, name)
 94 | 
 95 |     def get_node_namespace_uri(self, node):
 96 |         return node.namespaceURI
 97 | 
 98 |     def set_node_namespace_uri(self, node, ns_uri):
 99 |         node.namespaceURI = ns_uri
100 | 
101 |     def get_node_parent(self, element):
102 |         return element.parentNode
103 | 
104 |     def get_node_children(self, element):
105 |         return element.childNodes
106 | 
107 |     def get_node_name(self, node):
108 |         if node.nodeType not in (
109 |             xml.dom.Node.ELEMENT_NODE, xml.dom.Node.ATTRIBUTE_NODE
110 |         ):
111 |             return node.nodeName
112 |         # Special handling of node names for Element and Attribute nodes where
113 |         # we want to exclude the namespace prefix in some cases
114 |         prefix = self.get_node_name_prefix(node)
115 |         local_name = self.get_node_local_name(node)
116 |         if prefix is not None:
117 |             return '%s:%s' % (prefix, local_name)
118 |         else:
119 |             return local_name
120 | 
121 |     def get_node_local_name(self, node):
122 |         return node.localName
123 | 
124 |     def get_node_name_prefix(self, node):
125 |         prefix = node.prefix
126 |         # Don't add unnecessary excess namespace prefixes for elements
127 |         # with a local default namespace declaration
128 |         if prefix and node.nodeType == xml.dom.Node.ELEMENT_NODE:
129 |             xmlns_val = self.get_node_attribute_value(node, 'xmlns')
130 |             if xmlns_val == self.get_node_namespace_uri(node):
131 |                 return None
132 |         return prefix
133 | 
134 |     def get_node_value(self, node):
135 |         return node.nodeValue
136 | 
137 |     def set_node_value(self, node, value):
138 |         node.nodeValue = value
139 | 
140 |     def get_node_text(self, node):
141 |         """
142 |         Return concatenated value of all text node children of this element
143 |         """
144 |         text_children = [n.nodeValue for n in self.get_node_children(node)
145 |                          if n.nodeType == xml.dom.Node.TEXT_NODE]
146 |         if text_children:
147 |             return ''.join(text_children)
148 |         else:
149 |             return None
150 | 
151 |     def set_node_text(self, node, text):
152 |         """
153 |         Set text value as sole Text child node of element; any existing
154 |         Text nodes are removed
155 |         """
156 |         # Remove any existing Text node children
157 |         for child in self.get_node_children(node):
158 |             if child.nodeType == xml.dom.Node.TEXT_NODE:
159 |                 self.remove_node_child(node, child, True)
160 |         if text is not None:
161 |             text_node = self.new_impl_text(text)
162 |             self.add_node_child(node, text_node)
163 | 
164 |     def get_node_attributes(self, element, ns_uri=None):
165 |         attr_nodes = []
166 |         if not element.attributes:
167 |             return attr_nodes
168 |         for attr_name in list(element.attributes.keys()):
169 |             if self.has_node_attribute(element, attr_name, ns_uri):
170 |                 attr_nodes.append(
171 |                     self.get_node_attribute_node(element, attr_name, ns_uri))
172 |         return attr_nodes
173 | 
174 |     def has_node_attribute(self, element, name, ns_uri=None):
175 |         if ns_uri is not None:
176 |             return element.hasAttributeNS(ns_uri, name)
177 |         else:
178 |             return element.hasAttribute(name)
179 | 
180 |     def get_node_attribute_node(self, element, name, ns_uri=None):
181 |         if ns_uri is not None:
182 |             return element.getAttributeNodeNS(ns_uri, name)
183 |         else:
184 |             return element.getAttributeNode(name)
185 | 
186 |     def get_node_attribute_value(self, element, name, ns_uri=None):
187 |         if isinstance(element, xml.dom.minidom.Document):
188 |             return None
189 |         if ns_uri is not None:
190 |             result = element.getAttributeNS(ns_uri, name)
191 |         else:
192 |             result = element.getAttribute(name)
193 |         # Minidom returns empty string for non-existent nodes, correct this
194 |         if result == '' and name not in list(element.attributes.keys()):
195 |             return None
196 |         return result
197 | 
198 |     def set_node_attribute_value(self, element, name, value, ns_uri=None):
199 |         element.setAttributeNS(ns_uri, name, value)
200 | 
201 |     def remove_node_attribute(self, element, name, ns_uri=None):
202 |         if ns_uri is not None:
203 |             element.removeAttributeNS(ns_uri, name)
204 |         else:
205 |             element.removeAttribute(name)
206 | 
207 |     def add_node_child(self, parent, child, before_sibling=None):
208 |         if before_sibling is not None:
209 |             parent.insertBefore(child, before_sibling)
210 |         else:
211 |             parent.appendChild(child)
212 | 
213 |     def import_node(self, parent, node, original_parent=None, clone=False):
214 |         if clone:
215 |             node = self.clone_node(node)
216 |         self.add_node_child(parent, node)
217 | 
218 |     def clone_node(self, node, deep=True):
219 |         return node.cloneNode(deep)
220 | 
221 |     def remove_node_child(self, parent, child, destroy_node=True):
222 |         parent.removeChild(child)
223 |         if destroy_node:
224 |             child.unlink()
225 |             return None
226 |         else:
227 |             return child
228 | 
229 |     def lookup_ns_uri_by_attr_name(self, node, name):
230 |         curr_node = node
231 |         while curr_node is not None:
232 |             value = self.get_node_attribute_value(curr_node, name)
233 |             if value is not None:
234 |                 return value
235 |             curr_node = self.get_node_parent(curr_node)
236 |         return None
237 | 
238 |     def lookup_ns_prefix_for_uri(self, node, uri):
239 |         curr_node = node
240 |         while curr_node:
241 |             attrs = self.get_node_attributes(curr_node)
242 |             for attr in attrs:
243 |                 if attr.value == uri:
244 |                     if ':' in attr.name:
245 |                         return attr.name.split(':')[1]
246 |                     else:
247 |                         return attr.name
248 |             curr_node = self.get_node_parent(curr_node)
249 |         return None
250 | 
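
The empty-string correction in `get_node_attribute_value` above exists because minidom's `getAttribute` returns `''` both for a missing attribute and for one that is present but empty. A minimal sketch of that distinction follows, using only the standard library; the helper name `attribute_or_none` and the sample element are hypothetical, and it checks presence with `hasAttribute` rather than the key lookup the adapter uses.

```python
import xml.dom.minidom


def attribute_or_none(element, name):
    """Return the attribute value, or None if the attribute is absent."""
    value = element.getAttribute(name)
    # minidom reports '' for absent attributes, so check presence explicitly
    if value == '' and not element.hasAttribute(name):
        return None
    return value


# Hypothetical element with one empty and one non-empty attribute
doc = xml.dom.minidom.parseString('<film title="" year="1979"/>')
elem = doc.documentElement

present = attribute_or_none(elem, 'year')
empty = attribute_or_none(elem, 'title')
missing = attribute_or_none(elem, 'director')
```

Callers can then tell an explicitly empty attribute (`''`) apart from one that was never set (`None`).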


--------------------------------------------------------------------------------
/xml4h/impls/xml_etree_elementtree.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | import copy
  3 | 
  4 | import six
  5 | 
  6 | from xml4h.impls.interface import XmlImplAdapter
  7 | from xml4h import nodes, exceptions
  8 | 
  9 | # Import the pure-Python ElementTree implementation, if possible
 10 | try:
 11 |     import xml.etree.ElementTree as PythonET
 12 |     # Re-import non-C ElementTree with a definitive name, for cases where we
 13 |     # must explicitly use non-C-based elements of ElementTree.
 14 |     import xml.etree.ElementTree as BaseET
 15 | except ImportError:
 16 |     pass
 17 | 
 18 | # Import the C-based ElementTree implementation, if possible
 19 | try:
 20 |     import xml.etree.cElementTree as cET
 21 | except ImportError:
 22 |     pass
 23 | 
 24 | 
 25 | class ElementTreeAdapter(XmlImplAdapter):
 26 |     """
 27 |     Adapter to the
 28 |     `ElementTree <http://docs.python.org/2/library/xml.etree.elementtree.html>`_
 29 |     XML library.
 30 | 
 31 |     This code *must* work with either the base ElementTree pure python
 32 |     implementation or the C-based cElementTree implementation, since it is
 33 |     reused in the `cElementTree` class defined below.
 34 |     """
 35 | 
 36 |     ET = PythonET  # Use the pure-Python implementation
 37 | 
 38 |     SUPPORTED_FEATURES = {
 39 |         'xpath': True,
 40 |         }
 41 | 
 42 |     @classmethod
 43 |     def is_available(cls):
 44 |         # Is vital piece of ElementTree module available at all?
 45 |         try:
 46 |             cls.ET.Element
 47 |         except Exception:
 48 |             return False
 49 |         # We only support ElementTree version 1.3+
 50 |         from distutils.version import StrictVersion
 51 |         return StrictVersion(BaseET.VERSION) >= StrictVersion('1.3')
 52 | 
 53 |     @classmethod
 54 |     def parse_string(cls, xml_str, ignore_whitespace_text_nodes=True):
 55 |         return cls.parse_file(
 56 |             six.StringIO(xml_str),
 57 |             ignore_whitespace_text_nodes=ignore_whitespace_text_nodes)
 58 | 
 59 |     @classmethod
 60 |     def parse_bytes(cls, xml_bytes, ignore_whitespace_text_nodes=True):
 61 |         return cls.parse_file(
 62 |             six.BytesIO(xml_bytes),
 63 |             ignore_whitespace_text_nodes=ignore_whitespace_text_nodes)
 64 | 
 65 |     @classmethod
 66 |     def parse_file(cls, xml_file_path, ignore_whitespace_text_nodes=True):
 67 |         # To retain explicit xmlns namespace definition attributes, we need to
 68 |         # manually add these elements to the parsed DOM as we go using
 69 |         # iterative parsing per:
 70 |         # effbot.org/zone/element-namespaces.htm#preserving-existing-namespace-attributes
 71 |         events = ('start', 'start-ns')
 72 |         impl_root = None
 73 |         ns_list = []
 74 |         for event, node in cls.ET.iterparse(xml_file_path, events):
 75 |             if event == 'start-ns':
 76 |                 # Track namespaces as they are declared
 77 |                 ns_list.append(node)
 78 |             elif event == 'start':
 79 |                 # Recognise and retain root node
 80 |                 if impl_root is None:
 81 |                     impl_root = node
 82 |                 # Add xmlns attributes for each namespace declared
 83 |                 for ns_prefix, ns_uri in ns_list:
 84 |                     if ns_prefix:
 85 |                         attr_name = 'xmlns:%s' % ns_prefix
 86 |                     else:
 87 |                         attr_name = 'xmlns'
 88 |                     node.set(attr_name, ns_uri)
 89 |                 # Reset namespace list now the corresponding attributes exist
 90 |                 ns_list = []
 91 | 
 92 |         impl_doc = cls.ET.ElementTree(impl_root)
 93 |         wrapped_doc = cls.wrap_document(impl_doc)
 94 |         if ignore_whitespace_text_nodes:
 95 |             cls.ignore_whitespace_text_nodes(wrapped_doc)
 96 |         return wrapped_doc
 97 | 
 98 |     @classmethod
 99 |     def new_impl_document(cls, root_tagname, ns_uri=None, **kwargs):
100 |         root_nsmap = {}
101 |         if ns_uri is not None:
102 |             root_nsmap[None] = ns_uri
103 |         else:
104 |             ns_uri = nodes.Node.XMLNS_URI
105 |             root_nsmap[None] = ns_uri
106 |         root_elem = cls.ET.Element('{%s}%s' % (ns_uri, root_tagname))
107 |         doc = cls.ET.ElementTree(root_elem)
108 |         return doc
109 | 
110 |     # This method is called by interface super-class's __init__
111 |     def clear_caches(self):
112 |         self.CACHED_ANCESTRY_DICT = {}
113 | 
114 |     def _lookup_node_parent(self, node):
115 |         """
116 |         Return the parent of the given node using an internal dictionary
117 |         that maps each child node to its parent. This is required because
118 |         ElementTree doesn't make info about node ancestry/parentage available.
119 |         """
120 |         # Basic caching of our internal ancestry dict to help performance
121 |         if node not in self.CACHED_ANCESTRY_DICT:
122 |             # Given node isn't in cached ancestry dictionary, rebuild this now
123 |             ancestry_dict = dict(
124 |                 (c, p) for p in self._impl_document.iter() for c in p)
125 |             self.CACHED_ANCESTRY_DICT = ancestry_dict
126 |         return self.CACHED_ANCESTRY_DICT[node]
127 | 
128 |     def _is_node_an_element(self, node):
129 |         """
130 |         Return True if the given node is an ElementTree Element, a fact that
131 |         can be tricky to determine if the cElementTree implementation is
132 |         used.
133 |         """
134 |         # Try the simplest approach first, works for plain old ElementTree
135 |         if isinstance(node, BaseET.Element):
136 |             return True
137 |         # For cElementTree we need to be more cunning (or find a better way)
138 |         if hasattr(node, 'makeelement') \
139 |                 and isinstance(node.tag, six.string_types):
140 |             return True
141 | 
142 |     def map_node_to_class(self, node):
143 |         if isinstance(node, BaseET.ElementTree):
144 |             return nodes.Document
145 |         elif node.tag == BaseET.ProcessingInstruction:
146 |             return nodes.ProcessingInstruction
147 |         elif node.tag == BaseET.Comment:
148 |             return nodes.Comment
149 |         elif isinstance(node, ETAttribute):
150 |             return nodes.Attribute
151 |         elif isinstance(node, ElementTreeText):
152 |             if node.is_cdata:
153 |                 return nodes.CDATA
154 |             else:
155 |                 return nodes.Text
156 |         elif self._is_node_an_element(node):
157 |             return nodes.Element
158 |         raise exceptions.Xml4hImplementationBug(
159 |             'Unrecognized type for implementation node: %s' % node)
160 | 
161 |     def get_impl_root(self, node):
162 |         return self._impl_document.getroot()
163 | 
164 |     # Document implementation methods
165 | 
166 |     def new_impl_element(self, tagname, ns_uri=None, parent=None):
167 |         if ns_uri is not None:
168 |             if ':' in tagname:
169 |                 tagname = tagname.split(':')[1]
170 |             element = self.ET.Element('{%s}%s' % (ns_uri, tagname))
171 |             return element
172 |         else:
173 |             return self.ET.Element(tagname)
174 | 
175 |     def new_impl_text(self, text):
176 |         return ElementTreeText(text)
177 | 
178 |     def new_impl_comment(self, text):
179 |         return self.ET.Comment(text)
180 | 
181 |     def new_impl_instruction(self, target, data):
182 |         return self.ET.ProcessingInstruction(target, data)
183 | 
184 |     def new_impl_cdata(self, text):
185 |         return ElementTreeText(text, is_cdata=True)
186 | 
187 |     def find_node_elements(self, node, name='*', ns_uri='*'):
188 |         # TODO Any proper way to find namespaced elements by name?
189 |         name_match_nodes = node.iter()
190 |         # Filter nodes by name and ns_uri if necessary
191 |         results = []
192 |         for n in name_match_nodes:
193 |             # Ignore the current node
194 |             if n == node:
195 |                 continue
196 |             # Ignore non-Elements
197 |             if not isinstance(n.tag, six.string_types):
198 |                 continue
199 |             if ns_uri != '*' and self.get_node_namespace_uri(n) != ns_uri:
200 |                 continue
201 |             if name != '*' and self.get_node_local_name(n) != name:
202 |                 continue
203 |             results.append(n)
204 |         return results
205 |     find_node_elements.__doc__ = XmlImplAdapter.find_node_elements.__doc__
206 | 
207 |     def xpath_on_node(self, node, xpath, **kwargs):
208 |         """
209 |         Return result of performing the given XPath query on the given node.
210 | 
211 |         All known namespace prefix-to-URI mappings in the document are
212 |         automatically included in the XPath invocation.
213 | 
214 |         If an empty/default namespace (i.e. None) is defined, this is
215 |         converted to the prefix name '_' so it can be used despite empty
216 |         namespace prefixes being unsupported by XPath.
217 |         """
218 |         namespaces_dict = {}
219 |         if 'namespaces' in kwargs:
220 |             namespaces_dict.update(kwargs['namespaces'])
221 |         # Empty namespace prefix is not supported, convert to '_' prefix
222 |         if None in namespaces_dict:
223 |             default_ns_uri = namespaces_dict.pop(None)
224 |             namespaces_dict['_'] = default_ns_uri
225 |         # If no default namespace URI defined, use root's namespace (if any)
226 |         if '_' not in namespaces_dict:
227 |             root = self.get_impl_root(node)
228 |             qname, ns_uri, prefix, local_name = self._unpack_name(
229 |                 root.tag, root)
230 |             if ns_uri:
231 |                 namespaces_dict['_'] = ns_uri
232 |         # Include XMLNS namespace if it's not already defined
233 |         if 'xmlns' not in namespaces_dict:
234 |             namespaces_dict['xmlns'] = nodes.Node.XMLNS_URI
235 |         return node.findall(xpath, namespaces_dict)
236 | 
237 |     # Node implementation methods
238 | 
239 |     def get_node_namespace_uri(self, node):
240 |         if '}' in node.tag:
241 |             return node.tag.split('}')[0][1:]
242 |         elif isinstance(node, ETAttribute):
243 |             return node.namespace_uri
244 |         elif self._is_node_an_element(node):
245 |             qname, ns_uri = self._unpack_name(node.tag, node)[:2]
246 |             return ns_uri
247 |         else:
248 |             return None
249 | 
250 |     def set_node_namespace_uri(self, node, ns_uri):
251 |         qname, orig_ns_uri, prefix, local_name = self._unpack_name(
252 |             node.tag, node)
253 |         node.tag = '{%s}%s' % (ns_uri, local_name)
254 | 
255 |     def get_node_parent(self, node):
256 |         parent = None
257 |         # Root document has no parent
258 |         if isinstance(node, BaseET.ElementTree):
259 |             pass
260 |         elif hasattr(node, 'getparent'):
261 |             parent = node.getparent()
262 |         # Return ElementTree as root element's parent
263 |         elif node == self.get_impl_root(node):
264 |             parent = self._impl_document
265 |         else:
266 |             parent = self._lookup_node_parent(node)
267 |         return parent
268 | 
269 |     def get_node_children(self, node):
270 |         if isinstance(node, BaseET.ElementTree):
271 |             children = [node.getroot()]
272 |         else:
273 |             if not hasattr(node, 'iter'):
274 |                 return []
275 |             children = list(node)
276 |             # Hack to treat text attribute as child text nodes
277 |             if node.text is not None:
278 |                 children.insert(0, ElementTreeText(node.text, parent=node))
279 |         return children
280 | 
281 |     def get_node_name(self, node):
282 |         if node.tag == BaseET.Comment:
283 |             return '#comment'
284 |         elif node.tag == BaseET.ProcessingInstruction:
285 |             target, data = node.text.split(' ', 1)
286 |             return target
287 |         prefix = self.get_node_name_prefix(node)
288 |         if prefix is not None:
289 |             return '%s:%s' % (prefix, self.get_node_local_name(node))
290 |         else:
291 |             return self.get_node_local_name(node)
292 | 
293 |     def get_node_local_name(self, node):
294 |         return re.sub('{.*}', '', node.tag)
295 | 
296 |     def get_node_name_prefix(self, node):
297 |         # Ignore non-elements
298 |         if not isinstance(node.tag, six.string_types):
299 |             return None
300 |         # Believe nodes that have their own prefix (likely only ETAttribute)
301 |         prefix = getattr(node, 'prefix', None)
302 |         if prefix:
303 |             return prefix
304 |         # Derive prefix by unpacking node name
305 |         qname, ns_uri, prefix, local_name = self._unpack_name(node.tag, node)
306 |         if prefix:
307 |             # Don't add unnecessary excess namespace prefixes for elements
308 |             # with a local default namespace declaration
309 |             if node.attrib.get('xmlns') == ns_uri:
310 |                 return None
311 |             # Don't add unnecessary excess namespace prefixes for default ns
312 |             elif prefix == 'xmlns':
313 |                 return None
314 |             else:
315 |                 return prefix
316 |         else:
317 |             return None
318 | 
319 |     def get_node_value(self, node):
320 |         if node.tag == BaseET.ProcessingInstruction:
321 |             target, data = node.text.split(' ', 1)
322 |             return data
323 |         elif node.tag == BaseET.Comment:
324 |             return node.text
325 |         elif hasattr(node, 'value'):
326 |             return node.value
327 |         else:
328 |             return node.text
329 | 
330 |     def set_node_value(self, node, value):
331 |         if hasattr(node, 'value'):
332 |             node.value = value
333 |         else:
334 |             self.set_node_text(node, value)
335 | 
336 |     def get_node_text(self, node):
337 |         return node.text
338 | 
339 |     def set_node_text(self, node, text):
340 |         node.text = text
341 | 
342 |     def get_node_attributes(self, element, ns_uri=None):
343 |         # TODO: Filter by ns_uri
344 |         attribs_by_qname = {}
345 |         for n, v in list(element.attrib.items()):
346 |             qname, attr_uri, prefix, local_name = self._unpack_name(n, element)
347 |             attribs_by_qname[qname] = ETAttribute(
348 |                 qname, attr_uri, prefix, local_name, v, element)
349 |         return list(attribs_by_qname.values())
350 | 
351 |     def has_node_attribute(self, element, name, ns_uri=None):
352 |         return name in [a.qname for a
353 |                         in self.get_node_attributes(element, ns_uri)]
354 | 
355 |     def get_node_attribute_node(self, element, name, ns_uri=None):
356 |         for attr in self.get_node_attributes(element, ns_uri):
357 |             if attr.qname == name:
358 |                 return attr
359 |         return None
360 | 
361 |     def get_node_attribute_value(self, element, name, ns_uri=None):
362 |         if ns_uri is not None:
363 |             prefix = self.lookup_ns_prefix_for_uri(element, ns_uri)
364 |             name = '%s:%s' % (prefix, name)
365 |         for attr in self.get_node_attributes(element, ns_uri):
366 |             if attr.qname == name:
367 |                 return attr.value
368 |         return None
369 | 
370 |     def set_node_attribute_value(self, element, name, value, ns_uri=None):
371 |         prefix = None
372 |         if ':' in name:
373 |             prefix, name = name.split(':')
374 |         if ns_uri is None and prefix is not None:
375 |             ns_uri = self.lookup_ns_uri_by_attr_name(element, prefix)
376 |         if ns_uri is not None:
377 |             name = '{%s}%s' % (ns_uri, name)
378 |         if name.startswith('{%s}' % nodes.Node.XMLNS_URI):
379 |             if name.split('}')[1] == 'xmlns':
380 |                 # Hack to remove namespace URI from 'xmlns' attributes so
381 |                 # the name is just a simple string
382 |                 name = 'xmlns'
383 |             element.attrib[name] = value
384 |         else:
385 |             element.attrib[name] = value
386 | 
387 |     def remove_node_attribute(self, element, name, ns_uri=None):
388 |         if ns_uri is not None:
389 |             name = '{%s}%s' % (ns_uri, name)
390 |         elif ':' in name:
391 |             prefix, local_name = name.split(':')
392 |             if prefix != 'xmlns':
393 |                 ns_attr_name = 'xmlns:%s' % prefix
394 |                 ns_uri = self.lookup_ns_uri_by_attr_name(element, ns_attr_name)
395 |                 name = '{%s}%s' % (ns_uri, local_name)
396 |         if name in element.attrib:
397 |             del element.attrib[name]
398 | 
399 |     def add_node_child(self, parent, child, before_sibling=None):
400 |         if isinstance(child, ElementTreeText):
401 |             # Add text values directly to parent's 'text' attribute
402 |             if parent.text is not None:
403 |                 parent.text = parent.text + child.text
404 |             else:
405 |                 parent.text = child.text
406 |             self.CACHED_ANCESTRY_DICT[child] = parent
407 |             return None
408 |         else:
409 |             if before_sibling is not None:
410 |                 offset = 0
411 |                 for c in parent:
412 |                     if c == before_sibling:
413 |                         break
414 |                     offset += 1
415 |                 parent.insert(offset, child)
416 |             else:
417 |                 parent.append(child)
418 |             self.CACHED_ANCESTRY_DICT[child] = parent
419 |             return child
420 | 
421 |     def import_node(self, parent, node, original_parent=None, clone=False):
422 |         original_node = node
423 |         # We always clone for (c)ElementTree adapter so we can remove original
424 |         # if necessary
425 |         node = self.clone_node(node)
426 |         self.add_node_child(parent, node)
427 |         # Hack to remove text node content from original parent by manually
428 |         # deleting matching text content
429 |         if not clone:
430 |             if isinstance(original_node, ElementTreeText):
431 |                 original_parent = self.get_node_parent(original_node)
432 |                 if original_parent.text == original_node.text:
433 |                     # Must set to None if there would be no remaining text,
434 |                     # otherwise parent element won't realise it's empty
435 |                     original_parent.text = None
436 |                 else:
437 |                     original_parent.text = \
438 |                         original_parent.text.replace(original_node.text, '', 1)
439 |             else:
440 |                 original_parent.remove(original_node)
441 | 
442 |     def clone_node(self, node, deep=True):
443 |         if deep:
444 |             return copy.deepcopy(node)
445 |         else:
446 |             return copy.copy(node)
447 | 
448 |     def remove_node_child(self, parent, child, destroy_node=True):
449 |         if isinstance(child, ElementTreeText):
450 |             child._parent.text = None
451 |             return
452 |         parent.remove(child)
453 |         if destroy_node:
454 |             child.clear()
455 |             return None
456 |         else:
457 |             return child
458 | 
459 |     def lookup_ns_uri_by_attr_name(self, node, name):
460 |         curr_node = node
461 |         while (curr_node is not None
462 |                 and not isinstance(curr_node, BaseET.ElementTree)):
463 |             uri = self.get_node_attribute_value(curr_node, name)
464 |             if uri is not None:
465 |                 return uri
466 |             curr_node = self.get_node_parent(curr_node)
467 |         return None
468 | 
469 |     def lookup_ns_prefix_for_uri(self, node, uri):
470 |         if uri == nodes.Node.XMLNS_URI:
471 |             return 'xmlns'
472 |         result = None
473 |         # Lookup namespace URI in ET's awful global namespace/prefix registry
474 |         if hasattr(BaseET, '_namespace_map') and uri in BaseET._namespace_map:
475 |             result = BaseET._namespace_map[uri]
476 |             if result == '':
477 |                 result = None
478 |         if result is None or re.match(r'ns\d', result):
479 |             # We either have no namespace prefix in the global mapping, in
480 |             # which case we will try looking for a matching xmlns attribute,
481 |             # or we have a namespace prefix that was probably assigned
482 |             # automatically by ElementTree and we'd rather use a
483 |             # human-assigned prefix if available.
484 |             curr_node = node
485 |             while self._is_node_an_element(curr_node):
486 |                 for n, v in list(curr_node.attrib.items()):
487 |                     if v == uri:
488 |                         if n.startswith('xmlns:'):
489 |                             result = n.split(':')[1]
490 |                             return result
491 |                         elif n.startswith('{%s}' % nodes.Node.XMLNS_URI):
492 |                             result = n.split('}')[1]
493 |                             return result
494 |                 curr_node = self.get_node_parent(curr_node)
495 |         return result
496 | 
497 |     def _unpack_name(self, name, node):
498 |         qname = prefix = local_name = ns_uri = None
499 |         if name == 'xmlns':
500 |             # Namespace URI of 'xmlns' is a constant
501 |             ns_uri = nodes.Node.XMLNS_URI
502 |         elif '}' in name:
503 |             # Namespace URI is contained in {}, find URI's defined prefix
504 |             ns_uri, local_name = name.split('}')
505 |             ns_uri = ns_uri[1:]
506 |             prefix = self.lookup_ns_prefix_for_uri(node, ns_uri)
507 |         elif ':' in name:
508 |             # Namespace prefix is before ':', find prefix's defined URI
509 |             prefix, local_name = name.split(':')
510 |             if prefix == 'xmlns':
511 |                 # All 'xmlns' attributes are in XMLNS URI by definition
512 |                 ns_uri = nodes.Node.XMLNS_URI
513 |             else:
514 |                 ns_uri = self.lookup_ns_uri_by_attr_name(node, prefix)
515 |         # Catch case where a prefix other than 'xmlns' points at XMLNS URI
516 |         if name != 'xmlns' and ns_uri == nodes.Node.XMLNS_URI:
517 |             prefix = 'xmlns'
518 |         # Construct fully-qualified name from prefix + local names
519 |         if prefix is not None:
520 |             qname = '%s:%s' % (prefix, local_name)
521 |         else:
522 |             qname = local_name = name
523 |         return (qname, ns_uri, prefix, local_name)
524 | 
525 | 
526 | class ElementTreeText(object):
527 | 
528 |     def __init__(self, text, parent=None, is_cdata=False):
529 |         self._text = text
530 |         self._parent = parent
531 |         self._is_cdata = is_cdata
532 | 
533 |     @property
534 |     def is_cdata(self):
535 |         return self._is_cdata
536 | 
537 |     @property
538 |     def value(self):
539 |         return self._text
540 | 
541 |     text = value  # Alias
542 | 
543 |     def getparent(self):
544 |         return self._parent
545 | 
546 |     @property
547 |     def prefix(self):
548 |         return None
549 | 
550 |     @property
551 |     def tag(self):
552 |         if self.is_cdata:
553 |             return "#cdata-section"
554 |         else:
555 |             return "#text"
556 | 
557 | 
558 | class ETAttribute(object):
559 | 
560 |     def __init__(self, qname, ns_uri, prefix, local_name, value, element):
561 |         self._qname, self._ns_uri, self._prefix, self._local_name = (
562 |             qname, ns_uri, prefix, local_name)
563 |         self._value, self._element = (value, element)
564 | 
565 |     def getroottree(self):
566 |         return self._element.getroottree()
567 | 
568 |     @property
569 |     def qname(self):
570 |         return self._qname
571 | 
572 |     @property
573 |     def namespace_uri(self):
574 |         return self._ns_uri
575 | 
576 |     @property
577 |     def prefix(self):
578 |         return self._prefix
579 | 
580 |     @property
581 |     def local_name(self):
582 |         return self._local_name
583 | 
584 |     @property
585 |     def value(self):
586 |         return self._value
587 | 
588 |     name = tag = local_name  # Alias
589 | 
590 | 
591 | class cElementTreeAdapter(ElementTreeAdapter):
592 |     """
593 |     Adapter to the C-based implementation of the
594 |     `ElementTree <http://docs.python.org/2/library/xml.etree.elementtree.html>`_
595 |     XML library.
596 |     """
597 | 
598 |     ET = cET  # Use the C-based implementation
599 | 
600 |     @classmethod
601 |     def is_available(cls):
602 |         if not super(cElementTreeAdapter, cls).is_available():
603 |             return False
604 |         # We only support cElementTree version 1.0.6+
605 |         from distutils.version import StrictVersion
606 |         return StrictVersion(cls.ET.VERSION) >= StrictVersion('1.0.6')
607 | 
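The namespace-preserving parse used by the adapter above can be tried in isolation. The following is a minimal sketch using only the standard library `xml.etree.ElementTree`; the function name `parse_preserving_xmlns` and the sample document are hypothetical and exist only for illustration. It shows the same idea: collect `start-ns` events and copy them onto the next started element as `xmlns` attributes.

```python
import io
import xml.etree.ElementTree as ET


def parse_preserving_xmlns(source):
    """Parse XML and re-attach namespace declarations as xmlns attributes.

    Hypothetical helper, not part of xml4h. `source` is a file path or a
    file-like object, as accepted by ElementTree.iterparse.
    """
    root = None
    pending = []  # (prefix, uri) pairs declared since the last 'start' event
    for event, payload in ET.iterparse(source, events=('start', 'start-ns')):
        if event == 'start-ns':
            pending.append(payload)
        else:
            if root is None:
                root = payload  # the first started element is the root
            for prefix, uri in pending:
                attr_name = 'xmlns:%s' % prefix if prefix else 'xmlns'
                payload.set(attr_name, uri)
            pending = []
    return root


XML = (b'<films xmlns="urn:example:films" xmlns:x="urn:example:extra">'
       b'<x:film/></films>')
root = parse_preserving_xmlns(io.BytesIO(XML))
print(root.tag)
print(sorted(root.attrib.items()))
```

Without this step ElementTree folds the declarations into the `{uri}name` tag form and the original `xmlns` attributes are not available on the parsed elements.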

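The ancestry cache in `_lookup_node_parent` relies on a child-to-parent dictionary because plain ElementTree elements do not record their parent. A minimal sketch of that idiom with the standard library follows; `build_parent_map` is a hypothetical name used here for illustration and is not part of the adapter.

```python
import xml.etree.ElementTree as ET


def build_parent_map(tree_or_element):
    """Return a dict mapping every child element to its parent element."""
    return {
        child: parent
        for parent in tree_or_element.iter()
        for child in parent
    }


root = ET.fromstring('<a><b><c/></b><d/></a>')
parents = build_parent_map(root)

c = root.find('b/c')
d = root.find('d')
print(parents[c].tag)      # parent of <c> is <b>
print(parents[d] is root)  # parent of <d> is the root element
print(root in parents)     # the root has no entry
```

The map is a snapshot: it has to be rebuilt or updated whenever nodes are added or moved, which is why the adapter rebuilds it on a cache miss and records new children in `add_node_child`.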

--------------------------------------------------------------------------------
/xml4h/writer.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Writer to serialize XML DOM documents or sections to text.
  3 | """
  4 | # This implementation is adapted (heavily) from the standard library method
  5 | # xml.dom.minidom.writexml
  6 | import six
  7 | 
  8 | import codecs
  9 | 
 10 | from xml4h import exceptions
 11 | 
 12 | 
 13 | def write_node(node, writer, encoding='utf-8', indent=0, newline='',
 14 |         omit_declaration=False, node_depth=0, quote_char='"'):
 15 |     """
 16 |     Serialize an *xml4h* DOM node and its descendants to text, writing
 17 |     the output to the given *writer*.
 18 | 
 19 |     :param node: the DOM node whose content and descendants will
 20 |         be serialized.
 21 |     :type node: an :class:`xml4h.nodes.Node` or subclass
 22 |     :param writer: a file or stream to which XML text is written.
 23 |     :type writer: a file, stream, etc
 24 |     :param string encoding: the character encoding for serialized text.
 25 |     :param indent: indentation prefix to apply to descendant nodes for
 26 |         pretty-printing. The value can take many forms:
 27 | 
 28 |         - *int*: the number of spaces to indent. 0 means no indent.
 29 |         - *string*: a literal prefix for indented nodes, such as ``\\t``.
 30 |         - *bool*: no indent if *False*, four spaces indent if *True*.
 31 |         - *None*: no indent.
 32 |     :type indent: string, int, bool, or None
 33 |     :param newline: the string value used to separate lines of output.
 34 |         The value can take a number of forms:
 35 | 
 36 |         - *string*: the literal newline value, such as ``\\n`` or ``\\r``.
 37 |           An empty string means no newline.
 38 |         - *bool*: no newline if *False*, ``\\n`` newline if *True*.
 39 |         - *None*: no newline.
 40 |     :type newline: string, bool, or None
 41 |     :param boolean omit_declaration: if *True* the XML declaration header
 42 |         is omitted, otherwise it is included. Note that the declaration is
 43 |         only output when serializing an :class:`xml4h.nodes.Document` node.
 44 |     :param int node_depth: the indentation level to start at, such as 2 to
 45 |         indent output as if the given *node* has two ancestors.
 46 |         This parameter will only be useful if you need to output XML text
 47 |         fragments that can be assembled into a document.  This parameter
 48 |         has no effect unless indentation is applied.
 49 |     :param string quote_char: the character that delimits quoted content.
 50 |         You should never need to mess with this.
 51 |     """
 52 |     def _sanitize_write_value(value):
 53 |         """Return XML-encoded value."""
 54 |         if not value:
 55 |             return value
 56 |         return (value
 57 |             .replace("&", "&amp;")
 58 |             .replace("<", "&lt;")
 59 |             .replace("\"", "&quot;")
 60 |             .replace(">", "&gt;")
 61 |             )
 62 | 
 63 |     def _write_node_impl(node, node_depth):
 64 |         """
 65 |         Internal write implementation that does the real work while keeping
 66 |         track of node depth.
 67 |         """
 68 |         # Output document declaration if we're outputting the whole doc
 69 |         if node.is_document:
 70 |             if not omit_declaration:
 71 |                 writer.write(
 72 |                     '<?xml version=%s1.0%s' % (quote_char, quote_char))
 73 |                 if encoding:
 74 |                     writer.write(' encoding=%s%s%s'
 75 |                         % (quote_char, encoding, quote_char))
 76 |                 writer.write('?>%s' % newline)
 77 |             for child in node.children:
 78 |                 _write_node_impl(child,
 79 |                     node_depth)  # node_depth not incremented
 80 |             writer.write(newline)
 81 |         elif node.is_document_type:
 82 |             writer.write("<!DOCTYPE %s SYSTEM %s%s%s"
 83 |                 % (node.name, quote_char, node.public_id, quote_char))
 84 |             if node.system_id is not None:
 85 |                 writer.write(
 86 |                     " %s%s%s" % (quote_char, node.system_id, quote_char))
 87 |             if node.children:
 88 |                 writer.write("[")
 89 |                 for child in node.children:
 90 |                     _write_node_impl(child, node_depth + 1)
 91 |                 writer.write("]")
 92 |             writer.write(">")
 93 |         elif node.is_text:
 94 |             writer.write(
 95 |                 _sanitize_write_value(node.value)
 96 |             )
 97 |         elif node.is_cdata:
 98 |             if ']]>' in node.value:
 99 |                 raise ValueError("']]>' is not allowed in CDATA node value")
100 |             writer.write(
101 |                 "<![CDATA[%s]]>" % node.value
102 |             )
103 |         #elif node.is_entity_reference:  # TODO
104 |         elif node.is_entity:
105 |             writer.write(newline + indent * node_depth)
106 |             writer.write("<!ENTITY ")
107 |             if node.is_paremeter_entity:
108 |                 writer.write('%% ')
109 |             writer.write(
110 |                 "%s %s%s%s>"
111 |                 % (node.name, quote_char, node.value, quote_char)
112 |             )
113 |         elif node.is_processing_instruction:
114 |             writer.write(newline + indent * node_depth)
115 |             writer.write("<?%s %s?>" % (node.target, node.data))
116 |         elif node.is_comment:
117 |             if '--' in node.value:
118 |                 raise ValueError("'--' is not allowed in COMMENT node value")
119 |             writer.write("<!--%s-->" % node.value)
120 |         elif node.is_notation:
121 |             writer.write(newline + indent * node_depth)
122 |             writer.write("<!NOTATION %s" % node.name)
123 |             if node.is_system_identifier:
124 |                 writer.write(" system %s%s%s>"
125 |                     % (quote_char, node.external_id, quote_char))
126 |             elif node.is_system_identifier:  # FIXME: duplicate test, unreachable
127 |                 writer.write(" system %s%s%s %s%s%s>"
128 |                     % (quote_char, node.external_id, quote_char,
129 |                     quote_char, node.uri, quote_char))
130 |         elif node.is_attribute:
131 |             writer.write(
132 |                 " %s=%s" % (node.name, quote_char)
133 |             )
134 |             writer.write(
135 |                 _sanitize_write_value(node.value)
136 |             )
137 |             writer.write(quote_char)
138 |         elif node.is_element:
139 |             # Only need a preceding newline if we're in a sub-element
140 |             if node_depth > 0:
141 |                 writer.write(newline)
142 |             writer.write(indent * node_depth)
143 |             writer.write("<" + node.name)
144 | 
145 |             for attr in node.attribute_nodes:
146 |                 _write_node_impl(attr, node_depth)
147 |             if node.children:
148 |                 found_indented_child = False
149 |                 writer.write(">")
150 |                 for child in node.children:
151 |                     _write_node_impl(child, node_depth + 1)
152 |                     if not (child.is_text
153 |                             or child.is_comment
154 |                             or child.is_cdata):
155 |                         found_indented_child = True
156 |                 if found_indented_child:
157 |                     writer.write(newline + indent * node_depth)
158 |                 writer.write('</%s>' % node.name)
159 |             else:
160 |                 writer.write('/>')
161 |         else:
162 |             raise exceptions.Xml4hImplementationBug(
163 |                 'Cannot write node with class: %s' % node.__class__)
164 | 
165 |     # Sanitize whitespace parameters
166 |     if indent is True:
167 |         indent = ' ' * 4
168 |     elif indent is None or indent is False:
169 |         indent = ''
170 |     elif isinstance(indent, int):
171 |         indent = ' ' * indent
172 |     # If indent but no newline set, always apply a newline (it makes sense)
173 |     if indent and not newline:
174 |         newline = True
175 | 
176 |     if newline is None or newline is False:
177 |         newline = ''
178 |     elif newline is True:
179 |         newline = '\n'
180 | 
181 |     # If we have a target encoding and are writing to a binary IO stream, wrap
182 |     # the writer with an encoding writer to produce the correct bytes.
183 |     # We detect binary IO streams by:
184 |     # - Python 3: the *absence* of the `encoding` attribute that is present on
185 |     #   `io.TextIOBase`-derived objects
186 |     # - Python 2: the *absence* of the `encode` attribute that is present on
187 |     #   `StringIO` objects
188 |     if (
189 |         encoding
190 |         and not hasattr(writer, 'encoding')
191 |         and not hasattr(writer, 'encode')
192 |     ):
193 |         writer = codecs.getwriter(encoding)(writer)
194 | 
195 |     # Do the business...
196 |     _write_node_impl(node, node_depth)
197 | 
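The parameter handling at the end of `write_node` can be exercised on its own. The sketch below restates it as two stand-alone functions, assuming `indent=None` is meant to behave as "no indent", as the docstring describes. The names `normalize_whitespace` and `ensure_text_writer` are hypothetical and do not exist in xml4h.

```python
import codecs
import io


def normalize_whitespace(indent, newline):
    """Convert the flexible indent/newline arguments to plain strings."""
    if indent is True:
        indent = ' ' * 4
    elif indent is None or indent is False:
        indent = ''
    elif isinstance(indent, int):
        indent = ' ' * indent
    # Indentation without line breaks makes little sense, so force a newline
    if indent and not newline:
        newline = True
    if newline is None or newline is False:
        newline = ''
    elif newline is True:
        newline = '\n'
    return indent, newline


def ensure_text_writer(writer, encoding):
    """Wrap binary streams so text written to them is encoded to bytes."""
    if (encoding
            and not hasattr(writer, 'encoding')
            and not hasattr(writer, 'encode')):
        return codecs.getwriter(encoding)(writer)
    return writer


print(normalize_whitespace(2, ''))
print(normalize_whitespace(None, None))

binary_buffer = io.BytesIO()
wrapped = ensure_text_writer(binary_buffer, 'utf-8')
wrapped.write(u'caf\xe9')
print(binary_buffer.getvalue())

text_buffer = io.StringIO()
print(ensure_text_writer(text_buffer, 'utf-8') is text_buffer)
```

Text streams such as `io.StringIO` expose an `encoding` attribute and are returned unchanged, while `io.BytesIO` has none and is wrapped in a `codecs` stream writer.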


--------------------------------------------------------------------------------