4XPath API


About XPath

For the formal definition of XPath, see The XPath 1.0 Recommendation. Gentler introductions include the ZVON XPath Tutorial and the ProSolutions XPath tutorial.

Usage

4XPath can be used either to evaluate an XPath expression on the fly or to create a reusable, pre-parsed expression object. The second approach is useful if you evaluate the same expression repeatedly. You always need a context in order to evaluate an expression; at minimum, this is a DOM node. You can use any DOM implementation that conforms to the Python binding for DOM Level 2, such as 4DOM, minidom, cDomlette or pDomlette. The latter two have some specializations that 4XPath can take advantage of for optimization.

In the following discussion, we'll use two XML documents, set up as Python strings. The second is simply the first with all the elements in a namespace.


xmlstring = """<employees>
<employee id="100">Memnon</employee>
<employee id="101">Emathion</employee>
<employee id="102">Castor</employee>
<employee id="103">Polydeuces</employee>
</employees>
"""

nsxmlstring = """<employees xmlns='http://spam.com/foo'>
<employee id="100">Memnon</employee>
<employee id="101">Emathion</employee>
<employee id="102">Castor</employee>
<employee id="103">Polydeuces</employee>
</employees>
"""

On-the-fly XPaths

To use a one-off XPath expression, start by importing the 4XPath module:

from xml import xpath

Then set up the DOM document to be used as the context.

from Ft.Lib.pDomlette import PyExpatReader
reader = PyExpatReader()
doc = reader.fromString(xmlstring)

Now just evaluate the XPaths you want.

result = xpath.Evaluate('/*', contextNode=doc)

This selects the document element. Note that the return value of this XPath is a node set, which is implemented as a Python list of DOM nodes. For instance, "print result" would display:


[<Domlette Element Node at 820fc74: name='employees' with 0 attributes and 9 children>]

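Or something similar (the "820fc74" represents the memory location of the Python object and will almost certainly be different for you). XPath object types map to the following Python objects on evaluation:

node-set: a Python list of DOM nodes
string: a Python string (a unicode object in Python 2.0 and later)
number: a Python float
boolean: an Ft.Lib.boolean value, which displays as 1 when true

For example: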


>>> print repr(xpath.Evaluate('/employees/employee', contextNode=doc))
[<Domlette Element Node at 8217784: name='employee' with 1 attributes and 1 children>,
 <Domlette Element Node at 81ddbf4: name='employee' with 1 attributes and 1 children>,
 <Domlette Element Node at 810c234: name='employee' with 1 attributes and 1 children>,
 <Domlette Element Node at 82131c4: name='employee' with 1 attributes and 1 children>]
>>> print repr(xpath.Evaluate('/employees/employee[1]/@id', contextNode=doc))
[<Domlette Attribute Node at 82190fc: name='id', value='100'>]
>>> print repr(xpath.Evaluate('string(/employees/employee[1]/@id)', contextNode=doc))
u'100'
>>> print repr(xpath.Evaluate('number(/employees/employee[1]/@id)', contextNode=doc))
100.0
>>> print repr(xpath.Evaluate('/employees/employee[1]/@id = "100"', contextNode=doc))
1
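4XPath itself may not be installed on a current Python, so as a stand-in, the standard library's xml.etree.ElementTree can illustrate the same result-type mapping. This is not the 4XPath API: ElementTree implements only a limited XPath subset, with no attribute steps and no string() or number() functions, so those conversions are done in plain Python here.

```python
import xml.etree.ElementTree as ET

# Same document as xmlstring above
xmlstring = """<employees>
<employee id="100">Memnon</employee>
<employee id="101">Emathion</employee>
<employee id="102">Castor</employee>
<employee id="103">Polydeuces</employee>
</employees>
"""

root = ET.fromstring(xmlstring)  # root is the <employees> element

# Like '/employees/employee': a node-set comes back as a Python list
employees = root.findall('employee')

# Like 'string(/employees/employee[1]/@id)': ElementTree has no attribute
# step, so select the first element and read its attribute directly
first_id = root.find('employee[1]').get('id')

# Like 'number(...)': XPath numbers are floating point
first_id_num = float(first_id)

# Like '/employees/employee[1]/@id = "100"': a comparison yields a boolean
is_100 = first_id == "100"

print(len(employees), repr(first_id), first_id_num, is_100)  # 4 '100' 100.0 True
```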

Note the u"100" returned by one of the expressions. This is the unicode type introduced in Python 2.0. In Python 1.5.2, the return value would be a plain string object: "100".

In our examples so far, we've set the context node directly. This actually creates a context with the given node, a context size of 1 and a context position of 1, with no namespace mappings defined. But this isn't always what we want: sometimes we need to specify the other parts of the context. The most common reason is to set up a namespace mapping. For instance, suppose we try the above processing on the nsxmlstring document:

nsdoc = reader.fromString(nsxmlstring)
print xpath.Evaluate('/employees/employee', contextNode=nsdoc)

We get an empty node-set. If you remember your XPath and think carefully, you'll see why. The node test "employees" matches only an element node that is in no namespace. Since the employees element in the nsxmlstring document is actually in the "http://spam.com/foo" namespace, the node test fails. The solution is to use a namespace prefix in the XPath expression and map it to the right namespace. This is necessary even though the nsxmlstring document itself uses no prefix: it declares the namespace as the default.

So how do we set up the namespace mappings we need? This is where the ability to set the full context comes in. We can create a 4XPath Context object with the node and the namespace mappings we want:

from xml.xpath.Context import Context
# processorNss must be a dictionary mapping prefix -> namespace URI
con = Context(nsdoc, processorNss={'x': 'http://spam.com/foo'})

The first argument is the context node. Then we specify the keyword argument "processorNss", a dictionary with the prefixes to map as keys and the namespace URIs as values. Note: don't try to use an empty string as a prefix; this is illegal, since in XPath 1.0 an unprefixed name test always refers to no namespace. Now we can use the context object we created.

print xpath.Evaluate('/x:employees/x:employee', context=con)

And we get what we expect: a node set with four entries.
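The same namespace behaviour can be reproduced with the standard library's xml.etree.ElementTree, used here as a stand-in for 4XPath. The namespaces argument of findall plays roughly the role of processorNss; the prefix x is arbitrary and only has to map to the right URI.

```python
import xml.etree.ElementTree as ET

# Same document as nsxmlstring above
nsxmlstring = """<employees xmlns='http://spam.com/foo'>
<employee id="100">Memnon</employee>
<employee id="101">Emathion</employee>
<employee id="102">Castor</employee>
<employee id="103">Polydeuces</employee>
</employees>
"""

nsroot = ET.fromstring(nsxmlstring)

# An unprefixed name matches only elements in no namespace,
# so this finds nothing, just like '/employees/employee' above
unprefixed = nsroot.findall('employee')

# Map a prefix of our choosing to the document's namespace URI
nss = {'x': 'http://spam.com/foo'}
prefixed = nsroot.findall('x:employee', namespaces=nss)

print(len(unprefixed), len(prefixed))  # 0 4
```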

Reusable XPath expression objects

If your usage pattern is more along the lines of repeatedly evaluating a particular expression against different contexts or documents, you probably want to parse the expression once, up front, for the sake of performance.

from xml import xpath
from xml.xpath.Context import Context

from Ft.Lib.pDomlette import PyExpatReader
reader = PyExpatReader()
doc = reader.fromString(xmlstring)
expr = xpath.Compile('/employees/employee')

To evaluate it, we need a full Context object, not just a bare node. We then call the "evaluate" method of the parsed expression object, passing in the context:

con = Context(doc)
result = expr.evaluate(con)   # node set of the four employee elements
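The idea behind Compile, parse once and evaluate many times, can be sketched with a hypothetical CompiledPath class over minidom documents. It is a toy, not xml.xpath.ParsedExpr: it understands only absolute paths of child element names such as '/employees/employee', and it takes a document node rather than a full Context.

```python
from xml.dom import minidom


class CompiledPath:
    """Hypothetical toy 'compiled' expression: absolute paths of child
    element names only, e.g. '/employees/employee'. Not the 4XPath API."""

    def __init__(self, expr):
        if not expr.startswith('/'):
            raise ValueError('only absolute paths are supported')
        # Parsing happens once, here, and is reused by every evaluate() call
        self.steps = [step for step in expr.split('/') if step]

    def evaluate(self, doc):
        nodes = [doc]  # start at the document (root) node
        for name in self.steps:
            # Keep the element children whose tag matches this step
            nodes = [child
                     for node in nodes
                     for child in node.childNodes
                     if child.nodeType == child.ELEMENT_NODE
                     and child.tagName == name]
        return nodes


expr = CompiledPath('/employees/employee')
doc1 = minidom.parseString(
    '<employees><employee id="100"/><employee id="101"/></employees>')
doc2 = minidom.parseString('<employees><employee id="102"/></employees>')

# One parsed expression, evaluated against two different documents
counts = [len(expr.evaluate(d)) for d in (doc1, doc2)]
print(counts)  # [2, 1]
```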

Optimization: Document Indexing

Note: you needn't bother with this if you are using pDomlette or cDomlette.

Many XPath constructs require sorting nodes into XML document order. This can be an expensive operation if the DOM implementation is not already primed for it, so 4XPath lets users of such implementations pre-index documents for faster sorting. To do so:

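from xml.xpath import Util

# ... obtain document_node, the Document node of a parsed DOM tree ...
Util.IndexDocument(document_node)
try:
    pass  # ... XPath operations on document_node ...
finally:
    # Free the index even if an XPath operation raises, to avoid leaks
    Util.FreeDocumentIndex(document_node)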

Do be sure to free the index to avoid memory leaks. Also note that it's a bad idea to mutate any node in the document while it is indexed.
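What a document index buys can be sketched with minidom: number every node once in document order, then sort any node list by that number instead of comparing tree positions. index_document and sort_document_order are hypothetical helpers, not Util.IndexDocument, and this sketch skips attribute and namespace nodes, which XPath also places in document order.

```python
from xml.dom import minidom


def index_document(doc):
    """Hypothetical helper: map id(node) -> position in document order.
    ids stay valid only while the document (which owns the nodes) is alive."""
    index = {}

    def walk(node):
        index[id(node)] = len(index)  # pre-order numbering = document order
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return index


def sort_document_order(nodes, index):
    # Each key lookup is O(1), so no tree walking is needed while sorting
    return sorted(nodes, key=lambda n: index[id(n)])


doc = minidom.parseString('<a><b/><c><d/></c><e/></a>')
index = index_document(doc)

names = {n.tagName: n for n in doc.getElementsByTagName('*')}
shuffled = [names['e'], names['b'], names['d'], names['c']]
ordered = [n.tagName for n in sort_document_order(shuffled, index)]
print(ordered)  # ['b', 'c', 'd', 'e']

# Discard the index when done; it goes stale if the tree is mutated
index.clear()
```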

Module xml.xpath

4XPath core module: provides the basic API for 4XPath

Module Summary

Global Functions

Global Function Summary
Evaluate Evaluates an XPath expression 
Compile Compile an XPath expression for quicker evaluation. 
RegisterExtensionModules Register XPath extension functions contained in Python modules. 

 

Global Function Details

Evaluate

Evaluate(expr, contextNode=None, context=None)

Evaluates an XPath expression

Parameters
expr of type string

XPath expression to be evaluated

contextNode of type Python DOM binding node object

The context node, which will be used as the sole entry in the context node list. If None, the context argument must be an xml.xpath.Context object. Defaults to None.

context of type xml.xpath.Context

The user-specified context. If None, the contextNode argument must be a valid DOM node. Defaults to None.

Return Value
Ft.Lib.boolean, float, string or list of DOM nodes

The result of the XPath expression evaluation.
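The parameter rules for Evaluate can be sketched in plain Python. ToyContext and resolve_context are hypothetical names, not part of 4XPath; the defaults follow the context described earlier for a bare node (position 1, size 1, no namespace mappings). Letting an explicit context take precedence when both arguments are passed, and raising TypeError when neither is, are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Hypothetical stand-in for xml.xpath.Context.Context,
    carrying the attributes documented for that class."""
    node: object
    position: int = 1
    size: int = 1
    varBindings: dict = field(default_factory=dict)
    processorNss: dict = field(default_factory=dict)


def resolve_context(contextNode=None, context=None):
    """Sketch of the documented rule: one of the two arguments must be given."""
    if context is not None:
        return context  # assumption: an explicit context wins
    if contextNode is None:
        raise TypeError('either contextNode or context must be supplied')
    # A bare node becomes a context of size 1, position 1,
    # with no variable bindings or namespace mappings
    return ToyContext(contextNode)


ctx = resolve_context(contextNode='some node')
print(ctx.position, ctx.size, ctx.processorNss)  # 1 1 {}
```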

Throws



Compile

Compile(expr)

Compile an XPath expression for quicker evaluation.

Parameters
expr of type string

XPath expression to be compiled

Return Value
xml.xpath.ParsedExpr

A pre-compiled XPath expression object

Throws



RegisterExtensionModules

RegisterExtensionModules(moduleList)

Register XPath extension functions contained in Python modules.

Parameters
moduleList of type list of strings, each a fully qualified module name. Each module must follow the 4XPath extension protocol.

Each module is imported, and any XPath extension functions it contains are made available to the 4XPath run-time.

Return Value
None


Module xml.xpath.Context

XPath context

Module Summary

Classes

Class Summary
Context Represents the context used for XPath processing at any given point 

 

Class Context

Represents the context used for XPath processing at any given point

Attribute Summary
node The context node, as used for computing XPath expressions 
position The context node's position in the context node list, as returned by the XPath position() function 
size The size of the context node list 
varBindings Maps variables and parameters, by expanded name, to their values 
processorNss Maps namespace prefixes to URIs, for expanding names in name tests, variable names, etc. 

 

Method Summary
 
nss Get a dictionary representing namespace nodes defined at the context node 

 

Method Details

__init__(node, position, size, varBindings, processorNss)

Parameters
node of type Python DOM binding node object

The context node, as used for computing XPath expressions

position of type positive integer

The context node's position in the context node list, as returned by the XPath position() function

size of type positive integer

The size of the context node list

varBindings of type dictionary whose keys are tuples of two strings (expanded names) and whose values are strings, integers, BooleanType values or node sets (lists of nodes)

Maps variables and parameters, by expanded name, to their values. Defaults to an empty dictionary.

processorNss of type dictionary with string keys and values

Maps namespace prefixes to URIs, for expanding names in name tests, variable names, etc. Defaults to an empty dictionary.

Return Value
None


nss

nss()

Get a dictionary representing namespace nodes defined at the context node

Parameters
None
Return Value
dictionary with string keys and string values

Maps prefixes to namespace URIs
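The information nss() returns can be recomputed with minidom by walking from an element up through its ancestors and collecting xmlns declarations. in_scope_namespaces is a hypothetical helper, not the 4XPath method; keying the default namespace by '' is this sketch's own choice, since this document does not say how nss() represents it. The xml prefix is always added because XPath 1.0 gives every element a namespace node for it.

```python
from xml.dom import minidom

XML_NS = 'http://www.w3.org/XML/1998/namespace'


def in_scope_namespaces(node):
    """Hypothetical helper: prefix -> URI for the namespaces in scope at
    an element. The default namespace is keyed by '' in this sketch."""
    found = {}
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if name == 'xmlns':
                prefix = ''
            elif name.startswith('xmlns:'):
                prefix = name[len('xmlns:'):]
            else:
                continue  # an ordinary attribute, not a declaration
            # The nearest declaration wins, so never overwrite a prefix
            found.setdefault(prefix, value)
        node = node.parentNode
    # xmlns="" undeclares the default namespace: drop empty URIs
    nss = {prefix: uri for prefix, uri in found.items() if uri}
    nss['xml'] = XML_NS  # the xml prefix is always in scope
    return nss


doc = minidom.parseString(
    "<employees xmlns='http://spam.com/foo' xmlns:h='http://spam.com/hr'>"
    "<employee id='100'/></employees>")
emp = doc.getElementsByTagName('employee')[0]
result = in_scope_namespaces(emp)
print(result)
```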