Also see extensions.
Imagine you have a namespace called 'http://hui.de/honk' and need to treat all of its elements in a specific way, say, to find out whether they are really honking. You could provide a function called 'is_honking' that handles that:
>>> def is_honking(honk_element):
...     return honk_element.get('honking') == 'true'
Then you can use it:
>>> from lxml.etree import XML
>>> honk_element = XML('<honk xmlns="http://hui.de/honk" honking="true"/>')
>>> print is_honking(honk_element)
True
Not too bad, right? Now imagine you only want to do that for certain elements from that namespace and prevent others from being passed to is_honking. You can add a check to is_honking that tests the tag name before doing anything else.
After a while, however, you remember what you learned at school about object-oriented programming, and you start wondering whether there is a nicer way to do that. And there is!
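A minimal sketch of such a tag check follows. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies; the 'tag' and 'get()' parts of the element API used here are the same in both. The constant HONK_TAG and the choice to raise ValueError for foreign elements are assumptions made for illustration, not something the original function prescribes.

```python
import xml.etree.ElementTree as ET

HONK_NS = 'http://hui.de/honk'
# Namespace-qualified tag name in the '{uri}localname' notation (hypothetical constant).
HONK_TAG = '{%s}honk' % HONK_NS


def is_honking(element):
    # Check the tag name before doing anything else, so that other
    # elements from the same namespace are rejected.
    if element.tag != HONK_TAG:
        raise ValueError('not a honk element: %r' % (element.tag,))
    return element.get('honking') == 'true'


honk = ET.fromstring('<honk xmlns="http://hui.de/honk" honking="true"/>')
other = ET.fromstring('<bla xmlns="http://hui.de/honk"/>')

print(is_honking(honk))
```

Every function that works on honk elements would need to repeat this kind of check, which is part of what makes the function-based approach tedious.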
lxml allows you to implement namespaces, in a rather literal sense. You can do the above like this:
>>> from lxml.etree import Namespace, ElementBase
>>> class HonkElement(ElementBase):
...     def honking(self):
...         return self.get('honking') == 'true'
...     honking = property(honking)
Now you can build the new namespace by calling the Namespace class:
>>> namespace = Namespace('http://hui.de/honk')
and then register the new element type with that namespace:
>>> namespace['honk'] = HonkElement
After this, you create and use your XML elements:
>>> honk_element = XML('<honk xmlns="http://hui.de/honk" honking="true"/>')
>>> print honk_element.honking
True
The same works when creating elements by hand:
>>> from lxml.etree import Element
>>> honk_element = Element('{http://hui.de/honk}honk', honking='true')
>>> print honk_element.honking
True
Essentially, this lets you give elements a specific API based on their namespace and element name.
There is one thing to remember: element classes must not have a constructor, nor may they hold any internal state (except for their XML representation). Element instances are created and garbage collected as needed, so there is no way to predict when and how often a constructor would be called. Even worse, when the __init__ method is called, the object may not even be initialized yet to represent the XML tag, so there is not much use in providing an __init__ method in subclasses.
There is, however, one way to run code on element initialization. Element classes have an _init() method that can be overridden. It can be used to modify the XML tree, e.g. to construct special children or to verify and update attributes.
The semantics of _init() are as follows:
There is a slight difference between the Namespace example and the simple 'is_honking' function above. We associated the HonkElement class only with the 'honk' element. If you have other elements in the same namespace, they do not pick up the same implementation.
Example:
>>> honk_element = XML('<honk xmlns="http://hui.de/honk" honking="true"><bla/></honk>')
>>> print honk_element.honking
True
>>> print honk_element[0].honking
Traceback (most recent call last):
...
AttributeError: 'etree._Element' object has no attribute 'honking'
You can therefore provide one implementation per element name in each namespace and have lxml select the right one on the fly. If you want one element implementation per namespace (ignoring the element name) or prefer having a common class for most elements except a few, you can specify a default implementation for an entire namespace by registering that class with the empty element name (None).
You may want to follow an object-oriented approach here. If you build a class hierarchy of element classes, you can also implement a base class for a namespace that is used whenever no specific element class is provided. Again, you only have to pass None as the element name:
>>> class HonkNSElement(ElementBase):
...     def honk(self):
...         return "HONK"
>>> namespace[None] = HonkNSElement
>>> class HonkElement(HonkNSElement):
...     def honking(self):
...         return self.get('honking') == 'true'
...     honking = property(honking)
>>> namespace['honk'] = HonkElement
Now you can use your new namespace:
>>> honk_element = XML('<honk xmlns="http://hui.de/honk" honking="true"><bla/></honk>')
>>> print honk_element.honking
True
>>> print honk_element.honk()
HONK
>>> print honk_element[0].honk()
HONK
>>> print honk_element[0].honking
Traceback (most recent call last):
...
AttributeError: 'HonkNSElement' object has no attribute 'honking'
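The selection behaviour seen above (a specific class for 'honk', the namespace default for everything else) can be pictured as a dictionary lookup with a fallback on the None key. The following is only an illustrative model in plain Python and not lxml's actual implementation; select_class, registry and the two placeholder classes are hypothetical names.

```python
class HonkNSElement(object):
    """Placeholder for the namespace-wide default class."""
    def honk(self):
        return "HONK"


class HonkElement(HonkNSElement):
    """Placeholder for the class registered for the 'honk' element."""


def select_class(registry, tag_name):
    # Prefer the class registered for this exact element name.
    if tag_name in registry:
        return registry[tag_name]
    # Otherwise fall back to the namespace default stored under None.
    # Returns None if no default was registered either.
    return registry.get(None)


# Mirrors namespace[None] = HonkNSElement and namespace['honk'] = HonkElement.
registry = {None: HonkNSElement, 'honk': HonkElement}

print(select_class(registry, 'honk').__name__)
print(select_class(registry, 'bla').__name__)
```

In this model, 'bla' resolves to the default class, which is why it offers honk() but no 'honking' property.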