Pygenx Manual

Author: Michael Twomey
Contact: mick@translucentcode.org
Copyright: Michael Twomey 2004
License: http://software.translucentcode.org/pygenx/LICENSE (MIT style)
Version: 0.5.3
Date: 2004-07-01 10:44:41.099026

Overview

Pygenx is a Python wrapper for the Genx library. It is intended to be a lightweight way of generating correct, canonical XML with the minimum of fuss.

Installation

Installation is done via the normal Python distutils mechanism. After downloading and unpacking the source tarball, install pygenx with the following commands:

$ cd pygenx-0.5.3
$ python setup.py build
$ sudo python setup.py install

If you aren't using sudo, perform the install step as root. If you are installing into a Python installation that you can write to, omit sudo altogether.

Basic Usage

Pygenx can generate simple XML documents with only a handful of calls.

A simple example:

#!/usr/bin/env python

import genx

writer = genx.Writer()

fp = file("basic_example.xml", "w")

writer.startDocFile(fp)

writer.startElementLiteral("example")
writer.addText("This is an example")
writer.endElement()

writer.endDocument()

When run this should produce output like the following:

<example>This is an example</example>

In the above example a genx.Writer object is being created, then elements are started, text written, and elements closed, before the document itself is closed.

This example restricts itself to using genx.Writer.startElementLiteral.

More Advanced Usage

Using genx.Writer.startElementLiteral is ok for simple cases, but when you have multiple namespaces or many elements, it is both inefficient and tedious to use. A better method is to pre-declare namespaces, attributes and elements for later use. This allows genx to perform in a more optimised manner.
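
The pre-declaration approach can be sketched roughly as follows. This is only an illustration under assumptions: the function name write_link_list and the links/a document shape are invented for this example, and writer is assumed to be a genx.Writer (or anything offering the same methods documented in the Classes section).

```python
def write_link_list(writer, fp, links):
    """Write <links><a href="...">text</a>...</links> using pre-declared names.

    `writer` is assumed to behave like genx.Writer; `links` is a list of
    (url, text) pairs. The function name and document shape are hypothetical.
    """
    # Declare each name once, up front, so the objects can be re-used
    # for every element and attribute written below.
    links_elem = writer.declareElement("links")
    a_elem = writer.declareElement("a")
    href_attr = writer.declareAttribute("href")

    writer.startDocFile(fp)
    writer.startElement(links_elem)
    for url, text in links:
        writer.startElement(a_elem)
        writer.addAttribute(href_attr, url)  # attributes go before any text
        writer.addText(text)
        writer.endElement()                  # closes <a>
    writer.endElement()                      # closes <links>
    writer.endDocument()
```

With pygenx installed, this might be called as write_link_list(genx.Writer(), fp, [("http://example.com/", "Example")]), where fp is an open file object.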

Classes

genx.Writer

All pygenx operations centre on the genx.Writer class, which typically represents a single document, though there is nothing stopping you creating a new document after you have finished working on one. The only restriction with a Writer instance is that you work on a single document at a time.

You could create a genx.Writer instance and configure the various namespaces and elements, then re-use it for different documents. This reduces the overhead required when writing the documents.

genx.Writer.addAttribute

params:genx.Attribute attribute, String value
returns:None

This adds the given genx.Attribute object, with the given value, to the currently active element.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> attr = w.declareAttribute("href")
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("a")
>>> w.addAttribute(attr, "http://example.com/")
>>> w.endElement()
>>> w.endDocument()
>>>

This produces the following output:

<a href="http://example.com/"></a>

genx.Writer.addAttributeLiteral

params:String name, String value, String namespace = None
returns:None

This adds an attribute with the given name and value to the currently active element. An optional namespace string can be passed in too. This method is slower than using genx.Writer.addAttribute with a defined genx.Attribute object.

Typical use:

>>> import genx
>>> writer = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> writer.startDocFile(fp)
>>> writer.startElementLiteral("elem")
>>> writer.addAttributeLiteral("attr", "value")
>>> writer.endElement()
>>> writer.endDocument()
>>> 

And the content of test.xml:

<elem attr="value"></elem>

genx.Writer.addNamespace

params:genx.Namespace namespace, String prefix = None
returns:None

Adds the given genx.Namespace object to the currently active element, using an optional prefix.

A simple example with an element:

>>> import genx
>>> writer = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> ns = writer.declareNamespace("http://example.com", "foo")
>>> writer.startDocFile(fp)
>>> writer.startElementLiteral("elem")
>>> writer.addNamespace(ns)
>>> writer.endElement()
>>> writer.endDocument()
>>> 

And the output in test.xml:

<elem xmlns:foo="http://example.com"></elem>

genx.Writer.addText

params:String text
returns:None

Adds the specified text to the currently active element. The text will be encoded as UTF-8, so make sure the string has been decoded from its original encoding first. This is usually done with the decode method when the string is read in, e.g. s = fp.read().decode('ISO-8859-1').
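
As a small illustration of the decoding step, independent of pygenx itself, the following sketch shows why the source encoding matters. The sample bytes are made up for this example, and the snippet uses only the standard library.

```python
# Bytes as they might be read from an ISO-8859-1 (Latin-1) file:
# the word "cafe" with an acute accent on the final letter.
raw = b"caf\xe9"

# Decode using the encoding the data was actually written in.
text = raw.decode("ISO-8859-1")

# Re-encoding as UTF-8 gives the bytes a UTF-8 XML document would contain.
utf8_bytes = text.encode("utf-8")

# Treating the original bytes as if they were already UTF-8 fails,
# because a lone 0xE9 byte is not a complete UTF-8 sequence.
try:
    raw.decode("utf-8")
    raw_is_utf8 = True
except UnicodeDecodeError:
    raw_is_utf8 = False
```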

Basic usage:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("elem")
>>> w.addText("some text")
>>> w.endElement()
>>> w.endDocument()
>>> 

The output:

<elem>some text</elem>

genx.Writer.checkText

params:String text
returns:int status

This is a function for sanity checking strings. It checks to see if the given string is valid UTF-8 and if it contains any invalid XML characters.

The return codes are based on genx's error codes. Pygenx does not currently expose these codes by name, so the three relevant return values are:

0
The text is ok.
1
The text is invalid UTF-8.
2
The text is invalid XML.

Simple usage:

>>> import genx
>>> w = genx.Writer()
>>> w.checkText("This is a plain string")
0
>>> w.checkText("This is an invalid unicode string. \xff\x01")
1
>>> w.checkText("This is an invalid XML string\x01")
2
>>> 
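
To make the two kinds of failure concrete, here is a pure-Python approximation of the same classification. It is not how genx implements the check; the function name check_text is hypothetical, it takes a byte string, and it is written for a current Python 3 interpreter. The character ranges come from the XML 1.0 Char production.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHAR = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_text(data):
    """Return 0 if ok, 1 if not valid UTF-8, 2 if not valid XML text.

    Mirrors the three return values listed above; an approximation only.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return 1
    if _NON_XML_CHAR.search(text):
        return 2
    return 0
```

The UTF-8 test runs first, which matches the session above: the string containing both \xff and \x01 is reported as invalid UTF-8 rather than invalid XML.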

genx.Writer.comment

params:String comment
returns:None

This adds an XML comment (e.g. <!-- my comment -->) in the generated XML.

For example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.comment("A comment") 
>>> w.startElementLiteral("elem")
>>> w.comment("Another comment")
>>> w.endElement()
>>> w.endDocument()
>>> 

test.xml:

<!--A comment-->
<elem><!--Another comment--></elem>

genx.Writer.PI

params:String target, String text
returns:None

Adds an XML Processing Instruction (e.g. <?foo bar?>) to the file.

For example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.PI("foo", "bar")
>>> w.startElementLiteral("elem")
>>> w.endElement()
>>> w.endDocument()
>>> 

Produces:

<?foo bar?>
<elem></elem>

genx.Writer.declareAttribute

params:String name, genx.Namespace namespace = None
returns:genx.Attribute

This creates a genx.Attribute object with the given name, and an optional genx.Namespace object. This object can then be used with genx.Writer.addAttribute calls to add attributes to the current document.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> attr = w.declareAttribute("attr")
>>> ns = w.declareNamespace("http://example.com/ns")
>>> attr2 = w.declareAttribute("attr2", ns)
>>>

genx.Writer.declareElement

params:String name, genx.Namespace namespace = None
returns:genx.Element

Declare a new element, using the optional genx.Namespace object.

A trivial example:

>>> import genx
>>> writer = genx.Writer()
>>> elem = writer.declareElement("element")
>>> ns = writer.declareNamespace("http://example.com/ns")
>>> elem_with_ns = writer.declareElement("anotherelem", ns)
>>>

genx.Writer.declareNamespace

params:String uri, String prefix = None
returns:genx.Namespace

Declare a new genx.Namespace object, using the given namespace URI and an optional prefix. Use a prefix of "" to declare the default namespace.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> ns = w.declareNamespace("http://example.com/ns")
>>> ns2 = w.declareNamespace("http://example.com/ns2", "myprefix")
>>>

genx.Writer.endDocument

params:none
returns:None

Finish writing the current document. When this is called all the elements should have been previously closed using genx.Writer.endElement calls. After this is called the genx.Writer instance can be re-used with another file.

An example of re-using a genx.Writer instance:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> elem = w.declareElement("elem")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.endElement() 
>>> w.endDocument()
>>> fp2 = file("/tmp/test2.xml", "w")
>>> w.startDocFile(fp2)
>>> w.startElement(elem)
>>> w.startElement(elem)
>>> w.endElement()
>>> w.endElement()
>>> w.endDocument()
>>> 

The content of test.xml:

<elem></elem>

The content of test2.xml:

<elem><elem></elem></elem>

genx.Writer.endElement

params:none
returns:None

Finish writing the current element. This needs to be called to close each corresponding element created with genx.Writer.startElement or genx.Writer.startElementLiteral calls.

genx.Writer.scrubText

params:String text
returns:String

This silently scrubs any invalid characters out of the given string.

A simple example:

>>> import genx
>>> writer = genx.Writer()
>>> writer.scrubText("A string")
'A string'
>>> writer.scrubText("A string |\x01|")
'A string ||'
>>> 
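
For readers curious what "invalid characters" means here, the following pure-Python sketch drops every character outside the XML 1.0 Char production. It only approximates the behaviour shown above and is not genx's implementation; the names is_xml_char and scrub_text are hypothetical, and the code assumes a Python 3 text string as input.

```python
def is_xml_char(ch):
    """True if `ch` is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def scrub_text(text):
    """Return `text` with every non-XML character removed."""
    return "".join(ch for ch in text if is_xml_char(ch))
```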

genx.Writer.startDocFile

params:File file
returns:None

This starts a new document using the given File object. It should be called only once for each document. If called again on an active document you will get a genx.SequenceError.

Note

The object passed in can be a standard Python file object, in which case the C FILE pointer it contains is passed to genx's genxStartDocFile.

Alternatively, it can be any other Python object that has a write method and a flush method which perform buffer-style operations. An example of such an object is StringIO.StringIO.
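
A minimal object satisfying this note might look like the sketch below. The class name ChunkCollector and its extra getvalue and flush_count members are invented for illustration; only write and flush are asked for by the note. The methods are called directly here so the snippet runs without pygenx.

```python
class ChunkCollector(object):
    """Hypothetical sink offering the write and flush methods described above."""

    def __init__(self):
        self.chunks = []
        self.flush_count = 0

    def write(self, data):
        # Keep each chunk the writer hands over.
        self.chunks.append(data)

    def flush(self):
        # Nothing is buffered elsewhere, so just count the calls.
        self.flush_count += 1

    def getvalue(self):
        # Convenience helper, not required by the note above.
        return "".join(self.chunks)


sink = ChunkCollector()
# With pygenx installed, `sink` would be handed to writer.startDocFile(sink);
# here the two methods are called by hand to show the expected behaviour.
sink.write("<example>")
sink.write("</example>")
sink.flush()
```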

genx.Writer.startElement

params:genx.Element
returns:None

Starts a new XML element using the given genx.Element object. This is the preferred way to write elements into an XML document, as it is faster to re-use premade genx.Element objects than to use genx.Writer.startElementLiteral calls.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> elem = w.declareElement("foo")
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.endElement()
>>> w.endDocument()
>>>

This writes:

<foo></foo>

genx.Writer.startElementLiteral

params:String name, String namespace = None
returns:None

Starts a new XML element with the given name and an optional namespace URI string. This is the most straightforward way to create elements, but it isn't the fastest, and can get unwieldy when compared to genx.Writer.startElement.

A variation of the example in genx.Writer.startElement:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("foo")
>>> w.endElement()
>>> w.endDocument()
>>>

This writes:

<foo></foo>

genx.Writer.unsetDefaultNamespace

params:none
returns:None

This clears the default namespace declaration. This is slightly tricky to explain and probably easier to demonstrate:

#!/usr/bin/env python

import genx

w = genx.Writer()
nsdef = w.declareNamespace("http://default", "")
nspref = w.declareNamespace("http://pref", "pref")
e = w.declareElement("e")
edef = w.declareElement("edef", nsdef)
epref = w.declareElement("epref", nspref)

fp = file("/tmp/test.xml", "w")
w.startDocFile(fp)
w.startElement(edef)
w.addNamespace(nspref)
w.startElement(e)
w.endElement()
w.startElement(epref)
w.unsetDefaultNamespace()
w.startElement(e)
w.endElement()
w.endElement()
w.endElement()
w.endDocument()

fp2 = file("/tmp/test2.xml", "w")
w.startDocFile(fp2)
w.startElement(edef)
w.addNamespace(nspref)
w.startElement(e)
w.endElement()
w.startElement(epref)
#w.unsetDefaultNamespace()
w.startElement(e)
w.endElement()
w.endElement()
w.endElement()
w.endDocument()

Using xmllint and diff to compare the outputs:

$ xmllint --format /tmp/test2.xml >test2.xml
$ xmllint --format /tmp/test.xml >test.xml
$ diff -u test.xml test2.xml 
--- test.xml    2004-05-31 23:44:50.000000000 +0100
+++ test2.xml   2004-05-31 23:44:45.000000000 +0100
@@ -1,7 +1,7 @@
 <?xml version="1.0"?>
 <edef xmlns="http://default" xmlns:pref="http://pref">
   <e xmlns=""/>
-  <pref:epref xmlns="">
-    <e/>
+  <pref:epref>
+    <e xmlns=""/>
   </pref:epref>
 </edef>

The lines with the - character in front of them come from test.xml (with the unsetDefaultNamespace call) and the lines with the + character in front come from test2.xml (without the call). As you can see, the unsetDefaultNamespace call forces the xmlns="" declaration onto the pref:epref element itself, so the e element nested inside it no longer needs its own.

genx.Attribute

This represents an XML attribute, which can be attached to any genx.Element. This is created using genx.Writer.declareAttribute.

genx.Element

This represents an XML element, which can be used with genx.Writer.startElement. This is created using genx.Writer.declareElement.

genx.Namespace

This represents an XML namespace, which can be used with genx.Element and genx.Attribute objects. This is created using genx.Writer.declareNamespace.

Functions

genx.get_version

params:none
returns:String

This returns the version of genx as reported by genx's genxGetVersion function.

Exceptions

These are normally raised based on the status codes genx returns.

genx.AttributeInDefaultNamespaceError

This occurs when an attribute is declared or used in the default namespace.

For example:

>>> import genx
>>> w = genx.Writer()
>>> ns = w.declareNamespace("http://example.com/ns", "")
>>> a = w.declareAttribute("a", ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 537, in _genx.Writer.declareAttribute
  File "_genx.pyx", line 454, in _genx.Writer.__checkStatus
_genx.AttributeInDefaultNamespaceError: \
'Attribute cannot be in default namespace'
>>>

genx.BadDefaultDeclarationError

Can't say it better than Tim:

You tried to declare some namespace to be the default on an element which is in no namespace.

To trigger this:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> ns = w.declareNamespace("http://example.com/ns", "")
>>> elem = w.declareElement("elem")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.addNamespace(ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 414, in _genx.Writer.addNamespace
  File "_genx.pyx", line 468, in _genx.Writer.__checkStatus
_genx.BadDefaultDeclarationError: \
'Declared a default namespace on an element which is in no namespace'
>>>

genx.BadNameError

This occurs when an invalid XML name is used.

For example:

>>> w.startElementLiteral("<foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 373, in _genx.Writer.startElementLiteral
  File "_genx.pyx", line 450, in _genx.Writer.__checkStatus
_genx.BadNameError: 'Bad NAME'
>>>

genx.BadNamespaceNameError

This is raised when you try to declare a genx.Namespace using None or an empty string.

Some examples:

>>> w.declareNamespace("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 490, in _genx.Writer.declareNamespace
  File "_genx.pyx", line 446, in _genx.Writer.__checkStatus
_genx.BadNamespaceNameError: 'Bad namespace name'
>>> w.declareNamespace(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 479, in _genx.Writer.declareNamespace
_genx.BadNamespaceNameError: None is an invalid namespace
>>>

genx.BadUTF8Error

This is raised when invalid UTF-8 is passed to a genx call. However, it is unlikely that this will get raised, as Python usually raises its own encoding errors before genx is reached.

genx.DuplicateAttributeError

This happens when you try to add an attribute with the same name to an element more than once.

For example:

>>> w.addAttributeLiteral("a", "foo")
>>> w.addAttributeLiteral("a", "foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 437, in _genx.Writer.addAttributeLiteral
  File "_genx.pyx", line 456, in _genx.Writer.__checkStatus
_genx.DuplicateAttributeError: 'Same attribute specified more than once'
>>>

genx.DuplicateNamespaceError

This occurs when you add the same namespace URI to an element more than once under different prefixes.

For example:

>>> ns2 = w.declareNamespace("http://example.com/2", "ns2")
>>> w.addNamespace(ns2)
>>> ns3 = w.declareNamespace("http://example.com/2", "ns3")
>>> w.addNamespace(ns3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 414, in _genx.Writer.addNamespace
  File "_genx.pyx", line 466, in _genx.Writer.__checkStatus
_genx.DuplicateNamespaceError: \
'Declared namespace twice with different prefixes on one element.'
>>>

genx.DuplicatePrefixError

This is raised when two namespaces are declared with the same prefix.

For example:

>>> ns1 = w.declareNamespace("http://example.com/ns1", "ns1")
>>> ns2 = w.declareNamespace("http://example.com/ns2", "ns1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 490, in _genx.Writer.declareNamespace
  File "_genx.pyx", line 448, in _genx.Writer.__checkStatus
_genx.DuplicatePrefixError: 'Duplicate prefix'
>>>

genx.GenxError

This is a catch-all error. If pygenx can't find a matching exception after checking the various error codes genx returns, this is raised with the error string included.

genx.IOError

This usually occurs when genx has problems writing to the file. The most common cause is some other part of the Python code closing the file object.

A typical example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> fp.close()
>>> w.startElementLiteral("example")
>>> w.addText("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 379, in _genx.Writer.addText
  File "_genx.pyx", line 458, in _genx.Writer.__checkStatus
_genx.IOError: 'I/O error'
>>>  

In the above example the exception isn't raised until the genx.Writer.addText call, because genx doesn't try to write to the file until that call is made.
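
One way to avoid this situation is to keep the open and close of the file in the same place as the writer calls, so the file is closed only after endDocument. The sketch below shows that pattern; the function name write_example is hypothetical, and writer is assumed to be a genx.Writer or an object with the same methods.

```python
def write_example(writer, path):
    """Write <example>foo</example> to `path`, closing the file last."""
    fp = open(path, "w")
    try:
        writer.startDocFile(fp)
        writer.startElementLiteral("example")
        writer.addText("foo")
        writer.endElement()
        writer.endDocument()
    finally:
        # Close only after endDocument (or after an error),
        # never between startDocFile and endDocument.
        fp.close()
```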

genx.MalformedPIError

This is raised when an invalid string is passed to genx.Writer.PI, usually when there is a "?>" in the string.

For example:

>>> writer.PI("foo", "bar?>")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 391, in _genx.Writer.PI
  File "_genx.pyx", line 464, in _genx.Writer.__checkStatus
_genx.MalformedPIError: '?> in PI'
>>>

genx.NonXMLCharacterError

This is raised when a character which violates the XML 1.0 Character rules is passed into genx. The string can be perfectly valid UTF-8 but still be invalid XML.

For example:

>>> w.addAttributeLiteral("bar", "text \x01")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 437, in _genx.Writer.addAttributeLiteral
  File "_genx.pyx", line 460, in _genx.Writer.__checkStatus
_genx.NonXMLCharacterError: 'Non XML Character'
>>>

genx.SequenceError

This is the most commonly seen error. It occurs when calls are made in an incorrect order.

For example, this code closes the document before closing the element:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("foo")
>>> w.endDocument()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "_genx.pyx", line 424, in _genx.Writer.endDocument
  File "_genx.pyx", line 452, in _genx.Writer.__checkStatus
_genx.SequenceError: 'Call out of sequence'
>>>
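
A common source of this error is a missing endElement call. One possible way to keep start and end calls paired is a small context manager, sketched below. This helper is not part of pygenx; the name element is invented here, and writer is assumed to be a genx.Writer or an object with the same startElement and endElement methods.

```python
from contextlib import contextmanager


@contextmanager
def element(writer, elem):
    """Start `elem` on entry and end it on a clean exit from the block."""
    writer.startElement(elem)
    yield
    # Reached only if the body raised nothing. On an error the exception
    # propagates and no further calls are made on the writer.
    writer.endElement()
```

Used as "with element(w, foo): w.addText('hi')", the matching endElement call follows automatically when the block finishes, so nested with blocks close their elements in the right order.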

Developing pygenx

Pygenx is written as a Pyrex wrapper to Genx. It is kept under source control with GNU Arch. The distribution is built using plain distutils, while development builds use SCons.

About this manual

This manual is written in reStructuredText and converted to HTML using Docutils. Docutils has proven to be a joy to use, especially for Python programming, and I'd recommend it to anyone.