Author: Dave Kuhlman
Address: dkuhlman@rexx.com, http://www.rexx.com/~dkuhlman
Revision: 2.0b
Date: June 30, 2010
Copyright: Copyright (c) 2004 Dave Kuhlman. This documentation and the software it describes is covered by The MIT License: http://www.opensource.org/licenses/mit-license.php.
Abstract: generateDS.py generates Python data structures (for example, class definitions) from an XML Schema document. These data structures represent the elements in an XML document described by the XML Schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.
The generated Python code contains:
The generated classes contain the following:
The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file. The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
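Here is a minimal sketch of how a subclass can take the place of its superclass when instances are created, and how a "tree walk" method can be added to it. It is a simplified, hand-written stand-in, not output from generateDS.py: the member names, the walk method, and the details of the factory mechanism are assumptions made for illustration.

```python
# Simplified stand-in for a generated superclass module (for example, people.py).
class person(object):
    subclass = None  # the subclass module sets this to redirect factory()

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []

    @staticmethod
    def factory(*args, **kwargs):
        # Create the subclass when one is registered, else the superclass.
        if person.subclass:
            return person.subclass(*args, **kwargs)
        return person(*args, **kwargs)


# Simplified stand-in for a generated subclass module (for example, peoplesubs.py).
class personSub(person):
    def walk(self, depth=0):
        # User-added "tree walk" method: collect names, indented by depth.
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.walk(depth + 1))
        return lines


person.subclass = personSub

root = person.factory("alice", [person.factory("bob")])
print("\n".join(root.walk()))
```

Because every instance is created through the factory, code that builds the tree does not need to know which subclass file is in use.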
This document explains (1) how to use generateDS.py; (2) how to use the Python code and data structures that it generates; and (3) how to modify the generated code for special purposes.
You can find the source distribution here:
It is also available at:
There is a mailing list at SourceForge: generateds-users -- https://lists.sourceforge.net/lists/listinfo/generateds-users.
There is a tutorial in the distribution: tutorial/tutorial.html and at generateDS -- Introduction and Tutorial -- http://www.rexx.com/~dkuhlman/generateds_tutorial.html.
Newer versions of Python have XML support in the Python standard library. For older versions of Python, install PyXML. You can find it at: http://pyxml.sourceforge.net/
De-compress the generateDS distribution file. Use something like the following:
tar xzvf generateDS-x.xx.tar.gz
Then, the regular Distutils commands should work:
$ cd generateDS-x.xx
$ python setup.py build
$ python setup.py install # probably as root
Run generateDS.py with a single argument, the XML Schema file that defines the data structures. For example, the following will generate Python source code for data structures described in people.xsd and will write it to the file people.py. In addition, it will write subclass stubs to the file peoplesubs.py:
python generateDS.py -o people.py -s peoplesubs.py people.xsd
Here is the usage message displayed by generateDS.py:
Synopsis:
    Generate Python classes from XML Schema definition.
    Input is read from in_xsd_file or, if "-" (dash) arg, from stdin.
    Output is written to files named in "-o" and "-s" options.
Usage:
    python generateDS.py [ options ] <xsd_file>
    python generateDS.py [ options ] -
Options:
    -h, --help
        Display this help information.
    -o <outfilename>
        Output file name for data representation classes
    -s <subclassfilename>
        Output file name for subclasses
    -p <prefix>
        Prefix string to be pre-pended to the class names
    -f
        Force creation of output files. Do not ask.
    -a <namespaceabbrev>
        Namespace abbreviation, e.g. "xsd:". Default = 'xs:'.
    -b <behaviorfilename>
        Input file name for behaviors added to subclasses
    -m
        Generate properties for member variables
    --search-path="a:b:c:d"
        Search these directories for additional schema files.
    --subclass-suffix="XXX"
        Append XXX to the generated subclass names. Default="Sub".
    --root-element="XXX"
        Assume XXX is root element of instance docs.
        Default is first element defined in schema.
    --super="XXX"
        Super module name in subclass module. Default="???"
    --validator-bodies=path
        Path to a directory containing files that provide
        bodies (implementations) of validator methods.
    --use-old-getter-setter
        Name getters and setters getVar() and setVar(),
        instead of get_var() and set_var().
    --user-methods= <module>, -u <module>
        Optional module containing user methods. See
        section "User Methods" in the documentation.
    --no-dates
        Do not include the current date in the generated
        files. This is useful if you want to minimize
        the amount of (no-operation) changes to the
        generated python code.
    --no-versions
        Do not include the current version in the generated
        files. This is useful if you want to minimize
        the amount of (no-operation) changes to the
        generated python code.
    --no-process-includes
        Do not process included XML Schema files. By
        default, generateDS.py will insert content
        from files referenced by <include ... />
        elements into the XML Schema to be processed.
    --silence
        Normally, the code generated with generateDS
        echoes the information being parsed. To prevent
        the echo from occurring, use the --silence switch.
    --namespacedef='xmlns:abc="http://www.abc.com"'
        Namespace definition to be passed in as the
        value for the namespacedef_ parameter of
        the export() method by the generated
        parse() and parseString() functions.
        Default=''.
    --external-encoding=<encoding>
        Encode output written by the generated export
        methods using this encoding. Default, if omitted,
        is the value returned by sys.getdefaultencoding().
        Example: --external-encoding='utf-8'.
    --member-specs=list|dict
        Generate member (type) specifications in each
        class: a dictionary of instances of class
        MemberSpec_ containing member name, type,
        and array or not. Allowed values are
        "list" or "dict". Default: None.
    --session=mysession.session
        Load and use options from session file. You can
        create session file in generateds_gui.py.
    --version
        Print version and exit.
The following command line flags are recognized by generateDS.py:
-a <namespaceabbrev> -- Namespace abbreviation, for example "xsd:". The default is 'xs:'. If the <schema> element in your XML Schema specifies something other than "xmlns:xs=", then you need to use this option. So, suppose you have the following at the beginning of your XML Schema file:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
Then you can use the following command line option:
-a "xsd:"
But, note that generateDS.py also tries to pick up the namespace prefix used in the XML Schema file automatically. If the <schema> element has an attribute "xmlns:xxx" whose value is "http://www.w3.org/2001/XMLSchema", then generateDS.py will use "xxx:" as the alias for the XML Schema namespace in the XML Schema document.
--subclass-suffix="XXX" -- Append the suffix XXX to the name of each class generated in the subclass file. The default, if omitted, is "Sub". For example, the following will append "_Action" to each generated subclass name:
generateDS.py --subclass-suffix="_Action" -s actions.py mydef.xsd
And the following will append nothing, making the superclass and subclass names the same:
generateDS.py --subclass-suffix="" -s actions.py mydef.xsd
--super="module_name" -- Make module_name the name of the superclass module imported by the subclass module. If this flag is omitted, the following is generated near the top of the subclass file:
import ??? as supermod
and you will need to hand edit this so the correct superclass module is imported.
In some cases the element and attribute names in an XML document will conflict with Python keywords. In order to avoid these clashes, generateDS.py contains a table that maps names that might clash to acceptable names. This table is a Python dictionary named NameTable. The user can modify existing entries and add additional name-replacement pairs to this table, for example, if new conflicts occur.
In some cases the name of a child element and the name of an attribute will be the same. (I believe, but am not sure, that this is allowed by XML Schema.) Since generateDS.py treats both child elements and attributes as members of the generated class, this is a name conflict. Therefore, where such conflicts exist, generateDS.py modifies the name of the attribute by adding "_attr" to its name.
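The two renaming rules above can be sketched as follows. This is only an illustration: the entries shown in NameTable are examples and may not match the table in generateDS.py, and the helper functions clean_name and member_names are hypothetical names that do not exist in generateDS.py.

```python
# Illustrative entries only; the real NameTable in generateDS.py may differ.
NameTable = {"class": "class_", "import": "import_", "type": "type_"}


def clean_name(name):
    """Map a name that would clash in Python to its replacement."""
    return NameTable.get(name, name)


def member_names(child_names, attribute_names):
    """Return (children, attributes) member names with conflicts resolved."""
    children = [clean_name(name) for name in child_names]
    attributes = []
    for name in attribute_names:
        mapped = clean_name(name)
        if mapped in children:
            # Same name as a child element: rename the attribute.
            mapped += "_attr"
        attributes.append(mapped)
    return children, attributes


print(member_names(["class", "value"], ["value", "id"]))
```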
Here are a few notes on how to use the GUI front-end.
generateds_gui.py is installed when you do the standard installation:
$ python setup.py install
Run it by typing the following at the command line:
$ generateds_gui.py
For help with command line options, run:
$ generateds_gui.py --help
For a description of the values and flags that you can set, see section Running generateDS.py. There are also tool tips on the various widgets in the graphical user interface.
Generate the python bindings modules by using the Tools/Generate menu item or the Generate button at the bottom of the window.
Capture the command line generated by using the Tools/Capture command line menu item. You might consider copying and pasting that command line into a shell script or batch file for repeated reuse.
You can also save and later reload your values and flags in a session file. See the Save session, Save session as, and Load session items under the File menu. By default, a session file has the extension ".session".
You can load a session on start-up with the "-s" or "--session" command line options. For example:
$ generateds_gui.py --session=mybindingsjob.session
Or, use the "session" option in a configuration file.
If the command to be run when generating bindings is not standard, you can specify that command with the "--exec-path" command line option or with the "exec-path" option in a configuration file. The default is "generateDS.py".
Command line options can also be specified in a configuration file. generateds_gui.py checks for that configuration file in the following locations in this order:
Here is a sample configuration file:
[general]
exec-path: /usr/bin/python ~/bin/generateDS.py
impl-path: generateds_gui.glade
session: a1.session
Options on the command line override options in configuration files.
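Here is a rough sketch of that precedence: defaults first, then values from the configuration file, then values from the command line. It uses the configparser module from the Python 3 standard library and is not the code in generateds_gui.py; the function merge_options and the dictionary of command line values are assumptions made for illustration.

```python
import configparser

SAMPLE_CONFIG = """\
[general]
exec-path: /usr/bin/python ~/bin/generateDS.py
session: a1.session
"""


def merge_options(config_text, command_line):
    """Combine defaults, configuration file values, and command line values."""
    options = {"exec-path": "generateDS.py"}  # documented default
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if parser.has_section("general"):
        options.update(dict(parser.items("general")))
    # Command line values win over configuration file values.
    options.update({k: v for k, v in command_line.items() if v is not None})
    return options


print(merge_options(SAMPLE_CONFIG, {"session": "other.session"}))
```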
generateDS.py is not very intelligent about detecting what prefix is used in the schema file for the XML Schema namespace. When this problem occurs, you may see the following when running generateDS.py:
AttributeError: 'NoneType' object has no attribute 'annotate'
generateDS.py assumes that the XML Schema namespace prefix in your schema is "xs:".
So, if the XML Schema namespace prefix in your schema is not "xs:", you will need to use the "-a" command line option when you run generateDS.py. Here is an example:
generateDS.py -a "xsd:" --super=mylib -o mylib.py -s myapp.py someschema.xsd
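If you are not sure which prefix your schema uses, a few lines of Python can report it, so that you know what to pass to "-a". This is a sketch that uses xml.dom.minidom from the standard library; the function find_schema_prefix is a hypothetical helper and is not part of generateDS.py.

```python
from xml.dom import minidom

XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def find_schema_prefix(xsd_text, default="xs:"):
    """Return the prefix (with trailing colon) bound to the XML Schema namespace."""
    root = minidom.parseString(xsd_text).documentElement
    attrs = root.attributes
    for index in range(attrs.length):
        attr = attrs.item(index)
        # Namespace declarations appear as attributes named "xmlns:prefix".
        if attr.name.startswith("xmlns:") and attr.value == XSD_NAMESPACE:
            return attr.name[len("xmlns:"):] + ":"
    return default


schema = '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'
print(find_schema_prefix(schema))
```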
The following constructs, among others, in XML Schema are supported:
See file people.xsd for examples of the definition of data types and structures. Also see the section on The XML Schema Input to generateDS.
Element definitions that contain attributes but no nested child elements provide access to their data content through getter and setter methods getValueOf_ and setValueOf_ and member variable valueOf_.
Elements that are defined to contain both text and nested child elements have "mixed content". generateDS.py provides access to mixed content, but the generated data structures (classes) are fundamentally different from that generated for other elements. See section Mixed content for more details.
Note that elements defined with attributes but with no nested sub-elements do not need to be declared as "mixed". For these elements, character data is captured in a member variable valueOf_, and can be accessed with member methods getValueOf_ and setValueOf_.
generateDS.py supports anyAttribute. For example, if an element is defined as follows:
<xs:element name="Tool">
    <xs:complexType>
        <xs:attribute name="PartNumber" type="xs:string" />
        <xs:anyAttribute processContents="skip" />
    </xs:complexType>
</xs:element>
Then generateDS.py will generate a class with a member variable anyAttributes_ containing a dictionary. Any attributes found in the instance XML document that are not explicitly defined for this element will be stored in this dictionary. generateDS.py also generates getters and setters as well as code for parsing and export. generateDS.py ignores processContents. See section anyAttribute for more details.
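The effect can be pictured with the following hand-written approximation of such a class. It is not the code that generateDS.py generates; only the member name anyAttributes_ comes from the description above, and the build method shown here is a simplified assumption.

```python
from xml.dom import minidom


class Tool(object):
    """Hand-written approximation of a class generated for element Tool."""

    def __init__(self, PartNumber=None):
        self.PartNumber = PartNumber
        self.anyAttributes_ = {}

    def build(self, node):
        attrs = node.attributes
        for index in range(attrs.length):
            attr = attrs.item(index)
            if attr.name == "PartNumber":
                self.PartNumber = attr.value
            else:
                # Not declared in the schema: keep it in the dictionary.
                self.anyAttributes_[attr.name] = attr.value


doc = minidom.parseString('<Tool PartNumber="A-17" color="red" size="9"/>')
tool = Tool()
tool.build(doc.documentElement)
print(tool.PartNumber, sorted(tool.anyAttributes_.items()))
```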
generateDS.py now generates subclasses for extensions, that is when an element definition contains something like this:
<xs:extension base="sometag">
Limitation -- There is an important limitation, however: member names duplicated (overridden ?) in an extension generate erroneous code. Sigh. I guess I needed something more to do.
Several of the generated methods have been refactored so that subclasses can reuse the code in their superclasses. Take a look at the generated code to learn how to use it.
The Python compiler/interpreter requires that it has seen a superclass before it sees the subclass that uses it. Because of this, generateDS.py delays generating a subclass until after its superclass has been generated. Therefore, the order in which classes are generated may be different from what you expect.
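One way to picture this ordering is the sketch below, which emits each class only after its base class has been emitted. It illustrates the idea and is not the algorithm used inside generateDS.py; the function order_classes is hypothetical, and it assumes that the extension relationships contain no cycles.

```python
def order_classes(bases):
    """Order class names so that every base precedes its subclasses.

    bases maps each class name to the name of its base class, or to None.
    """
    ordered = []
    done = set()

    def emit(name):
        if name in done:
            return
        base = bases.get(name)
        if base is not None and base in bases:
            emit(base)  # make sure the superclass comes first
        done.add(name)
        ordered.append(name)

    for name in bases:
        emit(name)
    return ordered


print(order_classes({"BType": "AType", "CType": "BType", "AType": None}))
```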
generateDS.py now handles the definition and use of attribute groups. For example, a definition like the following:
<xs:attributeGroup name="favorites">
    <xs:attribute name="fruit" />
    <xs:attribute name="vegetable" />
</xs:attributeGroup>
And, a reference or use like the following:
<xs:element name="person">
    <xs:complexType mixed="0">
        <xs:attributeGroup ref="favorites" />
        o
        o
        o
Results in generation of class person that contains members fruit and vegetable.
generateDS.py now handles a limited range of substitution groups. There is an important limitation, however: generateDS.py handles substitution groups that involve complex types, but does not handle those that involve (substitute for) simple types (for example, xs:string, xs:integer, etc.). This is because the code generated for members defined as simple types does not provide the needed information to handle substitution groups.
generateDS.py supports some, but not all, simple types defined in "XML Schema Part 0: Primer Second Edition" (http://www.w3.org/TR/xmlschema-0/; see section "Simple Types" and appendix B). Validation is performed for some simple types. When performed, validation is done while the XML document is being read and instances are created.
Here is a list of supported simple types:
generateDS.py generates minimal support for members defined as simpleType. However, the code generated by generateDS.py does not enforce restrictions. For notes on how to enforce restrictions, see section simpleType and validators.
A simpleType can be a restriction on a primitive type or on a defined element type. So, for example, the following will generate valid code:
<xs:element name="percent">
    <xs:simpleType>
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="100"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
And, the following will also generate valid code:
<xs:simpleType name="emptyString">
    <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
    </xs:restriction>
</xs:simpleType>

<xs:element name="merge">
    <xs:complexType>
        <xs:simpleContent>
            <xs:extension base="emptyString">
                <xs:attribute name="fromTag" type="xs:string"/>
                <xs:attribute name="toTag" type="xs:string"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
</xs:element>
For elements defined with maxOccurs="unbounded", generateDS.py generates code that processes a list of elements.
For elements defined with minOccurs="0" and maxOccurs="1", generateDS.py generates code that exports an element only if that element has a (non-None) value.
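The following hand-written approximation shows what those two rules mean in practice. It is not the code that generateDS.py generates: the class, its members nickname and hobby, and the simplified export method (which does no XML escaping) are assumptions made for illustration.

```python
import sys


class person(object):
    """Hand-written approximation; real generated classes are more elaborate."""

    def __init__(self, nickname=None, hobby=None):
        self.nickname = nickname  # minOccurs="0" maxOccurs="1"
        self.hobby = hobby or []  # maxOccurs="unbounded"

    def add_hobby(self, value):
        self.hobby.append(value)

    def export(self, outfile):
        outfile.write("<person>\n")
        # An optional element is written only if it has a (non-None) value.
        if self.nickname is not None:
            outfile.write("  <nickname>%s</nickname>\n" % self.nickname)
        # An unbounded element is written once for each item in the list.
        for value in self.hobby:
            outfile.write("  <hobby>%s</hobby>\n" % value)
        outfile.write("</person>\n")


someone = person()
someone.add_hobby("chess")
someone.add_hobby("rowing")
someone.export(sys.stdout)
```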
Here are a few notes that should help you use validator methods to enforce restrictions.
Default behavior -- The generated code, by default, treats the value of a member whose type is a simpleType as if it were declared as type xs:string.
Validator method stubs -- For a member variable name declared as a simpleType named X, a validator method validate_X is generated. Example -- from:
<xs:simpleType name="tAnyName">
    <xs:restriction base="xs:string"/>
</xs:simpleType>
The class generated by generateDS.py will contain the following method definition:
def validate_tAnyName(self, value):
    # Validate type tAnyName, a restriction on xs:string.
    pass
Calls to validator methods -- For a member variable declared as a simpleType X, a call to validate_X is added to the build method. Example -- from:
<xs:element name="person">
    <xs:complexType mixed="0">
        <xs:sequence>
            <xs:element name="test2" type="tAnyName"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
generateDS.py produces the following call:
self.validate_tAnyName(self.test2) # validate type tAnyName
Code bodies for validator methods can be added either (1) manually or (2) automatically from an external source. See command line option "--validator-bodies" and see below.
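As an illustration, a manually added validator body might look like the following sketch. The simpleType name tPercent (a restriction on xs:integer with minInclusive 1 and maxInclusive 100), the class measurement, and the choice to raise ValueError are all assumptions; generateDS.py does not prescribe how a validator should report a failure.

```python
class measurement(object):
    """Hand-written approximation of a class with one validated member."""

    def __init__(self, percent=None):
        self.percent = percent

    def validate_tPercent(self, value):
        # Validate type tPercent, a restriction on xs:integer (1 to 100).
        number = int(value)
        if not 1 <= number <= 100:
            raise ValueError("tPercent out of range: %r" % (value,))

    def build(self, text):
        # Simplified: the real build method works on a parsed XML node.
        self.percent = text
        self.validate_tPercent(self.percent)  # validate type tPercent


item = measurement()
item.build("42")
print(item.percent)
```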
You can add code to the validator method stub to enforce the restriction for the base type and further restrictions imposed on that base type. This can be done in the following ways:
The support for simpleType in generateDS.py has the following limitations (among others, I'm sure):
It only works for simpleType defined with and referenced through a name. It does not work for "in-line" definitions. So, for example, the following works:
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="test3" type="tAnyName"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:simpleType name="tAnyName">
    <xs:restriction base="xs:string"/>
</xs:simpleType>
But, the following does not work:
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="test3">
                <xs:simpleType name="tAnyName">
                    <xs:restriction base="xs:string"/>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
Attributes defined as a simple type are not supported.
By default, generateDS.py will insert content from files referenced by include elements into the XML Schema to be processed. This behavior can be turned off by using the "--no-process-includes" command line option.
include elements are processed and the referenced content is inserted in the XML Schema by importing and using process_includes.py, which is included in the generateDS.py distribution.
process_includes.py will use either lxml or ElementTree, but its preference is lxml because lxml attempts to preserve namespace prefixes. So if your XML Schemas have <include ... /> elements in them, you might want to consider installing lxml, even though ElementTree is in the Python standard library for Python versions >= 2.5.
The include file processing is capable of retrieving included files via the FTP and HTTP internet protocols as well as from the local file system.
Also see command line option "--search-path" (see Running generateDS.py), which can be used to specify a colon separated list of directories where include processing will look for included schemas.
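Here is a sketch of how a colon separated search path could be used to locate an included schema on the local file system. It is not the code in process_includes.py: the function find_included_schema is hypothetical, and the order in which locations are tried (the location as given first, then each directory in turn) is an assumption.

```python
import os


def find_included_schema(location, search_path):
    """Return the first existing path for location, or None if not found."""
    candidates = [location]
    for directory in search_path.split(":"):
        if directory:
            candidates.append(os.path.join(directory, location))
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_included_schema("common.xsd", "schemas:/usr/share/schemas"))
```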
generateDS.py has support for abstract types. For more on this, see: XML Schema Part 0: Primer Second Edition: Abstract Elements and Types -- http://www.w3.org/TR/xmlschema-0/#abstract.
Note: Quite a bit of work has been done on generateDS.py since this section was written. So, it accepts and processes more of the features in XML Schema than it did earlier. The best advice is to give it a try on your schema. If it works, great. If it does not, post a message to the list: generateds-users -- https://lists.sourceforge.net/lists/listinfo/generateds-users.
generateDS.py actually accepts a subset of XML Schema. The sample XML Schema file should give you a picture of how to describe an XML file and the Python classes that you will generate. And here are some notes that should help:
Specify the tag in the XML file and the name of the generated Python class in the name attribute on the xs:element. For example, to generate a Python class named "person", which will be populated from an XML element/tag "person", use the following XML Schema snippet:
<xs:element name="person" ...
To specify a data member for a generated Python class that will be populated from an attribute in an element in an XML file, use the XML Schema xs:attribute. For attributes, generateDS recognizes the following types: "xs:string", "xs:integer", and "xs:float". For example, the following adds member data items "hobby" and "category" with types "xs:string" and "xs:integer":
<xs:element name="person">
    <xs:complexType>
        <xs:attribute name="hobby" type="xs:string" />
        <xs:attribute name="category" type="xs:integer" />
    </xs:complexType>
</xs:element>
To specify a data member for a generated Python class whose value is a string, integer, or float and which will be populated from a nested (simple) element, specify a nested XML Schema element whose type is "xs:string", "xs:integer", or "xs:float". Here is an example which defines a Python class "person" with a data member "description" which is a string and which is populated from a (simple) nested element:
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="description" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
To specify a data member of a generated Python class that will be populated from a nested XML element, refer to the nested object in the "type" attribute and then define another element/type whose name is that type. For example, the following specifies that the person class will have a data member named "transportation" that will be populated from a nested XML element "bicycle" and whose value will be an instance of the generated class "bicycle":
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="transportation" type="bicycle" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:element name="bicycle">
    o
    o
    o
</xs:element>
To specify a data member of a generated Python class that will contain a list of instances of a generated class and that will be populated from nested XML elements, add the "maxOccurs" attribute with value "unbounded". Here is an example:
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="transportation" type="bicycle" maxOccurs="unbounded" />
            <xs:element name="description" type="xs:string" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:element name="bicycle">
    o
    o
    o
</xs:element>
Here are a few additional rules that will help you to write XML Schema files for generateDS.py:
Here are a few additional constructions that generateDS.py understands.
You can use the <complexType> element at top level (instead of <element>) to define an element. So, for example, instead of:
<xs:element name="server-type">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-name" type="xs:string"/>
            <xs:element name="server-description" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
you can use the following, which is equivalent:
<xs:complexType name="server-type">
    <xs:sequence>
        <xs:element name="server-name" type="xs:string"/>
        <xs:element name="server-description" type="xs:string"/>
    </xs:sequence>
</xs:complexType>
You can use the "ref" attribute to refer to another element definition, instead of using the "name" and "type" attributes. So, for example, you can use the following:
<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element ref="server-type" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
in place of this:
<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element name="server-type" type="server-type"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
generateDS.py generates a subclass for each element that is defined as the extension of a base element. So, for the following:
<xs:complexType name="BType">
    <xs:complexContent>
        <xs:extension base="AType">
            <xs:sequence>
                o
                o
                o
generateDS.py will generate something like the following:
class BType(AType):
    o
    o
    o
generateDS.py generates special code to handle elements defined as containing mixed content, that is elements defined with attribute mixed="true". See section Mixed content for more details.
With the use of the "-b" command line option, generateDS.py will also accept as input an XML document instance that describes behaviors to be added to subclasses when the subclass file is generated with the "-s" command line option.
An example is provided in the Demos/Xmlbehavior sub-directory of the distribution.
The XMLBehaviors capability in generateDS.py was inspired and, for the most part, designed by gian paolo ciceri (gp.ciceri@suddenthinks.com). This work is part of our work on our application development project for Quixote.
This section describes the XMLBehavior XML document that is used as input to generateDS.py. The XMLBehavior XML document is an XML instance document (given as an argument to the "-b" command line flag) that describes behaviors (methods) to be added to class definitions in the subclass file (generated with the "-s" command line flag).
See file xmlbehavior_po.xml in the Demos/Xmlbehavior directory in the distribution for an example that you can use as a model.
The elements in the XMLBehavior document type are the following:
generateDS.py contains a function get_impl_body() that implements the ability to retrieve implementation bodies. The current implementation retrieves implementation bodies from an Internet Web URL. Other sources for implementation bodies can be implemented by modifying get_impl_body().
As an example, the version that follows first tries to retrieve an implementation body from a Web address and, if that fails, attempts to obtain the implementation body from a file in the local file system using the <xb:base-impl-url> as a path to a directory containing files, each of which contains one implementation body and <xb:impl-url> as the file name. This implementation of get_impl_body was provided by Colin Dembovsky of Systemsfusion Inc. Thanks, Colin. (I've included it in the generateDS.py script, but commented out, for those who want to use and possibly extend it.):
def get_impl_body(classBehavior, baseImplUrl, implUrl):
    impl = ' pass\n'
    if implUrl:
        trylocal = 0
        if baseImplUrl:
            implUrl = '%s%s' % (baseImplUrl, implUrl)
        try:
            implFile = urllib2.urlopen(implUrl)
            impl = implFile.read()
            implFile.close()
        except:
            trylocal = 1
        if trylocal:
            try:
                implFile = file(implUrl)
                impl = implFile.read()
                implFile.close()
            except:
                print '*** Implementation at %s not found.' % implUrl
    return impl
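The function above is written for Python 2 (urllib2, file(), and the print statement). For readers on Python 3, here is an adaptation with the same Web-then-local-file behavior. It is a sketch and not part of the generateDS.py distribution; the decision to decode the downloaded body as UTF-8 and the narrower exception clauses are assumptions.

```python
import urllib.request


def get_impl_body(classBehavior, baseImplUrl, implUrl):
    impl = ' pass\n'
    if implUrl:
        if baseImplUrl:
            implUrl = '%s%s' % (baseImplUrl, implUrl)
        try:
            # First attempt: treat the location as a Web address.
            with urllib.request.urlopen(implUrl) as implFile:
                impl = implFile.read().decode('utf-8')
        except (ValueError, OSError):
            try:
                # Fallback: treat the location as a local file path.
                with open(implUrl) as implFile:
                    impl = implFile.read()
            except OSError:
                print('*** Implementation at %s not found.' % implUrl)
    return impl
```

As in the original, classBehavior is accepted but not used, so that the function keeps the same signature.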
Here are additional features, contributed by users such as Chris Allan. Many thanks.
xsd:list elements can be used with a child xsd:simpleType which confuses the XschemaHandler stack unrolling. xsd:list element support should allow the following XML Schema definition to be supported in generateDS.py:
<xsd:attribute name="Foo">
    <xsd:simpleType>
        <xsd:list>
            <xsd:simpleType>
                ...
            </xsd:simpleType>
        </xsd:list>
    </xsd:simpleType>
</xsd:attribute>
The enumerated values for the parent element are resolved and made available through the instance attribute values.
In order to properly resolve and query types which are unions in an XML Schema, an element's membership in an xsd:union is available through the instance attribute unionOf.
When a parent xsd:choice exists, an element's "maxOccurs" and "minOccurs" values can be inherited from the xsd:choice rather than the element itself. xsd:choice elements have been added to the child element via the choice instance attribute and are now used in the "maxOccurs" and "minOccurs" attribute resolution. This should allow the following XML Schema definition to be supported in generateDS.py:
<xsd:element name="Foo">
    <xsd:complexType>
        <xsd:choice maxOccurs="unbounded">
            <xsd:element ref="Bar"/>
            <xsd:element ref="Baz"/>
        </xsd:choice>
    </xsd:complexType>
</xsd:element>
Some applications require the availability of the "minOccurs" attribute in addition to the previous minimal support of "optionality". This is available through the getMinOccurs method (which follows the style of the existing API).
The previous content type and base type resolution is insufficient for some needs. Basically it was unable to handle more complex and shared element and simpleType definitions. This support has been extended to more correctly resolve the base type and properly indicate the content type of the element. This should provide the ability to handle more complex XML Schema definitions in generateDS.py. Documentation on the algorithm for how this is achieved is available as comments in the source code of generateDS.py -- see comments in method resolve_type in class XschemaElement.
Some developers working to extend the analysis and code generation in generateDS.py may be helped by additional information collected during the parsing of the XML Schema file.
Some applications need all the top level simpleTypes to be available for further queries after the SAX parser has completed its work and after all types have been resolved. These types are available as an instance attribute topLevelSimpleTypes inside XschemaHandler.
In some cases, the document produced by a call to an export method will contain elements that have namespace prefixes. For example, the following snippet contains namespace prefix "abc":
<abc:people >
    <abc:person>
        o
        o
        o
    </abc:person>
</abc:people>
A way is needed to insert a namespace prefix definition into the generated document. Here is how generateDS.py fills that need.
Each generated export method takes an optional argument namespacedef_. If provided, the value of that parameter is inserted in the exported element. So, for example, the following call:
people.export(sys.stdout, 0, namespacedef_='xmlns:abc="http://www.abc.com/namespace"')
might produce:
<abc:people xmlns:abc="http://www.abc.com/namespace">
    <abc:person>
        o
        o
        o
    </abc:person>
</abc:people>
If this is an issue for you, then you may also want to consider using the "--namespacedef" command line option when you run generateDS.py. The value of this option will be passed in to the export function in the generated parse functions. So, for example, running generateDS.py as follows:
generateDS.py --namespacedef='xmlns:abc="http://www.abc.com/namespace.xsd"' -o mylib.py -s myapp.py myschema.xsd
will generate parse methods that automatically add the namespacedef_ argument to the call to export.
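To make the role of namespacedef_ concrete, here is a hand-written approximation of an export method that inserts the namespace definition into the start tag of the exported element. It is not the code that generateDS.py generates; the class, its person member, and the way the tag is assembled are assumptions made for illustration.

```python
import sys


class people(object):
    """Hand-written approximation of a generated class with an export method."""

    def __init__(self, person=None):
        self.person = person or []

    def export(self, outfile, level, name_="abc:people", namespacedef_=""):
        indent = "    " * level
        # The namespace definition, if any, is written into the start tag.
        spacer = " " if namespacedef_ else ""
        outfile.write("%s<%s%s%s>\n" % (indent, name_, spacer, namespacedef_))
        for name in self.person:
            outfile.write("%s    <abc:person>%s</abc:person>\n" % (indent, name))
        outfile.write("%s</%s>\n" % (indent, name_))


group = people(["Alberta"])
group.export(sys.stdout, 0,
             namespacedef_='xmlns:abc="http://www.abc.com/namespace"')
```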
The simplest use is to call one of the parsing functions in the generated source file. You may be able to use one of these functions without change, or can modify one to fit your needs. generateDS.py generates the following parsing functions:
These parsing functions are generated in both the superclass and the subclass files. Note the call to the export method. You may need to comment out or un-comment this call to export according to your needs.
For example, if the generated source is in people.py, then, from the command line, run something like the following:
python people.py people.xml
Or, from within other Python code, use something like the following:
import people rootObject = people.parse('people.xml')
It might be that the generated module, when parsing an XML instance document, does not, by default, recognize the top level (root) element in an instance document. This might happen because generateDS.py does not detect the correct top level element from the XML schema or because you need to use the generated module to parse instance documents that have different top level elements. If this is the case, you might pick and use one of the following strategies:
In your schema, move the definition of the element type that defines the top level element in your instance documents to the top of the schema. By default, generateDS.py uses the first definition in the schema as the root element when constructing the generated parse function.
Use the "--root-element" command line option to specify the top level element. But, be aware that this only works if the tag name and type name of the top level element are the same.
Modify the parse function in your generated module, replacing the class whose factory is called and the tag name passed in to the export method. For example, change:
    def parse(inFileName):
        doc = minidom.parse(inFileName)
        rootNode = doc.documentElement
        rootObj = type1.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_="type1", namespacedef_='')
        return rootObj
to:
    def parse(inFileName):
        doc = minidom.parse(inFileName)
        rootNode = doc.documentElement
        rootObj = type2.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_="type2", namespacedef_='')
        return rootObj
Notice that we've changed the two occurrences of "type1" to "type2".
Using the generated parse function as a model, create a separate module that imports your generated module. In the parse function in your module, make a change similar to that suggested above. And, of course, add any additional code needed by your application.
Write a separate module containing your own parse function that inspects the top level element of an input XML instance document and automatically determines which generated class should be used to parse it. Here is an example:
    #!/usr/bin/env python

    import sys
    from optparse import OptionParser
    from xml.dom import minidom
    import mygeneratedmodule as gendsmod

    def get_root_tag(node):
        tag = node.tagName
        tags = tag.split(':')
        if len(tags) > 1:
            tag = tags[-1]
        rootClass = None
        if hasattr(gendsmod, tag):
            rootClass = getattr(gendsmod, tag)
        return tag, rootClass

    def parse(inFilename, options):
        doc = minidom.parse(inFilename)
        rootNode = doc.documentElement
        rootTag, rootClass = get_root_tag(rootNode)
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_=rootTag, namespacedef_='')
        doc = None
        return rootObj

    USAGE_TEXT = """
        python %prog [options] <somefile.xml>"""

    def usage(parser):
        parser.print_help()
        sys.exit(1)

    def main():
        parser = OptionParser(USAGE_TEXT)
        (options, args) = parser.parse_args()
        if len(args) == 1:
            infilename = args[0]
            parse(infilename, options)
        else:
            usage(parser)

    if __name__ == "__main__":
        main()
Notice the call to get_root_tag, which attempts to recognize the top level tag in the input XML document so that the parse function can parse and export it.
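The lookup step on its own can be tried without a generated module. The following is a minimal sketch, written for Python 3, in which `gendsmod` is a hypothetical stand-in namespace holding two empty classes rather than a real generated module, and `get_root_tag` takes the module as a second argument (an assumption made here for illustration). It shows how a namespace prefix is stripped and how an unrecognized root tag yields None, which a caller would need to check before calling factory().

```python
from types import SimpleNamespace
from xml.dom import minidom


class people(object):
    """Hypothetical stand-in for a generated class."""


class person(object):
    """Hypothetical stand-in for a generated class."""


# Stand-in for "import mygeneratedmodule as gendsmod".
gendsmod = SimpleNamespace(people=people, person=person)


def get_root_tag(node, module):
    # Drop any namespace prefix, e.g. "abc:people" -> "people".
    tag = node.tagName.split(':')[-1]
    # Return None for the class when the module has no matching name.
    return tag, getattr(module, tag, None)


doc = minidom.parseString(
    '<abc:people xmlns:abc="http://www.abc.com/namespace"/>')
root_tag, root_class = get_root_tag(doc.documentElement, gendsmod)

unknown_doc = minidom.parseString('<inventory/>')
unknown_tag, unknown_class = get_root_tag(unknown_doc.documentElement, gendsmod)
```

Here root_class is the people stand-in, while unknown_class is None because the stand-in module defines no class named inventory.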
The generated classes contain methods export and exportLiteral which can be called to export classes to several text formats, in particular to an XML instance document and a Python module containing Python literals. See the generated parse functions for examples showing how to call the export methods.
The export method in generated classes writes out an XML document that represents the instance that contains it and its child elements. So, for example, if your instance tree was created by one of the parsing functions described above, then calling export on the root element should reproduce the input XML document, differing only with respect to ignorable white space.
generateDS.py generates Python classes that represent the elements in an XML document, given an Xschema definition of the XML document type. The exportLiteral method will export a Python literal representation of the Python instances of the classes that represent an XML document.
When generateDS.py generates the Python source code for your classes, it also generates an exportLiteral method in each class. If you call this method on the root (top-most) object, it will write out a literal representation of your class instances as Python code.
generateDS.py also generates a function at top level (parseLiteral) that parses an XML document and calls the "exportLiteral" method on the root object to write the data structure (instances of your generated classes) as a Python module that you can import to (re-)create instances of the classes that represent your XML document.
generateDS.py was designed and built with the assumption that we are not interested in marking up text content at all. What we really want is a way to represent structured and nested data in text. It takes the statement, "I want to represent nested data structures in text.", entirely seriously. Given that assumption, there may be times when you want a more "Pythonic" textual representation of the Python data structures for which generateDS.py has generated code. exportLiteral enables you to produce that representation.
This feature means that the classes that you generate from an XML schema support the interchangeability of XML and Python literals. This means that, given classes generated by generateDS.py for your XML document type, you can perform the following transformations:
This capability enables you to:
See the generated function parseLiteral for an example of how to use exportLiteral.
If you have an instance of a minidom node that represents an element in an XML document, you can also use the 'build' member function to populate an instance of the corresponding class. Here is an example:
    from xml.dom import minidom
    from xml.dom import Node

    doc = minidom.parse(inFileName)
    rootNode = doc.childNodes[0]
    people = []
    for child in rootNode.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'person':
            obj = person()
            obj.build(child)
            people.append(obj)
If you choose to use the generated subclass module, and I encourage you to do so, you may need to edit and modify that file. Here are some of the things that you must do (look for "???"):
You can also (and most likely will want to) add methods to the generated classes. See the section How to Modify the Generated Code for more on this.
The classes generated from each element definition provide getter and setter methods to access its attributes and child elements.
Elements that are referenced but not defined (i.e. that are simple, for example strings, integers, floats, and booleans) are accessed through getter and setter methods in the class in which they are referenced.
Element definitions that contain attributes but no nested child elements provide access to their data content through getter and setter methods getValueOf_ and setValueOf_ and member variable valueOf_.
The goal of generateDS.py is to support data structures represented in XML as opposed to text mark-up. However, it does provide some support for mixed content. But, for mixed content, the data structures and code generated by generateDS.py are fundamentally different from those for elements that do not contain mixed content.
There are limitations, of course. A known limitation is related to extension elements. Specifically, if an element contains mixed content, and this element extends a base class, then the base class and any classes it extends must be defined to contain mixed content. This is due to the fact that generateDS.py generates a data structure (class) for elements containing mixed content that is fundamentally different from that generated for other elements.
Here is an example of mixed content:
<note>This is a <bold>nice</bold> comment.</note>
When an element is defined with something like the following:
    <xs:complexType mixed="true">
        <xs:sequence>
            o
            o
            o
then, instead of generating a class whose named members refer to nested elements, a class containing a list of instances of class MixedContainer is generated. In order to process the content of a mixed content element, the code you write will need to walk this list of instances of MixedContainer and check the type of each item in that list. Basically, the structure becomes more DOM-like in the sense that it has a list of children, rather than named fields.
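To see why mixed content calls for an ordered list of children rather than named fields, the following sketch walks the sample note element with minidom from the standard library. It does not use the generated MixedContainer class; the `items` list of (kind, value) tuples is only an illustration of the same list-of-children shape, in which text and elements are interleaved and order matters.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<note>This is a <bold>nice</bold> comment.</note>')

# Collect the children of <note> in document order, tagging each
# one as either character data or a nested element.
items = []
for child in doc.documentElement.childNodes:
    if child.nodeType == Node.TEXT_NODE:
        items.append(('text', child.data))
    elif child.nodeType == Node.ELEMENT_NODE:
        items.append(('element', child.tagName))
```

Code that processes a mixed content element has the same form: loop over the list and branch on the kind of each item.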
Instances of MixedContainer have the following methods:
Note that elements defined with attributes but with no nested sub-elements do not need to be declared as "mixed". For these elements, character data is captured in a member variable valueOf_, and can be accessed with member methods getValueOf_ and setValueOf_.
For elements that specify anyAttributes, generateDS.py produces a class containing the following:
Note: Attributes that are explicitly defined for an element are not stored in the dictionary anyAttributes_.
generateDS.py ignores the processContents attribute on the anyAttribute element in the XML Schema.
generateDS.py provides a mechanism that enables you to attach user defined methods to specific generated classes. In order to do so, create a Python module containing specifications of those methods and indicate that module on the command line with the "--user-methods" option. Example:
python generateDS.py -f --super=people_sup -o people_sup.py -s people_sub.py --user-methods=gends_user_methods people.xsd
The module named with this flag must be located where generateDS.py can import it. You might need to add the directory containing your user methods module to the PYTHONPATH environment variable.
The module specified with the "--user-methods" flag should define a variable METHOD_SPECS which contains a list of instances of a class that implements methods match_name and get_interpolated_source.
See file gends_user_methods.py for an example of this specification file and the definition of class MethodSpec. Read the comments in that file for more guidance.
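As a rough illustration of the interface described above, here is a minimal sketch of a specification module. The class name `SimpleMethodSpec`, its constructor arguments, and the `show_class` method are hypothetical and are not the MethodSpec class shipped in gends_user_methods.py; only the two method names match_name and get_interpolated_source, the METHOD_SPECS variable, and the "%(class_name)s" placeholder come from this document. The argument names of the two methods are also assumptions.

```python
import re


class SimpleMethodSpec(object):
    """Hypothetical minimal spec class; not the distributed MethodSpec."""

    def __init__(self, name, source, class_names):
        self.name = name
        self.source = source
        # class_names is a regular expression selecting target classes.
        self.class_names_compiled = re.compile(class_names)

    def match_name(self, class_name):
        # True if the method should be attached to this generated class.
        return self.class_names_compiled.search(class_name) is not None

    def get_interpolated_source(self, values_dict):
        # Fill placeholders such as %(class_name)s in the method source.
        return self.source % values_dict


show_spec = SimpleMethodSpec(
    name='show_class',
    source='''\
    def show_class(self):
        return 'instance of %(class_name)s'
''',
    class_names=r'^person$|^people$',
)

# The variable that generateDS.py looks for in the module.
METHOD_SPECS = [show_spec]

person_source = show_spec.get_interpolated_source({'class_name': 'person'})
```

For real use, start from the MethodSpec class in gends_user_methods.py, since that is the definition generateDS.py is known to work with.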
The member_data_items_ class variable -- User methods, especially those attached to more than one class, are likely to need a list of the members in the current class. Each generated class has a class variable containing a list of specifications of the members in the class. Each item in this list is an instance of class MemberSpec_, which is defined near the top of your generated (super-class) file. Use the following to access the information in each member specification:
m.get_name() -- Returns the name of the member variable (a string).
m.get_data_type() -- Returns the data type of the member variable (a string). If the data type is a list, returns the terminal type, which is the last string in the list. (Also see get_data_type_chain().)
m.get_data_type_chain() -- Returns the data type of the member variable (a string or list). When the data type is a simpleType that has another simpleType as its base or is a complexType that extends a simpleType, then the data type is a list of strings, for example:
['RelationType', 'xs:string']
The last string in the list is the terminal type, usually a built-in simple type. Note that m.get_data_type() returns the terminal (last) type.
m.get_container() -- (an integer) Indicates whether the member variable is a single item or a list/container (i.e. generated from maxOccurs > 1): 0 indicates a single item; 1 indicates a list.
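The accessors above can be exercised with a small sketch. The `MemberSpec_` class below is a hypothetical stand-in written for this example (the real one is defined near the top of your generated superclass file and may differ in its constructor), and the `person` class with its three members and the `describe_members` helper are invented for illustration.

```python
class MemberSpec_(object):
    """Hypothetical stand-in for the generated MemberSpec_ class."""

    def __init__(self, name='', data_type='', container=0):
        self.name = name
        self.data_type = data_type
        self.container = container

    def get_name(self):
        return self.name

    def get_data_type_chain(self):
        # A string, or a list of strings ending in the terminal type.
        return self.data_type

    def get_data_type(self):
        # For a chain, report only the terminal (last) type.
        if isinstance(self.data_type, list):
            return self.data_type[-1]
        return self.data_type

    def get_container(self):
        return self.container


class person(object):
    """Hypothetical generated class with three members."""
    member_data_items_ = [
        MemberSpec_('name', 'xs:string', 0),
        MemberSpec_('relation', ['RelationType', 'xs:string'], 0),
        MemberSpec_('interest', 'xs:string', 1),
    ]


def describe_members(cls):
    """Summarize each member as 'name: terminal type (single|list)'."""
    lines = []
    for m in cls.member_data_items_:
        kind = 'list' if m.get_container() else 'single'
        lines.append('%s: %s (%s)' % (m.get_name(), m.get_data_type(), kind))
    return lines


summary = describe_members(person)
```

A user method attached to several classes can use the same loop, substituting "%(class_name)s" for the hard-coded class.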
There are a number of things of interest in this sample file (gends_user_methods.py):
Although the MethodSpec class must be included in your user methods specification module, you can modify this class. For example, for special situations, it might be useful to modify either of the methods MethodSpec.match_name or MethodSpec.get_interpolated_source. These methods are called by generateDS.py. See comments on the definitions of these methods in gends_user_methods.py.
A method set_up is attached to the root class. (This user method specification module is intended to be used with people.xsd/people.xml in the Demos/People directory.) It performs initialization, before the walk method is called. In particular, set_up initializes a counter and imports the types module (which saves us from having to modify the generated code).
The walk_and_update and walk_and_show methods provide an example showing how to walk the entire document object tree.
The method walk_and_update uses the member_data_items_ class variable to obtain a list of members of the class. It's a list of instances of class MemberSpec_, which support the m.get_name(), m.get_data_type(), and m.get_container() methods described above.
In method walk_and_show, note the use of getattr to retrieve the value of a member variable and the use of setattr to set the value of a member variable.
The expression "%(class_name)s" is used to insert the class name into the generated source code.
Notice how the types module is used to determine whether a member variable contains a simple type or an instance of a class. Example:
    obj1 = getattr(self, member[0])
    if type(obj1) == types.InstanceType:
        ...
In string formatting operations, you will need to use double percent signs in order to "pass through" a single percent sign, for example:
print '%%d. class: %(class_name)s depth: %%d' %% (counter, depth, )
where the single percent signs are interpolated ("%(class_name)s" is replaced by the class name), and double percent signs are replaced by single percent signs ("%%d" becomes "%d").
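The two-stage substitution can be checked directly with Python's own string formatting. In this sketch the dictionary with the key class_name and the value 'person' is an assumption standing in for whatever values generateDS.py supplies; the template line is written with a print call so that the result is also valid in current Python.

```python
# Source text as it would appear inside a user method specification.
TEMPLATE = "print('%%d. class: %(class_name)s depth: %%d' %% (counter, depth, ))"

# Stage 1: interpolation when the code is generated. The class name is
# filled in and each doubled percent sign collapses to a single one.
generated_line = TEMPLATE % {'class_name': 'person'}

# Stage 2: the generated format string is applied when the method runs.
counter, depth = 3, 2
message = '%d. class: person depth: %d' % (counter, depth, )
```

After stage 1, generated_line is an ordinary formatting statement whose remaining "%d" fields are filled in only when the generated method is called.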
Suggestion -- How to begin:
generateDS.py generates calls to several methods that each have a default implementation in a superclass. The default superclass with default implementations is included in the generated code. The user can replace this default superclass by implementing a module named generatedssuper.py containing a class named GeneratedsSuper.
What to look for in the generated code:
Currently the following methods are implemented:
    class GeneratedsSuper:
        def format_string(self, input_data, input_name=''):
            return input_data
        def format_integer(self, input_data, input_name=''):
            return '%d' % input_data
        def format_float(self, input_data, input_name=''):
            return '%f' % input_data
        def format_double(self, input_data, input_name=''):
            return '%e' % input_data
        def format_boolean(self, input_data, input_name=''):
            return '%s' % input_data
Caution: Overriding any of the format_xxxx() methods enables you to export invalid XML. So, use at your own risk, test before using, etc.
How to modify the behavior of the default methods:
Where to put (implement) methods that override the default methods -- You can place the implementations of methods that override the default methods in the following places:
In a class named GeneratedsSuper in a separate module named generatedssuper. Since this class would replace the default implementations, you should provide implementations of all the default methods listed above in that class. The distribution contains a generatedssuper.py which you can modify for your specific needs.
In individual generated (super) classes (the ones generated with the "-o" command line option) using the User Methods feature.
In individual classes in a subclass module generated with the "-s" command line option.
If you want to use the same method in more than one generated subclass, then you might consider putting that method in a "mix-in" class and inheriting that method in the generated subclass. With this approach, you must put the mix-in class containing your methods before the regular superclass, so that Python will find your custom methods before the default ones. That is, you must use:
class clientSub(MySpecialMethods, supermod.client):
not:
class clientSub(supermod.client, MySpecialMethods):
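The effect of the base class order can be demonstrated with stand-in classes. In this sketch `client` plays the role of the generated superclass supermod.client, `GeneratedsSuper` carries only one default method, and `clientSubWrongOrder` is a hypothetical name added to show the ordering that does not work.

```python
class GeneratedsSuper(object):
    """Stand-in for the default superclass with one default method."""
    def format_string(self, input_data, input_name=''):
        return input_data


class client(GeneratedsSuper):
    """Stand-in for a generated superclass such as supermod.client."""


class MySpecialMethods(object):
    """Mix-in holding a custom override shared by several subclasses."""
    def format_string(self, input_data, input_name=''):
        return '[[%s]]' % input_data


class clientSub(MySpecialMethods, client):
    """Mix-in first: the custom method is found before the default."""


class clientSubWrongOrder(client, MySpecialMethods):
    """Mix-in last: the default inherited through client is found first."""


good = clientSub().format_string('Alice')
bad = clientSubWrongOrder().format_string('Alice')
```

Python searches base classes left to right, following each one up to its own superclasses, so in the second class the default format_string reached through client shadows the mix-in.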
If you choose to implement module generatedssuper, here are a few suggestions:
Implement a module generatedssuper.py containing a definition of a class GeneratedsSuper. Note that a default version of this module is included in the distribution.
Put this module in a location where it can be imported when your generated code is run.
An easy way to begin is to copy the default definition of the class GeneratedsSuper from a module generated with the "-o" command line option into a module named generatedssuper.py. Then modify your (copied) implementation. Or, use the default module included in the distribution.
To implement a method that does a task specific to a particular class or a particular member of a class, do something like the following:
    def format_string(self, input_data, input_name=''):
        if self.__class__.__name__ == 'person':
            return '[[%s]]' % input_data
        else:
            return input_data
or:
    def format_string(self, input_data, input_name=''):
        if self.__class__.__name__ == 'booster' and input_name == 'lastname':
            return '[[%s]]' % input_data
        else:
            return input_data
Alternatively, to attach a method to a specific class, use the User Methods or a generated subclass module (command line option "-s"), as described above.
This section attempts to explain how to modify and add features to the generated code.
You can add new member definitions to a generated class. Look at the 'export' and 'exportLiteral' member functions for examples of how to access member variables and how to walk nested sub-elements.
Here are interesting places to look in each class definition:
And, if you need methods that are common to and shared by several of the generated subclasses, you can put them in a new class and add that class to the superclass list for each of your subclasses.
Although you can add your own methods to the generated superclasses, I recommend that you add methods to the generated subclasses in the subclass module generated with the "-s" command line option, and then edit the subclass module in order to build your application. Why?
Here are some alternatives to using the subclass file:
Under the directory Demos are several examples:
Suggested uses:
The following extension employs a user method (see User Methods) in order to capture elements defined as xs:date as date objects.
Thanks to Lars Ericson for this code and explanation.
By default, generateDS.py treats elements declared as type xs:date as though they are strings.
To get xs:dates stored as dates, in your local copy, add the following user method (User Methods), a slight modification of the sample (in gends_user_methods.py):
    method1 = MethodSpec(name='walk_and_update',
        source='''\
        def walk_and_update(self):
            members = %(class_name)s.member_data_items_
            for member in members:
                obj1 = getattr(self, member.get_name())
                if member.get_data_type() == 'xs:date':
                    newvalue = date_calcs.date_from_string(obj1)
                    setattr(self, member.get_name(), newvalue)
                elif member.get_container():
                    for child in obj1:
                        if type(child) == types.InstanceType:
                            child.walk_and_update()
                else:
                    obj1 = getattr(self, member.get_name())
                    if type(obj1) == types.InstanceType:
                        obj1.walk_and_update()
    ''',
        class_names=r'^.*$',
        )
Then, define date_calcs.py as:
    #!/usr/bin/env python
    # -*- mode: pymode; coding: latin1; -*-
    import datetime

    # 2007-09-01
    # test="2007-09-01"
    # print test
    # print date_from_string(test)
    def date_from_string(str):
        year = int(str[:4])
        month = int(str[5:7])
        day = int(str[8:10])
        dt = datetime.date(year, month, day)
        return dt
And, add a "str" here in generateDS.py:
    def quote_xml(inStr):
        s1 = str(inStr)
        s1 = s1.replace('&', '&amp;')
        s1 = s1.replace('<', '&lt;')
        s1 = s1.replace('"', '&quot;')
        return s1
Also, add these imports to TEMPLATE_HEADER in generateDS.py:
    import date_calcs
    import types
There are things in Xschema that are not supported. You will have to use a restricted sub-set of Xschema to define your data structures. See above for supported features. See people.xsd and people.xml for examples.
And, then, try it on your XML Schema, and let me know about what does not work.
Warning -- This section describes an optional generated SAX parser which, I believe, is currently broken for all but the simplest schemas. Generation of a SAX parser has not been updated for the latest changes to generateDS.py. In particular, when names of elements are reused (in different parent elements), the SAX parser becomes confused. Until I've been able to figure out how to fix this, you are advised not to use the SAX parser.
generateDS.py generates two kinds of parsers: one kind is based on SAX and the other is built on minidom. See the generated functions saxParse(), parse(), and parseString(). Using the SAX parser instead of the minidom parser should reduce memory requirements for large documents, since the minidom parser, but not the SAX parser, constructs a DOM tree for the entire document in memory.
However, both styles of parsers construct instances of the data structures generated by generateDS.py. This means that, even when the SAX parser is used, generateDS.py may not be well-suited for applications that read large XML documents, although what "large" means depends on your hardware. Notice that the minidom parsing functions (parse() and parseString()) over-write the variable doc so as to enable Python to reclaim the space occupied by the DOM tree, which may help alleviate the memory problem to some extent when the minidom parser is used.
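The difference between the two parsing styles can be seen with the standard library alone. This sketch does not use the generated saxParse or parse functions; the `PersonCounter` handler and the tiny `XML_DATA` document are invented for illustration. The SAX handler sees one event per element and keeps only a counter, while minidom holds the whole tree until the reference to it is dropped.

```python
import xml.sax
from xml.dom import minidom

XML_DATA = b'<people><person/><person/><person/></people>'


class PersonCounter(xml.sax.ContentHandler):
    """Counts person elements as events arrive; no tree is built."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        if name == 'person':
            self.count += 1


handler = PersonCounter()
xml.sax.parseString(XML_DATA, handler)
sax_count = handler.count

# minidom builds the complete DOM tree in memory before it returns.
doc = minidom.parseString(XML_DATA)
dom_count = len(doc.getElementsByTagName('person'))
# Drop the reference so Python can reclaim the DOM tree.
doc = None
```

Both approaches find the same three elements; the memory saving from SAX only matters when the objects built from the events are themselves smaller than the DOM tree.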
While generateDS.py itself does not process XML Schema include elements, the distribution provides a script process_includes.py that can be used as a preprocessor. This script scans your XML Schema document and, recursively, any documents that it includes, looking for include elements; it merges all the content into a single document, which it writes out.
Since process_includes.py uses the ElementTree API, in order to use process_includes.py you will need one of the following:
Here are samples of how you might use process_includes.py, if your schema contains include elements.
Example 1:
    $ python process_includes.py definitions1.xsd | \
      python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py -
Example 2:
    $ python process_includes.py definitions1.xsd tmp.xsd
    $ python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py tmp.xsd
For help and usage information, run the following:
$ python process_includes.py --help
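The idea behind this kind of preprocessing, replacing each include element with the content of the schema it names, can be sketched with ElementTree. This is not the code of process_includes.py: the function `inline_includes`, the `load_schema` callback, and the in-memory `SCHEMAS` dictionary (used here in place of files on disk) are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

XS = 'http://www.w3.org/2001/XMLSchema'
INCLUDE_TAG = '{%s}include' % XS

# Two tiny schemas held in memory; main.xsd includes types.xsd.
SCHEMAS = {
    'main.xsd': (
        '<xs:schema xmlns:xs="%s">'
        '<xs:include schemaLocation="types.xsd"/>'
        '<xs:element name="people"/>'
        '</xs:schema>' % XS),
    'types.xsd': (
        '<xs:schema xmlns:xs="%s">'
        '<xs:element name="person"/>'
        '</xs:schema>' % XS),
}


def load_schema(location):
    """Return the root element of the schema stored under location."""
    return ET.fromstring(SCHEMAS[location])


def inline_includes(root, load, seen=None):
    """Replace each xs:include child of root with the included content."""
    seen = set() if seen is None else seen
    new_children = []
    for child in list(root):
        if child.tag == INCLUDE_TAG:
            location = child.get('schemaLocation')
            # Skip a schema that has already been merged.
            if location not in seen:
                seen.add(location)
                included = inline_includes(load(location), load, seen)
                new_children.extend(list(included))
        else:
            new_children.append(child)
    root[:] = new_children
    return root


merged = inline_includes(load_schema('main.xsd'), load_schema)
element_names = [el.get('name') for el in merged]
```

The merged tree contains the definitions from both schemas and no include elements, which is the form generateDS.py expects as input.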
Many thanks to those who have used generateDS.py and have contributed their comments and suggestions. These comments have been valuable both in teaching me about things I needed to know in order to continue work and in motivating me to do the work in the first place.
And, a special thanks to those of you who have contributed patches for fixes and new features. Recent help has been provided by the following among others:
Python: The Python home page.
Dave's Page: My home page, which contains more Python stuff.