Author: Dave Kuhlman
Address: dkuhlman@rexx.com, http://www.rexx.com/~dkuhlman
revision: 1.19a
date: October 22, 2009
copyright: Copyright (c) 2004 Dave Kuhlman. This documentation and the software it describes is covered by The MIT License: http://www.opensource.org/licenses/mit-license.
abstract: This document is an introduction and tutorial to the use of generateDS.py which generates Python data structures (for example, class definitions) from an XML schema document. These data structures represent the elements in an XML document described by the XML schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.
Note: If you plan to work through this tutorial, you may find it useful to get the distribution file for this tutorial. It contains the sample code and files discussed below. You can find it in the generateDS.py distribution file at generateds_tutorial.zip or at http://www.rexx.com/~dkuhlman/generateds_tutorial.zip
generateDS.py generates Python data structures (for example, class definitions) from an XML schema document. These data structures represent the elements in an XML document described by the XML schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.
The generated Python code contains:
Each generated class contains the following:
The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file. The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
This document introduces the user to generateDS.py and walks the user through several examples that show how to generate Python code and how to use that generated code.
Use the following to get help:
$ generateDS.py --help
I'll assume that generateDS.py is in a directory on your path. If not, you should do whatever is necessary to make it accessible and executable.
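If you are not sure whether generateDS.py is reachable, you can check from Python. This is a minimal sketch using only the standard library (shutil.which, available in Python 3.3 and later); the helper name find_generateds is hypothetical and is not part of generateDS.py.

```python
import shutil


def find_generateds(name="generateDS.py"):
    """Return the full path to `name` if it is on PATH, else None.

    Hypothetical helper for illustration; it is not part of generateDS.py.
    """
    # shutil.which() searches the directories listed in PATH and, by
    # default, only reports files that exist and are executable.
    return shutil.which(name)


if __name__ == "__main__":
    location = find_generateds()
    if location is None:
        print("generateDS.py is not on your PATH (or is not executable).")
    else:
        print("generateDS.py found at " + location)
```

If this reports nothing, add the directory containing generateDS.py to PATH or make the script executable.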
Here is a simple XML schema document:
And, here is how you might generate classes and subclasses that provide data bindings (a Python API) for the definitions in that schema:
$ generateDS.py -o people_api.py -s people_sub.py people.xsd
And, if you want to automatically over-write the generated Python files, use the -f command line flag to force over-write without asking:
$ generateDS.py -f -o people_api.py -s people_sub.py people.xsd
And, to hard-wire the sub-class file so that it imports the API module, use the --super command line option. Example:
$ generateDS.py -o people_api.py people.xsd
$ generateDS.py -s people_appl1.py --super=people_api people.xsd
Or, do both at the same time with the following:
$ generateDS.py -o people_api.py -s people_appl1.py --super=people_api people.xsd
And, for your second application:
$ generateDS.py -s people_appl2.py --super=people_api people.xsd
If you take a look inside these two "application" files, you will see an import statement like the following:
import people_api
If you had not used the --super command line option when generating the "application" files, then you could modify that statement yourself.
Now that you have generated code for your data model, you can test it by running it as an application. Suppose that you have an XML instance document people1.xml that satisfies your schema. Then you can parse that instance document and export it (print it out) with something like the following:
$ python people_api.py people1.xml
And, if you have used the --super command line option, as I have above, to connect your sub-class file with the super-class (API) file, then you could use the following to do the same thing:
$ python people_appl1.py people1.xml
Why does this work? Why can we run the generated code as a Python script? If you look at the generated code, down near the end of the file you'll find a main() function that calls a function named parse(). The parse function does the following:
Except for some indentation (ignorable whitespace), this exported XML should be the same as the original XML document. So, that gives you a reasonably thorough test of your generated code.
And, that parse() function gives you a hint of how you might build your own application-specific code that uses the generated API (those generated Python classes).
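The parse functions in the generated sub-class file shown later in this tutorial follow a simple pattern: parse with minidom, take the document element, create a root object with factory(), call build() on it, and then export() it. The sketch below reproduces that round trip in a self-contained way. The person class is a hypothetical stand-in, far simpler than real generated code, and its export format only approximates what generateDS.py writes.

```python
import io
import sys
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


class person(object):
    """Hypothetical, much simplified stand-in for a generated class."""

    def __init__(self, id=None, name=None):
        self.id = id
        self.name = name

    @staticmethod
    def factory(*args_, **kwargs_):
        return person(*args_, **kwargs_)

    def build(self, node):
        # Copy the "id" attribute and the text of the <name> child.
        self.id = node.getAttribute('id') or None
        for child in node.childNodes:
            if (child.nodeType == child.ELEMENT_NODE and
                    child.tagName == 'name'):
                self.name = ''.join(
                    text.data for text in child.childNodes
                    if text.nodeType == text.TEXT_NODE)

    def export(self, outfile, level, name_='person'):
        indent = '    ' * level
        outfile.write('%s<%s id=%s>\n' % (indent, name_, quoteattr(self.id or '')))
        outfile.write('%s    <name>%s</name>\n' % (indent, escape(self.name or '')))
        outfile.write('%s</%s>\n' % (indent, name_))


def parse_string(in_string, outfile=None):
    # Same steps as the generated parse functions:
    # DOM -> root node -> factory() -> build() -> export().
    if outfile is None:
        outfile = sys.stdout
    doc = minidom.parseString(in_string)
    root_obj = person.factory()
    root_obj.build(doc.documentElement)
    doc = None  # let Python reclaim the space used by the DOM
    outfile.write('<?xml version="1.0" ?>\n')
    root_obj.export(outfile, 0)
    return root_obj


buf = io.StringIO()
parse_string('<person id="7"><name>albert</name></person>', buf)
print(buf.getvalue())
```

Because export() takes any file-like object, writing to an io.StringIO buffer makes it easy to compare the exported XML with the input.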
You may want to merely skim this section for now, then refer back to it later when some of these options are used in this tutorial. Also, remember that you can get information about more command line options used by generateDS.py by typing:
$ python generateDS.py --help
and by reading the document http://www.rexx.com/~dkuhlman/generateDS.html
Use this option to tell generateDS.py which of the elements defined in your XML schema is the "root" element. The root element is the outer-most (top-level) element in XML instance documents defined by this schema. In effect, this tells your generated modules which element to use as the root element when parsing and exporting documents.
generateDS.py attempts to guess the root element, usually the first element defined in your XML schema. Use this option when that default is not what you want.
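To illustrate the "first element defined" heuristic, here is a minimal sketch that uses xml.etree.ElementTree to pick the first top-level xs:element in a schema. It is only an approximation under that assumption; the rule generateDS.py actually applies may take more into account. The function name guess_root_element is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def guess_root_element(schema_text):
    """Return the name of the first top-level xs:element, or None.

    Simplified illustration; generateDS.py's own logic may differ.
    """
    schema = ET.fromstring(schema_text)
    # Only direct children of <xs:schema> are global (top-level) elements.
    for child in schema:
        if child.tag == XS + "element":
            return child.get("name")
    return None


SAMPLE = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact-list" type="contactlistType" />
  <xs:complexType name="contactlistType">
    <xs:sequence>
      <xs:element name="description" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

print(guess_root_element(SAMPLE))  # contact-list
```

Nested xs:element declarations (such as description above) are ignored because only direct children of xs:schema are examined.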
generateDS.py generates Python code which, with no modification, will parse and then export an XML document defined by your schema. However, you are likely to want to go beyond that. In many situations you will want to construct an application around processing your XML documents or to use the generated code from your application.
One strategy is to generate a sub-class file and to add your application-specific code to that. Generate the sub-class file with the "-s" command line flag:
$ generateDS.py -s myapp.py sample.xsd
Now add some application-specific code to myapp.py, for example:
class peopleSub(supermod.people):
    def __init__(self, comments=None, person=None, programmer=None, python_programmer=None, java_programmer=None):
        supermod.people.__init__(self, comments, person, programmer, python_programmer, java_programmer)
    def fancyexport(self, outfile):
        outfile.write('Starting fancy export')
        for person in self.get_person():
            person.fancyexport(outfile)
supermod.people.subclass = peopleSub
# end class peopleSub


class personSub(supermod.person):
    def __init__(self, vegetable=None, fruit=None, ratio=None, id=None, value=None, name=None, interest=None, category=None, agent=None, promoter=None, description=None):
        supermod.person.__init__(self, vegetable, fruit, ratio, id, value, name, interest, category, agent, promoter, description)
    def fancyexport(self, outfile):
        outfile.write('Fancy person export')
supermod.person.subclass = personSub
# end class personSub
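The line supermod.people.subclass = peopleSub is what lets the parser create instances of your sub-classes instead of the superclasses. The self-contained sketch below shows the idea with hypothetical stand-in classes. It assumes, as the generated parse() functions in the member-specs example suggest, that objects are created through a static factory() method that checks a subclass class attribute; it is a simplification, not the generated code itself.

```python
import sys


class people(object):
    """Hypothetical, greatly simplified stand-in for a generated superclass."""
    subclass = None  # the sub-class module assigns its own class here

    def __init__(self, person=None):
        self.person = person if person is not None else []

    @staticmethod
    def factory(*args_, **kwargs_):
        # Build the registered sub-class if there is one, else this class.
        if people.subclass:
            return people.subclass(*args_, **kwargs_)
        return people(*args_, **kwargs_)

    def get_person(self):
        return self.person


class peopleSub(people):
    """Stand-in for the matching class in the sub-class file."""

    def fancyexport(self, outfile):
        outfile.write('Starting fancy export\n')


# The same kind of assignment the sub-class file makes for each class.
people.subclass = peopleSub

obj = people.factory()      # what the parser calls for each element
print(type(obj).__name__)   # peopleSub
obj.fancyexport(sys.stdout)
```

Because the hook is a plain class attribute, whichever sub-class module is imported last decides which classes the parser instantiates.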
In this approach you might do things like the following:
Get to know the generated export API by inspecting the generated code in the super-class file. That's the file generated with the "-o" command line flag.
What to look for:
Now, you can import your generated API module, and use it to construct and manipulate objects. Here is an example using code generated with the "people" schema:
import sys
import tmp9sup as api

def test(names):
    people = api.people()
    for count, name in enumerate(names):
        id = '%d' % (count + 1, )
        person = api.person(name=name, id=id)
        people.add_person(person)
    people.export(sys.stdout, 0)

test(['albert', 'betsy', 'charlie'])
Run this and you might see something like the following:
$ python tmp.py
<people >
    <person id="1">
        <name>albert</name>
    </person>
    <person id="2">
        <name>betsy</name>
    </person>
    <person id="3">
        <name>charlie</name>
    </person>
</people>
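If you want to experiment with this construct-then-export style without generating code first, the sketch below imitates the small part of the generated API used above: constructor keyword arguments, add_person(), get_person(), get_name()/set_name(), and export(outfile, level). The classes and the export_to_string and build_people helpers are hypothetical stand-ins, and their output only approximates what generateDS.py writes.

```python
import io
import sys
from xml.sax.saxutils import escape, quoteattr


class person(object):
    """Hypothetical stand-in for a generated "person" class."""

    def __init__(self, id=None, name=None):
        self.id = id
        self.name = name

    def get_name(self):
        return self.name

    def set_name(self, name):
        self.name = name

    def export(self, outfile, level):
        indent = '    ' * level
        outfile.write('%s<person id=%s>\n' % (indent, quoteattr(self.id or '')))
        outfile.write('%s    <name>%s</name>\n' % (indent, escape(self.name or '')))
        outfile.write('%s</person>\n' % indent)


class people(object):
    """Hypothetical stand-in for a generated "people" class."""

    def __init__(self, person=None):
        self.person = person if person is not None else []

    def get_person(self):
        return self.person

    def add_person(self, value):
        self.person.append(value)

    def export(self, outfile, level):
        indent = '    ' * level
        outfile.write('%s<people>\n' % indent)
        for child in self.person:
            child.export(outfile, level + 1)  # children are nested one level deeper
        outfile.write('%s</people>\n' % indent)


def export_to_string(obj):
    # Export into an in-memory buffer instead of sys.stdout.
    buf = io.StringIO()
    obj.export(buf, 0)
    return buf.getvalue()


def build_people(names):
    container = people()
    for count, name in enumerate(names):
        container.add_person(person(name=name, id='%d' % (count + 1, )))
    return container


build_people(['albert', 'betsy', 'charlie']).export(sys.stdout, 0)
```

Unlike this sketch, real generated code knows every member defined in your schema; the point here is only the shape of the calls.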
And you can combine the above two methods.
Here are the relevant, modified sub-classes:
import people_api as supermod

class peopleSub(supermod.people):
    def __init__(self, comments=None, person=None, programmer=None, python_programmer=None, java_programmer=None):
        supermod.people.__init__(self, comments, person, programmer, python_programmer, java_programmer)
    def upcase_names(self):
        for person in self.get_person():
            person.upcase_names()
supermod.people.subclass = peopleSub
# end class peopleSub


class personSub(supermod.person):
    def __init__(self, vegetable=None, fruit=None, ratio=None, id=None, value=None, name=None, interest=None, category=None, agent=None, promoter=None, description=None):
        supermod.person.__init__(self, vegetable, fruit, ratio, id, value, name, interest, category, agent, promoter, description)
    def upcase_names(self):
        self.set_name(self.get_name().upper())
supermod.person.subclass = personSub
# end class personSub
Notes:
Here is the application itself:
import sys
import upcase_names_api as api

def create_people(names):
    people = api.peopleSub()
    for count, name in enumerate(names):
        id = '%d' % (count + 1, )
        person = api.personSub(name=name, id=id)
        people.add_person(person)
    return people

def main():
    names = ['albert', 'betsy', 'charlie']
    people = create_people(names)
    print('Before:')
    people.export(sys.stdout, 0)
    people.upcase_names()
    print('-' * 50)
    print('After:')
    people.export(sys.stdout, 0)

main()
Notes:
And, when you run this mini-application, here is what you might see:
$ python upcase_names.py
Before:
<people >
    <person id="1">
        <name>albert</name>
    </person>
    <person id="2">
        <name>betsy</name>
    </person>
    <person id="3">
        <name>charlie</name>
    </person>
</people>
--------------------------------------------------
After:
<people >
    <person id="1">
        <name>ALBERT</name>
    </person>
    <person id="2">
        <name>BETSY</name>
    </person>
    <person id="3">
        <name>CHARLIE</name>
    </person>
</people>
There are times when you would like to implement a function or method that can perform operations on a variety of members and that needs type information about each member.
You can get help with this by generating your code with the "--member-specs" command line option. When you use this option, generateDS.py adds a list or a dictionary containing an item for each member. If you want a list, use "--member-specs=list"; if you want a dictionary, with member names as keys, use "--member-specs=dict".
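The point of these member specs is that one generic function can inspect any generated class. In the sub-class file shown later, each item in member_data_items_ answers get_name() and get_data_type(), and this sketch assumes only that interface. MemberSpec and the stand-in classes are hypothetical, and the dict layout (the same attribute, keyed by member name) is an assumption that this tutorial does not show.

```python
class MemberSpec(object):
    """Hypothetical stand-in for a generated member-spec object."""

    def __init__(self, name, data_type):
        self.name = name
        self.data_type = data_type

    def get_name(self):
        return self.name

    def get_data_type(self):
        return self.data_type


def iter_specs(specs):
    # --member-specs=list gives a list; --member-specs=dict is assumed to
    # give a dict keyed by member name. Accept either form.
    if isinstance(specs, dict):
        return list(specs.values())
    return list(specs)


def string_members(obj):
    """Return the Python attribute names of obj's xs:string members."""
    names = []
    for spec in iter_specs(obj.member_data_items_):
        if spec.get_data_type() == 'xs:string':
            # XML names such as "first-name" become "first_name" in Python.
            names.append(spec.get_name().replace('-', '_'))
    return names


class contactType(object):
    # Stand-in for a class generated with --member-specs=list.
    member_data_items_ = [
        MemberSpec('id', 'xs:integer'),
        MemberSpec('color-code', 'xs:string'),
        MemberSpec('first-name', 'xs:string'),
        MemberSpec('category', 'xs:integer'),
    ]


class contactTypeAsDict(object):
    # The same specs in the form assumed for --member-specs=dict.
    member_data_items_ = {
        spec.get_name(): spec for spec in contactType.member_data_items_}


print(string_members(contactType()))                # ['color_code', 'first_name']
print(sorted(string_members(contactTypeAsDict())))  # ['color_code', 'first_name']
```

A list keeps the members in schema order, while a dict lets you look up one member's type directly by name.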
Here is an example -- In this example, we walk the document/instance tree and convert all string simple types to upper case.
Here is a schema:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="contact-list" type="contactlistType" />

  <xs:complexType name="contactlistType">
    <xs:sequence>
      <xs:element name="description" type="xs:string" />
      <xs:element name="contact" type="contactType" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="locator" type="xs:string" />
  </xs:complexType>

  <xs:complexType name="contactType">
    <xs:sequence>
      <xs:element name="first-name" type="xs:string"/>
      <xs:element name="last-name" type="xs:string"/>
      <xs:element name="interest" type="xs:string" maxOccurs="unbounded" />
      <xs:element name="category" type="xs:integer"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer" />
    <xs:attribute name="priority" type="xs:float" />
    <xs:attribute name="color-code" type="xs:string" />
  </xs:complexType>

</xs:schema>
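Before writing the upper-casing code, it can help to see which members of this schema are of type xs:string, since those are the ones it will touch. The sketch below lists them per complexType using xml.etree.ElementTree. The function name string_members_by_type is hypothetical, and the sketch assumes the schema spells types with the "xs" prefix, as this one does.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact-list" type="contactlistType" />
  <xs:complexType name="contactlistType">
    <xs:sequence>
      <xs:element name="description" type="xs:string" />
      <xs:element name="contact" type="contactType" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="locator" type="xs:string" />
  </xs:complexType>
  <xs:complexType name="contactType">
    <xs:sequence>
      <xs:element name="first-name" type="xs:string"/>
      <xs:element name="last-name" type="xs:string"/>
      <xs:element name="interest" type="xs:string" maxOccurs="unbounded" />
      <xs:element name="category" type="xs:integer"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer" />
    <xs:attribute name="priority" type="xs:float" />
    <xs:attribute name="color-code" type="xs:string" />
  </xs:complexType>
</xs:schema>
"""


def string_members_by_type(schema_text):
    """Map each named complexType to its xs:string elements and attributes."""
    schema = ET.fromstring(schema_text)
    result = {}
    for ctype in schema.findall(XS + 'complexType'):
        names = []
        # iter() visits the whole subtree in document order, so it finds
        # elements nested in xs:sequence and attributes on the type itself.
        for node in ctype.iter():
            if node.tag in (XS + 'element', XS + 'attribute'):
                # Assumes the schema uses the "xs" prefix for type names.
                if node.get('type') == 'xs:string':
                    names.append(node.get('name'))
        result[ctype.get('name')] = names
    return result


for type_name, members in sorted(string_members_by_type(SCHEMA).items()):
    print(type_name, members)
```

Members such as category, id, and priority are not xs:string, so a string-only walk leaves them unchanged.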
We generate code with the following command line:
$ generateDS.py -f \
    -o member_specs_api.py \
    -s member_specs_upper.py \
    --super=member_specs_api \
    --member-specs=list \
    member_specs.xsd
Notes:
And, here is the sub-class file (generated with the "-s" command line option), to which we have added a bit of code that converts any string-type members to upper case. You can think of this module as a special "application" of the generated classes.
#!/usr/bin/env python

#
# member_specs_upper.py
#

#
# Generated Wed Sep 23 16:39:47 2009 by generateDS.py version 1.18f.
#

import sys
from string import lower as str_lower
from xml.dom import minidom

import member_specs_api as supermod

#
# Globals
#

ExternalEncoding = 'ascii'

#
# Utility functions needed in each generated class.
#

def upper_elements(obj):
    for item in obj.member_data_items_:
        if item.get_data_type() == 'xs:string':
            name = remap(item.get_name())
            val1 = getattr(obj, name)
            if isinstance(val1, list):
                for idx, val2 in enumerate(val1):
                    val1[idx] = val2.upper()
            elif val1 is not None:
                # Skip optional members that are absent (None).
                setattr(obj, name, val1.upper())

def remap(name):
    newname = name.replace('-', '_')
    return newname

#
# Data representation classes
#

class contactlistTypeSub(supermod.contactlistType):
    def __init__(self, locator=None, description=None, person=None):
        supermod.contactlistType.__init__(self, locator, description, person)
    def upper(self):
        upper_elements(self)
        for child in self.get_contact():
            child.upper()
supermod.contactlistType.subclass = contactlistTypeSub
# end class contactlistTypeSub


class contactTypeSub(supermod.contactType):
    def __init__(self, priority=None, color_code=None, id=None, first_name=None, last_name=None, interest=None, category=None):
        supermod.contactType.__init__(self, priority, color_code, id, first_name, last_name, interest, category)
    def upper(self):
        upper_elements(self)
supermod.contactType.subclass = contactTypeSub
# end class contactTypeSub


def parse(inFilename):
    doc = minidom.parse(inFilename)
    rootNode = doc.documentElement
    rootObj = supermod.contactlistType.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="contact-list",
        namespacedef_='')
    doc = None
    return rootObj


def parseString(inString):
    doc = minidom.parseString(inString)
    rootNode = doc.documentElement
    rootObj = supermod.contactlistType.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="contact-list",
        namespacedef_='')
    return rootObj


def parseLiteral(inFilename):
    doc = minidom.parse(inFilename)
    rootNode = doc.documentElement
    rootObj = supermod.contactlistType.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('from member_specs_api import *\n\n')
    sys.stdout.write('rootObj = contact_list(\n')
    rootObj.exportLiteral(sys.stdout, 0, name_="contact_list")
    sys.stdout.write(')\n')
    return rootObj


USAGE_TEXT = """
Usage: python ???.py <infilename>
"""

def usage():
    print(USAGE_TEXT)
    sys.exit(1)


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        usage()
    infilename = args[0]
    root = parse(infilename)


if __name__ == '__main__':
    #import pdb; pdb.set_trace()
    main()
Notes:
Here is a test driver for our (mini-) application:
#!/usr/bin/env python

#
# member_specs_test.py
#

import sys
from xml.dom import minidom
import member_specs_upper

def process(infilename):
    doc = minidom.parse(infilename)
    rootNode = doc.documentElement
    rootObj = member_specs_upper.contactlistTypeSub.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="contact-list",
        namespacedef_='')
    rootObj.upper()
    sys.stdout.write('\n')
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="contact-list",
        namespacedef_='')
    return rootObj

USAGE_MSG = """\
Usage: python member_specs_test.py infilename
"""

def usage():
    print(USAGE_MSG)
    sys.exit(1)

def main():
    args = sys.argv[1:]
    if len(args) != 1:
        usage()
    infilename = args[0]
    process(infilename)

if __name__ == '__main__':
    main()
Notes:
When we run our application, here is the output:
$ python member_specs_test.py member_specs_data.xml
<?xml version="1.0" ?>
<contact-list locator="http://www.rexx.com/~dkuhlman">
    <description>My list of contacts</description>
    <contact priority="0.050000" color-code="red" id="1">
        <first-name>arlene</first-name>
        <last-name>Allen</last-name>
        <interest>traveling</interest>
        <category>2</category>
    </contact>
</contact-list>

<?xml version="1.0" ?>
<contact-list locator="HTTP://WWW.REXX.COM/~DKUHLMAN">
    <description>MY LIST OF CONTACTS</description>
    <contact priority="0.050000" color-code="RED" id="1">
        <first-name>ARLENE</first-name>
        <last-name>ALLEN</last-name>
        <interest>TRAVELING</interest>
        <category>2</category>
    </contact>
</contact-list>
Notes: