Advanced Data Mapping
Version: 0.1.0 Last Updated: 02/13/06 21:52:15

This section details all the options available to Mappers, as well as advanced patterns.

To start, here are the tables we will work with again:

from sqlalchemy import *
db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)
More On Relations
Custom Join Conditions

When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation, as in this example which creates a User object with a relationship to all of its Addresses that are in Boston:

class User(object):
    pass
class Address(object):
    pass
Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==addresses.c.user_id, 
                addresses.c.city=='Boston'))
})

Many-to-many relationships can be customized by one or both of primaryjoin and secondaryjoin; below, the default many-to-many relationship is simply stated explicitly:

class User(object):
    pass
class Keyword(object):
    pass
Keyword.mapper = mapper(Keyword, keywords)
User.mapper = mapper(User, users, properties={
    'keywords':relation(Keyword.mapper, 
        primaryjoin=users.c.user_id==userkeywords.c.user_id,
        secondaryjoin=userkeywords.c.keyword_id==keywords.c.keyword_id
        )
})
Lazy/Eager Joins Multiple Times to One Table

The previous example leads into the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses, both lazily loaded when they are first accessed:

User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==addresses.c.user_id, 
                addresses.c.city=='Boston')),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==addresses.c.user_id, 
                addresses.c.city=='New York')),
})

A complication arises with the above pattern if you want the relations to be eager loaded. Since there will be two separate joins to the addresses table during an eager load, an alias needs to be used to separate them. You could create an alias of the addresses table yourself, but then you are in effect creating a brand new mapper for each property, unrelated to the main Address mapper, which can create problems with commit operations. So an additional argument use_alias can be used with an eager relationship to specify the alias to be used just within the eager query:

User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==addresses.c.user_id, 
                addresses.c.city=='Boston'), lazy=False, use_alias=True),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==addresses.c.user_id, 
                addresses.c.city=='New York'), lazy=False, use_alias=True),
})

u = User.mapper.select()
Relation Options
Keyword options to the relation function include the following; a combined example appears after the list:
  • lazy=(True|False|None) - specifies how the related items should be loaded. A value of True indicates they should be loaded lazily when the property is first accessed. A value of False indicates they should be loaded by joining against the parent object query, so parent and child are loaded in one round trip (eager loading). A value of None indicates the related items are not loaded by the mapper in any case; the application will manually insert items into the list in some other way. A relationship with lazy=None is still important; items added to or removed from the list will cause the appropriate updates and deletes upon commit().
  • primaryjoin - a ClauseElement that will be used as the primary join of this child object against the parent object, or in a many-to-many relationship the join of the primary object to the association table. By default, this value is computed based on the foreign key relationships of the parent and child tables (or association table).
  • secondaryjoin - a ClauseElement that will be used as the join of an association table to the child object. By default, this value is computed based on the foreign key relationships of the association and child tables.
  • foreignkey - specifies which column in this relationship is "foreign", i.e. which column refers to the parent object. This value is automatically determined in all cases, based on the primary and secondary join conditions, except in the case of a self-referential mapper, where it is needed to indicate the child object's reference back to its parent.
  • uselist - a boolean that indicates if this property should be loaded as a list or a scalar. In most cases, this value is determined based on the type and direction of the relationship - one to many forms a list, one to one forms a scalar, many to many is a list. If a scalar is desired where normally a list would be present, set uselist to False.
  • private - indicates if these child objects are "private" to the parent; removed items will also be deleted, and if the parent item is deleted, all child objects are deleted as well. See the example in Useful Feature: Private Relations.
  • backreference - indicates the name of a property to be placed on the related mapper's class that will handle this relationship in the other direction, including synchronizing the object attributes on both sides of the relation. See the example in Useful Feature: Backreferences.
  • order_by - indicates the ordering that should be applied when loading these items. See the section Controlling Ordering for details.
  • association - When specifying a many to many relationship with an association object, this keyword should reference the mapper of the target object of the association. See the example in Association Object.
  • use_alias - Useful with eager loads, a value of True indicates that unique alias names should be generated when creating joins against the parent table, to avoid conflicts with parallel joins of the same two tables. The unique aliasing will be propagated into all child eager joins as well to maintain their isolation. This aliasing only occurs when generating SELECT statements, and aliased columns in the result set are translated back to those of the original table when creating object instances.
  • live=False - this option is no longer used.
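As a consolidated illustration, here is a sketch that combines several of these keyword arguments on a single relation, reusing the users and addresses tables defined at the top of this section. The 'user' backreference name and the particular combination of options are illustrative only:

class User(object):
    pass
class Address(object):
    pass

User.mapper = mapper(User, users, properties={
    'addresses' : relation(mapper(Address, addresses),
        lazy=True,                        # load the list when first accessed
        private=True,                     # deleting a User also deletes its Addresses
        order_by=addresses.c.address_id,  # ordering applied when loading the list
        backreference='user'              # places a corresponding 'user' property on Address
    )
})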
Controlling Ordering

By default, mappers will not supply any ORDER BY clause when selecting rows. This can be modified in several ways.

A "default ordering" can be supplied by all mappers, by enabling the "default_ordering" flag to the engine, which indicates that table primary keys or object IDs should be used as the default ordering:

db = create_engine('postgres://username=scott&password=tiger', default_ordering=True)

The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering, even if the engine's default_ordering property is True. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:

# disable all ordering
mapper = mapper(User, users, order_by=None)

# order by a column
mapper = mapper(User, users, order_by=users.c.user_id)

# order by multiple items
mapper = mapper(User, users, order_by=[users.c.user_id, desc(users.c.user_name)])

"order_by" can also be specified to an individual select method, overriding all other per-engine/per-mapper orderings:

# order by a column
l = mapper.select(users.c.user_name=='fred', order_by=users.c.user_id)

# order by multiple criteria
l = mapper.select(users.c.user_name=='fred', order_by=[users.c.user_id, desc(users.c.user_name)])

For relations, the "order_by" property can also be specified on any form of relation:

# order address objects by address id
mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=addresses.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=desc(addresses.c.street), lazy=False)
}, order_by=users.c.user_id)
Limiting Rows

You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:

class User(object):
    pass

m = mapper(User, users)
r = m.select(limit=20, offset=10)

However, things get tricky when dealing with eager relationships, since a straight LIMIT of rows does not represent the count of items when joining against other tables to load related items as well. So here is what SQLAlchemy will do when you use limit or offset with an eager relationship:

class User(object):
    pass
class Address(object):
    pass
m = mapper(User, users, properties={
    'addresses' : relation(mapper(Address, addresses), lazy=False)
})
r = m.select(User.c.user_name.like('F%'), limit=20, offset=10)
SELECT users.user_id AS users_user_id, users.user_name AS users_user_name,
users.password AS users_password, addresses.address_id AS addresses_address_id,
addresses.user_id AS addresses_user_id, addresses.street AS addresses_street,
addresses.city AS addresses_city, addresses.state AS addresses_state,
addresses.zip AS addresses_zip
FROM
(SELECT users.user_id FROM users WHERE users.user_name LIKE %(users_user_name)s
ORDER BY users.oid LIMIT 20 OFFSET 10) AS rowcount,
users LEFT OUTER JOIN addresses ON users.user_id = addresses.user_id
WHERE rowcount.user_id = users.user_id ORDER BY users.oid, addresses.oid
{'users_user_name': 'F%'}

The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.

Overriding Column Names

When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:

user_mapper = mapper(User, users, properties={
    'id' : users.c.user_id,
    'name' : users.c.user_name,
})

When column names overlap in a mapper against multiple tables, the overlapping columns may be mapped to a single attribute by referencing them together in a list:

# join users and addresses
usersaddresses = sql.join(users, addresses, users.c.user_id == addresses.c.user_id)
m = mapper(User, usersaddresses, properties = {
        'id' : [users.c.user_id, addresses.c.user_id],
    })
Deferred Column Loading

This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it is not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.

book_excerpts = Table('books', db, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in 
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})

Deferred columns can be placed into groups so that they load together:

book_excerpts = Table('books', db, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group.  when one photo is referenced,
# all three photos will be loaded in one SELECT statement.  The 'excerpt' will 
# be loaded separately when it is first referenced.
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})
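
As a hedged usage sketch (assuming the book_mapper above and at least one row in the table), the deferred loads are triggered by plain attribute access:

# select a Book; 'book_id', 'title' and 'summary' are loaded up front
b = book_mapper.select()[0]

# touching any one photo loads photo1, photo2 and photo3 together in one SELECT
image = b.photo1

# 'excerpt' is not part of the group, so it loads by itself when first referenced
text = b.excerpt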
More on Mapper Options

The options method of mapper, first introduced in Switching Lazy/Eager, No Load, supports the copying of a mapper into a new one, with any number of its relations replaced by new ones. The method takes a variable number of MapperOption objects which know how to change specific things about the mapper. The five available options are eagerload, lazyload, noload, deferred and extension.

An example of a mapper with a lazy load relationship, upgraded to an eager load relationship:

class User(object):
    pass
class Address(object):
    pass

# a 'lazy' relationship
User.mapper = mapper(User, users, properties = {
    'addresses':relation(mapper(Address, addresses), lazy=True)
})
    
# copy the mapper and convert 'addresses' to be eager
eagermapper = User.mapper.options(eagerload('addresses'))

The load options can also take keyword arguments that apply to the new relationship. To take the "double" address lazy relationship from the previous section and upgrade it to eager, adding the "selectalias" keywords as well:

m = User.mapper.options(
        eagerload('boston_addresses', selectalias='boston_ad'), 
        eagerload('newyork_addresses', selectalias='newyork_ad')
    )

The defer and undefer options can control the deferred loading of attributes:

# set the 'excerpt' deferred attribute to load normally
m = book_mapper.options(undefer('excerpt'))

# set the referenced mapper 'photos' to defer its loading of the column 'imagedata'
m = book_mapper.options(defer('photos.imagedata'))
Mapping a Class with Table Inheritance

Table Inheritance indicates the pattern where two tables, in a parent-child relationship, are mapped to an inheritance chain of classes. If a table "employees" contains additional information about managers in the table "managers", a corresponding object inheritance pattern would have an Employee class and a Manager class. Loading a Manager object means you are joining managers to employees. For SQLAlchemy, this pattern is just a special case of a mapper that maps against a joined relationship, and is provided via the inherits keyword.

class User(object):
    """a user object."""
    pass
User.mapper = mapper(User, users)

class AddressUser(User):
    """a user object that also has the users mailing address."""
    pass

# define a mapper for AddressUser that inherits the User.mapper, and joins on the user_id column
AddressUser.mapper = mapper(
        AddressUser,
        addresses, inherits=User.mapper
    )

items = AddressUser.mapper.select()

Above, the join condition is determined via the foreign keys between the users and the addresses table. To specify the join condition explicitly, use inherit_condition:

AddressUser.mapper = mapper(
        AddressUser,
        addresses, inherits=User.mapper, 
        inherit_condition=users.c.user_id==addresses.c.user_id
    )
Mapping a Class against Multiple Tables

The more general case of the pattern described in "table inheritance" is a mapper that maps against more than one table. The join keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.

# a class
class AddressUser(object):
    pass

# define a Join
j = join(users, addresses)

# map to it - the identity of an AddressUser object will be 
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j)

A second example:

# many-to-many join on an association table
j = join(users, userkeywords, 
        users.c.user_id==userkeywords.c.user_id).join(keywords, 
           userkeywords.c.keyword_id==keywords.c.keyword_id)
 
# a class 
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j)
Mapping a Class against Arbitrary Selects

Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:

s = select([customers, 
            func.count(orders).label('order_count'), 
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
            )
class Customer(object):
    pass

m = mapper(Customer, s)

Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.

Multiple Mappers for One Class

By now it should be apparent that the mapper defined for a class is in no way the only mapper that exists for that class. Other mappers can be created at any time, either explicitly or via the options method, to provide different loading behavior.

However, it is not quite that simple. The mapper serves a dual purpose: one is to generate select statements and load objects from executing those statements; the other is to keep track of the defined dependencies of that object when save and delete operations occur, and to extend the attributes of the object so that they store information about their history and communicate with the unit of work system. For this reason, it is a good idea to be aware of the behavior of multiple mappers. When creating dependency relationships between objects, one should ensure that only the primary mappers are used in those relationships, else deep object traversal operations will fail to load in the expected properties, and update operations will not take all the dependencies into account.

Generally, the rule is simple: the first mapper that is defined for a particular class is the one that gets to define that class's relationships to other mapped classes, and is also the one that decorates its attributes and constructor with special behavior. Any subsequent mappers created for that class will be able to load new instances, but object manipulation operations will still function via the original mapper. The special keyword is_primary will override this behavior, and make any mapper the new "primary" mapper.

class User(object):
    pass

# mapper one - mark it as "primary", meaning this mapper will handle
# saving and class-level properties
m1 = mapper(User, users, is_primary=True)

# mapper two - this one will also eager-load Address objects
m2 = mapper(User, users, properties={
        'addresses' : relation(mapper(Address, addresses), lazy=False)
    })

# get a user.  this user will not have an 'addresses' property
u1 = m1.select(User.c.user_id==10)

# get another user.  this user will have an 'addresses' property.
u2 = m2.select(User.c.user_id==27)

# make some modifications, including adding an Address object.
u1.user_name = 'jack'
u2.user_name = 'jane'
u2.addresses.append(Address('123 green street'))

# upon commit, the User objects will be saved. 
# the Address object will not, since the primary mapper for User
# does not have an 'addresses' relationship defined
objectstore.commit()
Circular Mapping

Oftentimes it is necessary for two mappers to be related to each other. With a datamodel that consists of Users that store Addresses, you might have an Address object and want to access the "user" attribute on it, or have a User object and want to get the list of Address objects. The easiest way to do this is via the backreference keyword described in Useful Feature: Backreferences. However, even when backreferences are used, it is sometimes necessary to explicitly specify the relations on both mappers pointing to each other.

Achieving this involves creating the first mapper by itself, then creating the second mapper with a reference to the first, and then adding a property to the first mapper that references the second:

class User(object):
    pass
class Address(object):
    pass
User.mapper = mapper(User, users)
Address.mapper = mapper(Address, addresses, properties={
    'user':relation(User.mapper)
})
User.mapper.add_property('addresses', relation(Address.mapper))

Note that with a circular relationship as above, you cannot declare both relationships as "eager" relationships, since that produces a circular query situation which will generate a recursion exception. So what if you want to load an Address and its User eagerly? Just make a second mapper using options:

eagermapper = Address.mapper.options(eagerload('user'))
s = eagermapper.select(Address.c.address_id==12)
Self Referential Mappers

A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.

# define a self-referential table
trees = Table('treenodes', db,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified.  it will point back to the primary 
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        private=True
                     ),
        }
    )
    
# or, specify the circular relationship after establishing the original mapper:
mymapper = mapper(TreeNode, trees)

mymapper.add_property('children', relation(
                        mymapper, 
                        private=True
                     ))

This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.

A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:

# define a self-referential table with several relations
trees = Table('treenodes', db,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
                        private=True
                     ),
        'root' : relation(
                TreeNode,
                primaryjoin=trees.c.root_node_id==trees.c.node_id, 
                foreignkey=trees.c.node_id,
                uselist=False
            )
        }
    )

The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter foreignkey, pointing to the "many" side of a relationship, is needed to indicate a "many-to-one" self-referring relationship.

Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.

Result-Set Mapping

Take any result set and feed it into a mapper to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on mapper takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances.

single object
class User(object):
    pass

User.mapper = mapper(User, users)

# select users
c = users.select().execute()

# get objects
userlist = User.mapper.instances(c)
multiple objects
# define a second class/mapper
class Address(object):
    pass
    
Address.mapper = mapper(Address, addresses)

# select users and addresses in one query
s = select([users, addresses], users.c.user_id==addresses.c.user_id)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = User.mapper.instances(s.execute(), Address.mapper)

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]
Extending Mapper

Mappers can have their functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs.

class MapperExtension(object):
    def create_instance(self, mapper, row, imap, class_):
        """called when a new object instance is about to be created from a row.  
        the method can choose to create the instance itself, or it can return 
        None to indicate normal object creation should take place.
        
        mapper - the mapper doing the operation
        row - the result row from the database
        imap - a dictionary that is storing the running set of objects collected from the
        current result set
        class_ - the class we are mapping.
        """
    def append_result(self, mapper, row, imap, result, instance, isnew, populate_existing=False):
        """called when an object instance is being appended to a result list.
        
        If it returns True, it is assumed that this method handled the appending itself.

        mapper - the mapper doing the operation
        row - the result row from the database
        imap - a dictionary that is storing the running set of objects collected from the
        current result set
        result - an instance of util.HistoryArraySet(), which may be an attribute on an
        object if this is a related object load (lazy or eager).  use result.append_nohistory(value)
        to append objects to this list.
        instance - the object instance to be appended to the result
        isnew - indicates if this is the first time we have seen this object instance in the current result
        set.  if you are selecting from a join, such as an eager load, you might see the same object instance
        many times in the same result set.
        populate_existing - usually False, indicates if object instances that were already in the main 
        identity map, i.e. were loaded by a previous select(), get their attributes overwritten
        """
    def before_insert(self, mapper, instance):
        """called before an object instance is INSERTed into its table.
        
        this is a good place to set up primary key values and such that aren't handled otherwise."""
    def after_insert(self, mapper, instance):
        """called after an object instance has been INSERTed"""
    def before_delete(self, mapper, instance):
        """called before an object instance is DELETEed"""

To use MapperExtension, make your own subclass of it and just send it off to a mapper:

mapper = mapper(User, users, extension=MyExtension())
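
Here, MyExtension could look something like the minimal sketch below; the hook signature comes from the MapperExtension class above, and the placeholder-password behavior is purely illustrative:

class MyExtension(MapperExtension):
    def before_insert(self, mapper, instance):
        # illustrative only: give new User rows a placeholder password
        # if one was not set before the INSERT
        if getattr(instance, 'password', None) is None:
            instance.password = 'changeme'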

An existing mapper can create a copy of itself using an extension via the extension option:

extended_mapper = mapper.options(extension(MyExtension()))
How Mapper Modifies Mapped Classes

This section is a quick summary of what's going on when you send a class to the mapper() function. This material, not required to be able to use SQLAlchemy, is a little more dense and should be approached patiently!

The primary changes to a class that is mapped involve attaching property objects to it which represent table columns. These property objects essentially track changes. In addition, the __init__ method of the object is decorated to track object creation.

Here is a quick rundown of all the changes in code form:

# step 1 - override __init__ to 'register_new' with the Unit of Work
oldinit = myclass.__init__
def init(self, *args, **kwargs):
    nohist = kwargs.pop('_mapper_nohistory', False)
    oldinit(self, *args, **kwargs)
    if not nohist:
        # register_new with Unit Of Work
        objectstore.uow().register_new(self)
myclass.__init__ = init

# step 2 - set a string identifier that will 
# locate the class's primary mapper
myclass._mapper = mapper.hashkey

# step 3 - add column accessor
myclass.c = mapper.columns

# step 4 - attribute decorating.  
# this happens mostly within the package sqlalchemy.attributes

# this dictionary will store a series of callables 
# that generate "history" containers for
# individual object attributes
myclass._class_managed_attributes = {}

# create individual properties for each column - 
# these objects know how to talk 
# to the attribute package to create appropriate behavior.
# the next example examines the attributes package more closely.
myclass.column1 = SmartProperty().property('column1', uselist=False)
myclass.column2 = SmartProperty().property('column2', uselist=True)

The attribute package is used when save operations occur to get a handle on modified values. In the example below, a full round-trip attribute tracking operation is illustrated:

import sqlalchemy.attributes as attributes

# create an attribute manager.  
# the sqlalchemy.mapping package keeps one of these around as 
# 'objectstore.global_attributes'
manager = attributes.AttributeManager()

# regular old new-style class
class MyClass(object):
    pass

# register a scalar and a list attribute
manager.register_attribute(MyClass, 'column1', uselist=False)
manager.register_attribute(MyClass, 'column2', uselist=True)
    
# create/modify an object
obj = MyClass()
obj.column1 = 'this is a new value'
obj.column2.append('value 1')
obj.column2.append('value 2')

# get history objects
col1_history = manager.get_history(obj, 'column1')
col2_history = manager.get_history(obj, 'column2')

# what's new?
>>> col1_history.added_items()
['this is a new value']

>>> col2_history.added_items()
['value 1', 'value 2']

# commit changes
manager.commit(obj)

# the new values become the "unchanged" values
>>> col1_history.added_items()
[]

>>> col1_history.unchanged_items()
['this is a new value']

>>> col2_history.added_items()
[]

>>> col2_history.unchanged_items()
['value 1', 'value 2']

The above AttributeManager also includes a method value_changed which is triggered whenever change events occur on the managed object attributes. The Unit of Work (objectstore) package overrides this method in order to receive change events; it is essentially this:

import sqlalchemy.attributes as attributes
class UOWAttributeManager(attributes.AttributeManager):
    def value_changed(self, obj, key, value):
        if hasattr(obj, '_instance_key'):
            uow().register_dirty(obj)
        else:
            uow().register_new(obj)
            
global_attributes = UOWAttributeManager()

Objects that contain the attribute "_instance_key" are already registered with the Identity Map, and are assumed to have come from the database. They therefore get marked as "dirty" when changes happen. Objects without an "_instance_key" are not from the database, and get marked as "new" when changes happen, although usually this will already have occurred via the object's __init__ method.
