This section details all the options available to Mappers, as well as advanced patterns.
To start, here are the tables we will work with again:
from sqlalchemy import *

db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key=True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)
When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation, as in this example, which creates a User object that has a relationship to all of its Addresses which are in Boston:
class User(object): pass
class Address(object): pass

Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties={
    'boston_addresses': relation(Address.mapper, primaryjoin=
        and_(users.c.user_id==addresses.c.user_id,
             addresses.c.city=='Boston'))
})
Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:
class User(object): pass
class Keyword(object): pass

Keyword.mapper = mapper(Keyword, keywords)
User.mapper = mapper(User, users, properties={
    'keywords': relation(Keyword.mapper, secondary=userkeywords,
        primaryjoin=users.c.user_id==userkeywords.c.user_id,
        secondaryjoin=userkeywords.c.keyword_id==keywords.c.keyword_id
    )
})
The previous example leads in to the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses, both lazily loaded when they are first accessed:
User.mapper = mapper(User, users, properties={
    'boston_addresses': relation(Address.mapper, primaryjoin=
        and_(users.c.user_id==addresses.c.user_id,
             addresses.c.city=='Boston')),
    'newyork_addresses': relation(Address.mapper, primaryjoin=
        and_(users.c.user_id==addresses.c.user_id,
             addresses.c.city=='New York')),
})
A complication arises with the above pattern if you want the relations to be eager loaded. Since there will be two separate joins to the addresses table during an eager load, an alias needs to be used to separate them. You could create an alias of the addresses table yourself, but then you are in effect creating a brand new mapper for each property, unrelated to the main Address mapper, which can create problems with commit operations. So an additional argument use_alias can be used with an eager relationship to specify the alias to be used just within the eager query:
User.mapper = mapper(User, users, properties={
    'boston_addresses': relation(Address.mapper, primaryjoin=
        and_(users.c.user_id==addresses.c.user_id,
             addresses.c.city=='Boston'),
        lazy=False, use_alias=True),
    'newyork_addresses': relation(Address.mapper, primaryjoin=
        and_(users.c.user_id==addresses.c.user_id,
             addresses.c.city=='New York'),
        lazy=False, use_alias=True),
})

u = User.mapper.select()
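To see why the alias is needed, consider the shape of the SQL such an eager load has to emit. The sketch below uses the standard library's sqlite3 module with a cut-down schema; the statement and alias names are illustrative of the pattern, not SQLAlchemy's literal output:

```python
import sqlite3

# a tiny in-memory schema mirroring the users/addresses tables above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    city TEXT
);
INSERT INTO users VALUES (1, 'jack');
INSERT INTO addresses VALUES (10, 1, 'Boston'), (11, 1, 'New York');
""")

# each eager relationship joins against its own alias of "addresses",
# so the two joins to the same table do not collide in one statement
rows = conn.execute("""
    SELECT u.user_name, boston_ad.city, newyork_ad.city
    FROM users u
    LEFT JOIN addresses boston_ad
        ON u.user_id = boston_ad.user_id AND boston_ad.city = 'Boston'
    LEFT JOIN addresses newyork_ad
        ON u.user_id = newyork_ad.user_id AND newyork_ad.city = 'New York'
""").fetchall()

print(rows)   # [('jack', 'Boston', 'New York')]
```

Without the aliases, the two join conditions would both reference the single "addresses" table and interfere with each other.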
By default, mappers will not supply any ORDER BY clause when selecting rows. This can be modified in several ways.
A "default ordering" can be supplied for all mappers by enabling the "default_ordering" flag on the engine, which indicates that table primary keys or object IDs should be used as the default ordering:
db = create_engine('postgres://username=scott&password=tiger', default_ordering=True)
The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering, even if the engine's default_ordering property is True. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:
# disable all ordering
mapper = mapper(User, users, order_by=None)

# order by a column
mapper = mapper(User, users, order_by=users.c.user_id)

# order by multiple items
mapper = mapper(User, users, order_by=[users.c.user_id, desc(users.c.user_name)])
"order_by" can also be specified to an individual select method, overriding all other per-engine/per-mapper orderings:
# order by a column
l = mapper.select(users.c.user_name=='fred', order_by=users.c.user_id)

# order by multiple criteria
l = mapper.select(users.c.user_name=='fred', order_by=[users.c.user_id, desc(users.c.user_name)])
For relations, the "order_by" property can also be specified to all forms of relation:
# order address objects by address id
mapper = mapper(User, users, properties={
    'addresses': relation(mapper(Address, addresses), order_by=addresses.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
mapper = mapper(User, users, properties={
    'addresses': relation(mapper(Address, addresses),
        order_by=desc(addresses.c.street), eager=True)
}, order_by=users.c.user_id)
You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:
class User(object): pass

m = mapper(User, users)
r = m.select(limit=20, offset=10)
The limit and offset also work in conjunction with eager relationships:

class User(object): pass
class Address(object): pass

m = mapper(User, users, properties={
    'addresses': relation(mapper(Address, addresses), lazy=False)
})
r = m.select(User.c.user_name.like('F%'), limit=20, offset=10)
The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.
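The subquery technique can be sketched directly in SQL. The example below, using the standard library's sqlite3 module with a scaled-down limit, shows why the wrapping matters: the LIMIT applies to parent objects inside the subquery, so a user with several addresses still counts as one row toward the limit. The query shape is illustrative, not SQLAlchemy's literal output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    street TEXT
);
INSERT INTO users VALUES (1, 'fred'), (2, 'frank'), (3, 'ed');
INSERT INTO addresses VALUES (1, 1, '12 oak st'), (2, 1, '18 elm st'), (3, 2, '5 pine st');
""")

# WHERE and LIMIT/OFFSET select the desired *users* in a subquery;
# the eager join to addresses happens outside it, so one user with
# two addresses still counts as a single row toward the limit
rows = conn.execute("""
    SELECT u.user_id, u.user_name, a.street
    FROM (SELECT * FROM users
          WHERE user_name LIKE 'f%'
          ORDER BY user_id LIMIT 1 OFFSET 0) u
    LEFT JOIN addresses a ON u.user_id = a.user_id
""").fetchall()

print(rows)   # [(1, 'fred', '12 oak st'), (1, 'fred', '18 elm st')]
```

Had the LIMIT been applied to the joined rows instead, "fred" would have come back with only one of his two addresses.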
When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:
user_mapper = mapper(User, users, properties={
    'id': users.c.user_id,
    'name': users.c.user_name,
})
When column names overlap in a mapper that maps against multiple tables, the overlapping columns may be referenced together as a list:
# join users and addresses
usersaddresses = sql.join(users, addresses, users.c.user_id == addresses.c.user_id)

m = mapper(User, usersaddresses, properties={
    'id': [users.c.user_id, addresses.c.user_id],
})
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it is not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object): pass

# define a mapper that will load each of 'excerpt' and 'photo' in
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
book_mapper = mapper(Book, book_excerpts, properties={
    'excerpt': deferred(book_excerpts.c.excerpt),
    'photo': deferred(book_excerpts.c.photo)
})
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object): pass

# define a mapper with a 'photos' deferred group. when one photo is referenced,
# all three photos will be loaded in one SELECT statement. The 'excerpt' will
# be loaded separately when it is first referenced.
book_mapper = mapper(Book, book_excerpts, properties={
    'excerpt': deferred(book_excerpts.c.excerpt),
    'photo1': deferred(book_excerpts.c.photo1, group='photos'),
    'photo2': deferred(book_excerpts.c.photo2, group='photos'),
    'photo3': deferred(book_excerpts.c.photo3, group='photos')
})
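The mechanism behind grouped deferred loading can be sketched with a plain Python descriptor. This is a hypothetical miniature, not SQLAlchemy's implementation; the Deferred class and load_columns hook are invented names standing in for the attribute machinery and the per-row SELECT:

```python
class Deferred:
    """descriptor that loads its value (and its group peers) on first access."""
    def __init__(self, name, group=None):
        self.name, self.group = name, group

    def __get__(self, obj, owner):
        if obj is None:
            return self
        if self.name not in obj.__dict__:
            # load every deferred attribute in the same group at once
            if self.group:
                names = [d.name for d in owner.deferred_attrs if d.group == self.group]
            else:
                names = [self.name]
            obj.__dict__.update(obj.load_columns(names))
        return obj.__dict__[self.name]

class Book:
    excerpt = Deferred('excerpt')
    photo1 = Deferred('photo1', group='photos')
    photo2 = Deferred('photo2', group='photos')
    photo3 = Deferred('photo3', group='photos')
    deferred_attrs = [excerpt, photo1, photo2, photo3]

    def __init__(self):
        self.loads = []   # records each simulated SELECT

    def load_columns(self, names):
        # stand-in for "SELECT <names> FROM books WHERE book_id = ?"
        self.loads.append(names)
        return {n: f"<{n} data>" for n in names}

b = Book()
b.photo2            # touching one photo loads the whole 'photos' group
print(b.loads)      # [['photo1', 'photo2', 'photo3']]
b.excerpt           # an ungrouped column loads by itself
print(b.loads)      # [['photo1', 'photo2', 'photo3'], ['excerpt']]
```

Once loaded, the values live in the instance's __dict__ and shadow the descriptor, so subsequent accesses issue no further loads.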
The options method of mapper, first introduced in Switching Lazy/Eager, No Load, supports the copying of a mapper into a new one, with any number of its relations replaced by new ones. The method takes a variable number of MapperOption objects which know how to change specific things about the mapper. The five available options are eagerload, lazyload, noload, deferred and extension.
An example of a mapper with a lazy load relationship, upgraded to an eager load relationship:
class User(object): pass
class Address(object): pass

# a 'lazy' relationship
User.mapper = mapper(User, users, properties={
    'addresses': relation(mapper(Address, addresses), lazy=True)
})

# copy the mapper and convert 'addresses' to be eager
eagermapper = User.mapper.options(eagerload('addresses'))
The load options can also take keyword arguments that apply to the new relationship. Below, the "double" address lazy relationship from the previous section is upgraded to eager, with the "selectalias" keywords added as well:
m = User.mapper.options(
    eagerload('boston_addresses', selectalias='boston_ad'),
    eagerload('newyork_addresses', selectalias='newyork_ad')
)
The defer and undefer options can control the deferred loading of attributes:
# set the 'excerpt' deferred attribute to load normally
m = book_mapper.options(undefer('excerpt'))

# set the referenced mapper 'photos' to defer its loading of the column 'imagedata'
m = book_mapper.options(defer('photos.imagedata'))
Table Inheritance indicates the pattern where two tables, in a parent-child relationship, are mapped to an inheritance chain of classes. If a table "employees" contains additional information about managers in the table "managers", a corresponding object inheritance pattern would have an Employee class and a Manager class. Loading a Manager object means you are joining managers to employees. For SQLAlchemy, this pattern is just a special case of a mapper that maps against a joined relationship, and is provided via the inherits keyword.
class User(object):
    """a user object."""
    pass

User.mapper = mapper(User, users)

class AddressUser(User):
    """a user object that also has the user's mailing address."""
    pass

# define a mapper for AddressUser that inherits the User.mapper, and joins on the user_id column
AddressUser.mapper = mapper(
    AddressUser, addresses, inherits=User.mapper
)

items = AddressUser.mapper.select()
Above, the join condition is determined via the foreign keys between the users and the addresses table. To specify the join condition explicitly, use inherit_condition:
AddressUser.mapper = mapper(
    AddressUser, addresses, inherits=User.mapper,
    inherit_condition=users.c.user_id==addresses.c.user_id
)
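The join that loading an AddressUser implies can be sketched with the standard library's sqlite3 module. The statement is illustrative of the pattern, not SQLAlchemy's literal output; note how a user without a matching address row does not appear as an AddressUser:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    street TEXT
);
INSERT INTO users VALUES (1, 'jack'), (2, 'ed');
INSERT INTO addresses VALUES (5, 1, '12 oak st');
""")

# loading AddressUser objects selects from the join of the two tables;
# 'ed' has no address row, so he is a plain User but not an AddressUser
rows = conn.execute("""
    SELECT u.user_id, u.user_name, a.address_id, a.street
    FROM users u JOIN addresses a ON u.user_id = a.user_id
""").fetchall()

print(rows)   # [(1, 'jack', 5, '12 oak st')]
```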
The more general case of the pattern described in "table inheritance" is a mapper that maps against more than one table. The join keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class
class AddressUser(object): pass

# define a Join
j = join(users, addresses)

# map to it - the identity of an AddressUser object will be
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j)
# many-to-many join on an association table
j = join(users, userkeywords,
        users.c.user_id==userkeywords.c.user_id).join(keywords,
        userkeywords.c.keyword_id==keywords.c.keyword_id)

# a class
class KeywordUser(object): pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j)
Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers,
            func.count(orders.c.order_id).label('order_count'),
            func.max(orders.c.price).label('highest_order')],
           customers.c.customer_id==orders.c.customer_id,
           group_by=[c for c in customers.c]
)

class Customer(object): pass

mapper = mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
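The underlying select can be sketched with the standard library's sqlite3 module. The table contents are invented for illustration; the point is the row shape the mapper receives, one row per customer with the two aggregate columns attached:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    price REAL
);
INSERT INTO customers VALUES (1, 'acme'), (2, 'widgetco');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 120.0), (3, 2, 75.0);
""")

# one row per customer, plus the two aggregate columns -
# each row has everything needed to populate one Customer instance
rows = conn.execute("""
    SELECT c.customer_id, c.name,
           COUNT(o.order_id) AS order_count,
           MAX(o.price)      AS highest_order
    FROM customers c JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

print(rows)   # [(1, 'acme', 2, 120.0), (2, 'widgetco', 1, 75.0)]
```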
By now it should be apparent that the mapper defined for a class is in no way the only mapper that exists for that class. Other mappers can be created at any time, either explicitly or via the options method, to provide different loading behavior.
However, it's not quite that simple. The mapper serves a dual purpose: one is to generate select statements and load objects from executing those statements; the other is to keep track of the defined dependencies of that object when save and delete operations occur, and to extend the attributes of the object so that they store information about their history and communicate with the unit of work system. For this reason, it is a good idea to be aware of the behavior of multiple mappers. When creating dependency relationships between objects, one should ensure that only the primary mappers are used in those relationships, else deep object traversal operations will fail to load in the expected properties, and update operations will not take all the dependencies into account.
Generally, the rule is simple: the first mapper defined for a particular class is the one that gets to define that class's relationships to other mapped classes, and also decorates its attributes and constructors with special behavior. Any subsequent mappers created for that class will be able to load new instances, but object manipulation operations will still function via the original mapper. The special keyword is_primary will override this behavior, and make any mapper the new "primary" mapper.
class User(object): pass

# mapper one - mark it as "primary", meaning this mapper will handle
# saving and class-level properties
m1 = mapper(User, users, is_primary=True)

# mapper two - this one will also eager-load address objects in
m2 = mapper(User, users, properties={
    'addresses': relation(mapper(Address, addresses), lazy=False)
})

# get a user. this user will not have an 'addresses' property
u1 = m1.select(User.c.user_id==10)

# get another user. this user will have an 'addresses' property.
u2 = m2.select(User.c.user_id==27)

# make some modifications, including adding an Address object.
u1.user_name = 'jack'
u2.user_name = 'jane'
u2.addresses.append(Address('123 green street'))

# upon commit, the User objects will be saved.
# the Address object will not, since the primary mapper for User
# does not have an 'addresses' relationship defined
objectstore.commit()
Oftentimes it is necessary for two mappers to be related to each other. With a datamodel that consists of Users that store Addresses, you might have an Address object and want to access the "user" attribute on it, or have a User object and want to get the list of Address objects. The easiest way to do this is via the backreference keyword described in Useful Feature: Backreferences. However, even when backreferences are used, it is sometimes necessary to explicitly specify the relations on both mappers pointing to each other.
Achieving this involves creating the first mapper by itself, then creating the second mapper referencing the first, then adding a reference on the first mapper back to the second:
class User(object): pass
class Address(object): pass

User.mapper = mapper(User, users)
Address.mapper = mapper(Address, addresses, properties={
    'user': relation(User.mapper)
})

User.mapper.add_property('addresses', relation(Address.mapper))
Note that with a circular relationship as above, you cannot declare both relationships as "eager" relationships, since that produces a circular query situation which will generate a recursion exception. So what if you want to load an Address and its User eagerly? Just make a second mapper using options:
eagermapper = Address.mapper.options(eagerload('user'))
s = eagermapper.select(Address.c.address_id==12)
A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.
# define a self-referential table
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object): pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified. it will point back to the primary
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children': relation(TreeNode, private=True),
})

# or, specify the circular relationship after establishing the original mapper:
mymapper = mapper(TreeNode, trees)
mymapper.add_property('children', relation(mymapper, private=True))
This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.
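The core of that dependency work is ordering the rows so a parent's primary key exists before any child row referencing it is inserted. A minimal sketch of such an ordering pass, in plain Python (invented helper names, not SQLAlchemy's internals):

```python
def insert_order(nodes, parent_of):
    """order node ids so every parent is inserted before its children.

    nodes     - iterable of node ids, in any order
    parent_of - dict mapping node id -> parent node id (None for roots)
    """
    ordered, placed = [], set()

    def place(n):
        if n in placed:
            return
        if parent_of.get(n) is not None:
            place(parent_of[n])   # make sure the parent row exists first
        placed.add(n)
        ordered.append(n)

    for n in nodes:
        place(n)
    return ordered

# a small tree: 1 is the root, 2 and 3 are its children, 4 hangs under 3
parents = {1: None, 2: 1, 3: 1, 4: 3}
print(insert_order([4, 2, 3, 1], parents))   # [1, 3, 4, 2]
```

Deletes need the reverse ordering, removing children before their parents.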
A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:
# define a self-referential table with several relations
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object): pass

# define the "children" property as well as the "root" property
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children': relation(TreeNode,
        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
        private=True),
    'root': relation(TreeNode,
        primaryjoin=trees.c.root_node_id==trees.c.node_id,
        foreignkey=trees.c.node_id,
        uselist=False)
})
The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter foreignkey, pointing to the "many" side of a relationship, is needed to indicate a "many-to-one" self-referring relationship.
Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.
Take any result set and feed it into a mapper to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on mapper takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances.
class User(object): pass

User.mapper = mapper(User, users)

# select users
c = users.select().execute()

# get objects
userlist = User.mapper.instances(c)
# define a second class/mapper
class Address(object): pass

Address.mapper = mapper(Address, addresses)

# select users and addresses in one query
s = select([users, addresses], users.c.user_id==addresses.c.user_id)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = User.mapper.instances(s.execute(), Address.mapper)

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]
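Conceptually, chained mappers each consume their own slice of the joined row's columns. A plain Python sketch of that splitting (invented helper, not the instances() implementation):

```python
def split_rows(rows, widths):
    """split each joined result row into per-mapper column tuples.

    rows   - list of flat result tuples from a joined select
    widths - how many columns each mapper consumes, in order
    """
    result = []
    for row in rows:
        pieces, i = [], 0
        for w in widths:
            pieces.append(row[i:i + w])   # this mapper's slice of the row
            i += w
        result.append(pieces)
    return result

# a joined users/addresses row: (user_id, user_name) + (address_id, user_id, street)
rows = [(1, 'jack', 10, 1, '12 oak st')]
for user_cols, address_cols in split_rows(rows, [2, 3]):
    print(user_cols)      # (1, 'jack')
    print(address_cols)   # (10, 1, '12 oak st')
```

Each mapper then builds (or locates in the identity map) an instance from its slice.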
Mappers can have their functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs.
class MapperExtension(object):
    def create_instance(self, mapper, row, imap, class_):
        """called when a new object instance is about to be created from a row.
        the method can choose to create the instance itself, or it can return
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        class_ - the class we are mapping.
        """

    def append_result(self, mapper, row, imap, result, instance, isnew, populate_existing=False):
        """called when an object instance is being appended to a result list.
        If it returns True, it is assumed that this method handled the appending itself.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        result - an instance of util.HistoryArraySet(), which may be an attribute
        on an object if this is a related object load (lazy or eager). use
        result.append_nohistory(value) to append objects to this list.

        instance - the object instance to be appended to the result

        isnew - indicates if this is the first time we have seen this object
        instance in the current result set. if you are selecting from a join,
        such as an eager load, you might see the same object instance many
        times in the same result set.

        populate_existing - usually False, indicates if object instances that
        were already in the main identity map, i.e. were loaded by a previous
        select(), get their attributes overwritten
        """

    def before_insert(self, mapper, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that
        aren't handled otherwise."""

    def after_insert(self, mapper, instance):
        """called after an object instance has been INSERTed"""

    def before_delete(self, mapper, instance):
        """called before an object instance is DELETEd"""
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
mapper = mapper(User, users, extension=MyExtension())
An existing mapper can create a copy of itself using an extension via the extension option:
extended_mapper = mapper.options(extension(MyExtension()))
This section is a quick summary of what's going on when you send a class to the mapper() function. This material is not required in order to use SQLAlchemy; it is a little more dense and should be approached patiently!
The primary changes to a class that is mapped involve attaching property objects to it which represent table columns. These property objects essentially track changes. In addition, the __init__ method of the object is decorated to track object creates.
Here is a quick rundown of all the changes in code form:
# step 1 - override __init__ to 'register_new' with the Unit of Work
oldinit = myclass.__init__
def init(self, *args, **kwargs):
    nohist = kwargs.pop('_mapper_nohistory', False)
    oldinit(self, *args, **kwargs)
    if not nohist:
        # register_new with Unit Of Work
        objectstore.uow().register_new(self)
myclass.__init__ = init

# step 2 - set a string identifier that will
# locate the classes' primary mapper
myclass._mapper = mapper.hashkey

# step 3 - add column accessor
myclass.c = mapper.columns

# step 4 - attribute decorating.
# this happens mostly within the package sqlalchemy.attributes

# this dictionary will store a series of callables
# that generate "history" containers for
# individual object attributes
myclass._class_managed_attributes = {}

# create individual properties for each column -
# these objects know how to talk
# to the attribute package to create appropriate behavior.
# the next example examines the attributes package more closely.
myclass.column1 = SmartProperty().property('column1', uselist=False)
myclass.column2 = SmartProperty().property('column2', uselist=True)
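Step 1 can be demonstrated standalone. The sketch below wraps a class's __init__ exactly as described, with a plain list standing in for the Unit of Work's register_new (the list and class names are invented for illustration):

```python
# a stand-in for objectstore.uow().register_new
new_objects = []

class MyClass(object):
    def __init__(self, name):
        self.name = name

oldinit = MyClass.__init__

def init(self, *args, **kwargs):
    # pop the flag before calling the original constructor,
    # which doesn't know about it
    nohist = kwargs.pop('_mapper_nohistory', False)
    oldinit(self, *args, **kwargs)
    if not nohist:
        new_objects.append(self)

MyClass.__init__ = init

a = MyClass('ed')                             # registered as "new"
b = MyClass('jack', _mapper_nohistory=True)   # skipped, e.g. loaded from the DB
print([o.name for o in new_objects])          # ['ed']
```

The _mapper_nohistory flag is how a mapper constructs instances from result rows without them being mistaken for brand-new, unsaved objects.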
The attribute package is used when save operations occur to get a handle on modified values. In the example below, a full round-trip attribute tracking operation is illustrated:
import sqlalchemy.attributes as attributes

# create an attribute manager.
# the sqlalchemy.mapping package keeps one of these around as
# 'objectstore.global_attributes'
manager = attributes.AttributeManager()

# regular old new-style class
class MyClass(object): pass

# register a scalar and a list attribute
manager.register_attribute(MyClass, 'column1', uselist=False)
manager.register_attribute(MyClass, 'column2', uselist=True)

# create/modify an object
obj = MyClass()
obj.column1 = 'this is a new value'
obj.column2.append('value 1')
obj.column2.append('value 2')

# get history objects
col1_history = manager.get_history(obj, 'column1')
col2_history = manager.get_history(obj, 'column2')

# whats new ?
>>> col1_history.added_items()
['this is a new value']
>>> col2_history.added_items()
['value 1', 'value 2']

# commit changes
manager.commit(obj)

# the new values become the "unchanged" values
>>> col1_history.added_items()
[]
>>> col1_history.unchanged_items()
['this is a new value']
>>> col2_history.added_items()
[]
>>> col2_history.unchanged_items()
['value 1', 'value 2']
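The history container idea can be boiled down to a few lines of plain Python. This is a hypothetical sketch of the kind of object get_history returns for a scalar attribute, not the real implementation:

```python
class ScalarHistory:
    """minimal sketch of added/unchanged tracking for one scalar attribute."""
    def __init__(self):
        self.committed = None   # the "unchanged" value from the last commit
        self.current = None
        self.modified = False

    def set(self, value):
        self.current, self.modified = value, True

    def added_items(self):
        # a pending value counts as "added" until committed
        return [self.current] if self.modified else []

    def unchanged_items(self):
        return [] if self.modified or self.committed is None else [self.committed]

    def commit(self):
        # the new value becomes the "unchanged" baseline
        self.committed, self.modified = self.current, False

h = ScalarHistory()
h.set('this is a new value')
print(h.added_items())       # ['this is a new value']
h.commit()
print(h.added_items())       # []
print(h.unchanged_items())   # ['this is a new value']
```

The save machinery reads added_items() to find out what needs to go into the UPDATE or INSERT, then commits the history once the row is written.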
The above AttributeManager also includes a method value_changed, which is triggered whenever change events occur on the managed object attributes. The Unit of Work (objectstore) package overrides this method in order to receive change events; it is essentially this:
import sqlalchemy.attributes as attributes

class UOWAttributeManager(attributes.AttributeManager):
    def value_changed(self, obj, key, value):
        if hasattr(obj, '_instance_key'):
            uow().register_dirty(obj)
        else:
            uow().register_new(obj)

global_attributes = UOWAttributeManager()
Objects that contain the attribute "_instance_key" are already registered with the Identity Map, and are assumed to have come from the database. They therefore get marked as "dirty" when changes happen. Objects without an "_instance_key" are not from the database, and get marked as "new" when changes happen, although usually this will already have occurred via the object's __init__ method.
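The new-versus-dirty dispatch can be exercised in isolation. The sketch below is a self-contained imitation of the pattern, with a minimal stand-in UnitOfWork (the class and function names are invented for illustration):

```python
class UnitOfWork:
    """minimal stand-in for the objectstore's new/dirty registries."""
    def __init__(self):
        self.new, self.dirty = set(), set()

    def register_new(self, obj):
        self.new.add(obj)

    def register_dirty(self, obj):
        self.dirty.add(obj)

uow = UnitOfWork()

def value_changed(obj, key, value):
    # objects carrying an _instance_key came from the database -> dirty;
    # everything else is pending -> new
    if hasattr(obj, '_instance_key'):
        uow.register_dirty(obj)
    else:
        uow.register_new(obj)

class User:
    pass

fresh = User()                        # never saved
value_changed(fresh, 'name', 'ed')

loaded = User()                       # pretend this came back from a select()
loaded._instance_key = (User, (7,))
value_changed(loaded, 'name', 'jack')

print(fresh in uow.new, loaded in uow.dirty)   # True True
```

At commit time, "new" objects produce INSERT statements while "dirty" ones produce UPDATEs against their identity-map key.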