The SQLAlchemy SQL Toolkit and Object Relational Mapper is a comprehensive set of tools for working with databases and Python. It has several distinct areas of functionality which can be used individually or combined together. Its major API components, all public-facing, are illustrated below:
+-----------------------------------------------------------+ | Object Relational Mapper (ORM) | | [tutorial] [docs] | +-----------------------------------------------------------+ +---------+ +------------------------------------+ +--------+ | | | SQL Expression Language | | | | | | [tutorial] [docs] | | | | | +------------------------------------+ | | | +-----------------------+ +--------------+ | | Dialect/Execution | | Schema Management | | [docs] | | [docs] | +---------------------------------+ +-----------------------+ +----------------------+ +----------------------------------+ | Connection Pooling | | Types | | [docs] | | [docs] | +----------------------+ +----------------------------------+
Above, the two most significant front-facing portions of SQLAlchemy are the Object Relational Mapper and the SQL Expression Language. These are two separate toolkits, one building off the other. SQL Expressions can be used independently of the ORM. When using the ORM, the SQL Expression language is used to establish object-relational configurations as well as in querying.
back to section topMetaData
and Table
objects; reading database schemas into your application, creating and dropping tables, constraints, defaults, sequences, indexes.
Installing SQLAlchemy from scratch is most easily achieved with setuptools. (setuptools installation). Just run this from the command-line:
# easy_install SQLAlchemy
This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.
Otherwise, you can install from the distribution using the setup.py
script:
# python setup.py install
SQLAlchemy is designed to operate with a DB-API implementation built for a particular database, and includes support for the most popular databases:
This documentation covers SQLAlchemy version 0.4. If you're working on a system that already has SQLAlchemy installed, check the version from your Python prompt like this:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.4.0
From version 0.3 to version 0.4 of SQLAlchemy, some conventions have changed. Most of these conventions are available in the most recent releases of the 0.3 series starting with version 0.3.9, so that you can make a 0.3 application compatible with 0.4 in most cases.
This section will detail only those things that have changed in a backwards-incompatible manner. For a full overview of everything that's new and changed, see WhatsNewIn04.
All symbols related to the SQLAlchemy Object Relational Mapper, i.e. names like mapper()
, relation()
, backref()
, create_session()
synonym()
, eagerload()
, etc. are now only in the sqlalchemy.orm
package, and not in sqlalchemy
. So if you were previously importing everything on an asterisk:
from sqlalchemy import *
You should now import separately from orm:
from sqlalchemy import * from sqlalchemy.orm import *
Or more commonly, just pull in the names you'll need:
from sqlalchemy import create_engine, MetaData, Table, Column, types from sqlalchemy.orm import mapper, relation, backref, create_session
The BoundMetaData
name is removed. Now, you just use MetaData
. Additionally, the engine
parameter/attribute is now called bind
, and connect()
is deprecated:
# plain metadata meta = MetaData() # metadata bound to an engine meta = MetaData(engine) # bind metadata to an engine later meta.bind = engine
Additionally, DynamicMetaData
is now known as ThreadLocalMetaData
.
The methods correlate()
, order_by()
, and group_by()
on the select()
construct now return a new select object, and do not change the original one. Additionally, the generative methods where()
, column()
, distinct()
, and several others have been added:
s = table.select().order_by(table.c.id).where(table.c.x==7) result = engine.execute(s)
If you've been using the collection_class
option on mapper()
, the requirements for instrumented collections have changed. For an overview, see Alternate Collection Implementations.
This is for create/drop statements, sessions, SQL constructs, metadatas:
myengine = create_engine('sqlite://') meta = MetaData(myengine) meta2 = MetaData() meta2.bind = myengine session = create_session(bind=myengine) statement = select([table], bind=myengine) meta.create_all(bind=myengine)
This mostly applies to SQL constructs where you pass a type in:
s = select([mytable], mytable.c.x=bindparam(y, type_=DateTime)) func.now(type_=DateTime)
In this tutorial we will cover a basic SQLAlchemy object-relational mapping scenario, where we store and retrieve Python objects from a database representation. The database schema will begin with one table, and will later develop into several. The tutorial is in doctest format, meaning each >>>
line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.
A quick check to verify that we are on at least version 0.4 of SQLAlchemy:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.4.0
For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine()
:
>>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///:memory:', echo=True)
The echo
flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging
module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False
. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see whats being generated.
Next we want to tell SQLAlchemy about our tables. We will start with just a single table called users
, which will store records for the end-users using our application (lets assume its a website). We define our tables all within a catalog called MetaData
, using the Table
construct, which resembles regular SQL CREATE TABLE syntax:
>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey >>> metadata = MetaData() >>> users_table = Table('users', metadata, ... Column('id', Integer, primary_key=True), ... Column('name', String(40)), ... Column('fullname', String(100)), ... Column('password', String(15)) ... )
All about how to define Table
objects, as well as how to create them from an existing database automatically, is described in Database Meta Data.
Next, to tell the MetaData
we'd actually like to create our users_table
for real inside the SQLite database, we use create_all()
, passing it the engine
instance which points to our database. This will check for the presence of a table first before creating, so its safe to call multiple times:
sql>>> metadata.create_all(engine)
So now our database is created, our initial schema is present, and our SQLAlchemy application knows all about the tables and columns in the database; this information is to be re-used by the Object Relational Mapper, as we'll see now.
back to section topSo lets create a rudimental User
object to be mapped in the database. This object will for starters have three attributes, name
, fullname
and password
. It only need subclass Python's built-in object
class (i.e. its a new style class). We will give it a constructor so that it may conveniently be instantiated with its attributes at once, as well as a __repr__
method so that we can get a nice string representation of it:
>>> class User(object): ... def __init__(self, name, fullname, password): ... self.name = name ... self.fullname = fullname ... self.password = password ... ... def __repr__(self): ... return "<User('%s','%s', '%s')>" % (self.name, self.fullname, self.password)
With our users_table
and User
class, we now want to map the two together. That's where the SQLAlchemy ORM package comes in. We'll use the mapper
function to create a mapping between users_table
and User
:
>>> from sqlalchemy.orm import mapper >>> mapper(User, users_table) <sqlalchemy.orm.mapper.Mapper object at 0x...>
The mapper()
function creates a new Mapper
object and stores it away for future reference. It also instruments the attributes on our User
class, corresponding to the users_table
table. The id
, name
, fullname
, and password
columns in our users_table
are now instrumented upon our User
class, meaning it will keep track of all changes to these attributes, and can save and load their values to/from the database. Lets create our first user, 'Ed Jones', and ensure that the object has all three of these attributes:
>>> ed_user = User('ed', 'Ed Jones', 'edspassword') >>> ed_user.name 'ed' >>> ed_user.password 'edspassword' >>> str(ed_user.id) 'None'
What was that last id
attribute? That was placed there by the Mapper
, to track the value of the id
column in the users_table
. Since our User
doesn't exist in the database, it's id is None
. When we save the object, it will get populated automatically with its new id.
We're now ready to start talking to the database. The ORM's "handle" to the database is the Session
. When we first set up the application, at the same level as our create_engine()
statement, we define a second object called Session
(or whatever you want to call it, create_session
, etc.) which is configured by the sessionmaker()
function. This function is configurational and need only be called once.
>>> from sqlalchemy.orm import sessionmaker >>> Session = sessionmaker(bind=engine, autoflush=True, transactional=True)
In the case where your application does not yet have an Engine
when you define your module-level objects, just set it up like this:
>>> Session = sessionmaker(autoflush=True, transactional=True)
Later, when you create your engine with create_engine()
, connect it to the Session
using configure()
:
>>> Session.configure(bind=engine) # once engine is available
This Session
class will create new Session
objects which are bound to our database and have the transactional characteristics we've configured. Whenever you need to have a conversation with the database, you instantiate a Session
:
>>> session = Session()
The above Session
is associated with our SQLite engine
, but it hasn't opened any connections yet. When it's first used, it retrieves a connection from a pool of connections maintained by the engine
, and holds onto it until we commit all changes and/or close the session object. Because we configured transactional=True
, theres also a transaction in progress (one notable exception to this is MySQL, when you use its default table style of MyISAM). There's options available to modify this behavior but we'll go with this straightforward version to start.
So saving our User
is as easy as issuing save()
:
>>> session.save(ed_user)
But yo'll notice nothing has happened yet. Well, lets pretend something did, and try to query for our user. This is done using the query()
method on Session
. We create a new query representing the set of all User
objects first. Then we narrow the results by "filtering" down to the user we want; that is, the user whose name
attribute is "ed"
. Finally we call first()
which tells Query
, "we'd like the first result in this list".
sql>>> session.query(User).filter_by(name='ed').first()
<User('ed','Ed Jones', 'edspassword')>
And we get back our new user. If you view the generated SQL, yo'll see that the Session
issued an INSERT
statement before querying. The Session
stores whatever you put into it in memory, and at certain points it issues a flush, which issues SQL to the database to store all pending new objects and changes to existing objects. You can manually invoke the flush operation using flush()
; however when the Session
is configured to autoflush
, its usually not needed.
OK, let's do some more operations. We'll create and save three more users:
>>> session.save(User('wendy', 'Wendy Williams', 'foobar')) >>> session.save(User('mary', 'Mary Contrary', 'xxg527')) >>> session.save(User('fred', 'Fred Flinstone', 'blah'))
Also, Ed has already decided his password isn't too secure, so lets change it:
>>> ed_user.password = 'f8s7ccs'
Then we'll permanently store everything thats been changed and added to the database. We do this via commit()
:
sql>>> session.commit()
commit()
flushes whatever remaining changes remain to the database, and commits the transaction. The connection resources referenced by the session are now returned to the connection pool. Subsequent operations with this session will occur in a new transaction, which will again re-acquire connection resources when first needed.
If we look at Ed's id
attribute, which earlier was None
, it now has a value:
>>> ed_user.id 1
After each INSERT
operation, the Session
assigns all newly generated ids and column defaults to the mapped object instance. For column defaults which are database-generated and are not part of the table's primary key, they'll be loaded when you first reference the attribute on the instance.
One crucial thing to note about the Session
is that each object instance is cached within the Session, based on its primary key identitifer. The reason for this cache is not as much for performance as it is for maintaining an identity map of instances. This map guarantees that whenever you work with a particular User
object in a session, you always get the same instance back. As below, reloading Ed gives us the same instance back:
sql>>> ed_user is session.query(User).filter_by(name='ed').one()
True
The get()
method, which queries based on primary key, will not issue any SQL to the database if the given key is already present:
>>> ed_user is session.query(User).get(ed_user.id) True
A whirlwind tour through querying.
A Query
is created from the Session
, relative to a particular class we wish to load.
>>> query = session.query(User)
Once we have a query, we can start loading objects. The Query object, when first created, represents all the instances of its main class. You can iterate through it directly:
sql>>> for user in session.query(User): ... print user.name
ed wendy mary fred
...and the SQL will be issued at the point where the query is evaluated as a list. If you apply array slices before iterating, LIMIT and OFFSET are applied to the query:
sql>>> for u in session.query(User)[1:3]: ... print u
<User('wendy','Wendy Williams', 'foobar')> <User('mary','Mary Contrary', 'xxg527')>
Narrowing the results down is accomplished either with filter_by()
, which uses keyword arguments:
sql>>> for user in session.query(User).filter_by(name='ed', fullname='Ed Jones'): ... print user
<User('ed','Ed Jones', 'f8s7ccs')>
...or filter()
, which uses SQL expression language constructs. These allow you to use regular Python operators with the class-level attributes on your mapped class:
sql>>> for user in session.query(User).filter(User.name=='ed'): ... print user
<User('ed','Ed Jones', 'f8s7ccs')>
You can also use the Column
constructs attached to the users_table
object to construct SQL expressions:
sql>>> for user in session.query(User).filter(users_table.c.name=='ed'): ... print user
<User('ed','Ed Jones', 'f8s7ccs')>
Most common SQL operators are available, such as LIKE
:
sql>>> session.query(User).filter(User.name.like('%ed'))[1]
<User('fred','Fred Flinstone', 'blah')>
Note above our array index of 1
placed the appropriate LIMIT/OFFSET and returned a scalar result immediately.
The all()
, one()
, and first()
methods immediately issue SQL without using an iterative context or array index. all()
returns a list:
>>> query = session.query(User).filter(User.name.like('%ed')) sql>>> query.all()
[<User('ed','Ed Jones', 'f8s7ccs')>, <User('fred','Fred Flinstone', 'blah')>]
first()
applies a limit of one and returns the first result as a scalar:
sql>>> query.first()
<User('ed','Ed Jones', 'f8s7ccs')>
and one()
, applies a limit of two, and if not exactly one row returned (no more, no less), raises an error:
sql>>> try: ... user = query.one() ... except Exception, e: ... print e
Multiple rows returned for one()
All Query
methods that don't return a result instead return a new Query
object, with modifications applied. Therefore you can call many query methods successively to build up the criterion you want:
sql>>> session.query(User).filter(User.id<2).filter_by(name='ed').\ ... filter(User.fullname=='Ed Jones').all()
[<User('ed','Ed Jones', 'f8s7ccs')>]
If you need to use other conjunctions besides AND
, all SQL conjunctions are available explicitly within expressions, such as and_()
and or_()
, when using filter()
:
>>> from sqlalchemy import and_, or_ sql>>> session.query(User).filter( ... and_(User.id<224, or_(User.name=='ed', User.name=='wendy')) ... ).all()
[<User('ed','Ed Jones', 'f8s7ccs')>, <User('wendy','Wendy Williams', 'foobar')>]
You also have full ability to use literal strings to construct SQL. For a single criterion, use a string with filter()
:
sql>>> for user in session.query(User).filter("id<224").all(): ... print user.name
ed wendy mary fred
Bind parameters can be specified with string-based SQL, using a colon. To specify the values, use the params()
method:
sql>>> session.query(User).filter("id<:value and name=:name").\ ... params(value=224, name='fred').one()
<User('fred','Fred Flinstone', 'blah')>
Note that when we use constructed SQL expressions, bind parameters are generated for us automatically; we don't need to worry about them.
To use an entirely string-based statement, using from_statement()
; just ensure that the columns clause of the statement contains the column names normally used by the mapper (below illustrated using an asterisk):
sql>>> session.query(User).from_statement("SELECT * FROM users where name=:name").params(name='ed').all()
[<User('ed','Ed Jones', 'f8s7ccs')>]
from_statement()
can also accomodate full select()
constructs. These are described in the SQL Expression Language Tutorial:
>>> from sqlalchemy import select, func sql>>> session.query(User).from_statement( ... select( ... [users_table], ... select([func.max(users_table.c.name)]).label('maxuser')==users_table.c.name) ... ).all()
[<User('wendy','Wendy Williams', 'foobar')>]
There's also a way to combine scalar results with objects, using add_column()
. This is often used for functions and aggregates. When add_column()
(or its cousin add_entity()
, described later) is used, tuples are returned:
sql>>> for r in session.query(User).\ ... add_column(select([func.max(users_table.c.name)]).label('maxuser')): ... print r
(<User('ed','Ed Jones', 'f8s7ccs')>, u'wendy') (<User('wendy','Wendy Williams', 'foobar')>, u'wendy') (<User('mary','Mary Contrary', 'xxg527')>, u'wendy') (<User('fred','Fred Flinstone', 'blah')>, u'wendy')
We've spent a lot of time dealing with just one class, and one table. Let's now look at how SQLAlchemy deals with two tables, which have a relationship to each other. Let's say that the users in our system also can store any number of email addresses associated with their username. This implies a basic one to many association from the users_table
to a new table which stores email addresess, which we will call addresses
. We will also create a relationship between this new table to the users table, using a ForeignKey
:
>>> from sqlalchemy import ForeignKey >>> addresses_table = Table('addresses', metadata, ... Column('id', Integer, primary_key=True), ... Column('email_address', String(100), nullable=False), ... Column('user_id', Integer, ForeignKey('users.id')))
Another call to create_all()
will skip over our users
table and build just the new addresses
table:
sql>>> metadata.create_all(engine)
For our ORM setup, we're going to start all over again. We will first close out our Session
and clear all Mapper
objects:
>>> from sqlalchemy.orm import clear_mappers >>> session.close() >>> clear_mappers()
Our User
class, still around, reverts to being just a plain old class. Lets create an Address
class to represent a user's email address:
>>> class Address(object): ... def __init__(self, email_address): ... self.email_address = email_address ... ... def __repr__(self): ... return "<Address('%s')>" % self.email_address
Now comes the fun part. We define a mapper for each class, and associate them using a function called relation()
. We can define each mapper in any order we want:
>>> from sqlalchemy.orm import relation >>> mapper(User, users_table, properties={ ... 'addresses':relation(Address, backref='user') ... }) <sqlalchemy.orm.mapper.Mapper object at 0x...> >>> mapper(Address, addresses_table) <sqlalchemy.orm.mapper.Mapper object at 0x...>
Above, the new thing we see is that User
has defined a relation named addresses
, which will reference a list of Address
objects. How does it know it's a list ? SQLAlchemy figures it out for you, based on the foreign key relationship between users_table
and addresses_table
.
Now when we create a User
, it automatically has this collection present:
>>> jack = User('jack', 'Jack Bean', 'gjffdd') >>> jack.addresses []
We are free to add Address
objects, and the session
will take care of everything for us.
>>> jack.addresses.append(Address(email_address='jack@google.com')) >>> jack.addresses.append(Address(email_address='j25@yahoo.com'))
Before we save into the Session
, lets examine one other thing that's happened here. The addresses
collection is present on our User
because we added a relation()
with that name. But also within the relation()
function is the keyword backref
. This keyword indicates that we wish to make a bi-directional relationship. What this basically means is that not only did we generate a one-to-many relationship called addresses
on the User
class, we also generated a many-to-one relationship on the Address
class. This relationship is self-updating, without any data being flushed to the database, as we can see on one of Jack's addresses:
>>> jack.addresses[1] <Address('j25@yahoo.com')> >>> jack.addresses[1].user <User('jack','Jack Bean', 'gjffdd')>
Let's save into the session, then close out the session and create a new one...so that we can see how Jack
and his email addresses come back to us:
>>> session.save(jack) sql>>> session.commit()
>>> session = Session()
Querying for Jack, we get just Jack back. No SQL is yet issued for for Jack's addresses:
sql>>> jack = session.query(User).filter_by(name='jack').one()
>>> jack <User('jack','Jack Bean', 'gjffdd')>
Let's look at the addresses
collection. Watch the SQL:
sql>>> jack.addresses
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
When we accessed the addresses
collection, SQL was suddenly issued. This is an example of a lazy loading relation.
If you want to reduce the number of queries (dramatically, in many cases), we can apply an eager load to the query operation. We clear out the session to ensure that a full reload occurs:
>>> session.clear()
Then apply an option to the query, indicating that we'd like addresses
to load "eagerly". SQLAlchemy then constructs a join between the users
and addresses
tables:
>>> from sqlalchemy.orm import eagerload sql>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').one()
>>> jack <User('jack','Jack Bean', 'gjffdd')> >>> jack.addresses [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
If you think that query is elaborate, it is ! But SQLAlchemy is just getting started. Note that when using eager loading, nothing changes as far as the ultimate results returned. The "loading strategy", as its called, is designed to be completely transparent in all cases, and is for optimization purposes only. Any query criterion you use to load objects, including ordering, limiting, other joins, etc., should return identical results regardless of the combination of lazily- and eagerly- loaded relationships present.
back to section topWhich brings us to the next big topic. What if we want to create joins that do change the results ? For that, another Query
tornado is coming....
One way to join two tables together is just to compose a SQL expression. Below we make one up using the id
and user_id
attributes on our mapped classes:
sql>>> session.query(User).filter(User.id==Address.user_id).\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
Or we can make a real JOIN construct; below we use the join()
function available on Table
to create a Join
object, then tell the Query
to use it as our FROM clause:
sql>>> session.query(User).select_from(users_table.join(addresses_table)).\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
Note that the join()
construct has no problem figuring out the correct join condition between users_table
and addresses_table
..the ForeignKey
we constructed says it all.
The easiest way to join is automatically, using the join()
method on Query
. Just give this method the path from A to B, using the name of a mapped relationship directly:
sql>>> session.query(User).join('addresses').\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
By "A to B", we mean a single relation name or a path of relations. In our case we only have User->addresses->Address
configured, but if we had a setup like A->bars->B->bats->C->widgets->D
, a join along all four entities would look like:
session.query(Foo).join(['bars', 'bats', 'widgets']).filter(...)
Each time join()
is called on Query
, the joinpoint of the query is moved to be that of the endpoint of the join. As above, when we joined from users_table
to addresses_table
, all subsequent criterion used by filter_by()
are against the addresses
table. When you join()
again, the joinpoint starts back from the root. We can also backtrack to the beginning explicitly using reset_joinpoint()
. This instruction will place the joinpoint back at the root users
table, where subsequent filter_by()
criterion are again against users
:
sql>>> session.query(User).join('addresses').\ ... filter_by(email_address='jack@google.com').\ ... reset_joinpoint().filter_by(name='jack').all()
[<User('jack','Jack Bean', 'gjffdd')>]
In all cases, we can get the User
and the matching Address
objects back at the same time, by telling the session we want both. This returns the results as a list of tuples:
sql>>> session.query(User).add_entity(Address).join('addresses').\ ... filter(Address.email_address=='jack@google.com').all()
[(<User('jack','Jack Bean', 'gjffdd')>, <Address('jack@google.com')>)]
Another common scenario is the need to join on the same table more than once. For example, if we want to find a User
who has two distinct email addresses, both jack@google.com
as well as j25@yahoo.com
, we need to join to the Addresses
table twice. SQLAlchemy does provide Alias
objects which can accomplish this; but far easier is just to tell join()
to alias for you:
sql>>> session.query(User).\ ... join('addresses', aliased=True).filter(Address.email_address=='jack@google.com').\ ... join('addresses', aliased=True).filter(Address.email_address=='j25@yahoo.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
The key thing which occured above is that our SQL criterion were aliased as appropriate corresponding to the alias generated in the most recent join()
call.
The next section describes some "higher level" operators, including any()
and has()
, which make patterns like joining to multiple aliases unnecessary in most cases.
A summary of all operators usable on relations:
Filter on explicit column criterion, combined with a join. Column criterion can make usage of all supported SQL operators and expression constructs:
sql>>> session.query(User).join('addresses').\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
Criterion placed in filter()
usually correspond to the last join()
call; if the join was specified with aliased=True
, class-level criterion against the join's target (or targets) will be appropriately aliased as well.
sql>>> session.query(User).join('addresses', aliased=True).\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
Filter_by on key=value criterion, combined with a join. Same as filter()
on column criterion except keyword arguments are used.
sql>>> session.query(User).join('addresses').\ ... filter_by(email_address='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
Filter on explicit column criterion using any()
(for collections) or has()
(for scalar relations). This is a more succinct method than joining, as an EXISTS
subquery is generated automatically. any()
means, "find all parent items where any child item of its collection meets this criterion":
sql>>> session.query(User).\ ... filter(User.addresses.any(Address.email_address=='jack@google.com')).all()
[<User('jack','Jack Bean', 'gjffdd')>]
has()
means, "find all parent items where the child item meets this criterion":
sql>>> session.query(Address).\ ... filter(Address.user.has(User.name=='jack')).all()
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
Both has()
and any()
also accept keyword arguments which are interpreted against the child classes' attributes:
sql>>> session.query(User).\ ... filter(User.addresses.any(email_address='jack@google.com')).all()
[<User('jack','Jack Bean', 'gjffdd')>]
Filter_by on instance identity criterion. When comparing to a related instance, filter_by()
will in most cases not need to reference the child table, since a child instance already contains enough information with which to generate criterion against the parent table. filter_by()
uses an equality comparison for all relationship types. For many-to-one and one-to-one, this represents all objects which reference the given child object:
# locate a user sql>>> user = session.query(User).filter(User.name=='jack').one()
# use the user in a filter_by() expression sql>>> session.query(Address).filter_by(user=user).all()
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
For one-to-many and many-to-many, it represents all objects which contain the given child object in the related collection:
Select instances with a particular parent. This is the "reverse" operation of filtering by instance identity criterion; the criterion is against a relation pointing to the desired class, instead of one pointing from it. This will utilize the same "optimized" query criterion, usually not requiring any joins:
sql>>> session.query(Address).with_parent(user, property='addresses').all()
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
Filter on a many-to-one/one-to-one instance identity criterion. The class-level ==
operator will act the same as filter_by()
for a scalar relation:
sql>>> session.query(Address).filter(Address.user==user).all()
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
whereas the !=
operator will generate a negated EXISTS clause:
sql>>> session.query(Address).filter(Address.user!=user).all()
[]
a comparison to None
also generates a negated EXISTS clause:
sql>>> session.query(Address).filter(Address.user==None).all()
[]
Filter on a one-to-many instance identity criterion. The contains()
operator returns all parent objects which contain the given object as one of its collection members:
sql>>> session.query(User).filter(User.addresses.contains(address)).all()
[<User('jack','Jack Bean', 'gjffdd')>]
Filter on a multiple one-to-many instance identity criterion. The ==
operator can be used with a collection-based attribute against a list of items, which will generate multiple EXISTS
clauses:
Let's try to delete jack
and see how that goes. We'll mark as deleted in the session, then we'll issue a count
query to see that no rows remain:
>>> session.delete(jack) sql>>> session.query(User).filter_by(name='jack').count()
0
So far, so good. How about Jack's Address
objects ?
sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
2
Uh oh, they're still there ! Anaylzing the flush SQL, we can see that the user_id
column of each addresss was set to NULL, but the rows weren't deleted. SQLAlchemy doesn't assume that deletes cascade, you have to tell it so.
So let's rollback our work, and start fresh with new mappers that express the relationship the way we want:
sql>>> session.rollback() # roll back the transaction
>>> session.clear() # clear the session >>> clear_mappers() # clear mappers
We need to tell the addresses
relation on User
that we'd like session.delete() operations to cascade down to the child Address
objects. Further, we also want Address
objects which get detached from their parent User
, whether or not the parent is deleted, to be deleted. For these behaviors we use two cascade options delete
and delete-orphan
, using the string-based cascade
option to the relation()
function:
>>> mapper(User, users_table, properties={ ... 'addresses':relation(Address, backref='user', cascade="all, delete, delete-orphan") ... }) <sqlalchemy.orm.mapper.Mapper object at 0x...> >>> mapper(Address, addresses_table) <sqlalchemy.orm.mapper.Mapper object at 0x...>
Now when we load Jack, removing an address from his addresses
collection will result in that Address
being deleted:
# load Jack by primary key sql>>> jack = session.query(User).get(jack.id)
# remove one Address (lazy load fires off) sql>>> del jack.addresses[1]
# only one address remains sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
1
Deleting Jack will delete both Jack and his remaining Address
:
>>> session.delete(jack) sql>>> session.commit()
sql>>> session.query(User).filter_by(name='jack').count()
0 sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
0
We're moving into the bonus round here, but lets show off a many-to-many relationship. We'll sneak in some other features too, just to take a tour. We'll make our application a blog application, where users can write BlogPost
s, which have Keywords
associated with them.
First some new tables:
>>> post_table = Table('posts', metadata, ... Column('id', Integer, primary_key=True), ... Column('user_id', Integer, ForeignKey('users.id')), ... Column('headline', String(255), nullable=False), ... Column('body', String) ... ) >>> post_keywords = Table('post_keywords', metadata, ... Column('post_id', Integer, ForeignKey('posts.id')), ... Column('keyword_id', Integer, ForeignKey('keywords.id'))) >>> keywords_table = Table('keywords', metadata, ... Column('id', Integer, primary_key=True), ... Column('keyword', String(50), nullable=False, unique=True)) sql>>> metadata.create_all(engine)
Then some classes:
>>> class BlogPost(object): ... def __init__(self, headline, body, author): ... self.author = author ... self.headline = headline ... self.body = body ... def __repr__(self): ... return "BlogPost(%r, %r, %r)" % (self.headline, self.body, self.author) >>> class Keyword(object): ... def __init__(self, keyword): ... self.keyword = keyword
And the mappers. BlogPost
will reference User
via its author
attribute:
>>> from sqlalchemy.orm import backref >>> mapper(Keyword, keywords_table) <sqlalchemy.orm.mapper.Mapper object at 0x...> >>> mapper(BlogPost, post_table, properties={ ... 'author':relation(User, backref=backref('posts', lazy='dynamic')), ... 'keywords':relation(Keyword, secondary=post_keywords) ... }) <sqlalchemy.orm.mapper.Mapper object at 0x...>
There's three new things in the above mapper:
User
relation has a backref, like we've used before, except this time it references a function called backref()
. This function is used when yo'd like to specify keyword options for the backwards relationship.
backref()
is lazy="dynamic"
. This sets a default loader strategy on the attribute, in this case a special strategy that allows partial loading of results.
keywords
relation uses a keyword argument secondary
to indicate the association table for the many to many relationship from BlogPost
to Keyword
.
Usage is not too different from what we've been doing. Let's give Wendy some blog posts:
sql>>> wendy = session.query(User).filter_by(name='wendy').one()
>>> post = BlogPost("Wendy's Blog Post", "This is a test", wendy) >>> session.save(post)
We're storing keywords uniquely in the database, but we know that we don't have any yet, so we can just create them:
>>> post.keywords.append(Keyword('wendy')) >>> post.keywords.append(Keyword('firstpost'))
We can now look up all blog posts with the keyword 'firstpost'. We'll use a special collection operator any
to locate "blog posts where any of its keywords has the keyword string 'firstpost'":
sql>>> session.query(BlogPost).filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
If we want to look up just Wendy's posts, we can tell the query to narrow down to her as a parent:
sql>>> session.query(BlogPost).with_parent(wendy).\ ... filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
Or we can use Wendy's own posts
relation, which is a "dynamic" relation, to query straight from there:
sql>>> wendy.posts.filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
Generated Documentation for Query: class Query(object)
ORM Generated Docs: module sqlalchemy.orm
Further information on mapping setups are in Mapper Configuration.
Further information on working with Sessions: Using the Session.
back to section topThis tutorial will cover SQLAlchemy SQL Expressions, which are Python constructs that represent SQL statements. The tutorial is in doctest format, meaning each >>>
line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.
A quick check to verify that we are on at least version 0.4 of SQLAlchemy:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.4.0
For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine()
:
>>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///:memory:', echo=True)
The echo
flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging
module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False
. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see whats being generated.
The SQL Expression Language constructs its expressions in most cases against table columns. In SQLAlchemy, a column is most often represented by an object called Column
, and in all cases a Column
is associated with a Table
. A collection of Table
objects and their associated child objects is referred to as database metadata. In this tutorial we will explicitly lay out several Table
objects, but note that SA can also "import" whole sets of Table
objects automatically from an existing database (this process is called table reflection).
We define our tables all within a catalog called MetaData
, using the Table
construct, which resembles regular SQL CREATE TABLE statements. We'll make two tables, one of which represents "users" in an application, and another which represents zero or more "email addreses" for each row in the "users" table:
>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey >>> metadata = MetaData() >>> users = Table('users', metadata, ... Column('id', Integer, primary_key=True), ... Column('name', String(40)), ... Column('fullname', String(100)), ... ) >>> addresses = Table('addresses', metadata, ... Column('id', Integer, primary_key=True), ... Column('user_id', None, ForeignKey('users.id')), ... Column('email_address', String(50), nullable=False) ... )
All about how to define Table
objects, as well as how to create them from an existing database automatically, is described in Database Meta Data.
Next, to tell the MetaData
we'd actually like to create our selection of tables for real inside the SQLite database, we use create_all()
, passing it the engine
instance which points to our database. This will check for the presence of each table first before creating, so its safe to call multiple times:
sql>>> metadata.create_all(engine)
The first SQL expression we'll create is the Insert
construct, which represents an INSERT statement. This is typically created relative to its target table:
>>> ins = users.insert()
To see a sample of the SQL this construct produces, use the str()
function:
>>> str(ins) 'INSERT INTO users (id, name, fullname) VALUES (:id, :name, :fullname)'
Notice above that the INSERT statement names every column in the users
table. This can be limited by using the values
keyword, which establishes the VALUES clause of the INSERT explicitly:
>>> ins = users.insert(values={'name':'jack', 'fullname':'Jack Jones'}) >>> str(ins) 'INSERT INTO users (name, fullname) VALUES (:name, :fullname)'
Above, while the values
keyword limited the VALUES clause to just two columns, the actual data we placed in values
didn't get rendered into the string; instead we got named bind parameters. As it turns out, our data is stored within our Insert
construct, but it typically only comes out when the statement is actually executed; since the data consists of literal values, SQLAlchemy automatically generates bind parameters for them. We can peek at this data for now by looking at the compiled form of the statement:
>>> ins.compile().params ClauseParameters:{'fullname': 'Jack Jones', 'name': 'jack'}
The interesting part of an Insert
is executing it. In this tutorial, we will generally focus on the most explicit method of executing a SQL construct, and later touch upon some "shortcut" ways to do it. The engine
object we created is a repository for database connections capable of issuing SQL to the database. To acquire a connection, we use the connect()
method:
>>> conn = engine.connect() >>> conn <sqlalchemy.engine.base.Connection object at 0x...>
The Connection
object represents an actively checked out DBAPI connection resource. Lets feed it our Insert
object and see what happens:
>>> result = conn.execute(ins)
So the INSERT statement was now issued to the database. Although we got positional "qmark" bind parameters instead of "named" bind params in the output. How come ? Because when executed, the Connection
used the SQLite dialect to help generate the statement; when we use the str()
function, the statement isn't aware of this dialect, and falls back onto a default which uses named params. We can view this manually as follows:
>>> from sqlalchemy.databases.sqlite import SQLiteDialect >>> compiled = ins.compile(dialect=SQLiteDialect()) >>> str(compiled) 'INSERT INTO users (name, fullname) VALUES (?, ?)'
What about the result
variable we got when we called execute()
? As the SQLAlchemy Connection
object references a DBAPI connection, the result, known as a ResultProxy
object, is analgous to the DBAPI cursor object. In the case of an INSERT, we can get important information from it, such as the primary key values which were generated from our statement:
>>> result.last_inserted_ids() [1]
The value of 1
was automatically generated by SQLite, but only because we did not specify the id
column in our Insert
statement; otherwise, our explicit value would have been used. In either case, SQLAlchemy always knows how to get at a newly generated primary key value, even though the method of generating them is different across different databases; each databases' Dialect
knows the specific steps needed to determine the correct value (or values; note that last_inserted_ids()
returns a list so that it supports composite primary keys).
Our insert example above was intentionally a little drawn out to show some various behaviors of expression language constructs. In the usual case, an Insert
statement is usually compiled against the parameters sent to the execute()
method on Connection
, so that there's no need to use the values
keyword with Insert
. Lets create a generic Insert
statement again and use it in the "normal" way:
>>> ins = users.insert() >>> conn.execute(ins, id=2, name='wendy', fullname='Wendy Williams')
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Above, because we specified all three columns in the the execute()
method, the compiled Insert
included all three columns. The Insert
statement is compiled at execution time based on the parameters we specified; if we specified fewer parameters, the Insert
would have fewer entries in its VALUES clause.
To issue many inserts using DBAPI's executemany()
method, we can send in a list of dictionaries each containing a distinct set of parameters to be inserted, as we do here to add some email addresses:
>>> conn.execute(addresses.insert(), [ ... {'user_id': 1, 'email_address' : 'jack@yahoo.com'}, ... {'user_id': 1, 'email_address' : 'jack@msn.com'}, ... {'user_id': 2, 'email_address' : 'www@www.org'}, ... {'user_id': 2, 'email_address' : 'wendy@aol.com'}, ... ])
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Above, we again relied upon SQLite's automatic generation of primary key identifiers for each addresses
row.
When executing multiple sets of parameters, each dictionary must have the same set of keys; i.e. you cant have fewer keys in some dictionaries than others. This is because the Insert
statement is compiled against the first dictionary in the list, and its assumed that all subsequent argument dictionaries are compatible with that statement.
We're executing our Insert
using a Connection
. There's two options that allow you to not have to deal with the connection part. You can execute in the connectionless style, using the engine, which opens and closes a connection for you:
sql>>> result = engine.execute(users.insert(), name='fred', fullname="Fred Flintstone")
and you can save even more steps than that, if you connect the Engine
to the MetaData
object we created earlier. When this is done, all SQL expressions which involve tables within the MetaData
object will be automatically bound to the Engine
. In this case, we call it implicit execution:
>>> metadata.bind = engine sql>>> result = users.insert().execute(name="mary", fullname="Mary Contrary")
When the MetaData
is bound, statements will also compile against the engine's dialect. Since a lot of the examples here assume the default dialect, we'll detach the engine from the metadata which we just attached:
>>> metadata.bind = None
Detailed examples of connectionless and implicit execution are available in the "Engines" chapter: Connectionless Execution, Implicit Execution.
back to section topWe began with inserts just so that our test database had some data in it. The more interesting part of the data is selecting it ! We'll cover UPDATE and DELETE statements later. The primary construct used to generate SELECT statements is the select()
function:
>>> from sqlalchemy.sql import select >>> s = select([users]) >>> result = conn.execute(s)
Above, we issued a basic select()
call, placing the users
table within the COLUMNS clause of the select, and then executing. SQLAlchemy expanded the users
table into the set of each of its columns, and also generated a FROM clause for us. The result returned is again a ResultProxy
object, which acts much like a DBAPI cursor, including methods such as fetchone()
and fetchall()
. The easiest way to get rows from it is to just iterate:
>>> for row in result: ... print row (1, u'jack', u'Jack Jones') (2, u'wendy', u'Wendy Williams') (3, u'fred', u'Fred Flintstone') (4, u'mary', u'Mary Contrary')
Above, we see that printing each row produces a simple tuple-like result. We have more options at accessing the data in each row. One very common way is through dictionary access, using the string names of columns:
sql>>> result = conn.execute(s)
>>> row = result.fetchone() >>> print "name:", row['name'], "; fullname:", row['fullname'] name: jack ; fullname: Jack Jones
Integer indexes work as well:
>>> row = result.fetchone() >>> print "name:", row[1], "; fullname:", row[2] name: wendy ; fullname: Wendy Williams
But another way, whose usefulness will become apparent later on, is to use the Column
objects directly as keys:
sql>>> for row in conn.execute(s): ... print "name:", row[users.c.name], "; fullname:", row[users.c.fullname]
name: jack ; fullname: Jack Jones name: wendy ; fullname: Wendy Williams name: fred ; fullname: Fred Flintstone name: mary ; fullname: Mary Contrary
Result sets which have pending rows remaining should be explicitly closed before discarding. While the resources referenced by the ResultProxy
will be closed when the object is garbage collected, it's better to make it explicit as some database APIs are very picky about such things:
>>> result.close()
If we'd like to more carefully control the columns which are placed in the COLUMNS clause of the select, we reference individual Column
objects from our Table
. These are available as named attributes off the c
attribute of the Table
object:
>>> s = select([users.c.name, users.c.fullname]) sql>>> result = conn.execute(s)
>>> for row in result: ... print row (u'jack', u'Jack Jones') (u'wendy', u'Wendy Williams') (u'fred', u'Fred Flintstone') (u'mary', u'Mary Contrary')
Lets observe something interesting about the FROM clause. Whereas the generated statement contains two distinct sections, a "SELECT columns" part and a "FROM table" part, our select()
construct only has a list containing columns. How does this work ? Let's try putting two tables into our select()
statement:
sql>>> for row in conn.execute(select([users, addresses])): ... print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com') (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com') (1, u'jack', u'Jack Jones', 3, 2, u'www@www.org') (1, u'jack', u'Jack Jones', 4, 2, u'wendy@aol.com') (2, u'wendy', u'Wendy Williams', 1, 1, u'jack@yahoo.com') (2, u'wendy', u'Wendy Williams', 2, 1, u'jack@msn.com') (2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org') (2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com') (3, u'fred', u'Fred Flintstone', 1, 1, u'jack@yahoo.com') (3, u'fred', u'Fred Flintstone', 2, 1, u'jack@msn.com') (3, u'fred', u'Fred Flintstone', 3, 2, u'www@www.org') (3, u'fred', u'Fred Flintstone', 4, 2, u'wendy@aol.com') (4, u'mary', u'Mary Contrary', 1, 1, u'jack@yahoo.com') (4, u'mary', u'Mary Contrary', 2, 1, u'jack@msn.com') (4, u'mary', u'Mary Contrary', 3, 2, u'www@www.org') (4, u'mary', u'Mary Contrary', 4, 2, u'wendy@aol.com')
It placed both tables into the FROM clause. But also, it made a real mess. Those who are familiar with SQL joins know that this is a cartesian product; each row from the users
table is produced against each row from the addresses
table. So to put some sanity into this statement, we need a WHERE clause. Which brings us to the second argument of select()
:
>>> s = select([users, addresses], users.c.id==addresses.c.user_id) sql>>> for row in conn.execute(s): ... print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com') (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com') (2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org') (2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')
So that looks a lot better, we added an expression to our select()
which had the effect of adding WHERE users.id = addresses.user_id
to our statement, and our results were managed down so that the join of users
and addresses
rows made sense. But let's look at that expression? It's using just a Python equality operator between two different Column
objects. It should be clear that something is up. Saying 1==1
produces True
, and 1==2
produces False
, not a WHERE clause. So lets see exactly what that expression is doing:
>>> users.c.id==addresses.c.user_id <sqlalchemy.sql.expression._BinaryExpression object at 0x...>
Wow, surprise ! This is neither a True
nor a False
. Well what is it ?
>>> str(users.c.id==addresses.c.user_id) 'users.id = addresses.user_id'
As you can see, the ==
operator is producing an object that is very much like the Insert
and select()
objects we've made so far, thanks to Python's __eq__()
builtin; you call str()
on it and it produces SQL. By now, one can that everything we are working with is ultimately the same type of object. SQLAlchemy terms the base class of all of these expessions as sqlalchemy.sql.ClauseElement
.
Since we've stumbled upon SQLAlchemy's operator paradigm, let's go through some of its capabilities. We've seen how to equate two columns to each other:
>>> print users.c.id==addresses.c.user_id users.id = addresses.user_id
If we use a literal value (a literal meaning, not a SQLAlchemy clause object), we get a bind parameter:
>>> print users.c.id==7 users.id = :users_id
The 7
literal is embedded in ClauseElement
; we can use the same trick we did with the Insert
object to see it:
>>> (users.c.id==7).compile().params ClauseParameters:{'users_id': 7}
Most Python operators, as it turns out, produce a SQL expression here, like equals, not equals, etc.:
>>> print users.c.id != 7 users.id != :users_id >>> # None converts to IS NULL >>> print users.c.name == None users.name IS NULL >>> # reverse works too >>> print 'fred' > users.c.name users.name < :users_name
If we add two integer columns together, we get an addition expression:
>>> print users.c.id + addresses.c.id users.id + addresses.id
Interestingly, the type of the Column
is important ! If we use +
with two string based columns (recall we put types like Integer
and String
on our Column
objects at the beginning), we get something different:
>>> print users.c.name + users.c.fullname users.name || users.fullname
Where ||
is the string concatenation operator used on most databases. But not all of them. MySQL users, fear not:
>>> from sqlalchemy.databases.mysql import MySQLDialect >>> print (users.c.name + users.c.fullname).compile(dialect=MySQLDialect()) concat(users.name, users.fullname)
The above illustrates the SQL that's generated for an Engine
that's connected to a MySQL database (note that the Dialect
is normally created behind the scenes; we created one above just to illustrate without using an engine).
If you have come across an operator which really isn't available, you can always use the op()
method; this generates whatever operator you need:
>>> print users.c.name.op('tiddlywinks')('foo') users.name tiddlywinks :users_name
We'd like to show off some of our operators inside of select()
constructs. But we need to lump them together a little more, so let's first introduce some conjunctions. Conjunctions are those little words like AND and OR that put things together. We'll also hit upon NOT. AND, OR and NOT can work from the corresponding functions SQLAlchemy provides (notice we also throw in a LIKE):
>>> from sqlalchemy.sql import and_, or_, not_ >>> print and_(users.c.name.like('j%'), users.c.id==addresses.c.user_id, ... or_(addresses.c.email_address=='wendy@aol.com', addresses.c.email_address=='jack@yahoo.com'), ... not_(users.c.id>5)) users.name LIKE :users_name AND users.id = addresses.user_id AND (addresses.email_address = :addresses_email_address OR addresses.email_address = :addresses_email_address_1) AND users.id <= :users_id
And you can also use the re-jiggered bitwise AND, OR and NOT operators, although because of Python operator precedence you have to watch your parenthesis:
>>> print users.c.name.like('j%') & (users.c.id==addresses.c.user_id) & \ ... ((addresses.c.email_address=='wendy@aol.com') | (addresses.c.email_address=='jack@yahoo.com')) \ ... & ~(users.c.id>5) users.name LIKE :users_name AND users.id = addresses.user_id AND (addresses.email_address = :addresses_email_address OR addresses.email_address = :addresses_email_address_1) AND users.id <= :users_id
So with all of this vocabulary, let's select all users who have an email address at AOL or MSN, whose name starts with a letter between "m" and "z", and we'll also generate a column containing their full name combined with their email address. We will add two new constructs to this statement, between()
and label()
. between()
produces a BETWEEN clause, and label()
is used in a column expression to produce labels using the AS
keyword; its recommended when selecting from expressions that otherwise would not have a name:
>>> s = select([(users.c.fullname + ", " + addresses.c.email_address).label('title')], ... and_( ... users.c.id==addresses.c.user_id, ... users.c.name.between('m', 'z'), ... or_( ... addresses.c.email_address.like('%@aol.com'), ... addresses.c.email_address.like('%@msn.com') ... ) ... ) ... ) >>> print conn.execute(s).fetchall() SELECT users.fullname || ? || addresses.email_address AS title FROM users, addresses WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND (addresses.email_address LIKE ? OR addresses.email_address LIKE ?) [', ', 'm', 'z', '%@aol.com', '%@msn.com'] [(u'Wendy Williams, wendy@aol.com',)]
Once again, SQLAlchemy figured out the FROM clause for our statement. In fact it will determine the FROM clause based on all of its other bits; the columns clause, the whereclause, and also some other elements which we haven't covered yet, which include ORDER BY, GROUP BY, and HAVING.
back to section topOur last example really became a handful to type. Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard. That's why SQLAlchemy lets you just use strings too. The text()
construct represents any textual statement. To use bind parameters with text()
, always use the named colon format. Such as below, we create a text()
and execute it, feeding in the bind parameters to the execute()
method:
>>> from sqlalchemy.sql import text >>> s = text("""SELECT users.fullname || ', ' || addresses.email_address AS title ... FROM users, addresses ... WHERE users.id = addresses.user_id AND users.name BETWEEN :x AND :y AND ... (addresses.email_address LIKE :e1 OR addresses.email_address LIKE :e2) ... """) sql>>> print conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]
To gain a "hybrid" approach, any of SA's SQL constructs can have text freely intermingled wherever you like - the text()
construct can be placed within any other ClauseElement
construct, and when used in a non-operator context, a direct string may be placed which converts to text()
automatically. Below we combine the usage of text()
and strings with our constructed select()
object, by using the select()
object to structure the statement, and the text()
/strings to provide all the content within the structure. For this example, SQLAlchemy is not given any Column
or Table
objects in any of its expressions, so it cannot generate a FROM clause. So we also give it the from_obj
keyword argument, which is a list of ClauseElements
(or strings) to be placed within the FROM clause:
>>> s = select([text("users.fullname || ', ' || addresses.email_address AS title")], ... and_( ... "users.id = addresses.user_id", ... "users.name BETWEEN 'm' AND 'z'", ... "(addresses.email_address LIKE :x OR addresses.email_address LIKE :y)" ... ), ... from_obj=['users', 'addresses'] ... ) sql>>> print conn.execute(s, x='%@aol.com', y='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]
Going from constructed SQL to text, we lose some capabilities. We lose the capability for SQLAlchemy to compile our expression to a specific target database; above, our expression won't work with MySQL since it has no ||
construct. It also becomes more tedious for SQLAlchemy to be made aware of the datatypes in use; for example, if our bind parameters required UTF-8 encoding before going in, or conversion from a Python datetime
into a string (as is required with SQLite), we would have to add extra information to our text()
construct. Similar issues arise on the result set side, where SQLAlchemy also performs type-specific data conversion in some cases; still more information can be added to text()
to work around this. But what we really lose from our statement is the ability to manipulate it, transform it, and analyze it. These features are critical when using the ORM, which makes heavy usage of relational transformations. To show off what we mean, we'll first introduce the ALIAS construct and the JOIN construct, just so we have some juicier bits to play with.
The alias corresponds to a "renamed" version of a table or arbitrary relation, which occurs anytime you say "SELECT .. FROM sometable AS someothername". The AS
creates a new name for the table. Aliases are super important in SQL as they allow you to reference the same table more than once. Scenarios where you need to do this include when you self-join a table to itself, or more commonly when you need to join from a parent table to a child table multiple times. For example, we know that our user jack
has two email addresses. How can we locate jack based on the combination of those two addresses? We need to join twice to it. Let's construct two distinct aliases for the addresses
table and join:
>>> a1 = addresses.alias('a1') >>> a2 = addresses.alias('a2') >>> s = select([users], and_( ... users.c.id==a1.c.user_id, ... users.c.id==a2.c.user_id, ... a1.c.email_address=='jack@msn.com', ... a2.c.email_address=='jack@yahoo.com' ... )) sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]
Easy enough. One thing that we're going for with the SQL Expression Language is the melding of programmatic behavior with SQL generation. Coming up with names like a1
and a2
is messy; we really didn't need to use those names anywhere, its just the database that needed them. Plus, we might write some code that uses alias objects that came from several different places, and its difficult to ensure that they all have unique names. So instead, we just let SQLAlchemy make the names for us, using "anonymous" aliases:
>>> a1 = addresses.alias() >>> a2 = addresses.alias() >>> s = select([users], and_( ... users.c.id==a1.c.user_id, ... users.c.id==a2.c.user_id, ... a1.c.email_address=='jack@msn.com', ... a2.c.email_address=='jack@yahoo.com' ... )) sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]
One super-huge advantage of anonymous aliases is that not only did we not have to guess up a random name, but we can also be guaranteed that the above SQL string is deterministically generated to be the same every time. This is important for databases such as Oracle which cache compiled "query plans" for their statements, and need to see the same SQL string in order to make use of it.
Aliases can of course be used for anything which you can SELECT from, including SELECT statements themselves. We can self-join the users
table back to the select()
we've created by making an alias of the entire statement. The correlate(None)
directive is to avoid SQLAlchemy's attempt to "correlate" the inner users
table with the outer one:
>>> a1 = s.correlate(None).alias() >>> s = select([users.c.name], users.c.id==a1.c.id) sql>>> print conn.execute(s).fetchall()
[(u'jack',)]
We're halfway along to being able to construct any SELECT expression. The next cornerstone of the SELECT is the JOIN expression. We've already been doing joins in our examples, by just placing two tables in either the columns clause or the where clause of the select()
construct. But if we want to make a real "JOIN" or "OUTERJOIN" construct, we use the join()
and outerjoin()
methods, most commonly accessed from the left table in the join:
>>> print users.join(addresses) users JOIN addresses ON users.id = addresses.user_id
The alert reader will see more surprises; SQLAlchemy figured out how to JOIN the two tables ! The ON condition of the join, as it's called, was automatically generated based on the ForeignKey
object which we placed on the addresses
table way at the beginning of this tutorial. Already the join()
construct is looking like a much better way to join tables.
Of course you can join on whatever expression you want, such as if we want to join on all users who use the same name in their email address as their username:
>>> print users.join(addresses, addresses.c.email_address.like(users.c.name + '%')) users JOIN addresses ON addresses.email_address LIKE users.name || :users_name
When we create a select()
construct, SQLAlchemy looks around at the tables we've mentioned and then places them in the FROM clause of the statement. When we use JOINs however, we know what FROM clause we want, so here we make usage of the from_obj
keyword argument:
>>> s = select([users.c.fullname], from_obj=[ ... users.join(addresses, addresses.c.email_address.like(users.c.name + '%')) ... ]) sql>>> print conn.execute(s).fetchall()
[(u'Jack Jones',), (u'Jack Jones',), (u'Wendy Williams',)]
The outerjoin()
function just creates LEFT OUTER JOIN
constructs. It's used just like join()
:
>>> s = select([users.c.fullname], from_obj=[users.outerjoin(addresses)]) >>> print s SELECT users.fullname FROM users LEFT OUTER JOIN addresses ON users.id = addresses.user_id
That's the output outerjoin()
produces, unless, of course, you're stuck in a gig using Oracle prior to version 9, and you've set up your engine (which would be using OracleDialect
) to use Oracle-specific SQL:
>>> from sqlalchemy.databases.oracle import OracleDialect >>> print s.compile(dialect=OracleDialect(use_ansi=False)) SELECT users.fullname FROM users, addresses WHERE users.id = addresses.user_id(+)
If you don't know what that SQL means, don't worry ! The secret tribe of Oracle DBAs don't want their black magic being found out ;).
back to section topWe've now gained the ability to construct very sophisticated statements. We can use all kinds of operators, table constructs, text, joins, and aliases. The point of all of this, as mentioned earlier, is not that it's an "easier" or "better" way to write SQL than just writing a SQL statement yourself; the point is that its better for writing programmatically generated SQL which can be morphed and adapted as needed in automated scenarios.
To support this, the select()
construct we've been working with supports piecemeal construction, in addition to the "all at once" method we've been doing. Suppose you're writing a search function, which receives criterion and then must construct a select from it. To accomplish this, upon each criterion encountered, you apply "generative" criterion to an existing select()
construct with new elements, one at a time. We start with a basic select()
constructed with the shortcut method available on the users
table:
>>> query = users.select() >>> print query SELECT users.id, users.name, users.fullname FROM users
We encounter search criterion of "name='jack'". So we apply WHERE criterion stating such:
>>> query = query.where(users.c.name=='jack')
Next, we encounter that they'd like the results in descending order by full name. We apply ORDER BY, using an extra modifier desc
:
>>> query = query.order_by(users.c.fullname.desc())
We also come across that they'd like only users who have an address at MSN. A quick way to tack this on is by using an EXISTS clause, which we correlate to the users
table in the enclosing SELECT:
>>> from sqlalchemy.sql import exists >>> query = query.where( ... exists([addresses.c.id], ... and_(addresses.c.user_id==users.c.id, addresses.c.email_address.like('%@msn.com')) ... ).correlate(users))
And finally, the application also wants to see the listing of email addresses at once; so to save queries, we outerjoin the addresses
table (using an outer join so that users with no addresses come back as well; since we're programmatic, we might not have kept track that we used an EXISTS clause against the addresses
table too...). Additionally, since the users
and addresses
table both have a column named id
, let's isolate their names from each other in the COLUMNS clause by using labels:
>>> query = query.column(addresses).select_from(users.outerjoin(addresses)).apply_labels()
Let's bake for .0001 seconds and see what rises:
>>> conn.execute(query).fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
So we started small, added one little thing at a time, and at the end we have a huge statement..which actually works. Now let's do one more thing; the searching function wants to add another email_address
criterion on, however it doesn't want to construct an alias of the addresses
table; suppose many parts of the application are written to deal specifically with the addresses
table, and to change all those functions to support receiving an arbitrary alias of the address would be cumbersome. We can actually convert the addresses
table within the existing statement to be an alias of itself, using replace_selectable()
:
>>> a1 = addresses.alias() >>> query = query.replace_selectable(addresses, a1) >>> print query
One more thing though, with automatic labeling applied as well as anonymous aliasing, how do we retrieve the columns from the rows for this thing ? The label for the email_addresses
column is now the generated name addresses_1_email_address
; and in another statement might be something different ! This is where accessing by result columns by Column
object becomes very useful:
sql>>> for row in conn.execute(query): ... print "Name:", row[users.c.name], "; Email Address", row[a1.c.email_address]
Name: jack ; Email Address jack@yahoo.com Name: jack ; Email Address jack@msn.com
The above example, by it's end, got significantly more intense than the typical end-user constructed SQL will usually be. However when writing higher-level tools such as ORMs, they become more significant. SQLAlchemy's ORM relies very heavily on techniques like this.
back to section topThe concepts of creating SQL expressions have been introduced. What's left are more variants of the same themes. So now we'll catalog the rest of the important things we'll need to know.
Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The database dialect converts to the appropriate named or positional style, as here where it converts to positional for SQLite:
>>> from sqlalchemy.sql import bindparam >>> s = users.select(users.c.name==bindparam('username')) sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]
Another important aspect of bind paramters is that they may be assigned a type. The type of the bind paramter will determine it's behavior within expressions and also how the data bound to it is processed before being sent off to the database:
>>> s = users.select(users.c.name.like(bindparam('username', type_=String) + text("'%'"))) sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]
Bind parameters of the same name can also be used multiple times, where only a single named value is needed in the execute paramters:
>>> s = select([users, addresses], ... users.c.name.like(bindparam('name', type_=String) + text("'%'")) | ... addresses.c.email_address.like(bindparam('name', type_=String) + text("'@%'")), ... from_obj=[users.outerjoin(addresses)]) sql>>> conn.execute(s, name='jack').fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
SQL functions are created using the func
keyword, which generates functions using attribute access:
>>> from sqlalchemy.sql import func >>> print func.now() now() >>> print func.concat('x', 'y') concat(:concat, :concat_1)
Certain functions are marked as "ANSI" functions, which mean they don't get the parenthesis added after them, such as CURRENT_TIMESTAMP:
>>> print func.current_timestamp() current_timestamp
Functions are most typically used in the columns clause of a select statement, and can also be labeled as well as given a type. Labeling a function is recommended so that the result can be targeted in a result row based on a string name, and assigning it a type is required when you need result-set processing to occur, such as for unicode conversion and date conversions. Below, we use the result function scalar()
to just read the first column of the first row and then close the result; the label, even though present, is not important in this case:
>>> print conn.execute( ... select([func.max(addresses.c.email_address, type_=String).label('maxemail')]) ... ).scalar()
www@www.org
Databases such as Postgres and Oracle which support functions that return whole result sets can be assembled into selectable units, which can be used in statements. Such as, a database function calculate()
which takes the parameters x
and y
, and returns three columns which we'd like to name q
, z
and r
, we can construct using "lexical" column objects as well as bind parameters:
>>> from sqlalchemy.sql import column >>> calculate = select([column('q'), column('z'), column('r')], ... from_obj=[func.calculate(bindparam('x'), bindparam('y'))]) >>> print select([users], users.c.id > calculate.c.z) SELECT users.id, users.name, users.fullname FROM users, (SELECT q, z, r FROM calculate(:x, :y)) WHERE users.id > z
If we wanted to use our calculate
statement twice with different bind parameters, the unique_params()
function will create copies for us, and mark the bind params as "unique" so that conflicting names are isolated. Note we also make two separate aliases of our selectable:
>>> s = select([users], users.c.id.between( ... calculate.alias('c1').unique_params(x=17, y=45).c.z, ... calculate.alias('c2').unique_params(x=5, y=12).c.z)) >>> print s SELECT users.id, users.name, users.fullname FROM users, (SELECT q, z, r FROM calculate(:x, :y)) AS c1, (SELECT q, z, r FROM calculate(:x_1, :y_1)) AS c2 WHERE users.id BETWEEN c1.z AND c2.z >>> s.compile().params ClauseParameters:{'y': 45, 'x': 17, 'y_1': 12, 'x_1': 5}
Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:
>>> u = addresses.select(addresses.c.email_address=='foo@bar.com').union( ... addresses.select(addresses.c.email_address.like('%@yahoo.com')), ... ).order_by(addresses.c.email_address) sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com')]
Also available, though not supported on all databases, are intersect()
, intersect_all()
, except_()
, and except_all()
:
>>> u = addresses.select(addresses.c.email_address.like('%@%.com')).except_( ... addresses.select(addresses.c.email_address.like('%@msn.com')) ... ) sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com'), (4, 2, u'wendy@aol.com')]
To embed a SELECT in a column expression, use as_scalar()
:
sql>>> print conn.execute(select([ ... users.c.name, ... select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).as_scalar() ... ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]
Alternatively, applying a label()
to a select evaluates it as a scalar as well:
sql>>> print conn.execute(select([ ... users.c.name, ... select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).label('address_count') ... ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]
Notice in the examples on "scalar selects", the FROM clause of each embedded select did not contain the users
table in it's FROM clause. This is because SQLAlchemy automatically attempts to correlate embeded FROM objects to that of an enclosing query. To disable this, or to specify explicit FROM clauses to be correlated, use correlate()
:
>>> s = select([users.c.name], users.c.id==select([users.c.id]).correlate(None)) >>> print s SELECT users.name FROM users WHERE users.id = (SELECT users.id FROM users) >>> s = select([users.c.name, addresses.c.email_address], users.c.id== ... select([users.c.id], users.c.id==addresses.c.user_id).correlate(addresses) ... ) >>> print s SELECT users.name, addresses.email_address FROM users, addresses WHERE users.id = (SELECT users.id FROM users WHERE users.id = addresses.user_id)
The select()
function can take keyword arguments order_by
, group_by
(as well as having
), limit
, and offset
. There's also distinct=True
. These are all also available as generative functions. order_by()
expressions can use the modifiers asc()
or desc()
to indicate ascending or descending.
>>> s = select([addresses.c.user_id, func.count(addresses.c.id)]).\ ... group_by(addresses.c.user_id).having(func.count(addresses.c.id>1)) >>> print conn.execute(s).fetchall()
[(1, 2), (2, 2)] >>> s = select([addresses.c.email_address, addresses.c.id]).distinct().\ ... order_by(addresses.c.email_address.desc(), addresses.c.id) >>> conn.execute(s).fetchall()
[(u'www@www.org', 3), (u'wendy@aol.com', 4), (u'jack@yahoo.com', 1), (u'jack@msn.com', 2)] >>> s = select([addresses]).offset(1).limit(1) >>> print conn.execute(s).fetchall()
[(2, 1, u'jack@msn.com')]
Finally, we're back to UPDATE. Updates work a lot like INSERTS, except there is an additional WHERE clause that can be specified.
>>> # change 'jack' to 'ed' sql>>> conn.execute(users.update(users.c.name=='jack'), name='ed')
<sqlalchemy.engine.base.ResultProxy object at 0x...> >>> # use bind parameters >>> u = users.update(users.c.name==bindparam('oldname'), values={'name':bindparam('newname')}) sql>>> conn.execute(u, oldname='jack', newname='ed')
<sqlalchemy.engine.base.ResultProxy object at 0x...> >>> # update a column to an expression sql>>> conn.execute(users.update(values={users.c.fullname:"Fullname: " + users.c.name}))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
A correlated update lets you update a table using selection from another table, or the same table:
>>> s = select([addresses.c.email_address], addresses.c.user_id==users.c.id).limit(1) sql>>> conn.execute(users.update(values={users.c.fullname:s}))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Finally, a delete. Easy enough:
sql>>> conn.execute(addresses.delete())
<sqlalchemy.engine.base.ResultProxy object at 0x...> sql>>> conn.execute(users.delete(users.c.name > 'm'))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
The best place to get every possible name you can use in constructed SQL is the Generated Documentation.
Table Metadata Reference: Database Meta Data
Engine/Connection/Execution Reference: Database Engines
SQL Types: The Types System
back to section topThis section references most major configurational patterns involving the mapper() and relation() functions. It assumes you've worked through the Object Relational Tutorial and know how to construct and use rudimental mappers and relations.
Full API documentation for the ORM:
Options for the mapper()
function:
The default behavior of a mapper
is to assemble all the columns in the mapped Table
into mapped object attributes. This behavior can be modified in several ways, as well as enhanced by SQL expressions.
To load only a part of the columns referenced by a table as attributes, use the include_properties
and exclude_properties
arguments:
mapper(User, users_table, include_properties=['user_id', 'user_name']) mapper(Address, addresses_table, exclude_properties=['street', 'city', 'state', 'zip'])
To change the name of the attribute mapped to a particular column, place the Column
object in the properties
dictionary with the desired key:
mapper(User, users_table, properties={ 'id' : users_table.c.user_id, 'name' : users_table.c.user_name, })
To change the names of all attributes using a prefix, use the column_prefix
option. This is useful for classes which wish to add their own property
accessors:
mapper(User, users_table, column_prefix='_')
The above will place attribute names such as _user_id
, _user_name
, _password
etc. on the mapped User
class.
To place multiple columns which are known to be "synonymous" based on foreign key relationship or join condition into the same mapped attribute, put them together using a list, as below where we map to a Join
:
# join users and addresses usersaddresses = sql.join(users_table, addresses_table, \ users_table.c.user_id == addresses_table.c.user_id) mapper(User, usersaddresses, properties = { 'id':[users_table.c.user_id, addresses_table.c.user_id], })
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentailly "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when its not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db, Column('book_id', Integer, primary_key=True), Column('title', String(200), nullable=False), Column('summary', String(2000)), Column('excerpt', String), Column('photo', Binary) ) class Book(object): pass # define a mapper that will load each of 'excerpt' and 'photo' in # separate, individual-row SELECT statements when each attribute # is first referenced on the individual object instance mapper(Book, book_excerpts, properties = { 'excerpt' : deferred(book_excerpts.c.excerpt), 'photo' : deferred(book_excerpts.c.photo) })
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db, Column('book_id', Integer, primary_key=True), Column('title', String(200), nullable=False), Column('summary', String(2000)), Column('excerpt', String), Column('photo1', Binary), Column('photo2', Binary), Column('photo3', Binary) ) class Book(object): pass # define a mapper with a 'photos' deferred group. when one photo is referenced, # all three photos will be loaded in one SELECT statement. The 'excerpt' will # be loaded separately when it is first referenced. mapper(Book, book_excerpts, properties = { 'excerpt' : deferred(book_excerpts.c.excerpt), 'photo1' : deferred(book_excerpts.c.photo1, group='photos'), 'photo2' : deferred(book_excerpts.c.photo2, group='photos'), 'photo3' : deferred(book_excerpts.c.photo3, group='photos') })
You can defer or undefer columns at the Query
level using the defer
and undefer
options:
query = session.query(Book) query.options(defer('summary')).all() query.options(undefer('excerpt')).all()
And an entire "deferred group", i.e. which uses the group
keyword argument to deferred()
, can be undeferred using undefer_group()
, sending in the group name:
query = session.query(Book) query.options(undefer_group('photos')).all()
To add a SQL clause composed of local or external columns as a read-only, mapped column attribute, use the column_property()
function. Any scalar-returning ClauseElement
may be used, as long as it has a name
attribute; usually, you'll want to call label()
to give it a specific name:
mapper(User, users_table, properties={ 'fullname' : column_property( (users_table.c.firstname + " " + users_table.c.lastname).label('fullname') ) })
Correlated subqueries may be used as well:
mapper(User, users_table, properties={ 'address_count' : column_property( select( [func.count(addresses_table.c.address_id)], addresses_table.c.user_id==users_table.c.user_id ).label('address_count') ) })
A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. You accomplish this using normal Python property
constructs:
class MyAddress(object): def _set_email(self, email): self._email = email def _get_email(self): return self._email email = property(_get_email, _set_email) mapper(MyAddress, addresses_table, properties = { # map the '_email' attribute to the "email" column # on the table '_email': addresses_table.c.email })
To have your custom email
property be recognized by keyword-based Query
functions such as filter_by()
, place a synonym
on your mapper:
mapper(MyAddress, addresses_table, properties = { '_email': addresses_table.c.email 'email':synonym('_email') }) # use the synonym in a query result = session.query(MyAddress).filter_by(email='john@smith.com')
Synonym strategies such as the above can be easily automated, such as this example which specifies all columns and synonyms explicitly:
mapper(MyAddress, addresses_table, properties = dict( [('_'+col.key, col) for col in addresses_table.c] + [(col.key, synonym('_'+col.key)) for col in addresses_table.c] ))
The column_prefix
option can also help with the above scenario by setting up the columns automatically with a prefix:
mapper(MyAddress, addresses_table, column_prefix='_', properties = dict( [(col.key, synonym('_'+col.key)) for col in addresses_table.c] ))
Sets of columns can be associated with a single datatype. The ORM treats the group of columns like a single column which accepts and returns objects using the custom datatype you provide. In this example, we'll create a table vertices
which stores a pair of x/y coordinates, and a custom datatype Point
which is a composite type of an x and y column:
vertices = Table('vertices', metadata, Column('id', Integer, primary_key=True), Column('x1', Integer), Column('y1', Integer), Column('x2', Integer), Column('y2', Integer), )
The requirements for the custom datatype class are that it have a constructor which accepts positional arguments corresponding to its column format, and also provides a method __composite_values__()
which returns the state of the object as a list or tuple, in order of its column-based attributes. It also should supply adequate __eq__()
and __ne__()
methods which test the equality of two instances:
class Point(object): def __init__(self, x, y): self.x = x self.y = y def __composite_values__(self): return [self.x, self.y] def __eq__(self, other): return other.x == self.x and other.y == self.y def __ne__(self, other): return not self.__eq__(other)
Setting up the mapping uses the composite()
function:
class Vertex(object): pass mapper(Vertex, vertices, properties={ 'start':composite(Point, vertices.c.x1, vertices.c.y1), 'end':composite(Point, vertices.c.x2, vertices.c.y2) })
We can now use the Vertex
instances as well as querying as though the start
and end
attributes are regular scalar attributes:
sess = Session() v = Vertex(Point(3, 4), Point(5, 6)) sess.save(v) v2 = sess.query(Vertex).filter(Vertex.start == Point(3, 4))
The "equals" comparison operation by default produces an AND of all corresponding columns equated to one another. If you'd like to override this, or define the behavior of other SQL operators for your new type, the composite()
function accepts an extension object of type sqlalchemy.orm.PropComparator
:
from sqlalchemy.orm import PropComparator from sqlalchemy import sql class PointComparator(PropComparator): def __gt__(self, other): """define the 'greater than' operation""" return sql.and_(*[a>b for a, b in zip(self.prop.columns, other.__composite_values__())]) maper(Vertex, vertices, properties={ 'start':composite(Point, vertices.c.x1, vertices.c.y1, comparator=PointComparator), 'end':composite(Point, vertices.c.x2, vertices.c.y2, comparator=PointComparator) })
By default, mappers will attempt to ORDER BY the "oid" column of a table, or the first primary key column, when selecting rows. This can be modified in several ways.
The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering. A non-None value, which can be a column, an asc
or desc
clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:
# disable all ordering mapper(User, users_table, order_by=None) # order by a column mapper(User, users_table, order_by=users_table.c.user_id) # order by multiple items mapper(User, users_table, order_by=[users_table.c.user_id, users_table.c.user_name.desc()])
"order_by" can also be specified with queries, overriding all other per-engine/per-mapper orderings:
# order by a column l = query.filter(User.user_name=='fred').order_by(User.user_id).all() # order by multiple criterion l = query.filter(User.user_name=='fred').order_by([User.user_id, User.user_name.desc()])
The "order_by" property can also be specified on a relation()
which will control the ordering of the collection:
mapper(Address, addresses_table) # order address objects by address id mapper(User, users_table, properties = { 'addresses' : relation(Address, order_by=addresses_table.c.address_id) })
Note that when using eager loaders with relations, the tables used by the eager load's join are anonymously aliased. You can only order by these columns if you specify it at the relation()
level. To control ordering at the query level based on a related table, you join()
to that relation, then order by it:
session.query(User).join('addresses').order_by(Address.street)
SQLAlchemy supports three forms of inheritance: single table inheritance, where several types of classes are stored in one table, concrete table inheritance, where each type of class is stored in its own table, and joined table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select. Whereas support for single and joined table inheritance is strong, concrete table inheritance is a less common scenario with some particular problems so is not quite as flexible.
When mappers are configured in an inheritance relationship, SQLAlchemy has the ability to load elements "polymorphically", meaning that a single query can return objects of multiple types.
For the following sections, assume this class relationship:
class Employee(object): def __init__(self, name): self.name = name def __repr__(self): return self.__class__.__name__ + " " + self.name class Manager(Employee): def __init__(self, name, manager_data): self.name = name self.manager_data = manager_data def __repr__(self): return self.__class__.__name__ + " " + self.name + " " + self.manager_data class Engineer(Employee): def __init__(self, name, engineer_info): self.name = name self.engineer_info = engineer_info def __repr__(self): return self.__class__.__name__ + " " + self.name + " " + self.engineer_info
In joined table inheritance, each class along a particular classes' list of parents is represented by a unique table. The total set of attributes for a particular instance is represented as a join along all tables in its inheritance path. Here, we first define a table to represent the Employee
class. This table will contain a primary key column (or columns), and a column for each attribute that's represented by Employee
. In this case it's just name
:
employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('type', String(30), nullable=False) )
The table also has a column called type
. It is strongly advised in both single- and joined- table inheritance scenarios that the root table contains a column whose sole purpose is that of the discriminator; it stores a value which indicates the type of object represented within the row. The column may be of any desired datatype. While there are some "tricks" to work around the requirement that there be a discriminator column, they are more complicated to configure when one wishes to load polymorphically.
Next we define individual tables for each of Engineer
and Manager
, which each contain columns that represent the attributes unique to the subclass they represent. Each table also must contain a primary key column (or columns), and in most cases a foreign key reference to the parent table. It is standard practice that the same column is used for both of these roles, and that the column is also named the same as that of the parent table. However this is optional in SQLAlchemy; separate columns may be used for primary key and parent-relation, the column may be named differently than that of the parent, and even a custom join condition can be specified between parent and child tables instead of using a foreign key. In joined table inheritance, the primary key of an instance is always represented by the primary key of the base table only (new in SQLAlchemy 0.4).
engineers = Table('engineers', metadata, Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True), Column('engineer_info', String(50)), ) managers = Table('managers', metadata, Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True), Column('manager_data', String(50)), )
We then configure mappers as usual, except we use some additional arguments to indicate the inheritance relationship, the polymorphic discriminator column, and the polymorphic identity of each class; this is the value that will be stored in the polymorphic discriminator column.
mapper(Employee, employees, polymorphic_on=employees.c.type, polymorphic_identity='employee') mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
And that's it. Querying against Employee
will return a combination of Employee
, Engineer
and Manager
objects.
When loading fresh from the database, the joined-table setup above will query from the parent table first, then for each row will issue a second query to the child table. For example, for a load of five rows with Employee
id 3, Manager
ids 1 and 5 and Engineer
ids 2 and 4, will produce queries along the lines of this example:
session.query(Employee).all()
The above query works well for a get()
operation, since it limits the queries to only the tables directly involved in fetching a single instance. For instances which are already present in the session, the secondary table load is not needed. However, the above loading style is not efficient for loading large groups of objects, as it incurs separate queries for each parent row.
One way to reduce the number of "secondary" loads of child rows is to "defer" them, using polymorphic_fetch='deferred'
:
mapper(Employee, employees, polymorphic_on=employees.c.type, \ polymorphic_identity='employee', polymorphic_fetch='deferred') mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
The above configuration queries in the same manner as earlier, except the load of each "secondary" table occurs only when attributes referencing those columns are first referenced on the loaded instance. This style of loading is very efficient for cases where large selects of items occur, but a detailed "drill down" of extra inherited properties is less common.
More commonly, an all-at-once load may be achieved by constructing a query which combines all three tables together, and adding it to the mapper configuration as its select_table
, which is an arbitrary selectable which the mapper will use for load operations (it has no impact on save operations). Any selectable can be used for this, such as a UNION of tables. For joined table inheritance, the easiest method is to use OUTER JOIN:
join = employees.outerjoin(engineers).outerjoin(managers) mapper(Employee, employees, polymorphic_on=employees.c.type, \ polymorphic_identity='employee', select_table=join) mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
Which produces a query like the following:
session.query(Employee).all()
Single table inheritance is where the attributes of the base class as well as all subclasses are represented within a single table. A column is present in the table for every attribute mapped to the base class and all subclasses; the columns which correspond to a single subclass are nullable. This configuration looks much like joined-table inheritance except there's only one table. In this case, a type
column is required, as there would be no other way to discriminate between classes. The table is specified in the base mapper only; for the inheriting classes, leave their table
parameter blank:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), Column('engineer_info', String(50)), Column('type', String(20), nullable=False) ) employee_mapper = mapper(Employee, employees_table, \ polymorphic_on=employees_table.c.type, polymorphic_identity='employee') manager_mapper = mapper(Manager, inherits=employee_mapper, polymorphic_identity='manager') engineer_mapper = mapper(Engineer, inherits=employee_mapper, polymorphic_identity='engineer')
This form of inheritance maps each class to a distinct table, as below:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), ) managers_table = Table('managers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), ) engineers_table = Table('engineers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('engineer_info', String(50)), )
Notice in this case there is no type
column. If polymorphic loading is not required, theres no advantage to using inherits
here; you just define a separate mapper for each class.
mapper(Employee, employees_table) mapper(Manager, managers_table) mapper(Engineer, engineers_table)
To load polymorphically, the select_table
argument is currently required. In this case we must construct a UNION of all three tables. SQLAlchemy includes a helper function to create these called polymorphic_union
, which will map all the different columns into a structure of selects with the same numbers and names of columns, and also generate a virtual type
column for each subselect:
pjoin = polymorphic_union({ 'employee':employees_table, 'manager':managers_table, 'engineer':engineers_table }, 'type', 'pjoin') employee_mapper = mapper(Employee, employees_table, select_table=pjoin, \ polymorphic_on=pjoin.c.type, polymorphic_identity='employee') manager_mapper = mapper(Manager, managers_table, inherits=employee_mapper, \ concrete=True, polymorphic_identity='manager') engineer_mapper = mapper(Engineer, engineers_table, inherits=employee_mapper, \ concrete=True, polymorphic_identity='engineer')
Upon select, the polymorphic union produces a query like this:
session.query(Employee).all()
Both joined-table and single table inheritance scenarios produce mappings which are usable in relation() functions; that is, it's possible to map a parent object to a child object which is polymorphic. Similiarly, inheriting mappers can have relation()
s of their own at any level, which are inherited to each child class. The only requirement for relations is that there is a table relationship between parent and child. An example is the following modification to the joined table inheritance example, which sets a bi-directional relationship between Employee
and Company
:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('company_id', Integer, ForeignKey('companies.company_id')) ) companies = Table('companies', metadata, Column('company_id', Integer, primary_key=True), Column('name', String(50))) class Company(object): pass mapper(Company, companies, properties={ 'employees': relation(Employee, backref='company') })
SQLAlchemy has a lot of experience in this area; the optimized "outer join" approach can be used freely for parent and child relationships, eager loads are fully useable, query aliasing and other tricks are fully supported as well.
In a concrete inheritance scenario, mapping relation()
s is more difficult since the distinct classes do not share a table. In this case, you can establish a relationship from parent to child if a join condition can be constructed from parent to child, if each child table contains a foreign key to the parent:
companies = Table('companies', metadata, Column('id', Integer, primary_key=True), Column('name', String(50))) employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) managers_table = Table('managers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) engineers_table = Table('engineers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('engineer_info', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) mapper(Employee, employees_table, select_table=pjoin, polymorphic_on=pjoin.c.type, polymorphic_identity='employee') mapper(Manager, managers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='manager') mapper(Engineer, engineers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='engineer') mapper(Company, companies, properties={ 'employees':relation(Employee) })
Let's crank it up and try loading with an eager load:
session.query(Company).options(eagerload('employees')).all()
The big limitation with concrete table inheritance is that relation()s placed on each concrete mapper do not propagate to child mappers. If you want to have the same relation()s set up on all concrete mappers, they must be configured manually on each.
back to section topMappers can be constructed against arbitrary relational units (called Selectables
) as well as plain Tables
. For example, The join
keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class class AddressUser(object): pass # define a Join j = join(users_table, addresses_table) # map to it - the identity of an AddressUser object will be # based on (user_id, address_id) since those are the primary keys involved mapper(AddressUser, j, properties={ 'user_id':[users_table.c.user_id, addresses_table.c.user_id] })
A second example:
# many-to-many join on an association table j = join(users_table, userkeywords, users_table.c.user_id==userkeywords.c.user_id).join(keywords, userkeywords.c.keyword_id==keywords.c.keyword_id) # a class class KeywordUser(object): pass # map to it - the identity of a KeywordUser object will be # (user_id, keyword_id) since those are the primary keys involved mapper(KeywordUser, j, properties={ 'user_id':[users_table.c.user_id, userkeywords.c.user_id], 'keyword_id':[userkeywords.c.keyword_id, keywords.c.keyword_id] })
In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.
back to section topSimilar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers, func.count(orders).label('order_count'), func.max(orders.price).label('highest_order')], customers.c.customer_id==orders.c.customer_id, group_by=[c for c in customers.c] ).alias('somealias') class Customer(object): pass mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary key columns of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
back to section topThe first mapper created for a certain class is known as that class's "primary mapper." Other mappers can be created as well, these come in two varieties.
secondary mapper
this is a mapper that must be constructed with the keyword argument non_primary=True
, and represents a load-only mapper. Objects that are loaded with a secondary mapper will have their save operation processed by the primary mapper. It is also invalid to add new relation()
s to a non-primary mapper. To use this mapper with the Session, specify it to the query
method:
example:
# primary mapper mapper(User, users_table) # make a secondary mapper to load User against a join othermapper = mapper(User, users_table.join(someothertable), non_primary=True) # select result = session.query(othermapper).select()
The "non primary mapper" is a rarely needed feature of SQLAlchemy; in most cases, the Query
object can produce any kind of query that's desired. It's recommended that a straight Query
be used in place of a non-primary mapper unless the mapper approach is absolutely needed. Current use cases for the "non primary mapper" are when you want to map the class to a particular select statement or view to which additional query criterion can be added, and for when the particular mapped select statement or view is to be placed in a relation()
of a parent mapper.
entity name mapper
this is a mapper that is a fully functioning primary mapper for a class, which is distinguished from the regular primary mapper by an entity_name
parameter. Instances loaded with this mapper will be totally managed by this new mapper and have no connection to the original one. Most methods on Session
include an optional entity_name
parameter in order to specify this condition.
example:
# primary mapper mapper(User, users_table) # make an entity name mapper that stores User objects in another table mapper(User, alternate_users_table, entity_name='alt') # make two User objects user1 = User() user2 = User() # save one in in the "users" table session.save(user1) # save the other in the "alternate_users_table" session.save(user2, entity_name='alt') session.flush() # select from the alternate mapper session.query(User, entity_name='alt').select()
Use the "entity name" mapper when different instances of the same class are persisted in completely different tables. The "entity name" approach can also perform limited levels of horizontal partitioning as well. A more comprehensive approach to horizontal partitioning is provided by the Sharding API.
Mappers can have functionality augmented or replaced at many points in its execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.EXT_CONTINUE
to allow processing to continue to the next MapperExtension or simply proceed normally if there are no more extensions.
API documentation for MapperExtension: class MapperExtension(object)
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
m = mapper(User, users_table, extension=MyExtension())
Multiple extensions will be chained together and processed in order; they are specified as a list:
m = mapper(User, users_table, extension=[ext1, ext2, ext3])
The full list of options for the relation()
function:
A quick walkthrough of the basic relational patterns.
A one to many relationship places a foreign key in the child table referencing the parent. SQLAlchemy creates the relationship as a collection on the parent object containing instances of the child object.
parent_table = Table('parent', metadata, Column('id', Integer, primary_key=True)) child_table = Table('child', metadata, Column('id', Integer, primary_key=True), Column('parent_id', Integer, ForeignKey('parent.id'))) class Parent(object): pass class Child(object): pass mapper(Parent, parent_table, properties={ 'children':relation(Child) }) mapper(Child, child_table)
To establish a bi-directional relationship in one-to-many, where the "reverse" side is a many to one, specify the backref
option:
mapper(Parent, parent_table, properties={ 'children':relation(Child, backref='parent') }) mapper(Child, child_table)
Child
will get a parent
attribute with many-to-one semantics.
Many to one places a foreign key in the parent table referencing the child. The mapping setup is identical to one-to-many, however SQLAlchemy creates the relationship as a scalar attribute on the parent object referencing a single instance of the child object.
parent_table = Table('parent', metadata, Column('id', Integer, primary_key=True), Column('child_id', Integer, ForeignKey('child.id'))) child_table = Table('child', metadata, Column('id', Integer, primary_key=True), ) class Parent(object): pass class Child(object): pass mapper(Parent, parent_table, properties={ 'child':relation(Child) }) mapper(Child, child_table)
Backref behavior is available here as well, where backref="parents"
will place a one-to-many collection on the Child
class.
One To One is essentially a bi-directional relationship with a scalar attribute on both sides. To acheive this, the uselist=False
flag indicates the placement of a scalar attribute instead of a collection on the "many" side of the relationship. To convert one-to-many into one-to-one:
mapper(Parent, parent_table, properties={ 'child':relation(Child, uselist=False, backref='parent') })
Or to turn many-to-one into one-to-one:
mapper(Parent, parent_table, properties={ 'child':relation(Child, backref=backref('parent', uselist=False)) })
Many to Many adds an association table between two classes. The association table is indicated by the secondary
argument to relation()
.
left_table = Table('left', metadata, Column('id', Integer, primary_key=True)) right_table = Table('right', metadata, Column('id', Integer, primary_key=True)) association_table = Table('association', metadata, Column('left_id', Integer, ForeignKey('left.id')), Column('right_id', Integer, ForeignKey('right.id')), ) mapper(Parent, left_table, properties={ 'children':relation(Child, secondary=association_table) }) mapper(Child, right_table)
For a bi-directional relationship, both sides of the relation contain a collection by default, which can be modified on either side via the uselist
flag to be scalar. The backref
keyword will automatically use the same secondary
argument for the reverse relation:
mapper(Parent, left_table, properties={ 'children':relation(Child, secondary=association_table, backref='parents') })
The association object pattern is a variant on many-to-many: it specifically is used when your association table contains additional columns beyond those which are foreign keys to the left and right tables. Instead of using the secondary
argument, you map a new class directly to the association table. The left side of the relation references the association object via one-to-many, and the association class references the right side via many-to-one.
left_table = Table('left', metadata, Column('id', Integer, primary_key=True)) right_table = Table('right', metadata, Column('id', Integer, primary_key=True)) association_table = Table('association', metadata, Column('left_id', Integer, ForeignKey('left.id')), Column('right_id', Integer, ForeignKey('right.id')), Column('data', String(50)) ) mapper(Parent, left_table, properties={ 'children':relation(Association) }) mapper(Association, association_table, properties={ 'child':relation(Child) }) mapper(Child, right_table)
The bi-directional version adds backrefs to both relations:
mapper(Parent, left_table, properties={ 'children':relation(Association, backref="parent") }) mapper(Association, association_table, properties={ 'child':relation(Child, backref="parent_assocs") }) mapper(Child, right_table)
Working with the association pattern in its direct form requires that child objects are associated with an association instance before being appended to the parent; similarly, access from parent to child goes through the association object:
# create parent, append a child via association p = Parent() a = Association() a.child = Child() p.children.append(a) # iterate through child objects via association, including association # attributes for assoc in p.children: print assoc.data print assoc.child
To enhance the association object pattern such that direct access to the Association
object is optional, SQLAlchemy provides the associationproxy.
Important Note: it is strongly advised that the secondary
table argument not be combined with the Association Object pattern, unless the relation()
which contains the secondary
argument is marked viewonly=True
. Otherwise, SQLAlchemy may persist conflicting data to the underlying association table since it is represented by two conflicting mappings. The Association Proxy pattern should be favored in the case where access to the underlying association data is only sometimes needed.
The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself. This is the most common and simple way to represent hierarchical data in flat tables. The other way is the "nested sets" model, sometimes called "modified preorder". Despite what many online articles say about modified preorder, the adjacency list model is probably the most appropriate pattern for the large majority of hierarchical storage needs, for reasons of concurrency, reduced complexity, and that modified preorder has little advantage over an application which can fully load subtrees into the application space.
SQLAlchemy commonly refers to an adjacency list relation as a self-referential mapper. In this example, we'll work with a single table called treenodes
to represent a tree structure:
nodes = Table('treenodes', metadata, Column('id', Integer, primary_key=True), Column('parent_id', Integer, ForeignKey('treenodes.id')), Column('data', String(50)), )
A graph such as the following:
root --+---> child1 +---> child2 --+--> subchild1 | +--> subchild2 +---> child3
Would be represented with data such as:
id parent_id data --- ------- ---- 1 NULL root 2 1 child1 3 1 child2 4 3 subchild1 5 3 subchild2 6 1 child3
SQLAlchemy's mapper()
configuration for a self-referential one-to-many relationship is exactly like a "normal" one-to-many relationship. When SQLAlchemy encounters the foreign key relation from treenodes
to treenodes
, it assumes one-to-many unless told otherwise:
# entity class class Node(object): pass mapper(Node, nodes, properties={ 'children':relation(Node) })
To create a many-to-one relationship from child to parent, an extra indicator of the "remote side" is added, which contains the Column
object or objects indicating the remote side of the relation:
mapper(Node, nodes, properties={ 'parent':relation(Node, remote_side=[nodes.c.id]) })
And the bi-directional version combines both:
mapper(Node, nodes, properties={ 'children':relation(Node, backref=backref('parent', remote_side=[nodes.c.id])) })
There are several examples included with SQLAlchemy illustrating self-referential strategies; these include basic_tree.py and optimized_al.py, the latter of which illustrates how to persist and search XML documents in conjunction with ElementTree.
Querying self-referential structures is done in the same way as any other query in SQLAlchemy, such as below, we query for any node whose data
attrbibute stores the value child2
:
# get all nodes named 'child2' sess.query(Node).filter(Node.data=='child2')
On the subject of joins, i.e. those described in Querying with Joins, self-referential structures require the usage of aliases so that the same table can be referenced multiple times within the FROM clause of the query. Aliasing can be done either manually using the nodes
Table
object as a source of aliases:
# get all nodes named 'subchild1' with a parent named 'child2' nodealias = nodes.alias() sqlsess.query(Node).filter(Node.data=='subchild1').\ filter(and_(Node.parent_id==nodealias.c.id, nodealias.c.data=='child2')).all()
or automatically, using join()
with aliased=True
:
# get all nodes named 'subchild1' with a parent named 'child2' sqlsess.query(Node).filter(Node.data=='subchild1').\ join('parent', aliased=True).filter(Node.data=='child2').all()
To add criterion to multiple points along a longer join, use from_joinpoint=True
:
# get all nodes named 'subchild1' with a parent named 'child2' and a grandparent 'root' sqlsess.query(Node).filter(Node.data=='subchild1').\ join('parent', aliased=True).filter(Node.data=='child2').\ join('parent', aliased=True, from_joinpoint=True).filter(Node.data=='root').all()
Eager loading of relations occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its child collection can be populated from a single SQL statement. SQLAlchemy's eager loading uses aliased tables in all cases when joining to related items, so it is compatible with self-referential joining. However, to use eager loading with a self-referential relation, SQLAlchemy needs to be told how many levels deep it should join; otherwise the eager load will not take place. This depth setting is configured via join_depth
:
mapper(Node, nodes, properties={ 'children':relation(Node, lazy=False, join_depth=2) }) sqlsession.query(Node).all()
The relation()
function uses the foreign key relationship between the parent and child tables to formulate the primary join condition between parent and child; in the case of a many-to-many relationship it also formulates the secondary join condition. If you are working with a Table
which has no ForeignKey
objects on it (which can be the case when using reflected tables with MySQL), or if the join condition cannot be expressed by a simple foreign key relationship, use the primaryjoin
and possibly secondaryjoin
conditions to create the appropriate relationship.
In this example we create a relation boston_addresses
which will only load the user addresses with a city of "Boston":
class User(object): pass class Address(object): pass mapper(Address, addresses_table) mapper(User, users_table, properties={ 'boston_addresses' : relation(Address, primaryjoin= and_(users_table.c.user_id==Address.c.user_id, Addresses.c.city=='Boston')) })
Many to many relationships can be customized by one or both of primaryjoin
and secondaryjoin
, shown below with just the default many-to-many relationship explicitly set:
class User(object): pass class Keyword(object): pass mapper(Keyword, keywords_table) mapper(User, users_table, properties={ 'keywords':relation(Keyword, secondary=userkeywords_table, primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id, secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id ) })
Very ambitious custom join conditions may fail to be directly persistable, and in some cases may not even load correctly. To remove the persistence part of the equation, use the flag viewonly=True
on the relation()
, which establishes it as a read-only attribute (data written to the collection will be ignored on flush()). However, in extreme cases, consider using a regular Python property in conjunction with Query
as follows:
class User(object): def _get_addresses(self): return object_session(self).query(Address).with_parent(self).filter(...).all() addresses = property(_get_addresses)
Theres no restriction on how many times you can relate from parent to child. SQLAlchemy can usually figure out what you want, particularly if the join conditions are straightforward. Below we add a newyork_addresses
attribute to complement the boston_addresses
attribute:
mapper(User, users_table, properties={ 'boston_addresses' : relation(Address, primaryjoin= and_(users_table.c.user_id==Address.c.user_id, Addresses.c.city=='Boston')), 'newyork_addresses' : relation(Address, primaryjoin= and_(users_table.c.user_id==Address.c.user_id, Addresses.c.city=='New York')), })
Mapping a one-to-many or many-to-many relationship results in a collection of values accessible through an attribute on the parent instance. By default, this collection is a list
:
mapper(Parent, properties={ children = relation(Child) }) parent = Parent() parent.children.append(Child()) print parent.children[0]
Collections are not limited to lists. Sets, mutable sequences and almost any other Python object that can act as a container can be used in place of the default list.
# use a set mapper(Parent, properties={ children = relation(Child, collection_class=set) }) parent = Parent() child = Child() parent.children.add(child) assert child in parent.children
You can use your own types for collections as well. For most cases, simply inherit from list
or set
and add the custom behavior.
Collections in SQLAlchemy are transparently instrumented. Instrumentation means that normal operations on the collection are tracked and result in changes being written to the database at flush time. Additionally, collection operations can fire events which indicate some secondary operation must take place. Examples of a secondary operation include saving the child item in the parent's Session
(i.e. the save-update
cascade), as well as synchronizing the state of a bi-directional relationship (i.e. a backref
).
The collections package understands the basic interface of lists, sets and dicts and will automatically apply instrumentation to those built-in types and their subclasses. Object-derived types that implement a basic collection interface are detected and instrumented via duck-typing:
class ListLike(object): def __init__(self): self.data = [] def append(self, item): self.data.append(item) def remove(self, item): self.data.remove(item) def extend(self, items): self.data.extend(items) def __iter__(self): return iter(self.data) def foo(self): return 'foo'
append
, remove
, and extend
are known list-like methods, and will be instrumented automatically. __iter__
is not a mutator method and won't be instrumented, and foo
won't be either.
Duck-typing (i.e. guesswork) isn't rock-solid, of course, so you can be explicit about the interface you are implementing by providing an __emulates__
class attribute:
class SetLike(object): __emulates__ = set def __init__(self): self.data = set() def append(self, item): self.data.add(item) def remove(self, item): self.data.remove(item) def __iter__(self): return iter(self.data)
This class looks list-like because of append
, but __emulates__
forces it to set-like. remove
is known to be part of the set interface and will be instrumented.
But this class won't work quite yet: a little glue is needed to adapt it for use by SQLAlchemy. The ORM needs to know which methods to use to append, remove and iterate over members of the collection. When using a type like list
or set
, the appropriate methods are well-known and used automatically when present. This set-like class does not provide the expected add
method, so we must supply an explicit mapping for the ORM via a decorator.
Decorators can be used to tag the individual methods the ORM needs to manage collections. Use them when your class doesn't quite meet the regular interface for its container type, or you simply would like to use a different method to get the job done.
from sqlalchemy.orm.collections import collection class SetLike(object): __emulates__ = set def __init__(self): self.data = set() @collection.appender def append(self, item): self.data.add(item) def remove(self, item): self.data.remove(item) def __iter__(self): return iter(self.data)
And that's all that's needed to complete the example. SQLAlchemy will add instances via the append
method. remove
and __iter__
are the default methods for sets and will be used for removing and iteration. Default methods can be changed as well:
from sqlalchemy.orm.collections import collection class MyList(list): @collection.remover def zark(self, item): # do something special... @collection.iterator def hey_use_this_instead_for_iteration(self): # ...
There is no requirement to be list-, or set-like at all. Collection classes can be any shape, so long as they have the append, remove and iterate interface marked for SQLAlchemy's use. Append and remove methods will be called with a mapped entity as the single argument, and iterator methods are called with no arguments and must return an iterator.
back to section topA dict
can be used as a collection, but a keying strategy is needed to map entities loaded by the ORM to key, value pairs. The collections package provides several built-in types for dictionary-based collections:
from sqlalchemy.orm.collections import column_mapped_collection, attribute_mapped_collection, mapped_collection mapper(Item, items_table, properties={ # key by column notes = relation(Note, collection_class=column_mapped_collection(notes_table.c.keyword)) # or named attribute notes2 = relation(Note, collection_class=attribute_mapped_collection('keyword')) # or any callable notes3 = relation(Note, collection_class=mapped_collection(lambda entity: entity.a + entity.b)) }) # ... item = Item() item.notes['color'] = Note('color', 'blue') print item.notes['color']
These functions each provide a dict
subclass with decorated set
and remove
methods and the keying strategy of your choice.
The collections.MappedCollection class can be used as a base class for your custom types or as a mix-in to quickly add dict
collection support to other classes. It uses a keying function to delegate to __setitem__
and __delitem__
:
from sqlalchemy.util import OrderedDict from sqlalchemy.orm.collections import MappedCollection class NodeMap(OrderedDict, MappedCollection): """Holds 'Node' objects, keyed by the 'name' attribute with insert order maintained.""" def __init__(self, *args, **kw): MappedCollection.__init__(self, keyfunc=lambda node: node.name) OrderedDict.__init__(self, *args, **kw)
The ORM understands the dict
interface just like lists and sets, and will automatically instrument all dict-like methods if you choose to subclass dict
or provide dict-like collection behavior in a duck-typed class. You must decorate appender and remover methods, however- there are no compatible methods in the basic dictionary interface for SQLAlchemy to use by default. Iteration will go through itervalues()
unless otherwise decorated.
Many custom types and existing library classes can be used as a entity collection type as-is without further ado. However, it is important to note that the instrumentation process will modify the type, adding decorators around methods automatically.
The decorations are lightweight and no-op outside of relations, but they do add unneeded overhead when triggered elsewhere. When using a library class as a collection, it can be good practice to use the "trivial subclass" trick to restrict the decorations to just your usage in relations. For example:
class MyAwesomeList(some.great.library.AwesomeList): pass # ... relation(..., collection_class=MyAwesomeList)
The ORM uses this approach for built-ins, quietly substituting a trivial subclass when a list
, set
or dict
is used directly.
The collections package provides additional decorators and support for authoring custom types. See the package documentation for more information and discussion of advanced usage and Python 2.3-compatible decoration options.
back to section topIn the Object Relational Tutorial, we introduced the concept of Eager Loading. We used an option
in conjunction with the Query
object in order to indicate that a relation should be loaded at the same time as the parent, within a single SQL query:
sql>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').all()
By default, all relations are lazy loading. The scalar or collection attribute associated with a relation()
contains a trigger which fires the first time the attribute is accessed, which issues a SQL call at that point:
sql>>> jack.addresses
[<Address(u'jack@google.com')>, <Address(u'j25@yahoo.com')>]
The default loader strategy for any relation()
is configured by the lazy
keyword argument, which defaults to True
. Below we set it as False
so that the children
relation is eager loading:
# eager load 'children' attribute mapper(Parent, parent_table, properties={ 'children':relation(Child, lazy=False) })
The loader strategy can be changed from lazy to eager as well as eager to lazy using the eagerload()
and lazyload()
query options:
# set children to load lazily session.query(Parent).options(lazyload('children')).all() # set children to load eagerly session.query(Parent).options(eagerload('children')).all()
To reference a relation that is deeper than one level, separate the names by periods:
session.query(Parent).options(eagerload('foo.bar.bat')).all()
When using dot-separated names with eagerload()
, option applies only to the actual attribute named, and not its ancestors. For example, suppose a mapping from A
to B
to C
, where the relations, named atob
and btoc
, are both lazy-loading. A statement like the following:
session.query(A).options(eagerload('atob.btoc')).all()
will load only A
objects to start. When the atob
attribute on each A
is accessed, the returned B
objects will eagerly load their C
objects.
Therefore, to modify the eager load to load both atob
as well as btoc
, place eagerloads for both:
session.query(A).options(eagerload('atob'), eagerload('atob.btoc')).all()
or more simply just use eagerload_all()
:
session.query(A).options(eagerload_all('atob.btoc')).all()
There are two other loader strategies available, dynamic loading and no loading; these are described in Working with Large Collections.
When full statement or result-set loads are used with Query
, SQLAlchemy does not affect the SQL query itself, and therefore has no way of tacking on its own LEFT [OUTER] JOIN
conditions that are normally used to eager load relationships. If the query being constructed is created in such a way that it returns rows not just from a parent table (or tables) but also returns rows from child tables, the result-set mapping can be notified as to which additional properties are contained within the result set. This is done using the contains_eager()
query option, which specifies the name of the relationship to be eagerly loaded.
# mapping is the users->addresses mapping mapper(User, users_table, properties={ 'addresses':relation(Address, addresses_table) }) # define a query on USERS with an outer join to ADDRESSES statement = users_table.outerjoin(addresses_table).select(use_labels=True) # construct a Query object which expects the "addresses" results query = session.query(User).options(contains_eager('addresses')) # get results normally r = query.from_statement(statement)
If the "eager" portion of the statement is "aliased", the alias
keyword argument to contains_eager()
may be used to indicate it. This is a string alias name or reference to an actual Alias
object:
# use an alias of the addresses table adalias = addresses_table.alias('adalias') # define a query on USERS with an outer join to adalias statement = users_table.outerjoin(adalias).select(use_labels=True) # construct a Query object which expects the "addresses" results query = session.query(User).options(contains_eager('addresses', alias=adalias)) # get results normally sqlr = query.from_statement(statement).all()
In the case that the main table itself is also aliased, the contains_alias()
option can be used:
# define an aliased UNION called 'ulist' statement = users.select(users.c.user_id==7).union(users.select(users.c.user_id>7)).alias('ulist') # add on an eager load of "addresses" statement = statement.outerjoin(addresses).select(use_labels=True) # create query, indicating "ulist" is an alias for the main table, "addresses" property should # be eager loaded query = create_session().query(User).options(contains_alias('ulist'), contains_eager('addresses')) # results r = query.from_statement(statement)
The default behavior of relation()
is to fully load the collection of items in, as according to the loading strategy of the relation. Additionally, the Session by default only knows how to delete objects which are actually present within the session. When a parent instance is marked for deletion and flushed, the Session loads its full list of child items in so that they may either be deleted as well, or have their foreign key value set to null; this is to avoid constraint violations. For large collections of child items, there are several strategies to bypass full loading of child items both at load time as well as deletion time.
The most useful by far is the dynamic_loader()
relation. This is a variant of relation()
which returns a Query
object in place of a collection when accessed. filter()
criterion may be applied as well as limits and offsets, either explicitly or via array slices:
mapper(User, users_table, properties={ 'posts':dynamic_loader(Post) }) jack = session.query(User).get(id) # filter Jack's blog posts posts = jack.posts.filter(Post.c.headline=='this is a post') # apply array slices posts = jack.posts[5:20]
The dynamic relation supports limited write operations, via the append()
and remove()
methods. Since the read side of the dynamic relation always queries the database, changes to the underlying collection will not be visible until the data has been flushed:
oldpost = jack.posts.filter(Post.c.headline=='old post').one() jack.posts.remove(oldpost) jack.posts.append(Post('new post'))
To place a dynamic relation on a backref, use lazy='dynamic'
:
mapper(Post, posts_table, properties={ 'user':relation(User, backref=backref('posts', lazy='dynamic')) })
Note that eager/lazy loading options cannot be used in conjunction dynamic relations at this time.
back to section topThe opposite of the dynamic relation is simply "noload", specified using lazy=None
:
mapper(MyClass, table, properties=relation{ 'children':relation(MyOtherClass, lazy=None) })
Above, the children
collection is fully writeable, and changes to it will be persisted to the database as well as locally available for reading at the time they are added. However when instances of MyClass
are freshly loaded from the database, the children
collection stays empty.
Use passive_deletes=True
to disable child object loading on a DELETE operation, in conjunction with "ON DELETE (CASCADE|SET NULL)" on your database to automatically cascade deletes to child objects. Note that "ON DELETE" is not supported on SQLite, and requires InnoDB
tables when using MySQL:
mytable = Table('mytable', meta, Column('id', Integer, primary_key=True), ) myothertable = Table('myothertable', meta, Column('id', Integer, primary_key=True), Column('parent_id', Integer), ForeignKeyConstraint(['parent_id'],['mytable.id'], ondelete="CASCADE"), ) mmapper(MyOtherClass, myothertable) mapper(MyClass, mytable, properties={ 'children':relation(MyOtherClass, cascade="all, delete-orphan", passive_deletes=True) })
When passive_deletes
is applied, the children
relation will not be loaded into memory when an instance of MyClass
is marked for deletion. The cascade="all, delete-orphan"
will take effect for instances of MyOtherClass
which are currently present in the session; however for instances of MyOtherClass
which are not loaded, SQLAlchemy assumes that "ON DELETE CASCADE" rules will ensure that those rows are deleted by the database and that no foreign key violation will occur.
The Mapper is the entrypoint to the configurational API of the SQLAlchemy object relational mapper. But the primary object one works with when using the ORM is the Session.
In the most general sense, the Session
establishes all conversations with the database and represents a "holding zone" for all the mapped instances which you've loaded or created during its lifespan. It implements the Unit of Work pattern, which means it keeps track of all changes which occur, and is capable of flushing those changes to the database as appropriate. Another important facet of the Session
is that it's also maintaining unique copies of each instance, where "unique" means "only one object with a particular primary key" - this pattern is called the Identity Map.
Beyond that, the Session
implements an interface which let's you move objects in or out of the session in a variety of ways, it provides the entryway to a Query
object which is used to query the database for data, it is commonly used to provide transactional boundaries (though this is optional), and it also can serve as a configurational "home base" for one or more Engine
objects, which allows various vertical and horizontal partitioning strategies to be achieved.
The Session
object exists just as a regular Python object, which can be directly instantiated. However, it takes a fair amount of keyword options, several of which you probably want to set explicitly. It's fairly inconvenient to deal with the "configuration" of a session every time you want to create one. Therefore, SQLAlchemy recommends the usage of a helper function called sessionmaker()
, which typically you call only once for the lifespan of an application. This function creates a customized Session
subclass for you, with your desired configurational arguments pre-loaded. Then, whenever you need a new Session
, you use your custom Session
class with no arguments to create the session.
The usage of sessionmaker()
is illustrated below:
from sqlalchemy.orm import sessionmaker # create a configured "Session" class Session = sessionmaker(autoflush=True, transactional=True) # create a Session sess = Session() # work with sess sess.save(x) sess.commit() # close when finished sess.close()
Above, the sessionmaker
call creates a class for us, which we assign to the name Session
. This class is a subclass of the actual sqlalchemy.orm.session.Session
class, which will instantiate with the arguments of autoflush=True
and transactional=True
.
When you write your application, place the call to sessionmaker()
somewhere global, and then make your new Session
class available to the rest of your application.
In our previous example regarding sessionmaker()
, nowhere did we specify how our session would connect to our database. When the session is configured in this manner, it will look for a database engine to connect with via the Table
objects that it works with - the chapter called Binding MetaData to an Engine or Connection describes how to associate Table
objects directly with a source of database connections.
However, it is often more straightforward to explicitly tell the session what database engine (or engines) you'd like it to communicate with. This is particularly handy with multiple-database scenarios where the session can be used as the central point of configuration. To acheive this, the constructor keyword bind
is used for a basic single-database configuration:
# create engine engine = create_engine('postgres://...') # bind custom Session class to the engine Session = sessionmaker(bind=engine, autoflush=True, transactional=True) # work with the session sess = Session()
One common issue with the above scenario is that an application will often organize its global imports before it ever connects to a database. Since the Session
class created by sessionmaker()
is meant to be a global application object (note we are saying the session class, not a session instance), we may not have a bind
argument available. For this, the Session
class returned by sessionmaker()
supports post-configuration of all options, through its method configure()
:
# configure Session class with desired options Session = sessionmaker(autoflush=True, transactional=True) # later, we create the engine engine = create_engine('postgres://...') # associate it with our custom Session class Session.configure(bind=engine) # work with the session sess = Session()
The Session
also has the ability to be bound to multiple engines. Descriptions of these scenarios are described in Partitioning Strategies.
The examples involving bind
so far are dealing with the Engine
object, which is, like the Session
class itself, a global configurational object. The Session
can also be bound to an individual database Connection
. The reason you might want to do this is if your application controls the boundaries of transactions using distinct Transaction
objects (these objects are described in Using Transactions with Connection). You'd have a transactional Connection
, and then you'd want to work with an ORM-level Session
which participates in that transaction. Since Connection
is definitely not a globally-scoped object in all but the most rudimental commandline applications, you can bind an individual Session()
instance to a particular Connection
not at class configuration time, but at session instance construction time:
# global application scope. create Session class, engine Session = sessionmaker(autoflush=True, transactional=True) engine = create_engine('postgres://...') ... # local scope, such as within a controller function # connect to the database connection = engine.connect() # bind an individual Session to the connection sess = Session(bind=connection)
As an alternative to sessionmaker()
, create_session()
exists literally as a function which calls the normal Session
constructor directly. All arguments are passed through and the new Session
object is returned:
session = create_session(bind=myengine)
The create_session()
function doesn't add any functionality to the regular Session
, it just sets up a default argument set of autoflush=False, transactional=False
. But also, by calling create_session()
instead of instantiating Session
directly, you leave room in your application to change the type of session which the function creates. For example, an application which is calling create_session()
in many places, which is typical for a pre-0.4 application, can be changed to use a sessionmaker()
by just assigning the return of sessionmaker()
to the create_session
name:
# change from: from sqlalchemy.orm import create_session # to: create_session = sessionmaker()
A typical session conversation starts with creating a new session, or acquiring one from an ongoing context. You save new objects and load existing ones, make changes, mark some as deleted, and then persist your changes to the database. If your session is transactional, you use commit()
to persist any remaining changes and to commit the transaction. If not, you call flush()
which will flush any remaining data to the database.
Below, we open a new Session
using a configured sessionmaker()
, make some changes, and commit:
# configured Session class Session = sessionmaker(autoflush=True, transactional=True) sess = Session() d = Data(value=10) sess.save(d) d2 = sess.query(Data).filter(Data.value=15).one() d2.value = 19 sess.commit()
It's helpful to know the states which an instance can have within a session:
Transient - an instance that's not in a session, and is not saved to the database; i.e. it has no database identity. The only relationship such an object has to the ORM is that its class has a mapper()
associated with it.
Pending - when you save()
a transient instance, it becomes pending. It still wasn't actually flushed to the database yet, but it will be when the next flush occurs.
Persistent - An instance which is present in the session and has a record in the database. You get persistent instances by either flushing so that the pending instances become persistent, or by querying the database for existing instances (or moving persistent instances from other sessions into your local session).
Detached - an instance which has a record in the database, but is not in any session. Theres nothing wrong with this, and you can use objects normally when they're detached, except they will not be able to issue any SQL in order to load collections or attributes which are not yet loaded, or were marked as "expired".
Knowing these states is important, since the Session
tries to be strict about ambiguous operations (such as trying to save the same object to two different sessions at the same time).
When do I make a sessionmaker
?
Just one time, somewhere in your application's global scope. It should be looked upon as part of your application's configuration. If your application has three .py files in a package, you could, for example, place the sessionmaker
line in your __init__.py
file; from that point on your other modules say "from mypackage import Session". That way, everyone else just uses Session()
, and the configuration of that session is controlled by that central point.
If your application starts up, does imports, but does not know what database it's going to be connecting to, you can bind the Session
at the "class" level to the engine later on, using configure()
.
In the examples in this section, we will frequently show the sessionmaker
being created right above the line where we actually invoke Session()
. But that's just for example's sake ! In reality, the sessionmaker
would be somewhere at the module level, and your individual Session()
calls would be sprinkled all throughout your app, such as in a web application within each controller method.
When do I make a Session
?
You typically invoke Session()
when you first need to talk to your database, and want to save some objects or load some existing ones. Then, you work with it, save your changes, and then dispose of it....or at the very least close()
it. It's not a "global" kind of object, and should be handled more like a "local variable", as it's generally not safe to use with concurrent threads. Sessions are very inexpensive to make, and don't use any resources whatsoever until they are first used...so create some !
There is also a pattern whereby you're using a contextual session, this is described later in Contextual/Thread-local Sessions. In this pattern, a helper object is maintaining a Session
for you, most commonly one that is local to the current thread (and sometimes also local to an application instance). SQLAlchemy 0.4 has worked this pattern out such that it still looks like you're creating a new session as you need one...so in that case, it's still a guaranteed win to just say Session()
whenever you want a session.
Is the Session a cache ?
Yeee...no. It's somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar')
, even if Foo(name='bar')
is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It's only when you say query.get({some primary key})
that the Session
doesn't have to issue a query.
Additionally, the Session stores object instances using a weak reference by default. This also defeats the purpose of using the Session as a cache, unless the weak_identity_map
flag is set to False
.
The Session
is not designed to be a global object from which everyone consults as a "registry" of objects. That is the job of a second level cache. A good library for implementing second level caching is Memcached. It is possible to "sort of" use the Session
in this manner, if you set it to be non-transactional and it never flushes any SQL, but it's not a terrific solution, since if concurrent threads load the same objects at the same time, you may have multiple copies of the same objects present in collections.
How can I get the Session
for a certain object ?
Use the object_session()
classmethod available on Session
:
session = Session.object_session(someobject)
Is the session threadsafe ?
Nope. It has no thread synchronization of any kind built in, and particularly when you do a flush operation, it definitely is not open to concurrent threads accessing it, because it holds onto a single database connection at that point. If you use a session which is non-transactional for read operations only, it's still not thread-"safe", but you also wont get any catastrophic failures either, since it opens and closes connections on an as-needed basis; its just that different threads might load the same objects independently of each other, but only one will wind up in the identity map (however, the other one might still live in a collection somewhere).
But the bigger point here is, you should not want to use the session with multiple concurrent threads. That would be like having everyone at a restaurant all eat from the same plate. The session is a local "workspace" that you use for a specific set of tasks; you don't want to, or need to, share that session with other threads who are doing some other task. If, on the other hand, there are other threads participating in the same task you are, such as in a desktop graphical application, then you would be sharing the session with those threads, but you also will have implemented a proper locking scheme (or your graphical framework does) so that those threads do not collide.
The session provides a set of attributes and collection-oriented methods which allow you to view the current state of the session.
The identity map is accessed by the identity_map
attribute, which provides a dictionary interface. The keys are "identity keys", which are attached to all persistent objects by the attribute _instance_key
:
>>> myobject._instance_key (<class 'test.tables.User'>, (7,)) >>> myobject._instance_key in session.identity_map True >>> session.identity_map.values() [<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]
The identity map is a weak-referencing dictionary by default. This means that objects which are dereferenced on the outside will be removed from the session automatically. Note that objects which are marked as "dirty" will not fall out of scope until after changes on them have been flushed; special logic kicks in at the point of auto-removal which ensures that no pending changes remain on the object, else a temporary strong reference is created to the object.
Some people prefer objects to stay in the session until explicitly removed in all cases; for this, you can specify the flag weak_identity_map=False
to the create_session
or sessionmaker
functions so that the Session
will use a regular dictionary.
While the identity_map
accessor is currently the actual dictionary used by the Session
to store instances, you should not add or remove items from this dictionary. Use the session methods save_or_update()
and expunge()
to add or remove items.
The Session also supports an iterator interface in order to see all objects in the identity map:
for obj in session: print obj
As well as __contains__()
:
if obj in session: print "Object is present"
The session is also keeping track of all newly created (i.e. pending) objects, all objects which have had changes since they were last loaded or saved (i.e. "dirty"), and everything that's been marked as deleted.
# pending objects recently added to the Session session.new # persistent objects which currently have changes detected # (this collection is now created on the fly each time the property is called) session.dirty # persistent objects that have been marked as deleted via session.delete(obj) session.deleted
Note that if a session is created with the weak_identity_map
flag, an item which is marked as "dirty" will be silently removed from the session if the item falls out of scope in the user application. This is because the unit of work does not look for "dirty" changes except for within a flush operation (or any time the session.dirty collection is accessed).
However, the new
and deleted
collections are strong-referencing sets. Even if you're using weak_identity_map
, if you abandon all references to new or deleted objects within a session, they are still present in either of those two lists, and will be saved on the next flush operation, unless they are removed from the Session explicitly or the session is cleared.
The query()
function takes one or more classes and/or mappers, along with an optional entity_name
parameter, and returns a new Query
object which will issue mapper queries within the context of this Session. For each mapper is passed, the Query uses that mapper. For each class, the Query will locate the primary mapper for the class using class_mapper()
.
# query from a class session.query(User).filter_by(name='ed').all() # query with multiple classes, returns tuples session.query(User).add_entity(Address).join('addresses').filter_by(name='ed').all() # query from a mapper query = session.query(usermapper) x = query.get(1) # query from a class mapped with entity name 'alt_users' q = session.query(User, entity_name='alt_users') y = q.options(eagerload('orders')).all()
entity_name
is an optional keyword argument sent with a class object, in order to further qualify which primary mapper to be used; this only applies if there was a Mapper
created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name
argument, so that a given class can be properly matched to the desired primary mapper.
All instances retrieved by the returned Query
object will be stored as persistent instances within the originating Session
.
save()
is called with a single transient instance as an argument, which is then added to the Session and becomes pending. When the session is next flushed, the instance will be saved to the database. If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.
user1 = User(name='user1') user2 = User(name='user2') session.save(user1) session.save(user2) session.commit() # write changes to the database
There's also other ways to have objects saved to the session automatically; one is by using cascade rules, and the other is by using a contextual session. Both of these are described later.
back to section topThe update()
method is used when you have a detached instance, and you want to put it back into a Session
. Recall that "detached" means the object has a database identity.
Since update()
is a little picky that way, most people use save_or_update()
, which checks for an _instance_key
attribute, and based on whether it's there or not, calls either save()
or update()
:
# load user1 using session 1 user1 = sess1.query(User).get(5) # remove it from session 1 sess1.expunge(user1) # move it into session 2 sess2.save_or_update(user1)
update()
is also an operation that can happen automatically using cascade rules, just like save()
.
merge()
on the other hand is a little like update()
, except it creates a copy of the given instance in the session, and returns to you that instance; the instance you send it never goes into the session. merge()
is much fancier than update()
; it will actually look to see if an object with the same primary key is already present in the session, and if not will load it by primary key. Then, it will merge the attributes of the given object into the one which it just located.
This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session, where the object may be present in the session already:
# deserialize an object myobj = pickle.loads(mystring) # "merge" it. if the session already had this object in the # identity map, then you get back the one from the current session. myobj = session.merge(myobj)
The delete
method places an instance into the Session's list of objects to be marked as deleted:
# mark two objects to be deleted session.delete(obj1) session.delete(obj2) # commit (or flush) session.commit()
The big gotcha with delete()
is that nothing is removed from collections. Such as, if a User
has a collection of three Addresses
, deleting an Address
will not remove it from user.addresses
:
>>> address = user.addresses[1] >>> session.delete(address) >>> session.flush() >>> address in user.addresses True
The solution is to use proper cascading:
mapper(User, users_table, properties={ 'addresses':relation(Address, cascade="all, delete") }) del user.addresses[1] session.flush()
This is the main gateway to what the Session
does best, which is save everything ! It should be clear by now what a flush looks like:
session.flush()
It also can be called with a list of objects; in this form, the flush operation will be limited only to the objects specified in the list:
# saves only user1 and address2. all other modified # objects remain present in the session. session.flush([user1, address2])
This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.
Theres also a way to have flush()
called automatically before each query; this is called "autoflush" and is described below.
Note that flush does not change the state of any collections or entity relationships in memory; for example, if you set a foreign key attribute b_id
on object A
with the the identifier B.id
, the change will be flushed to the database, but A
will not have B
added to its collection. If you want to manipulate foreign key attributes directly, refresh()
or expire()
the objects whose state needs to be refreshed subsequent to flushing.
A session can be configured to issue flush()
calls before each query. This allows you to immediately have DB access to whatever has been saved to the session. It's recommended to use autoflush with transactional=True
, that way an unexpected flush call won't permanently save to the database:
Session = sessionmaker(autoflush=True, transactional=True) sess = Session() u1 = User(name='jack') sess.save(u1) # reload user1 u2 = sess.query(User).filter_by(name='jack').one() assert u2 is u1 # commit session, flushes whatever is remaining sess.commit()
Autoflush is particularly handy when using "dynamic" mapper relations, so that changes to the underlying collection are immediately available via its query interface.
back to section topExpunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:
session.expunge(obj1)
Use expunge
when youd like to remove an object altogether from memory, such as before calling del
on it, which will prevent any "ghost" operations occuring when the session is flushed.
This clear()
method is equivalent to expunge()
-ing everything from the Session:
session.clear()
However note that the clear()
method does not reset any transactional state or connection resources; therefore what you usually want to call instead of clear()
is close()
.
The close()
method issues a clear()
, and releases any transactional/connection resources. When connections are returned to the connection pool, whatever transactional state exists is rolled back.
When close()
is called, the Session
is in the same state as when it was first created, and is safe to be used again. close()
is especially important when using a contextual session, which remains in memory after usage. By issuing close()
, the session will be clean for the next request that makes use of it.
To assist with the Session's "sticky" behavior of instances which are present, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired" which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized, eager relationships will be repopulated. Any changes marked on the object are discarded:
# immediately re-load attributes on obj1, obj2 session.refresh(obj1) session.refresh(obj2) # expire objects obj1, obj2, attributes will be reloaded # on the next access: session.expire(obj1) session.expire(obj2)
Mappers support the concept of configurable cascade behavior on relation()
s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all
, delete
, save-update
, refresh-expire
, merge
, expunge
, and delete-orphan
.
Cascading is configured by setting the cascade
keyword argument on a relation()
:
mapper(Order, order_table, properties={ 'items' : relation(Item, items_table, cascade="all, delete-orphan"), 'customer' : relation(User, users_table, user_orders_table, cascade="save-update"), })
The above mapper specifies two relations, items
and customer
. The items
relationship specifies "all, delete-orphan" as its cascade
value, indicating that all save
, update
, merge
, expunge
, refresh
delete
and expire
operations performed on a parent Order
instance should also be performed on the child Item
instances attached to it (save
and update
are cascaded using the save_or_update()
method, so that the database identity of the instance doesn't matter). The delete-orphan
cascade value additionally indicates that if an Item
instance is no longer associated with an Order
, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order
and an Item
object.
The customer
relationship specifies only the "save-update" cascade value, indicating most operations will not be cascaded from a parent Order
instance to a child User
instance, except for if the Order
is attached with a particular session, either via the save()
, update()
, or save-update()
method.
Additionally, when a child item is attached to a parent item that specifies the "save-update" cascade value on the relationship, the child is automatically passed to save_or_update()
(and the operation is further cascaded to the child item).
Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.
The default value for cascade
on relation()
s is save-update, merge
.
The Session can manage transactions automatically, including across multiple engines. When the Session is in a transaction, as it receives requests to execute SQL statements, it adds each indivdual Connection/Engine encountered to its transactional state. At commit time, all unflushed data is flushed, and each individual transaction is committed. If the underlying databases support two-phase semantics, this may be used by the Session as well if two-phase transactions are enabled.
The easiest way to use a Session with transactions is just to declare it as transactional. The session will remain in a transaction at all times:
# transactional session Session = sessionmaker(transactional=True) sess = Session() item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo' # commit- will immediately go into a new transaction afterwards sess.commit()
Alternatively, a transaction can be begun explicitly using begin()
:
# non transactional session Session = sessionmaker(transactional=False) sess = Session() sess.begin() try: item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo' except: sess.rollback() raise sess.commit()
Session also supports Python 2.5's with statement so that the example above can be written as:
Session = sessionmaker(transactional=False) sess = Session() with sess.begin(): item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo'
For MySQL and Postgres (and soon Oracle), "nested" transactions can be accomplished which use SAVEPOINT behavior, via the begin_nested()
method:
Session = sessionmaker(transactional=False) sess = Session() sess.begin() sess.save(u1) sess.save(u2) sess.flush() sess.begin_nested() # establish a savepoint sess.save(u3) sess.rollback() # rolls back u3, keeps u1 and u2 sess.commit() # commits u1 and u2
Finally, for MySQL, Postgres, and soon Oracle as well, the session can be instructed to use two-phase commit semantics using the flag twophase=True
, which coordinates transactions across multiple databases:
engine1 = create_engine('postgres://db1') engine2 = create_engine('postgres://db2') Session = sessionmaker(twophase=True, transactional=True) # bind User operations to engine 1, Account operations to engine 2 Session.configure(binds={User:engine1, Account:engine2}) sess = Session() # .... work with accounts and users # commit. session will issue a flush to all DBs, and a prepare step to all DBs, # before committing both transactions sess.commit()
This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It's especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:
class SomeClass(object): pass mapper(SomeClass, some_table) someobject = session.query(SomeClass).get(5) # set 'value' attribute to a SQL expression adding one someobject.value = some_table.c.value + 1 # issues "UPDATE some_table SET value=value+1" session.commit()
This works both for INSERT and UPDATE statements. After the flush/commit operation, the value
attribute on someobject
gets "deferred", so that when you again access it the newly generated value will be loaded from the database. This is the same mechanism at work when database-side column defaults fire off.
SQL constructs and string statements can be executed via the Session
. You'd want to do this normally when your Session
is transactional and youd like your free-standing SQL statements to participate in the same transaction.
The two ways to do this are to use the connection/execution services of the Session, or to have your Session participate in a regular SQL transaction.
First, a Session thats associated with an Engine or Connection can execute statements immediately (whether or not its transactional):
Session = sessionmaker(bind=engine, transactional=True) sess = Session() result = sess.execute("select * from table where id=:id", {'id':7}) result2 = sess.execute(select([mytable], mytable.c.id==7))
To get at the current connection used by the session, which will be part of the current transaction if one is in progress, use connection()
:
connection = sess.connection()
A second scenario is that of a Session which is not directly bound to a connectable. This session executes statements relative to a particular Mapper
, since the mappers are bound to tables which are in turn bound to connectables via their MetaData
(either the session or the mapped tables need to be bound). In this case, the Session can conceivably be associated with multiple databases through different mappers; so it wants you to send along a mapper
argument, which can be any mapped class or mapper instance:
# session is *not* bound to an engine or connection Session = sessionmaker(transactional=True) sess = Session() # need to specify mapper or class when executing result = sess.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass) result2 = sess.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass) # need to specify mapper or class when you call connection() connection = sess.connection(MyMappedClass)
The third scenario is when you are using Connection
and Transaction
yourself, and want the Session
to participate. This is easy, as you just bind the Session
to the connection:
# non-transactional session Session = sessionmaker(transactional=False) # non-ORM connection + transaction conn = engine.connect() trans = conn.begin() # bind the Session *instance* to the connection sess = Session(bind=conn) # ... etc trans.commit()
It's safe to use a Session
which is transactional or autoflushing, as well as to call begin()
/commit()
on the session too; the outermost Transaction object, the one we declared explicitly, controls the scope of the transaction.
When using the threadlocal
engine context, things are that much easier; the Session
uses the same connection/transaction as everyone else in the current thread, whether or not you explicitly bind it:
engine = create_engine('postgres://mydb', strategy="threadlocal") engine.begin() sess = Session() # session takes place in the transaction like everyone else # ... go nuts engine.commit() # commit the transaction
A common need in applications, particularly those built around web frameworks, is the ability to "share" a Session
object among disparate parts of an application, without needing to pass the object explicitly to all method and function calls. What you're really looking for is some kind of "global" session object, or at least "global" to all the parts of an application which are tasked with servicing the current request. For this pattern, SQLAlchemy provides the ability to enhance the Session
class generated by sessionmaker()
to provide auto-contextualizing support. This means that whenever you create a Session
instance with its constructor, you get an existing Session
object which is bound to some "context". By default, this context is the current thread. This feature is what previously was accomplished using the sessioncontext
SQLAlchemy extension.
The scoped_session()
function wraps around the sessionmaker()
function, and produces an object which behaves the same as the Session
subclass returned by sessionmaker()
:
from sqlalchemy.orm import scoped_session, sessionmaker Session = scoped_session(sessionmaker(autoflush=True, transactional=True))
However, when you instantiate this Session
"class", in reality the object is pulled from a threadlocal variable, or if it doesn't exist yet, it's created using the underlying class generated by sessionmaker()
:
>>> # call Session() the first time. the new Session instance is created. >>> sess = Session() >>> # later, in the same application thread, someone else calls Session() >>> sess2 = Session() >>> # the two Session objects are *the same* object >>> sess is sess2 True
Since the Session()
constructor now returns the same Session
object every time within the current thread, the object returned by scoped_session()
also implements most of the Session
methods and properties at the "class" level, such that you don't even need to instantiate Session()
:
# create some objects u1 = User() u2 = User() # save to the contextual session, without instantiating Session.save(u1) Session.save(u2) # view the "new" attribute assert u1 in Session.new # flush changes (if not using autoflush) Session.flush() # commit transaction (if using a transactional session) Session.commit()
To "dispose" of the Session
, theres two general approaches. One is to close out the current session, but to leave it assigned to the current context. This allows the same object to be re-used on another operation. This may be called from a current, instantiated Session
:
sess.close()
Or, when using scoped_session()
, the close()
method may also be called as a classmethod on the Session
"class":
Session.close()
When the Session
is closed, it remains attached, but clears all of its contents and releases any ongoing transactional resources, including rolling back any remaining transactional state. The Session
can then be used again.
The other method is to remove the current session from the current context altogether. This is accomplished using the classmethod remove()
:
Session.remove()
After remove()
is called, the next call to Session()
will create a new Session
object which then becomes the contextual session.
That, in a nutshell, is all there really is to it. Now for all the extra things one should know.
back to section topA (really, really) common question is when does the contextual session get created, when does it get disposed ? We'll consider a typical lifespan as used in a web application:
Web Server Web Framework User-defined Controller Call -------------- -------------- ------------------------------ web request -> call controller -> # call Session(). this establishes a new, # contextual Session. sess = Session() # load some objects, save some changes objects = sess.query(MyClass).all() # some other code calls Session, its the # same contextual session as "sess" sess2 = Session() sess2.save(foo) sess2.commit() # generate content to be returned return generate_content() Session.remove() <- web response <-
Above, we illustrate a typical organization of duties, where the "Web Framework" layer has some integration built-in to manage the span of ORM sessions. Upon the initial handling of an incoming web request, the framework passes control to a controller. The controller then calls Session()
when it wishes to work with the ORM; this method establishes the contextual Session which will remain until it's removed. Disparate parts of the controller code may all call Session()
and will get the same session object. Then, when the controller has completed and the reponse is to be sent to the web server, the framework closes out the current contextual session, above using the remove()
method which removes the session from the context altogether.
As an alternative, the "finalization" step can also call Session.close()
, which will leave the same session object in place. Which one is better ? For a web framework which runs from a fixed pool of threads, it doesn't matter much. For a framework which runs a variable number of threads, or which creates and disposes of a thread for each request, remove()
is better, since it leaves no resources associated with the thread which might not exist.
Why close out the session at all ? Why not just leave it going so the next request doesn't have to do as many queries ?
There are some cases where you may actually want to do this. However, this is a special case where you are dealing with data which does not change very often, or you don't care about the "freshness" of the data. In reality, a single thread of a web server may, on a slow day, sit around for many minutes or even hours without being accessed. When it's next accessed, if data from the previous request still exists in the session, that data may be very stale indeed. So its generally better to have an empty session at the start of a web request.
Another luxury we gain, when we've established a Session()
that can be globally accessed, is the ability for mapped classes and objects to provide us with session-oriented functionality automatically. When using the scoped_session()
function, we access this feature using the mapper
attribute on the object in place of the normal sqlalchemy.orm.mapper
function:
# "contextual" mapper function mapper = Session.mapper # use normally mapper(User, users_table, properties={ relation(Address) }) mapper(Address, addresses_table)
When we use the contextual mapper()
function, our User
and Address
now gain a new attribute query
, which will create a Query
object for us against the contextual session:
wendy = User.query.filter_by(name='wendy').one()
Additionally, new instances are saved into the contextual session automatically upon construction; there is no longer a need to call save()
:
>>> newuser = User(name='ed') >>> assert newuser in Session.new True
This functionality is an updated version of what used to be accomplished by the assignmapper()
SQLAlchemy extension.
Generated docstrings for scoped_session()
back to section topthis section is TODO
Vertical partitioning places different kinds of objects, or different tables, across multiple databases.
engine1 = create_engine('postgres://db1') engine2 = create_engine('postgres://db2') Session = sessionmaker(twophase=True, transactional=True) # bind User operations to engine 1, Account operations to engine 2 Session.configure(binds={User:engine1, Account:engine2}) sess = Session()
Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.
See the "sharding" example in attribute_shard.py
back to section topExtending the session can be achieved through subclassing as well as through a simple extension class, which resembles the style of Extending Mapper called SessionExtension. See the docstrings for more information on this class' methods.
Basic usage is similar to MapperExtension
:
class MySessionExtension(SessionExtension): def before_commit(self, session): print "before commit!" Session = sessionmaker(extension=MySessionExtension())
or with create_session()
:
sess = create_session(extension=MySessionExtension())
The same SessionExtension
instance can be used with any number of sessions.
The Engine is the starting point for any SQLAlchemy application. It's "home base" for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database and DBAPI combination.
The general structure is this:
+-----------+ __________ /---| Pool |---\ (__________) +-------------+ / +-----------+ \ +--------+ | | connect() <--| Engine |---x x----| DBAPI |---| database | +-------------+ \ +-----------+ / +--------+ | | \---| Dialect |---/ |__________| +-----------+ (__________)
Where above, a sqlalchemy.engine.Engine references both a sqlalchemy.engine.Dialect and sqlalchemy.pool.Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine()
:
engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')
The above engine invokes the postgres
dialect and a connection pool which references localhost:5432
.
The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect()
method:
connection = engine.connect() result = connection.execute("select username from users") for row in result: print "username:", row['username'] connection.close()
The connection is an instance of sqlalchemy.engine.Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of sqlalchemy.engine.ResultProxy, which acts very much like a DBAPI cursor.
When you say engine.connect()
, a new Connection
object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close()
, the DBAPI connection is returned to the pool; nothing is actually "closed" from the perspective of the database.
To execute some SQL more quickly, you can skip the Connection
part and just say:
result = engine.execute("select username from users") for row in result: print "username:", row['username'] result.close()
Where above, the execute()
method on the Engine
does the connect()
part for you, and returns the ResultProxy
directly. The actual Connection
is inside the ResultProxy
, waiting for you to finish reading the result. In this case, when you close()
the ResultProxy
, the underlying Connection
is closed, which returns the DBAPI connection to the pool.
To summarize the above two examples, when you use a Connection
object, its known as explicit execution. When you don't see the Connection
object, but you still use the execute()
method on the Engine
, its called explicit, connectionless execution. A third variant of execution also exists called implicit execution; this will be described later.
The Engine
and Connection
can do a lot more than what we illustrated above; SQL strings are only its most rudimental function. Later chapters will describe how "constructed SQL" expressions can be used with engines; in many cases, you don't have to deal with the Engine
at all after it's created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine
in order to get at connections; that's also a case where you can often create the engine once, and then forget about it.
Recall that the Dialect
is used to describe how to talk to a specific kind of database. Dialects are included with SQLAlchemy for SQLite, Postgres, MySQL, MS-SQL, Firebird, Informix, and Oracle; these can each be seen as a Python module present in the sqlalchemy.databases
package. Each dialect requires the appropriate DBAPI drivers to be installed separately.
Downloads for each DBAPI at the time of this writing are as follows:
The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. Its a good place to check for issues with specific databases. Database Notes
back to section topSQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
driver://username:password@host:port/database
Available drivernames are sqlite
, mysql
, postgres
, oracle
, mssql
, and firebird
. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine()
function:
# postgres pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase') # sqlite (note the four slashes for an absolute path) sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt') sqlite_db = create_engine('sqlite:///relative/path/to/database.txt') sqlite_db = create_engine('sqlite://') # in-memory database sqlite_db = create_engine('sqlite://:memory:') # the same # mysql mysql_db = create_engine('mysql://localhost/foo') # oracle via TNS name oracle_db = create_engine('oracle://scott:tiger@dsn') # oracle will feed host/port/SID into cx_oracle.makedsn oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
The Engine
will ask the connection pool for a connection when the connect()
or execute()
methods are called. The default connection pool, QueuePool
, as well as the default connection pool used with SQLite, SingletonThreadPool
, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool
will grow its pool of connections to a default size of five, and will allow a default "overflow" of ten. Since the Engine
is essentially "home base" for the connection pool, it follows that you should keep a single Engine
per database established within an application, rather than creating a new one for each connection.
Custom arguments used when issuing the connect()
call to the underlying DBAPI may be issued in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:
db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')
If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.
create_engine
also takes an argument connect_args
which is an additional dictionary that will be passed to connect()
. This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:
db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})
The most customizable connection method of all is to pass a creator
argument, which specifies a callable that returns a DBAPI connection:
def connect(): return psycopg.connect(user='scott', host='localhost') db = create_engine('postgres://', creator=connect)
Keyword options can also be specified to create_engine()
, following the string URL as follows:
db = create_engine('postgres://...', encoding='latin1', echo=True)
A list of all standard options, as well as several that are used by particular database dialects, is as follows:
connect()
method as additional keyword arguments.
Unicode
column type instead, described in The Types System.
echo
attribute of Engine
can be modified at any time to turn logging on and off. If set to the string "debug"
, result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging at the end of this chapter for information on how to configure logging directly.
Unicode
type object.
sqlalchemy.pool.Pool
, such as a QueuePool
instance. If non-None, this pool will be used directly as the underlying connection pool for the engine, bypassing whatever connection parameters are present in the URL argument. For information on constructing connection pools manually, see Connection Pooling.
sqlalchemy.pool.Pool
subclass, which will be used to create a connection pool instance using the connection parameters given in the URL. Note this differs from pool
in that you don't actually instantiate the pool in this case, you just indicate what type of pool to be used.
QueuePool
.
QueuePool
as well as SingletonThreadPool
.
QueuePool
.
TLEngine
class that provides a modified connection scope for connectionless executions. Connectionless execution as well as further detail on this setting are described in Connectionless Execution, Implicit Execution.
threaded
parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag to False
will speed performance by 10-15%. While this defaults to False
in cx_Oracle, SQLAlchemy defaults it to True
, preferring stability over early optimization.
column1(+)=column2
must be used in order to achieve a LEFT OUTER JOIN.
Recall from the beginning of this section that the Engine provides a connect()
method which returns a Connection
object. Connection
is a proxy object which maintains a reference to a DBAPI connection instance. The close()
method on Connection
does not actually close the DBAPI connection, but instead returns it to the connection pool referenced by the Engine
. Connection
will also automatically return its resources to the connection pool when the object is garbage collected, i.e. its __del__()
method is called. When using the standard C implementation of Python, this method is usually called immediately as soon as the object is dereferenced. With other Python implementations such as Jython, this is not so guaranteed.
The execute()
methods on both Engine
and Connection
can also receive SQL clause constructs as well:
connection = engine.connect() result = connection.execute(select([table1], table1.c.col1==5)) for row in result: print row['col1'], row['col2'] connection.close()
The above SQL construct is known as a select()
. The full range of SQL constructs available are described in SQL Expression Language Tutorial.
Both Connection
and Engine
fulfill an interface known as Connectable
which specifies common functionality between the two objects, namely being able to call connect()
to return a Connection
object (Connection
just returns itself), and being able to call execute()
to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine
as a parameter or attribute with which to execute SQL will also accept a Connection
. As of SQLAlchemy 0.3.9, this argument is named bind
.
engine = create_engine('sqlite:///:memory:') # specify some Table metadata metadata = MetaData() table = Table('sometable', metadata, Column('col1', Integer)) # create the table with the Engine table.create(bind=engine) # drop the table with a Connection off the Engine connection = engine.connect() table.drop(bind=connection)
Connection facts:
connection.close()
) whenever the connection is not in use. If your application has a need for management of multiple connections or is otherwise long running (this includes all web applications, threaded or not), don't hold a single connection open at the module level.
The Connection
object provides a begin()
method which returns a Transaction
object. This object is usually used within a try/except clause so that it is guaranteed to rollback()
or commit()
:
trans = connection.begin() try: r1 = connection.execute(table1.select()) connection.execute(table1.insert(), col1=7, col2='this is some data') trans.commit() except: trans.rollback() raise
The Transaction
object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.
# method_a starts a transaction and calls method_b def method_a(connection): trans = connection.begin() # open a transaction try: method_b(connection) trans.commit() # transaction is committed here except: trans.rollback() # this rolls back the transaction unconditionally raise # method_b also starts a transaction def method_b(connection): trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction try: connection.execute("insert into mytable values ('bat', 'lala')") connection.execute(mytable.insert(), col1='bat', col2='lala') trans.commit() # transaction is not committed yet except: trans.rollback() # this rolls back the transaction unconditionally raise # open a Connection and call method_a conn = engine.connect() method_a(conn) conn.close()
Above, method_a
is called first, which calls connection.begin()
. Then it calls method_b
. When method_b
calls connection.begin()
, it just increments a counter that is decremented when it calls commit()
. If either method_a
or method_b
calls rollback()
, the whole transaction is rolled back. The transaction is not committed until method_a
calls the commit()
method. This "nesting" behavior allows the creation of functions which "guarantee" that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in unitofwork_transaction.
Transaction Facts:
The above transaction example illustrates how to use Transaction
so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction
? The answer is autocommit. While many DBAPIs implement a flag called autocommit
, the current SQLAlchemy behavior is such that it implements its own autocommit. This is achieved by searching the statement for strings like INSERT, UPDATE, DELETE, etc. and then issuing a COMMIT automatically if no transaction is in progress.
conn = engine.connect() conn.execute("INSERT INTO users VALUES (1, 'john')") # autocommits
Recall from the first section we mentioned executing with and without a Connection
. Connectionless
execution refers to calling the execute()
method on an object which is not a Connection
, which could be on the the Engine
itself, or could be a constructed SQL object. When we say "implicit", we mean that we are calling the execute()
method on an object which is neither a Connection
nor an Engine
object; this can only be used with constructed SQL objects which have their own execute()
method, and can be "bound" to an Engine
. A description of "constructed SQL objects" may be found in SQL Expression Language Tutorial.
A summary of all three methods follows below. First, assume the usage of the following MetaData
and Table
objects; while we haven't yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an "executeable" SQL construct which issues a statement to the database. These objects are described in Database Meta Data.
meta = MetaData() users_table = Table('users', meta, Column('id', Integer, primary_key=True), Column('name', String(50)) )
Explicit execution delivers the SQL text or constructed SQL expression to the execute()
method of Connection
:
engine = create_engine('sqlite:///file.db') connection = engine.connect() result = connection.execute(users_table.select()) for row in result: # .... connection.close()
Explicit, connectionless execution delivers the expression to the execute()
method of Engine
:
engine = create_engine('sqlite:///file.db') result = engine.execute(users_table.select()) for row in result: # .... result.close()
Implicit execution is also connectionless, and calls the execute()
method on the expression itself, utilizing the fact that either an Engine
or Connection
has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):
engine = create_engine('sqlite:///file.db') meta.bind = engine result = users_table.select().execute() for row in result: # .... result.close()
In both "connectionless" examples, the Connection
is created behind the scenes; the ResultProxy
returned by the execute()
call references the Connection
used to issue the SQL statement. When we issue close()
on the ResultProxy
, or if the result set object falls out of scope and is garbage collected, the underlying Connection
is closed for us, resulting in the DBAPI connection being returned to the pool.
With connectionless execution, each returned ResultProxy
object references its own distinct DBAPI connection object. This means that multiple executions will result in multiple DBAPI connections being used at the same time; the example below illustrates this:
db = create_engine('mysql://localhost/test') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references a *second* DBAPI connection resource. r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # release connection 1 r1.close() # release connection 2 r2.close()
Where above, we have two result sets in scope at the same time, therefore we have two distinct DBAPI connections, both separately checked out from the connection pool, in scope at the same time.
An option exists to create_engine()
called strategy="threadlocal"
, which changes this behavior. When this option is used, the Engine
which is returned by create_engine()
is a special subclass of engine called TLEngine
. This engine, when it creates the Connection
used by a connectionless execution, checks a threadlocal variable for an existing DBAPI connection that was already checked out from the pool, within the current thread. If one exists, it uses that one.
The usage of "threadlocal" modifies the underlying behavior of our example above, as follows:
db = create_engine('mysql://localhost/test', strategy='threadlocal') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references the *same* resource as r1 r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # close r1. the connection is still held by r2. r1.close() # close r2. with no more references to the underlying connection resources, they # are returned to the pool. r2.close()
Where above, we again have two result sets in scope at the same time, but because they are present in the same thread, there is only one DBAPI connection in use.
While the above distinction may not seem like much, it has several potentially desireable effects. One is that you can in some cases reduce the number of concurrent connections checked out from the connection pool, in the case that a ResultProxy
is still opened and a second statement is issued. A second advantage is that by limiting the number of checked out connections in a thread to just one, you eliminate the issue of deadlocks within a single thread, such as when connection A locks a table, and connection B attempts to read from the same table in the same thread, it will "deadlock" on waiting for connection A to release its lock; the threadlocal
strategy eliminates this possibility.
A third advantage to the threadlocal
strategy is that it allows the Transaction
object to be used in combination with connectionless execution. Recall from the section on transactions, that the Transaction
is returned by the begin()
method on a Connection
; all statements which wish to participate in this transaction must be executed by the same Connection
, thereby forcing the usage of an explicit connection. However, the TLEngine
provides a Transaction
that is local to the current thread; using it, one can issue many "connectionless" statements within a thread and they will all automatically partake in the current transaction, as in the example below:
# get a TLEngine engine = create_engine('mysql://localhost/test', strategy='threadlocal') engine.begin() try: engine.execute("insert into users values (?, ?)", 1, "john") users.update(users.c.user_id==5).execute(name='ed') engine.commit() except: engine.rollback()
Notice that no Connection
needed to be used; the begin()
method on TLEngine
(which note is not available on the regular Engine
) created a Transaction
as well as a Connection
, and held onto both in a context corresponding to the current thread. Each execute()
call made use of the same connection, allowing them all to participate in the same transaction.
Complex application flows can take advantage of the "threadlocal" strategy in order to allow many disparate parts of an application to take place in the same transaction automatically. The example below demonstrates several forms of "connectionless execution" as well as some specialized explicit ones:
engine = create_engine('mysql://localhost/test', strategy='threadlocal') def dosomethingimplicit(): table1.execute("some sql") table1.execute("some other sql") def dosomethingelse(): table2.execute("some sql") conn = engine.contextual_connect() # do stuff with conn conn.execute("some other sql") conn.close() def dosomethingtransactional(): conn = engine.contextual_connect() trans = conn.begin() # do stuff trans.commit() engine.begin() try: dosomethingimplicit() dosomethingelse() dosomethingtransactional() engine.commit() except: engine.rollback()
In the above example, the program calls three functions dosomethingimplicit()
, dosomethingelse()
and dosomethingtransactional()
. All three functions use either connectionless execution, or a special function contextual_connect()
which we will describe in a moment. These two styles of execution both indicate that all executions will use the same connection object. Additionally, the method dosomethingtransactional()
begins and commits its own Transaction
. But only one transaction is used, too; it's controlled completely by the engine.begin()
/engine.commit()
calls at the bottom. Recall that Transaction
supports "nesting" behavior, whereby transactions begun on a Connection
which already has a tranasaction open, will "nest" into the enclosing transaction. Since the transaction opened in dosomethingtransactional()
occurs using the same connection which already has a transaction begun, it "nests" into that transaction and therefore has no effect on the actual transaction scope (unless it calls rollback()
).
Some of the functions in the above example make use of a method called engine.contextual_connect()
. This method is available on both Engine
as well as TLEngine
, and returns the Connection
that applies to the current connection context. When using the TLEngine
, this is just another term for the "thread local connection" that is being used for all connectionless executions. When using just the regular Engine
(i.e. the "default" strategy), contextual_connect()
is synonymous with connect()
. Below we illustrate that two connections opened via contextual_connect()
at the same time, both reference the same underlying DBAPI connection:
# threadlocal strategy db = create_engine('mysql://localhost/test', strategy='threadlocal') conn1 = db.contextual_connect() conn2 = db.contextual_connect() >>> conn1.connection is conn2.connection True
The basic idea of contextual_connect()
is that its the "connection used by connectionless execution". It's different from the connect()
method in that connect()
is always used when handling an explicit Connection
, which will always reference distinct DBAPI connection. Using connect()
in combination with TLEngine
allows one to "circumvent" the current thread local context, as in this example where a single statement issues data to the database externally to the current transaction:
engine.begin() engine.execute("insert into users values (?, ?)", 1, "john") connection = engine.connect() connection.execute(users.update(users.c.user_id==5).execute(name='ed')) engine.rollback()
In the above example, a thread-local transaction is begun, but is later rolled back. The statement insert into users values (?, ?)
is executed without using a connection, therefore uses the thread-local transaction. So its data is rolled back when the transaction is rolled back. However, the users.update()
statement is executed using a distinct Connection
returned by the engine.connect()
method, so it therefore is not part of the threadlocal transaction; it autocommits immediately.
As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo
and echo_pool
flags that are present on create_engine()
, as well as the echo_uow
flag used on Session
, all interact with regular loggers.
This section assumes familiarity with the above linked logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy
namespace, as used by logging.getLogger('sqlalchemy')
. When logging has been configured (i.e. such as via logging.basicConfig()
), the general namespace of SA loggers that can be turned on is as follows:
sqlalchemy.engine
- controls SQL echoing. set to logging.INFO
for SQL query output, logging.DEBUG
for query + result set output.
sqlalchemy.pool
- controls connection pool logging. set to logging.INFO
or lower to log connection pool checkouts/checkins.
sqlalchemy.orm
- controls logging of various ORM functions. set to logging.INFO
for configurational logging as well as unit of work dumps, logging.DEBUG
for extensive logging during query and flush() operations. Subcategories of sqlalchemy.orm
include:sqlalchemy.orm.attributes
- logs certain instrumented attribute operations, such as triggered callables
sqlalchemy.orm.mapper
- logs Mapper configuration and operations
sqlalchemy.orm.unitofwork
- logs flush() operations, including dependency sort graphs and other operations
sqlalchemy.orm.strategies
- logs relation loader operations (i.e. lazy and eager loads)
sqlalchemy.orm.sync
- logs synchronization of attributes from parent to child instances during a flush()
For example, to log SQL queries as well as unit of work debugging:
import logging logging.basicConfig() logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO) logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)
By default, the log level is set to logging.ERROR
within the entire sqlalchemy
namespace so that no log operations occur, even within an application that has logging enabled otherwise.
The echo
flags present as keyword arguments to create_engine()
and others as well as the echo
property on Engine
, when set to True
, will first attempt to ensure that logging is enabled. Unfortunately, the logging
module provides no way of determining if output has already been configured (note we are referring to if a logging configuration has been set up, not just that the logging level is set). For this reason, any echo=True
flags will result in a call to logging.basicConfig()
using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration has the affect of being configured in addition to any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines.
The core of SQLAlchemy's query and object mapping operations are supported by database metadata, which is comprised of Python objects that describe tables and other schema-level objects. These objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema
. There is also support for reflection of some entities, which means you only specify the name of the entities and they are recreated from the database automatically.
A collection of metadata entities is stored in an object aptly named MetaData
:
from sqlalchemy import * metadata = MetaData()
To represent a Table, use the Table
class:
users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) )
The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types
as well as the global sqlalchemy
namespace.
Foreign keys are most easily specified by the ForeignKey
object within a Column
object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing multiple columns to a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:
# a table with a composite primary key invoices = Table('invoices', metadata, Column('invoice_id', Integer, primary_key=True), Column('ref_num', Integer, primary_key=True), Column('description', String(60), nullable=False) ) # a table with a composite foreign key referencing the parent table invoice_items = Table('invoice_items', metadata, Column('item_id', Integer, primary_key=True), Column('item_name', String(60), nullable=False), Column('invoice_id', Integer, nullable=False), Column('ref_num', Integer, nullable=False), ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num']) )
Above, the invoice_items
table will have ForeignKey
objects automatically added to the invoice_id
and ref_num
Column
objects as a result of the additional ForeignKeyConstraint
object.
The MetaData
object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:
>>> for t in metadata.table_iterator(reverse=False): ... print t.name users user_prefs
And Table
provides an interface to the table's properties as well as that of its columns:
employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) # access the column "EMPLOYEE_ID": employees.columns.employee_id # or just employees.c.employee_id # via string employees.c['employee_id'] # iterate through all columns for c in employees.c: # ... # get the table's primary key columns for primary_key in employees.primary_key: # ... # get the table's foreign key objects: for fkey in employees.foreign_keys: # ... # access the table's MetaData: employees.metadata # access the table's bound Engine or Connection, if its MetaData is bound: employees.bind # access a column's name, type, nullable, primary key, foreign key employees.c.employee_id.name employees.c.employee_id.type employees.c.employee_id.nullable employees.c.employee_id.primary_key employees.c.employee_dept.foreign_key # get the "key" of a column, which defaults to its name, but can # be any user-defined string: employees.c.name.key # access a column's table: employees.c.employee_id.table is employees >>> True # get the table related by a foreign key fcolumn = employees.c.employee_dept.foreign_key.column.table
A MetaData
object can be associated with an Engine
or an individual Connection
; this process is called binding. The term used to describe "an engine or a connection" is often referred to as a connectable. Binding allows the MetaData
and the elements which it contains to perform operations against the database directly, using the connection resources to which it's bound. Common operations which are made more convenient through binding include being able to generate SQL constructs which know how to execute themselves, creating Table
objects which query the database for their column and constraint information, and issuing CREATE or DROP statements.
To bind MetaData
to an Engine
, use the bind
attribute:
engine = create_engine('sqlite://', **kwargs) # create MetaData meta = MetaData() # bind to an engine meta.bind = engine
Once this is done, the MetaData
and its contained Table
objects can access the database directly:
meta.create_all() # issue CREATE statements for all tables # describe a table called 'users', query the database for its columns users_table = Table('users', meta, autoload=True) # generate a SELECT statement and execute result = users_table.select().execute()
Note that the feature of binding engines is completely optional. All of the operations which take advantage of "bound" MetaData
also can be given an Engine
or Connection
explicitly with which to perform the operation. The equivalent "non-bound" of the above would be:
meta.create_all(engine) # issue CREATE statements for all tables # describe a table called 'users', query the database for its columns users_table = Table('users', meta, autoload=True, autoload_with=engine) # generate a SELECT statement and execute result = engine.execute(users_table.select())
A Table
object can be created without specifying any of its contained attributes, using the argument autoload=True
in conjunction with the table's name and possibly its schema (if not the databases "default" schema). (You can also specify a list or set of column names to autoload as the kwarg include_columns, if you only want to load a subset of the columns in the actual database.) This will issue the appropriate queries to the database in order to locate all properties of the table required for SQLAlchemy to use it effectively, including its column names and datatypes, foreign and primary key constraints, and in some cases its default-value generating attributes. To use autoload=True
, the table's MetaData
object need be bound to an Engine
or Connection
, or alternatively the autoload_with=<some connectable>
argument can be passed. Below we illustrate autoloading a table and then iterating through the names of its columns:
>>> messages = Table('messages', meta, autoload=True) >>> [c.name for c in messages.columns] ['message_id', 'message_name', 'date']
Note that if a reflected table has a foreign key referencing another table, the related Table
object will be automatically created within the MetaData
object if it does not exist already. Below, suppose table shopping_cart_items
references a table shopping_carts
. After reflecting, the shopping carts
table is present:
>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload=True) >>> 'shopping_carts' in meta.tables: True
To get direct access to 'shopping_carts', simply instantiate it via the Table
constructor. Table
uses a special contructor that will return the already created Table
instance if its already present:
shopping_carts = Table('shopping_carts', meta)
Of course, its a good idea to use autoload=True
with the above table regardless. This is so that if it hadn't been loaded already, the operation will load the table. The autoload operation only occurs for the table if it hasn't already been loaded; once loaded, new calls to Table
will not re-issue any reflection queries.
Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.
>>> mytable = Table('mytable', meta, ... Column('id', Integer, primary_key=True), # override reflected 'id' to have primary key ... Column('mydata', Unicode(50)), # override reflected 'mydata' to be Unicode ... autoload=True)
Some databases support the concept of multiple schemas. A Table
can reference this by specifying the schema
keyword argument:
financial_info = Table('financial_info', meta, Column('id', Integer, primary_key=True), Column('value', String(100), nullable=False), schema='remote_banks' )
Within the MetaData
collection, this table will be identified by the combination of financial_info
and remote_banks
. If another table called financial_info
is referenced without the remote_banks
schema, it will refer to a different Table
. ForeignKey
objects can reference columns in this table using the form remote_banks.financial_info.id
.
ON UPDATE
and ON DELETE
clauses to a table create are specified within the ForeignKeyConstraint
object, using the onupdate
and ondelete
keyword arguments:
foobar = Table('foobar', meta, Column('id', Integer, primary_key=True), Column('lala', String(40)), ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))
Note that these clauses are not supported on SQLite, and require InnoDB
tables when used with MySQL. They may also not be supported on other databases.
Tables
may support database-specific options, such as MySQL's engine
option that can specify "MyISAM", "InnoDB", and other backends for the table:
addresses = Table('engine_email_addresses', meta, Column('address_id', Integer, primary_key = True), Column('remote_user_id', Integer, ForeignKey(users.c.user_id)), Column('email_address', String(20)), mysql_engine='InnoDB' )
Creating and dropping individual tables can be done via the create()
and drop()
methods of Table
; these methods take an optional bind
parameter which references an Engine
or a Connection
. If not supplied, the Engine
bound to the MetaData
will be used, else an error is raised:
meta = MetaData() meta.bind = 'sqlite:///:memory:' employees = Table('employees', meta, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) sqlemployees.create()
drop()
method:
sqlemployees.drop(bind=e)
The create()
and drop()
methods also support an optional keyword argument checkfirst
which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:
employees.create(bind=e, checkfirst=True) employees.drop(checkfirst=False)
Entire groups of Tables can be created and dropped directly from the MetaData
object with create_all()
and drop_all()
. These methods always check for the existence of each table before creating or dropping. Each method takes an optional bind
keyword argument which can reference an Engine
or a Connection
. If no engine is specified, the underlying bound Engine
, if any, is used:
engine = create_engine('sqlite:///:memory:') metadata = MetaData() users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) ) sqlmetadata.create_all(bind=engine)
SQLAlchemy includes several constructs which provide default values provided during INSERT and UPDATE statements. The defaults may be provided as Python constants, Python functions, or SQL expressions, and the SQL expressions themselves may be "pre-executed", executed inline within the insert/update statement itself, or can be created as a SQL level "default" placed on the table definition itself. A "default" value by definition is only invoked if no explicit value is passed into the INSERT or UPDATE statement.
The "default" keyword argument on Column can reference a Python value or callable which is invoked at the time of an insert:
# a function which counts upwards i = 0 def mydefault(): global i i += 1 return i t = Table("mytable", meta, # function-based default Column('id', Integer, primary_key=True, default=mydefault), # a scalar default Column('key', String(10), default="default") )
Similarly, the "onupdate" keyword does the same thing for update statements:
import datetime t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'last_updated' to be populated with datetime.now() Column('last_updated', DateTime, onupdate=datetime.now), )
The "default" and "onupdate" keywords may also be passed SQL expressions, including select statements or direct function calls:
t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'create_date' to default to now() Column('create_date', DateTime, default=func.now()), # define 'key' to pull its default from the 'keyvalues' table Column('key', String(20), default=keyvalues.select(keyvalues.c.type='type1', limit=1)) # define 'last_modified' to use the current_timestamp SQL function on update Column('last_modified', DateTime, onupdate=func.current_timestamp()) )
The above SQL functions are usually executed "inline" with the INSERT or UPDATE statement being executed. In some cases, the function is "pre-executed" and its result pre-fetched explicitly. This happens under the following circumstances:
the column is a primary key column
the database dialect does not support a usable cursor.lastrowid
accessor (or equivalent); this currently includes Postgres, Oracle, and Firebird.
the statement is a single execution, i.e. only supplies one set of parameters and doesn't use "executemany" behavior
the inline=True
flag is not set on the Insert()
or Update()
construct.
For a statement which executes with inline=False
and is not an executemany execution, the returned ResultProxy
will contain a collection accessible via result.postfetch_cols()
which contains a list of all Column
objects which had an inline-executed default. Similarly, all parameters which were bound to the statement, including all Python and SQL expressions which were pre-executed, are present in the last_inserted_params()
or last_updated_params()
collections on ResultProxy
. The last_inserted_ids()
collection contains a list of primary key values for the row inserted.
A variant on a SQL expression default is the PassiveDefault
, which gets placed in the CREATE TABLE statement during a create()
operation:
t = Table('test', meta, Column('mycolumn', DateTime, PassiveDefault("sysdate")) )
A create call for the above table will produce:
CREATE TABLE test ( mycolumn datetime default sysdate )
The behavior of PassiveDefault
is similar to that of a regular SQL default; if it's placed on a primary key column for a database which doesn't have a way to "postfetch" the ID, and the statement is not "inlined", the SQL expression is pre-executed; otherwise, SQLAlchemy lets the default fire off on the database side normally.
A table with a sequence looks like:
table = Table("cartitems", meta, Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True), Column("description", String(40)), Column("createdate", DateTime()) )
The Sequence
object works a lot like the default
keyword on Column
, except that it only takes effect on a database which supports sequences. When used with a database that does not support sequences, the Sequence
object has no effect; therefore it's safe to place on a table which is used against multiple database backends. The same rules for pre- and inline execution apply.
When the Sequence
is associated with a table, CREATE and DROP statements issued for that table will also issue CREATE/DROP for the sequence object as well, thus "bundling" the sequence object with its parent table.
The flag optional=True
on Sequence
will produce a sequence that is only used on databases which have no "autoincrementing" capability. For example, Postgres supports primary key generation using the SERIAL keyword, whereas Oracle has no such capability. Therefore, a Sequence
placed on a primary key column with optional=True
will only be used with an Oracle backend but not Postgres.
A sequence can also be executed standalone, using an Engine
or Connection
, returning its next value in a database-independent fashion:
seq = Sequence('some_sequence') nextid = connection.execute(seq)
Unique constraints can be created anonymously on a single column using the unique
keyword on Column
. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint
table-level construct.
meta = MetaData() mytable = Table('mytable', meta, # per-column anonymous unique constraint Column('col1', Integer, unique=True), Column('col2', Integer), Column('col3', Integer), # explicit/composite unique constraint. 'name' is optional. UniqueConstraint('col2', 'col3', name='uix_1') )
Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint
construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column level check constraints generally should only refer to the column to which they are placed, while table level constraints can refer to any columns in the table.
Note that some databases do not actively support check constraints such as MySQL and sqlite.
meta = MetaData() mytable = Table('mytable', meta, # per-column CHECK constraint Column('col1', Integer, CheckConstraint('col1>5')), Column('col2', Integer), Column('col3', Integer), # table level CHECK constraint. 'name' is optional. CheckConstraint('col2 > col3 + 5', name='check1') )
Indexes can be created anonymously (using an auto-generated name "ix_index
keyword on Column
, which also modifies the usage of unique
to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index
construct, which requires a name.
Note that the Index
construct is created externally to the table which it corresponds, using Column
objects and not strings.
meta = MetaData() mytable = Table('mytable', meta, # an indexed column, with index "ix_mytable_col1" Column('col1', Integer, index=True), # a uniquely indexed column with index "ix_mytable_col2" Column('col2', Integer, index=True, unique=True), Column('col3', Integer), Column('col4', Integer), Column('col5', Integer), Column('col6', Integer), ) # place an index on col3, col4 Index('idx_col34', mytable.c.col3, mytable.c.col4) # place a unique index on col5, col6 Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)
The Index
objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:
# create a table sometable.create() # define an index i = Index('someindex', sometable.c.col5) # create the index, will use the table's bound connectable if the `bind` keyword argument not specified i.create()
A Table
object created against a specific MetaData
object can be re-created against a new MetaData using the tometadata
method:
# create two metadata meta1 = MetaData('sqlite:///querytest.db') meta2 = MetaData() # load 'users' from the sqlite engine users_table = Table('users', meta1, autoload=True) # create the same Table object for the plain metadata users_table_2 = users_table.tometadata(meta2)
The package sqlalchemy.types
defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.
SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.
The standard set of generic types are:
class String(TypeEngine): def __init__(self, length=None) class Integer(TypeEngine) class SmallInteger(Integer) class Numeric(TypeEngine): def __init__(self, precision=10, length=2) class Float(Numeric): def __init__(self, precision=10) # DateTime, Date and Time types deal with datetime objects from the Python datetime module class DateTime(TypeEngine) class Date(TypeEngine) class Time(TypeEngine) class Binary(TypeEngine): def __init__(self, length=None) class Boolean(TypeEngine) # converts unicode strings to raw bytes # as bind params, raw bytes to unicode as # rowset values, using the unicode encoding # setting on the engine (defaults to 'utf-8') class Unicode(TypeDecorator): impl = String # uses the pickle protocol to serialize data # in/out of Binary columns class PickleType(TypeDecorator): impl = Binary
More specific subclasses of these types are available, which various database engines may choose to implement specifically, allowing finer grained control over types:
class FLOAT(Numeric) class TEXT(String) class DECIMAL(Numeric) class INT(Integer) INTEGER = INT class TIMESTAMP(DateTime) class DATETIME(DateTime) class CLOB(String) class VARCHAR(String) class CHAR(String) class BLOB(Binary) class BOOLEAN(Boolean)
When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine. There may eventually be more type objects that are defined for specific databases. An example of this would be Postgres' Array type.
Type objects are specified to table meta data using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:
mytable = Table('mytable', engine, # define type using a class Column('my_id', Integer, primary_key=True), # define type using an object instance Column('value', Number(7,4)) )
Each dialect has its own set of types, many of which are available only within that dialect. For example, MySQL has a BigInteger
type and Postgres has an Inet
type. To use these, import them from the module explicitly:
from sqlalchemy.databases.mysql import MSEnum, MSBigInteger table = Table('foo', meta, Column('enumerates', MSEnum('a', 'b', 'c')), Column('id', MSBigInteger) )
Or some postgres types:
from sqlalchemy.databases.postgres import PGInet, PGArray table = Table('foo', meta, Column('ipaddress', PGInet), Column('elements', PGArray(str)) # PGArray is available in 0.4, and takes a type argument )
User-defined types can be created, to support either database-specific types, or customized pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. To augment the behavior of a TypeEngine
type, such as String
, the TypeDecorator
class is used:
import sqlalchemy.types as types class MyType(types.TypeDecorator): """basic type that decorates String, prefixes values with "PREFIX:" on the way in and strips it off on the way out.""" impl = types.String def convert_bind_param(self, value, engine): return "PREFIX:" + value def convert_result_value(self, value, engine): return value[7:]
The PickleType
class is an instance of TypeDecorator
already and can be subclassed directly.
To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine
:
import sqlalchemy.types as types class MyType(types.TypeEngine): def __init__(self, precision = 8): self.precision = precision def get_col_spec(self): return "MYTYPE(%s)" % self.precision def convert_bind_param(self, value, engine): return value def convert_result_value(self, value, engine): return value
Once you make your type, its immediately useable:
table = Table('foo', meta, Column('id', Integer, primary_key=True), Column('data', MyType(16)) )
This section describes the connection pool module of SQLAlchemy. The Pool
object it provides is normally embedded within an Engine
instance. For most cases, explicit access to the pool module is not required. However, the Pool
object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine
.
At the base of any database helper library is a system of efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured to maintain a certain "size", which represents how many connections can be used simultaneously without resorting to creating more newly-established connections.
Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):
import sqlalchemy.pool as pool import psycopg2 as psycopg psycopg = pool.manage(psycopg) # then connect normally connection = psycopg.connect(database='test', username='scott', password='tiger')
This produces a sqlalchemy.pool.DBProxy
object which supports the same connect()
function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool
) that corresponds to the exact connection arguments sent to the connect()
function.
The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__()
. The close()
method will return the connection to the pool, and the cursor()
method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__()
method.
Additionally, when connections are returned to the pool, a rollback()
is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.
By default, the connect()
method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False
to the manage()
function.
For all types of Pool construction, which includes the "transparent proxy" described in the previous section, using an Engine
via create_engine()
, or constructing a pool through direct class instantiation, the options are generally the same. Additional options may be available based on the specific subclass of Pool
being used.
For a description of all pool classes, see the generated documentation.
Common options include:
echo_pool
.
create_engine()
, corresponding to the "plain" or
"threadlocal" connection strategy.
QueuePool options include:
Besides using the transparent proxy, instances of sqlalchemy.pool.Pool
can be created directly. Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands to start.
import sqlalchemy.pool as pool import psycopg2 def getconn(): c = psycopg2.connect(username='ed', host='127.0.0.1', dbname='test') # execute an initialization function on the connection before returning c.cursor.execute("setup_encodings()") return c p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)
Or with SingletonThreadPool:
import sqlalchemy.pool as pool import sqlite def getconn(): return sqlite.connect(filename='myfile.db') # SQLite connections require the SingletonThreadPool p = pool.SingletonThreadPool(getconn)
SQLAlchemy has a variety of extensions available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior. Several of these extensions are designed to work together.
Author: Mike Bayer and Jason Kirtland
Version: 0.3.1 or greater
associationproxy
is used to create a simplified, read/write view of a relationship. It can be used to cherry-pick fields from a collection of related objects or to greatly simplify access to associated objects in an association relationship.
users_table = Table('users', metadata, Column('id', Integer, primary_key=True), Column('name', String(64)), ) keywords_table = Table('keywords', metadata, Column('id', Integer, primary_key=True), Column('keyword', String(64)) ) userkeywords_table = Table('userkeywords', metadata, Column('user_id', Integer, ForeignKey("users.id"), primary_key=True), Column('keyword_id', Integer, ForeignKey("keywords.id"), primary_key=True) ) class User(object): def __init__(self, name): self.name = name class Keyword(object): def __init__(self, keyword): self.keyword = keyword mapper(User, users_table, properties={ 'kw': relation(Keyword, secondary=userkeywords_table) }) mapper(Keyword, keywords_table)
Above are three simple tables, modeling users, keywords and a many-to-many relationship between the two. These Keyword
objects are little more than a container for a name, and accessing them via the relation is awkward:
user = User('jek') user.kw.append(Keyword('cheese inspector')) print user.kw # [<__main__.Keyword object at 0xb791ea0c>] print user.kw[0].keyword # 'cheese inspector' print [keyword.keyword for keyword in u._keywords] # ['cheese inspector']
With association_proxy
you have a "view" of the relation that contains just the .keyword
of the related objects. The proxy is a Python property, and unlike the mapper relation, is defined in your class:
from sqlalchemy.ext.associationproxy import association_proxy class User(object): def __init__(self, name): self.name = name # proxy the 'keyword' attribute from the 'kw' relation keywords = association_proxy('kw', 'keyword') # ... >>> user.kw [<__main__.Keyword object at 0xb791ea0c>] >>> user.keywords ['cheese inspector'] >>> user.keywords.append('snack ninja') >>> user.keywords ['cheese inspector', 'snack ninja'] >>> user.kw [<__main__.Keyword object at 0x9272a4c>, <__main__.Keyword object at 0xb7b396ec>]
The proxy is read/write. New associated objects are created on demand when values are added to the proxy, and modifying or removing an entry through the proxy also affects the underlying collection.
creator
function can be used to create instances instead.
Above, the Keyword.__init__
takes a single argument keyword
, which maps conveniently to the value being set through the proxy. A creator
function could have been used instead if more flexiblity was required.
Because the proxies are backed a regular relation collection, all of the usual hooks and patterns for using collections are still in effect. The most convenient behavior is the automatic setting of "parent"-type relationships on assignment. In the example above, nothing special had to be done to associate the Keyword to the User. Simply adding it to the collection is sufficient.
back to section topAssociation proxies are also useful for keeping association objects out the way during regular use. For example, the userkeywords
table might have a bunch of auditing columns that need to get updated when changes are made- columns that are updated but seldom, if ever, accessed in your application. A proxy can provide a very natural access pattern for the relation.
from sqlalchemy.ext.associationproxy import association_proxy # users_table and keywords_table tables as above, then: userkeywords_table = Table('userkeywords', metadata, Column('user_id', Integer, ForeignKey("users.id"), primary_key=True), Column('keyword_id', Integer, ForeignKey("keywords.id"), primary_key=True), # add some auditing columns Column('updated_at', DateTime, default=datetime.now), Column('updated_by', Integer, default=get_current_uid, onupdate=get_current_uid), ) def _create_uk_by_keyword(keyword): """A creator function.""" return UserKeyword(keyword=keyword) class User(object): def __init__(self, name): self.name = name keywords = association_proxy('user_keywords', 'keyword', creator=_create_uk_by_keyword) class Keyword(object): def __init__(self, keyword): self.keyword = keyword def __repr__(self): return 'Keyword(%s)' % repr(self.keyword) class UserKeyword(object): def __init__(self, user=None, keyword=None): self.user = user self.keyword = keyword mapper(User, users_table, properties={ 'user_keywords': relation(UserKeyword) }) mapper(Keyword, keywords_table) mapper(UserKeyword, userkeywords_table, properties={ 'user': relation(User), 'keyword': relation(Keyword), }) user = User('log') kw1 = Keyword('new_from_blammo') # Adding a Keyword requires creating a UserKeyword association object user.user_keywords.append(UserKeyword(user, kw1)) # And accessing Keywords requires traverrsing UserKeywords print user.user_keywords[0] # <__main__.UserKeyword object at 0xb79bbbec> print user.user_keywords[0].keyword # Keyword('new_from_blammo') # Lots of work. # It's much easier to go through the association proxy! for kw in (Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')): user.keywords.append(kw) print user.keywords # [Keyword('new_from_blammo'), Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')]
stocks = Table("stocks", meta, Column('symbol', String(10), primary_key=True), Column('description', String(100), nullable=False), Column('last_price', Numeric) ) brokers = Table("brokers", meta, Column('id', Integer,primary_key=True), Column('name', String(100), nullable=False) ) holdings = Table("holdings", meta, Column('broker_id', Integer, ForeignKey('brokers.id'), primary_key=True), Column('symbol', String(10), ForeignKey('stocks.symbol'), primary_key=True), Column('shares', Integer) )
Above are three tables, modeling stocks, their brokers and the number of shares of a stock held by each broker. This situation is quite different from the association example above. shares
is a property of the relation, an important one that we need to use all the time.
For this example, it would be very convenient if Broker
objects had a dictionary collection that mapped Stock
instances to the shares held for each. That's easy.
from sqlalchemy.ext.associationproxy import association_proxy from sqlalchemy.orm.collections import attribute_mapped_collection def _create_holding(stock, shares): """A creator function, constructs Holdings from Stock and share quantity.""" return Holding(stock=stock, shares=shares) class Broker(object): def __init__(self, name): self.name = name holdings = association_proxy('by_stock', 'shares', creator=_create_holding) class Stock(object): def __init__(self, symbol, description=None): self.symbol = symbol self.description = description self.last_price = 0 class Holding(object): def __init__(self, broker=None, stock=None, shares=0): self.broker = broker self.stock = stock self.shares = shares mapper(Stock, stocks_table) mapper(Broker, brokers_table, properties={ 'by_stock': relation(Holding, collection_class=attribute_mapped_collection('stock')) }) mapper(Holding, holdings_table, properties={ 'stock': relation(Stock), 'broker': relation(Broker) })
Above, we've set up the 'by_stock' relation collection to act as a dictionary, using the .stock
property of each Holding as a key.
Populating and accessing that dictionary manually is slightly inconvenient because of the complexity of the Holdings association object:
stock = Stock('ZZK') broker = Broker('paj') broker.holdings[stock] = Holding(broker, stock, 10) print broker.holdings[stock].shares # 10
The by_stock
proxy we've added to the Broker
class hides the details of the Holding
while also giving access to .shares
:
for stock in (Stock('JEK'), Stock('STPZ')): broker.holdings[stock] = 123 for stock, shares in broker.holdings.items(): print stock, shares # lets take a peek at that holdings_table after committing changes to the db print list(holdings_table.select().execute()) # [(1, 'ZZK', 10), (1, 'JEK', 123), (1, 'STEPZ', 123)]
Further examples can be found in the examples/
directory in the SQLAlchemy distribution.
The association_proxy
convenience function is not present in SQLAlchemy versions 0.3.1 through 0.3.7, instead instantiate the class directly:
from sqlalchemy.ext.associationproxy import AssociationProxy class Article(object): keywords = AssociationProxy('keyword_associations', 'keyword')
Author: Jason Kirtland
orderinglist
is a helper for mutable ordered relations. It will intercept
list operations performed on a relation collection and automatically
synchronize changes in list position with an attribute on the related objects.
(See advdatamapping_properties_entitycollections for more information on the general pattern.)
Example: Two tables that store slides in a presentation. Each slide has a number of bullet points, displayed in order by the 'position' column on the bullets table. These bullets can be inserted and re-ordered by your end users, and you need to update the 'position' column of all affected rows when changes are made.
slides_table = Table('Slides', metadata, Column('id', Integer, primary_key=True), Column('name', String)) bullets_table = Table('Bullets', metadata, Column('id', Integer, primary_key=True), Column('slide_id', Integer, ForeignKey('Slides.id')), Column('position', Integer), Column('text', String)) class Slide(object): pass class Bullet(object): pass mapper(Slide, slides_table, properties={ 'bullets': relation(Bullet, order_by=[bullets_table.c.position]) }) mapper(Bullet, bullets_table)
The standard relation mapping will produce a list-like attribute on each Slide containing all related Bullets, but coping with changes in ordering is totally your responsibility. If you insert a Bullet into that list, there is no magic- it won't have a position attribute unless you assign it it one, and you'll need to manually renumber all the subsequent Bullets in the list to accommodate the insert.
An orderinglist
can automate this and manage the 'position' attribute on all
related bullets for you.
mapper(Slide, slides_table, properties={ 'bullets': relation(Bullet, collection_class=ordering_list('position'), order_by=[bullets_table.c.position]) }) mapper(Bullet, bullets_table) s = Slide() s.bullets.append(Bullet()) s.bullets.append(Bullet()) s.bullets[1].position >>> 1 s.bullets.insert(1, Bullet()) s.bullets[2].position >>> 2
Use the ordering_list
function to set up the collection_class
on relations
(as in the mapper example above). This implementation depends on the list
starting in the proper order, so be SURE to put an order_by on your relation.
ordering_list
takes the name of the related object's ordering attribute as
an argument. By default, the zero-based integer index of the object's
position in the ordering_list
is synchronized with the ordering attribute:
index 0 will get position 0, index 1 position 1, etc. To start numbering at 1
or some other integer, provide count_from=1
.
Ordering values are not limited to incrementing integers. Almost any scheme
can implemented by supplying a custom ordering_func
that maps a Python list
index to any value you require. See the module
documentation for more
information, and also check out the unit tests for examples of stepped
numbering, alphabetical and Fibonacci numbering.
Author: Jonathan Ellis
SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.
>>> from sqlalchemy.ext.sqlsoup import SqlSoup >>> soup = SqlSoup('sqlite:///') >>> db.users.select(order_by=[db.users.c.name]) [MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1), MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
Full SqlSoup documentation is on the SQLAlchemy Wiki.
back to section topA lot of our extensions are deprecated. But this is a good thing. Why ? Because all of them have been refined and focused, and rolled into the core of SQLAlchemy (or in the case of ActiveMapper
, it's become Elixir). So they aren't removed, they've just graduated into fully integrated features. Below we describe a set of extensions which are present in 0.4 but are deprecated.
Author: Jonas Borgström
NOTE: As of verison 0.3.6 of SQLAlchemy, most behavior of SelectResults
has been rolled into the base Query
object. Explicit usage of SelectResults
is therefore no longer needed.
SelectResults
gives transformative behavior to the results returned from the select
and select_by
methods of Query
.
from sqlalchemy.ext.selectresults import SelectResults query = session.query(MyClass) res = SelectResults(query) res = res.filter(table.c.column == "something") # adds a WHERE clause (or appends to the existing via "and") res = res.order_by([table.c.column]) # adds an ORDER BY clause for x in res[:10]: # Fetch and print the top ten instances - adds OFFSET 0 LIMIT 10 or equivalent print x.column2 # evaluate as a list, which executes the query x = list(res) # Count how many instances that have column2 > 42 # and column == "something" print res.filter(table.c.column2 > 42).count() # select() is a synonym for filter() session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]
An important facet of SelectResults is that the actual SQL execution does not occur until the object is used in a list or iterator context. This means you can call any number of transformative methods (including filter
, order_by
, list range expressions, etc) before any SQL is actually issued.
Configuration of SelectResults may be per-Query, per Mapper, or per application:
from sqlalchemy.ext.selectresults import SelectResults, SelectResultsExt # construct a SelectResults for an individual Query sel = SelectResults(session.query(MyClass)) # construct a Mapper where the Query.select()/select_by() methods will return a SelectResults: mapper(MyClass, mytable, extension=SelectResultsExt()) # globally configure all Mappers to return SelectResults, using the "selectresults" mod import sqlalchemy.mods.selectresults
SelectResults greatly enhances querying and is highly recommended. For example, heres an example of constructing a query using a combination of joins and outerjoins:
mapper(User, users_table, properties={ 'orders':relation(mapper(Order, orders_table, properties={ 'items':relation(mapper(Item, items_table)) })) }) session = create_session() query = SelectResults(session.query(User)) result = query.outerjoin_to('orders').outerjoin_to('items').select(or_(Order.c.order_id==None,Item.c.item_id==2))
For a full listing of methods, see the generated documentation.
back to section topAuthor: Daniel Miller
The SessionContext
extension is still available in the 0.4 release of SQLAlchemy, but has been deprecated in favor of the scoped_session() function, which provides a class-like object that constructs a Session
on demand which references a thread-local scope.
For docs on SessionContext
, see the SQLAlchemy 0.3 documentation.
Author: Mike Bayer
The assignmapper
extension is still available in the 0.4 release of SQLAlchemy, but has been deprecated in favor of the scoped_session() function, which provides a mapper
callable that works similarly to assignmapper
.
For docs on assignmapper
, see the SQLAlchemy 0.3 documentation.
Author: Jonathan LaCour
Please note that ActiveMapper has been deprecated in favor of Elixir, a more comprehensive solution to declarative mapping, of which Jonathan is a co-author.
ActiveMapper is a so-called "declarative layer" which allows the construction of a class, a Table
, and a Mapper
all in one step:
class Person(ActiveMapper): class mapping: id = column(Integer, primary_key=True) full_name = column(String) first_name = column(String) middle_name = column(String) last_name = column(String) birth_date = column(DateTime) ssn = column(String) gender = column(String) home_phone = column(String) cell_phone = column(String) work_phone = column(String) prefs_id = column(Integer, foreign_key=ForeignKey('preferences.id')) addresses = one_to_many('Address', colname='person_id', backref='person') preferences = one_to_one('Preferences', colname='pref_id', backref='person') def __str__(self): s = '%s\n' % self.full_name s += ' * birthdate: %s\n' % (self.birth_date or 'not provided') s += ' * fave color: %s\n' % (self.preferences.favorite_color or 'Unknown') s += ' * personality: %s\n' % (self.preferences.personality_type or 'Unknown') for address in self.addresses: s += ' * address: %s\n' % address.address_1 s += ' %s, %s %s\n' % (address.city, address.state, address.postal_code) return s class Preferences(ActiveMapper): class mapping: __table__ = 'preferences' id = column(Integer, primary_key=True) favorite_color = column(String) personality_type = column(String) class Address(ActiveMapper): class mapping: id = column(Integer, primary_key=True) type = column(String) address_1 = column(String) city = column(String) state = column(String) postal_code = column(String) person_id = column(Integer, foreign_key=ForeignKey('person.id'))
More discussion on ActiveMapper can be found at Jonathan LaCour's Blog as well as the SQLAlchemy Wiki.
back to section topThis is the MIT license: http://www.opensource.org/licenses/mit-license.php
Copyright (c) 2005, 2006, 2007 Michael Bayer and contributors. SQLAlchemy is a trademark of Michael Bayer.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
back to section top