SQLAlchemy features many tools and patterns to help in every area of writing applications that talk to relational databases. To achieve this, its areas of functionality work together to provide a cohesive package. Ultimately, just a little bit of familiarity with each concept is all that's needed to get off the ground.
That said, the two most prominent features of SQLAlchemy are its SQL construction system and its object relational mapper; both are covered in depth in later sections.
For a comprehensive tour through all of SQLAlchemy's components, below is a "Trail Map" of the knowledge dependencies between these components indicating the order in which concepts may be learned. Concepts marked in bold indicate features that are useful on their own.
Start
 |
 |--- Connection Pooling
 |        |
 |        |--- Connection Pool Configuration
 |
 +--- Establishing a Database Engine
 |        |
 |        |--- Database Engine Options
 |
 +--- Describing Tables with MetaData
 |        |
 |        |--- Creating and Dropping Database Tables
 |        |
 |        |--- Constructing SQL Queries via Python Expressions
 |                 |
 |                 +--- Basic Data Mapping
 |                          |
 |                     Unit of Work
 |                          |
 |                          +--- Advanced Data Mapping
 |
 +--- The Types System
Note: This section describes the connection pool module of SQLAlchemy, which is the smallest component of the library that can be used on its own. If you are interested in using SQLAlchemy for query construction or Object Relational Mapping, this module is automatically managed behind the scenes; you can skip ahead to Database Engines in that case.
At the base of any database helper library is a system of efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured to maintain a certain "size", which represents how many connections can be used simultaneously without resorting to creating more newly-established connections.
SQLAlchemy includes a pooling module that can be used completely independently of the rest of the toolset. This section describes how it can be used on its own, as well as the available options. If SQLAlchemy is being used more fully, the connection pooling described below occurs automatically. The options are still available, though, so this core feature is a good place to start.
import sqlalchemy.pool as pool
import psycopg2 as psycopg

psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', username='scott', password='tiger')
This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a thread-local connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function. The connection proxy also returns a proxied cursor object upon calling connection.cursor(). When all cursors as well as the connection proxy are de-referenced, the connection is automatically made available again by the owning pool object.
Basically, the connect() function is used in its usual way, and the pool module transparently returns thread-local pooled connections. Each distinct set of connect arguments gets its own newly created connection pool; in this way, an application can maintain connections to multiple schemas and/or databases, and each unique set of connect arguments will be managed by a different pool object.
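For instance, continuing the proxied module from the example above, two different sets of connect arguments are served by two separate pools (the database names here are just for illustration):

# same arguments -> same pool; different arguments -> a separate pool is created
conn_a = psycopg.connect(database='test', username='scott', password='tiger')
conn_b = psycopg.connect(database='reports', username='scott', password='tiger')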
When proxying a DBAPI module through the pool module, options exist for how the connections should be pooled:
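A sketch of passing options through manage(); the particular option names shown below (pool size, overflow, thread-local behavior) are typical pool parameters and are assumptions for illustration, not a complete list:

import sqlalchemy.pool as pool
import psycopg2 as psycopg

# assumed option names, passed through to the underlying Pool instances
psycopg = pool.manage(psycopg, pool_size=5, max_overflow=10, use_threadlocal=True)
connection = psycopg.connect(database='test', username='scott', password='tiger')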
A database engine is a subclass of sqlalchemy.engine.SQLEngine, and is the starting point for where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. It serves as an abstract factory for database-specific implementation objects as well as a layer of abstraction over the most essential tasks of a database connection, including connecting, executing queries, returning result sets, and managing transactions.
The average developer doesn't need to know anything about the interface or workings of a SQLEngine in order to use it. Simply creating one, and then specifying it when constructing tables and other SQL objects is all that's needed.
A SQLEngine is also a layer of abstraction on top of the connection pooling described in the previous section. While a DBAPI connection pool can be used explicitly alongside a SQLEngine, it's not really necessary. Once you have a SQLEngine, you can retrieve pooled connections directly from its underlying connection pool via its own connection() method. However, if you're exclusively using SQLAlchemy's SQL construction objects and/or object-relational mappers, all the details of connecting are handled by those libraries automatically.
Engines exist for SQLite, Postgres, MySQL, and Oracle, using the pysqlite, Psycopg (1 or 2), MySQLdb, and cx_Oracle modules. Each engine imports its corresponding module, which must be installed. For Postgres and Oracle, an alternate module may be specified at construction time as well.
An example of connecting to each engine is as follows:
from sqlalchemy import *

# sqlite in memory
sqlite_engine = create_engine('sqlite', {'filename':':memory:'}, **opts)

# sqlite using a file
sqlite_engine = create_engine('sqlite', {'filename':'querytest.db'}, **opts)

# postgres
postgres_engine = create_engine('postgres',
                    {'database':'test',
                     'host':'127.0.0.1',
                     'user':'scott',
                     'password':'tiger'}, **opts)

# mysql
mysql_engine = create_engine('mysql',
                    {'db':'mydb',
                     'user':'scott',
                     'passwd':'tiger',
                     'host':'127.0.0.1'}, **opts)

# oracle
oracle_engine = create_engine('oracle',
                    {'dsn':'mydsn',
                     'user':'scott',
                     'password':'tiger'}, **opts)
Note that the general form of connecting to an engine is:
engine = create_engine( <enginename>, {<named DBAPI arguments>}, <sqlalchemy options> )
The second argument is a dictionary whose key/value pairs will be passed to the underlying DBAPI connect() method as keyword arguments. Any keyword argument supported by the DBAPI module can be in this dictionary.
Engines can also be loaded by URL. The above format is converted into <enginename>://key=val&key=val:
sqlite_engine = create_engine('sqlite://filename=querytest.db')
postgres_engine = create_engine('postgres://database=test&user=scott&password=tiger')
A few useful methods off the SQLEngine are described here:
engine = create_engine('postgres://hostname=localhost&user=scott&password=tiger&database=test')

# get a pooled DBAPI connection
conn = engine.connection()

# create/drop tables based on table metadata objects
# (see the next section, Table Metadata, for info on table metadata)
engine.create(mytable)
engine.drop(mytable)

# get the DBAPI module being used
dbapi = engine.dbapi()

# get the default schema name
name = engine.get_default_schema_name()

# execute some SQL directly, returns a ResultProxy (see the SQL Construction section for details)
result = engine.execute("select * from table where col1=:col1", {'col1':'foo'})

# log a message to the engine's log stream
engine.log('this is a message')
The remaining arguments to create_engine are keyword arguments that are passed to the specific subclass of sqlalchemy.engine.SQLEngine being used, as well as the underlying sqlalchemy.pool.Pool instance. All of the options described in the previous section Connection Pool Configuration can be specified, as well as engine-specific options:
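A brief sketch of combining these; the echo flag appears elsewhere in this document, while the pool_size value shown here is an assumed pool option, not an exhaustive list:

engine = create_engine('postgres',
            {'database':'test', 'host':'127.0.0.1',
             'user':'scott', 'password':'tiger'},
            echo=True,       # log generated SQL to the engine's log stream
            pool_size=10)    # assumed pool option, passed through to the underlying Pool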
The ProxyEngine is useful for applications that need to swap engines at runtime, or to create their tables and mappers before they know what engine they will use. One use case is an application meant to be pluggable into a mix of other applications, such as a WSGI application. Well-behaved WSGI applications should be relocatable, and since that means that two versions of the same application may be running in the same process (or in the same thread at different times), WSGI applications ought not to depend on module-level or global configuration. Using the ProxyEngine allows a WSGI application to define tables and mappers in a module, but keep the specific database connection URI as an application instance or thread-local value.
The ProxyEngine is used in the same way as any other engine, with one additional method:
# define the tables and mappers
from sqlalchemy import *
from sqlalchemy.ext.proxy import ProxyEngine

engine = ProxyEngine()

users = Table('users', engine, ... )

class Users(object):
    pass

assign_mapper(Users, users)

def app(environ, start_response):
    # later, connect the proxy engine to a real engine via the connect() method
    engine.connect(environ['db_uri'])
    # now you have a real db connection and can select, insert, etc.
A SQLEngine also provides an interface to the transactional capabilities of the underlying DBAPI connection object, as well as the connection object itself. Note that when using the object-relational-mapping package, described in a later section, basic transactional operation is handled for you automatically by its "Unit of Work" system; the methods described here will usually apply just to literal SQL update/delete/insert operations or those performed via the SQL construction library.
Typically, a connection is opened with "autocommit=False". So to perform SQL operations and just commit as you go, you can simply pull out a connection from the connection pool, keep it in the local scope, and call commit() on it as needed. As long as the connection remains referenced, all other SQL operations within the same thread will use this same connection, including those used by the SQL construction system as well as the object-relational mapper, both described in later sections:
conn = engine.connection()

# execute SQL via the engine
engine.execute("insert into mytable values ('foo', 'bar')")
conn.commit()

# execute SQL via the SQL construction library
mytable.insert().execute(col1='bat', col2='lala')
conn.commit()
There is a more automated way to do transactions, and that is to use the engine's begin()/commit() functionality. When the begin() method is called off the engine, a connection is checked out from the pool and stored in a thread-local context. That way, all subsequent SQL operations within the same thread will use that same connection. Subsequent commit() or rollback() operations are performed against that same connection. In effect, it's a more automated way to perform the "commit as you go" example above.
engine.begin()
engine.execute("insert into mytable values ('foo', 'bar')")
mytable.insert().execute(col1='foo', col2='bar')
engine.commit()
A traditional "rollback on exception" pattern looks like this:
engine.begin()
try:
    engine.execute("insert into mytable values ('foo', 'bar')")
    mytable.insert().execute(col1='foo', col2='bar')
except:
    engine.rollback()
    raise
engine.commit()
A shortcut equivalent to the above is provided by the transaction() method:
def do_stuff():
    engine.execute("insert into mytable values ('foo', 'bar')")
    mytable.insert().execute(col1='foo', col2='bar')

engine.transaction(do_stuff)
An added bonus to the engine's transaction methods is "reentrant" functionality; once you call begin(), subsequent calls to begin() increment a counter that must be decremented by a corresponding commit() before an actual commit can happen. This way, any number of methods that want to ensure a transaction can call begin()/commit(), and be nested arbitrarily:
# method_a starts a transaction and calls method_b
def method_a():
    engine.begin()
    try:
        method_b()
    except:
        engine.rollback()
        raise
    engine.commit()

# method_b starts a transaction, or joins the one already in progress,
# and does some SQL
def method_b():
    engine.begin()
    try:
        engine.execute("insert into mytable values ('bat', 'lala')")
        mytable.insert().execute(col1='bat', col2='lala')
    except:
        engine.rollback()
        raise
    engine.commit()

# call method_a
method_a()
Above, method_a is called first, which calls engine.begin(). Then it calls method_b. When method_b calls engine.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method.
The object-relational-mapper capability of SQLAlchemy includes its own commit() method that gathers SQL statements into a batch and runs them within one transaction. That transaction is also invoked within the scope of the "reentrant" methodology above, so multiple objectstore.commit() operations can likewise be bundled into a larger database transaction using this methodology.
The core of SQLAlchemy's query and object mapping operations is table metadata, which are Python objects that describe tables. Metadata objects can be created by explicitly naming the table and all its properties, using the Table, Column, ForeignKey, and Sequence objects imported from sqlalchemy.schema, and a database engine constructed as described in the previous section, or they can be automatically pulled from an existing database schema. First, the explicit version:
from sqlalchemy import *

engine = create_engine('sqlite', {'filename':':memory:'}, **opts)

users = Table('users', engine,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', engine,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)
The specific datatypes, such as Integer, String, etc. are defined in sqlalchemy.types and are automatically pulled in when you import * from sqlalchemy. Note that for Column objects, an alternate name can be specified via the "key" parameter; if this parameter is given, then all programmatic references to this Column object will be based on its key, instead of its actual column name.
Once constructed, the Table object provides a clean interface to the table's properties as well as that of its columns:
employees = Table('employees', engine,
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    # ...

# get the table's primary key columns
for primary_key in employees.primary_key:
    # ...

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    # ...

# access the table's SQLEngine object:
employees.engine

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can
# be any user-defined string:
employees.c.name.key

# access a column's table:
>>> employees.c.employee_id.table is employees
True

# get the table related by a foreign key
departments = employees.c.employee_dept.foreign_key.column.table
Metadata objects can also be reflected from tables that already exist in the database. Reflection means that, given just a table name, the names, datatypes, and attributes of all columns, including foreign keys, are loaded automatically. This feature is supported by all database engines:
>>> messages = Table('messages', engine, autoload = True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']
Note that if a reflected table has a foreign key referencing another table, then the metadata for the related table will be loaded as well, even if it has not been defined by the application:
>>> shopping_cart_items = Table('shopping_cart_items', engine, autoload = True)
>>> print shopping_cart_items.c.cart_id.table.name
shopping_carts
To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. You'll get the same instance of the shopping cart Table as the one that is attached to shopping_cart_items:
>>> shopping_carts = Table('shopping_carts', engine)
>>> shopping_carts is shopping_cart_items.c.cart_id.table
True
This works because when the Table constructor is called for a particular name and database engine, if the table has already been created then the instance returned will be the same as the original. This is a singleton constructor:
>>> news_articles = Table('news', engine,
...    Column('article_id', Integer, primary_key = True),
...    Column('url', String(250), nullable = False)
... )
>>> othertable = Table('news', engine)
>>> othertable is news_articles
True
Creating and dropping tables is easy; just use the create() and drop() methods:
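For example, using the employees table defined earlier:

# issue CREATE TABLE, then DROP TABLE, against the table's engine
employees.create()
employees.drop()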
Occasionally an application will need to reference the same tables within multiple databases simultaneously. Since a Table object is specific to a SQLEngine, an extra method is provided to create copies of the Table object for a different SQLEngine instance, which can represent a different set of connection parameters, or a totally different database driver:
# create two engines
sqlite_engine = create_engine('sqlite', {'filename':'querytest.db'})
postgres_engine = create_engine('postgres',
                    {'database':'test',
                     'host':'127.0.0.1',
                     'user':'scott',
                     'password':'tiger'})

# load 'users' from the sqlite engine
users = Table('users', sqlite_engine, autoload=True)

# create the same Table object for the other engine
pg_users = users.toengine(postgres_engine)
You can also create tables using a "database neutral" engine, which can serve as a starting point for tables that are then adapted to specific engines:
import sqlalchemy.ansisql as ansisql

generic_engine = ansisql.engine()

users = Table('users', generic_engine,
    Column('user_id', Integer),
    Column('user_name', String(50))
)

sqlite_engine = create_engine('sqlite', {'filename':'querytest.db'})
sqlite_users = users.toengine(sqlite_engine)
sqlite_users.create()
A table with a sequence looks like:
table = Table("cartitems", db, Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True), Column("description", String(40)), Column("createdate", DateTime()) )
Defining a Sequence means that it will be created along with the table.create() call, and more importantly the sequence will be explicitly used when inserting new rows for this table. For databases that don't support sequences, the Sequence object has no effect. A sequence can also be specified with optional=True, which indicates the Sequence should only be used on a database that requires an explicit sequence (which currently is just Oracle). A database like Postgres, while it uses sequences to create primary keys, is often used via the SERIAL column option, which removes the need for explicit access to the sequence.
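For example, marking the sequence as optional, per the description above:

# the sequence is used only on databases that require an explicit sequence (e.g. Oracle)
table = Table("cartitems", db,
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)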
Note: This section describes how to use SQLAlchemy to construct SQL queries and receive result sets. It does not cover the object relational mapping capabilities of SQLAlchemy; that is covered later on in Basic Data Mapping. However, both areas of functionality work similarly in how selection criteria are constructed, so if you are interested just in ORM, you should probably skim through basic WHERE Clause construction before moving on.
Once you have used the sqlalchemy.schema module to construct your tables and/or reflect them from the database, performing SQL queries using those table meta data objects is done via the sqlalchemy.sql package. This package defines a large set of classes, each of which represents a particular kind of lexical construct within a SQL query; all are descendants of the common base class sqlalchemy.sql.ClauseElement. A full query is represented via a structure of ClauseElements. A set of reasonably intuitive creation functions is provided by the sqlalchemy.sql package to create these structures; these functions are described in the rest of this section.
To execute a query, you create its structure, then call the resulting structure's execute() method, which returns a cursor-like object (more on that later). The same clause structure can be used repeatedly. A ClauseElement is compiled into a string representation by an underlying SQLEngine object, which is located by searching through the clause's child items for a Table object, which provides a reference to its SQLEngine.
The examples below all include a dump of the generated SQL corresponding to the query object, as well as a dump of the statement's bind parameters. In all cases, bind parameters are named parameters using the colon format (i.e. ':name'). A named parameter scheme, either ':name' or '%(name)s', is used with all databases, including those that use positional schemes. For those, the named-parameter statement and its bind values are converted to the proper list-based format right before execution. Therefore a SQLAlchemy application that uses ClauseElements can standardize on named parameters for all databases.
For this section, we will assume the following tables:
from sqlalchemy import *

db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)
A select is done by constructing a Select object with the proper arguments, adding any extra arguments if desired, then calling its execute() method.
from sqlalchemy import *

# use the select() function defined in the sql package
s = select([users])

# or, call the select() method off of a Table object
s = users.select()

# then, call execute on the Select object:
c = s.execute()
# the SQL text of any clause object can also be viewed via the str() call:
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users
The object returned by the execute call is a sqlalchemy.engine.ResultProxy object, which acts much like a DBAPI cursor object in the context of a result set, except that the rows returned can address their columns by ordinal position, column name, or even column object:
# select rows, get resulting ResultProxy object
c = users.select().execute()
# get one row
row = c.fetchone()

# get the 'user_id' column via integer index:
user_id = row[0]

# or column name
user_name = row['user_name']

# or column object
password = row[users.c.password]

# or column accessor
password = row.password

# ResultProxy object also supports fetchall()
rows = c.fetchall()

# or get the underlying DBAPI cursor object
cursor = c.cursor
A common need when writing statements that reference multiple tables is to create labels for columns, thereby separating columns from different tables with the same name. The Select construct supports automatic generation of column labels via the use_labels=True parameter:
c = select([users, addresses],
        users.c.user_id==addresses.c.address_id,
        use_labels=True).execute()
The table name part of the label is affected if you use a construct such as a table alias:
person = users.alias('person')
c = select([person, addresses],
        person.c.user_id==addresses.c.address_id,
        use_labels=True).execute()
You can also specify custom labels on a per-column basis using the label() function:
c = select([users.c.user_id.label('id'),
            users.c.user_name.label('name')]).execute()
Calling select off a table automatically generates a column clause which includes all the table's columns, in the order they are specified in the source Table object.
But in addition to selecting all the columns off a single table, any set of columns can be specified, as well as full tables, and any combination of the two:
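For example, a column clause can mix individual columns and whole tables:

# individual columns
select([users.c.user_id, users.c.user_name]).execute()

# a combination of columns and a full table
select([users.c.user_name, addresses]).execute()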
The WHERE condition is specified via the whereclause keyword argument, which is also the second positional argument to the select() function and the first positional argument to the select() method of Table.
WHERE conditions are constructed using column objects, literal values, and functions defined in the sqlalchemy.sql module. Column objects override the standard Python operators to provide clause compositional objects, which compile down to SQL operations:
c = users.select(users.c.user_id == 7).execute()
Notice that the literal value "7" was broken out of the query and placed into a bind parameter. Databases such as Oracle must parse incoming SQL and create a "plan" when new queries are received, which is an expensive process. By using bind parameters, the same query with various literal values can have its plan compiled only once, and used repeatedly with less overhead.
More where clauses:
# another comparison operator
c = select([users], users.c.user_id>7).execute()
# OR keyword
c = users.select(or_(users.c.user_name=='jack', users.c.user_name=='ed')).execute()
# AND keyword
c = users.select(and_(users.c.user_name=='jack', users.c.password=='dog')).execute()
# NOT keyword
c = users.select(not_(
        or_(users.c.user_name=='jack', users.c.password=='dog')
    )).execute()
# IN clause
c = users.select(users.c.user_name.in_('jack', 'ed', 'fred')).execute()
# join users and addresses together
c = select([users, addresses], users.c.user_id==addresses.c.address_id).execute()
# join users and addresses together, but don't specify "addresses" in the
# selection criterion. The WHERE criterion adds it to the FROM list
# automatically.
c = select([users], and_(
        users.c.user_id==addresses.c.user_id,
        users.c.user_name=='fred'
    )).execute()
Select statements can also generate a WHERE clause based on the parameters you give it. If a given parameter matches the name of a column or its "label" (the combined tablename + "_" + column name), and does not already correspond to a bind parameter in the select object, it will be added as a comparison against that column. This is a shortcut to creating a full WHERE clause:
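A minimal sketch of this shortcut, assuming that keyword arguments passed to execute() which match column names are turned into equality comparisons as described above:

# "user_name" matches a column of the users table and is not an existing bind
# parameter, so it is assumed to become "users.user_name = :user_name" in the WHERE clause
c = users.select().execute(user_name='jack')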
Supported column operators so far are all the numerical comparison operators, i.e. '==', '>', '>=', etc., as well as like(), startswith(), endswith(), between(), and in_(). Boolean operators include not_(), and_() and or_(), which can also be used inline via '~', '&', and '|'. Math operators are '+', '-', '*', '/'. Any custom operator can be specified via the op() function shown below.
# "like" operator users.select(users.c.user_name.like('%ter')) # equality operator users.select(users.c.user_name == 'jane') # in opertator users.select(users.c.user_id.in_(1,2,3)) # and_, endswith, equality operators users.select(and_(addresses.c.street.endswith('green street'), addresses.c.zip=='11234')) # & operator subsituting for 'and_' users.select(addresses.c.street.endswith('green street') & (addresses.c.zip=='11234')) # + concatenation operator select([users.c.user_name + '_name']) # NOT operator users.select(~(addresses.c.street == 'Green Street')) # any custom operator select([users.c.user_name.op('||')('_category')])
For queries that don't contain any tables, the SQLEngine can be specified to any constructed statement via the engine keyword parameter:
# select a literal
select(["hi"], engine=myengine)

# select a function
select([func.now()], engine=db)
You can drop in a literal value anywhere there isn't a column to attach to via the literal keyword:
select([literal('foo') + literal('bar'), users.c.user_name]).execute()
# literals have all the same comparison functions as columns
select([literal('foo') == literal('bar')], engine=myengine).scalar()
Literals also take an optional type parameter to give literals a type. This can sometimes be significant, for example when using the "+" operator with SQLite, the String type is detected and the operator is converted to "||":
select([literal('foo', type=String) + 'bar'], engine=e).execute()
The ORDER BY clause of a select statement can be specified as a list of columns to order by, passed via the order_by parameter, with optional use of the asc() and desc() functions:
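For example, using the order_by parameter together with asc() and desc():

# order by user_name ascending, then user_id descending
c = users.select(order_by=[asc(users.c.user_name), desc(users.c.user_id)]).execute()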
As some of the examples indicated above, a regular inner join can be implicitly stated, just like in a SQL expression, by just specifying the tables to be joined as well as their join conditions:
addresses.select(addresses.c.user_id==users.c.user_id).execute()
There is also an explicit join constructor, which can be embedded into a select query via the from_obj parameter of the select statement:
addresses.select(from_obj=[
    addresses.join(users, addresses.c.user_id==users.c.user_id)
]).execute()
The join constructor can also be used by itself:
join(users, addresses, users.c.user_id==addresses.c.user_id).select().execute()
The join criterion in a join() call is optional. If not specified, the condition will be derived from the foreign key relationships of the two tables. If no criterion can be constructed, an exception will be raised.
join(users, addresses).select().execute()
Notice that this is the first example where the FROM criterion of the select statement is explicitly specified. In most cases, the FROM criterion is automatically determined from the columns requested as well as the WHERE clause. The from_obj keyword argument indicates a list of explicit FROM clauses to be used in the statement.
A join can be created on its own using the join or outerjoin functions, or can be created off of an existing Table or other selectable unit via the join or outerjoin methods:
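A brief sketch of the functions and methods just described:

# the module-level join() and outerjoin() functions
j = outerjoin(users, addresses, users.c.user_id==addresses.c.user_id)

# or the join()/outerjoin() methods of an existing Table or other selectable
j = users.join(addresses, users.c.user_id==addresses.c.user_id)

# embed the join into a select via from_obj
select([users, addresses], from_obj=[j]).execute()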
Aliases are used primarily when you want to use the same table more than once as a FROM expression in a statement:
address_b = addresses.alias('addressb')

# select users who have an address on Green street as well as Orange street
users.select(and_(
    users.c.user_id==addresses.c.user_id,
    addresses.c.street.like('%Green%'),
    users.c.user_id==address_b.c.user_id,
    address_b.c.street.like('%Orange%')
)).execute()
SQLAlchemy allows the creation of select statements from not just Table objects, but from a whole class of objects that implement the Selectable interface. This includes Tables, Aliases, Joins and Selects. Therefore, if you have a Select, you can select from the Select:
>>> s = users.select()
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

>>> s = s.select()
>>> str(s)
SELECT user_id, user_name, password
FROM (SELECT users.user_id, users.user_name, users.password FROM users)
Any Select, Join, or Alias object supports the same column accessors as a Table:
>>> s = users.select()
>>> [c.key for c in s.columns]
['user_id', 'user_name', 'password']
When you use use_labels=True in a Select object, the label version of the column names become the keys of the accessible columns. In effect you can create your own "view objects":
s = select([users, addresses],
        users.c.user_id==addresses.c.user_id,
        use_labels=True)

select([
    s.c.users_user_name,
    s.c.addresses_street,
    s.c.addresses_zip
], s.c.addresses_city=='San Francisco').execute()
To specify a SELECT statement as one of the selectable units in a FROM clause, it usually should be given an alias.
s = users.select().alias('u')
select([addresses, s]).execute()
Select objects can be used in a WHERE condition, in operators such as IN:
# select user ids for all users whose name starts with a "p"
s = select([users.c.user_id], users.c.user_name.like('p%'))

# now select all addresses for those users
addresses.select(addresses.c.user_id.in_(s)).execute()
The sql package supports embedding select statements into other select statements as the criterion in a WHERE condition, or as one of the "selectable" objects in the FROM list of the query. It does not at the moment directly support embedding a SELECT statement as one of the column criterion for a statement, although this can be achieved via direct text insertion, described later.
When a select object is embedded inside of another select object, and both objects reference the same table, SQLAlchemy makes the assumption that the table should be correlated from the child query to the parent query.
# make an alias of a regular select.
s = select([addresses.c.street], addresses.c.user_id==users.c.user_id).alias('s')
>>> str(s)
SELECT addresses.street FROM addresses, users
WHERE addresses.user_id = users.user_id

# now embed that select into another one. the "users" table is removed from
# the embedded query's FROM list and is instead correlated to the parent query
s2 = select([users, s.c.street])
>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses
WHERE addresses.user_id = users.user_id) s
An EXISTS clause can function as a higher-scaling version of an IN clause, and is usually used in a correlated fashion:
# find all users who have an address on Green street:
users.select(
    exists(
        [addresses.c.address_id],
        and_(
            addresses.c.user_id==users.c.user_id,
            addresses.c.street.like('%Green%')
        )
    )
)
Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:
union(
    addresses.select(addresses.c.street=='123 Green Street'),
    addresses.select(addresses.c.street=='44 Park Ave.'),
    addresses.select(addresses.c.street=='3 Mill Road'),
    order_by=[addresses.c.street]
).execute()
users.select(
    users.c.user_id==7
).union_all(
    users.select(
        users.c.user_id==9
    ),
    order_by=[users.c.user_id]   # order_by is an argument to union_all()
).execute()
Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. As mentioned at the top of this section, named bind parameters are always used regardless of the type of DBAPI being used; for DBAPI's that expect positional arguments, bind parameters are converted to lists right before execution, and Pyformat strings in statements, i.e. '%(name)s', are converted to the appropriate positional style.
s = users.select(users.c.user_name==bindparam('username'))
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
executemany() is also available, but that applies more to INSERT/UPDATE/DELETE, described later.
The generation of bind parameters is performed specific to the engine being used. The examples in this document all show "named" parameters like those used in sqlite and oracle. Depending on the parameter type specified by the DBAPI module, the correct bind parameter scheme will be used.
By throwing the compile() method onto the end of any query object, the query can be "compiled" by the SQLEngine into a sqlalchemy.sql.Compiled object just once, and the resulting compiled object reused, which eliminates repeated internal compilation of the SQL string:
s = users.select(users.c.user_name==bindparam('username')).compile()
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
The sql package tries to allow free textual placement in as many ways as possible. In the examples below, note that the from_obj parameter is used only when no other information exists within the select object with which to determine table metadata. Also note that in a query where no table metadata is used at all, the SQLEngine to be used for the query has to be explicitly specified:
# strings as column clauses
select(["user_id", "user_name"], from_obj=[users]).execute()
# strings for full column lists
select(
    ["user_id, user_name, password, addresses.*"],
    from_obj=[users.alias('u'), addresses]).execute()
# functions, etc.
select([users.c.user_id, "process_string(user_name)"]).execute()
# where clauses
users.select(and_(users.c.user_id==7, "process_string(user_name)=27")).execute()
# subqueries
users.select(
    "exists (select 1 from addresses where addresses.user_id=users.user_id)").execute()
# custom FROM objects
select(
    ["*"],
    from_obj=["(select user_id, user_name from users)"],
    engine=db).execute()
# a full query
text("select user_name from users", engine=db).execute()
# or call text() off of the engine
engine.text("select user_name from users").execute()

# execute off the engine directly - you must use the engine's native bind parameter
# style (i.e. named, pyformat, positional, etc.)
db.execute(
    "select user_name from users where user_id=:user_id", {'user_id':7})
Use the format :<paramname> to define bind parameters inside of a text block. They will be converted to the appropriate format upon compilation:
t = engine.text("select foo from mytable where lala=:hoho") r = t.execute(hoho=7)
Bind parameters can also be explicit, which allows typing information to be added. Just specify them as a list of bindparam objects whose keys match those inside the textual statement:
t = engine.text("select foo from mytable where lala=:hoho", bindparams=[bindparam('hoho', type=types.String)]) r = t.execute(hoho="im hoho")
Result-row type processing can be added via the typemap argument, which is a dictionary of return columns mapped to types:
# specify DateTime type for the 'foo' column in the result set
# sqlite, for example, uses result-row post-processing to construct dates
t = engine.text("select foo from mytable where lala=:hoho",
        bindparams=[bindparam('hoho', type=types.String)],
        typemap={'foo':types.DateTime}
    )
r = t.execute(hoho="im hoho")

# 'foo' is a datetime
year = r.fetchone()['foo'].year
One of the primary motivations for a programmatic SQL library is to allow the piecemeal construction of a SQL statement based on program variables. All the above examples typically show Select objects being created all at once. The Select object also includes "builder" methods to allow building up an object. The below example is a "user search" function, where users can be selected based on primary key, user name, street address, keywords, or any combination:
# "keyword_names" is a list of keyword strings; it is named to avoid
# shadowing the keywords table defined above
def find_users(id=None, name=None, street=None, keyword_names=None):
    statement = users.select()
    if id is not None:
        statement.append_whereclause(users.c.user_id==id)
    if name is not None:
        statement.append_whereclause(users.c.user_name==name)
    if street is not None:
        # append_whereclause joins "WHERE" conditions together with AND
        statement.append_whereclause(users.c.user_id==addresses.c.user_id)
        statement.append_whereclause(addresses.c.street==street)
    if keyword_names is not None:
        statement.append_from(
            users.join(userkeywords, users.c.user_id==userkeywords.c.user_id).join(
                keywords, userkeywords.c.keyword_id==keywords.c.keyword_id))
        statement.append_whereclause(keywords.c.name.in_(*keyword_names))
        # to avoid multiple repeats, set query to be DISTINCT:
        statement.distinct=True
    return statement.execute()

find_users(id=7)
find_users(street='123 Green Street')
find_users(name='Jack', keyword_names=['jack','foo'])
An INSERT involves just one table. The Insert object is used via the insert() function, and the specified columns determine what columns show up in the generated SQL. If primary key columns are left out of the criterion, the SQL generator will try to populate them as specified by the particular database engine and sequences, i.e. relying upon an auto-incremented column or explicitly calling a sequence beforehand. Insert statements, as well as updates and deletes, can also execute multiple parameters in one pass via specifying an array of dictionaries as parameters.
The values to be populated for an INSERT or an UPDATE can be specified to the insert()/update() functions as the values named argument, or the query will be compiled based on the values of the parameters sent to the execute() method.
# basic insert
users.insert().execute(user_id=1, user_name='jack', password='asdfdaf')
# insert just user_name, NULL for others
# will auto-populate primary key columns if they are configured
# to do so
users.insert().execute(user_name='ed')
# INSERT with a list:
users.insert(values=(3, 'jane', 'sdfadfas')).execute()
# INSERT with user-defined bind parameters
i = users.insert(
    values={'user_name':bindparam('name'), 'password':bindparam('pw')}
)
i.execute(name='mary', pw='adas5fs')
# INSERT many - if no explicit 'values' parameter is sent,
# the first parameter list in the list determines
# the generated SQL of the insert (i.e. what columns are present)
# executemany() is used at the DBAPI level
users.insert().execute(
    {'user_id':7, 'user_name':'jack', 'password':'asdfasdf'},
    {'user_id':8, 'user_name':'ed', 'password':'asdffcadf'},
    {'user_id':9, 'user_name':'fred', 'password':'asttf'}
)
Updates work a lot like INSERTS, except there is an additional WHERE clause that can be specified.
# change 'jack' to 'ed'
users.update(users.c.user_name=='jack').execute(user_name='ed')
# use bind parameters
u = users.update(users.c.user_name==bindparam('name'),
        values={'user_name':bindparam('newname')})
u.execute(name='jack', newname='ed')
# update a column to another column
users.update(values={users.c.password:users.c.user_name}).execute()
# multi-update
users.update(users.c.user_id==bindparam('id')).execute(
    {'id':7, 'user_name':'jack', 'password':'fh5jks'},
    {'id':8, 'user_name':'ed', 'password':'fsr234ks'},
    {'id':9, 'user_name':'mary', 'password':'7h5jse'},
)
A correlated update lets you update a table using selection from another table, or the same table:
s = select([addresses.c.city], addresses.c.user_id==users.c.user_id)
users.update(
    and_(users.c.user_id>10, users.c.user_id<20),
    values={users.c.user_name:s}
).execute()
A delete is formulated like an update, except there's no values clause:
users.delete(users.c.user_id==7).execute()

users.delete(users.c.user_name.like(bindparam('name'))).execute(
    {'name':'%Jack%'},
    {'name':'%Ed%'},
    {'name':'%Jane%'},
)

users.delete(exists())
Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes. The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors also keep track of changes to object attributes; these changes will be stored to the database when the application "commits" the current transactional context (known as a Unit of Work). The __init__() method of the object is also decorated to communicate changes when new instances of the object are created.
The Mapper also provides the interface by which instances of the object are loaded from the database. The primary method for this is its select() method, which has similar arguments to a sqlalchemy.sql.Select object. But this select method executes automatically and returns results, instead of awaiting an execute() call. Instead of returning a cursor-like object, it returns an array of objects.
The three elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
This is the simplest form of a full "round trip" of creating table meta data, creating a class, mapping the class to the table, getting some results, and saving changes. For each concept, the following sections will dig in deeper to the available capabilities.
from sqlalchemy import *

# engine
engine = create_engine("sqlite://mydb.db")

# table metadata
users = Table('users', engine,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition
class User(object):
    pass

# create a mapper
usermapper = mapper(User, users)

# select
user = usermapper.select_by(user_name='fred')[0]
# modify
user.user_name = 'fred jones'

# commit - saves everything that changed
objectstore.commit()
For convenience's sake, the Mapper can be attached as an attribute on the class itself as well:
User.mapper = mapper(User, users)
userlist = User.mapper.select_by(user_id=12)
There is also a full-blown "monkeypatch" function that creates a primary mapper, attaches the above mapper class property, and also the methods get, get_by, select, select_by, selectone, commit and delete:
assign_mapper(User, users)
userlist = User.select_by(user_id=12)
Other methods of associating mappers and finder methods with their corresponding classes, such as via common base classes or mixins, can be devised as well. SQLAlchemy does not aim to dictate application architecture and will always allow the broadest variety of architectural patterns, but may include more helper objects and suggested architectures in the future.
There are a variety of ways to select from a mapper. These range from minimalist to explicit. Below is a synopsis of these methods:
# select_by, using property names or column names as keys
# the keys are grouped together by an AND operator
result = mapper.select_by(name='john', street='123 green street')

# select_by can also combine SQL criterion with key/value properties
result = mapper.select_by(users.c.user_name=='john',
            addresses.c.zip_code=='12345', street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = mapper.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = mapper.select_by_name('fred')
u = mapper.get_by_name('fred')

# get an object directly from its primary key. this will bypass the SQL
# call if the object has already been loaded
u = mapper.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = mapper.get(27, 3, 'receipts')

# using a WHERE criterion
result = mapper.select(or_(users.c.user_name == 'john', users.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = mapper.selectone(users.c.user_name=='john')

# using a full select object
result = mapper.select(users.select(users.c.user_name=='john'))

# using straight text
result = mapper.select_text("select * from users where user_name='fred'")

# or using a "text" object
result = mapper.select(text("select * from users where user_name='fred'", engine=engine))
The last few examples above show the usage of the mapper's table object to provide the columns for a WHERE Clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:
User.mapper = mapper(User, users)

userlist = User.mapper.select(User.c.user_id==12)
When objects corresponding to mapped classes are created or manipulated, all changes are logged by a package called sqlalchemy.mapping.objectstore. The changes are then written to the database when an application calls objectstore.commit(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The commit() operation uses a transaction as well, and will also perform "concurrency checking" to ensure the proper number of rows were in fact affected (not supported with the current MySQL drivers). Transactional resources are used effectively in all cases; the unit of work handles all the details.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes, and its __init__() method is also decorated to mark new objects as "new".
User.mapper = mapper(User, users)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# load a third User from the database
myuser3 = User.mapper.select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes
objectstore.commit()
In the examples above, we defined a User class with basically no properties or methods. There's no particular reason it has to be this way; the class can explicitly set up whatever properties it wants, whether or not they will be managed by the mapper. It can also specify a constructor, with the restriction that the constructor is able to function with no arguments being passed to it (this restriction can be lifted with some extra parameters to the mapper; more on that later):
class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (
            repr(self.user_id), repr(self.user_name), repr(self.password))

User.mapper = mapper(User, users)

u = User('john', 'foo')
objectstore.commit()
>>> u
User id 1 name 'john' password 'foo'
Recent versions of SQLAlchemy will only include the columns corresponding to modified object attributes in the UPDATE statements generated upon commit. This is to conserve database traffic and also to successfully interact with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isn't loaded until referenced by the application.
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the mapper module. So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
from sqlalchemy import *

engine = create_engine('sqlite', {'filename':'mydb'})

# define user table
users = Table('users', engine,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses = Table('addresses', engine,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the two tables' user_id columns in a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, this ForeignKey object is the default method by which the relationship is determined (although if you didn't define ForeignKeys, or you want to specify explicit relationship columns, that is available as well, as sketched below).
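A sketch of the explicit form; the primaryjoin argument name used here is an assumption for illustration, letting the join condition be spelled out rather than derived from a ForeignKey:

# hypothetical sketch: state the join condition explicitly instead of
# relying on the ForeignKey definition
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses),
                    primaryjoin=users.c.user_id==addresses.c.user_id)
})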
So then let's define two classes, the familiar User class, as well as an Address class:
class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street=None, city=None, state=None, zip=None):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip
And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument, properties, which is a dictionary relating the name of an object property to a database relationship, in this case a relation object against a newly defined mapper for the Address class:
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses))
})
Let's do some operations with these classes and see what happens:
u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

objectstore.commit()
A lot just happened there! The Mapper object figured out how to relate rows in the addresses table to the users table, and also upon commit had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have all their new primary and foreign keys populated.
Also notice that when we created a Mapper on the User class which defined an 'addresses' relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property accessor function, which returns an instance of sqlalchemy.util.HistoryArraySet, which fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
So our one address that was removed from the list was updated to have a user_id of None, and a new address object was inserted to correspond to the new Address added to the User. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the private=True parameter of relation:
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), private=True)
})

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
In this case, with the private flag set, the element that was removed from the addresses list was also removed from the database. By specifying the private flag on a relation, the Mapper is told that these related objects exist only as children of the parent object and otherwise should be deleted.
By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, even without any database queries being executed. Below, the User mapper is created with an "addresses" property, and the corresponding Address mapper receives a "backreference" to the User object via the property name "user":
Address.mapper = mapper(Address, addresses)

User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper, backref='user')
})

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property is placed on the child mapper. This property can be overridden with a custom property using the add_property function:
Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper, backref='user')
})

Address.mapper.add_property('user', relation(
    User.mapper, lazy=False, private=True, backref='addresses'
))
Note that when overriding a backreferenced property, we re-specify the backreference as well. This will not override the existing 'addresses' property on the User class, but just sends a message to the attribute-management system that it should continue to maintain this backreference.
The mapper package has a helper function cascade_mappers() which can simplify the task of linking several mappers together. Given a list of classes and/or mappers, it identifies the foreign key relationships between the given mappers or corresponding class mappers, and creates relation() objects representing those relationships, including a backreference. It also attempts to find the "secondary" table in a many-to-many relationship. The names of the relations are a lowercase version of the related class name. In the case of one-to-many or many-to-many, the name is "pluralized", which currently is based on the English language (i.e. an 's' or 'es' is added to it):
# create two mappers.  the 'users' and 'addresses' tables have a
# foreign key relationship
mapper1 = mapper(User, users)
mapper2 = mapper(Address, addresses)

# cascade the two mappers together
# (can also specify User, Address as the arguments)
cascade_mappers(mapper1, mapper2)

# two new object instances
u = User('user1')
a = Address('test')

# "addresses" and "user" properties are automatically added
u.addresses.append(a)
print a.user
We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should have a Lazy Loader attached to it when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
# define a mapper
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), private=True)
})

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = User.mapper.select(user_name='jane')[0]

# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)
In mappers that have relationships, the select_by method and its cousins include special functionality that can be used to create joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but *is* present in the property list of one of its relationships:
l = User.mapper.select_by(street='123 Green Street')
The above example is shorthand for:
l = User.mapper.select(and_(
    Address.c.user_id==User.c.user_id,
    Address.c.street=='123 Green Street')
)
Once the child list of Address objects is loaded, it is done loading for the lifetime of the object instance. Changes to the list will not be interfered with by subsequent loads, and upon commit those changes will be saved. Similarly, if a new User object is created and child Address objects added, a subsequent select operation which happens to touch upon that User instance, will also not affect the child list, since it is already loaded.
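As a brief sketch of that behavior, continuing with the User/Address mappers from above (the street values are arbitrary):

# load the user; the 'addresses' list is loaded once and then left alone
user = User.mapper.select(user_name='jane')[0]
user.addresses.append(Address('57 New Street', 'some city', 'CA', '12345'))

# a later select touching the same row returns the same identity-mapped instance,
# and the already-loaded 'addresses' list (including the new Address) is untouched
user2 = User.mapper.select(user_name='jane')[0]
assert user2 is user

# the appended Address is INSERTed here
objectstore.commit()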
The question of when the mapper actually fetches brand new objects from the database, versus when it assumes the in-memory version is fine the way it is, is a matter of transactional scope. This is described in more detail in the Unit of Work section; for now it should be noted that the total storage of all newly created and selected objects, within the scope of the current thread, can be reset by releasing or otherwise disregarding all current object instances and calling:
objectstore.clear()
This operation will clear out all currently mapped object instances, and subsequent select statements will load fresh copies from the database.
To operate upon a single object, just use the remove function:
# (this function coming soon)
objectstore.remove(myobject)
With just a single parameter "lazy=False" specified to the relation object, the parent and child SQL queries can be joined together.
Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper, lazy=False)
})

user = User.mapper.get_by(user_name='jane')
for a in user.addresses:
    print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
users = User.mapper.select_by(street='123 Green Street')
The join implied by passing the "street" parameter is converted into an "aliasized" clause by the eager loader, so that it does not conflict with the join used to eager load the child address objects.
The options method of mapper provides an easy way to get alternate forms of a mapper from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():
# user mapper with lazy addresses
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses))
})

# make an eager loader
eagermapper = User.mapper.options(eagerload('addresses'))
u = eagermapper.select()

# make another mapper that won't load the addresses at all
plainmapper = User.mapper.options(noload('addresses'))

# multiple options can be specified
mymapper = oldmapper.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
mymapper = oldmapper.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. Other forms of relationship work just as easily, as the relation function can usually figure out what you want:
# a table to store a user's preferences for a site
prefs = Table('user_prefs', engine,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table gets 'preference_id' column added
users = Table('users', engine,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("prefs.pref_id"))
)

# class definition for preferences
class UserPrefs(object):
    pass
UserPrefs.mapper = mapper(UserPrefs, prefs)

# address mapper
Address.mapper = mapper(Address, addresses)

# make a new mapper referencing everything.
m = mapper(User, users, properties = dict(
    addresses = relation(Address.mapper, lazy=True, private=True),
    preferences = relation(UserPrefs.mapper, lazy=False, private=True),
))

# select
user = m.get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'
user.addresses.append(Address('freddy@hi.org'))
# commit
objectstore.commit()
The relation function handles a basic many-to-many relationship when you specify the association table:
articles = Table('articles', engine,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords = Table('keywords', engine,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', String(50))
)

itemkeywords = Table('article_keywords', engine,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

# class definitions
class Keyword(object):
    def __init__(self, name = None):
        self.name = name

class Article(object):
    pass

# define a mapper that does many-to-many on the 'itemkeywords' association table
Article.mapper = mapper(Article, articles, properties = dict(
    keywords = relation(mapper(Keyword, keywords), itemkeywords, lazy=False)
))

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))

objectstore.commit()
# select articles based on a keyword.  select_by will handle the extra joins.
articles = Article.mapper.select_by(keyword='politics')
# modify
a = articles[0]
del a.keywords[:]
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# commit.  individual INSERT/DELETE operations will take place only for the
# list elements that changed.
objectstore.commit()
Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.
# add "attached_by" column which will reference the user who attached this keyword
itemkeywords = Table('article_keywords', engine,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

# define an association class
class KeywordAssociation(object):
    pass

# mapper for KeywordAssociation
KeywordAssociation.mapper = mapper(KeywordAssociation, itemkeywords)

# mappers for Users, Keywords
User.mapper = mapper(User, users)
Keyword.mapper = mapper(Keyword, keywords)

# define the mapper.  when we load an article, we always want to get the keywords
# via eager loading.  but the user who added each keyword, we usually don't need,
# so specify lazy loading for that.
m = mapper(Article, articles, properties=dict(
    keywords = relation(KeywordAssociation.mapper, lazy=False, association=Keyword,
        primary_key = [itemkeywords.c.article_id, itemkeywords.c.keyword_id],
        properties={
            'keyword' : relation(Keyword, lazy = False),  # uses primary Keyword mapper
            'user' : relation(User, lazy = True)          # uses primary User mapper
        }
    )
))

# bonus step - well, we do want to load the users in one shot,
# so modify the mapper via an option.
# this returns a new mapper with the option switched on.
m2 = m.options(eagerload('keywords.user'))

# select by keyword again
alist = m2.select_by(keyword_name='jacks_stories')
# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.name == 'jacks_stories':
            print k.user.user_name
The concept behind Unit of Work is to track modifications to a field of objects, and then to be able to commit those changes to the database in a single operation. There are a lot of advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.
SQLAlchemy's unit of work includes several functions, outlined in the sections that follow.
The current unit of work is accessed via the Session interface. The Session is available in a thread-local context from the objectstore module as follows:
# get the current thread's session
s = objectstore.get_session()
The Session object acts as a proxy to an underlying UnitOfWork object. Common methods include commit(), begin(), clear(), and delete(). Most of these methods are also available at the module level in the objectstore module, where they operate upon the Session returned by the get_session() function.
To clear out the current thread's UnitOfWork, which has the effect of discarding the Identity Map and the lists of all objects that have been modified, just issue a clear:
# via module
objectstore.clear()

# or via Session
objectstore.get_session().clear()
This is the easiest way to "start fresh", as in a web application that wants to have a newly loaded graph of objects on each request. Any object instances before the clear operation should be discarded.
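For instance, a per-request hook in a web application might look like the following minimal sketch; the handle_request function and its surroundings are hypothetical, and only the objectstore calls are SQLAlchemy:

def handle_request(request):
    # start the request with a clean unit of work and identity map
    objectstore.clear()

    # ... load objects, modify them ...

    # persist whatever changed during this request
    objectstore.commit()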
The current thread's UnitOfWork object keeps track of objects that are modified. It maintains the following lists:
# new objects that were just constructed
objectstore.get_session().new

# objects that exist in the database, that were modified
objectstore.get_session().dirty

# objects that have been marked as deleted via objectstore.delete()
objectstore.get_session().deleted
To commit the changes stored in those lists, just issue a commit. This can be called via objectstore.session().commit(), or through the module-level convenience method in the objectstore module:
objectstore.commit()
The commit operation takes place within a SQL-level transaction, so any failures that occur will roll back the state of everything to before the commit took place.
When mappers are created for classes, new object construction automatically places objects in the "new" list on the UnitOfWork, and object modifications automatically place objects in the "dirty" list. To mark objects as to be deleted, use the "delete" method on UnitOfWork, or the module level version:
objectstore.delete(myobj1, myobj2, ...)
Commit() can also take a list of objects, which narrows its scope to saving just those objects:
objectstore.commit(myobj1, myobj2, ...)
Committing just a subset of instances should be used carefully, as it may result in an inconsistent save state between dependent objects (it should manage to locate loaded dependencies and save those also, but it hasn't been tested much).
The "scope" of the unit of work commit can be controlled further by issuing a begin(). A begin operation constructs a new UnitOfWork object and sets it as the currently used UOW. It maintains a reference to the original UnitOfWork as its "parent", and shares the same "identity map" of objects that have been loaded from the database within the scope of the parent UnitOfWork. However, the "new", "dirty", and "deleted" lists are empty. This has the effect that only changes that take place after the begin() operation get logged to the current UnitOfWork, and therefore those are the only changes that get commit()ted. When the commit is complete, the "begun" UnitOfWork removes itself and places the parent UnitOfWork as the current one again.
# modify an object
myobj1.foo = "something new"

# begin an objectstore scope
# this is equivalent to objectstore.get_session().begin()
objectstore.begin()

# modify another object
myobj2.lala = "something new"

# only 'myobj2' is saved
objectstore.commit()
As always, the actual database transaction begin/commit occurs entirely within the objectstore.commit() operation.
At the moment, begin/commit supports the same "nesting" behavior as the SQLEngine (note this behavior is not the original "nested" behavior), meaning that repeated calls to begin() will increment a counter, which is not released until that same number of commit() statements occur.
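A minimal sketch of that counter behavior, under the semantics described above (the attribute modification is arbitrary):

objectstore.begin()        # counter goes to 1
objectstore.begin()        # counter goes to 2

myobj1.foo = "something new"

objectstore.commit()       # counter drops to 1; nothing is flushed yet
objectstore.commit()       # counter reaches 0; the changes are now committed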
The UOW commit operation places its INSERT/UPDATE/DELETE operations within the scope of a database transaction controlled by a SQLEngine:
engine.begin()
try:
    # run objectstore update operations
except:
    engine.rollback()
    raise
engine.commit()
If you recall from the Transactions section, the engine's begin()/commit() methods support reentrant behavior. This means you can nest begin and commits and only have the outermost begin/commit pair actually take effect (rollbacks however, abort the whole operation at any stage). From this it follows that the UnitOfWork commit operation can be nested within a transaction as well:
engine.begin()
try:
    # perform custom SQL operations
    objectstore.commit()
    # perform custom SQL operations
except:
    engine.rollback()
    raise
engine.commit()
All object instances which are saved to the database, or loaded from the database, are given an identity by the mapper/objectstore. This identity is available via the _instance_key property attached to each object instance, and is a tuple consisting of the object's class, the SQLAlchemy-specific "hash key" of the table it's persisted to, and an additional tuple of primary key values, in the order that they appear within the table definition:
>>> obj._instance_key
(<class 'test.tables.User'>, "Table('users',SQLiteSQLEngine(([':memory:'], {})),schema=None)", (7,))
Note that this identity is a database identity, not an in-memory identity. An application can have several different objects in different unit-of-work scopes that have the same database identity, or an object can be removed from memory, and constructed again later, with the same database identity. What can never happen is for two copies of the same object to exist in the same unit-of-work scope with the same database identity; this is guaranteed by the identity map.
At the moment that an object is assigned this key, it is also added to the current thread's unit-of-work's identity map. The identity map is just a WeakValueDictionary which maintains the one and only reference to a particular object within the current unit of work scope. It is used when result rows are fetched from the database to insure that only one copy of a particular object actually comes from that result set in the case that eager loads or other joins are used, or if the object had already been loaded from a previous result set. The get() method on a mapper, which retrieves an object based on primary key identity, also checks in the current identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get() method is also used.
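As a brief illustration of that guarantee, assuming a User mapper as defined in the earlier examples (the primary key value 7 is arbitrary):

# both calls resolve to the same identity-mapped instance within this
# unit-of-work scope; the second get() requires no database round-trip
u1 = User.mapper.get(7)
u2 = User.mapper.get(7)
assert u1 is u2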
Methods on mappers and the objectstore module which are relevant to identity include the following:
# assume 'm' is a mapper
m = mapper(User, users)

# get the identity key corresponding to a primary key
key = m.identity_key(7)

# for composite key, list out the values in the order they
# appear in the table
key = m.identity_key(12, 'rev2')

# get the identity key given a primary key
# value as a tuple, a class, and a table
key = objectstore.get_id_key((12, 'rev2'), User, users)

# get the identity key for an object, whether or not it actually
# has one attached to it (m is the mapper for obj's class)
key = m.instance_key(obj)

# same thing, from the objectstore (works for any obj type)
key = objectstore.instance_key(obj)

# is this key in the current identity map?
objectstore.has_key(key)

# is this object in the current identity map?
objectstore.has_instance(obj)

# get this object from the current identity map based on
# singular/composite primary key, or if not go
# and load from the database
obj = m.get(12, 'rev2')
The _instance_key attribute is designed to work with objects that are serialized into strings and brought back again. As it contains no references to internal structures or database connections, applications that use caches or session storage which require serialization (i.e. pickling) can store SQLAlchemy-loaded objects. However, as mentioned earlier, an object with a particular database identity is only allowed to exist uniquely within the current unit-of-work scope. So, upon deserializing such an object, it has to "check in" with the current unit-of-work/identity map combination, to insure that it is the only unique instance. This is achieved via the import_instance() function in objectstore:
# deserialize an object
myobj = pickle.loads(mystring)

# "import" it.  if the objectstore already had this object in the
# identity map, then you get back the one from the current session.
myobj = objectstore.import_instance(myobj)
Note that the import_instance() function will either mark the deserialized object as the official copy in the current identity map, which includes updating its _instance_key with the current application's class instance, or it will discard it and return the corresponding object that was already present.
The current thread's UnitOfWork can be replaced with a manually created instance:
# get the Session
s = objectstore.session()

# create new UnitOfWork
u = objectstore.UnitOfWork(s)

# set it on the Session for the current thread
s.uow = u
The global Session can also be replaced. This allows changing the algorithm used to retrieve the current scoped UnitOfWork object.
# make a new Session, with just one global UnitOfWork
s = objectstore.Session()

# make a new Session that returns thread-local UnitOfWork objects
s = objectstore.Session(scope="thread")

# make a new Session, with a custom scope
# give it a "key" function used to identify a UnitOfWork
def myreg():
    return "mykey"

s = objectstore.Session(keyfunc=myreg)

# make a Session with a custom function to create UnitOfWorks
def myuow(session):
    return UnitOfWork(session)

s = objectstore.Session(createfunc=myuow)

# set this Session as the global "session":
objectstore.global_session = s
Sessions can be created on an ad-hoc basis and used for individual groups of objects and operations. This has the effect of bypassing the entire "global"/"threadlocal" UnitOfWork system and explicitly using a particular Session:
# make a new Session with a global UnitOfWork
s = objectstore.Session()

# make objects bound to this Session
x = MyObj(_sa_session=s)

# perform mapper operations bound to this Session
# (this function coming soon)
r = MyObj.mapper.using(s).select_by(id=12)

# get the session that corresponds to an instance
s = objectstore.get_session(x)

# commit
s.commit()

# perform a block of operations with this session set within the current scope
objectstore.push_session(s)
try:
    r = mapper.select_by(id=12)
    x = MyObj()
    objectstore.commit()
finally:
    objectstore.pop_session()
This section details all the options available to Mappers, as well as advanced patterns.
To start, here are the tables we will work with again:
from sqlalchemy import *

db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)
When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation, as in this example which creates a User object that has a relationship to all of its Addresses which are in Boston:
class User(object):
    pass
class Address(object):
    pass

Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
            and_(users.c.user_id==addresses.c.user_id,
                 addresses.c.city=='Boston'))
})
Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:
class User(object):
    pass
class Keyword(object):
    pass

Keyword.mapper = mapper(Keyword, keywords)
User.mapper = mapper(User, users, properties={
    'keywords' : relation(Keyword.mapper,
        primaryjoin=users.c.user_id==userkeywords.c.user_id,
        secondaryjoin=userkeywords.c.keyword_id==keywords.c.keyword_id
    )
})
The previous example leads in to the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses, both lazily loaded when they are first accessed:
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
            and_(users.c.user_id==addresses.c.user_id,
                 addresses.c.city=='Boston')),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
            and_(users.c.user_id==addresses.c.user_id,
                 addresses.c.city=='New York')),
})
A complication arises with the above pattern if you want the relations to be eager loaded. Since there will be two separate joins to the addresses table during an eager load, an alias needs to be used to separate them. You can create an alias of the addresses table to separate them, but then you are in effect creating a brand new mapper for each property, unrelated to the main Address mapper, which can create problems with commit operations. So an additional argument use_alias can be used with an eager relationship to specify the alias to be used just within the eager query:
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
            and_(users.c.user_id==addresses.c.user_id,
                 addresses.c.city=='Boston'),
            lazy=False, use_alias=True),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
            and_(users.c.user_id==addresses.c.user_id,
                 addresses.c.city=='New York'),
            lazy=False, use_alias=True),
})

u = User.mapper.select()
By default, mappers will not supply any ORDER BY clause when selecting rows. This can be modified in several ways.
A "default ordering" can be supplied for all mappers by setting the "default_ordering" flag on the engine, which indicates that table primary keys or object IDs should be used as the default ordering:
db = create_engine('postgres://username=scott&password=tiger', default_ordering=True)
The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering, even if the engine's default_ordering property is True. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:
# disable all ordering
m = mapper(User, users, order_by=None)

# order by a column
m = mapper(User, users, order_by=users.c.user_id)

# order by multiple items
m = mapper(User, users, order_by=[users.c.user_id, desc(users.c.user_name)])
"order_by" can also be specified to an individual select method, overriding all other per-engine/per-mapper orderings:
# order by a column
l = m.select(users.c.user_name=='fred', order_by=users.c.user_id)

# order by multiple criteria
l = m.select(users.c.user_name=='fred', order_by=[users.c.user_id, desc(users.c.user_name)])
For relations, the "order_by" property can also be specified to all forms of relation:
# order address objects by address id
m = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=addresses.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
m = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=desc(addresses.c.email_address), eager=True)
}, order_by=users.c.user_id)
You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:
class User(object):
    pass

m = mapper(User, users)

r = m.select(limit=20, offset=10)
class User(object):
    pass
class Address(object):
    pass

m = mapper(User, users, properties={
    'addresses' : relation(mapper(Address, addresses), lazy=False)
})

r = m.select(User.c.user_name.like('F%'), limit=20, offset=10)
The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.
When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:
user_mapper = mapper(User, users, properties={
    'id' : users.c.user_id,
    'name' : users.c.user_name,
})
In the situation when column names overlap in a mapper against multiple tables, columns may be referenced together with a list:
# join users and addresses
usersaddresses = sql.join(users, addresses, users.c.user_id == addresses.c.user_id)

m = mapper(User, usersaddresses, properties = {
    'id' : [users.c.user_id, addresses.c.user_id],
})
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group.  when one photo is referenced,
# all three photos will be loaded in one SELECT statement.  The 'excerpt' will
# be loaded separately when it is first referenced.
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})
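To make the loading behavior concrete, here is a small sketch of using the mapper above (the select criteria are omitted for brevity):

# selecting books loads only the non-deferred columns
book = book_mapper.select()[0]

# touching any one photo loads photo1, photo2 and photo3 in a single SELECT
image = book.photo1

# the excerpt is not in the 'photos' group, so it loads with its own SELECT
text = book.excerpt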
The options method of mapper, first introduced in Switching Lazy/Eager, No Load, supports the copying of a mapper into a new one, with any number of its relations replaced by new ones. The method takes a variable number of MapperOption objects which know how to change specific things about the mapper. The five available options are eagerload, lazyload, noload, deferred and extension.
An example of a mapper with a lazy load relationship, upgraded to an eager load relationship:
class User(object):
    pass
class Address(object):
    pass

# a 'lazy' relationship
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), lazy=True)
})

# copy the mapper and convert 'addresses' to be eager
eagermapper = User.mapper.options(eagerload('addresses'))
The load options also can take keyword arguments that apply to the new relationship. To take the "double" address lazy relationship from the previous section and upgrade it to eager, adding the "selectalias" keywords as well:
m = User.mapper.options(
    eagerload('boston_addresses', selectalias='boston_ad'),
    eagerload('newyork_addresses', selectalias='newyork_ad')
)
The defer and undefer options can control the deferred loading of attributes:
# set the 'excerpt' deferred attribute to load normally
m = book_mapper.options(undefer('excerpt'))

# set the referenced mapper 'photos' to defer its loading of the column 'imagedata'
m = book_mapper.options(defer('photos.imagedata'))
Table Inheritance indicates the pattern where two tables, in a parent-child relationship, are mapped to an inheritance chain of classes. If a table "employees" contains additional information about managers in the table "managers", a corresponding object inheritance pattern would have an Employee class and a Manager class. Loading a Manager object means you are joining managers to employees. For SQLAlchemy, this pattern is just a special case of a mapper that maps against a joined relationship, and is provided via the inherits keyword.
class User(object):
    """a user object."""
    pass
User.mapper = mapper(User, users)

class AddressUser(User):
    """a user object that also has the users mailing address."""
    pass

# define a mapper for AddressUser that inherits the User.mapper, and joins on the user_id column
AddressUser.mapper = mapper(
    AddressUser, addresses, inherits=User.mapper
)

items = AddressUser.mapper.select()
Above, the join condition is determined via the foreign keys between the users and the addresses table. To specify the join condition explicitly, use inherit_condition:
AddressUser.mapper = mapper(
    AddressUser, addresses, inherits=User.mapper,
    inherit_condition=users.c.user_id==addresses.c.user_id
)
The more general case of the pattern described in "table inheritance" is a mapper that maps against more than one table. The join keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class
class AddressUser(object):
    pass

# define a Join
j = join(users, addresses)

# map to it - the identity of an AddressUser object will be
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j)
# many-to-many join on an association table
j = join(users, userkeywords,
        users.c.user_id==userkeywords.c.user_id).join(keywords,
        userkeywords.c.keyword_id==keywords.c.keyword_id)

# a class
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j)
Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers,
            func.count(orders).label('order_count'),
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
)

class Customer(object):
    pass

customer_mapper = mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
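A brief sketch of using the mapper above; the attribute names follow the hypothetical "customers" and "orders" tables, so treat this as illustrative only:

for c in customer_mapper.select():
    # each Customer carries the 'customers' columns plus the two labeled aggregates
    print c.customer_id, c.order_count, c.highest_order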
By now it should be apparent that the mapper defined for a class is in no way the only mapper that exists for that class. Other mappers can be created at any time, either explicitly or via the options method, to provide different loading behavior.
However, it's not as simple as that. The mapper serves a dual purpose: one is to generate select statements and load objects from executing those statements; the other is to keep track of the defined dependencies of that object when save and delete operations occur, and to extend the attributes of the object so that they store information about their history and communicate with the unit of work system. For this reason, it is a good idea to be aware of the behavior of multiple mappers. When creating dependency relationships between objects, one should insure that only the primary mappers are used in those relationships, else deep object traversal operations will fail to load in the expected properties, and update operations will not take all the dependencies into account.
Generally, it's as simple as this: the first mapper that is defined for a particular class is the one that gets to define that class's relationships to other mapped classes, and also decorates its attributes and constructor with special behavior. Any subsequent mappers created for that class will be able to load new instances, but object manipulation operations will still function via the original mapper. The special keyword is_primary will override this behavior, and make any mapper the new "primary" mapper.
class User(object):
    pass

# mapper one - mark it as "primary", meaning this mapper will handle
# saving and class-level properties
m1 = mapper(User, users, is_primary=True)

# mapper two - this one will also eager-load address objects in
m2 = mapper(User, users, properties={
    'addresses' : relation(mapper(Address, addresses), lazy=False)
})

# get a user.  this user will not have an 'addresses' property
u1 = m1.select(User.c.user_id==10)[0]

# get another user.  this user will have an 'addresses' property.
u2 = m2.select(User.c.user_id==27)[0]

# make some modifications, including adding an Address object.
u1.user_name = 'jack'
u2.user_name = 'jane'
u2.addresses.append(Address('123 green street'))

# upon commit, the User objects will be saved.
# the Address object will not, since the primary mapper for User
# does not have an 'addresses' relationship defined
objectstore.commit()
Oftentimes it is necessary for two mappers to be related to each other. With a datamodel that consists of Users that store Addresses, you might have an Address object and want to access the "user" attribute on it, or have a User object and want to get the list of Address objects. The easiest way to do this is via the backreference keyword described in Useful Feature: Backreferences. Although even when backreferences are used, it is sometimes necessary to explicitly specify the relations on both mappers pointing to each other.
Achieving this involves creating the first mapper by itself, then creating the second mapper referencing the first, then adding a property to the first mapper that references the second:
class User(object):
    pass
class Address(object):
    pass

User.mapper = mapper(User, users)
Address.mapper = mapper(Address, addresses, properties={
    'user' : relation(User.mapper)
})

User.mapper.add_property('addresses', relation(Address.mapper))
Note that with a circular relationship as above, you cannot declare both relationships as "eager" relationships, since that produces a circular query situation which will generate a recursion exception. So what if you want to load an Address and its User eagerly? Just make a second mapper using options:
eagermapper = Address.mapper.options(eagerload('user'))
s = eagermapper.select(Address.c.address_id==12)
A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.
# define a self-referential table
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified.  it will point back to the primary
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children' : relation(TreeNode, private=True),
})

# or, specify the circular relationship after establishing the original mapper:
mymapper = mapper(TreeNode, trees)
mymapper.add_property('children', relation(mymapper, private=True))
This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.
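As a brief sketch of using the TreeNode mapper above (node_name is assigned directly, since the class defines no constructor):

# build a small tree and save it; the mapper works out the correct insert
# order so that parent rows exist before their children
root = TreeNode()
root.node_name = 'rootnode'

child = TreeNode()
child.node_name = 'child1'
root.children.append(child)

objectstore.commit()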
A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:
# define a self-referential table with several relations
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children' : relation(
        TreeNode,
        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
        private=True
    ),
    'root' : relation(
        TreeNode,
        primaryjoin=trees.c.root_node_id==trees.c.node_id,
        foreignkey=trees.c.node_id,
        uselist=False
    )
})
The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter foreignkey, pointing to the "many" side of a relationship, is needed to indicate a "many-to-one" self-referring relationship.
Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.
Take any result set and feed it into a mapper to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on mapper takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances.
class User(object):
    pass

User.mapper = mapper(User, users)

# select users
c = users.select().execute()

# get objects
userlist = User.mapper.instances(c)
# define a second class/mapper
class Address(object):
    pass

Address.mapper = mapper(Address, addresses)

# select users and addresses in one query
s = select([users, addresses], users.c.user_id==addresses.c.user_id)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = User.mapper.instances(s.execute(), Address.mapper)

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]
Mappers can have functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs.
class MapperExtension(object):
    def create_instance(self, mapper, row, imap, class_):
        """called when a new object instance is about to be created from a row.
        the method can choose to create the instance itself, or it can return
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        class_ - the class we are mapping.
        """
    def append_result(self, mapper, row, imap, result, instance, isnew, populate_existing=False):
        """called when an object instance is being appended to a result list.
        If it returns True, it is assumed that this method handled the appending itself.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        result - an instance of util.HistoryArraySet(), which may be an attribute
        on an object if this is a related object load (lazy or eager).  use
        result.append_nohistory(value) to append objects to this list.

        instance - the object instance to be appended to the result

        isnew - indicates if this is the first time we have seen this object
        instance in the current result set.  if you are selecting from a join,
        such as an eager load, you might see the same object instance many times
        in the same result set.

        populate_existing - usually False, indicates if object instances that
        were already in the main identity map, i.e. were loaded by a previous
        select(), get their attributes overwritten
        """
    def before_insert(self, mapper, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that arent
        handled otherwise."""
    def after_insert(self, mapper, instance):
        """called after an object instance has been INSERTed"""
    def before_delete(self, mapper, instance):
        """called before an object instance is DELETEed"""
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
m = mapper(User, users, extension=MyExtension())
An existing mapper can create a copy of itself using an extension via the extension option:
extended_mapper = m.options(extension(MyExtension()))
This section is a quick summary of what's going on when you send a class to the mapper() function. This material, not required to be able to use SQLAlchemy, is a little more dense and should be approached patiently!
The primary changes to a class that is mapped involve attaching property objects to it which represent table columns. These property objects essentially track changes. In addition, the __init__ method of the object is decorated to track object creation.
Here is a quick rundown of all the changes in code form:
# step 1 - override __init__ to 'register_new' with the Unit of Work
oldinit = myclass.__init__
def init(self, *args, **kwargs):
    nohist = kwargs.pop('_mapper_nohistory', False)
    oldinit(self, *args, **kwargs)
    if not nohist:
        # register_new with Unit Of Work
        objectstore.uow().register_new(self)
myclass.__init__ = init

# step 2 - set a string identifier that will
# locate the classes' primary mapper
myclass._mapper = mapper.hashkey

# step 3 - add column accessor
myclass.c = mapper.columns

# step 4 - attribute decorating.
# this happens mostly within the package sqlalchemy.attributes

# this dictionary will store a series of callables
# that generate "history" containers for
# individual object attributes
myclass._class_managed_attributes = {}

# create individual properties for each column -
# these objects know how to talk
# to the attribute package to create appropriate behavior.
# the next example examines the attributes package more closely.
myclass.column1 = SmartProperty().property('column1', uselist=False)
myclass.column2 = SmartProperty().property('column2', uselist=True)
The attribute package is used when save operations occur to get a handle on modified values. In the example below, a full round-trip attribute tracking operation is illustrated:
import sqlalchemy.attributes as attributes

# create an attribute manager.
# the sqlalchemy.mapping package keeps one of these around as
# 'objectstore.global_attributes'
manager = attributes.AttributeManager()

# regular old new-style class
class MyClass(object):
    pass

# register a scalar and a list attribute
manager.register_attribute(MyClass, 'column1', uselist=False)
manager.register_attribute(MyClass, 'column2', uselist=True)

# create/modify an object
obj = MyClass()
obj.column1 = 'this is a new value'
obj.column2.append('value 1')
obj.column2.append('value 2')

# get history objects
col1_history = manager.get_history(obj, 'column1')
col2_history = manager.get_history(obj, 'column2')

# whats new ?
>>> col1_history.added_items()
['this is a new value']
>>> col2_history.added_items()
['value 1', 'value 2']

# commit changes
manager.commit(obj)

# the new values become the "unchanged" values
>>> col1_history.added_items()
[]
>>> col1_history.unchanged_items()
['this is a new value']
>>> col2_history.added_items()
[]
>>> col2_history.unchanged_items()
['value 1', 'value 2']
The above AttributeManager also includes a method value_changed which is triggered whenever change events occur on the managed object attributes. The Unit of Work (objectstore) package overrides this method in order to receive change events; it's essentially this:
import sqlalchemy.attributes as attributes

class UOWAttributeManager(attributes.AttributeManager):
    def value_changed(self, obj, key, value):
        if hasattr(obj, '_instance_key'):
            uow().register_dirty(obj)
        else:
            uow().register_new(obj)

global_attributes = UOWAttributeManager()
Objects that contain the attribute "_instance_key" are already registered with the Identity Map, and are assumed to have come from the database. They therefore get marked as "dirty" when changes happen. Objects without an "_instance_key" are not from the database, and get marked as "new" when changes happen, although usually this will already have occurred via the object's __init__ method.
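A small illustration of that rule, reusing the User class from the earlier examples:

u = User('ed', 'edspassword')   # no _instance_key yet; changes register it as "new"
objectstore.commit()            # now persisted, and the object carries an _instance_key

u.user_name = 'edward'          # modifying a persisted object marks it as "dirty"
objectstore.commit()            # issues an UPDATE for the changed column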
The package sqlalchemy.types defines the datatype identifiers which may be used when defining table metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.
SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes. They are specified in table metadata using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.
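For example, either of these forms may be used when defining a column (a brief sketch; the column names are arbitrary):

# the class itself - no parameters
Column('user_id', Integer, primary_key=True)

# an instance of the class, with a length parameter
Column('user_name', String(40))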
The standard set of generic types are:
class String(TypeEngine):
    def __init__(self, length=None)

class Integer(TypeEngine)

class Numeric(TypeEngine):
    def __init__(self, precision=10, length=2)

class Float(TypeEngine):
    def __init__(self, precision=10)

class DateTime(TypeEngine)

class Binary(TypeEngine):
    def __init__(self, length=None)

class Boolean(TypeEngine)
More specific subclasses of these types are available, to allow finer grained control over types:
class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)
When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine.
Types also support pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. They are specified by subclassing the desired type class as well as the special mixin TypeDecorator, which manages the adaptation of the underlying type to a database-specific type:
import sqlalchemy.types as types

class MyType(types.TypeDecorator, types.String):
    """basic type that decorates String, prefixes values with "PREFIX:" on
    the way in and strips it off on the way out."""
    def convert_bind_param(self, value, engine):
        return "PREFIX:" + value
    def convert_result_value(self, value, engine):
        return value[7:]
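The decorated type can then be used in table metadata like any other type. A minimal sketch, assuming hypothetical table and column names and that MyType accepts the same length argument as the String type it decorates:

# values written to 'data' get "PREFIX:" added; values read back have it stripped
stuff = Table('stuff', db,
    Column('stuff_id', Integer, primary_key=True),
    Column('data', MyType(100))
)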
Here is another example, which illustrates a fully defined datatype. This one just subclasses the base type class TypeEngine:
import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision = 8):
        self.precision = precision
    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision
    def convert_bind_param(self, value, engine):
        return value
    def convert_result_value(self, value, engine):
        return value
    def adapt(self, typeobj):
        """produces an adaptation of this object given a type which is
        a subclass of this object"""
        return typeobj(self.precision)
    def adapt_args(self):
        """allows for the adaptation of this TypeEngine object into a new
        kind of type depending on its arguments."""
        return self
The schema module provides the building blocks for database metadata. This means all the entities within a SQL database that we might want to look at, modify, or create and delete are described by these objects, in a database-agnostic way.
A structure of SchemaItems also provides a "visitor" interface which is the primary method by which other methods operate upon the schema. The SQL package extends this structure with its own clause-specific objects as well as the visitor interface, so that the schema package "plugs in" to the SQL package.
Column - represents a column in a database table.
ColumnDefault - a plain default value on a column. This could correspond to a constant, a callable function, or a SQL clause.
ForeignKey - defines a ForeignKey constraint between two columns. ForeignKey is specified as an argument to a Column object.
PassiveDefault - a default that takes effect on the database side.
SchemaEngine - a factory object used to create implementations for schema objects. This object is the ultimate base class for the engine.SQLEngine class.
SchemaVisitor - base class for an object that traverses across Schema structures.
Sequence - represents a sequence, which applies to Oracle and Postgres databases.
Table - represents a relational database table. Be sure to look at sqlalchemy.sql.TableImpl for additional methods defined on a Table.
Defines the SQLEngine class, which serves as the primary "database" object used throughout the sql construction and object-relational mapper packages. A SQLEngine is a facade around a single connection pool corresponding to a particular set of connection parameters, and provides thread-local transactional methods and statement execution methods for Connection objects. It also provides a facade around a Cursor object to allow richer column selection for result rows as well as type conversion operations, known as a ResultProxy.
A SQLEngine is provided to an application as a subclass that is specific to a particular type of DBAPI, and is the central switching point for abstracting different kinds of database behavior into a consistent set of behaviors. It provides a variety of factory methods to produce everything specific to a certain kind of database, including a Compiler, schema creation/dropping objects, and TableImpl and ColumnImpl objects to augment the behavior of table metadata objects.
The term "database-specific" will be used to describe any object or function that has behavior corresponding to a particular vendor, such as mysql-specific, sqlite-specific, etc.
SQLEngine - the central "database" object used by an application. Subclasses of this object are used by the schema and SQL construction packages to provide database-specific behaviors, as well as an execution and thread-local transaction context. SQLEngines are constructed via the create_engine() function inside this package.
ResultProxy - wraps a DBAPI cursor object to provide access to row columns based on integer position, case-insensitive column name, or by schema.Column object, e.g.:

row = fetchone()

col1 = row[0]                    # access via integer position
col2 = row['col2']               # access via name
col3 = row[mytable.c.mycol]      # access via Column object

ResultProxy also contains a map of TypeEngine objects and will invoke the appropriate convert_result_value() method before returning columns.
Defines the base components of SQL expression trees.

Compiled - represents a compiled SQL expression. The __str__ method of the Compiled object should produce the actual text of the statement. Compiled objects are specific to the database library that created them, and also may or may not be specific to the columns referenced within a particular set of bind parameters. In no case should the Compiled object be dependent on the actual values of those bind parameters, even though it may reference those values as defaults.
ClauseElement - base class for elements of a programmatically constructed SQL expression.
TableImpl - attached to a schema.Table to provide it with a Selectable interface as well as other functions.
Provides a connection pool implementation, which optionally manages connections on a thread-local basis. Also provides a DBAPI2 transparency layer so that pools can be managed automatically, based on module type and connect arguments, simply by calling regular DBAPI connect() methods.

DBProxy - proxies a DBAPI2 connect() call to a pooled connection keyed to the specific connect parameters.
QueuePool - uses Queue.Queue to maintain a fixed-size list of connections.
SingletonThreadPool - maintains one connection per thread, never moving a connection to another thread. This is used for SQLite and other databases with a similar restriction.
The mapper package provides object-relational functionality, building upon the schema and sql packages and tying operations to class properties and constructors.

objectstore - maintains all currently loaded objects in memory, using the "identity map" pattern. Also provides a "unit of work" object which tracks changes to objects so that they may be properly persisted within a transactional scope.
Session - maintains a UnitOfWork instance, including transaction state.