Deployers: Configuring Storage

Storage Managers insulate an application developer from the specifics of databases, query languages, and cache mechanisms. As the deployer of a Dejavu application, you get to be in control of these specifics. But don't worry; in the vast majority of cases, you will set up a single database with just two lines in a configuration file. Often, the application developer will have already prepared default config files which you can simply "plug and play". But if you need more control over your data storage, you have it, without becoming a programmer.

Configuration Files

When you deploy an app built with Dejavu, you must specify Storage Managers to use for persisting application objects. This is usually done through an ini-style configuration file. Here's a short example:

[Junct]
Class: access
Connect: "PROVIDER=MICROSOFT.JET.OLEDB.4.0;DATA SOURCE=D:\data\junct.mdb;"
The first line of our example ("[Junct]") names the Storage Manager; each [section] in your conf file defines a different SM. You can use whatever name you like here; in this example, we used the name of the application. The second line tells Dejavu the class of SM we'd like to use. For most applications, you'll decide which class to use based on the database you want to use. Our example declares that we want to persist our application data in an "MS Access" (i.e., Jet) database. The third line in our example is a standard ADO Connect string. The MS Access class requires this entry; other SM's may not.

Common Configuration Entries

There are a few configuration entries which (probably) apply to all Storage Managers:

KeyExample ValueDescription
Class cache, or dejavu.storage.CachingProxy Which backend to use when instantiating this StorageManager. You may supply a known short name or the full dotted-package name.
Load Order 5 Optional. The order in which to load this SM. Lower numbers are loaded first. SM's without a Load Order default to 0.
Shutdown Order 10 Optional. The order in which to shut down this SM. Lower numbers are shut down first. SM's without a Shutdown Order default to 0.
Units [UnitCollection, UnitEngine, UnitEngineRule, FieldDashboardSumSet] Optional. Declares which Unit classes to manage with this SM (see below).

The "Units" entry is what you will use to separate application objects into separate stores (if you need to). The objects in an application which need to be stored are called "Units", and each Unit is of a certain Unit class. If you specify a "Units" entry, then only Units of those classes will be managed by that Storage Manager. If you do not specify such an entry, then all Units will be handled by that Storage Manager. This means that only one SM should be missing this entry.

Database Storage Managers

Microsoft SQL Server / Microsoft Access (Jet)

This module was developed against ADO 2.7 and 2.8, using MSDE, SQL Server 2000, and Access 2000. Configuration entries:

PostgreSQL (pyPgSQL)

This class was developed against PostgreSQL 8.0.0 rc-1 on Win2k, and also tested on PostgreSQL 7.6.6-6 on Debian "sarge". Configuration entries:

PostgreSQL (psycopg2)

This class was developed against PostgreSQL 8.0.0 rc-1 on Win2k, using psycopg2 version '2.0.5.1 (dec dt ext pq3)'. Configuration entries:

MySQL (MySQLdb)

This class was developed against mysql Ver 14.7 Distrib 4.1.8, for Win95/Win98 (i32), and also tested on mysql Ver 12.22 Distrib 4.0.23, for pc-linux-gnu (i386). Configuration entries:

SQLite (pysqlite/sqlite3)

This class was developed against sqlite 3.0.8 (pysqlite-1.1.6.win32-py2.3), sqlite 3.3.3 (pysqlite-1.1.7.win32-py2.4), sqlite 2.8.15-3 on Debian "sarge", and sqlite 3.3.4 (python 2.5 on win2k). If you have Python 2.5 or later, the builtin _sqlite3 library will be used; otherwise, you need to install pysqlite 1.x. Configuration entries:

Firebird (kinterbasdb)

This class was developed against KInterbasDB Version: (3, 2, 0, 'alpha', 1) and Server Version: 'WI-V1.5.2.4731 Firebird 1.5' on Win2k. Configuration entries:

The Firebird Storage Manager is new and not yet fully thread-safe. Patches welcome.

Common Database Configuration Entries

In addition to the above, Storage Managers for databases (probably) accept these additional options:

KeyExample ValueDescription
poolsize 10 Optional. Defaults to 10. If nonzero, connections will be pooled (up to a total equal to Pool Size). If zero, no pool will be used; each statement (!) will use a new connection.
Prefix myapp_ Optional. If specified, all tables in the database will have names starting with this prefix. If not provided, it defaults to "" (empty). This helps if you need to mix Dejavu tables with tables from another application. Leave blank if you want no prefix.
Type Adapter myapp.storage.FieldTypeAdapterForMyDB Optional. The "Type Adapter" is used to map Python types to database column types for use in CREATE TABLE statements; for example, the Python float type might be mapped to a REAL column type. If you don't like the default column types which your Storage Manager provides, you can write your own adapter and declare its use here. The value should be the full dotted package name of the class you wish to use.
To Adapter myapp.storage.AdapterToMyDBSQL Optional. The "To Adapter" is used to map Python values to database values for use in SQL statements; for example, the Python str type usually needs to be wrapped in quote marks. If you don't like the SQL which your Storage Manager generates, you can write your own adapter and declare its use here. The value should be the full dotted package name of the class you wish to use.
From Adapter myapp.storage.AdapterFromMyDB Optional. The "From Adapter" is used to map incoming database values (i.e., the results of a SELECT query) to Python values; for example, your database may return a date value as a string, which must then be converted to the Python datetime.date type. If you don't like the default coercions which your Storage Manager provides, you can write your own adapter and declare its use here. The value should be the full dotted package name of the class you wish to use.
default_isolation "READ COMMITTED" Optional. All database SM's already have a value for this, but you can select another if you wish. This value should be a "native value" for your database's particular transaction mechanisms. For example, PostgreSQL uses ANSI/SQL names like "READ COMMITTED", but Firebird uses library constants like kinterbasdb.isc_tpb_read_committed.

Other Storage Managers

RAM

Persists Units in RAM; all Units are lost when the process exits.

Shelve

Persists Units to shelve-type files. Extremely simple implementation; everything is pickled. Querying will be slow--every Unit is sucked in one-by-one and tested in pure Python using Expression(unit). But for many applications, you don't need heavyweight query tools; for example, an online forum may only need topic content looked up by ID. Or small system tables that only get read at startup might benefit.

Developers note: The shelve implementation in Dejavu does not use "writeback"; that is, changes you make to data are stored only in memory until each shelf has its close method called. If close is never called, your changes are lost! The easiest way to ensure that your changes are saved is to call arena.shutdown() when your app is closing. Since one of the design goals of Dejavu is to allow deployers to choose which backend to use, your applications should always guarantee that arena.shutdown() is called on program exit.

Configuration entries:

Folders

Persists Units to a filesystem, one folder per class. Each folder contains subfolders, one per Unit, with the Unit identity as the folder name. Each of those unit folders contains one file for each Unit Property. For example:

root/
    Album/
    |   78952/
    |       Name.txt
    |       Artist.txt
    Song/
        1372/
        |   AlbumID.txt
        |   Data.mp3
        88/
            AlbumID.txt
            Data.mp3

This is an extremely simple implementation; every value that is not of type str is pickled. Querying will be slow--every Unit is sucked in one-by-one and tested in pure Python. But for many applications, you don't need heavyweight query tools; for example, an upload site may only need files looked up by ID.

Configuration entries:

Middleware

Some Storage Managers act as "middleware", and can be chained together to provide layered functionality. Consider, for example, the CachingProxy class; it has another Storage Manager "behind it", which it proxies. It can be used to cache objects between client connections independently from the underlying, database-specific Storage Manager. The beauty of this design is that the decision to use a CachingProxy is completely up to the deployer, not the application developer. The deployer can separate stores, test response times, and address other integration concerns on their own systems.

Caching Proxy

Use this class to persist Units in memory between client connections. It must proxy another Storage Manager. Configuration entries:

Burned Proxy

Use this class to persist Units in memory between client connections. It needs another Storage Manager to proxy. Unlike the Caching Proxy above, this Storage Manager recalls all Units at once upon the first request, and won't recall them again from storage. They are "burned" into memory for the lifetime of the application. Configuration entries:

SM Comparison Chart

When selecting a storage implementation, you should be aware of the strengths and limitations of each option. The following chart should help you decide.

First, it shows you which stores do and do not support certain optional features of Dejavu. Your application developer should provide you with a list of any features which they require.

Second, it shows you which stores have performance or boundary issues and where. When developing applications, you should avoid these issues either by coding alternative solutions, or by recommending to your deployers that they avoid the problematic stores. Note that some limitations are inherent in the storage mechanism itself, while some are limitations of the current Storage Manager for that mechanism.

access firebird mysql postgres ram shelve sqlite sqlserver folders
Connection Pool [5] N (single only) P P P N N P P N
Transactions Y Y Y Y N N Y Y N
Indexes Y Y Y Y N N Y Y N
Max identifier length 64 31 64 63 P P no limit? 128 OS-dependent
Case-sensitive identifiers Y Y Unix only Y Y Y Y Y Y [3]
Case-sensitive LIKE ("a in b") P Y Y Y P P Y P P
Case-sensitive string comparison ("a" > "A") <, <=, ==, !=, >, >= Y Y Y P P Y <, <=, ==, !=, >, >= P
Wildcard literals in LIKE ("a in b") Y Y Y Y P P 3.0.8+ Y P
Autoincrement Y Y Y Y P P 3.1.0+ Y P
add/drop/rename property Y Y Y Y Y Y P [2]
(add: 3.2.0+)
Y Y
access firebird mysql postgres ram shelve sqlite sqlserver folders
fixed point/decimal precision (in decimal digits) 12 18 16 1000 P (pickle) P (pickle) 0 (always uses TEXT instead) 12 P (pickle)
Max str/unicode bytes 1 GB [6] 32765 (255 for an index) 8000 (row limit) 1 GB? P (pickle) P (pickle) 1 MB (row limit) 8000 [4] P (pickle)
datetime ranges 0100-01-01 to 9999-12-31 1753-01-01 to 9999-12-31 1000-01-01 00:00:00 to 9999-12-31 23:59:59 4713 BC to 5874897 AD P P 4714-11-24 BC to ??? 1753-01-01 00:00:00.0 to 9999-12-31 23:59:59.997 P
datetime precision 1 second 1 second 1 second 1 microsecond P P 1 second 1 second P
dejavu.year, month, day functions Y P Y Y P P 3.2.3+ [1] Y P
dejavu.now, today functions Y now Y Y P P 3.2.3+ [1] Y P
startswith, endswith, containedby, dejavu.icontainedby, dejavu.icontains, dejavu.istartswith, dejavu.iendswith Y Y Y Y P P Y Y P
builtin function: len Y P Y Y P P Y Y P
access firebird mysql postgres ram shelve sqlite sqlserver folders
READ UNCOMMITTED Y N Y N [7] N Y
READ COMMITTED N Y Y Y N Y (timeout)
REPEATABLE READ N N [7] N [7] N [7] N Y (timeout)
SERIALIZABLE N Y Y (timeout) Y Y [8] Y (timeout)
Change isolation inside transaction N N Y Y N Y

[1] In order to use native date functions in SQLite, you must be storing your date and time values in one of the acceptable formats. See the SQLite wiki for more information. Once you have verified that you are using such a format, you must then set AdapterToSQLite.using_perfect_dates to True. This can be done with the configuration entry: Perfect Dates: True.

[2] SQLite must copy the entire table to an intermediate table and then to a new, final table in order to alter tables. Beginning in 3.2.0, adding columns may now be performed natively (but not renaming or dropping them).

[3] The Folders store keeps identifer values and property names in folder and file names. Not all filesystems support case-sensitive file/folder names.

[4] Microsoft SQL Server does not allow comparisons on string fields larger than 8000 characters.

[5] Dejavu provides connection pool factories in pure Python, and does not yet make any attempt to use native pooling features.

[6] Microsoft Access "MEMO" fields have a 1 GB limit, but so does the entire database. Memo fields also cannot be used as join keys; set hints['bytes'] = 255 or less to use VARCHAR instead.

[7] Some databases over-protect at various isolation levels. For example, "REPEATABLE READ" should prevent fuzzy reads but allow phantoms, but MySQL's and Firebird's REPEATABLE READ prevent both. PostgreSQL only uses two isolation levels internally, so that selecting "READ UNCOMMITTED" behaves like "READ COMMITTED" and "REPEATABLE READ" behaves like "SERIALIZABLE".

[8] SQLite :memory: databases cannot use multiple connections, so a single connection is used for all threads. However, this means that transactions are generally not allowed for :memory: databases when using multiple threads (because multiple transactions would overlap on the same connection and not be isolated at all!).