Usage Guide

Overview

At the time of this writing, popular key/value servers include Memcached, Redis, and Riak. While these tools each have a different focus, they share a storage model built around retrieving a value by its key; as such, all are potentially suitable for caching, particularly Memcached, which is designed first and foremost for caching.

With a caching system in mind, dogpile.cache provides an interface to a particular Python API targeted at that system.

A dogpile.cache configuration consists of the following components:

  • A region, which is an instance of CacheRegion, and defines the configuration details for a particular cache backend. The CacheRegion can be considered the “front end” used by applications.
  • A backend, which is an instance of CacheBackend, describing how values are stored and retrieved from a backend. This interface specifies only get(), set() and delete(). The actual kind of CacheBackend in use for a particular CacheRegion is determined by the underlying Python API being used to talk to the cache, such as Pylibmc. The CacheBackend is instantiated behind the scenes and not directly accessed by applications under normal circumstances.
  • Value generation functions. These are user-defined functions that generate new values to be placed in the cache. While dogpile.cache offers the usual “set” approach of placing data into the cache, the usual mode of usage is to only instruct it to “get” a value, passing it a creation function which will be used to generate a new value if and only if one is needed. This “get-or-create” pattern is the entire key to the “Dogpile” system, which coordinates a single value creation operation among many concurrent get operations for a particular key, eliminating the issue of an expired value being redundantly re-generated by many workers simultaneously.
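The coordination described in the last item can be illustrated with a minimal, self-contained sketch. It is a toy model built on a plain dictionary and a single threading.Lock, not dogpile.cache's actual implementation, and the names toy_get_or_create and expensive_creator are hypothetical:

```python
import threading

_cache = {}
_lock = threading.Lock()
creator_calls = []  # records how many times the creation function ran


def toy_get_or_create(key, creator):
    """Return the cached value for key, running creator() at most once."""
    if key in _cache:
        return _cache[key]
    with _lock:
        # Re-check inside the lock: another thread may have created the
        # value while this one was waiting.
        if key not in _cache:
            _cache[key] = creator()
        return _cache[key]


def expensive_creator():
    creator_calls.append(1)
    return "value"


# Eight concurrent "get" operations for the same key.
threads = [
    threading.Thread(target=toy_get_or_create, args=("user:1", expensive_creator))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads ask for the key at once, the creation function in this sketch runs a single time and the remaining callers reuse its result.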

Rudimentary Usage

dogpile.cache includes a Pylibmc backend. A basic configuration looks like:

from dogpile.cache import make_region

region = make_region().configure(
    'dogpile.cache.pylibmc',
    expiration_time = 3600,
    arguments = {
        'url':["127.0.0.1"],
    }
)

@region.cache_on_arguments()
def load_user_info(user_id):
    return some_database.lookup_user_by_id(user_id)

Above, we create a CacheRegion using the make_region() function, then apply the backend configuration via the CacheRegion.configure() method, which returns the region. The name of the backend is the only argument required by CacheRegion.configure() itself, in this case dogpile.cache.pylibmc. However, in this specific case, the pylibmc backend also requires that the URL of the memcached server be passed within the arguments dictionary.

The configuration is separated into two steps. Upon construction via make_region(), the CacheRegion object is available, typically at module import time, for use in decorating functions. Additional configuration details passed to CacheRegion.configure() are typically loaded from a configuration file and are therefore not necessarily available until runtime, hence the two-step configuration process.

Key arguments passed to CacheRegion.configure() include expiration_time, which is the expiration time passed to the Dogpile lock, and arguments, which are arguments used directly by the backend - in this case we are using arguments that are passed directly to the pylibmc module.
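To show why the two-step approach works, here is a toy stand-in for a region. ToyRegion is hypothetical and far simpler than CacheRegion; it assumes a plain dictionary as the "backend". The point is only that the decorator can be applied before the backend is known, because the backend is looked up when the decorated function is called:

```python
class ToyRegion(object):
    """Hypothetical stand-in for a region; not dogpile.cache's CacheRegion."""

    def __init__(self):
        self.backend = None  # unknown until configure() is called

    def configure(self, backend):
        self.backend = backend
        return self  # returning self allows ToyRegion().configure(...) chaining

    def cache_on_arguments(self):
        def decorator(fn):
            def wrapper(*args):
                # The backend is consulted at call time, not decoration time.
                if self.backend is None:
                    raise RuntimeError("region is not configured yet")
                if args not in self.backend:
                    self.backend[args] = fn(*args)
                return self.backend[args]
            return wrapper
        return decorator


region = ToyRegion()  # step 1: available at import time


@region.cache_on_arguments()
def square(x):
    return x * x


region.configure({})  # step 2: done later, once settings are known
result = square(4)
```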

Region Configuration

The make_region() function currently calls the CacheRegion constructor directly.

class dogpile.cache.region.CacheRegion(name=None, function_key_generator=function_key_generator, key_mangler=None)

A front end to a particular cache backend.

Parameters:
  • name – Optional, a string name for the region. This isn’t used internally but can be accessed via the .name attribute, helpful for configuring a region from a config file.
  • function_key_generator

    Optional. A function that will produce a “cache key” given a data creation function and arguments, when using the CacheRegion.cache_on_arguments() method. The structure of this function should be two levels: given the data creation function, return a new function that generates the key based on the given arguments. Such as:

    def my_key_generator(namespace, fn):
        fname = fn.__name__
        def generate_key(*arg):
            # join every part with "_" so the function name and the
            # first argument cannot run together
            return "_".join([namespace, fname] + [str(s) for s in arg])
        return generate_key
    
    
    region = make_region(
        function_key_generator = my_key_generator
    ).configure(
        "dogpile.cache.dbm",
        expiration_time=300,
        arguments={
            "filename":"file.dbm"
        }
    )
    

    The namespace is the value passed to CacheRegion.cache_on_arguments(). It is not consulted outside this function, so it can in fact take any form. For example, it can be passed as a tuple, used to specify arguments to pluck from **kw:

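    def my_key_generator(namespace, fn):
        def generate_key(*arg, **kw):
            # keyword values named in the namespace tuple come first,
            # followed by the positional arguments
            return ":".join(
                    [str(kw[k]) for k in namespace] +
                    [str(x) for x in arg]
                )
        return generate_key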
    

    Where the decorator might be used as:

    @my_region.cache_on_arguments(namespace=('x', 'y'))
    def my_function(a, b, **kw):
        return my_data()
    
  • key_mangler – Function which will be used on all incoming keys before passing to the backend. Defaults to None, in which case the key mangling function recommended by the cache backend will be used. A typical mangler is the SHA1 mangler found at sha1_mangle_key() which coerces keys into a SHA1 hash, so that the string length is fixed. To disable all key mangling, set to False.
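As a rough illustration of what a SHA1-style key mangler does, the sketch below hashes a key into a fixed-length hexadecimal string using the standard library. The function name sha1_mangle_sketch is hypothetical, and encoding the key as UTF-8 is an assumption of this sketch; it is not the library's own sha1_mangle_key():

```python
import hashlib


def sha1_mangle_sketch(key):
    """Return the 40-character hexadecimal SHA1 digest of a string key."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


short_key = sha1_mangle_sketch("user:5")
long_key = sha1_mangle_sketch("user:" + "x" * 1000)
```

Both results have the same length regardless of the input, which is what makes this kind of mangling useful for backends that restrict key length.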

Once you have a CacheRegion, the CacheRegion.cache_on_arguments() method can be used to decorate functions, but the cache itself can’t be used until CacheRegion.configure() is called. The interface for that method is as follows:

CacheRegion.configure(backend, expiration_time=None, arguments=None, _config_argument_dict=None, _config_prefix=None)

Configure a CacheRegion.

The CacheRegion itself is returned.

Parameters:
  • backend – Required. This is the name of the CacheBackend to use, and is resolved by loading the class from the dogpile.cache entrypoint.
  • expiration_time – Optional. The expiration time passed to the dogpile system. The CacheRegion.get_or_create() method as well as the CacheRegion.cache_on_arguments() decorator (though note: not the CacheRegion.get() method) will call upon the value creation function after this time period has passed since the last generation.
  • arguments – Optional. The structure here is passed directly to the constructor of the CacheBackend in use, though is typically a dictionary.
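The effect of expiration_time can be pictured as a comparison between a value's creation time and the current time. The helper below is a hypothetical sketch of that comparison, not the dogpile locking logic itself; treating both None and -1 as "never expires" is an assumption made for this example:

```python
import time


def is_expired(created_at, expiration_time, now=None):
    """Return True if a value created at created_at should be regenerated."""
    if expiration_time is None or expiration_time == -1:
        return False  # assumption: no expiration configured
    if now is None:
        now = time.time()
    return now - created_at > expiration_time


fresh = is_expired(created_at=1000.0, expiration_time=3600, now=1500.0)
stale = is_expired(created_at=1000.0, expiration_time=3600, now=5000.0)
never = is_expired(created_at=1000.0, expiration_time=-1, now=10 ** 9)
```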

The CacheRegion can also be configured from a dictionary, using the CacheRegion.configure_from_config() method:

CacheRegion.configure_from_config(config_dict, prefix)

Configure from a configuration dictionary and a prefix.

Example:

local_region = make_region()
memcached_region = make_region()

# regions are ready to use for function
# decorators, but not yet for actual caching

# later, when config is available
myconfig = {
    "cache.local.backend":"dogpile.cache.dbm",
    "cache.local.arguments.filename":"/path/to/dbmfile.dbm",
    "cache.memcached.backend":"dogpile.cache.pylibmc",
    "cache.memcached.arguments.url":"127.0.0.1, 10.0.0.1",
}
local_region.configure_from_config(myconfig, "cache.local.")
memcached_region.configure_from_config(myconfig, 
                                    "cache.memcached.")
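One way to picture what the prefix does is a small helper that selects the keys under a prefix and strips it. The function extract_prefixed is hypothetical and only sketches the idea; how configure_from_config() processes the dictionary internally is not shown in this guide:

```python
def extract_prefixed(config_dict, prefix):
    """Return the entries of config_dict under prefix, with the prefix removed."""
    return dict(
        (key[len(prefix):], value)
        for key, value in config_dict.items()
        if key.startswith(prefix)
    )


myconfig = {
    "cache.local.backend": "dogpile.cache.dbm",
    "cache.local.arguments.filename": "/path/to/dbmfile.dbm",
    "cache.memcached.backend": "dogpile.cache.pylibmc",
    "cache.memcached.arguments.url": "127.0.0.1, 10.0.0.1",
}

local_settings = extract_prefixed(myconfig, "cache.local.")
```

Here local_settings holds only the two entries for the local region, keyed as "backend" and "arguments.filename".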

Using a Region

The CacheRegion object is our front-end interface to a cache. It includes the following methods:

CacheRegion.get(key)

Return a value from the cache, based on the given key.

While the key is typically a string, it is passed through to the underlying backend and so can be of any type the backend recognizes. If the value is not present, the method returns the token NO_VALUE. NO_VALUE evaluates to False, but is distinct from None so that a missing value can be told apart from a cached value of None. Note that the expiration_time argument is not used here - this method is a direct line to the backend’s behavior.
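The reason for a dedicated token can be seen in a small sketch. NoValueSketch and toy_get are hypothetical names used only for illustration; they model the behavior described above rather than reproduce the library's NO_VALUE:

```python
class NoValueSketch(object):
    """A falsy sentinel that is distinct from None."""

    def __bool__(self):
        return False

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


NO_VALUE_SKETCH = NoValueSketch()

store = {"a": None}  # None is a legitimately cached value here


def toy_get(key):
    return store.get(key, NO_VALUE_SKETCH)


cached_none = toy_get("a")  # the key exists and holds None
missing = toy_get("b")      # the key does not exist
```

Because both results are falsy, a caller that needs to distinguish them should compare by identity (for example, `missing is NO_VALUE_SKETCH`) rather than rely on truthiness.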

CacheRegion.get_or_create(key, creator, expiration_time=None)

Similar to get(), but uses the given “creation” function to create a new value if the value does not exist.

This uses the underlying dogpile expiration mechanism to determine when and how the creation function is called.

Parameters:
  • key – Key to retrieve
  • creator – function which creates a new value.
  • expiration_time – optional expiration time which will override the expiration time already configured on this CacheRegion if not None. To set no expiration, use the value -1.

CacheRegion.set(key, value)

Place a new value in the cache under the given key.

CacheRegion.delete(key)

Remove a value from the cache.

This operation is idempotent: it can be called multiple times, or on a non-existent key, safely.

CacheRegion.cache_on_arguments(namespace=None, expiration_time=None)

A function decorator that will cache the return value of the function using a key derived from the function itself and its arguments.

E.g.:

@someregion.cache_on_arguments()
def generate_something(x, y):
    return somedatabase.query(x, y)

The decorated function can then be called normally, where data will be pulled from the cache region unless a new value is needed:

result = generate_something(5, 6)

The function is also given an attribute invalidate, which provides for invalidation of the value. Pass to invalidate() the same arguments you’d pass to the function itself to represent a particular value:

generate_something.invalidate(5, 6)

The default key generation will use the name of the function, the module name for the function, the arguments passed, as well as an optional “namespace” parameter in order to generate a cache key.

Given a function one inside the module myapp.tools:

@region.cache_on_arguments(namespace="foo")
def one(a, b):
    return a + b

Above, calling one(3, 4) will produce a cache key as follows:

myapp.tools:one|foo|3, 4

The key generator will ignore an initial argument of self or cls, making the decorator suitable (with caveats) for use with instance or class methods. Given the example:

class MyClass(object):
    @region.cache_on_arguments(namespace="foo")
    def one(self, a, b):
        return a + b

Calling MyClass().one(3, 4) above will again produce the cache key myapp.tools:one|foo|3, 4 - the name self is skipped.
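The key format shown in these examples can be reproduced with a short sketch. sketch_key_generator is a hypothetical function that mimics the format as printed in this guide; it is not the library's function_key_generator(), and the key layout when no namespace is given is an assumption:

```python
import inspect


def sketch_key_generator(namespace, fn):
    """Toy model of the key scheme described above."""
    prefix = "%s:%s" % (fn.__module__, fn.__name__)
    if namespace is not None:
        prefix = "%s|%s" % (prefix, namespace)

    # Decide once, at decoration time, whether the first argument
    # is named self or cls and should be left out of the key.
    arg_names = list(inspect.signature(fn).parameters)
    skip_first = bool(arg_names) and arg_names[0] in ("self", "cls")

    def generate_key(*args):
        if skip_first:
            args = args[1:]
        return prefix + "|" + ", ".join(str(a) for a in args)

    return generate_key


def one(a, b):
    return a + b


class MyClass(object):
    def one(self, a, b):
        return a + b


# Pretend both callables live in the module myapp.tools.
one.__module__ = "myapp.tools"
MyClass.one.__module__ = "myapp.tools"

function_key = sketch_key_generator("foo", one)(3, 4)
method_key = sketch_key_generator("foo", MyClass.one)(MyClass(), 3, 4)
```

Both calls yield the same string, which is why a distinguishing namespace matters when two methods share a name within one module.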

The namespace parameter is optional, and is normally used to disambiguate two functions of the same name within the same module, as can occur when decorating instance or class methods as below:

class MyClass(object):
    @region.cache_on_arguments(namespace='MC')
    def somemethod(self, x, y):
        ""

class MyOtherClass(object):
    @region.cache_on_arguments(namespace='MOC')
    def somemethod(self, x, y):
        ""

Above, the namespace parameter disambiguates between somemethod on MyClass and MyOtherClass. Python class declaration mechanics otherwise prevent the decorator from having awareness of the MyClass and MyOtherClass names, as the function is received by the decorator before it becomes an instance method.

The function key generation can be entirely replaced on a per-region basis using the function_key_generator argument present on make_region() and CacheRegion. It defaults to function_key_generator().

Parameters:
  • namespace – optional string argument which will be established as part of the cache key. This may be needed to disambiguate functions of the same name within the same source file, such as those associated with classes - note that the decorator itself can’t see the parent class on a function as the class is being declared.
  • expiration_time – if not None, will override the normal expiration time.

Creating Backends

Backends are located using the setuptools entrypoint system. To make life easier for writers of ad-hoc backends, a helper function is included which registers any backend in the same way as if it were part of the existing sys.path.

For example, to create a backend called DictionaryBackend, we subclass CacheBackend:

from dogpile.cache import CacheBackend, NO_VALUE

class DictionaryBackend(CacheBackend):
    def __init__(self, arguments):
        self.cache = {}

    def get(self, key):
        return self.cache.get(key, NO_VALUE)

    def set(self, key, value):
        self.cache[key] = value

    def delete(self, key):
        # pop with a default so that deleting a missing key is a no-op
        self.cache.pop(key, None)

Then make sure the class is available underneath the entrypoint dogpile.cache. If we did this in a setup.py file, it would be in setup() as:

entry_points="""
  [dogpile.cache]
  dictionary = mypackage.mybackend:DictionaryBackend
  """

Alternatively, if we want to register the plugin in the same process space without bothering to install anything, we can use register_backend:

from dogpile.cache import register_backend

register_backend("dictionary", "mypackage.mybackend", "DictionaryBackend")

Our new backend would be usable in a region like this:

from dogpile.cache import make_region

region = make_region().configure("dictionary")

region.set("somekey", "somevalue")

The values the backend receives here are instances of CachedValue. This is a tuple subclass of length two, of the form:

(payload, metadata)

Where “payload” is the thing being cached, and “metadata” is information we store in the cache - a dictionary which currently has just the “creation time” and a “version identifier” as key/values. If the cache backend requires serialization, pickle or similar can be used on the tuple - the “metadata” portion will always be a small and easily serializable Python structure.
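For a backend that needs to serialize values, a round trip through pickle might look like the sketch below. A plain tuple stands in for CachedValue, and the metadata key names creation_time and version are hypothetical placeholders, since the actual key names are not given here:

```python
import pickle
import time

payload = {"user_id": 5, "name": "example"}
# Hypothetical key names standing in for the creation time and
# version identifier mentioned above.
metadata = {"creation_time": time.time(), "version": 1}

cached_value = (payload, metadata)

# What a serializing backend could do inside set() and get().
serialized = pickle.dumps(cached_value)
restored_payload, restored_metadata = pickle.loads(serialized)
```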
