pycerberus is a framework to check user data thoroughly so that you can protect your application from malicious (or just garbled) input data.
pycerberus is just a Python library which uses setuptools so it does not require a special setup. It has no dependencies besides the standard Python library. Python 2.4-2.6 is supported.
In any software you must carefully check that untrusted user input matches your expectations. Unvalidated user input is a common source of security flaws. However, many checks are repetitive and validation logic tends to be scattered all over the code. Because basic checks are duplicated, developers forget to check for uncommon edge cases as well. Finally, there is often some code to convert the input data (usually strings) to more convenient Python data types like int or bool.
pycerberus is a framework that tackles these common problems and allows you to write tailored validators to perform additional checks. Furthermore the framework also has built-in support for less common (but important) use cases like internationalization.
The framework itself is heavily inspired by FormEncode by Ian Bicking. Therefore most of FormEncode’s design rationale is directly applicable to pycerberus. However several things about FormEncode annoyed me so much that I decided to write my own library when I needed one for my SMTP server project pymta.
pycerberus separates validation rules (“Validators”) from the objects they validate against. It might be tempting to derive the validation rules from restrictions you specified earlier (e.g. from a class which is mapped by an ORM to a database). However, that approach completely ignores that validation typically depends on context: In an API you typically have a lot more freedom with regard to allowed values than in a public web interface, where input needs to conform to many more checks. A model where you declare the validation explicitly makes this distinction possible. It is also quite easy to write some code that automatically generates a baseline of validation rules from your ORM model and then adds further restrictions depending on the context.
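The idea of a generated baseline plus context-specific restrictions can be sketched in plain Python. This is only an illustration under assumptions: the column description format, the rule dictionaries and both function names are hypothetical and not part of pycerberus, and the code is written to run on current Python versions.

```python
def baseline_rules(columns):
    """Derive minimal rules from (hypothetical) column definitions."""
    rules = {}
    for name, spec in columns.items():
        rule = {'type': spec['type'],
                'required': not spec.get('nullable', False)}
        if 'length' in spec:
            rule['max_length'] = spec['length']
        rules[name] = rule
    return rules


def rules_for_context(columns, extra_restrictions):
    """Start from the baseline and tighten it for one specific context."""
    rules = baseline_rules(columns)
    for name, restrictions in extra_restrictions.items():
        rules[name].update(restrictions)
    return rules


# Assumed stand-in for what an ORM model would tell us about a column.
columns = {'username': {'type': 'unicode', 'length': 50, 'nullable': False}}

# The API only needs the baseline, the public web form is stricter.
api_rules = rules_for_context(columns, {})
web_rules = rules_for_context(
    columns, {'username': {'max_length': 20, 'min_length': 3}})
```

The baseline is computed once from the model, while each context passes only the restrictions it adds, so the rules stay explicit without being duplicated.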
As pycerberus is completely context-agnostic (not being bundled with a specific framework), you can use it in many different places (e.g. web applications with different frameworks, server applications, check parameters in a library, ...).
Further reading: FormEncode’s design rationale - most of the design ideas are also present in pycerberus.
Currently (February 2010, version 0.1) pycerberus is at a very basic stage - though with very solid foundations. The API for single validators is basically complete, i18n support is built in and there is decent documentation covering all important aspects.
The future development will focus on compound validators (Schema) and repeating fields. After that, I’ll try to increase the number of built-in validators for specific domains (e.g. correct email address validation, validating host names, localized numbers). Another interesting topic will be integration into different frameworks like TurboGears and trac.
In pycerberus “Validators” are used to specify validation rules which ensure that the input matches your expectations. Every basic validator validates just a single value (e.g. one specific input field in a web application). When the validation is successful, the validated and converted value is returned. If something is wrong with the data, an exception is raised:
from pycerberus.validators import IntegerValidator
IntegerValidator().process('42') # returns 42 as int
pycerberus puts conversion and validation together in one call because of two main reasons:
Every validation error will trigger an exception, usually an InvalidDataError. This exception will contain a translated error message which can be presented to the user, a key so you can identify the exact error programmatically and the original, unmodified value:
from pycerberus.errors import InvalidDataError
from pycerberus.validators import IntegerValidator
try:
    IntegerValidator().process('foo')
except InvalidDataError, e:
    error_message = e.msg     # u'Please enter a number.'
    original_value = e.value  # 'foo'
    error_key = e.key         # 'invalid_number'
You can configure the behavior of the validator when instantiating it. For example, if you pass required=False to the constructor, most validators will also accept None as a valid value:
IntegerValidator(required=True).process(None) # -> validation error
IntegerValidator(required=False).process(None) # None
Validators support different configuration options which are explained along the validator description.
All validators support an optional context argument (which defaults to an empty dict). It is used to plug validators into your application and make them aware of the overall system state: For example, a validator must know which locale it should use to translate an error message into the correct language without relying on global variables:
context = {'locale': 'de'}
validator = IntegerValidator()
validator.process('foo', context=context) # u'Bitte geben Sie eine Zahl ein.'
The context variable is especially useful when writing custom validators - locale is the only context information that pycerberus cares about.
After all, using only built-in validators won’t help you much: You need custom validation rules which means that you need to write your own validators.
pycerberus comes with two classes that can serve as a good base when you start writing a custom validator: The BaseValidator only provides the absolutely required set of API so you have maximum freedom. The Validator class itself is inherited from the BaseValidator and defines a more sophisticated API and i18n support. Usually you should use the Validator class.
The BaseValidator implements only the minimally required methods. Therefore it does not put many constraints on you. Most users probably want to use the Validator class which already implements some commonly used features.
Return all messages which are defined by this validator as a key/message dictionary. Calling this method might be costly when you have a lot of messages and returning them is expensive.
You must declare all your messages in this function so that all keys are known after this method was called.
This is the method to validate your input. The validator returns a (Python) representation of the given input value.
In case of errors an InvalidDataError is raised.
The Validator is the base class of most validators and implements some commonly used features like required values (raise exception if no value was provided) or default values in case no value is given.
This validator splits conversion and validation into two separate steps: When a value is passed to process(), the validator first calls convert(), which performs some checks on the value and finally returns the converted value. Only if the value was converted correctly can the validate() function do additional checks on the converted value and possibly raise an exception in case of errors. If you only want to do additional checks (but no conversion) in your validator, you can implement validate() and simply assume that you get the correct Python type (e.g. int).
Of course you can also raise a ValidationError inside convert(): often errors can only be detected during the conversion process.
By default, a validator will raise an InvalidDataError if no value was given (unless you set a default value). If required is False, the default is None. All exceptions thrown by validators must be derived from ValidationError. Exceptions caused by invalid user input should use InvalidDataError or one of the subclasses.
In order to prevent programmer errors, an exception will be raised if you set required to True but provide a default value as well.
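The flow described above (required check, then convert(), then validate()) can be illustrated with a small stand-alone sketch. It does not use pycerberus and is not its implementation: the classes SketchError, SketchValidator and SketchInteger are hypothetical stand-ins, raising ValueError for contradictory options is an assumption, and the code targets current Python versions.

```python
class SketchError(Exception):
    """Simplified stand-in for InvalidDataError (hypothetical)."""
    def __init__(self, key, value):
        Exception.__init__(self, key)
        self.key = key
        self.value = value


class SketchValidator(object):
    """Illustrates required -> convert -> validate; not pycerberus code."""
    _no_default = object()

    def __init__(self, required=None, default=_no_default):
        has_default = default is not self._no_default
        if required is None:
            # Assumption: a value is required unless a default was given.
            required = not has_default
        if required and has_default:
            # Contradictory options are treated as a programmer error.
            raise ValueError('required=True cannot be combined with a default')
        self._required = required
        self._default = default if has_default else None

    def process(self, value, context=None):
        if context is None:
            context = {}
        if value is None:
            if self._required:
                raise SketchError('empty', value)
            return self._default
        converted = self.convert(value, context)
        # validate() only runs after a successful conversion.
        self.validate(converted, context)
        return converted

    def convert(self, value, context):
        return value

    def validate(self, converted_value, context):
        pass


class SketchInteger(SketchValidator):
    def convert(self, value, context):
        try:
            return int(value)
        except (TypeError, ValueError):
            raise SketchError('invalid_number', value)
```

Subclasses that only add restrictions would override validate() alone and leave convert() untouched, which is the split the text describes.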
Perform additional checks on the value which was processed successfully before (otherwise this method is not called). Raise an InvalidDataError if the input data is invalid.
You can implement only this method in your validator if you just want to add additional restrictions without touching the actual conversion.
This method must not modify the converted_value.
pycerberus uses simple_super so you can just say ‘self.super()’ in your custom validator classes. This will call the super implementation with just the same parameters as your method was called.
Now it’s time to put it all together. This validator demonstrates most of the API as explained so far:
from pycerberus.api import Validator
from pycerberus.i18n import _

class UnicodeValidator(Validator):
    def __init__(self, max=None):
        self.super()
        self._max_length = max

    def messages(self):
        # Every key used with self.error() must be declared here.
        return {
            'invalid_type': _(u'Validator got unexpected input (expected string, got %(classname)s).'),
            'too_long': _(u'Please enter at maximum %(max_length)d characters.')
        }

    def convert(self, value, context):
        try:
            return unicode(value, 'UTF-8')
        except Exception:
            classname = value.__class__.__name__
            self.error('invalid_type', value, context, classname=classname)

    def validate(self, converted_value, context):
        # Only called if convert() succeeded.
        if self._max_length is None:
            return
        if len(converted_value) > self._max_length:
            self.error('too_long', converted_value, context, max_length=self._max_length)
The validator will convert all input to unicode strings (using the UTF-8 encoding). It also checks for a maximum length of the string.
You can see that all the conversion is done in convert() while additional validation is encapsulated in validate(). This can help you keep your methods small.
In case there is an error, the error() method will raise an InvalidDataError. You select the error message to show by passing a string constant key which identifies the message. The key can be used later to adapt the user interface without relying on the message itself (e.g. show an additional help box in the user interface if the user typed in the wrong password).
The error messages are declared in messages(). You’ll notice that the message strings can also contain variable parts. You can use these variable parts to give the user some additional hints about what was wrong with the data.
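The variable parts use Python's named placeholders for the % operator. The following stand-alone sketch shows only that substitution step; the messages dictionary and the render_message() helper are illustrative assumptions and not part of the pycerberus API.

```python
# Message templates keyed by error key, as a validator would declare them.
messages = {
    'invalid_type': u'Validator got unexpected input '
                    u'(expected string, got %(classname)s).',
    'too_long': u'Please enter at maximum %(max_length)d characters.',
}


def render_message(key, **values):
    """Fill the named placeholders of the message identified by key."""
    return messages[key] % values


print(render_message('too_long', max_length=10))
# prints: Please enter at maximum 10 characters.
```

Each placeholder needs a conversion type after the closing parenthesis (for example s or d), and the keyword arguments must use the same names as the placeholders.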
Modern applications must be able to handle different languages. Internationalization (i18n) in pycerberus refers to validating locale-dependent input data (e.g. different decimal separator characters) as well as validation errors in different languages. The former aspect is not yet covered by default but you should be able to write custom validators easily.
All messages from validators included in pycerberus are translated into different languages using the standard gettext library. The language of validation error messages is chosen depending on the locale which is given in the context dictionary.
i18n support in pycerberus is a bit broader than just translating existing error messages. i18n becomes interesting when you write your own validators (based on the ones that come with pycerberus) and your translations need to play along with the built-in ones:
All i18n support in pycerberus aims to provide custom validators with a nice, simple-to-use API while maintaining the flexibility that serious applications need.
If you want to get translated error messages from a validator, you need to set the correct context. pycerberus looks for a key named ‘locale’ in the context dictionary:
validator = IntegerValidator()
validator.process('foo', context={'locale': 'en'}) # u'Please enter a number.'
validator.process('foo', context={'locale': 'de'}) # u'Bitte geben Sie eine Zahl ein.'
Usually you don’t have to know much about how pycerberus uses gettext internally. Just for completeness: The default domain is ‘pycerberus’. By default translations (.mo files) are loaded from pycerberus.locales, with a fallback to the system-wide locale directory ‘/usr/share/locale’.
To translate messages from a custom validator, you need to declare them in the messages() method and mark the message strings as translatable:
from pycerberus.api import Validator
from pycerberus.i18n import _
class MyValidator(Validator):
    def messages(self):
        return {
            'foo': _('A message.'),
            'bar': _('Another message.'),
        }

    # your validation logic ...
Afterwards you just have to start the usual gettext process:
Assume your custom validator is a subclass of a built-in validator but you don’t like the built-in translation. Of course you can replace pycerberus’ mo files directly. However there is also another way where you don’t have to change pycerberus itself:
class CustomValidatorThatOverridesTranslations(Validator):
    def messages(self):
        return {'empty': _('My custom message if the value is empty'),
                'custom': _('A custom message')}

    # ...
This validator will use a different message for the ‘empty’ error and you can define custom translations for this key in your own .po files.
The gettext framework is configurable, e.g. in which directory your .mo files are located and which domain (.mo filename) should be used. In pycerberus this is configurable by validator:
class ValidatorWithCustomGettextOptions(Validator):
    def messages(self):
        return {'custom': _('A custom message')}

    def translation_parameters(self, context):
        return {'domain': 'myapp', 'localedir': '/home/foo/locale'}

    # ...
These translation parameters are passed directly to the gettext call, so you can read about the available options in the gettext documentation. Your parameters will be applied to all messages which were declared in your validator class (but not in others). So you can modify the parameters for your own validator but keep all the existing parameters (and translations) for built-in validators.
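To make the meaning of 'domain' and 'localedir' concrete, here is a sketch that uses only Python's standard gettext module. It assumes the parameters correspond to the arguments of gettext.translation(); how pycerberus calls gettext internally is not shown here, and the translate() helper is hypothetical. The code targets current Python versions.

```python
import gettext


def translate(message, locale, domain='myapp', localedir='/home/foo/locale'):
    """Look up message in <localedir>/<locale>/LC_MESSAGES/<domain>.mo."""
    translation = gettext.translation(
        domain,
        localedir=localedir,
        languages=[locale],
        fallback=True,  # no catalog found: return the message untranslated
    )
    return translation.gettext(message)


# Without a matching .mo file the original message comes back unchanged.
print(translate('A custom message', 'de'))
```

With fallback=True a missing catalog is not an error, which is convenient during development when not every message has been translated yet.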
Sometimes you don’t want to use gettext. For instance you could store translations in a relational database so that your users can update the messages themselves without fiddling with gettext tools:
class ValidatorWithNonGettextTranslation(FrameworkValidator):
    def messages(self):
        return {'custom': _('A custom message')}

    def translate_message(self, key, native_message, translation_parameters, context):
        # fetch the translation for 'native_message' from somewhere
        translated_message = get_translation_from_db(native_message)
        return translated_message
You can use this mechanism to plug arbitrary translation systems into pycerberus. Your translation mechanism is (again) only applied to keys which were defined by your specific validator class. If you also want to use your translation system for keys which were defined by built-in validators, you need to re-define these keys in your class as shown in the previous section.
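A lookup such as get_translation_from_db() could look roughly like the following stand-alone sketch. Everything in it is an assumption for illustration: an in-memory dictionary stands in for the database, the function takes an extra locale argument, and the module-level translate_message() only mirrors the signature of the method shown above.

```python
# In-memory stand-in for a translations table: (locale, message) -> text.
_TRANSLATIONS = {
    ('de', 'A custom message'): 'Eine eigene Meldung',
}


def get_translation_from_db(native_message, locale):
    """Return the stored translation or fall back to the native message."""
    return _TRANSLATIONS.get((locale, native_message), native_message)


def translate_message(key, native_message, translation_parameters, context):
    """Same arguments as the validator method, minus self."""
    locale = context.get('locale', 'en')
    return get_translation_from_db(native_message, locale)


print(translate_message('custom', 'A custom message', {}, {'locale': 'de'}))
# prints: Eine eigene Meldung
```

Falling back to the native message keeps validation errors readable even when a translation has not been entered yet.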
So far I did not bother setting up a mailing list. If you have questions, please send an email to Felix.Schwarz@oss.schwarz.eu. When there are some users for pycerberus, I’ll create a mailing list.
pycerberus is licensed under the MIT license. As there are no other dependencies (besides Python itself), you can easily use pycerberus in proprietary as well as GPL applications.