Database source

The database source provides an interface for querying data from a table in the database.

Note

The database source requires SQLAlchemy to be installed and configured to connect to your database. It also expects your tables to be described using Elixir. See Installation.

The source intentionally does not do any joins, for performance reasons. Because reports will often be run over very large data sets, we want to be sure that running them is not prohibitively time-consuming and does not hog the database.

If you want to produce reports over multiple tables, the best option is generally to pre-join the tables into one merged reporting table, and then run the report over that. In “enterprise-ese” this is basically a star-schema table in your database with an ETL process to populate your data into it. The interwebs have plenty to say about this topic, so we’ll leave this issue in your capable hands.

If a whole new reporting database table is too heavy-handed for your use case, there are a couple of simpler options. Often all you want is to pull in a bit of data from another table, which you can do with the Lookup column. You can also use the Merge source to combine the results of two or more reports over two or more database tables.

When using columns from the database source, you’ll be expected to provide an extra report attribute to specify which table to pull the data from:

  • database_entity: This report attribute specifies the database table to query. It should be specified as a dotted-path string pointing to an Elixir Entity subclass. For example:

    database_entity = 'model.reporting.ReportInfluencer'
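
For context, here is a sketch of a minimal report definition that sets database_entity alongside a few of the column types described below. The entity path, column names, and key range here are illustrative, not prescribed:

from blingalytics import base, formats
from blingalytics.sources import database, key_range

class RevenueReport(base.Report):
    # Hypothetical Elixir entity describing a pre-joined reporting table
    database_entity = 'model.reporting.ReportRevenue'

    keys = ('product_id', key_range.SourceKeyRange)
    columns = [
        ('product_id', database.GroupBy('product_id')),
        ('revenue', database.Sum('purchase_price', format=formats.Bling)),
        ('purchases', database.Count('id')),
    ]
    default_sort = ('revenue', 'desc')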
    

Column types

class blingalytics.sources.database.GroupBy(entity_column, include_null=False, **kwargs)

Performs a group-by operation on the given database column. It takes one positional argument: a string specifying the column to group by. There is also an optional keyword argument:

  • include_null: Whether rows with a null value in the grouped column should be included as their own group, or filtered out. Defaults to False, which excludes the null group.

Any group-by columns should generally be listed in your report’s keys. You are free to use more than one of these in your report, which will be treated as a multi-group-by operation in the database.

This column does not compute or output a footer.
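
For example, grouping on a hypothetical user_id column while keeping rows with a null user ID as their own group:

database.GroupBy('user_id', include_null=True)  # 'user_id' is illustrative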

class blingalytics.sources.database.Sum(entity_column, **kwargs)

Performs a database sum aggregation. The first argument should be a string specifying the database column to sum.
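
For example, summing a hypothetical purchase_price column, formatted as currency:

database.Sum('purchase_price', format=formats.Bling)  # hypothetical column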

class blingalytics.sources.database.Count(entity_column, distinct=False, **kwargs)

Performs a database count aggregation. The first argument should be a string specifying the database column to count on. This also accepts one extra keyword argument:

  • distinct: Whether to perform a distinct count or not. Defaults to False.
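
For example, counting the distinct values in a hypothetical user_id column:

database.Count('user_id', distinct=True)  # 'user_id' is illustrative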

class blingalytics.sources.database.BoolAnd(entity_column, **kwargs)

Note

Using this column requires that your database have a bool_and aggregation function.

Performs a boolean-and aggregation. This aggregates to true if all the aggregated values are true; otherwise, it will aggregate to false. The first argument should be a string specifying the database column to aggregate on.
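
For example, reporting whether every row in the group is paid, assuming a boolean is_paid column:

database.BoolAnd('is_paid')  # 'is_paid' is illustrative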

class blingalytics.sources.database.BoolOr(entity_column, **kwargs)

Note

Using this column requires that your database have a bool_or aggregation function.

Performs a boolean-or aggregation. This aggregates to true if any of the aggregated values are true; otherwise, it will aggregate to false. The first argument should be a string specifying the database column to aggregate on.
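
For example, reporting whether any row in the group has been flagged, assuming a boolean is_flagged column:

database.BoolOr('is_flagged')  # 'is_flagged' is illustrative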

class blingalytics.sources.database.ArrayAgg(entity_column, **kwargs)

Note

Using this column requires that your database have an array_agg aggregation function.

Performs an array aggregation. This essentially compiles a list of all the values in all the rows being aggregated. The first argument should be a string specifying the database column to aggregate.
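
For example, collecting each group’s values from a hypothetical tag column into a list:

database.ArrayAgg('tag')  # 'tag' is illustrative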

class blingalytics.sources.database.First(entity_column, **kwargs)

Note

Using this column requires that your database have a first aggregation function. In many databases, you will have to define this aggregate yourself; the PostgreSQL wiki, for example, provides a sample implementation.

Performs a database first aggregation to return the first value found. The first argument should be a string specifying the database column.
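
For example, pulling the first value found in a hypothetical referrer column:

database.First('referrer')  # 'referrer' is illustrative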

class blingalytics.sources.database.Lookup(entity, lookup_attr, pk_column, pk_attr='id', **kwargs)

This column allows you to “cheat” on the no-joins rule and look up a value from an arbitrary database table by primary key.

This column expects several positional arguments to specify how to do the lookup:

  • The Elixir Entity object to look up from, specified as a dotted-string reference.
  • A string specifying the column attribute on the Entity you want to look up.
  • The name of the column in the report which is the primary key to use for the lookup in this other table.

The primary key name on the lookup table is assumed to be ‘id’. If it’s different, you can use the keyword argument:

  • pk_attr: The name of the primary key column in the lookup database table. Defaults to 'id'.

For example:

database.Lookup('project.models.Publisher', 'name', 'publisher_id',
    format=formats.String)

Because the lookups are only done by primary key and are bulked up into just a few operations, this isn’t as taxing on the database as it could be. But doing a lot of lookups on large datasets can get pretty resource-intensive, so it’s best to be judicious.

Filter types

class blingalytics.sources.database.ColumnTransform(filter_func, **kwargs)

A transform allows you to alter a database column for every report column or other filter that needs to access it. For example, this can be used to provide a timezone offset option that shifts all date and time columns by a certain number of hours.

This filter expects one positional argument, a function defining the transform operation. This function will be passed the Elixir column object as its first argument. If a widget is defined for this filter, the function will also be passed a second argument, which is the user input value. The function should return the altered column object.

This filter requires the columns keyword argument, which should be a list of strings referring to the columns this transform will be applied to.

For example, if you have a database column with the number of hours since the epoch and want to transform it to the number of days since the epoch, with a given number of hours offset for timezone, you can use:

database.ColumnTransform(
    lambda column, user_input: column.op('+')(user_input).op('/')(24),
    columns=['purchase_time', 'user_last_login_time'],
    widget=widgets.Select(choices=TIMEZONE_CHOICES))

class blingalytics.sources.database.QueryFilter(filter_func, **kwargs)

Filters the database query or queries for this report.

This filter expects one positional argument, a function defining the filter operation. This function will be passed the Entity object as its first argument. If a widget is defined for this filter, the function will also be passed a second argument, which is the user input value. The function should return a filtering term that can be used to filter a query on that entity. Alternatively, based on the user input, the filter function can return None to indicate that no filtering should be done.

More specifically, the returned object should be a sqlalchemy.sql.expression._BinaryExpression object. You will generally build these in a lambda like so:

database.QueryFilter(lambda entity: entity.is_active == True)

Or, with a user input widget:

database.QueryFilter(
    lambda entity, user_input: entity.user_id.in_(user_input),
    widget=Autocomplete(multiple=True))

Key ranges

class blingalytics.sources.database.TableKeyRange(entity, pk_column='id', filters=[])

This key range ensures that there is a key for every row in the given database table. This is primarily useful to ensure that you get every row ID from an external table in your report.

This key range takes one positional argument, a dotted-string reference to the Entity to pull from. It also takes two optional keyword arguments:

  • pk_column: The column name for the primary key to use from the table. Defaults to 'id'.
  • filters: Either a single filter or a list of filters. These filters will be applied when pulling the keys from this database table.
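
For example, to pull a key for every active publisher, reusing the hypothetical project.models.Publisher entity from the Lookup example above:

database.TableKeyRange('project.models.Publisher',
    filters=database.QueryFilter(lambda entity: entity.is_active == True))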
