Package intermine :: Module query :: Class Query

Class Query

object --+
         |
        Query
Known Subclasses:

A class representing a structured database query.

Objects of this class have properties that model the attributes of the query, and methods for performing the request.

SYNOPSIS

example:

>>> from intermine.webservice import Service
>>> service = Service("http://www.flymine.org/query/service")
>>> query = service.new_query()
>>>
>>> query.add_view("Gene.symbol", "Gene.pathways.name", "Gene.proteins.symbol")
>>> query.add_sort_order("Gene.pathways.name")
>>>
>>> query.add_constraint("Gene", "LOOKUP", "eve")
>>> query.add_constraint("Gene.pathways.name", "=", "Phosphate*")
>>>
>>> query.set_logic("A or B")
>>>
>>> for row in query.results():
...     handle_row(row)

Query objects represent structured requests for information from the database housed in the data warehouse whose webservice you are querying. They use some of the concepts of relational databases, within an object-relational mapping (ORM) context. If you don't know what that means, don't worry: you don't need to write SQL, and the queries will be fast.

PRINCIPLES

The data model represents tables in the database as classes, with records within tables as instances of that class. The columns of each table are the fields of those objects:

 The Gene table - showing two records/objects
 +-----+---------+--------+---------------+----------+
 | id  | symbol  | length | cyto-location | organism |
 +-----+---------+--------+---------------+----------+
 | 01  | eve     | 1539   | 46C10-46C10   | 01       |
 +-----+---------+--------+---------------+----------+
 | 02  | zen     | 1331   | 84A5-84A5     | 01       |
 +-----+---------+--------+---------------+----------+
 ...

 The organism table - showing one record/object
 +-----+-----------------+----------+
 | id  | name            | taxon id |
 +-----+-----------------+----------+
 | 01  | D. melanogaster | 7227     |
 +-----+-----------------+----------+

Columns that contain a meaningful value are known as 'attributes' (in the tables above, that is everything except the id columns and the organism column). The other columns (such as "organism" in the gene table) reference records of other tables (i.e. other objects), and are called references. Using dotted path notation, you can refer to any field of any class that is connected, however indirectly, to the class at the start of the path:

 Gene.organism.name -> the name column in the organism table, referenced by a record in the gene table

These paths, and the connections between records and tables they represent, are the basis for the structure of InterMine queries.
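To make the dotted notation concrete, here is a minimal sketch that stores the two example tables above as plain Python dictionaries and follows a path such as "Gene.organism.name" by hopping from record to record through references. The `TABLES` layout, the `REFERENCES` mapping and the `resolve` function are hypothetical illustrations, not part of the intermine package, which resolves paths on the server.

```python
# Hypothetical in-memory copy of the two example tables above.
TABLES = {
    "Gene": {
        "01": {"symbol": "eve", "length": 1539, "cyto-location": "46C10-46C10", "organism": "01"},
        "02": {"symbol": "zen", "length": 1331, "cyto-location": "84A5-84A5", "organism": "01"},
    },
    "Organism": {
        "01": {"name": "D. melanogaster", "taxon id": 7227},
    },
}

# Which columns are references, and which table each one points to.
REFERENCES = {("Gene", "organism"): "Organism"}


def resolve(path, record_id):
    """Follow a dotted path such as 'Gene.organism.name' from one root record."""
    table, *fields = path.split(".")
    record = TABLES[table][record_id]
    for field in fields[:-1]:
        # Every intermediate step must be a reference to another table.
        table = REFERENCES[(table, field)]
        record = TABLES[table][record[field]]
    # The last step names an attribute: a meaningful value.
    return record[fields[-1]]


print(resolve("Gene.organism.name", "02"))  # D. melanogaster
print(resolve("Gene.symbol", "01"))         # eve
```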

THE STRUCTURE OF A QUERY

A query has two principal sets of properties:

  • its view: the output columns
  • its constraints: the restrictions on the result set

A query must have at least one output column in its view, but constraints are optional: if you don't include any, you will get back every record from the table (every object of that type).

In addition, the query must be coherent: if you have information about an organism, and you want a list of genes, then the "Gene" table should be the basis for your query, and as such the Gene class, which represents this table, should be the root of all the paths that appear in it.

So, to take a simple example:

   I have an organism name, and I want a list of genes:

The view is the list of things I want to know about those genes:

>>> query.add_view("Gene.name")
>>> query.add_view("Gene.length")
>>> query.add_view("Gene.proteins.sequence.length")

Note that I can freely mix attributes and references within a path, as long as every view ends in an attribute (a meaningful value). As a shortcut I can also write:

>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")

or:

>>> query.add_view("Gene.name Gene.length Gene.proteins.sequence.length")

They are all equivalent.

Now I can add my constraints. As we mentioned, I have information about an organism, so:

>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")

If I run this query, I will get a very large number of results; it needs to be filtered further:

>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)

If that doesn't restrict things enough I can add more filters:

>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])

Now I am guaranteed to get only information on genes I am interested in.
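Each constraint acts as a filter on rows, and by default the filters are combined cumulatively. The following minimal sketch mimics that behaviour on plain Python dictionaries; the `OPERATORS` table, the made-up sample rows and `apply_constraints` are hypothetical and only cover the three operators used above. The real filtering happens in the data warehouse, not in your script.

```python
import operator

# Hypothetical mapping from the operator strings used above to Python tests.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "ONE OF": lambda value, options: value in options,
}


def apply_constraints(rows, constraints):
    """Keep only rows that satisfy every (path, op, value) constraint (A and B and C ...)."""
    return [
        row for row in rows
        if all(OPERATORS[op](row[path], value) for path, op, value in constraints)
    ]


# Made-up rows: the protein lengths are illustrative, not real data.
rows = [
    {"Gene.symbol": "eve", "Gene.organism.name": "D. melanogaster", "Gene.proteins.sequence.length": 100},
    {"Gene.symbol": "zen", "Gene.organism.name": "D. melanogaster", "Gene.proteins.sequence.length": 200},
    {"Gene.symbol": "h", "Gene.organism.name": "D. melanogaster", "Gene.proteins.sequence.length": 900},
    {"Gene.symbol": "Ubx", "Gene.organism.name": "D. melanogaster", "Gene.proteins.sequence.length": 300},
]
constraints = [
    ("Gene.organism.name", "=", "D. melanogaster"),
    ("Gene.proteins.sequence.length", "<", 500),
    ("Gene.symbol", "ONE OF", ["eve", "zen", "h"]),
]
print([r["Gene.symbol"] for r in apply_constraints(rows, constraints)])  # ['eve', 'zen']
```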

Note, though, that because I have included the link (or "join") from Gene -> Protein, this, by default, means that I only want genes that have protein information associated with them. If in fact I want information on all genes, and just want to know the protein information if it is available, then I can specify that with:

>>> query.add_join("Gene.proteins", "OUTER")
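The difference between the default (inner) join and an outer join can be sketched with plain Python lists. The made-up `genes` and `proteins` records and the `join_rows` function below are hypothetical; they only mimic the shape of the rows you would get back, while the real join is performed by the data warehouse.

```python
# Made-up records: one gene has a protein, the other has none.
genes = [{"id": 1, "name": "even skipped"}, {"id": 2, "name": "gene without protein"}]
proteins = [{"gene_id": 1, "name": "EVE"}]


def join_rows(genes, proteins, style="INNER"):
    """Produce (gene name, protein name) rows using an INNER or OUTER join."""
    if style not in ("INNER", "OUTER"):
        raise TypeError("join style must be INNER or OUTER, not %r" % style)
    rows = []
    for gene in genes:
        matches = [p for p in proteins if p["gene_id"] == gene["id"]]
        for protein in matches:
            rows.append((gene["name"], protein["name"]))
        if not matches and style == "OUTER":
            # An outer join keeps the gene, with None where the protein would be.
            rows.append((gene["name"], None))
    return rows


print(join_rows(genes, proteins))           # [('even skipped', 'EVE')]
print(join_rows(genes, proteins, "OUTER"))  # [('even skipped', 'EVE'), ('gene without protein', None)]
```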

And if my query is not a simple cumulative filter, because I want all D. mel genes that EITHER have a short protein sequence OR are one of my favourite genes (as unlikely as that sounds), I can specify the logic for that too:

>>> query.set_logic("A and (B or C)")

Each letter refers to one of the constraints - the codes are assigned in the order you add the constraints. If you want to be absolutely certain about the constraints you mean, you can use the constraint objects themselves:

>>> gene_is_eve = query.add_constraint("Gene.symbol", "=", "eve")
>>> gene_is_zen = query.add_constraint("Gene.symbol", "=", "zen")
>>>
>>> query.set_logic(gene_is_eve | gene_is_zen)

By default the logic is a straight cumulative filter (i.e. A and B and C and D and ...).
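To see how the letter codes and a logic string fit together, here is a minimal sketch of a parser for expressions such as "A and (B or C)". It assigns codes in the order constraints are added, then evaluates the expression against whether each constraint matched a given row. The functions `assign_codes` and `evaluate_logic` are hypothetical, and the rule that "and" binds more tightly than "or" is an assumption of this sketch; the intermine package has its own parser, whose details may differ.

```python
import re
from string import ascii_uppercase


def assign_codes(constraints):
    """Give each constraint a letter code, in the order it was added."""
    return dict(zip(ascii_uppercase, constraints))


def evaluate_logic(expression, outcomes):
    """Evaluate 'and'/'or'/parentheses over letter codes; 'and' binds tighter than 'or'."""
    tokens = re.findall(r"\(|\)|[A-Z]\b|and|or", expression)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def take():
        nonlocal position
        position += 1
        return tokens[position - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right-hand side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            take()  # consume the closing ")"
            return value
        return outcomes[token]  # a letter code: did that constraint match?

    return parse_or()


codes = assign_codes(["organism is D. mel", "short protein", "favourite gene"])
print(codes["B"])  # short protein
outcomes = {"A": True, "B": False, "C": True}
print(evaluate_logic("A and (B or C)", outcomes))  # True
print(evaluate_logic("A and B and C", outcomes))   # False
```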

Putting it all together:

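>>> from intermine.webservice import Service
>>> service = Service("http://www.flymine.org/query/service")
>>> query = service.new_query()
>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")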
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
>>> query.add_join("Gene.proteins", "OUTER")
>>> query.set_logic("A and (B or C)")

And the query is defined.

Result Processing

Calling ".results()" on a query returns an iterator of rows, where each row is a list of values, one for each output column (view) you selected.

To process these simply use normal iteration syntax:

>>> for row in query.results():
...     for column in row:
...         do_something(column)

Here each row will have a gene name, a gene length, and a sequence length, e.g.:

>>> print(row)
['even skipped', '1359', '376']

To make that clearer, you can ask for a dictionary instead of a list:

>>> for row in query.results("dict"):
...     print(row)
{'Gene.name': 'even skipped', 'Gene.length': '1359', 'Gene.proteins.sequence.length': '376'}

Which means you can refer to columns by name:

>>> for row in query.results("dict"):
...     print("name is", row["Gene.name"])
...     print("length is", row["Gene.length"])
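A dictionary row is simply the list row keyed by the view paths. The following minimal sketch shows that correspondence, assuming the view and row from the example above; `to_dict_row` is a hypothetical helper, not a library function.

```python
view = ["Gene.name", "Gene.length", "Gene.proteins.sequence.length"]
row = ["even skipped", "1359", "376"]


def to_dict_row(view, row):
    """Pair each output column in the view with the value in the same position."""
    return dict(zip(view, row))


named = to_dict_row(view, row)
print("name is", named["Gene.name"])      # name is even skipped
print("length is", named["Gene.length"])  # length is 1359
```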

If you just want the raw results, for printing to a file or for piping to another program, you can request tab-separated strings instead (or "csv" for comma-separated values):

>>> for row in query.results("tsv"):
...     print(row)

Getting us to Generate your Code

Not that you have to actually write any of this! The webapp will happily generate the code for any query (and template) you can build in it. A good way to get started is to use the webapp to generate your code, and then run it as scripts to speed up your queries. You can always tinker with and edit the scripts you download.

To get generated queries, look for the "Python" link at the bottom of query-builder and template form pages; it looks a bit like this:

   +=====================================+
   |                                     |
   |    Perl  |  Python  |  Java [Help]  |
   |                                     |
   +=====================================+

Instance Methods

__init__(self, model, service=None, validate=True)
    Construct a new query for making database queries against an InterMine data warehouse.

verify(self)
    Validate the query. Invalid queries will fail to run, and it is not always obvious why.

add_view(self, *paths)
    Add one or more views to the list of output columns.

verify_views(self, views=None)
    Check that the views are valid according to the model and represent attributes.

add_constraint(self, *args, **kwargs) → intermine.constraints.Constraint
    Add a constraint (a filter on records).

verify_constraint_paths(self, cons=None)
    Check the path attribute of each constraint.

get_constraint(self, code) → intermine.constraints.CodedConstraint
    Return the constraint with the given code, if it exists.

add_join(self, *args, **kwargs) → intermine.pathfeatures.Join
    Add a join statement to the query.

verify_join_paths(self, joins=None)
    Check that joins have valid paths, and that they refer to references.

add_path_description(self, *args, **kwargs) → intermine.pathfeatures.PathDescription
    Add a path description to the query.

verify_pd_paths(self, pds=None)
    Check the path descriptions for consistency with the data model.

get_logic(self) → intermine.constraints.LogicGroup
    Return the up-to-date logic expression.

set_logic(self, value)
    Set the logic from a string or a LogicGroup.

validate_logic(self, logic=None)
    Check that every coded constraint is included in the logic at least once.

get_default_sort_order(self) → intermine.pathfeatures.SortOrderList
    Determine the sort order when none is specified.

get_sort_order(self) → intermine.pathfeatures.SortOrderList
    Return the sort order if set, otherwise the default sort order.

add_sort_order(self, path, direction='asc')
    Add a sort order to the query.

validate_sort_order(self, *so_elems)
    Check that the sort order paths are valid and in the view.

get_subclass_dict(self) → dict(string, string)
    Return the mapping of classes used by the model for assessing whether certain paths are valid.

results(self, row='list') → intermine.webservice.ResultIterator
    Return an iterator over result rows.

get_results_path(self) → str
    Return the path section pointing to the REST resource.

get_results_list(self, rowformat='list') → list
    Return all result rows as a list instead of an iterator.

children(self) → list
    Return the child objects of the query; used during serialisation to xml.

to_query_params(self) → dict
    Return the parameters to be passed to the webservice.

to_Node(self) → xml.dom.minidom.Node
    Return a DOM node representing the query; an intermediate step in xml serialisation.

to_xml(self) → string
    Serialise the current state of the query to an xml string.

to_formatted_xml(self) → string
    Serialise the current state of the query to a more readable xml string.

clone(self)
    Produce an independent deep copy of the query.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods

from_xml(cls, xml, *args, **kwargs) → Query
    Instantiate a query from its xml serialisation.

Properties

constraints → list(intermine.constraints.Constraint)
    The constraints of the query.

coded_constraints → list(intermine.constraints.CodedConstraint)
    The constraints that have a code, and can therefore be used in a logic expression.

Inherited from object: __class__

Method Details

__init__(self, model, service=None, validate=True)
(Constructor)

Construct a new Query

Construct a new query for making database queries against an InterMine data warehouse.

Normally you would not need to use this constructor directly, but instead use the factory method on intermine.webservice.Service, which will handle construction for you.

Parameters:
  • model - an instance of intermine.model.Model. Required
  • service - an instance of intermine.webservice.Service. Optional, but you will not be able to make requests without one.
  • validate - a boolean, defaulting to True. If set to False, the query will not try to validate itself. You should not set this to False.
Overrides: object.__init__

from_xml(cls, xml, *args, **kwargs)
Class Method

Deserialise a query serialised to XML

This method is used to instantiate serialised queries. It is used by intermine.webservice.Service objects to instantiate Template objects and it can be used to read in queries you have saved to a file.

Parameters:
  • xml - The xml as a file name, url, or string
Returns: Query
Raises:
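As a rough sketch of what deserialisation involves, the standard library's xml.dom.minidom can read the view and constraints back out of a serialised query. The element and attribute names used here (query, view, constraint, path, op, value, code, constraintLogic) are assumptions modelled on InterMine's path-query XML, and `read_query` is a hypothetical helper; for real queries, use from_xml itself.

```python
from xml.dom.minidom import parseString

# Hypothetical serialised query; attribute names are assumptions.
QUERY_XML = """
<query model="genomic" view="Gene.symbol Gene.length" constraintLogic="A and B">
  <constraint path="Gene.symbol" op="=" value="eve" code="A"/>
  <constraint path="Gene.length" op="&lt;" value="2000" code="B"/>
</query>
"""


def read_query(xml_string):
    """Return the view paths and the (code, path, op, value) constraints."""
    query = parseString(xml_string.strip()).documentElement
    view = query.getAttribute("view").split()
    constraints = [
        (c.getAttribute("code"), c.getAttribute("path"),
         c.getAttribute("op"), c.getAttribute("value"))
        for c in query.getElementsByTagName("constraint")
    ]
    return view, constraints


view, constraints = read_query(QUERY_XML)
print(view)         # ['Gene.symbol', 'Gene.length']
print(constraints)  # [('A', 'Gene.symbol', '=', 'eve'), ('B', 'Gene.length', '<', '2000')]
```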

verify(self)

Validate the query

Invalid queries will fail to run, and it is not always obvious why. The validation routine checks to see that the query will not cause errors on execution, and tries to provide informative error messages.

This method is called immediately after a query is fully deserialised.

Raises:

add_view(self, *paths)

Add one or more views to the list of output columns

example:

   query.add_view("Gene.name Gene.organism.name")

This is the main method for adding views to the list of output columns. As well as appending views, it will also split a single, space or comma delimited string into multiple paths, and flatten out lists, or any combination. It will also immediately try to validate the views.

Output columns must be valid paths according to the data model, and they must represent attributes of tables.

See Also:
intermine.model.Model, intermine.model.Path, intermine.model.Attribute
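The splitting and flattening described above can be sketched in a few lines. This is a minimal illustration, assuming that paths are separated by runs of spaces and/or commas and that lists may be nested; `normalise_views` is a hypothetical helper, not the library's implementation, and it does no validation.

```python
import re


def normalise_views(*paths):
    """Flatten nested lists and split space- or comma-delimited strings into single paths."""
    result = []
    for item in paths:
        if isinstance(item, (list, tuple)):
            # Recurse into lists, so any combination of shapes is accepted.
            result.extend(normalise_views(*item))
        else:
            # Split on runs of commas and/or whitespace, dropping empty pieces.
            result.extend(p for p in re.split(r"[\s,]+", item) if p)
    return result


a = normalise_views("Gene.name", "Gene.length")
b = normalise_views("Gene.name Gene.length")
c = normalise_views(["Gene.name", ["Gene.length"]])
d = normalise_views("Gene.name, Gene.length")
print(a == b == c == d)  # True
```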

verify_views(self, views=None)

Check to see if the views given are valid

This method checks to see if the views:

  • are valid according to the model
  • represent attributes
Raises:
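The two checks can be sketched against a hypothetical toy model in which each class lists its attributes and its references (with the class each reference points to). The `MODEL` dictionary, the `ModelError` class defined here and the `verify_view` function are illustrative assumptions, not the data model classes of the intermine package.

```python
class ModelError(Exception):
    """Hypothetical error raised for paths that do not fit the model."""


# Toy model: attributes hold values, references point to other classes.
MODEL = {
    "Gene": {"attributes": {"name", "symbol", "length"},
             "references": {"organism": "Organism", "proteins": "Protein"}},
    "Organism": {"attributes": {"name", "taxonId"}, "references": {}},
    "Protein": {"attributes": {"name", "symbol"}, "references": {}},
}


def verify_view(path):
    """Raise ModelError unless the path is valid and ends in an attribute."""
    cls, *steps = path.split(".")
    if cls not in MODEL:
        raise ModelError("unknown class: %s" % cls)
    for i, step in enumerate(steps):
        spec = MODEL[cls]
        last = i == len(steps) - 1
        if step in spec["references"]:
            if last:
                raise ModelError("%s ends in a reference, not an attribute" % path)
            cls = spec["references"][step]
        elif step in spec["attributes"] and last:
            return
        else:
            raise ModelError("%s is not valid at %r" % (path, step))
    raise ModelError("%s does not name an attribute" % path)


for path in ["Gene.organism.name", "Gene.organism"]:
    try:
        verify_view(path)
        print(path, "is a valid view")
    except ModelError as error:
        print("invalid:", error)
```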

add_constraint(self, *args, **kwargs)

Add a constraint (filter on records)

example:

   query.add_constraint("Gene.symbol", "=", "zen")

This method will try to make a constraint from the arguments given, trying each of the classes it knows of in turn to see if they accept the arguments. This allows you to add constraints of different types without having to know or care what their classes or implementation details are. All constraints derive from intermine.constraints.Constraint, and they all have a path attribute, but are otherwise diverse.

Before adding the constraint to the query, this method will also try to check that the constraint is valid by calling Query.verify_constraint_paths()

Returns: intermine.constraints.Constraint

verify_constraint_paths(self, cons=None)

Check that the constraints are valid

This method will check the path attribute of each constraint. In addition it will:

  • Check that BinaryConstraints and MultiConstraints have an Attribute as their path
  • Check that TernaryConstraints have a Reference as theirs
  • Check that SubClassConstraints have a correct subclass relationship
  • Check that LoopConstraints have a valid loopPath, of a compatible type
  • Check that ListConstraints refer to an object
Parameters:
  • cons - The constraints to check (defaults to all constraints on the query)
Raises:

get_constraint(self, code)

Returns the constraint with the given code

Returns the constraint with the given code, if it exists. If no such constraint exists, it raises a ConstraintError.

Returns: intermine.constraints.CodedConstraint
the constraint corresponding to the given code

add_join(self, *args, **kwargs)

Add a join statement to the query

example:

query.add_join("Gene.proteins", "OUTER")

A join statement is used to determine whether references should restrict the result set to only those records for which the references exist. For example, if one had a query with the view:

 "Gene.name", "Gene.proteins.name"

Then in the normal case (that of an INNER join), we would only get Genes that also have at least one protein that they reference. Simply by asking for this output column you are placing a restriction on the information you get back.

If in fact you wanted all genes, regardless of whether they had proteins associated with them or not, but if they did you would rather like to know _what_ proteins, then you need to specify this reference to be an OUTER join:

query.add_join("Gene.proteins", "OUTER")

Now you will get many more rows of results, some of which will have "null" values where the protein name would have been.

This method will also attempt to validate the join by calling Query.verify_join_paths(). Joins must have a valid path, the style can be either INNER or OUTER (defaults to OUTER, as the user does not need to specify inner joins, since all references start out as inner joins), and the path must be a reference.

Returns: intermine.pathfeatures.Join
Raises:
  • ModelError - if the path is invalid
  • TypeError - if the join style is invalid

verify_join_paths(self, joins=None)

Check that the joins are valid

Joins must have valid paths, and they must refer to references.

Raises:

add_path_description(self, *args, **kwargs)

Add a path description to the query

example:

   query.add_path_description("Gene.symbol", "The symbol for this gene")

If you wish you can add annotations to your query that describe what the component paths are and what they do - this is only really useful if you plan to keep your query (perhaps as xml) or store it as a template.

Returns: intermine.pathfeatures.PathDescription

verify_pd_paths(self, pds=None)

Check that the path of the path description is valid

Checks for consistency with the data model.

Raises:

get_logic(self)

Returns the logic expression for the query

This returns the up-to-date logic expression. The default value is all the coded constraints and'ed together.

The LogicGroup object stringifies to a string that can be parsed to obtain itself (eg: "A and (B or C or D)").

Returns: intermine.constraints.LogicGroup

set_logic(self, value)

Sets the Logic given the appropriate input

example:

 query.set_logic("A and (B or C)")

This sets the logic to the appropriate value. If the value is already a LogicGroup, it is accepted, otherwise the string is tokenised and parsed.

The logic is then validated with a call to validate_logic()

Raises:
  • LogicParseError - if there is a syntax error in the logic

validate_logic(self, logic=None)

Validates the query logic

Attempts to validate the logic by checking that every coded_constraint is included at least once

Raises:
  • QueryError - if not every coded constraint is represented
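The check described above can be sketched with a regular expression that collects the single-letter codes mentioned in a logic string and compares them with the codes of the coded constraints. The `QueryError` class defined here and `validate_logic_codes` are hypothetical stand-ins; the library's own LogicGroup objects are not used.

```python
import re


class QueryError(Exception):
    """Hypothetical error for logic that leaves out a coded constraint."""


def validate_logic_codes(logic, codes):
    """Raise QueryError unless every code appears in the logic at least once."""
    mentioned = set(re.findall(r"\b[A-Z]\b", logic))
    missing = sorted(set(codes) - mentioned)
    if missing:
        raise QueryError("logic %r does not mention: %s" % (logic, ", ".join(missing)))


validate_logic_codes("A and (B or C)", ["A", "B", "C"])  # passes silently
try:
    validate_logic_codes("A or B", ["A", "B", "C"])
except QueryError as error:
    print(error)  # logic 'A or B' does not mention: C
```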

get_default_sort_order(self)

Gets the sort order when none has been specified

This method is called to determine the sort order if none is specified

Returns: intermine.pathfeatures.SortOrderList
Raises:

get_sort_order(self)

Return a sort order for the query

This method returns the sort order if set, otherwise it returns the default sort order

Returns: intermine.pathfeatures.SortOrderList
Raises:

add_sort_order(self, path, direction='asc')

Adds a sort order to the query

example:

 query.add_sort_order("Gene.name", "DESC")

This method adds a sort order to the query. A query can have multiple sort orders, which are assessed in sequence.

If a query has two sort-orders, for example, the first being "Gene.organism.name asc", and the second being "Gene.name desc", you would have the list of genes grouped by organism, with the lists within those groupings in reverse alphabetical order by gene name.

This method will try to validate the sort order by calling validate_sort_order()
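The grouping described above follows from treating the sort orders as successive sort keys. A minimal sketch using Python's stable sort: sort by the last sort order first and the first sort order last, so that ties on earlier keys are broken by later ones. The made-up rows and the `apply_sort_orders` helper are hypothetical; the real sorting is done by the webservice.

```python
def apply_sort_orders(rows, sort_orders):
    """Sort dict rows by several (path, direction) pairs, the first pair most significant."""
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last combines them.
    for path, direction in reversed(sort_orders):
        result.sort(key=lambda row: row[path], reverse=(direction.lower() == "desc"))
    return result


# Made-up rows for illustration.
rows = [
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "even skipped"},
    {"Gene.organism.name": "A. gambiae", "Gene.name": "alpha"},
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "zerknullt"},
    {"Gene.organism.name": "A. gambiae", "Gene.name": "beta"},
]
for row in apply_sort_orders(rows, [("Gene.organism.name", "asc"), ("Gene.name", "desc")]):
    print(row["Gene.organism.name"], "|", row["Gene.name"])
```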

validate_sort_order(self, *so_elems)

Check the validity of the sort order

Checks that the sort order paths are:

  • valid paths
  • in the view
Raises:

get_subclass_dict(self)

Return the current mapping of class to subclass

This method returns a mapping of classes used by the model for assessing whether certain paths are valid. For instance, if you constrain a MicroArrayResult to be the subclass FlyAtlasResult, you can refer to the .presentCall attribute of fly atlas results. MicroArrayResults do not have this attribute, and a path such as:

 Gene.microArrayResult.presentCall

would be marked as invalid unless the dictionary is provided.

Most users will never need to call this method.

Returns: dict(string, string)
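A minimal sketch of why the mapping matters when checking a path. The toy class layout, the "value" field and `path_is_valid` are hypothetical; the point is only that a subclass entry for a path prefix lets the rest of the path use fields that exist on the subclass but not on the declared class.

```python
# Toy model: for each class, its fields; a field maps to the class it references,
# or to None if it is an attribute.
FIELDS = {
    "Gene": {"symbol": None, "microArrayResult": "MicroArrayResult"},
    "MicroArrayResult": {"value": None},
    "FlyAtlasResult": {"value": None, "presentCall": None},
}


def path_is_valid(path, subclasses=None):
    """Check a path, using subclasses to map a path prefix to a more specific class."""
    subclasses = subclasses or {}
    parts = path.split(".")
    cls = subclasses.get(parts[0], parts[0])
    for i, field in enumerate(parts[1:], start=1):
        if field not in FIELDS.get(cls, {}):
            return False
        target = FIELDS[cls][field]
        # A subclass entry for this prefix overrides the declared class.
        cls = subclasses.get(".".join(parts[: i + 1]), target)
    return True


path = "Gene.microArrayResult.presentCall"
print(path_is_valid(path))  # False
print(path_is_valid(path, {"Gene.microArrayResult": "FlyAtlasResult"}))  # True
```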

results(self, row='list')

Return an iterator over result rows

Usage:

 for row in query.results():
   do_sth_with(row)
Parameters:
  • row (string) - the format for the row. Defaults to "list". Valid options are "dict", "list", "jsonrows", "jsonobject", "tsv", "csv".
Returns: intermine.webservice.ResultIterator
Raises:

get_results_path(self)

Returns the path section pointing to the REST resource

Query.get_results_path() -> str

Internally, this just reads a constant property of intermine.webservice.Service.

Returns: str

get_results_list(self, rowformat='list')

Get a list of result rows

This method is a shortcut so that you do not have to do a list comprehension yourself on the iterator that is normally returned. If you have a very large result set (in the millions of rows) you will not want to have the whole list in memory at once, but there may be other circumstances when you might want to keep the whole list in one place.

Parameters:
  • rowformat (string) - the format for the row. Defaults to "list". Valid options are "dict", "list", "jsonrows", "jsonobject", "tsv", "csv".
Returns: list
Raises:
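The trade-off between iterating and building a list can be illustrated without a webservice. In this sketch the hypothetical `fake_results` generator stands in for the iterator returned by results(); wrapping it in list() is the step get_results_list saves you from writing, at the cost of holding every row in memory at once.

```python
from itertools import islice


def fake_results(n):
    """Hypothetical stand-in for a results() iterator: yields rows one at a time."""
    for i in range(n):
        yield ["gene%d" % i, i * 10]


# Streaming: only one row is held in memory at a time.
total_length = sum(length for _name, length in fake_results(100000))

# Materialising: the equivalent of get_results_list(), convenient for small results.
rows = list(fake_results(3))

# Peeking at the first rows of a large result without reading all of it.
first_two = list(islice(fake_results(100000), 2))
print(rows)       # [['gene0', 0], ['gene1', 10], ['gene2', 20]]
print(first_two)  # [['gene0', 0], ['gene1', 10]]
```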

children(self)

Returns the child objects of the query

This method is used during the serialisation of queries to xml. It is unlikely you will need access to this as a whole; consider using "path_descriptions", "joins" or "constraints" instead.

Returns: list
the child elements of this query
See Also:
Query.path_descriptions, Query.joins, Query.constraints

to_query_params(self)

Returns the parameters to be passed to the webservice

The query is responsible for producing its own query parameters. These consist simply of:

  • query: the xml representation of the query
Returns: dict
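As a sketch of how such a parameter dictionary could travel over HTTP, the standard library can URL-encode it into a form body. The XML string and the idea of sending it as an application/x-www-form-urlencoded body are assumptions for illustration; the intermine client handles the request for you, and nothing here contacts a server.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical serialised query, standing in for the output of to_xml().
query_xml = '<query model="genomic" view="Gene.symbol"></query>'
params = {"query": query_xml}

# Encode the parameters as a form body; special characters are percent-escaped.
body = urlencode(params)
print(body)

# Decoding the body gives back the original xml unchanged.
assert parse_qs(body)["query"] == [query_xml]
```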

to_Node(self)

Returns a DOM node representing the query

This is an intermediate step in the creation of the xml serialised version of the query. You probably won't need to call this directly.

Returns: xml.dom.minidom.Node

to_xml(self)

Return an XML serialisation of the query

This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice.

Returns: string
the serialised xml string

to_formatted_xml(self)

Return a readable XML serialisation of the query

This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice, only more readably.

Returns: string
the serialised xml string
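The difference between to_xml and to_formatted_xml amounts to compact versus indented output. The sketch below builds a small DOM with the standard library's xml.dom.minidom and produces both forms; the element and attribute names are assumptions modelled on InterMine's path-query XML, and `build_query_document` is a hypothetical helper, not output captured from the library.

```python
from xml.dom.minidom import Document


def build_query_document(view, constraints):
    """Build a DOM for a query with a view and (path, op, value, code) constraints."""
    doc = Document()
    query = doc.createElement("query")
    query.setAttribute("model", "genomic")
    query.setAttribute("view", " ".join(view))
    for path, op, value, code in constraints:
        constraint = doc.createElement("constraint")
        constraint.setAttribute("path", path)
        constraint.setAttribute("op", op)
        constraint.setAttribute("value", value)
        constraint.setAttribute("code", code)
        query.appendChild(constraint)
    doc.appendChild(query)
    return doc


doc = build_query_document(["Gene.symbol", "Gene.length"], [("Gene.symbol", "=", "eve", "A")])
compact = doc.documentElement.toxml()             # a single line, like to_xml()
readable = doc.documentElement.toprettyxml("  ")  # indented, like to_formatted_xml()
print(compact)
print(readable)
```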

clone(self)

Performs a deep clone

This method will produce a clone that is independent, and can be altered without affecting the original, but starts off with the exact same state as it.

The only shared elements should be the model and the service, which are shared by all queries that refer to the same webservice.

Returns:
same class as caller
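A minimal sketch of a deep clone that still shares the model and the service, using copy.deepcopy with a pre-filled memo dictionary: any object already present in the memo is reused rather than copied. `ToyQuery` is a hypothetical class, not the library's implementation.

```python
import copy


class ToyQuery:
    """Hypothetical query-like object with shared and per-query state."""

    def __init__(self, model, service):
        self.model = model
        self.service = service
        self.views = []
        self.constraints = []

    def clone(self):
        # Objects listed in the memo are treated as already copied,
        # so the clone keeps references to the same model and service.
        memo = {id(self.model): self.model, id(self.service): self.service}
        return copy.deepcopy(self, memo)


original = ToyQuery(model={"Gene": {}}, service=object())
original.views.append("Gene.symbol")
cloned = original.clone()
cloned.views.append("Gene.length")

print(original.views)                      # ['Gene.symbol']
print(cloned.views)                        # ['Gene.symbol', 'Gene.length']
print(cloned.model is original.model)      # True
print(cloned.service is original.service)  # True
```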

Property Details

constraints

Returns the constraints of the query

Query.constraints → list(intermine.constraints.Constraint)

Constraints are returned in the order of their code (normally the order they were added to the query) and with any subclass constraints at the end.

Get Method:
constraints(self) - Query.constraints → list(intermine.constraints.Constraint)
Type:
list(Constraint)

coded_constraints

Returns the list of constraints that have a code

Query.coded_constraints → list(intermine.constraints.CodedConstraint)

This returns an up-to-date list of the constraints that can be used in a logic expression. At present, the only kind of constraint this excludes is SubClassConstraints.

Get Method:
coded_constraints(self) - Query.coded_constraints → list(intermine.constraints.CodedConstraint)
Type:
list(intermine.constraints.CodedConstraint)
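Both properties can be sketched over a simple list of constraint records. Here `ToyConstraint` is a hypothetical stand-in in which a code of None marks a subclass constraint; the ordering (coded constraints by code, subclass constraints last) and the filtering follow the two descriptions above, but the functions are illustrations, not the library's code.

```python
from collections import namedtuple

# Hypothetical constraint record: subclass constraints have no code.
ToyConstraint = namedtuple("ToyConstraint", "path kind code")

added = [
    ToyConstraint("Gene.symbol", "binary", "B"),
    ToyConstraint("Gene.microArrayResult", "subclass", None),
    ToyConstraint("Gene.organism.name", "binary", "A"),
]


def ordered_constraints(constraints):
    """Coded constraints in code order, then constraints without a code."""
    return sorted(constraints, key=lambda c: (c.code is None, c.code or ""))


def coded_constraints(constraints):
    """Only the constraints that can appear in a logic expression."""
    return [c for c in ordered_constraints(constraints) if c.code is not None]


print([c.code for c in ordered_constraints(added)])  # ['A', 'B', None]
print([c.path for c in coded_constraints(added)])    # ['Gene.organism.name', 'Gene.symbol']
```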