Class Query
object --+
|
Query
A Class representing a structured database query
Objects of this class have properties that model the attributes of
the query, and methods for performing the request.
SYNOPSIS
example:
>>> service = Service("http://www.flymine.org/query/service")
>>> query = service.new_query()
>>>
>>> query.add_view("Gene.symbol", "Gene.pathways.name", "Gene.proteins.symbol")
>>> query.add_sort_order("Gene.pathways.name")
>>>
>>> query.add_constraint("Gene", "LOOKUP", "eve")
>>> query.add_constraint("Gene.pathways.name", "=", "Phosphate*")
>>>
>>> query.set_logic("A or B")
>>>
>>> for row in query.results():
...     handle_row(row)
Query objects represent structured requests for information from
the database housed at the data warehouse whose webservice you are
querying. They use some concepts from relational databases
within an object-relational mapping (ORM) context. If you don't know
what that means, don't worry: you don't need to write SQL, and the
queries will be fast.
PRINCIPLES
The data model represents tables in the databases as classes, with
records within tables as instances of that class. The columns of the
database are the fields of that object:
The Gene table - showing two records/objects
+----+--------+--------+---------------+----------+
| id | symbol | length | cyto-location | organism |
+----+--------+--------+---------------+----------+
| 01 | eve    | 1539   | 46C10-46C10   | 01       |
+----+--------+--------+---------------+----------+
| 02 | zen    | 1331   | 84A5-84A5     | 01       |
+----+--------+--------+---------------+----------+
...
The organism table - showing one record/object
+----+-----------------+----------+
| id | name            | taxon id |
+----+-----------------+----------+
| 01 | D. melanogaster | 7227     |
+----+-----------------+----------+
Columns that contain a meaningful value are known as 'attributes'
(in the tables above, that is everything except the id columns). The
other columns (such as "organism" in the gene table)
reference records of other tables (i.e. other objects), and
are called references. You can refer to any field of any class that
is connected, however indirectly, to a table by using dotted path
notation:
Gene.organism.name -> the name column in the organism table, referenced by a record in the gene table
These paths, and the connections between records and tables they
represent, are the basis for the structure of InterMine queries.
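To make the path idea concrete, here is a minimal sketch that follows a dotted path through plain dictionaries standing in for the two tables above. The `resolve` helper and the dictionary layout are illustrative assumptions only; they are not part of the InterMine client, which resolves paths against the data model on the server.

```python
# Toy stand-ins for the Gene and organism tables shown above.
# These dictionaries are an assumption made for illustration only.
ORGANISMS = {1: {"name": "D. melanogaster", "taxonId": 7227}}
GENES = [
    {"symbol": "eve", "length": 1539, "organism": ORGANISMS[1]},
    {"symbol": "zen", "length": 1331, "organism": ORGANISMS[1]},
]


def resolve(record, path):
    """Follow a dotted path such as 'Gene.organism.name' from a Gene record."""
    root, *fields = path.split(".")
    if root != "Gene":
        raise ValueError("this sketch only handles paths rooted at Gene")
    value = record
    for field in fields:
        # A reference yields another record; an attribute yields a plain value.
        value = value[field]
    return value


organism_name = resolve(GENES[0], "Gene.organism.name")
zen_length = resolve(GENES[1], "Gene.length")
```

Each step before the last follows a reference, and the final step reads an attribute.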
THE STRUCTURE OF A QUERY
A query has two principal sets of properties:
- its view: the set of output columns
- its constraints: the set of rules for what to include
A query must have at least one output column in its view, but
constraints are optional - if you don't include any, you will get
back every record from the table (every object of that type).
In addition, the query must be coherent: if you have information
about an organism, and you want a list of genes, then the
"Gene" table should be the basis for your query, and as
such the Gene class, which represents this table, should be the root
of all the paths that appear in it:
So, to take a simple example:
I have an organism name, and I want a list of genes:
The view is the list of things I want to know about those
genes:
>>> query.add_view("Gene.name")
>>> query.add_view("Gene.length")
>>> query.add_view("Gene.proteins.sequence.length")
Note I can freely mix attributes and references, as long as every
view ends in an attribute (a meaningful value). As a short-cut I can
also write:
>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
or:
>>> query.add_view("Gene.name Gene.length Gene.proteins.sequence.length")
They are all equivalent.
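Why these forms are equivalent can be seen in a small sketch that flattens every accepted input into one list of paths. `flatten_views` is a hypothetical helper written for this illustration, not the library's own code; it assumes that strings may be delimited by spaces or commas and that lists may be nested.

```python
import re


def flatten_views(*paths):
    """Flatten strings, delimited strings, and lists into one list of paths."""
    views = []
    for item in paths:
        if isinstance(item, str):
            # Split on commas and/or whitespace, dropping empty pieces.
            views.extend(p for p in re.split(r"[,\s]+", item) if p)
        else:
            # Anything else is treated as a list of further items.
            views.extend(flatten_views(*item))
    return views


separate = flatten_views("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
single = flatten_views("Gene.name Gene.length Gene.proteins.sequence.length")
mixed = flatten_views(["Gene.name", "Gene.length"], "Gene.proteins.sequence.length")
```

All three calls produce the same three-element list.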
Now I can add my constraints. As we mentioned, I have information
about an organism, so:
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
If I run this query, I will get literally millions of results - it
needs to be filtered further:
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
If that doesn't restrict things enough I can add more filters:
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
Now I am guaranteed to get only information on genes I am
interested in.
Note, though, that because I have included the link (or
"join") from Gene -> Protein, this, by default, means
that I only want genes that have protein information associated with
them. If in fact I want information on all genes, and just want to
know the protein information if it is available, then I can specify
that with:
>>> query.add_join("Gene.proteins", "OUTER")
And if perhaps my query is not as simple as a strict cumulative
filter, but I want all D. mel genes that EITHER have a short protein
sequence OR come from one of my favourite genes (as unlikely as that
sounds), I can specify the logic for that too:
>>> query.set_logic("A and (B or C)")
Each letter refers to one of the constraints - the codes are
assigned in the order you add the constraints. If you want to be
absolutely certain about the constraints you mean, you can use the
constraint objects themselves:
>>> gene_is_eve = query.add_constraint("Gene.symbol", "=", "eve")
>>> gene_is_zen = query.add_constraint("Gene.symbol", "=", "zen")
>>>
>>> query.set_logic(gene_is_eve | gene_is_zen)
By default the logic is a straight cumulative filter (ie: A and B
and C and D and ...)
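The effect of the logic string can be sketched locally. The example below is only an illustration of the idea: it assumes that "and" binds more tightly than "or", uses a made-up gene record, and evaluates the expression with a tiny hand-written parser. The real evaluation happens on the server, and the library's own parser is not shown here.

```python
import re


def evaluate(logic, truth):
    """Evaluate a logic string such as "A and (B or C)" given {code: bool}."""
    tokens = re.findall(r"\(|\)|\w+", logic)
    pos = 0

    def atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = either()
            pos += 1  # step over the closing ")"
            return value
        return truth[token]

    def both():
        nonlocal pos
        value = atom()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            right = atom()
            value = value and right
        return value

    def either():
        nonlocal pos
        value = both()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            right = both()
            value = value or right
        return value

    return either()


# A made-up record: right organism, short protein, not a favourite gene.
gene = {"organism": "D. melanogaster", "protein_length": 376, "symbol": "xyz"}
tests = [
    lambda g: g["organism"] == "D. melanogaster",   # code A
    lambda g: g["protein_length"] < 500,            # code B
    lambda g: g["symbol"] in ("eve", "zen", "h"),   # code C
]
codes = "ABC"  # codes are handed out in the order constraints were added
truth = {code: test(gene) for code, test in zip(codes, tests)}

default_logic = " and ".join(codes)
kept_by_default = evaluate(default_logic, truth)
kept_by_custom = evaluate("A and (B or C)", truth)
```

With the default cumulative filter this record is rejected because it fails C; with "A and (B or C)" it is kept because it passes A and B.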
Putting it all together:
>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
>>> query.add_join("Gene.proteins", "OUTER")
>>> query.set_logic("A and (B or C)")
And the query is defined.
Result Processing
Calling ".results()" on a query will return an iterator
of rows, where each row is a list of values, one for each field in
the output columns (view) you selected.
To process these, simply use normal iteration syntax:
>>> for row in query.results():
...     for column in row:
...         do_something(column)
Here each row will have a gene name, a gene length, and a sequence
length, eg:
>>> print(row)
["even skipped", "1359", "376"]
To make that clearer, you can ask for a dictionary instead of a
list:
>>> for row in query.results("dict"):
...     print(row)
{"Gene.name":"even skipped","Gene.length":"1359","Gene.proteins.sequence.length":"376"}
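The relationship between the two row formats can be sketched without a service: a dictionary row is simply the view paired with the values of a list row. The variable names below are illustrative, and the values are the ones from the example row above.

```python
# The view, in the order the columns were added to the query.
view = ["Gene.name", "Gene.length", "Gene.proteins.sequence.length"]

# One row as it would appear in the default "list" format.
list_row = ["even skipped", "1359", "376"]

# Pairing each path with its value gives the "dict" form of the same row.
dict_row = dict(zip(view, list_row))
gene_name = dict_row["Gene.name"]
```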
Which means you can refer to columns by name:
>>> for row in query.results("dict"):
...     print("name is", row["Gene.name"])
...     print("length is", row["Gene.length"])
If you just want the raw results, for printing to a file, or for
piping to another program, you can request strings instead:
>>> for row in query.results("string"):
...     print(row)
Getting us to Generate your Code
Not that you have to actually write any of this! The webapp will
happily generate the code for any query (and template) you can build
in it. A good way to get started is to use the webapp to generate
your code, and then run it as scripts to speed up your queries. You
can always tinker with and edit the scripts you download.
To get generated queries, look for the "python" link at
the bottom of query-builder and template form pages. It looks a bit
like this:
+=====================================+
|                                     |
| Perl | Python | Java [Help]         |
|                                     |
+=====================================+
__init__(self, model, service=None, validate=True)
Construct a new query for making database queries against an
InterMine data warehouse.
verify(self)
Invalid queries will fail to run, and it is not always obvious why.
validate_logic(self, logic=None)
Attempts to validate the logic by checking that every
coded_constraint is included at least once.
get_subclass_dict(self) -> dict(string, string)
This method returns a mapping of classes used by the model for
assessing whether certain paths are valid.
get_results_list(self, rowformat='list') -> list
This method is a shortcut so that you do not have to do a list
comprehension yourself on the iterator that is normally returned.
to_Node(self) -> xml.minidom.Node
This is an intermediate step in the creation of the xml serialised
version of the query.
to_xml(self) -> string
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice.
to_formatted_xml(self) -> string
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice, only more readably.
clone(self)
This method will produce a clone that is independent, and can be
altered without affecting the original, but starts off with the exact
same state as it.
Inherited from object: __delattr__, __format__, __getattribute__,
__hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__,
__sizeof__, __str__, __subclasshook__
|
__init__(self, model, service=None, validate=True)
(Constructor)
|
Construct a new Query
Construct a new query for making database queries against an
InterMine data warehouse.
Normally you would not need to use this constructor directly, but
instead use the factory method on intermine.webservice.Service, which
will handle construction for you.
- Parameters:
model - an instance of intermine.model.Model. Required
service - an instance of intermine.webservice.Service. Optional, but you
will not be able to make requests without one.
validate - a boolean - defaults to True. If set to False, the query will not
try to validate itself. You should not set this to False.
- Overrides:
object.__init__
|
from_xml(cls, xml, *args, **kwargs)
Class Method
|
Deserialise a query serialised to XML
This method is used to instantiate serialised queries. It is used by
intermine.webservice.Service objects to instantiate Template objects
and it can be used to read in queries you have saved to a file.
- Parameters:
xml - The xml as a file name, url, or string
- Returns: Query
- Raises:
|
verify(self)
|
Validate the query
Invalid queries will fail to run, and it is not always obvious why.
The validation routine checks to see that the query will not cause
errors on execution, and tries to provide informative error
messages.
This method is called immediately after a query is fully
deserialised.
- Raises:
|
Add one or more views to the list of output columns
example:
query.add_view("Gene.name Gene.organism.name")
This is the main method for adding views to the list of output
columns. As well as appending views, it will also split a single, space
or comma delimited string into multiple paths, and flatten out lists,
or any combination. It will also immediately try to validate the
views.
Output columns must be valid paths according to the data model, and
they must represent attributes of tables.
- See Also:
-
intermine.model.Model,
intermine.model.Path,
intermine.model.Attribute
|
Check to see if the views given are valid
This method checks to see if the views:
- are valid according to the model
- represent attributes
- Raises:
|
Add a constraint (filter on records)
example:
query.add_constraint("Gene.symbol", "=", "zen")
This method will try to make a constraint from the arguments given,
trying each of the classes it knows of in turn to see if they accept
the arguments. This allows you to add constraints of different types
without having to know or care what their classes or implementation
details are. All constraints derive from
intermine.constraints.Constraint, and they all have a path attribute,
but are otherwise diverse.
Before adding the constraint to the query, this method will also try
to check that the constraint is valid by calling
Query.verify_constraint_paths()
- Returns: intermine.constraints.Constraint
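The "try each class in turn" approach described above can be sketched as a small factory. The two classes below are simplified stand-ins that borrow the names BinaryConstraint and MultiConstraint from this page; their operator sets and constructor checks are assumptions made for illustration and do not reproduce the real classes in intermine.constraints.

```python
class BinaryConstraint:
    """Stand-in: a path, a comparison operator, and a single value."""
    OPS = {"=", "!=", "<", ">", "<=", ">="}  # assumed operator set

    def __init__(self, path, op, value):
        if op not in self.OPS:
            raise TypeError("%r is not a binary operator" % op)
        self.path, self.op, self.value = path, op, value


class MultiConstraint:
    """Stand-in: a path, a set operator, and a list of values."""
    OPS = {"ONE OF", "NONE OF"}  # assumed operator set

    def __init__(self, path, op, values):
        if op not in self.OPS or not isinstance(values, list):
            raise TypeError("%r needs a list of values" % op)
        self.path, self.op, self.values = path, op, values


def make_constraint(*args):
    """Return the first kind of constraint that accepts the arguments."""
    for cls in (BinaryConstraint, MultiConstraint):
        try:
            return cls(*args)
        except TypeError:
            continue  # wrong operator or wrong number of arguments
    raise TypeError("no constraint class accepts %r" % (args,))


single = make_constraint("Gene.symbol", "=", "zen")
several = make_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
```

The caller never names a constraint class; the shape of the arguments decides which one is built.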
|
Check that the constraints are valid
This method will check the path attribute of each constraint. In
addition it will:
- Check that BinaryConstraints and MultiConstraints have an Attribute
as their path
- Check that TernaryConstraints have a Reference as theirs
- Check that SubClassConstraints have a correct subclass relationship
- Check that LoopConstraints have a valid loopPath, of a compatible
type
- Check that ListConstraints refer to an object
- Parameters:
cons - The constraints to check (defaults to all constraints on the
query)
- Raises:
|
Returns the constraint with the given code
Returns the constraint with the given code, if it exists. If no such
constraint exists, it raises a ConstraintError.
- Returns: intermine.constraints.CodedConstraint
- the constraint corresponding to the given code
|
Add a join statement to the query
example:
query.add_join("Gene.proteins", "OUTER")
A join statement determines whether a reference restricts the
result set to records for which the referenced objects exist. For example,
if one had a query with the view:
"Gene.name", "Gene.proteins.name"
Then in the normal case (that of an INNER join), we would only get
Genes that also have at least one protein that they reference. Simply
by asking for this output column you are placing a restriction on the
information you get back.
If in fact you wanted all genes, regardless of whether they had
proteins associated with them or not, but if they did you would rather
like to know _what_ proteins, then you need to specify this reference
to be an OUTER join:
query.add_join("Gene.proteins", "OUTER")
Now you will get many more rows of results, some of which will have
"null" values where the protein name would have been.
This method will also attempt to validate the join by calling
Query.verify_join_paths(). A join must have a valid path that is a
reference, and its style must be either INNER or OUTER. The style
defaults to OUTER, since all references start out as inner joins and
so the user never needs to specify an inner join.
- Returns: intermine.pathfeatures.Join
- Raises:
ModelError - if the path is invalid
TypeError - if the join style is invalid
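The difference between the two join styles can be sketched with plain Python lists. The gene and protein names below are made up, and `join_rows` is a hypothetical helper for illustration; the real join is performed by the webservice, not by the client.

```python
# Made-up data: one gene with two proteins, one gene with none.
GENES = [
    {"name": "gene-a", "proteins": ["protein-1", "protein-2"]},
    {"name": "gene-b", "proteins": []},
]


def join_rows(genes, style="OUTER"):
    """Build rows for the view "Gene.name", "Gene.proteins.name"."""
    if style not in ("INNER", "OUTER"):
        raise TypeError("join style must be INNER or OUTER")
    rows = []
    for gene in genes:
        for protein in gene["proteins"]:
            rows.append([gene["name"], protein])
        if not gene["proteins"] and style == "OUTER":
            # An outer join keeps the gene and leaves the protein column empty.
            rows.append([gene["name"], None])
    return rows


inner = join_rows(GENES, "INNER")
outer = join_rows(GENES, "OUTER")
```

Here `inner` contains only rows for gene-a, while `outer` also contains a row for gene-b with no protein name.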
|
Check that the joins are valid
Joins must have valid paths, and they must refer to references.
- Raises:
|
add_path_description(self, *args, **kwargs)
|
Add a path description to the query
example:
query.add_path_description("Gene.symbol", "The symbol for this gene")
If you wish you can add annotations to your query that describe what
the component paths are and what they do - this is only really useful
if you plan to keep your query (perhaps as xml) or store it as a
template.
- Returns: intermine.pathfeatures.PathDescription
|
Check that the path of the path description is valid
Checks for consistency with the data model
- Raises:
|
Returns the logic expression for the query
This returns the up to date logic expression. The default value is
the representation of all coded constraints ANDed together.
The LogicGroup object stringifies to a string that can be parsed to
obtain itself (eg: "A and (B or C or D)").
- Returns: intermine.constraints.LogicGroup
|
Sets the Logic given the appropriate input
example:
query.set_logic("A and (B or C)")
This sets the logic to the appropriate value. If the value is
already a LogicGroup, it is accepted, otherwise the string is tokenised
and parsed.
The logic is then validated with a call to validate_logic()
- Raises:
LogicParseError - if there is a syntax error in the logic
|
validate_logic(self, logic=None)
|
Validates the query logic
Attempts to validate the logic by checking that every
coded_constraint is included at least once
- Raises:
QueryError - if not every coded constraint is represented
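The check described here amounts to a set difference between the codes in use and the codes mentioned in the logic. The sketch below is an assumption-laden illustration: `check_logic` is a hypothetical helper, and the QueryError class is a local stand-in for the error type named above rather than the library's own.

```python
import re


class QueryError(Exception):
    """Local stand-in for the error type named in the Raises section."""


def check_logic(logic, codes):
    """Raise QueryError unless every code appears in the logic string."""
    used = set(re.findall(r"\w+", logic)) - {"and", "or"}
    missing = sorted(set(codes) - used)
    if missing:
        raise QueryError("constraints missing from logic: " + ", ".join(missing))
    return True


complete = check_logic("A and (B or C)", ["A", "B", "C"])

try:
    check_logic("A and B", ["A", "B", "C"])
    problem = None
except QueryError as error:
    problem = str(error)
```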
|
add_sort_order(self, path, direction='asc')
|
Adds a sort order to the query
example:
query.add_sort_order("Gene.name", "DESC")
This method adds a sort order to the query. A query can have
multiple sort orders, which are assessed in sequence.
If a query has two sort-orders, for example, the first being
"Gene.organism.name asc", and the second being
"Gene.name desc", you would have the list of genes grouped by
organism, with the lists within those groupings in reverse alphabetical
order by gene name.
This method will try to validate the sort order by calling
validate_sort_order()
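The grouping effect of two sort orders can be reproduced locally with Python's stable sort, applying the least significant order first. The rows and gene names below are made up for illustration, and `apply_sort_orders` is a hypothetical helper; in practice the webservice sorts the results before returning them.

```python
# Made-up result rows in "dict" format.
ROWS = [
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "alpha"},
    {"Gene.organism.name": "C. elegans", "Gene.name": "beta"},
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "gamma"},
    {"Gene.organism.name": "C. elegans", "Gene.name": "delta"},
]


def apply_sort_orders(rows, sort_orders):
    """Sort rows by several (path, direction) pairs, first pair most significant."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last order first and the
    # first order last leaves rows grouped by the first order.
    for path, direction in reversed(sort_orders):
        result.sort(key=lambda row: row[path], reverse=(direction.lower() == "desc"))
    return result


ordered = apply_sort_orders(
    ROWS, [("Gene.organism.name", "asc"), ("Gene.name", "desc")]
)
names = [row["Gene.name"] for row in ordered]
```

The genes come back grouped by organism, with names in reverse alphabetical order inside each group.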
|
Check the validity of the sort order
Checks that the sort order paths are:
- Raises:
|
get_subclass_dict(self)
|
Return the current mapping of class to subclass
This method returns a mapping of classes used by the model for
assessing whether certain paths are valid. For instance, if you subclass
MicroArrayResult to be FlyAtlasResult, you can refer to the
.presentCall attributes of fly atlas results. MicroArrayResults do not
have this attribute, and a path such as:
Gene.microArrayResult.presentCall
would be marked as invalid unless the dictionary is provided.
Users most likely will not need to ever call this method.
- Returns: dict(string, string)
|
Return an iterator over result rows
Usage:
for row in query.results():
    do_sth_with(row)
- Parameters:
row (string) - the format for the row. Defaults to "list". Valid
options are "dict", "list",
"jsonrows", "jsonobject", "tsv",
"csv".
- Returns: intermine.webservice.ResultIterator
- Raises:
|
Returns the path section pointing to the REST resource
Query.get_results_path() -> str
Internally, this just calls a constant property in
intermine.webservice.Service
- Returns: str
|
get_results_list(self, rowformat='list')
|
Get a list of result rows
This method is a shortcut so that you do not have to do a list
comprehension yourself on the iterator that is normally returned. If
you have a very large result set (in the millions of rows) you will not
want to have the whole list in memory at once, but there may be other
circumstances when you might want to keep the whole list in one
place.
- Parameters:
rowformat (string) - the format for the row. Defaults to "list". Valid
options are "dict", "list",
"jsonrows", "jsonobject", "tsv",
"csv".
- Returns: list
- Raises:
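The trade-off between the iterator and the list can be sketched with an ordinary generator. `fake_results` below is a made-up stand-in for the iterator of rows; it does not contact any webservice, and its rows are invented.

```python
def fake_results(count):
    """Stand-in generator for an iterator of result rows (made-up data)."""
    for index in range(count):
        yield ["gene-%d" % index, str(1000 + index)]


# Streaming: only one row is held in memory at a time.
longest = max(int(row[1]) for row in fake_results(5))

# Shortcut equivalent: build the whole list at once, which is what
# a list comprehension over the iterator would give you.
all_rows = [row for row in fake_results(5)]
```

The list form is convenient for indexing and repeated passes; the iterator form is the safer choice for very large result sets.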
|
Returns the child objects of the query
This method is used during the serialisation of queries to xml. It
is unlikely you will need access to this as a whole. Consider using
"path_descriptions", "joins",
"constraints" instead
- Returns: list
- the child elements of this query
- See Also:
-
Query.path_descriptions,
Query.joins,
Query.constraints
|
Returns the parameters to be passed to the webservice
The query is responsible for producing its own query parameters.
These consist simply of:
- query: the xml representation of the query
- Returns: dict
|
to_Node(self)
|
Returns a DOM node representing the query
This is an intermediate step in the creation of the xml serialised
version of the query. You probably won't need to call this
directly.
- Returns: xml.minidom.Node
|
to_xml(self)
|
Return an XML serialisation of the query
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice.
- Returns: string
- the serialised xml string
|
to_formatted_xml(self)
|
Return a readable XML serialisation of the query
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice, only more readably.
- Returns: string
- the serialised xml string
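The difference between the compact and the readable serialisation can be sketched with the standard library's xml.dom.minidom. The element and attribute names used here ("query", "constraint", "view", "path", "op", "value") are assumptions chosen for illustration and should not be read as the exact InterMine query schema; `build_node` is a hypothetical helper.

```python
from xml.dom import minidom


def build_node(views, constraints):
    """Build a DOM element for a toy query (assumed element names)."""
    document = minidom.Document()
    query = document.createElement("query")
    query.setAttribute("view", " ".join(views))
    for path, op, value in constraints:
        constraint = document.createElement("constraint")
        constraint.setAttribute("path", path)
        constraint.setAttribute("op", op)
        constraint.setAttribute("value", value)
        query.appendChild(constraint)
    return query


node = build_node(
    ["Gene.name", "Gene.length"],
    [("Gene.symbol", "=", "eve")],
)
compact = node.toxml()                    # one line, for sending
readable = node.toprettyxml(indent="  ")  # indented, for reading
```

Both strings describe the same query; only the whitespace differs.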
|
clone(self)
|
Performs a deep clone
This method will produce a clone that is independent, and can be
altered without affecting the original, but starts off with the exact
same state as it.
The only shared elements should be the model and the service, which
are shared by all queries that refer to the same webservice.
- Returns:
- same class as caller
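The cloning behaviour described above (independent state, shared model and service) can be sketched with a toy class. ToyQuery is a hypothetical stand-in written for this illustration; its attributes are assumptions and it is not the library's Query.

```python
import copy


class ToyQuery:
    """Minimal stand-in for a query with a view and constraints."""

    def __init__(self, model, service=None):
        self.model = model
        self.service = service
        self.views = []
        self.constraints = []

    def clone(self):
        # Same class as the caller; the model and service are shared.
        twin = self.__class__(self.model, self.service)
        # Everything else is deep-copied so the two can diverge.
        twin.views = copy.deepcopy(self.views)
        twin.constraints = copy.deepcopy(self.constraints)
        return twin


original = ToyQuery(model=object())
original.views.append("Gene.name")

twin = original.clone()
twin.views.append("Gene.length")  # does not touch the original
```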
|
constraints
Returns the constraints of the query
Query.constraints → list(intermine.constraints.Constraint)
Constraints are returned in the order of their code (normally the
order they were added to the query) and with any subclass constraints at
the end.
- Type:
- list(Constraint)
|
coded_constraints
Returns the list of constraints that have a code
Query.coded_constraints →
list(intermine.constraints.CodedConstraint)
This returns an up to date list of the constraints that can be used
in a logic expression. The only kind of constraint that this excludes,
at present, is SubClassConstraints.
- Type:
- list(intermine.constraints.CodedConstraint)
|