Class Query
object --+
|
Query
A Class representing a structured database query
================================================
Objects of this class have properties that model the
attributes of the query, and methods for performing
the request.
SYNOPSIS
--------
example:
>>> service = Service("http://www.flymine.org/query/service")
>>> query = service.new_query()
>>>
>>> query.add_view("Gene.symbol", "Gene.pathways.name", "Gene.proteins.symbol")
>>> query.add_sort_order("Gene.pathways.name")
>>>
>>> query.add_constraint("Gene", "LOOKUP", "eve")
>>> query.add_constraint("Gene.pathways.name", "=", "Phosphate*")
>>>
>>> query.set_logic("A or B")
>>>
>>> for row in query.results():
...     handle_row(row)
OR, using an SQL style DSL:
>>> s = Service("www.flymine.org/query")
>>> query = s.query("Gene").\
...     select("*", "pathways.*").\
...     where("symbol", "=", "H").\
...     outerjoin("pathways").\
...     order_by("symbol")
>>> for row in query.results():
...     handle_row(row)
OR, for a more SQLAlchemy-like, ORM style:
>>>
>>> for gene in s.query(s.model.Gene).filter(s.model.Gene.symbol == ["zen", "H", "eve"]).add_columns(s.model.Gene.alleles):
...     handle(gene)
Query objects represent structured requests for information over the database
housed at the data warehouse whose webservice you are querying. They use
some of the concepts of relational databases, within an object-relational
mapping (ORM) context. If you don't know what that means, don't worry: you
don't need to write SQL, and the queries will be fast.
To make things slightly more familiar to those with knowledge of SQL, some syntactic
sugar is provided to make constructing queries a bit more recognisable.
PRINCIPLES
----------
The data model represents tables in the databases as classes, with records
within tables as instances of that class. The columns of the database are the
fields of that object::
The Gene table - showing two records/objects
+----+--------+--------+---------------+----------+
| id | symbol | length | cyto-location | organism |
+----+--------+--------+---------------+----------+
| 01 | eve    | 1539   | 46C10-46C10   | 01       |
+----+--------+--------+---------------+----------+
| 02 | zen    | 1331   | 84A5-84A5     | 01       |
+----+--------+--------+---------------+----------+
...
The organism table - showing one record/object
+----+-----------------+----------+
| id | name            | taxon id |
+----+-----------------+----------+
| 01 | D. melanogaster | 7227     |
+----+-----------------+----------+
Columns that contain a meaningful value are known as 'attributes' (in the tables above, that is
everything except the id columns). The other columns (such as "organism" in the gene table)
reference records of other tables (i.e. other objects), and are called
references. You can refer to any field in any class that has a connection,
however indirect, with the root table by using dotted path notation::
Gene.organism.name -> the name column in the organism table, referenced by a record in the gene table
These paths, and the connections between records and tables they represent,
are the basis for the structure of InterMine queries.
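The dotted path idea can be sketched in plain Python. This is only a minimal illustration over in-memory dictionaries that mirror the two tables above; `GENES`, `ORGANISMS` and `resolve_path` are hypothetical names and are not part of the InterMine client.

```python
# Toy stand-ins for the Gene and Organism tables shown above.
ORGANISMS = {1: {"name": "D. melanogaster", "taxonId": 7227}}
GENES = {
    1: {"symbol": "eve", "length": 1539, "organism": ORGANISMS[1]},
    2: {"symbol": "zen", "length": 1331, "organism": ORGANISMS[1]},
}


def resolve_path(root_name, record, path):
    """Follow a dotted path such as "Gene.organism.name" from a record.

    Every part except the last is a reference to another record;
    the last part is an attribute holding a plain value.
    """
    parts = path.split(".")
    if parts[0] != root_name:
        raise ValueError("path must start at %s" % root_name)
    value = record
    for field in parts[1:]:
        value = value[field]
    return value


print(resolve_path("Gene", GENES[1], "Gene.organism.name"))
```

In a real query the webservice does this traversal for you; the sketch only shows why every path starts at the root class and ends in an attribute.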
THE STRUCTURE OF A QUERY
------------------------
A query has two principal sets of properties:
- its view: the set of output columns
- its constraints: the set of rules for what to include
A query must have at least one output column in its view, but constraints
are optional - if you don't include any, you will get back every record
from the table (every object of that type).
In addition, the query must be coherent: if you have information about
an organism, and you want a list of genes, then the "Gene" table
should be the basis for your query, and as such the Gene class, which
represents this table, should be the root of all the paths that appear in it:
So, to take a simple example::
I have an organism name, and I want a list of genes:
The view is the list of things I want to know about those genes:
>>> query.add_view("Gene.name")
>>> query.add_view("Gene.length")
>>> query.add_view("Gene.proteins.sequence.length")
Note I can freely mix attributes and references, as long as every view ends in
an attribute (a meaningful value). As a short-cut I can also write:
>>> query.add_views("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
or:
>>> query.add_views("Gene.name Gene.length Gene.proteins.sequence.length")
They are all equivalent. You can also use common SQL style shortcuts such as "*" for all
attribute fields:
>>> query.add_views("Gene.*")
You can also use "select" as a synonym for "add_view".
Now I can add my constraints. As mentioned, I have information about an organism, so:
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
(note, here I can use "where" as a synonym for "add_constraint")
If I run this query, I will get literally millions of results -
it needs to be filtered further:
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
If that doesn't restrict things enough I can add more filters:
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
Now I am guaranteed to get only information on genes I am interested in.
Note, though, that because I have included the link (or "join") from Gene -> Protein,
this, by default, means that I only want genes that have protein information associated
with them. If in fact I want information on all genes, and just want to know the
protein information if it is available, then I can specify that with:
>>> query.add_join("Gene.proteins", "OUTER")
And if perhaps my query is not as simple as a strict cumulative filter, but I want all
D. mel genes that EITHER have a short protein sequence OR come from one of my favourite genes
(as unlikely as that sounds), I can specify the logic for that too:
>>> query.set_logic("A and (B or C)")
Each letter refers to one of the constraints - the codes are assigned in the order you add
the constraints. If you want to be absolutely certain about the constraints you mean, you
can use the constraint objects themselves:
>>> gene_is_eve = query.add_constraint("Gene.symbol", "=", "eve")
>>> gene_is_zen = query.add_constraint("Gene.symbol", "=", "zen")
>>>
>>> query.set_logic(gene_is_eve | gene_is_zen)
By default the logic is a straight cumulative filter (i.e. A and B and C and D and ...)
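As a rough sketch of that default, the codes can simply be joined with "and". `default_logic` is a hypothetical helper, and the assumption that codes are consecutive capital letters assigned in order is taken from the description above rather than from the library source.

```python
import string


def default_logic(number_of_constraints):
    """Return "A and B and C ..." for the given number of constraints."""
    # Assumption: codes are single capital letters, assigned in order.
    codes = string.ascii_uppercase[:number_of_constraints]
    return " and ".join(codes)


print(default_logic(3))
```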
Putting it all together:
>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
>>> query.add_join("Gene.proteins", "OUTER")
>>> query.set_logic("A and (B or C)")
This can be made more concise and readable with a little DSL sugar:
>>> query = service.query("Gene")
>>> query.select("name", "length", "proteins.sequence.length").\
...     where('organism.name', '=', 'D. melanogaster').\
...     where("proteins.sequence.length", "<", 500).\
...     where('symbol', 'ONE OF', ['eve', 'h', 'zen']).\
...     outerjoin('proteins').\
...     set_logic("A and (B or C)")
And the query is defined.
Result Processing
-----------------
Calling ".results()" on a query will return an iterator of rows, where each row
is a ResultRow object, which can be treated as both a list and a dictionary.
This means you can refer to columns by name:
>>> for row in query.results():
...     print("name is %s" % (row["name"]))
...     print("length is %d" % (row["length"]))
As well as using list indices:
>>> for row in query.results():
...     print("The first column is %s" % (row[0]))
Iterating over a row iterates over the cell values as a list:
>>> for row in query.results():
...     for column in row:
...         do_something(column)
Here each row will have a gene name, a gene length, and a sequence length, eg:
>>> print(row.to_l)
["even skipped", "1359", "376"]
To make that clearer, you can ask for a dictionary instead of a list:
>>> for row in query.results():
...     print(row.to_d)
{"Gene.name":"even skipped","Gene.length":"1359","Gene.proteins.sequence.length":"376"}
If you just want the raw results, for printing to a file, or for piping to another program,
you can request strings instead:
>>> for row in query.results("string"):
...     print(row)
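The list-and-dictionary behaviour of a row described in this section can be sketched with a small class. `ToyRow` is a hypothetical stand-in written for illustration; how the real ResultRow resolves column names is an assumption here.

```python
class ToyRow(object):
    """A row that can be read like a list or like a dictionary."""

    def __init__(self, views, values):
        self.views = list(views)
        self.values = list(values)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.values[key]
        # Assumption: accept the full path, or the path without its root class.
        for index, view in enumerate(self.views):
            if key == view or key == view.partition(".")[2]:
                return self.values[index]
        raise KeyError(key)

    def __iter__(self):
        return iter(self.values)

    def __len__(self):
        return len(self.values)


row = ToyRow(["Gene.name", "Gene.length"], ["even skipped", 1359])
print(row["name"], row[1], list(row))
```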
Getting us to Generate your Code
--------------------------------
Not that you have to actually write any of this! The webapp will happily
generate the code for any query (and template) you can build in it. A good way to get
started is to use the webapp to generate your code, and then run it as scripts
to speed up your queries. You can always tinker with and edit the scripts you download.
To get generated queries, look for the "python" link at the bottom of query-builder and
template form pages; it looks a bit like this::
. +=====================================+=============
  |                                     |
  |    Perl  |  Python  |  Java [Help]  |
  |                                     |
  +==============================================
__init__(self, model, service=None, validate=True, root=None)
  Construct a new query for making database queries against an InterMine data warehouse.
verify(self)
  Invalid queries will fail to run, and it is not always obvious why.
clear_view(self)
  Deletes all entries currently in the view list.
where(self, *args, **kwargs)
  In contrast to add_constraint, this method also adds all attributes to the query if no view has been set, and returns self to support method chaining.
outerjoin(self, column)
  Alias for add_join(column, "OUTER")
validate_logic(self, logic=None)
  Attempts to validate the logic by checking that every coded_constraint is included at least once
get_subclass_dict(self) -> dict(string, string)
  This method returns a mapping of classes used by the model for assessing whether certain paths are valid.
one(self, row='rr')
  Return one result, and raise an error if the result size is not 1
first(self, row='rr', start=0)
  Return the first result, or None if the results are empty
get_results_list(self, *args, **kwargs)
  This method is a shortcut so that you do not have to do a list comprehension yourself on the iterator that is normally returned.
count(self) -> int
  Obtain the number of rows a particular query will return, without having to fetch and parse all the actual data.
to_Node(self) -> xml.minidom.Node
  This is an intermediate step in the creation of the xml serialised version of the query.
to_xml(self) -> string
  This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice.
to_formatted_xml(self) -> string
  This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice, only more readably.
clone(self)
  This method will produce a clone that is independent, and can be altered without affecting the original, but starts off with the exact same state as it.
Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__
SO_SPLIT_PATTERN = re.compile(r'(?i)\s*(asc|desc)\s*')
LOGIC_SPLIT_PATTERN = re.compile(r'(?i)\s*(?:and|or|\(|\))\s*')
TRAILING_OP_PATTERN = re.compile(r'(?i)\s*(and|or)\s*$')
LEADING_OP_PATTERN = re.compile(r'(?i)^\s*(and|or)\s*')
LOGIC_OPS = ['and', 'or']
LOGIC_PRODUCT = [('and', 'and'), ('and', 'or'), ('or', 'and'), ...
x = 'or'
y = 'or'
__init__(self, model, service=None, validate=True, root=None)
(Constructor)
Construct a new Query
Construct a new query for making database queries against an
InterMine data warehouse.
Normally you would not need to use this constructor directly, but
instead use the factory method on intermine.webservice.Service, which
will handle construction for you.
- Parameters:
model - an instance of intermine.model.Model. Required.
service - an instance of intermine.webservice.Service. Optional, but you
will not be able to make requests without one.
validate - a boolean, defaults to True. If set to False, the query will not
try to validate itself. You should not set this to False.
- Overrides:
object.__init__
|
from_xml(cls, xml, *args, **kwargs)
Class Method
Deserialise a query serialised to XML
This method is used to instantiate serialised queries. It is used by
intermine.webservice.Service objects to instantiate Template objects
and it can be used to read in queries you have saved to a file.
- Parameters:
xml - The xml as a file name, url, or string
- Returns: Query
- Raises:
|
__str__(self)
(Informal representation operator)
str(x)
- Overrides:
object.__str__
- (inherited documentation)
|
Validate the query
Invalid queries will fail to run, and it is not always obvious why.
The validation routine checks to see that the query will not cause
errors on execution, and tries to provide informative error
messages.
This method is called immediately after a query is fully
deserialised.
- Raises:
|
Add one or more views to the list of output columns
example:
query.add_view("Gene.name Gene.organism.name")
This is the main method for adding views to the list of output
columns. As well as appending views, it will also split a single, space
or comma delimited string into multiple paths, and flatten out lists,
or any combination. It will also immediately try to validate the
views.
Output columns must be valid paths according to the data model, and
they must represent attributes of tables.
- See Also:
-
intermine.model.Model,
intermine.model.Path,
intermine.model.Attribute
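The splitting and flattening behaviour described for views can be sketched as follows. `flatten_views` is a hypothetical helper written for this illustration; it is not the library's implementation and it does no validation against a data model.

```python
import re


def flatten_views(*args):
    """Split strings on spaces or commas and flatten nested lists."""
    views = []
    for arg in args:
        if isinstance(arg, str):
            # One string may hold several space or comma delimited paths.
            views.extend(part for part in re.split(r"[\s,]+", arg) if part)
        else:
            # Anything else is treated as a (possibly nested) list of views.
            views.extend(flatten_views(*arg))
    return views


print(flatten_views("Gene.name Gene.organism.name"))
print(flatten_views("Gene.name, Gene.length", ["Gene.symbol", ["Gene.id"]]))
```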
|
Check to see if the views given are valid
This method checks to see if the views:
-
are valid according to the model
-
represent attributes
- Raises:
|
Add a constraint (filter on records)
example:
query.add_constraint("Gene.symbol", "=", "zen")
This method will try to make a constraint from the arguments given,
trying each of the classes it knows of in turn to see if they accept
the arguments. This allows you to add constraints of different types
without having to know or care what their classes or implementation
details are. All constraints derive from
intermine.constraints.Constraint, and they all have a path attribute,
but are otherwise diverse.
Before adding the constraint to the query, this method will also try
to check that the constraint is valid by calling
Query.verify_constraint_paths()
- Returns: intermine.constraints.Constraint
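The "try each class in turn" approach described above can be sketched with two toy constraint types. The class names, the operator sets and `make_constraint` are all hypothetical and only illustrate the pattern; the real constraint classes live in intermine.constraints and may accept different arguments.

```python
class ToyBinaryConstraint(object):
    """path, operator, single value: ("Gene.symbol", "=", "zen")."""
    OPS = {"=", "!=", "<", ">", "<=", ">="}

    def __init__(self, path, op, value):
        if op not in self.OPS or isinstance(value, (list, tuple)):
            raise TypeError("not a binary constraint")
        self.path, self.op, self.value = path, op, value


class ToyMultiConstraint(object):
    """path, operator, list of values: ("Gene.symbol", "ONE OF", [...])."""
    OPS = {"ONE OF", "NONE OF"}

    def __init__(self, path, op, values):
        if op not in self.OPS or not isinstance(values, (list, tuple)):
            raise TypeError("not a multi constraint")
        self.path, self.op, self.values = path, op, list(values)


CONSTRAINT_CLASSES = [ToyBinaryConstraint, ToyMultiConstraint]


def make_constraint(*args):
    """Return the first constraint type that accepts the arguments."""
    for cls in CONSTRAINT_CLASSES:
        try:
            return cls(*args)
        except TypeError:
            continue
    raise TypeError("no constraint type accepts %r" % (args,))


print(type(make_constraint("Gene.symbol", "=", "zen")).__name__)
print(type(make_constraint("Gene.symbol", "ONE OF", ["eve", "zen"])).__name__)
```

The caller never names a constraint class; the shape of the arguments decides which one is built.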
|
Check that the constraints are valid
This method will check the path attribute of each constraint. In
addition it will:
-
Check that BinaryConstraints and MultiConstraints have an Attribute
as their path
-
Check that TernaryConstraints have a Reference as theirs
-
Check that SubClassConstraints have a correct subclass relationship
-
Check that LoopConstraints have a valid loopPath, of a compatible
type
-
Check that ListConstraints refer to an object
- Parameters:
cons - The constraints to check (defaults to all constraints on the
query)
- Raises:
|
Returns the constraint with the given code
Returns the constraint with the given code, if it exists. If no such
constraint exists, it throws a ConstraintError
- Returns: intermine.constraints.CodedConstraint
- the constraint corresponding to the given code
|
Add a join statement to the query
example:
query.add_join("Gene.proteins", "OUTER")
A join statement is used to determine whether references should restrict
the result set to records for which those references exist. For example,
if one had a query with the view:
"Gene.name", "Gene.proteins.name"
Then in the normal case (that of an INNER join), we would only get
Genes that also have at least one protein that they reference. Simply
by asking for this output column you are placing a restriction on the
information you get back.
If in fact you wanted all genes, regardless of whether they had
proteins associated with them or not, but if they did you would rather
like to know _what_ proteins, then you need to specify this reference
to be an OUTER join:
query.add_join("Gene.proteins", "OUTER")
Now you will get many more rows of results, some of which will have
"null" values where the protein name would have been.
This method will also attempt to validate the join by calling
Query.verify_join_paths(). Joins must have a valid path, the style can
be either INNER or OUTER (defaults to OUTER, as the user does not need
to specify inner joins, since all references start out as inner joins),
and the path must be a reference.
- Returns: intermine.pathfeatures.Join
- Raises:
ModelError - if the path is invalid
TypeError - if the join style is invalid
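The difference between the two join styles can be sketched on toy data. `GENE_PROTEINS`, the protein name in it and `join_rows` are illustrative assumptions, not library code or real query output.

```python
# Toy data: each gene maps to the names of its proteins (possibly none).
GENE_PROTEINS = {
    "eve": ["EVE_DROME"],
    "zen": [],
}


def join_rows(gene_proteins, style="OUTER"):
    """Return (gene, protein) rows for an INNER or OUTER join."""
    if style not in ("INNER", "OUTER"):
        raise TypeError("unknown join style: %r" % style)
    rows = []
    for gene, proteins in sorted(gene_proteins.items()):
        if proteins:
            rows.extend((gene, protein) for protein in proteins)
        elif style == "OUTER":
            # Keep the gene, with a null where the protein name would be.
            rows.append((gene, None))
    return rows


print(join_rows(GENE_PROTEINS, "INNER"))
print(join_rows(GENE_PROTEINS, "OUTER"))
```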
|
Check that the joins are valid
Joins must have valid paths, and they must refer to references.
- Raises:
|
add_path_description(self, *args, **kwargs)
Add a path description to the query
example:
query.add_path_description("Gene.proteins.proteinDomains", "Protein Domain")
This allows you to alias the components of long paths to improve the
way they display column headers in a variety of circumstances. In the
above example, if the view included the unwieldy path
"Gene.proteins.proteinDomains.primaryIdentifier", it would
(depending on the mine) be displayed as "Protein Domain > DB
Identifier". These settings are taken into account by the webservice
when generating column headers for flat-file results with the
columnheaders parameter given, and are always supplied when requesting
jsontable results.
- Returns: intermine.pathfeatures.PathDescription
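A minimal sketch of how such an alias might be applied to a column header follows. `describe_path` is hypothetical; the real headers are generated by the webservice, which may also substitute display names for individual fields (as in the "DB Identifier" example above), something this sketch does not attempt.

```python
def describe_path(path, descriptions):
    """Replace the longest described prefix of a path with its alias."""
    parts = path.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in descriptions:
            return " > ".join([descriptions[prefix]] + parts[end:])
    # No description applies: fall back to the raw path components.
    return " > ".join(parts)


descriptions = {"Gene.proteins.proteinDomains": "Protein Domain"}
print(describe_path("Gene.proteins.proteinDomains.primaryIdentifier", descriptions))
```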
|
Check that the path of the path description is valid
Checks for consistency with the data model
- Raises:
|
Returns the logic expression for the query
This returns the up to date logic expression. The default value is
the representation of all coded constraints and'ed together.
If the logic is empty and there are no constraints, returns an empty
string.
The LogicGroup object stringifies to a string that can be parsed to
obtain itself (eg: "A and (B or C or D)").
- Returns: intermine.constraints.LogicGroup
|
Sets the Logic given the appropriate input
example:
Query.set_logic("A and (B or C)")
This sets the logic to the appropriate value. If the value is
already a LogicGroup, it is accepted; otherwise the string is tokenised
and parsed.
The logic is then validated with a call to validate_logic()
- Raises:
LogicParseError - if there is a syntax error in the logic
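The tokenise-and-parse step can be sketched with a small recursive parser. `tokenise` and `parse` are hypothetical helpers; the sketch gives "and" and "or" equal precedence and groups left to right, which may differ from the library, and it raises ValueError where the library would raise its own LogicParseError.

```python
import re

TOKEN_PATTERN = re.compile(r"\(|\)|[A-Za-z]+")


def tokenise(logic):
    """Split a logic string into codes, operators and brackets."""
    tokens = []
    for token in TOKEN_PATTERN.findall(logic):
        if token.lower() in ("and", "or"):
            tokens.append(token.lower())
        else:
            tokens.append(token)
    return tokens


def parse(tokens):
    """Parse tokens into nested (left, op, right) tuples."""
    tokens = list(tokens)
    position = [0]  # boxed so the nested functions can advance it

    def operand():
        if position[0] >= len(tokens):
            raise ValueError("unexpected end of logic")
        token = tokens[position[0]]
        position[0] += 1
        if token == "(":
            group = expression()
            if position[0] >= len(tokens) or tokens[position[0]] != ")":
                raise ValueError("missing closing bracket")
            position[0] += 1
            return group
        if token in ("and", "or", ")"):
            raise ValueError("unexpected token %r" % token)
        return token

    def expression():
        left = operand()
        while position[0] < len(tokens) and tokens[position[0]] in ("and", "or"):
            op = tokens[position[0]]
            position[0] += 1
            left = (left, op, operand())
        return left

    tree = expression()
    if position[0] != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


print(parse(tokenise("A and (B or C)")))
```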
|
Validates the query logic
Attempts to validate the logic by checking that every
coded_constraint is included at least once
- Raises:
QueryError - if not every coded constraint is represented
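The check described here amounts to comparing the codes mentioned in the logic with the codes of the constraints. A minimal sketch, assuming single capital-letter codes; `missing_codes` is a hypothetical helper, and the library raises QueryError rather than returning a list.

```python
import re


def missing_codes(logic, codes):
    """Return the constraint codes that the logic string never mentions."""
    # A code is a capital letter standing alone, so "A" inside "AND" is ignored.
    used = set(re.findall(r"\b[A-Z]\b", logic))
    return sorted(set(codes) - used)


print(missing_codes("A and (B or C)", ["A", "B", "C"]))
print(missing_codes("A or B", ["A", "B", "C"]))
```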
|
add_sort_order(self, path, direction='asc')
Adds a sort order to the query
example:
Query.add_sort_order("Gene.name", "DESC")
This method adds a sort order to the query. A query can have
multiple sort orders, which are assessed in sequence.
If a query has two sort-orders, for example, the first being
"Gene.organism.name asc", and the second being
"Gene.name desc", you would have the list of genes grouped by
organism, with the lists within those groupings in reverse alphabetical
order by gene name.
This method will try to validate the sort order by calling
validate_sort_order()
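How several sort orders combine can be sketched locally with Python's stable sort. The rows and `apply_sort_orders` are illustrative assumptions; in a real query the sorting is done by the webservice, not by the client.

```python
from operator import itemgetter

# Illustrative rows only; not real query output.
rows = [
    {"organism": "D. melanogaster", "name": "even skipped"},
    {"organism": "C. elegans", "name": "lin-12"},
    {"organism": "D. melanogaster", "name": "zerknullt"},
]


def apply_sort_orders(records, sort_orders):
    """Sort by several (field, direction) pairs, first pair most significant."""
    result = list(records)
    # Python's sort is stable, so apply the least significant order first.
    for field, direction in reversed(sort_orders):
        result.sort(key=itemgetter(field), reverse=(direction.lower() == "desc"))
    return result


ordered = apply_sort_orders(rows, [("organism", "asc"), ("name", "desc")])
print([(r["organism"], r["name"]) for r in ordered])
```

Genes end up grouped by organism, and within each organism they appear in reverse alphabetical order by name.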
|
Check the validity of the sort order
Checks that the sort order paths are:
- Raises:
|
Return the current mapping of class to subclass
This method returns a mapping of classes used by the model for
assessing whether certain paths are valid. For instance, if you subclass
MicroArrayResult to be FlyAtlasResult, you can refer to the
.presentCall attributes of fly atlas results. MicroArrayResults do not
have this attribute, and a path such as:
Gene.microArrayResult.presentCall
would be marked as invalid unless the dictionary is provided.
Users most likely will not need to ever call this method.
- Returns: dict(string, string)
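Why the subclass dictionary matters for path checking can be sketched with a toy model. `MODEL`, `REFERENCES` and `is_valid_path` are hypothetical, and keying the dictionary by path is an assumption made for this illustration.

```python
# Toy model: class name -> field names (illustrative only).
MODEL = {
    "Gene": {"symbol", "microArrayResult"},
    "MicroArrayResult": {"value"},
    "FlyAtlasResult": {"value", "presentCall"},
}
# Which class a reference field points to.
REFERENCES = {("Gene", "microArrayResult"): "MicroArrayResult"}


def is_valid_path(path, subclasses=None):
    """Check a dotted path, honouring any subclass constraints."""
    subclasses = subclasses or {}
    parts = path.split(".")
    current, seen = parts[0], parts[0]
    for field in parts[1:]:
        # A subclass constraint on the path so far narrows the class.
        current = subclasses.get(seen, current)
        if field not in MODEL.get(current, ()):
            return False
        seen = seen + "." + field
        current = REFERENCES.get((current, field))
    return True


path = "Gene.microArrayResult.presentCall"
print(is_valid_path(path))
print(is_valid_path(path, {"Gene.microArrayResult": "FlyAtlasResult"}))
```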
|
results(self, row='rr', start=0, size=None)
Return an iterator over result rows
Usage:
for row in query.results():
    do_sth_with(row)
- Parameters:
row (string) - the format for the row. Defaults to "rr". Valid options
are "rr", "dict", "list",
"jsonrows", "jsonobject", "tsv",
"csv".
- Returns: intermine.webservice.ResultIterator
- Raises:
|
Get a list of result rows
This method is a shortcut so that you do not have to do a list
comprehension yourself on the iterator that is normally returned. If
you have a very large result set (and these can get up to hundreds of
thousands of rows pretty easily) you will not want to have the whole
list in memory at once, but there may be other circumstances when you
might want to keep the whole list in one place.
It takes all the same arguments and parameters as Query.results.
This method is also aliased as 'all'.
See Also:
intermine.query.results
|
Return the total number of rows this query returns
Obtain the number of rows a particular query will return, without
having to fetch and parse all the actual data. This method makes a
request to the server to report the count for the query, and is sugar
for a results call.
- Returns: int
- Raises:
|
Returns the uri to use to create a list from this query
Query.get_list_upload_uri() -> str
This method is used internally when performing list operations on
queries.
- Returns: str
|
Returns the uri to use to append to a list from this query
Query.get_list_append_uri() -> str
This method is used internally when performing list operations on
queries.
- Returns: str
|
Returns the path section pointing to the REST resource
Query.get_results_path() -> str
Internally, this just calls a constant property in
intermine.webservice.Service
- Returns: str
|
Returns the child objects of the query
This method is used during the serialisation of queries to xml. It
is unlikely you will need access to this as a whole. Consider using
"path_descriptions", "joins",
"constraints" instead
- Returns: list
- the child elements of this query
- See Also:
-
Query.path_descriptions,
Query.joins,
Query.constraints
|
Returns the parameters to be passed to the webservice
The query is responsible for producing its own query parameters.
These consist simply of:
-
query: the xml representation of the query
- Returns: dict
|
Returns a DOM node representing the query
This is an intermediate step in the creation of the xml serialised
version of the query. You probably won't need to call this
directly.
- Returns: xml.minidom.Node
|
Return an XML serialisation of the query
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice.
- Returns: string
- the serialised xml string
|
Return a readable XML serialisation of the query
This method serialises the current state of the query to an xml
string, suitable for storing, or sending over the internet to the
webservice, only more readably.
- Returns: string
- the serialised xml string
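The three serialisation steps (DOM node, compact string, readable string) can be sketched with the standard library's minidom. `toy_query_node` is hypothetical, and the element and attribute names are illustrative only; they are not claimed to match the InterMine query schema.

```python
from xml.dom import minidom


def toy_query_node(views, constraint):
    """Build a DOM element for a query (illustrative schema only)."""
    doc = minidom.Document()
    node = doc.createElement("query")
    node.setAttribute("view", " ".join(views))
    path, op, value = constraint
    child = doc.createElement("constraint")
    child.setAttribute("path", path)
    child.setAttribute("op", op)
    child.setAttribute("value", value)
    node.appendChild(child)
    return node


node = toy_query_node(["Gene.name", "Gene.length"], ("Gene.symbol", "=", "zen"))
compact = node.toxml()                      # single line, for sending
readable = node.toprettyxml(indent="  ")    # indented, for reading
print(compact)
print(readable)
```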
|
Performs a deep clone
This method will produce a clone that is independent, and can be
altered without affecting the original, but starts off with the exact
same state as it.
The only shared elements should be the model and the service, which
are shared by all queries that refer to the same webservice.
- Returns:
- same class as caller
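The "independent state, shared model and service" rule can be sketched as below. `ToyQuery` is a hypothetical stand-in with only two pieces of state; it is not the library's implementation.

```python
import copy


class ToyQuery(object):
    """Holds per-query state plus a shared model and service."""

    def __init__(self, model, service):
        self.model = model
        self.service = service
        self.views = []
        self.constraints = []

    def clone(self):
        """Copy the query state deeply, but share model and service."""
        newobj = self.__class__(self.model, self.service)
        newobj.views = copy.deepcopy(self.views)
        newobj.constraints = copy.deepcopy(self.constraints)
        return newobj


original = ToyQuery(model={"name": "genomic"}, service=object())
original.views.append("Gene.symbol")
duplicate = original.clone()
duplicate.views.append("Gene.length")
print(original.views, duplicate.views)
```

Changing the clone's views leaves the original untouched, while both objects still point at the same model and service.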
|
LOGIC_PRODUCT
- Value:
[('and', 'and'), ('and', 'or'), ('or', 'and'), ('or', 'or')]
|
|
constraints
Returns the constraints of the query
Query.constraints → list(intermine.constraints.Constraint)
Constraints are returned in the order of their code (normally the
order they were added to the query) and with any subclass constraints at
the end.
- Get Method:
- unreachable.constraints(self)
- Query.constraints → list(intermine.constraints.Constraint)
- Type:
- list(Constraint)
|
coded_constraints
Returns the list of constraints that have a code
Query.coded_constraints →
list(intermine.constraints.CodedConstraint)
This returns an up to date list of the constraints that can be used
in a logic expression. The only kind of constraint that this excludes,
at present, is SubClassConstraints
- Get Method:
- unreachable.coded_constraints(self)
- Query.coded_constraints → list(intermine.constraints.CodedConstraint)
- Type:
- list(intermine.constraints.CodedConstraint)
|