AdvancedQuery is a Zope product extending Zope's search engine ZCatalog with the following key features:
Queries are specified by (full blown) Python objects. They are constructed in the following way:
Expression | printed as | Meaning |
---|---|---|
Eq(index, value, filter=False) | index = value | the documents indexed by index under value |
Le(index, value, filter=False) | index <= value | the documents indexed by index under a value less than or equal to value |
Ge(index, value, filter=False) | index >= value | the documents indexed by index under a value greater than or equal to value |
Between(index, min, max, filter=False) | min <= index <= max | the documents indexed by index under a value between min and max |
In(index, sequence, filter=False) | index in sequence | the documents indexed by index under a value in sequence |
Generic(index, value, filter=False) | index ~~ value | this query type is used to pass any search expression to index as understood by it. Such search expressions usually take the form of a dictionary with query as the most essential key. Generic is necessary to use the full power of specialized indexes, such as the level argument for PathIndex searches. |
Indexed(index) | Indexed(index) | the documents that are indexed by index. This does not work for all index types. |
MatchGlob(index, pattern, filter=False) | index =~ pattern | the documents indexed by index under a value matching the glob pattern. A glob pattern can contain the wildcards * (matches any sequence of characters) and ? (matches any single character). This query type is only supported for indexes which can be adapted to IKeyedIndex. In addition, the index must index text values. |
MatchRegexp(index, regexp, filter=False) | index =~~ regexp | the documents indexed by index under a value matching the regular expression regexp. See the re module documentation in the Python Library Reference for a description of regular expressions. This query type is only supported for indexes which can be adapted to IKeyedIndex. In addition, the index must index text values. |
Filter(index, filter) | Filtered(index, filter) | filter out documents not accepted by filter. filter is called with the document's indexed value; if it returns a true value, the document is accepted, otherwise it is rejected. Note that you must know precisely how index determines a document's indexed value to use this properly. |
LiteralResultSet(set) | LiteralResultSet(set) | the documents specified by set. set must be an IISet, an IITreeSet or a sequence of catalog "data_record_id_"s. This can, for example, be used to further restrict a document set previously obtained through a query (e.g. for "faceting"). |
~ query | ~ query | Not: the documents that do not satisfy query |
query1 & query2 | (query1 & query2) | And: the documents satisfying both query1 and query2 |
And(*queries) | (query1 & ... & queryn) | And: the documents satisfying all queries; if queries is empty, any document satisfies this And query |
query1 \| query2 | (query1 \| query2) | Or: the documents satisfying either query1 or query2 (or both) |
Or(*queries) | (query1 \| ... \| queryn) | Or: the documents satisfying (at least) one of queries; if queries is empty, no document satisfies this Or query |
A true filter value calls for incremental filtering. It is supported only for indexes which can be adapted to IFilterIndex.
And and Or queries are so-called CompositeQuerys. They possess a method addSubquery(query) to add an additional subquery.
The constructors are imported from Products.AdvancedQuery.
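The way such query objects compose can be illustrated with a small stand-in model. The classes below are toy re-implementations written for this illustration only; they are not the classes from Products.AdvancedQuery, they evaluate against plain dictionaries rather than a catalog, and their printed form is only an approximation of the "printed as" column above.

```python
# Toy stand-ins for illustration; NOT the Products.AdvancedQuery classes.

class Query:
    """Base class: the operators &, | and ~ build composite queries."""
    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

    def __invert__(self):
        return Not(self)


class Eq(Query):
    def __init__(self, index, value):
        self.index, self.value = index, value

    def matches(self, doc):
        return doc.get(self.index) == self.value

    def __str__(self):
        return "%s = %r" % (self.index, self.value)


class Not(Query):
    def __init__(self, query):
        self.query = query

    def matches(self, doc):
        return not self.query.matches(doc)

    def __str__(self):
        return "~ %s" % self.query


class Composite(Query):
    symbol = None

    def __init__(self, *queries):
        self.queries = list(queries)

    def addSubquery(self, query):
        self.queries.append(query)

    def __str__(self):
        return "(%s)" % (" %s " % self.symbol).join(str(q) for q in self.queries)


class And(Composite):
    symbol = "&"

    def matches(self, doc):
        # all([]) is True: an empty And is satisfied by any document
        return all(q.matches(doc) for q in self.queries)


class Or(Composite):
    symbol = "|"

    def matches(self, doc):
        # any([]) is False: an empty Or is satisfied by no document
        return any(q.matches(doc) for q in self.queries)


query = Eq("portal_type", "News") & ~Eq("state", "archived")
docs = [
    {"id": 1, "portal_type": "News", "state": "published"},
    {"id": 2, "portal_type": "News", "state": "archived"},
    {"id": 3, "portal_type": "File", "state": "published"},
]
hits = [d["id"] for d in docs if query.matches(d)]
```

In this model, `And()` with no subqueries accepts every document and `Or()` with no subqueries accepts none, mirroring the table; `addSubquery` then narrows or widens the composite step by step.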
AdvancedQuery uses so-called monkey patching to give ZCatalog the new method makeAdvancedQuery(catalogSearchSpec).
A catalogSearchSpec is a search specification as described in the Zope Book for ZCatalog searches (essentially a dictionary mapping index names to search specifications). makeAdvancedQuery returns the equivalent AdvancedQuery search object.
AdvancedQuery uses so-called monkey patching to give ZCatalog the new methods evalAdvancedQuery(query, sortSpecs=(), withSortValues=_notPassed, **kw) and _unrestrictedEvalAdvancedQuery(query, sortSpecs=(), withSortValues=_notPassed, restricted=False, **kw).
evalAdvancedQuery evaluates query and then sorts the document result set according to sortSpecs. If withSortValues is not passed in, it is set to True if sortSpecs contains a ranking specification (as you are probably interested in the rank) and to False otherwise. If withSortValues is true, the data_record_score_ attribute of the returned proxies is abused to hold the sort value. It is a tuple with one component per component in sortSpecs. The attribute data_record_normalized_score_ is set to None.
Classes derived from ZCatalog can by default automatically restrict queries. For example, Products.CMFCore.CatalogTool.CatalogTool automatically restricts queries to those documents for which the current user has View rights and which are "active". _unrestrictedEvalAdvancedQuery lets you avoid this automatic restriction.
AdvancedQuery supports incremental multi-level lexicographic sorting via field-index-like indexes. If an index used for sorting is not field-index-like (i.e. does not index an object under at most one value), you may get odd (and partly non-deterministic) results.
Sorting is specified by a sequence of sort specifications, one for each level. Such a specification is either an index name, a pair of index name and direction, or a ranking specification (see below). A direction is 'asc' (ascending) or 'desc' (descending); if the direction is not specified, 'asc' is assumed.
When the result contains documents not indexed by a sorting index, such documents are delivered after the indexed documents. This always happens, independent of the sort direction.
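The ordering rules above can be sketched in plain Python. This is an illustration of the described semantics only: it sorts a list of dictionaries eagerly, whereas AdvancedQuery sorts incrementally through the catalog's indexes, and the helper names are made up for this example.

```python
from functools import cmp_to_key


def normalize(spec):
    """An index name alone means ascending; otherwise (name, direction)."""
    if isinstance(spec, str):
        return spec, "asc"
    name, direction = spec
    return name, direction


def compare_level(a, b, name, direction):
    """Compare two documents on a single sort level."""
    va, vb = a.get(name), b.get(name)
    if va is None or vb is None:
        # Unindexed documents come after indexed ones, whatever the direction.
        return (va is None) - (vb is None)
    result = (va > vb) - (va < vb)
    return result if direction == "asc" else -result


def sort_documents(docs, sort_specs):
    """Multi-level lexicographic sort: later levels only break ties."""
    levels = [normalize(spec) for spec in sort_specs]

    def compare(a, b):
        for name, direction in levels:
            result = compare_level(a, b, name, direction)
            if result:
                return result
        return 0

    return sorted(docs, key=cmp_to_key(compare))


docs = [
    {"id": 1, "modified": 3, "Creator": "bob"},
    {"id": 2, "modified": 5, "Creator": "amy"},
    {"id": 3, "Creator": "zed"},  # not indexed by 'modified'
    {"id": 4, "modified": 5, "Creator": "aaa"},
]
ordered = [d["id"] for d in sort_documents(docs, (("modified", "desc"), "Creator"))]
```

The document without a `modified` value ends up last even though the first level sorts in descending order.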
From version 1.1 on, AdvancedQuery supports incremental filtering. Incremental filtering can be very promising for an unspecific subquery inside an otherwise specific And query, especially for large Le, Ge, Between and range subqueries. If we use the index in the normal way, a huge Or query is constructed for such subqueries. Even dm.incrementalsearch cannot fully optimize the search against this huge Or query. With incremental filtering, the index is not used in the normal way. Instead, the remaining And subqueries are used to produce a set of document candidates. These are then filtered by the filtering subquery, discarding documents that do not match the subquery. Provided that the other And subqueries have already reduced the document set sufficiently, incremental filtering can save a lot of time.
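The difference between the two strategies can be shown with a toy model in which an index is simply a mapping from values to sets of document ids. The helper names and data are assumptions for this sketch; it does not reflect how AdvancedQuery or the Zope indexes are implemented internally.

```python
# Toy model: an "index" maps each value to the set of document ids
# indexed under it. None of these helpers are AdvancedQuery APIs.

def range_by_union(index, low):
    """The 'normal' way: union the document sets of all values >= low."""
    result = set()
    for value, doc_ids in index.items():
        if value >= low:
            result |= doc_ids
    return result


def indexed_values(index):
    """Invert the index: document id -> indexed value (one value per doc)."""
    return {doc_id: value
            for value, doc_ids in index.items()
            for doc_id in doc_ids}


def filter_candidates(candidates, doc_values, accept):
    """Incremental filtering: test only the candidates' indexed values."""
    return {doc_id for doc_id in candidates
            if doc_id in doc_values and accept(doc_values[doc_id])}


portal_type = {"News": {1, 2, 3}, "File": {4, 5}}
expires = {10: {1, 4}, 20: {2}, 30: {3, 5}}
now = 15

# Normal evaluation: build the (potentially huge) union first, then intersect.
normal = portal_type["News"] & range_by_union(expires, now)

# Filtering: the specific subquery yields candidates, the range only filters.
candidates = portal_type["News"]
filtered = filter_candidates(candidates, indexed_values(expires),
                             lambda value: value >= now)
```

Both strategies return the same documents; the filtering variant never touches the document sets of `expires` values that belong to no candidate, which is where the savings come from when the candidate set is small.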
You request incremental filtering for an (elementary) subquery with the filter keyword argument. Usually, you use it only for some subqueries of specific And queries. Otherwise, incremental filtering may not reduce but increase the query time (even considerably).
If you have more than a single filtering subquery in an And query, their order might be relevant for efficiency. You should put filtering subqueries that are likely to reduce the document set more before other filtering subqueries.
Incremental filtering requires that the affected index can be adapted to IFilterIndex; otherwise, the filter argument is ignored.
In addition, you should consider the use of dm.incrementalsearch when you make significant use of incremental filtering. dm.incrementalsearch can globally optimize incremental filtering while otherwise only a local optimization is possible.
From version 2.0 on, AdvancedQuery supports incremental ranking. Ranking is a form of sorting. Therefore, you specify it as a sort spec. Ranking can be combined with other sort specs in the usual way (leading to multi-level sorting).
Like sorting in general, ranking is performed incrementally -- just as far as you have looked at the result. Therefore, although ranking in general is very expensive, its cost can be small if you only look at the first few (hundred) result objects (rather than several hundred thousand).
Currently, the ranking specifications RankByQueries_Sum and RankByQueries_Max are supported. In both cases, you call the constructor with one or more pairs (q, vq), i.e. with a sequence of weighted queries. The rank of a document is the sum or the maximum, respectively, of the weights of the queries matching the document.
Note that the runtime behaviour of RankByQueries_Sum is exponential, that of RankByQueries_Max linear, in the number of queries involved in the ranking.
Note that you probably want to normalize the document ranks. The ranking classes above have the methods getQueryValueSum() and getQueryValueMax(), respectively, that can help with this.
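The two rank definitions and the normalization can be worked through on a small example. The functions below are a plain-Python sketch of the stated rules, with predicates standing in for queries; they are not the ranking classes themselves, and returning 0 for a document that matches no ranking query is an assumption made for this sketch.

```python
# Sketch of the ranking rules; the function names are made up for this example.

def rank_by_sum(doc, weighted_queries):
    """Sum of the weights of all queries matching the document."""
    return sum(weight for query, weight in weighted_queries if query(doc))


def rank_by_max(doc, weighted_queries):
    """Largest weight among the matching queries (assumed 0 if none match)."""
    return max((weight for query, weight in weighted_queries if query(doc)),
               default=0)


term = "ranking"
weighted_queries = [
    (lambda doc: term in doc["Subject"], 16),  # very high
    (lambda doc: term in doc["Title"], 8),     # high
]

both = {"Subject": ["ranking"], "Title": "ranking explained"}
title_only = {"Subject": ["sorting"], "Title": "ranking explained"}

# Normalize as in the usage example: (1 + rank) / (1 + sum of all weights).
norm = 1 + sum(weight for _, weight in weighted_queries)
normalized_both = (1 + rank_by_sum(both, weighted_queries)) / norm
normalized_title = (1 + rank_by_sum(title_only, weighted_queries)) / norm
```

With weights 16 and 8, a document matching both queries gets the sum rank 24 and the normalized rank 1.0, while a document matching only the Title query gets 8 and 9/25.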
    from Products.AdvancedQuery import Eq, Between, Le, Ge

    # search for objects below 'a/b/c' with ids between 'a' and 'z~'
    query = Eq('path', 'a/b/c') & Between('id', 'a', 'z~')
    # evaluate and sort descending by 'modified' and ascending by 'Creator'
    context.Catalog.evalAdvancedQuery(query, (('modified', 'desc'), 'Creator',))

    # search 'News' not yet archived and 'File's not yet expired.
    now = context.ZopeTime()
    query = Eq('portal_type', 'News') & ~ Le('ArchivalDate', now) | Eq('portal_type', 'File') & ~ Le('expires', now)
    context.Catalog.evalAdvancedQuery(query)

    # search 'News' containing 'AdvancedQuery' and filter out
    # not yet effective or already expired documents.
    query = Eq('portal_type', 'News') & Eq('SearchableText', 'AdvancedQuery') \
            & Ge('expires', now, filter=True) & Le('effective', now, filter=True)
    context.Catalog.evalAdvancedQuery(query)

    # search for 'ranking' in 'SearchableText' and rank very high
    # when the term is in 'Subject' and high when it is in 'Title'.
    # print the id and the normalized rank
    from Products.AdvancedQuery import RankByQueries_Sum
    term = 'ranking'
    rs = RankByQueries_Sum((Eq('Subject', term), 16), (Eq('Title', term), 8))
    norm = 1 + rs.getQueryValueSum()
    for r in context.Catalog.evalAdvancedQuery(
        Eq('SearchableText', term), (rs,)
    ):
        # data_record_score_ is a tuple with one component per sort spec
        print(r.getId, (1 + r.data_record_score_[0]) / norm)
You must not cache the result of an AdvancedQuery unless you have ensured that sorting has finished (e.g. by accessing the last element in the result). This is because AdvancedQuery uses incremental sorting with BTrees iterators. Like any iterator, they do not cope well when the base object changes during iteration. Nasty types of (apparently) non-deterministic errors can happen when the index changes during sorting.
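The hazard can be reproduced in miniature with a generator over a plain dictionary. This is only an analogy: the dict stands in for a BTrees structure, which may fail less visibly than the `RuntimeError` raised here, and `lazily_sorted` is a made-up helper rather than anything from the package.

```python
def lazily_sorted(index):
    """Yield documents in index order, reading the index only on demand."""
    for key in index:  # iterates the live dict, not a copy
        yield index[key]


index = {1: "doc-a", 2: "doc-b", 3: "doc-c"}

# Unsafe: the result is only half consumed when the index changes.
unsafe = lazily_sorted(index)
first = next(unsafe)
index[4] = "doc-d"
try:
    next(unsafe)
    failed = False
except RuntimeError:
    failed = True

# Safe: consume the result completely before keeping it around.
safe = list(lazily_sorted(index))
index[5] = "doc-e"  # later index changes no longer affect the cached list
```

Consuming the whole result before caching plays the same role here as accessing the last element of an AdvancedQuery result.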
The current version supports Zope 4 (and above), is maintained on PyPI and can be pip installed. To use it, its configure.zcml must be "executed" at startup (which typically happens automatically). For use in Plone (version 5.2+), the companion package dm.plone.advancedquery must be installed and its configure.zcml "executed" at startup.
This software is open source and licensed under a BSD style license. See the license file in the distribution for details.
Former versions relied entirely on dm.incrementalsearch for optimizations. To reach the full potential, the indexes had to know about dm.incrementalsearch as well and use it for their lookup; likely only Products.ManagableIndex indexes did this. From version 4 on, optimizations no longer rely on dm.incrementalsearch (even though it is still used, if installed). Optimizations now rely on (conditional) adapters. In fact, (almost) the complete query evaluation is controlled via adapters -- and by overriding the package's adapters, you could (in principle) take complete control over the query evaluation. Likely, you will not do this but may register additional adapters to provide optimizations for new indexes.
Query evaluation proceeds in the following steps:
Generic queries are transformed into specific queries (if possible).
CompositeIndex available.
Products.PluginIndexes.interfaces.IPluggableIndex) is available. By defining a more specific adapter, the index's lookup can be "white boxed" by specifying how the lookup result is combined from more elementary lookups via "and", "or", "not" (and potentially "filter"). This allows for more optimizations than when the index is treated as a "black box".
AdvancedQuery should be able to work with any index implementing Products.PluginIndexes.interfaces.IPluggableIndex out of the box. No index specific configuration should be necessary for search features also supported by ZCatalog.
If AdvancedQuery extensions should be supported for the new index (e.g. filtering or matching) or if searches involving the index should benefit from index specific optimizations, then it might become necessary to register corresponding adapters for the new index. Those adapters would typically have as "provided" interface IQueryNodeOptimizer, IQueryConverter, IFilterIndex, IIndexedValue, IMultiplicityAware, ITermValueMatch, IIndexed, IKeyedIndex, IKeyNormalizingIndex, ILookupIndex, or ILookupTreeIndex, all defined in Products.AdvancedQuery.eval.interfaces.
It is typically not necessary to define adapters for all those interfaces. For example, the IQueryNodeOptimizer adapter is necessary only when the index wants to perform optimizations on the query level (as e.g. CompositeIndex does). IFilterIndex, IIndexedValue, IMultiplicityAware and ITermValueMatch may be relevant for filtering. IMultiplicityAware is used in the optimization of not, if available. IIndexed is required for an index when the Indexed query should be supported for this index. IKeyedIndex is typically required for the matching queries and is used for optimized conversions of Le, Ge and Between queries. If the new index normalizes its search terms and you define an IKeyedIndex or IFilterIndex adapter, then an IKeyNormalizingIndex adapter is likely required. The "Lookup" and IQueryConverter adapters are always optional and used for optimizations; typically, at most one of those would be defined for an index.
There are roughly two cases:
AdvancedQuery extensions can be supported. One would register adapters for the "provided" interfaces IFilterIndex, IIndexedValue, IMultiplicityAware, IIndexed, IKeyedIndex, IKeyNormalizingIndex and optionally for ILookupIndex or ILookupTreeIndex. Many of those adapters could be taken over from those for UnIndex. Examples are in Products.AdvancedQuery.eval.adapter.*index.
AdvancedQuery extensions (such as filtering, Indexed queries, ...) and either define no adapter at all or define one or several IQueryConverter adapters for this index. Examples are in Products.AdvancedQuery.eval.adapter.query.converter.*index.
Wherever this documentation speaks of adaptation, it actually means "conditional adaptation". A conditional adapter is a zope.interface "subscription adapter", usually with an associated condition. Products.AdvancedQuery.eval.adapter contains functions to define and look up conditional adapters as well as typical conditions.
The new concept "conditional adapter" is necessary because Zope's standard adapter concept makes assumptions that are not valid in our context. For example, an adapter defined for an index I would be considered adequate for any index J inheriting from I unless this adapter was overridden by another adapter registered for an index K inheriting from I and either J is K or inherits from it. The adapters employed by AdvancedQuery for an index I are typically not adequate for all indexes J inheriting from I. If AdvancedQuery used "normal" adapters, then such an index J would require the registration of an adequate overriding adapter for J; otherwise, search results involving J could be wrong. As Zope's index system is open (flexibly extendable), the risk would be too great. Therefore, AdvancedQuery uses conditional adapters with a condition typically of the form "applicable to index I and derived indexes provided they do not override any of the following methods". A conditional adapter is looked up like a "normal" adapter with the exception that non-applicable adapters are skipped. This makes it possible for a more general adapter to override a more specific one -- provided that the latter is not applicable.
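The lookup rule can be illustrated with a minimal registry in plain Python. Everything here (the `ConditionalRegistry` class, the `not_overriding` condition, the index classes) is hypothetical and written for this sketch; the real implementation builds on zope.interface subscription adapters and lives in Products.AdvancedQuery.eval.adapter.

```python
# Hypothetical sketch of conditional adapter lookup; not the package's code.

class ConditionalRegistry:
    def __init__(self):
        self._adapters = {}

    def register(self, cls, factory, condition=lambda index: True):
        """Register factory for cls, applicable only if condition(index)."""
        self._adapters.setdefault(cls, []).append((condition, factory))

    def lookup(self, index):
        """Walk from most to least specific class, skipping adapters
        whose condition rejects this index."""
        for cls in type(index).__mro__:
            for condition, factory in self._adapters.get(cls, ()):
                if condition(index):
                    return factory(index)
        return None


def not_overriding(base, *method_names):
    """Condition: the index's class still uses base's own methods."""
    def condition(index):
        return all(getattr(type(index), name) is getattr(base, name)
                   for name in method_names)
    return condition


class BaseIndex:
    def lookup(self, term):
        return {"base", term}


class InheritingIndex(BaseIndex):
    pass


class OverridingIndex(BaseIndex):
    def lookup(self, term):
        return {"custom", term}


registry = ConditionalRegistry()
registry.register(object, lambda index: "black box")
registry.register(BaseIndex, lambda index: "white box",
                  condition=not_overriding(BaseIndex, "lookup"))

for_inheriting = registry.lookup(InheritingIndex())
for_overriding = registry.lookup(OverridingIndex())
```

The index that inherits `lookup` unchanged receives the specific adapter, while the index that overrides it falls back to the general one, because the specific adapter's condition marks it as not applicable.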