Provides search functionality for TurboGears using PyLucene.
This module uses PyLucene to do all the heavy lifting, but as a result this module does some fancy things with threads.
PyLucene requires that all threads that use it must inherit from PythonThread. This means either patching CherryPy and/or TurboGears, or having the CherryPy thread hand off the request to a PythonThread and, in the case of searching, wait for the result. The second method was chosen so that a patched CherryPy or TurboGears does not have to be maintained.
The other advantage to the chosen method is that indexing happens in a separate thread so the web request can return more quickly by not waiting for the results.
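The hand-off described above can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not TurboLucene's actual implementation: a plain threading.Thread stands in for PyLucene's PythonThread, and the names Worker, submit_and_wait, submit_async and stop are hypothetical. Error handling is omitted, so a task that raises would stop the worker.

```python
import queue
import threading


class Worker(threading.Thread):
    """Stand-in for a PythonThread that owns all search-engine calls."""

    def __init__(self):
        super().__init__(daemon=True)
        self._tasks = queue.Queue()

    def run(self):
        while True:
            func, args, reply = self._tasks.get()
            if func is None:  # sentinel: stop the worker
                break
            result = func(*args)
            if reply is not None:  # a caller is waiting for the result
                reply.put(result)

    def submit_and_wait(self, func, *args):
        """Search-style call: block the request thread until done."""
        reply = queue.Queue(maxsize=1)
        self._tasks.put((func, args, reply))
        return reply.get()

    def submit_async(self, func, *args):
        """Index-style call: queue the work and return immediately."""
        self._tasks.put((func, args, None))

    def stop(self):
        self._tasks.put((None, (), None))


worker = Worker()
worker.start()

indexed = []
worker.submit_async(indexed.append, "doc-1")  # does not wait
found = worker.submit_and_wait(lambda q: q.upper(), "query")  # waits
worker.stop()
worker.join()
```

Because the task queue is first-in, first-out, the indexing task in this sketch has completed by the time the search-style call returns.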
The main disadvantage with PyLucene and CherryPy, however, is that autoreload does not work with it. You must disable it by adding autoreload.on = False to your dev.cfg.
TurboLucene uses the following configuration options:
- turbolucene.search_fields:
- The list of fields that should be searched by default when a specific field is not specified. (e.g. ['id', 'title', 'text', 'categories']) (Default: ['id'])
- turbolucene.default_language:
- The default language to use if a language is not given when calling add/update/search/etc. (Default: 'en')
- turbolucene.languages:
- The list of languages to support. This is a list of ISO language codes that you want to support in your application. The languages must be supported by PyLucene and must be configured in the languages configuration file. Currently the languages available out of the box are: Czech (cs), Danish (da), German (de), Greek (el), English (en), Spanish (es), Finnish (fi), French (fr), Italian (it), Japanese (ja), Korean (ko), Dutch (nl), Norwegian (no), Portuguese (pt), Brazilian (pt-br), Russian (ru), Swedish (sv), and Chinese (zh). (Default: [<default_language>])
- turbolucene.default_operator:
- The default search operator to use between search terms when none is specified. (Default: 'AND') This must be a valid operator object from the PyLucene.MultiFieldQueryParser.Operator namespace.
- turbolucene.optimize_days:
- The list of days to schedule index optimization. Index optimization cleans up and compacts the indexes so that searches happen faster. This is a list of day numbers (Sunday = 1). Optimization of all indexes will occur on those days. (Default: [1, 2, 3, 4, 5, 6, 7], i.e. every day)
- turbolucene.optimize_time:
- A tuple containing the hour (24 hour format) and minute of the time to run the scheduled index optimizations. (Default: (00, 00), i.e. midnight)
- turbolucene.index_root:
- The base path in which to store the indexes. There is one index per supported language. Each index is a directory. Those directories will be sub-directories of this base path. If the path is relative, it is relative to your project's root. Normally you should not need to override this unless you specifically need the indexes to be located somewhere else. (Default: u'index')
- turbolucene.languages_file:
- The path to the languages configuration file. The languages configuration file provides the configuration information for all the languages that TurboLucene supports. Normally you should not need to override this. (Default: the u'languages.cfg' file in the turbolucene package)
- turbolucene.languages_file_encoding:
- The encoding of the languages file. (Default: 'utf-8')
- turbolucene.stopwords_root:
- The languages file can specify files that contain stopwords. If a stopwords file path is relative, this path will be prepended to it. This allows for all stopword files to be customized without needing to specify full paths for every one. Normally you should not need to override this. (Default: the stopwords directory in the turbolucene package)
All fields are optional, but at the minimum, you will likely want to specify turbolucene.search_fields.
See Also: _load_language_data for details about the languages configuration file.
Warning: Do not forget to turn off autoreload in dev.cfg.
Requires: TurboGears and PyLucene
Version: 0.2
Author: Krys Wilken
Contact: krys AT krys DOT ca
Copyright: (c) 2007 Krys Wilken
License: MIT
API Version: 2.0
Revision: $Id: __init__.py 47 2007-04-01 22:36:05Z krys $
|
|||
_Indexer Responsible for updating and maintaining the search engine index. |
|||
_Searcher Responsible for searching an index and returning results. |
|||
_SearcherFactory Produces running _Searcher threads. |
|||
Objects to use in make_document | |||
---|---|---|---|
Document j_document objects |
|||
Field j_field objects |
|
|||
unicode |
|||
list of unicode strings |
|||
PyLucene.Analyzer sub-class |
|||
Public API | |||
---|---|---|---|
iterable |
|||
_DEFAULT_LANGUAGE =
Default language to use if none is specified in config. |
|||
_log = getLogger('turbolucene') Logger for this module |
|||
_language_data = None This will hold the language support data read from file. |
|||
_indexer = None This will hold the _Indexer singleton class. |
|||
_searcher_factory = None This will hold the _SearcherFactory singleton class. |
|||
Objects to use in make_document | |||
---|---|---|---|
STORE = <Field_Store: YES> Tells Field not to compress the field data |
|||
COMPRESS = <Field_Store: COMPRESS> Tells Field to compress the field data |
|||
TOKENIZED = <Field_Index: TOKENIZED> Tells Field to tokenize and stem the field data |
|||
UN_TOKENIZED = <Field_Index: UN_TOKENIZED> Tells Field not to tokenize or stem the field data |
|
Load all the language data from the configured languages file. The languages configuration file can be set with the turbolucene.languages_file configuration option and its encoding is set with turbolucene.languages_file_encoding.
Configuration file format
The languages file is an INI-type (ConfigObj) file. Each section is defined by an ISO language code (en, de, el, pt-br, etc.). In each section the following keys are possible:
If neither stopwords nor stopwords_file is defined for a language, then any stopwords that are used are determined automatically by the analyzer class' constructor.
Example
    # German
    [de]
    analyzer_class = SnowballAnalyzer
    analyzer_class_args = German2
    stopwords_file = stopwords_de.txt
    stopwords_file_encoding = windows-1252
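To show how the German section of such a file breaks down into keys and values, here is a rough sketch that parses it with the standard library's configparser. This is an assumption for illustration only: the module itself is documented as using ConfigObj, whose handling of lists and quoting differs, and the names LANGUAGES_CFG and language_data are hypothetical.

```python
import configparser

# The German example section, reproduced as a string for the sketch.
LANGUAGES_CFG = """\
# German
[de]
analyzer_class = SnowballAnalyzer
analyzer_class_args = German2
stopwords_file = stopwords_de.txt
stopwords_file_encoding = windows-1252
"""

parser = configparser.ConfigParser()
parser.read_string(LANGUAGES_CFG)

# One dictionary of settings per ISO language code.
language_data = {code: dict(parser[code]) for code in parser.sections()}
german = language_data["de"]
```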
See Also:
|
Schedule index optimization using the TurboGears scheduler. This function reads its configuration data from turbolucene.optimize_days and turbolucene.optimize_time.
See Also: turbolucene (module docstring) for details about configuration settings. |
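The day numbering used by turbolucene.optimize_days (Sunday = 1) differs from Python's own weekday conventions, so a small worked example may help. This sketch only illustrates how the two settings could be interpreted; it does not show how the TurboGears scheduler is actually invoked, and the function names config_day_number and next_optimization are hypothetical.

```python
from datetime import datetime, timedelta


def config_day_number(moment):
    """Map a datetime to the Sunday = 1 ... Saturday = 7 numbering."""
    # isoweekday() gives Monday = 1 ... Sunday = 7.
    return moment.isoweekday() % 7 + 1


def next_optimization(now, optimize_days, optimize_time):
    """Return the first scheduled run strictly after `now`."""
    hour, minute = optimize_time
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    for _ in range(7):
        if config_day_number(candidate) in optimize_days:
            return candidate
        candidate += timedelta(days=1)
    raise ValueError("optimize_days contains no valid day number")


# 1 April 2007 was a Sunday.
now = datetime(2007, 4, 1, 18, 46)
next_run = next_optimization(now, [1, 2, 3, 4, 5, 6, 7], (0, 0))
weekend_run = next_optimization(now, [1, 7], (0, 0))
```

With the default settings the next run falls on the following midnight; restricting optimize_days to [1, 7] defers it to the next Saturday or Sunday.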
Return the path to the index for the given language. This function gets its configuration data from turbolucene.index_root.
See Also: turbolucene (module docstring) for details about configuration settings. |
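As a rough illustration of the path rule, the sketch below resolves a relative index root against a project root and appends one sub-directory per language. Naming the sub-directory after the language code is an assumption, as are the function name index_path and its parameters; the source only states that each language has its own directory under the base path.

```python
import os.path


def index_path(project_root, index_root, language):
    """Build the index directory for one language (hypothetical helper)."""
    if os.path.isabs(index_root):
        base = index_root
    else:
        # A relative index root is taken relative to the project's root.
        base = os.path.join(project_root, index_root)
    # Assumption: the sub-directory is named after the language code.
    return os.path.join(base, language)


english_index = index_path("/srv/app", "index", "en")
german_index = index_path("/srv/app", "/var/indexes", "de")
```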
Read the stopwords from the given stopwords file path. Stopwords are words that should not be indexed because they are too common or have no significant meaning (e.g. the, in, with, etc.). They are language dependent. This function gets its configuration data from turbolucene.stopwords_root. Stopwords files are text files (in the given encoding), with one stopword per line. Comments are marked by a | character. This is for compatibility with the stopwords files found at http://snowball.tartarus.org/.
See Also: turbolucene (module docstring) for details about configuration settings. |
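The file format described above (one stopword per line, comments introduced by a | character) can be read with a few lines of Python. This is a sketch under stated assumptions rather than the module's own code: the function name read_stopwords, its parameters, and the sample file contents are hypothetical.

```python
import os
import tempfile


def read_stopwords(file_path, encoding, stopwords_root=""):
    """Return the stopwords in a Snowball-style stopwords file."""
    if not os.path.isabs(file_path):
        # Relative paths are resolved against the stopwords root.
        file_path = os.path.join(stopwords_root, file_path)
    stopwords = []
    with open(file_path, encoding=encoding) as handle:
        for line in handle:
            # Everything after the first | is a comment.
            word = line.split("|", 1)[0].strip()
            if word:
                stopwords.append(word)
    return stopwords


# Write a tiny sample file, then read it back via a relative path.
sample = " | A tiny sample file\nthe\nin   | preposition\n\nwith\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="windows-1252", delete=False
) as tmp:
    tmp.write(sample)
try:
    words = read_stopwords(
        os.path.basename(tmp.name),
        "windows-1252",
        stopwords_root=os.path.dirname(tmp.name),
    )
finally:
    os.remove(tmp.name)
```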
Produce an analyzer object appropriate for the given language. This function uses the data that was read in from the languages configuration file to determine and instantiate the analyzer object.
See Also: _load_language_data for details about the language configuration file. |
Initialize and start the search engine threads. This function loads the language configuration information, starts the search engine threads, makes sure the search engine will be shut down when TurboGears shuts down, and starts the optimization scheduler to run at the configured times.
|
Tell the search engine to add the given object to the index. This function returns immediately. It does not wait for the indexer to be finished.
See Also:
|
Tell the search engine to update the index for the given object. This function returns immediately. It does not wait for the indexer to be finished.
See Also:
|
Tell the search engine to remove the given object from the index. This function returns immediately. It does not wait for the indexer to be finished.
See Also:
|
Return results from the search engine that match the query. If a results_formatter function was passed to start then the results will be passed through the formatter before returning. If not, the returned value is a list of strings that are the id fields of matching objects.
See Also:
|
Generated by Epydoc 3.0beta1 on Sun Apr 1 18:46:30 2007 | http://epydoc.sourceforge.net |