pycrossword  0.4
Pure-Python implementation of a crossword puzzle generator and editor
pycross.dbapi.HunspellImportTask Class Reference

A single import task to import words from a DIC file (downloaded from the Hunspell repo) to an SQLite database *.db file.


Public Member Functions

def __init__ (self, lang, dicfile=None, posrules=None, posrules_strict=False, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_stopcheck=None, id=0)
 
def run (self)
 Overridden worker method called when the task is started: does the import job.
 

Public Attributes

 signals
 HunspellImportSignals signals emitted by the import task
 
 lang
 str short name of the language, e.g. 'en'
 
 dicfile
 str | None full path to the DIC file to import words from
 
 posrules
 dict part-of-speech regular expression parsing rules
 
 posrules_strict
 bool import only the indicated parts of speech (True) or all parts of speech (False)
 
 posdelim
 str delimiter separating the word from its part of speech (default = '/')
 
 lcase
 bool import words in lower case
 
 replacements
 dict character replacement rules
 
 remove_hyphens
 bool remove all hyphens from words
 
 filter_out
 dict regex-based rules to exclude words
 
 rows
 2-tuple | None the start and end rows (indices) of the words to import
 
 commit_each
 int threshold of DB insert operations after which the changes are written to the DB
 
 on_stopcheck
 callback callback function called periodically to check for an interrupt condition
 
 id
 int unique ID of this task (in the thread pool)
 

Private Member Functions

def _delete_db (self, db)
 Deletes the existing DB file.
 
def _get_pos (self, cur)
 Retrieves the list of parts of speech present in the DB.
 

Detailed Description

A single import task to import words from a DIC file (downloaded from the Hunspell repo) to an SQLite database *.db file.

Derived from QtCore.QRunnable so the task can be run in a thread pool concurrently with other tasks.

Constructor & Destructor Documentation

◆ __init__()

def pycross.dbapi.HunspellImportTask.__init__ (   self,
  lang,
  dicfile = None,
  posrules = None,
  posrules_strict = False,
  posdelim = '/',
  lcase = True,
  replacements = None,
  remove_hyphens = True,
  filter_out = None,
  rows = None,
  commit_each = 1000,
  on_stopcheck = None,
  id = 0 
)
Parameters
lang: str short name of the language, e.g. 'en'
dicfile: str | None full path to the DIC file to import words from (None means the default path will be assumed: pycross/assets/dic/<LANGUAGE>.dic)
posrules: dict part-of-speech regular expression parsing rules in the format:
{'N': 'regex for nouns', 'V': 'regex for verbs', ...}
     Possible keys are: 'N' [noun], 'V' [verb], 'ADV' [adverb], 'ADJ' [adjective],
     'P' [participle], 'PRON' [pronoun], 'I' [interjection],
     'C' [conjunction], 'PREP' [preposition], 'PROP' [proposition],
     'MISC' [miscellaneous / other], 'NONE' [no POS]
 
posrules_strict: bool if True, only the parts of speech present in the posrules dict will be imported [all other words will be skipped]. If False (default), such words will be imported with the 'MISC' and 'NONE' POS markers.
posdelim: str delimiter separating the word from its part of speech [default = '/']
lcase: bool if True (default), imported words are converted to lower case; otherwise, the original case is preserved
replacements: dict character replacement rules in the format:
{'char_from': 'char_to', ...}
Default = None (no replacements)
remove_hyphens: bool if True (default), all hyphens ['-'] will be removed from the words
filter_out: dict regex-based rules to filter out [exclude] words in the format:
{'word': ['regex1', 'regex2', ...], 'pos': ['regex1', 'regex2', ...]}
Words matching these rules will not be imported. The 'pos' rules can be used to screen off specific parts of speech. Match rules for words are applied AFTER replacements, in the sequential order of the regex list. Default = None (no filter rules apply).
rows: 2-tuple | None the start and end rows (indices) of the words to import; e.g. (20, 100) means start the import at row 20 and end it after row 100. If the second element of the tuple is negative (e.g. -1), only the start row is considered and the import continues until the last word in the source DIC file. None means ALL available words.
commit_each: int threshold of insert operations after which the transaction is committed (default = 1000)
on_stopcheck: callback callback function called periodically to check for an interrupt condition; takes 3 parameters:
  • id int unique ID of this task (in the thread pool)
  • lang str short name of the language, e.g. 'en'
  • filepath str full path to the source DIC file
The callback must return a Boolean value: True to stop the import task, False to continue.
id: int unique ID of this task (in the thread pool)
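The following is a minimal sketch of how the per-word parameters above (posdelim, posrules, posrules_strict, lcase, replacements, remove_hyphens, filter_out) might be applied to a single DIC entry. It is not the pycross implementation: the helper names are hypothetical, and so are these choices: posrules regexes are tried in dict order with re.search; unmatched entries become 'MISC' if they carry a POS string and 'NONE' if they do not; lower-casing happens before replacements; and the 'pos' filter rules are matched against the raw POS string.

```python
import re

# Hypothetical helpers for illustration only; not part of pycross.

def parse_dic_line(line, posdelim='/'):
    """Split a DIC entry such as 'cats/NS' into ('cats', 'NS').

    Entries without the delimiter yield an empty POS string.
    """
    word, _, pos = line.strip().partition(posdelim)
    return word, pos

def classify_pos(pos, posrules, posrules_strict=False):
    """Return the first posrules key whose regex matches pos.

    Assumption: if nothing matches, the entry is skipped (None) in strict
    mode; otherwise it gets 'MISC' (has a POS) or 'NONE' (has no POS).
    """
    for key, regex in (posrules or {}).items():
        if re.search(regex, pos):
            return key
    if posrules_strict:
        return None
    return 'MISC' if pos else 'NONE'

def normalize_word(word, lcase=True, replacements=None, remove_hyphens=True):
    """Apply case folding, character replacements and hyphen removal (in this order)."""
    if lcase:
        word = word.lower()
    for char_from, char_to in (replacements or {}).items():
        word = word.replace(char_from, char_to)
    if remove_hyphens:
        word = word.replace('-', '')
    return word

def is_filtered_out(word, pos, filter_out=None):
    """True if any 'word' regex matches the word or any 'pos' regex matches the POS."""
    rules = filter_out or {}
    return (any(re.search(rx, word) for rx in rules.get('word', []))
            or any(re.search(rx, pos) for rx in rules.get('pos', [])))

# Usage: process one entry end to end.
word, pos = parse_dic_line('Well-Known/V')
word = normalize_word(word)
marker = classify_pos(pos, {'N': r'N', 'V': r'V'})
print(word, marker)  # wellknown V
```

Because the word filter runs on the normalized word, a rule such as r'\d' sees the result of any replacements, which matches the "applied AFTER replacements" note above.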

Member Function Documentation

◆ _delete_db()

def pycross.dbapi.HunspellImportTask._delete_db (   self,
  db 
)
private

Deletes the existing DB file.

Parameters
db: Sqlitedb the SQLite database to delete

◆ _get_pos()

def pycross.dbapi.HunspellImportTask._get_pos (   self,
  cur 
)
private

Retrieves the list of parts of speech present in the DB.

Parameters
cur: SQLite cursor object; the DB cursor
Returns
list parts of speech in the short form, e.g. ['N', 'V']
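A minimal sketch of the kind of query _get_pos() might run, using the standard sqlite3 module. The table name 'words' and its (word, pos) columns are a hypothetical schema chosen for illustration, not the actual pycross database layout.

```python
import sqlite3

def get_pos(cur):
    """Return the distinct POS short names stored in the hypothetical 'words' table."""
    cur.execute("SELECT DISTINCT pos FROM words ORDER BY pos")
    return [row[0] for row in cur.fetchall()]

# Usage with an in-memory database.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE words (word TEXT, pos TEXT)")
cur.executemany("INSERT INTO words VALUES (?, ?)",
                [('cat', 'N'), ('run', 'V'), ('dog', 'N')])
print(get_pos(cur))  # ['N', 'V']
```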

◆ run()

def pycross.dbapi.HunspellImportTask.run (   self)

Overridden worker method called when the task is started: does the import job.
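The sketch below illustrates how an import loop like run() could combine the rows, commit_each and on_stopcheck parameters. It uses the standard sqlite3 module and a hypothetical (word, pos) table; the helper names, 0-based row indexing, checking on_stopcheck before every insert, and committing the partial batch when stopped are all assumptions for illustration only.

```python
import sqlite3

def select_rows(entries, rows=None):
    """Slice entries following the documented rows convention.

    None -> all entries; (start, end) -> start..end inclusive;
    negative end -> from start to the last entry.
    Assumption: rows are 0-based indices into the list of entries.
    """
    if rows is None:
        return entries
    start, end = rows
    if end < 0:
        return entries[start:]
    return entries[start:end + 1]

def import_words(conn, words, rows=None, commit_each=1000,
                 on_stopcheck=None, task_id=0, lang='en', filepath=''):
    """Insert (word, pos) pairs, committing after every commit_each inserts.

    Returns the number of inserted rows. In this sketch, the loop stops early
    (keeping what was already inserted) when
    on_stopcheck(task_id, lang, filepath) returns True.
    """
    cur = conn.cursor()
    pending = 0
    inserted = 0
    for word, pos in select_rows(words, rows):
        if on_stopcheck and on_stopcheck(task_id, lang, filepath):
            break
        cur.execute("INSERT INTO words (word, pos) VALUES (?, ?)", (word, pos))
        inserted += 1
        pending += 1
        if pending >= commit_each:
            conn.commit()
            pending = 0
    conn.commit()  # flush the last partial batch
    return inserted

# Usage: import rows 2..5 (inclusive), committing every 2 inserts.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE words (word TEXT, pos TEXT)")
data = [(f'w{i}', 'N') for i in range(10)]
n = import_words(conn, data, rows=(2, 5), commit_each=2)
print(n)  # 4
```

Batching commits this way trades durability for speed: a larger commit_each means fewer transactions, but more uncommitted rows if the process dies mid-import.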

Member Data Documentation

◆ commit_each

pycross.dbapi.HunspellImportTask.commit_each

int threshold of DB insert operations after which the changes are written to the DB

◆ dicfile

pycross.dbapi.HunspellImportTask.dicfile

str | None full path to the DIC file to import words from

◆ filter_out

pycross.dbapi.HunspellImportTask.filter_out

dict regex-based rules to exclude words

◆ id

pycross.dbapi.HunspellImportTask.id

int unique ID of this task (in the thread pool)

◆ lang

pycross.dbapi.HunspellImportTask.lang

str short name of the language, e.g. 'en'

◆ lcase

pycross.dbapi.HunspellImportTask.lcase

bool import words in lower case

◆ on_stopcheck

pycross.dbapi.HunspellImportTask.on_stopcheck

callback callback function called periodically to check for interrupt condition

◆ posdelim

pycross.dbapi.HunspellImportTask.posdelim

str delimiter separating the word from its part of speech (default = '/')

◆ posrules

pycross.dbapi.HunspellImportTask.posrules

dict part-of-speech regular expression parsing rules

◆ posrules_strict

pycross.dbapi.HunspellImportTask.posrules_strict

bool import only the indicated parts of speech (True) or all parts of speech (False)

◆ remove_hyphens

pycross.dbapi.HunspellImportTask.remove_hyphens

bool remove all hyphens from words

◆ replacements

pycross.dbapi.HunspellImportTask.replacements

dict character replacement rules

◆ rows

pycross.dbapi.HunspellImportTask.rows

2-tuple | None the start and end rows (indices) of the words to import

◆ signals

pycross.dbapi.HunspellImportTask.signals

HunspellImportSignals signals emitted by the import task


The documentation for this class was generated from the following file: