Package API Documentation for mlconjug3

API Reference for the classes in mlconjug3.mlconjug.py

MLConjug Main module.

This module declares the main classes the user interacts with.
The module defines the classes needed to interface with Machine Learning models.
mlconjug3.mlconjug.extract_verb_features(verb, lang, ngram_range)[source]
Custom vectorizer optimized for extracting verb features.
The vectorizer subclasses sklearn.feature_extraction.text.CountVectorizer.
Because verbs in Indo-European languages are inflected by adding a morphological suffix, the vectorizer extracts verb endings and produces a vector representation of the verb with binary features.
To improve the results of the feature extraction, several other features are included.
The full feature set is the verb’s ending n-grams, starting n-grams, length of the verb, number of vowels, number of consonants and the ratio of vowels to consonants.
Parameters
  • verb – string. Verb to vectorize.

  • lang – string. Language to analyze.

  • ngram_range – tuple. The range of the ngram sliding window.

Returns

list. List of the most salient features of the verb for the task of finding its conjugation class.
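The feature set described above can be illustrated with a short, self-contained sketch. It is not the library's implementation: the function name `sketch_verb_features`, the feature-string prefixes, the example `ngram_range` and the vowel inventory are all assumptions made for illustration.

```python
def sketch_verb_features(verb, ngram_range=(2, 7)):
    """Illustrative only: builds string features like those described above."""
    vowels = set("aeiouyàâäéèêëîïôöùûü")  # assumed vowel inventory
    verb = verb.lower()
    min_n, max_n = ngram_range
    # Clamp the window so n-grams never exceed the verb length.
    sizes = range(min_n, min(max_n, len(verb)) + 1)
    features = [f"END={verb[-n:]}" for n in sizes]    # ending n-grams
    features += [f"START={verb[:n]}" for n in sizes]  # starting n-grams
    n_vowels = sum(1 for ch in verb if ch in vowels)
    n_consonants = sum(1 for ch in verb if ch.isalpha() and ch not in vowels)
    features.append(f"LEN={len(verb)}")
    features.append(f"VOW={n_vowels}")
    features.append(f"CONS={n_consonants}")
    # Guard against division by zero for verbs without consonants.
    ratio = n_vowels / n_consonants if n_consonants else 0.0
    features.append(f"V/C={ratio:.2f}")
    return features


print(sketch_verb_features("parler", ngram_range=(2, 3)))
```

Each feature is a plain string, so a count-based vectorizer with binary output could turn the list into a 0/1 vector.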

class mlconjug3.mlconjug.Conjugator(language='fr', model=None)[source]

Bases: object

This is the main class of the project.
The class manages the Verbiste data set and provides an interface with the scikit-learn pipeline.
If no parameters are provided, the default language is set to French and the pre-trained French conjugation pipeline is used.
The class defines the method conjugate(verb, subject), which is the main method of the module.
Parameters
  • language – string. Language of the conjugator. The default language is ‘fr’ for French.

  • model – mlconjug3.Model or scikit-learn Pipeline or Classifier implementing the fit() and predict() methods. A user-provided pipeline, for users who have trained their own.

conjugate(verb, subject='abbrev')[source]
This is the main method of this class.
It first checks to see if the verb is in Verbiste.
If it is not, and a pre-trained scikit-learn pipeline has been supplied, the method then calls the pipeline to predict the conjugation class of the provided verb.
Returns a Verb object or None.
Parameters
  • verb – string. Verb to conjugate.

  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.

Returns

Verb object or None.
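The lookup-then-predict flow described for conjugate() can be sketched as follows. This is a simplified illustration, not the library's code: `sketch_conjugate`, `SuffixModel`, the plain dictionary standing in for the Verbiste data set and the sample template names are all hypothetical.

```python
def sketch_conjugate(verb, known_verbs, model=None):
    """Illustrative flow; returns (template, predicted) or None."""
    # Step 1: dictionary lookup, standing in for the Verbiste data set.
    if verb in known_verbs:
        return known_verbs[verb], False
    # Step 2: fall back to the model, if one was supplied.
    if model is not None:
        template = model.predict([verb])[0]
        return template, True
    # No entry and no model: nothing can be returned.
    return None


class SuffixModel:
    """Hypothetical stand-in for a trained pipeline exposing predict()."""

    def predict(self, verbs):
        return ["aim:er" if v.endswith("er") else "fin:ir" for v in verbs]


known = {"aimer": "aim:er"}
print(sketch_conjugate("aimer", known))                   # found in the data set
print(sketch_conjugate("googler", known, SuffixModel()))  # predicted by the model
print(sketch_conjugate("googler", known))                 # no model available
```

The boolean in the returned pair plays the role of the predicted flag documented on the Verb class.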

set_model(model)[source]

Assigns the provided pre-trained scikit-learn pipeline so that unknown verbs can be conjugated.

Parameters

model – scikit-learn Classifier or Pipeline.

class mlconjug3.mlconjug.DataSet(verbs_dict)[source]

Bases: object

This class holds and manages the data set.
Defines helper methods for managing machine learning tasks such as constructing a training and testing set.
Parameters

verbs_dict – A dictionary of verbs and their corresponding conjugation class.

construct_dict_conjug()[source]
Populates the dictionary containing the conjugation templates.
Populates the lists containing the verbs and their templates.
split_data(threshold=8, proportion=0.5)[source]

Splits the data into a training and a testing set.

Parameters
  • threshold – int. Minimum size of conjugation class to be split.

  • proportion – float. Proportion of samples in the training set. Must be between 0 and 1.
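One way to read the two parameters is sketched below. The reference does not say what happens to classes smaller than the threshold, so sending them entirely to the training set is an assumption, as are the function name `sketch_split_data` and the input layout (a mapping from template name to a list of verbs).

```python
def sketch_split_data(verbs_by_template, threshold=8, proportion=0.5):
    """Illustrative split of (verb, template) pairs into train and test lists."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    train, test = [], []
    for template, verbs in verbs_by_template.items():
        pairs = [(verb, template) for verb in verbs]
        if len(pairs) < threshold:
            # Assumed behaviour: classes too small to split are used for training only.
            train.extend(pairs)
            continue
        cut = int(len(pairs) * proportion)
        train.extend(pairs[:cut])
        test.extend(pairs[cut:])
    return train, test


data = {
    "aim:er": ["aimer", "parler", "chanter", "danser"],
    "fin:ir": ["finir"],
}
train, test = sketch_split_data(data, threshold=2, proportion=0.5)
print(train)
print(test)
```

The sketch does not shuffle, so the split is deterministic. A real split would normally shuffle each class first.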

class mlconjug3.mlconjug.Model(vectorizer=None, feature_selector=None, classifier=None, language=None)[source]

Bases: object

This class manages the scikit-learn pipeline.
The Pipeline includes a feature vectorizer, a feature selector and a classifier.
If the vectorizer, feature selector or classifier is not supplied when the instance is created, the __init__ method provides default values that reach more than 92% prediction accuracy.
Parameters
  • vectorizer – scikit-learn Vectorizer.

  • feature_selector – scikit-learn Classifier with a fit_transform() method.

  • classifier – scikit-learn Classifier with a predict() method.

  • language – language of the corpus of verbs to be analyzed.

train(samples, labels)[source]

Trains the pipeline on the supplied samples and labels.

Parameters
  • samples – list. List of verbs.

  • labels – list. List of verb templates.

predict(verbs)[source]

Predicts the conjugation class of the provided list of verbs.

Parameters

verbs – list. List of verbs.

Returns

list. List of predicted conjugation groups.
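The input and output shapes of train() and predict() can be illustrated with a toy stand-in. `ToySuffixModel` is hypothetical and deliberately simplistic: it memorizes the most frequent template per verb ending and is not the scikit-learn pipeline that the Model class manages.

```python
from collections import Counter, defaultdict


class ToySuffixModel:
    """Toy stand-in with train()/predict(); not the scikit-learn pipeline."""

    def __init__(self, suffix_len=2):
        self.suffix_len = suffix_len
        self.votes = defaultdict(Counter)
        self.fallback = None

    def train(self, samples, labels):
        # Count which template each verb ending is associated with.
        for verb, label in zip(samples, labels):
            self.votes[verb[-self.suffix_len:]][label] += 1
        # Most frequent label overall, used for unseen endings.
        self.fallback = Counter(labels).most_common(1)[0][0]

    def predict(self, verbs):
        predictions = []
        for verb in verbs:
            counts = self.votes.get(verb[-self.suffix_len:])
            predictions.append(counts.most_common(1)[0][0] if counts else self.fallback)
        return predictions


model = ToySuffixModel()
model.train(["aimer", "parler", "finir"], ["aim:er", "aim:er", "fin:ir"])
print(model.predict(["chanter", "choisir", "boire"]))
```

As in the documented interface, both methods take lists, and predict() returns one template name per input verb.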

API Reference for the classes in mlconjug3.PyVerbiste.py

PyVerbiste.

A Python library for conjugating verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon).
It contains conjugation data generated by machine learning models using the Python library mlconjug3.
More information about mlconjug3 at https://pypi.org/project/mlconjug3/
The conjugation data conforms to the XML schema defined by Verbiste.
class mlconjug3.PyVerbiste.ConjugManager(language='default')[source]

Bases: object

This is the class handling the mlconjug3 json files.

Parameters

language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.

_load_verbs(verbs_file)[source]

Loads and parses the verbs from the json file.

Parameters

verbs_file – string or path object. Path to the verbs json file.

_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the json file.

Parameters

conjugations_file – string or path object. Path to the conjugation json file.

_detect_allowed_endings()[source]
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns

set. A set containing the allowed endings of verbs in the target language.

is_valid_verb(verb)[source]
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters

verb – string. The verb to conjugate.

Returns

bool. True if the verb is a valid verb in the language. False otherwise.
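The ending-based filter described above can be sketched in a few lines. The helper names and the choice of two-letter endings collected from the known verbs are assumptions made for illustration. The reference only states that endings are detected and that English is unrestricted.

```python
def sketch_allowed_endings(known_verbs, size=2):
    """Collects the final `size` letters of every known verb (size is an assumption)."""
    return {verb[-size:] for verb in known_verbs if len(verb) >= size}


def sketch_is_valid_verb(verb, language, allowed_endings):
    # English is treated as unrestricted: any word may be a verb.
    if language == "en":
        return True
    # Other languages: accept only verbs whose ending was seen in the data.
    return any(verb.endswith(ending) for ending in allowed_endings)


endings = sketch_allowed_endings(["aimer", "finir", "prendre"])
print(sorted(endings))
print(sketch_is_valid_verb("googler", "fr", endings))
print(sketch_is_valid_verb("table", "fr", endings))
print(sketch_is_valid_verb("table", "en", endings))
```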

get_verb_info(verb)[source]

Gets verb information and returns a VerbInfo instance.

Parameters

verb – string. Verb to conjugate.

Returns

VerbInfo object or None.

get_conjug_info(template)[source]

Gets conjugation information corresponding to the given template.

Parameters

template – string. Name of the verb ending pattern.

Returns

OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.

class mlconjug3.PyVerbiste.Verbiste(language='default')[source]

Bases: mlconjug3.PyVerbiste.ConjugManager

This is the class handling the Verbiste xml files.

Parameters

language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.

_load_verbs(verbs_file)[source]

Loads and parses the verbs from the xml file.

Parameters

verbs_file – string or path object. Path to the verbs xml file.

static _parse_verbs(file)[source]

Parses the XML file.

Parameters

file – FileObject. XML file containing the verbs.

Returns

OrderedDict. An OrderedDict containing the verb and its template for all verbs in the file.
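As a rough illustration of this kind of parsing, the sketch below reads a small verbs file with the standard library. The element names (`v` for an entry, `i` for the infinitive, `t` for the template) follow the layout commonly seen in Verbiste verb files, but they are an assumption here, as are the sample data and the function name `sketch_parse_verbs`.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict
from io import StringIO

# Hypothetical sample in an assumed Verbiste-like layout.
SAMPLE = """<verbs-fr>
  <v><i>aimer</i><t>aim:er</t></v>
  <v><i>finir</i><t>fin:ir</t></v>
</verbs-fr>"""


def sketch_parse_verbs(file):
    """Maps each infinitive to its template name, preserving file order."""
    verbs = OrderedDict()
    root = ET.parse(file).getroot()
    for entry in root.findall("v"):
        infinitive = entry.findtext("i")
        template = entry.findtext("t")
        verbs[infinitive] = template
    return verbs


parsed = sketch_parse_verbs(StringIO(SAMPLE))
print(list(parsed.items()))
```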

_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the xml file.

Parameters

conjugations_file – string or path object. Path to the conjugation xml file.

_parse_conjugations(file)[source]

Parses the XML file.

Parameters

file – FileObject. XML file containing the conjugation templates.

Returns

OrderedDict. An OrderedDict containing all the conjugation templates in the file.

static _load_tense(tense)[source]

Loads and parses the inflected forms of the tense from the xml file.

Parameters

tense – list of xml tags containing inflected forms. The list of inflected forms for the current tense being processed.

Returns

list. List of inflected forms.

_detect_allowed_endings()
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns

set. A set containing the allowed endings of verbs in the target language.

get_conjug_info(template)

Gets conjugation information corresponding to the given template.

Parameters

template – string. Name of the verb ending pattern.

Returns

OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.

get_verb_info(verb)

Gets verb information and returns a VerbInfo instance.

Parameters

verb – string. Verb to conjugate.

Returns

VerbInfo object or None.

is_valid_verb(verb)
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters

verb – string. The verb to conjugate.

Returns

bool. True if the verb is a valid verb in the language. False otherwise.

class mlconjug3.PyVerbiste.VerbInfo(infinitive, root, template)[source]

Bases: object

This class defines the Verbiste verb information structure.

Parameters
  • infinitive – string. Infinitive form of the verb.

  • root – string. Lexical root of the verb.

  • template – string. Name of the verb ending pattern.
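How the three fields relate can be shown with a small sketch. It assumes Verbiste-style template names of the form `stem:ending`, where the root is the infinitive with that ending removed. `SketchVerbInfo` and `sketch_verb_info` are hypothetical names, not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchVerbInfo:
    """Illustrative record mirroring the three documented fields."""
    infinitive: str
    root: str
    template: str


def sketch_verb_info(infinitive, template):
    # Assumed convention: the text after the colon is the ending to strip.
    ending = template.split(":", 1)[1]
    root = infinitive[: len(infinitive) - len(ending)]
    return SketchVerbInfo(infinitive, root, template)


print(sketch_verb_info("manger", "man:ger"))
print(sketch_verb_info("aimer", "aim:er"))
```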

class mlconjug3.PyVerbiste.Verb(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: object

This class defines the Verb Object.

Parameters
  • verb_info – VerbInfo Object.

  • conjug_info – OrderedDict.

  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.

  • predicted – bool. Indicates if the conjugation information was predicted by the model or retrieved from the dataset.

iterate()[source]

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.
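The flattening performed by iterate() might look like the sketch below. The nested layout (mood, then tense, then person, with some tenses holding a single form instead of a per-person mapping) and the English labels are assumptions used as placeholders, not the library's actual keys.

```python
from collections import OrderedDict


def sketch_iterate(conjug_info):
    """Flattens mood -> tense -> person -> form into a list of tuples."""
    results = []
    for mood, tenses in conjug_info.items():
        for tense, persons in tenses.items():
            if isinstance(persons, str):
                # Tenses with a single form, such as an infinitive.
                results.append((mood, tense, persons))
            else:
                for person, form in persons.items():
                    results.append((mood, tense, person, form))
    return results


conjug_info = OrderedDict([
    ("indicative", OrderedDict([
        ("present", OrderedDict([("1s", "aime"), ("2s", "aimes")])),
    ])),
    ("infinitive", OrderedDict([("present", "aimer")])),
])
for row in sketch_iterate(conjug_info):
    print(row)
```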

_load_conjug()[source]
Populates the inflected forms of the verb.
This is the generic version of this method.
It does not add personal pronouns to the conjugated forms.
This method can handle any new language if the conjugation structure conforms to the Verbiste XML Schema.
conjugate_person(key, persons_dict, term)[source]

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.
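Because the method returns None, it presumably fills persons_dict in place. The sketch below illustrates that pattern under this assumption. The standalone function, the explicit `root` argument and the person keys are hypothetical.

```python
from collections import OrderedDict


def sketch_conjugate_person(root, key, persons_dict, term):
    """Writes root + ending into persons_dict[key] and returns None."""
    # A missing ending is stored as None rather than raising an error.
    persons_dict[key] = root + term if term is not None else None


persons = OrderedDict()
for key, ending in [("1s", "e"), ("2s", "es"), ("1p", "ons")]:
    sketch_conjugate_person("aim", key, persons, ending)
print(dict(persons))
```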

class mlconjug3.PyVerbiste.VerbFr(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the French Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.

class mlconjug3.PyVerbiste.VerbEn(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the English Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.

class mlconjug3.PyVerbiste.VerbEs(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the Spanish Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.

class mlconjug3.PyVerbiste.VerbIt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the Italian Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.

class mlconjug3.PyVerbiste.VerbPt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the Portuguese Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.

class mlconjug3.PyVerbiste.VerbRo(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug3.PyVerbiste.Verb

This class defines the Romanian Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
conjugate_person(key, persons_dict, term)

Creates the conjugated form of the person specified by the key argument.

Parameters
  • key – string.

  • persons_dict – OrderedDict.

  • term – string.

Returns

None.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

Returns

list. List of conjugated forms.