discrimintools package
Submodules
discrimintools.candisc module
- class discrimintools.candisc.CANDISC(n_components=None, target=None, features=None, priors=None, parallelize=False)[source]
Bases: BaseEstimator, TransformerMixin
Canonical Discriminant Analysis (CANDISC)
Description
This class inherits from the scikit-learn BaseEstimator and TransformerMixin classes.
Performs a Canonical Discriminant Analysis, computes squared Mahalanobis distances between class means, and performs both univariate and multivariate one-way analyses of variance
- param n_components:
- type n_components:
number of dimensions kept in the results
- param target:
- type target:
string, target variable
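- param features:
- type features:
list of quantitative variables to be included in the analysis
- param priors:
- type priors:
class prior probabilities (must sum to 1)
- param parallelize:
- type parallelize:
boolean, default = False. Whether to parallelize the computations: if True, they are parallelized with mapply; if False, they use apply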
- returns:
summary_information_ (summary information about the variables in the analysis: the number of observations, the number of quantitative variables, the number of classes in the classification variable, and the frequency of each class)
eig_ (a pandas dataframe containing all the eigenvalues, the difference between consecutive eigenvalues, the percentage of variance and the cumulative percentage of variance)
ind_ (a dictionary of pandas dataframes containing all the results for the active individuals (coordinates))
statistics_ (statistics)
classes_ (class information)
cov_ (covariances)
corr_ (correlations)
coef_ (pandas dataframe, weight vector(s))
intercept_ (pandas dataframe, intercept term)
score_coef_
score_intercept_
svd_ (eigenvalue decomposition)
call_ (a dictionary with some statistics)
model_ (string. The model fitted = ‘candisc’)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
References
SAS Documentation, The CANDISC Procedure: https://documentation.sas.com/doc/en/statug/15.2/statug_candisc_toc.htm
R package candisc (version 0.8-6), candisc function: https://www.rdocumentation.org/packages/candisc/versions/0.8-6/topics/candisc
R package candisc (version 0.8-6): https://www.rdocumentation.org/packages/candisc/versions/0.8-6
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020
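The quantities listed above rest on a standard eigenproblem. The following is a minimal NumPy sketch of textbook canonical discriminant analysis, not the discrimintools source: it builds the within-class (E) and between-class (H) SSCP matrices, solves E⁻¹H u = λu, and reports the squared canonical correlations λ/(1 + λ). The helper name `candisc_sketch` is hypothetical, and scaling the coefficients so that each canonical variable has unit pooled within-class variance is an assumption borrowed from the SAS convention.

```python
import numpy as np


def candisc_sketch(X, y, n_components=None):
    """Hypothetical helper: canonical discriminant analysis from first principles."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    g = len(classes)
    grand_mean = X.mean(axis=0)

    # Within-class (E) and between-class (H) SSCP matrices
    E = np.zeros((p, p))
    H = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        E += (Xk - mk).T @ (Xk - mk)
        d = (mk - grand_mean)[:, None]
        H += len(Xk) * (d @ d.T)

    # Solve E^{-1} H u = lambda u via the symmetric matrix L^{-1} H L^{-T}, where E = L L^T
    L_inv = np.linalg.inv(np.linalg.cholesky(E))
    eigvals, V = np.linalg.eigh(L_inv @ H @ L_inv.T)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    r = min(p, g - 1) if n_components is None else n_components
    eigvals, V = eigvals[order][:r], V[:, order][:, :r]

    # Raw canonical coefficients, scaled to unit pooled within-class variance
    U = (L_inv.T @ V) * np.sqrt(n - g)
    scores = (X - grand_mean) @ U  # canonical scores of the individuals
    canonical_r2 = eigvals / (1.0 + eigvals)  # squared canonical correlations
    return scores, eigvals, canonical_r2, U


# Usage on simulated data: three classes, three quantitative variables
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
X = np.vstack([rng.normal(loc=m, size=(30, 3)) for m in means])
y = np.repeat(["a", "b", "c"], 30)
scores, eigvals, canonical_r2, U = candisc_sketch(X, y)
```

With g classes, at most g − 1 canonical variables carry between-class information, which is why the sketch keeps min(p, g − 1) components by default.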
- decision_function(X)[source]
Apply decision function to a pandas dataframe of samples
The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x).
- param X:
DataFrame of samples (test vectors).
- type X:
DataFrame of shape (n_samples_, n_features)
- returns:
C – Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples_,), giving the log likelihood ratio of the positive class.
- rtype:
DataFrame of shape (n_samples_,) or (n_samples_, n_classes)
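Because the decision values are log-posteriors up to a per-sample additive constant, class probabilities and predicted labels can be recovered from them with a softmax and an argmax. A minimal NumPy sketch, assuming a decision matrix of shape (n_samples, n_classes) whose column order matches the class order; both helper names are hypothetical and not part of discrimintools:

```python
import numpy as np


def posterior_from_decision(scores):
    """Hypothetical helper: softmax over classes, row by row."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the row maximum changes nothing mathematically but avoids overflow in exp
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def binary_log_ratio(scores):
    """Hypothetical helper: two-class decision value log p(y=1|x) - log p(y=0|x)."""
    scores = np.asarray(scores, dtype=float)
    return scores[:, 1] - scores[:, 0]


scores = np.array([[1.0, 3.0], [2.0, 0.0], [0.5, 0.5]])
proba = posterior_from_decision(scores)
labels = proba.argmax(axis=1)  # index of the most probable class
log_ratio = binary_log_ratio(scores)
```

The per-sample constant cancels in both the softmax and the two-class difference, which is why the decision function only needs to be correct up to that constant.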
- fit(X, y=None)[source]
Fit the Canonical Discriminant Analysis model
- param X:
Training data.
- type X:
pandas/polars DataFrame
- returns:
self – Fitted estimator.
- rtype:
object
- fit_transform(X)[source]
Fit to data, then transform it
Fits the transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
X_new – Transformed data.
- rtype:
DataFrame of shape (n_samples_, n_components_)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
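The layout of such a table (observed classes in rows, predicted classes in columns) can be reproduced with pandas, which also makes the link to accuracy explicit: the diagonal holds the correct predictions. This is a sketch with a hypothetical helper, not the method's own code, and it assumes label vectors rather than a fitted model:

```python
import pandas as pd


def pred_table_sketch(y_true, y_pred):
    """Hypothetical helper: rows are observed classes i, columns are predicted classes j."""
    labels = sorted(set(y_true) | set(y_pred))
    table = pd.crosstab(pd.Series(y_true, name="observed"), pd.Series(y_pred, name="predicted"))
    # Keep a square table even when a class is never observed or never predicted
    return table.reindex(index=labels, columns=labels, fill_value=0)


y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]
table = pred_table_sketch(y_true, y_pred)
# Correct predictions lie on the diagonal
accuracy = table.to_numpy().trace() / table.to_numpy().sum()
```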
- predict(X)[source]
Predict class labels for samples in X
- param X:
The data matrix for which we want to get the predictions.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
y_pred – Vector containing the class label of each sample.
- rtype:
ndarray of shape (n_samples,)
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
C – Estimated probabilities.
- rtype:
DataFrame of shape (n_samples_, n_classes_)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
DataFrame of shape (n_samples_, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
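Assuming `score` follows the same definition as scikit-learn's `accuracy_score`, the returned value is the (optionally weighted) share of samples whose predicted label equals the true label. A sketch of that computation, with a hypothetical helper name:

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: (weighted) fraction of correctly predicted labels."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # np.average falls back to the plain mean when weights is None
    return float(np.average(correct, weights=sample_weight))


y_true = ["a", "b", "b"]
y_pred = ["a", "b", "a"]
unweighted = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2])
```

Doubling the weight of the misclassified third sample lowers the score from 2/3 to 1/2.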
- transform(X)[source]
Project data to maximize class separation
- param X:
Input data
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
X_new – Transformed data.
- rtype:
DataFrame of shape (n_samples_, n_components_)
discrimintools.datasets module
discrimintools.disca module
- class discrimintools.disca.DISCA(n_components=None, target=None, features=None, priors=None, parallelize=False)[source]
Bases: BaseEstimator, TransformerMixin
Discriminant Correspondence Analysis (DISCA)
Description
This class inherits from the scikit-learn BaseEstimator and TransformerMixin classes.
Performs a Discriminant Correspondence Analysis
Parameters:
n_components : number of dimensions kept in the results
target : string, target variable
features : list of qualitative variables to be included in the analysis.
- priors : the prior probabilities of group membership. Allowed values:
“equal” to set the prior probabilities equal,
“proportional” or “prop” to set the prior probabilities proportional to the sample sizes,
a pandas Series that specifies the prior probability of each level of the classification variable.
- parallelize : boolean, default = False
Whether to parallelize the computations: if True, they are parallelized with mapply; if False, they use apply.
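The three accepted forms of `priors` can be illustrated with a small pandas sketch that converts each of them into one probability per class. The helper `resolve_priors` is hypothetical; the default `priors=None` is not described above, so the sketch leaves it out.

```python
import numpy as np
import pandas as pd


def resolve_priors(priors, y):
    """Hypothetical helper: turn a documented `priors` option into class probabilities."""
    counts = pd.Series(y).value_counts().sort_index()
    if isinstance(priors, str):
        if priors == "equal":
            return pd.Series(1.0 / len(counts), index=counts.index)
        if priors in ("proportional", "prop"):
            return counts / counts.sum()
        raise ValueError(f"unknown priors option: {priors!r}")
    if isinstance(priors, pd.Series):
        # Align the user-supplied probabilities on the observed class levels
        p = priors.reindex(counts.index).astype(float)
        if p.isna().any():
            raise ValueError("priors must give a probability for every class")
        if not np.isclose(p.sum(), 1.0):
            raise ValueError("priors must sum to 1")
        return p
    raise TypeError("priors must be 'equal', 'proportional'/'prop' or a pandas Series")


y = ["a", "a", "a", "b"]
prop = resolve_priors("prop", y)
equal = resolve_priors("equal", y)
custom = resolve_priors(pd.Series({"b": 0.8, "a": 0.2}), y)
```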
- returns:
call_ (a dictionary with some statistics)
ind_ (a dictionary of pandas dataframes containing all the results for the active individuals (coordinates))
var_ (a dictionary of pandas dataframes containing all the results for the active variables (coordinates, correlations between variables and axes, squared cosines, contributions))
statistics_ (statistics)
classes_ (class information)
anova_ (analysis of variance)
factor_model_ (correspondence analysis model)
coef_ (discriminant correspondence analysis coefficients)
model_ (string. The model fitted = ‘disca’)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
Notes
https://bookdown.org/teddyswiebold/multivariate_statistical_analysis_using_r/discriminant-correspondence-analysis.html
https://search.r-project.org/CRAN/refmans/TExPosition/html/tepDICA.html
http://pbil.univ-lyon1.fr/ADE-4/ade4-html/discrimin.coa.html
https://rdrr.io/cran/ade4/man/discrimin.coa.html
https://stat.ethz.ch/pipermail/r-help/2010-December/263170.html
https://www.sciencedirect.com/science/article/pii/S259026012200011X
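Following the references listed above, discriminant correspondence analysis can be read as a correspondence analysis of the table that crosses the classes with the categories of the qualitative variables, the individuals then being projected as supplementary rows. The NumPy/pandas sketch below illustrates that reading; it is not the discrimintools code, and the helper name, the `=` separator in dummy column names, and the absence of any extra weighting are assumptions.

```python
import numpy as np
import pandas as pd


def disca_sketch(df, target, n_components=2):
    """Hypothetical helper: CA of the class-by-category table, individuals as supplementary rows."""
    features = [col for col in df.columns if col != target]
    # Complete disjunctive (one-hot) table of the qualitative variables
    Z = pd.get_dummies(df[features].astype(str), prefix_sep="=").astype(float)
    # Contingency table: classes in rows, categories in columns
    N = Z.groupby(df[target]).sum()
    P = N.to_numpy() / N.to_numpy().sum()
    r, c = P.sum(axis=1), P.sum(axis=0)  # class and category masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    class_coord = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]  # principal coordinates of the classes
    cat_std = Vt[:k].T / np.sqrt(c)[:, None]  # standard coordinates of the categories
    # Each individual's profile (indicator row divided by its sum) projected onto the axes
    Zn = Z.to_numpy()
    ind_coord = (Zn / Zn.sum(axis=1, keepdims=True)) @ cat_std
    return N, class_coord, cat_std, ind_coord


df = pd.DataFrame({
    "color": ["red", "red", "blue", "blue", "green", "green", "red", "blue", "green"],
    "size": ["S", "M", "L", "L", "S", "M", "S", "M", "L"],
    "group": ["g1", "g1", "g2", "g2", "g3", "g3", "g1", "g2", "g3"],
})
N, class_coord, cat_std, ind_coord = disca_sketch(df, target="group")
```

A useful consequence of the transition formula is that each class point is the barycenter of the supplementary coordinates of its own individuals.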
- decision_function(X)[source]
Apply decision function to a DataFrame of samples
- param X:
DataFrame of samples (test vectors).
- type X:
DataFrame of shape (n_samples_, n_features)
- returns:
C – Decision function values related to each class, per sample.
- rtype:
DataFrame of shape (n_samples_,) or (n_samples_, n_classes)
- fit(X)[source]
Fit the Discriminant Correspondence Analysis model
- param X:
Training data.
- type X:
pandas/polars DataFrame
- returns:
self – Fitted estimator.
- rtype:
object
- fit_transform(X)[source]
Fit to data, then transform it
Fits transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples, n_features+1)
- returns:
X_new – Transformed array.
- rtype:
DataFrame of shape (n_samples, n_features_new)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
- predict(X)[source]
Predict class labels for samples in X
- param X:
The data matrix for which we want to get the predictions.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
y_pred – Vector containing the class label of each sample.
- rtype:
ndarray of shape (n_samples,)
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
C – Estimated probabilities.
- rtype:
DataFrame of shape (n_samples_, n_classes_)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
array-like of shape (n_samples, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
- transform(X, y=None)[source]
Apply the dimensionality reduction to X
X is projected onto the first axes previously extracted from the training set.
- param X:
New data, where n_rows_sup is the number of supplementary row points and n_vars is the number of variables. X is a data table containing a category in each cell; categories can be coded by strings or numeric values. The rows of X are supplementary row points that are projected onto the axes.
- type X:
array of string, int or float, shape (n_rows_sup, n_vars)
- param y:
y is ignored.
- type y:
None
- returns:
X_new – Coordinates of the projections of the supplementary row points onto the axes.
- rtype:
array of float, shape (n_rows_sup, n_components_)
discrimintools.dismix module
- class discrimintools.dismix.DISMIX(n_components=None, target=None, features=None, priors=None, parallelize=False)[source]
Bases: BaseEstimator, TransformerMixin
Discriminant Analysis of Mixed Data (DISMIX)
Description
This class inherits from the scikit-learn BaseEstimator and TransformerMixin classes.
Performs linear discriminant analysis with both continuous and categorical variables
Parameters:
n_components : number of dimensions kept in the results
target : name of the classification variable, whose values define the groups for the analysis.
features : list of mixed variables to be included in the analysis
- priors : the prior probabilities of group membership. Allowed values:
“equal” to set the prior probabilities equal,
“proportional” or “prop” to set the prior probabilities proportional to the sample sizes,
a pandas Series that specifies the prior probability of each level of the classification variable.
- parallelize : boolean, default = False
Whether to parallelize the computations: if True, they are parallelized with mapply; if False, they use apply.
- returns:
call_ (a dictionary with some statistics)
coef_ (pandas DataFrame of coefficients, one row per feature or category and one column per class)
intercept_ (pandas DataFrame of shape (1, n_classes))
lda_model_ (linear discriminant analysis model)
factor_model_ (factor analysis of mixed data model)
projection_function_ (projection function)
model_ (string. The model fitted = ‘dismix’)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
References
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020
- fit(X, y=None)[source]
Fit the Linear Discriminant Analysis of Mixed Data model
Parameters:
- X : DataFrame of shape (n_samples, n_features+1)
Training data
y : None
Returns:
- self : object
Fitted estimator
- fit_transform(X)[source]
Fit to data, then transform it
Fits transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples, n_features+1)
- returns:
X_new – Transformed array.
- rtype:
DataFrame of shape (n_samples, n_features_new)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
- predict(X)[source]
Predict class labels for samples in X
Parameters:
- X : DataFrame of shape (n_samples, n_features)
The DataFrame for which we want to get the predictions
Returns:
- y_pred : DataFrame of shape (n_samples, 1)
DataFrame containing the class labels for each sample.
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data
- type X:
DataFrame of shape (n_samples, n_features)
- returns:
C – Estimated probabilities.
- rtype:
DataFrame of shape (n_samples, n_classes)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
array-like of shape (n_samples, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
- transform(X)[source]
Project data to maximize class separation
Parameters:
- X : DataFrame of shape (n_samples, n_features)
Input data
Returns:
- X_new : DataFrame of shape (n_samples, n_components_)
Transformed data
discrimintools.disqual module
- class discrimintools.disqual.DISQUAL(n_components=None, target=None, features=None, priors=None, parallelize=False)[source]
Bases: BaseEstimator, TransformerMixin
Discriminant Analysis for qualitative/categorical variables (DISQUAL)
Description
This class inherits from the scikit-learn BaseEstimator and TransformerMixin classes.
Performs discriminant analysis for categorical variables using multiple correspondence analysis (MCA) and linear discriminant analysis
Parameters:
n_components : number of dimensions kept in the results
target : name of the classification variable, whose values define the groups for the analysis.
features : list of qualitative variables to be included in the analysis.
- priors : the prior probabilities of group membership. Allowed values:
“equal” to set the prior probabilities equal,
“proportional” or “prop” to set the prior probabilities proportional to the sample sizes,
a pandas Series that specifies the prior probability of each level of the classification variable.
- parallelize : boolean, default = False
Whether to parallelize the computations: if True, they are parallelized with mapply; if False, they use apply.
Returns:
call_ : a dictionary with some statistics
statistics_ : Chi-square test of independence of variables in a contingency table.
coef_ : pandas DataFrame of coefficients, one row per category and one column per class
intercept_ : pandas DataFrame of shape (1, n_classes)
lda_model_ : linear discriminant analysis model
factor_model_ : multiple correspondence analysis model
projection_function_ : projection function
model_ : string. The model fitted = ‘disqual’
References:
https://lemakistatheux.wordpress.com/category/outils-danalyse-supervisee/la-methode-disqual/
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020
Saporta G., Probabilité, analyse des données et Statistique, Technip, 2006
Tufféry S., Data Mining et statistique décisionnelle - L’intelligence des données, Technip, 2012
SAS procedure: http://od-datamining.com/download/#macro
R package and function: http://finzi.psych.upenn.edu/library/DiscriMiner/html/disqual.html
https://github.com/gastonstat/DiscriMiner
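The DISQUAL method referenced above chains a multiple correspondence analysis (MCA) of the qualitative predictors with a linear discriminant analysis on the individuals' factor coordinates. Because those coordinates are linear in the indicator (one-hot) columns, the discriminant functions can be rewritten as one score per category, which is what a coefficient table of shape (n_categories, n_classes) expresses. A minimal sketch under these assumptions (MCA computed as a correspondence analysis of the indicator matrix, pooled within-class covariance, proportional priors); all helper names are hypothetical and this is not the discrimintools implementation:

```python
import numpy as np
import pandas as pd


def mca_category_coords(Z, n_components):
    """Hypothetical helper: standard coordinates of the categories from a CA of Z."""
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:n_components].T / np.sqrt(c)[:, None]


def lda_functions(F, y):
    """Hypothetical helper: linear classification functions, pooled covariance, proportional priors."""
    classes = np.unique(y)
    n, g = len(y), len(classes)
    means = np.array([F[y == k].mean(axis=0) for k in classes])
    W = sum((F[y == k] - m).T @ (F[y == k] - m) for k, m in zip(classes, means)) / (n - g)
    W_inv = np.linalg.inv(W)
    priors = np.array([np.mean(y == k) for k in classes])
    coef = W_inv @ means.T
    intercept = -0.5 * np.einsum("kj,jl,kl->k", means, W_inv, means) + np.log(priors)
    return classes, coef, intercept


df = pd.DataFrame({
    "income": ["low", "low", "mid", "high", "mid", "high", "low", "mid", "high", "low", "mid", "high"],
    "housing": ["rent", "rent", "own", "own", "rent", "own", "own", "rent", "own", "rent", "own", "rent"],
    "default": ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"],
})
features = ["income", "housing"]
Z = pd.get_dummies(df[features], prefix_sep="=").astype(float)
Q = len(features)
y = df["default"].to_numpy()

cat_std = mca_category_coords(Z.to_numpy(), n_components=2)
F = Z.to_numpy() @ cat_std / Q  # individuals' MCA coordinates (transition formula)
classes, coef, intercept = lda_functions(F, y)

# Rewrite the functions per category: score = Z @ category_coef + intercept
category_coef = pd.DataFrame(cat_std @ coef / Q, index=Z.columns, columns=classes)
scores = Z.to_numpy() @ category_coef.to_numpy() + intercept
predicted = classes[scores.argmax(axis=1)]
```

Scoring an individual then only requires adding up the coefficients of the categories it takes, plus the class intercept.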
- fit(X, y=None)[source]
Fit the Linear Discriminant Analysis model for categorical variables
Parameters:
- X : pandas/polars DataFrame of shape (n_samples, n_features+1)
Training data
y : None
Returns:
- self : object
Fitted estimator
- fit_transform(X)[source]
Fit to data, then transform it
Fits transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples, n_features+1)
- returns:
X_new – Transformed array.
- rtype:
DataFrame of shape (n_samples, n_features_new)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
- predict(X)[source]
Predict class labels for samples in X
Parameters:
- X : DataFrame of shape (n_samples, n_features)
The DataFrame for which we want to get the predictions
Returns:
- y_pred : DataFrame of shape (n_samples, 1)
DataFrame containing the class labels for each sample.
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data
- type X:
DataFrame of shape (n_samples, n_features)
- returns:
C – Estimated probabilities.
- rtype:
DataFrame of shape (n_samples, n_classes)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
array-like of shape (n_samples, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
discrimintools.eta2 module
- discrimintools.eta2.eta2(categories, value, digits=4)[source]
Compute the squared correlation ratio (eta squared)
Description
This function computes the squared correlation ratio (eta squared), an important measure of association between a quantitative variable and a qualitative variable.
- param categories:
- type categories:
a factor associated with the qualitative variable
- param value:
- type value:
a vector associated with the quantitative variable
- param digits:
- type digits:
int, default=4. Number of decimals printed
- returns:
a dictionary of numeric elements
Sum. Intra (the within-group sum of squares)
Sum. Inter (the between-group sum of squares)
Correlation ratio (the value of the empirical correlation ratio)
F-stats (Fisher's F test statistic)
pvalue (the p-value)
References
Bertrand, M. Maumy-Bertrand, Initiation à la Statistique avec R, Dunod, 4ème édition, 2023.
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
see also https://stackoverflow.com/questions/52083501/how-to-compute-correlation-ratio-or-eta-in-python
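The returned quantities follow from a one-way analysis of variance: with k groups and n observations, eta squared is SS_inter / (SS_inter + SS_intra) and F = (SS_inter / (k − 1)) / (SS_intra / (n − k)). The pandas sketch below reproduces them with a hypothetical helper; its dictionary keys mirror the names documented above, but the exact keys and rounding of the real function are assumptions. The p-value, omitted here, can be obtained with `scipy.stats.f.sf(F, k - 1, n - k)`.

```python
import pandas as pd


def eta2_sketch(categories, value):
    """Hypothetical helper: squared correlation ratio and its one-way ANOVA F statistic."""
    df = pd.DataFrame({"group": categories, "value": value})
    n, k = len(df), df["group"].nunique()
    grand_mean = df["value"].mean()
    groups = df.groupby("group")["value"]
    # Between-group (inter) and within-group (intra) sums of squares
    ss_inter = float((groups.count() * (groups.mean() - grand_mean) ** 2).sum())
    ss_intra = float(((df["value"] - groups.transform("mean")) ** 2).sum())
    eta2 = ss_inter / (ss_inter + ss_intra)
    f_stat = (ss_inter / (k - 1)) / (ss_intra / (n - k))
    return {"Sum. Intra": ss_intra, "Sum. Inter": ss_inter, "Correlation ratio": eta2, "F-stats": f_stat}


result = eta2_sketch(["a", "a", "b", "b"], [1.0, 3.0, 5.0, 7.0])
```

Here the group means are 2 and 6 around a grand mean of 4, so SS_inter = 16, SS_intra = 4, eta squared = 0.8 and F = 16 / (4 / 2) = 8.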
discrimintools.fviz_candisc module
- discrimintools.fviz_candisc.fviz_candisc(self, axis=[0, 1], x_label=None, y_label=None, x_lim=None, y_lim=None, title=None, geom=['point', 'text'], point_size=1.5, text_size=8, text_type='text', add_grid=True, add_hline=True, add_vline=True, repel=False, hline_color='black', hline_style='dashed', vline_color='black', vline_style='dashed', ha='center', va='center', ggtheme=<plotnine.themes.theme_minimal.theme_minimal object>)[source]
Draw the individuals graph of a Canonical Discriminant Analysis (CANDISC)
discrimintools.fviz_disca module
- discrimintools.fviz_disca.fviz_disca_ind(self, axis=[0, 1], x_lim=None, y_lim=None, x_label=None, y_label=None, title=None, geom=['point', 'text'], repel=True, point_size=1.5, text_size=8, text_type='text', add_grid=True, add_hline=True, add_vline=True, ha='center', va='center', hline_color='black', hline_style='dashed', vline_color='black', vline_style='dashed', add_group=True, center_marker_size=5, ggtheme=<plotnine.themes.theme_minimal.theme_minimal object>)[source]
Draw the individuals graph of a Discriminant Correspondence Analysis (DISCA)
Author:
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.fviz_disca.fviz_disca_mod(self, axis=[0, 1], x_lim=None, y_lim=None, x_label=None, y_label=None, title=None, color='black', geom=['point', 'text'], text_type='text', marker='o', point_size=1.5, text_size=8, add_grid=True, add_group=True, color_sup='blue', marker_sup='^', add_hline=True, add_vline=True, ha='center', va='center', hline_color='black', hline_style='dashed', vline_color='black', vline_style='dashed', repel=False, ggtheme=<plotnine.themes.theme_minimal.theme_minimal object>)[source]
Visualize Discriminant Correspondence Analysis - Graph of variables/categories
Description
- param self:
- type self:
an object of class DISCA
- param axis:
- type axis:
a numeric list or vector of length 2 specifying the dimensions to be plotted, default = [0,1]
- returns:
a plotnine graph
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
discrimintools.get_candisc module
- discrimintools.get_candisc.get_candisc(self, choice='ind')[source]
Extract the results - CANDISC
- param self:
- type self:
an object of class CANDISC
- param choice:
- returns:
a dictionary or a pandas dataframe
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_candisc.get_candisc_coef(self, choice='absolute')[source]
Extract coefficients - CANDISC
- param self:
- type self:
an object of class CANDISC
- param choice:
- type choice:
the element to subset from the output. Allowed values are “absolute” (for canonical coefficients) or “score” (for class coefficients)
- returns:
a pandas dataframe containing coefficients
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_candisc.get_candisc_ind(self)[source]
Extract the results for individuals - CANDISC
- param self:
- type self:
an object of class CANDISC
- returns:
a dictionary of dataframes containing all the results for the active individuals including
- coord (coordinates for the individuals)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_candisc.get_candisc_var(self, choice='correlation')[source]
Extract the results for variables - CANDISC
- param self:
- type self:
an object of class CANDISC
- param choice:
- type choice:
the element to subset from the output. Allowed values are “correlation” (for canonical correlation) or “covariance” (for covariance).
- returns:
a dictionary of dataframes containing all the results for the variables
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_candisc.summaryCANDISC(self, digits=3, nb_element=10, ncp=3, to_markdown=False, tablefmt='pipe', **kwargs)[source]
Printing summaries of Canonical Discriminant Analysis model
- param self:
- type self:
an object of class CANDISC
- param digits:
- type digits:
int, default=3. Number of decimals printed
- param nb_element:
- type nb_element:
int, default=10. Number of elements displayed
- param ncp:
- type ncp:
int, default=3. Number of components displayed
- param to_markdown:
- type to_markdown:
bool, default=False. Whether to print DataFrames in Markdown-friendly format
- param tablefmt:
- type tablefmt:
str, default='pipe'. Table format. For more about tablefmt, see https://pypi.org/project/tabulate/
- param **kwargs:
- type **kwargs:
Additional parameters passed to tabulate.
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
discrimintools.get_disca module
- discrimintools.get_disca.get_disca(self, choice='ind')[source]
Extract the results - DISCA
- param self:
- type self:
an object of class DISCA
- param choice:
- returns:
a dictionary or a pandas dataframe
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_disca.get_disca_classes(self)[source]
Extract the results for groups - DISCA
- param self:
- type self:
an object of class DISCA
- returns:
a dictionary of dataframes containing all the results for the groups including
- coord (coordinates for the groups)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_disca.get_disca_coef(self)[source]
Extract coefficients - DISCA
- param self:
- type self:
an object of class DISCA
- returns:
a pandas dataframe containing coefficients
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_disca.get_disca_ind(self)[source]
Extract the results for individuals - DISCA
- param self:
- type self:
an object of class DISCA
- returns:
a dictionary of dataframes containing all the results for the active individuals including
- coord (coordinates for the individuals)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_disca.get_disca_var(self)[source]
Extract the results for variables/categories - DISCA
- param self:
- type self:
an object of class DISCA
- returns:
a dictionary of dataframes containing all the results for the active variables including
- coord (coordinates for the variables/categories)
- contrib (contributions for the variables/categories)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_disca.summaryDISCA(self, digits=3, nb_element=10, ncp=3, to_markdown=False, tablefmt='pipe', **kwargs)[source]
Printing summaries of Discriminant Correspondence Analysis model
- param self:
- type self:
an object of class DISCA
- param digits:
- type digits:
int, default=3. Number of decimals printed
- param nb_element:
- type nb_element:
int, default=10. Number of elements displayed
- param ncp:
- type ncp:
int, default=3. Number of components displayed
- param to_markdown:
- type to_markdown:
bool, default=False. Whether to print DataFrames in Markdown-friendly format
- param tablefmt:
- type tablefmt:
str, default='pipe'. Table format. For more about tablefmt, see https://pypi.org/project/tabulate/
- param **kwargs:
- type **kwargs:
Additional parameters passed to tabulate.
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
discrimintools.get_lda module
- discrimintools.get_lda.get_lda(self, choice='ind')[source]
Extract the results - LDA
- param self:
- type self:
an object of class LDA
- param choice:
- returns:
a dictionary or a pandas dataframe
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_lda.get_lda_coef(self)[source]
Extract coefficients - LDA
- param self:
- type self:
an object of class LDA
- returns:
a pandas dataframe containing coefficients
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_lda.get_lda_cov(self)[source]
Extract the results for variables - LDA
- param self:
- type self:
an object of class LDA
- returns:
a dictionary of dataframes containing all the results for the variables
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_lda.get_lda_ind(self)[source]
Extract the results for individuals - LDA
- param self:
- type self:
an object of class LDA
- returns:
a dictionary of dataframes containing all the results for the active individuals including
- scores (scores for the individuals)
- generalied_dist2 (generalized distance)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
- discrimintools.get_lda.summaryLDA(self, digits=3, nb_element=10, to_markdown=False, tablefmt='pipe', **kwargs)[source]
Printing summaries of Linear Discriminant Analysis model
- param self:
- type self:
an object of class LDA
- param digits:
- type digits:
int, default=3. Number of decimals printed
- param nb_element:
- type nb_element:
int, default=10. Number of elements displayed
- param to_markdown:
- type to_markdown:
bool, default=False. Whether to print DataFrames in Markdown-friendly format
- param tablefmt:
- type tablefmt:
str, default='pipe'. Table format. For more about tablefmt, see https://pypi.org/project/tabulate/
- param **kwargs:
- type **kwargs:
Additional parameters passed to tabulate.
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
discrimintools.lda module
- class discrimintools.lda.LDA(target=None, features=None, priors=None)[source]
Bases: BaseEstimator, TransformerMixin
Linear Discriminant Analysis (LDA)
Description
This class inherits from the scikit-learn BaseEstimator and TransformerMixin classes.
Develops a discriminant criterion to classify each observation into groups
Parameters:
target : name of the classification variable, whose values define the groups for the analysis.
features : list of quantitative variables to be included in the analysis. The default is all numeric variables in the dataset.
- priors : the prior probabilities of group membership. Allowed values:
“equal” to set the prior probabilities equal,
“proportional” or “prop” to set the prior probabilities proportional to the sample sizes,
a pandas Series that specifies the prior probability of each level of the classification variable.
- returns:
call_ (a dictionary with some statistics)
coef_ (DataFrame of shape (n_features,n_classes_))
intercept_ (DataFrame of shape (1, n_classes))
summary_information_ (summary information about the variables in the analysis: the number of observations, the number of quantitative variables, the number of classes in the classification variable, and the frequency of each class)
ind_ (a dictionary of pandas dataframes containing all the results for the active individuals (coordinates))
statistics_ (statistics)
classes_ (class information)
cov_ (covariances)
model_ (string. The model fitted = ‘lda’)
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
References
SAS Documentation, The DISCRIM Procedure: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_discrim_overview.htm
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020
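The classification rule behind `coef_` and `intercept_` can be sketched with the textbook linear discriminant functions: with pooled within-class covariance matrix W, class means μ_k and priors π_k, the score of class k is x'W⁻¹μ_k − ½μ_k'W⁻¹μ_k + ln π_k, and an observation is assigned to the class with the largest score. Whether discrimintools uses exactly this parameterization is an assumption; the helper name `lda_sketch` is hypothetical.

```python
import numpy as np


def lda_sketch(X, y, priors=None):
    """Hypothetical helper: linear discriminant (classification) functions per class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, g = len(y), len(classes)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    if priors is None:
        # Proportional priors (an assumption of this sketch)
        priors = np.array([np.mean(y == k) for k in classes])
    # Pooled within-class covariance matrix, n - g degrees of freedom
    W = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in zip(classes, means)) / (n - g)
    W_inv = np.linalg.inv(W)
    coef = W_inv @ means.T  # one column of coefficients per class
    # Constant term: -1/2 mu_k' W^{-1} mu_k + ln(prior_k)
    intercept = -0.5 * np.einsum("kj,jl,kl->k", means, W_inv, means) + np.log(priors)
    return classes, coef, intercept


rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)), rng.normal([3.0, 1.0], 1.0, size=(40, 2))])
y = np.repeat(["A", "B"], 40)
classes, coef, intercept = lda_sketch(X, y)
scores = X @ coef + intercept  # decision values, one column per class
predicted = classes[scores.argmax(axis=1)]
accuracy = np.mean(predicted == y)
```

The term −½x'W⁻¹x, common to all classes, is dropped, which is why these scores are linear in x.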
- decision_function(X)[source]
Apply decision function to a DataFrame of samples
The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x).
- param X:
DataFrame of samples (test vectors).
- type X:
DataFrame of shape (n_samples_, n_features)
- returns:
C – Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples_,), giving the log likelihood ratio of the positive class.
- rtype:
DataFrame of shape (n_samples_,) or (n_samples_, n_classes)
- fit(X, y=None)[source]
Fit the Linear Discriminant Analysis model
- param X:
Training data.
- type X:
pandas/polars DataFrame
- returns:
self – Fitted estimator.
- rtype:
object
- fit_transform(X)[source]
Fit to data, then transform it
Fits the transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples_, n_features_+1)
- returns:
X_new – Transformed data.
- rtype:
DataFrame of shape (n_samples_, n_classes_)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
- predict(X)[source]
Predict class labels for samples in X
- param X:
The data matrix for which we want to get the predictions.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
y_pred – Vector containing the class label of each sample.
- rtype:
ndarray of shape (n_samples,)
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data.
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
C – Estimated probabilities.
- rtype:
DataFrame of shape (n_samples_, n_classes_)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
DataFrame of shape (n_samples_, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
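A sketch of the single-label mean accuracy, with optional sample weights. `mean_accuracy` is a hypothetical helper mirroring the usual scikit-learn definition; it is not taken from discrimintools. Labels are flattened so that a one-column DataFrame of predictions is handled as well.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: (weighted) share of samples whose label is predicted exactly."""
    correct = (np.asarray(y_true).ravel() == np.asarray(y_pred).ravel()).astype(float)
    # With weights=None, np.average is the plain mean
    return float(np.average(correct, weights=sample_weight))
```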
- transform(X)[source]
Project data to maximize class separation
- param X:
Input data
- type X:
DataFrame of shape (n_samples_, n_features_)
- returns:
X_new – Transformed data.
- rtype:
DataFrame of shape (n_samples_, n_classes_)
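Projecting the data to maximize class separation amounts to solving the generalized eigenvalue problem between the between-class and within-class scatter matrices. The sketch below uses `scipy.linalg.eigh` for that. The helper name `canonical_projection` is hypothetical, and discrimintools may center or scale its coordinates differently, so only the directions (up to sign and scale) should be expected to match.

```python
import numpy as np
from scipy.linalg import eigh


def canonical_projection(X, y, n_components=None):
    """Hypothetical helper: Fisher/canonical discriminant projection."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))  # within-class scatter
    Sb = np.zeros((p, p))  # between-class scatter
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        d = (mk - mean_all)[:, None]
        Sb += Xk.shape[0] * (d @ d.T)
    # Solve Sb v = lambda Sw v (Sw must be positive definite); eigenvalues come back ascending
    evals, evecs = eigh(Sb, Sw)
    order = np.argsort(evals)[::-1]
    # At most min(p, K - 1) discriminant axes carry information
    q = n_components or min(p, len(classes) - 1)
    V = evecs[:, order[:q]]
    return (X - mean_all) @ V, evals[order[:q]]
```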
discrimintools.pcada module
- class discrimintools.pcada.PCADA(n_components=None, target=None, features=None, priors=None, parallelize=False)[source]
Bases:
BaseEstimator
,TransformerMixin
Principal Components Analysis - Discriminant Analysis (PCADA)
Description
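This class inherits from the sklearn BaseEstimator and TransformerMixin classes
Performs a principal components analysis - discriminant analysis: a principal components analysis of the quantitative variables, followed by a linear discriminant analysis on the retained principal components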
Parameters:
n_components : number of dimensions kept in the results
target : the classification variable; its values define the groups for the analysis.
features : list of quantitative variables to be included in the analysis. The default is all numeric variables in the dataset.
priors : the prior probabilities of group membership, given as one of:
“equal” to set the prior probabilities equal,
“proportional” or “prop” to set the prior probabilities proportional to the sample sizes,
a pandas Series specifying the prior probability for each level of the classification variable.
parallelize : boolean, default = False. Whether the computation should be parallelized:
If True : parallelize using mapply
If False : run sequentially using apply
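The three accepted forms of `priors` can be turned into a probability vector along these lines. This is a sketch under assumptions: `resolve_priors` is a hypothetical helper, classes are ordered by their sorted labels, and the default `priors=None` is not covered because its behaviour is not documented here.

```python
import pandas as pd


def resolve_priors(y, priors="equal"):
    """Hypothetical helper: map the documented priors options to a pandas Series summing to 1."""
    counts = pd.Series(y).value_counts().sort_index()
    if isinstance(priors, pd.Series):
        # Align user-supplied priors on the observed classes
        p = priors.reindex(counts.index).astype(float)
        if p.isna().any():
            raise ValueError("priors must give a value for every class")
    elif priors == "equal":
        p = pd.Series(1.0 / len(counts), index=counts.index)
    elif priors in ("proportional", "prop"):
        p = counts / counts.sum()
    else:
        raise ValueError(f"unsupported priors: {priors!r}")
    return p / p.sum()  # normalize so the priors sum to 1
```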
Returns:
coef_ : DataFrame of shape (n_features, n_classes_)
intercept_ : DataFrame of shape (1, n_classes_)
lda_model_ : linear discriminant analysis model
factor_model_ : principal components analysis model
projection_function_ : projection function
model_ : string. The model fitted = ‘disqual’
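PCADA chains a principal components analysis and a linear discriminant analysis (compare `factor_model_` and `lda_model_` above). As an analogue only, the same idea can be sketched with scikit-learn's `PCA` and `LinearDiscriminantAnalysis` in a pipeline. This is not the discrimintools implementation; the standardization step (a normed PCA) and the helper name `make_pca_lda` are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_pca_lda(n_components=2, priors=None):
    """Hypothetical helper: standardize, keep the first principal components, then run LDA on them."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LinearDiscriminantAnalysis(priors=priors),
    )


# Illustrative usage on synthetic data: two Gaussian classes shifted by 3 on every feature
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y_demo = np.repeat(["g1", "g2"], 50)
model = make_pca_lda(n_components=2).fit(X_demo, y_demo)
```

Keeping only a few components before the discriminant step is what makes the approach usable when the quantitative variables are numerous or strongly correlated.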
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
References:
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020
- fit(X, y=None)[source]
Fit the principal components analysis - discriminant analysis model
Parameters:
- X : pandas/polars DataFrame of shape (n_samples, n_features+1)
Training data
y : None
Returns:
- self : object
Fitted estimator
- fit_transform(X)[source]
Fit to data, then transform it
Fits transformer to X and returns a transformed version of X.
- param X:
Input samples.
- type X:
DataFrame of shape (n_samples, n_features+1)
- returns:
X_new – Transformed data.
- rtype:
DataFrame of shape (n_samples, n_features_new)
- pred_table()[source]
Prediction table
Notes
pred_table[i,j] refers to the number of times “i” was observed and the model predicted “j”. Correct predictions are along the diagonal.
- predict(X)[source]
Predict class labels for samples in X
Parameters:
- X : DataFrame of shape (n_samples, n_features)
The DataFrame for which we want to get the predictions
Returns:
- y_pred : DataFrame of shape (n_samples, 1)
DataFrame containing the class labels for each sample.
- predict_proba(X)[source]
Estimate class membership probabilities
- param X:
Input data
- type X:
DataFrame of shape (n_samples, n_features)
- returns:
C – Estimated probabilities
- rtype:
DataFrame of shape (n_samples, n_classes)
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- param X:
Test samples.
- type X:
array-like of shape (n_samples, n_features)
- param y:
True labels for X.
- type y:
array-like of shape (n_samples,) or (n_samples, n_outputs)
- param sample_weight:
Sample weights.
- type sample_weight:
array-like of shape (n_samples,), default=None
- returns:
score – Mean accuracy of self.predict(X) with respect to y.
- rtype:
float
discrimintools.revaluate_cat_variable module
discrimintools.stepdisc module
- class discrimintools.stepdisc.STEPDISC(model=None, method='forward', alpha=0.01, lambda_init=None, model_train=False, verbose=True)[source]
Bases:
BaseEstimator
,TransformerMixin
Stepwise Discriminant Analysis (STEPDISC)
Description
This class inherits from the sklearn BaseEstimator and TransformerMixin classes
Performs a stepwise discriminant analysis to select a subset of the quantitative variables for use in discriminating among the classes. It supports forward selection and backward elimination.
- param model:
discriminant analysis model on which the variable selection is performed
- type model:
an object of class LDA or CANDISC
- param method:
the feature selection method to be used:
“forward” for forward selection,
“backward” for backward elimination
- type method:
string, default = “forward”
- param alpha:
significance level for adding or retaining variables in stepwise variable selection
- type alpha:
float, default = 0.01
- param lambda_init:
initial Wilks' lambda
- type lambda_init:
default = None
- param model_train:
whether the model should be trained again with the selected variables
- type model_train:
boolean, default = False
- param verbose:
if True (default), print the intermediate steps of the feature selection;
if False, print nothing
- type verbose:
boolean, default = True
- returns:
call_ (a dictionary with some statistics)
results_ (a dictionary with stepwise results)
model_ (string. The model fitted = ‘stepdisc’)
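A minimal sketch of forward selection, assuming the entry criterion is the partial F-test on Wilks' lambda described in the SAS STEPDISC documentation cited below: at each step, the candidate with the smallest p-value enters if that p-value is below `alpha`. The helpers `wilks_lambda` and `forward_stepdisc` are hypothetical; backward elimination, `lambda_init` and `model_train` are not reproduced.

```python
import numpy as np
from scipy.stats import f as f_dist


def wilks_lambda(X, y, cols):
    """Hypothetical helper: Wilks' lambda = det(W) / det(T) for the given columns."""
    Xs = X[:, cols]
    Xc = Xs - Xs.mean(axis=0)
    T = Xc.T @ Xc                        # total SSCP matrix
    W = np.zeros_like(T)
    for k in np.unique(y):
        Dk = Xs[y == k] - Xs[y == k].mean(axis=0)
        W += Dk.T @ Dk                   # pooled within-class SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)


def forward_stepdisc(X, y, alpha=0.01):
    """Hypothetical helper: forward selection with the partial F-test on Wilks' lambda."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    K = len(np.unique(y))
    selected, lam = [], 1.0              # Wilks' lambda of the empty set is 1
    while len(selected) < p:
        q = len(selected)
        df1, df2 = K - 1, n - K - q
        best = None
        for j in range(p):
            if j in selected:
                continue
            lam_new = wilks_lambda(X, y, selected + [j])
            ratio = lam_new / lam        # partial Wilks' lambda of candidate j
            F = (df2 / df1) * (1.0 - ratio) / ratio
            pval = f_dist.sf(F, df1, df2)
            if best is None or pval < best[2]:
                best = (j, lam_new, pval)
        if best[2] >= alpha:
            break                        # no remaining candidate is significant at level alpha
        selected.append(best[0])
        lam = best[1]
    return selected, lam
```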
Author(s)
Duvérier DJIFACK ZEBAZE duverierdjifack@gmail.com
References
SAS documentation, https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_stepdisc_overview.htm
Ricco Rakotomalala, Pratique de l’analyse discriminante linéaire, Version 1.0, 2020