Logistic Regression Classifier and Learner

Logistic regression is a popular classification method that comes from statistics. The model is described by a linear combination of attribute values, weighted by the regression coefficients,

F = beta_0 + beta_1*X_1 + beta_2*X_2 + ... + beta_k*X_k

and the probability (p) of a class value is computed as:

p = exp(F)/(1+exp(F))

The outcome variable (class) must be binary (dichotomous) and discrete attributes must be translated to continuous. While Orange kernel provides the basic functionality, module orngLR.py covers the necessary adaptations and conversions.
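The two formulas above can be illustrated with a short computation; the coefficient and attribute values below are made up for the example.

```python
import math

# Sketch of the model above: F is a linear combination of attribute values,
# and the class probability p is the logistic transform of F.
beta = [-1.0, 0.6, 1.2]   # beta_0, beta_1, beta_2 (invented values)
x = [2.0, 0.5]            # X_1, X_2 (invented values)

# F = beta_0 + beta_1*X_1 + beta_2*X_2
F = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# p = exp(F) / (1 + exp(F))
p = math.exp(F) / (1.0 + math.exp(F))
```

Here F = -1.0 + 0.6*2.0 + 1.2*0.5 = 0.8, giving p of roughly 0.69.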


LogRegClassifier

LogRegClassifier stores estimated values of regression coefficients and their significances, and uses them to predict classes and class probabilities using the equations described above.

Attributes

beta
Estimated regression coefficients.
beta_se
Estimated standard errors for regression coefficients.
wald_Z
Wald Z statistics for beta coefficients. Wald Z is computed as beta/beta_se.
P
List of P-values for the beta coefficients, i.e., the significance of each coefficient's difference from 0.0. The probability is computed from the squared Wald Z statistic, which follows a chi-square distribution with one degree of freedom.
likelihood
The probability of the sample (i.e., the learning examples) observed on the basis of the derived model, as a function of the regression parameters.
fitStatus
Tells how the model fitting ended: either regularly (LogRegFitter.OK), or it was interrupted because one of the beta coefficients escaped towards infinity (LogRegFitter.Infinity), or because the values did not converge (LogRegFitter.Divergence). The value speaks about the classifier's "reliability"; the classifier itself is useful in either case.

LogRegLearner

Logistic learner fits the beta coefficients and computes the related statistics by calling the specified fitter.

Attributes

fitter
An object that fits beta coefficients and corresponding standard errors from a set of data.

Methods

fitModel(examples[, weightID])
Fits the model by calling fitter. If fitting succeeds, it returns a Classifier; if not, it returns the offending attribute. You should therefore always check the type of the result, as follows:

c = fitModel(examples)
if isinstance(c, Variable):
    <remove the attribute c and see what happens>
else:
    <we have a classifier, life is beautiful>

Like all learners, LogRegLearner naturally provides the usual call operator, to which you pass examples (and weights, if you have them) and which returns a classifier or throws an exception if it cannot. Use fitModel in code that iteratively removes problem attributes until it gets a classifier; in fact, that is exactly what orngLR does.
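The retry loop described above can be sketched as follows. The fit_model function and the Variable class here are simplified stand-ins (the real method is fitModel on LogRegLearner, returning either an Orange classifier or an orange.Variable); they only simulate the control flow of removing offending attributes until fitting succeeds.

```python
class Variable:
    """Stand-in for Orange's attribute descriptor."""
    def __init__(self, name):
        self.name = name

def fit_model(attributes):
    """Simulated fitModel: fails on the attribute named 'const',
    otherwise returns a (placeholder) classifier."""
    for attr in attributes:
        if attr.name == "const":
            return attr            # the offending attribute
    return "classifier"            # a real call returns a Classifier object

attrs = [Variable("a1"), Variable("const"), Variable("a2")]
result = fit_model(attrs)
while isinstance(result, Variable):
    attrs.remove(result)           # drop the offender and try again
    result = fit_model(attrs)
```

After the loop, result holds the classifier and attrs holds only the attributes the model could be fitted on.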

Logistic Regression Fitters

Fitters are objects that LogRegLearner uses to fit the model.

LogRegFitter

LogRegFitter is the abstract base class for logistic fitters. It defines the form of the call operator and the constants denoting its (un)success:

Constants

OK
The fitter converged to the optimal fit.
Infinity
The fitter failed because one or more beta coefficients escaped towards infinity.
Divergence
The beta coefficients failed to converge, but none of them escaped towards infinity.
Constant
A constant attribute caused the matrix to be singular.
Singularity
The matrix is singular.

Methods

__call__(examples, weightID)
Performs the fitting. There can be two different cases: either the fitting succeeded to find a set of beta coefficients (although possibly with difficulties) or the fitting failed altogether. The two cases return different results.
(status, beta, beta_se, likelihood)
The fitter managed to fit the model. The first element of the tuple, status, reports any problems that occurred; it can be OK, Infinity or Divergence. In the latter two cases, the returned values may still be useful for making predictions, but you should inspect the coefficients and their errors before deciding whether to use the model.
(status, attribute)
The fitter failed and the returned attribute is responsible for it. The type of failure is reported in status, which can be either Constant or Singularity.

The proper way of calling the fitter is to expect and handle all the situations described. For instance, if fitter is an instance of some fitter and examples contains a set of suitable learning examples, a script should look like this:

res = fitter(examples)
if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
    status, beta, beta_se, likelihood = res
    <proceed by doing something with what you got>
else:
    status, attr = res
    <remove the attribute or complain to the user or ...>

LogRegFitter_Cholesky

LogRegFitter_Cholesky is the sole fitter available at the moment. It is a C++ translation of Alan Miller's logistic regression code. It uses the Newton-Raphson algorithm to iteratively fit the regression coefficients to the learning examples.
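To illustrate the kind of computation such a fitter performs, here is a minimal Newton-Raphson sketch for one attribute plus an intercept, on invented data. This is not the actual fitter: the real code handles many attributes, factorizes the Hessian with a Cholesky decomposition, and detects the failure conditions listed above; here the 2x2 Newton system is solved directly and no failures are checked.

```python
import math

def fit_logreg(xs, ys, iterations=25):
    """Fit beta_0 and beta_1 by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and entries of the Hessian.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)          # per-example weight
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system H * delta = g.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented, non-separable learning data: class 1 becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   0,   1,   1,   1]
b0, b1 = fit_logreg(xs, ys)

# Predicted probability of class 1 at x = 3.5.
p = 1.0 / (1.0 + math.exp(-(b0 + b1 * 3.5)))
```

Because the classes in this data overlap, the coefficients stay finite; on separable data the same iteration would drive a coefficient towards infinity, which is exactly the situation the Infinity status reports.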


Examples

Since basic logistic regression allows only continuous attributes and a dichotomous class, we show only a very basic example. More detailed use of logistic regression is shown in the logistic regression module.

Let us load the data, induce a classifier and see how it performs on the first five examples.

>>> data = orange.ExampleTable("ionosphere")
>>> logistic = orange.LogRegLearner(data)
>>>
>>> for ex in data[:5]:
...     print ex.getclass(), logistic(ex)
g g
b b
g g
b b
g g