Logistic regression is a popular classification method that comes from statistics. The model is described by a linear combination of attribute values weighted by coefficients,
F = beta_0 + beta_1*X_1 + beta_2*X_2 + ... + beta_k*X_k
and the probability (p) of a class value is computed as:
p = exp(F)/(1+exp(F))
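The mapping from F (the log-odds) to p is the standard logistic function; a minimal sketch in Python:

```python
from math import exp

def logistic(F):
    # p = exp(F) / (1 + exp(F)), equivalently 1 / (1 + exp(-F));
    # F is the linear combination beta_0 + beta_1*X_1 + ... + beta_k*X_k
    return exp(F) / (1.0 + exp(F))

print(logistic(0.0))  # zero log-odds gives p = 0.5: both classes equally likely
```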
The outcome variable (class) must be binary (dichotomous), and discrete attributes must be translated to continuous ones. While the Orange kernel provides the basic functionality, the module orngLR.py covers the necessary adaptations and conversions.
LogRegClassifier

LogRegClassifier stores the estimated values of the regression coefficients and their significances, and uses them to predict classes and class probabilities using the equations described above.
Attributes

beta
    Estimated regression coefficients.

beta_se
    Estimated standard errors of the regression coefficients.

fitStatus
    Tells how the model fitting ended: regularly (LogRegFitter.OK), interrupted because one of the beta coefficients escaped towards infinity (LogRegFitter.Infinity), or stopped because the values didn't converge (LogRegFitter.Divergence). The value tells about the classifier's "reliability"; the classifier itself is useful in either case.

LogRegLearner

LogRegLearner fits the beta coefficients and computes the related statistics by calling the specified fitter.
Attributes

fitter
    The fitter used to fit the model.

Methods

fitModel
    Fits the model from the given examples (and weights, if supplied). If fitting succeeds, it returns a Classifier; if not, it returns the offending attribute. You should therefore always check the type of the returned result.
Like all learners, LogRegLearner provides the usual call operator, to which you pass examples (and weights, if you have them); it returns a classifier or raises an exception if it cannot construct one. Use fitModel in code that iteratively removes problem attributes until it gets a classifier; in fact, that is exactly what orngLR does.
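The iterative-removal idea can be sketched as follows. LogRegLearner.fitModel belongs to Orange, so a hypothetical stand-in fit_model (with made-up attribute names) is used here to keep the fragment self-contained; only the control flow is the point.

```python
class FittedClassifier:
    """Stand-in for the classifier that fitModel returns on success."""

def fit_model(attributes):
    # Hypothetical stand-in for LogRegLearner.fitModel: pretend the
    # attribute "x2" makes the fit fail, in which case the offending
    # attribute is returned instead of a classifier.
    if "x2" in attributes:
        return "x2"
    return FittedClassifier()

attributes = ["x1", "x2", "x3"]
result = fit_model(attributes)
while not isinstance(result, FittedClassifier):
    attributes.remove(result)      # drop the offending attribute and retry
    result = fit_model(attributes)

print(attributes)                  # the attributes that survived the loop
```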
Fitters are objects that LogRegLearner uses to fit the model.
LogRegFitter

LogRegFitter is the abstract base class for logistic fitters. It defines the form of the call operator and the constants denoting its (un)success:
Constants

OK, Infinity, Divergence
    Outcomes of a (partly) successful fit: the fitting either ended regularly, was interrupted because a beta coefficient escaped towards infinity, or was stopped because the values didn't converge.

Constant, Singularity
    Outcomes of a failed fit, reported together with the attribute that caused the failure.

Methods

The call operator performs the fitting. The first element of the returned tuple, result, tells about the problems that occurred; it can be either OK, Infinity or Divergence. In the latter two cases, the returned values may still be useful for making predictions, but it is recommended that you inspect the coefficients and their errors and decide whether to use the model or not. If the fitting failed because of a particular attribute, attribute tells which one is responsible; the type of failure is reported in status, which can be either Constant or Singularity.

The proper way of calling the fitter is to expect and handle all the situations described. For instance, if fitter is an instance of some fitter and examples contains a set of suitable examples, a script should look like this:
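A sketch of such a script follows. A hypothetical stand-in fitter (with dummy return values) replaces a real Orange fitter so the fragment runs on its own; only the tuple-handling pattern matters.

```python
# Constants as named by LogRegFitter (the numeric values here are arbitrary
# stand-ins; in Orange they come from the class itself)
OK, Infinity, Divergence, Constant, Singularity = range(5)

def fitter(examples):
    # Stand-in for a real fitter; here it always "succeeds" with dummy values
    beta, beta_se, likelihood = [0.1, -0.2], [0.05, 0.04], -12.3
    return (OK, beta, beta_se, likelihood)

examples = [...]  # a set of suitable examples would go here

res = fitter(examples)
if res[0] in (OK, Infinity, Divergence):
    status, beta, beta_se, likelihood = res
    # proceed: build a classifier from beta; if status is not OK,
    # inspect beta and beta_se before trusting the model
else:
    status, attribute = res
    # fitting failed: remove the offending attribute, or report an error
```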
LogRegFitter_Cholesky

LogRegFitter_Cholesky is the only fitter available at the moment. It is a C++ translation of Alan Miller's logistic regression code, and it uses the Newton-Raphson algorithm to iteratively minimize the least squares error computed from the learning examples.
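To illustrate the Newton-Raphson idea (not Miller's code itself), here is a minimal pure-Python fit of a one-attribute model; the data and all names are made up for the sketch.

```python
from math import exp

def newton_logreg(xs, ys, iters=25):
    """Fit p = 1/(1+exp(-(b0 + b1*x))) by Newton-Raphson iterations,
    solving the 2x2 system in closed form at each step."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X^T W X
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break                # near-singular system: stop iterating
        # Newton step: beta += (X^T W X)^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# made-up, non-separable learning examples
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = newton_logreg(xs, ys)
print(b0, b1)
```

Note that on perfectly separable data the coefficients would escape towards infinity, which is exactly the Infinity status described above.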
Examples
Since basic logistic regression allows only continuous attributes and a dichotomous class, we show only a very basic example. More detailed use of logistic regression is shown in the documentation of the logistic regression module.
Let us load the data, induce a classifier and see how it performs on the first five examples.
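A self-contained stand-in for such a session (a real script would load the data with Orange and call the learner; the coefficients and examples below are made up to show the shape of the output):

```python
from math import exp

# Made-up coefficients of a "fitted" two-attribute model: beta_0, beta_1, beta_2
beta = [-0.5, 1.2, -0.8]

def class_probability(x1, x2):
    # p = exp(F) / (1 + exp(F)) with F = beta_0 + beta_1*x1 + beta_2*x2
    F = beta[0] + beta[1] * x1 + beta[2] * x2
    return exp(F) / (1.0 + exp(F))

# "first five examples" of a made-up continuous data set
examples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
for x1, x2 in examples:
    p = class_probability(x1, x2)
    print((x1, x2), round(p, 3), "class 1" if p > 0.5 else "class 0")
```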