A library of extension and helper modules for Python's data analysis and machine learning libraries.
Those tools are intentionally not (yet) submitted to the main projects to avoid cluttering up the core libraries.
Link to the mlxtend
repository on GitHub: https://github.com/rasbt/mlxtend.
A collection of different functions for various data preprocessing procedures.
The preprocessing utilities
can be imported via
from mxtend.preprocessing import ...
class MeanCenterer(TransformerObj):
"""
Class for column centering of vectors and matrices.
Keyword arguments:
X: NumPy array object where each attribute/variable is
stored in an individual column.
Also accepts 1-dimensional Python list objects.
Class methods:
fit: Fits column means to MeanCenterer object.
transform: Uses column means from `fit` for mean centering.
fit_transform: Fits column means and performs mean centering.
The class methods `transform` and `fit_transform` return a new numpy array
object where the attributes are centered at the column means.
"""
Examples:
Use the fit
method to fit the column means of a dataset (e.g., the training dataset) to a new MeanCenterer object. Then, call the transform
method on the same dataset to center it at the sample mean.
>> X_train
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>> mc = MeanCenterer().fit(X_train)
>> mc.transform(X_train)
array([[-3, -3, -3],
[ 0, 0, 0],
[ 3, 3, 3]])
To use the same parameters that were used to center the training dataset, simply call the transform
method of the MeanCenterer instance on a new dataset (e.g., test dataset).
>> X_test
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>> mc.transform(X_test)
array([[-3, -4, -5],
[-3, -4, -5],
[-3, -4, -5]])
The MeanCenterer
also supports Python list objects, and the fit_transform
method allows you to directly fit and center the dataset.
>> Z
[1, 2, 3]
>> MeanCenterer().fit_transform(Z)
array([-1, 0, 1])
import matplotlib.pyplot as plt
import numpy as np
X = 2 * np.random.randn(100,2) + 5
plt.scatter(X[:,0], X[:,1])
plt.grid()
plt.title('Random Gaussian data w. mean=5, sigma=2')
plt.show()
Y = MeanCenterer.fit_transform(X)
plt.scatter(Y[:,0], Y[:,1])
plt.grid()
plt.title('Data after mean centering')
plt.show()
The scikit-learn utilities
can be imported via
from mxtend.scikit-learn import ...
A feature selector for scikit-learn's Pipeline class that returns specified columns from a NumPy array; extremely useful in combination with scikit-learn's Pipeline
in cross-validation.
ColumnSelector
used in a pipeline for cross-validation on the Iris dataset.Example in Pipeline
:
from mlxtend.sklearn import ColumnSelector
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
clf_2col = Pipeline(steps=[
('scaler', StandardScaler()),
('reduce_dim', ColumnSelector(cols=(1,3))), # extracts column 2 and 4
('classifier', GaussianNB())
])
ColumnSelector
has a transform
method that is used to select and return columns (features) from a NumPy array so that it can be used in the Pipeline
like other transformation
classes.
### original data
print('First 3 rows before:\n', X_train[:3,:])
First 3 rows before:
[[ 4.5 2.3 1.3 0.3]
[ 6.7 3.3 5.7 2.1]
[ 5.7 3. 4.2 1.2]]
### after selection
cols = ColumnExtractor(cols=(1,3)).transform(X_train)
print('First 3 rows:\n', cols[:3,:])
First 3 rows:
[[ 2.3 0.3]
[ 3.3 2.1]
[ 3. 1.2]]
The matplotlib utilities
can be imported via
from mxtend.matplotlib import ...
A function to remove borders from matplotlib
plots.
def remove_borders(axes, left=False, bottom=False, right=True, top=True):
"""
A function to remove chartchunk from matplotlib plots, such as axes
spines, ticks, and labels.
Keyword arguments:
axes: An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()].
left, bottom, right, top: Boolean to specify which plot axes to hide.
"""
Example
You can use the following command to install mlxtend
:
pip install mlxtend
or
easy_install mlxtend
Alternatively, you download the package manually from the Python Package Index https://pypi.python.org/pypi/mlxtend, unzip it, navigate into the package, and use the command:
python setup.py install