Module nmtf.modules.nmtf_base
Non-negative matrix and tensor factorization basic functions
Functions
def NMFInit(M, Mmis, Mt0, Mw0, nc, tolerance, LogIter, myStatusBox)
Initialize NMF components using NNDSVD (nonnegative double singular value decomposition)
Input
M: Input matrix
Mmis: Define missing values (0 = missing cell, 1 = real cell)
Mt0: Initial left hand matrix (may be empty)
Mw0: Initial right hand matrix (may be empty)
nc: NMF rank
Output
Mt: Left hand matrix
Mw: Right hand matrix
Reference
C. Boutsidis, E. Gallopoulos (2008). SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, Volume 41, Issue 4, April 2008, Pages 1350-1362.
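The NNDSVD scheme from this reference can be sketched in NumPy as follows. This is a minimal illustration, not the module's NMFInit: the function name nndsvd_init is hypothetical, missing values (Mmis) and the status box are ignored, and the right hand matrix is returned with shape (n_features, nc) so that M ≈ W @ H.T, as in non_negative_factorization.

```python
import numpy as np


def nndsvd_init(M, nc):
    """Hypothetical NNDSVD sketch: returns W (n_rows, nc), H (n_cols, nc), M ~ W @ H.T."""
    M = np.asarray(M, dtype=float)
    if nc > min(M.shape):
        raise ValueError("nc cannot exceed min(M.shape)")
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    W = np.zeros((M.shape[0], nc))
    H = np.zeros((M.shape[1], nc))
    # Leading pair: singular vectors of a non-negative matrix can be taken non-negative
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[:, 0] = np.sqrt(S[0]) * np.abs(Vt[0])
    for j in range(1, nc):
        u, v = U[:, j], Vt[j]
        # Split each singular vector into its positive and negative parts
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        n_up, n_un = np.linalg.norm(up), np.linalg.norm(un)
        n_vp, n_vn = np.linalg.norm(vp), np.linalg.norm(vn)
        # Keep the sign pattern whose rank-one part has the larger norm product
        if n_up * n_vp >= n_un * n_vn:
            x, y, nx, ny = up, vp, n_up, n_vp
        else:
            x, y, nx, ny = un, vn, n_un, n_vn
        if nx * ny > 0:
            scale = np.sqrt(S[j] * nx * ny)
            W[:, j] = scale * x / nx
            H[:, j] = scale * y / ny
    return W, H
```

The resulting factors may contain exact zeros; multiplicative update rules cannot move a zero entry, so such zeros are usually replaced by a small positive value before those rules are applied.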
def NTFInit(M, Mmis, Mt_nmf, Mw_nmf, nc, tolerance, precision, LogIter, NTFUnimodal, NTFLeftComponents, NTFRightComponents, NTFBlockComponents, NBlocks, init_type, myStatusBox)
Initialize NTF components for HALS
Input
M: Input tensor
Mmis: Define missing values (0 = missing cell, 1 = real cell)
Mt_nmf: Initialization of the left hand matrix (LHM) in NMF (unstacked tensor), may be empty
Mw_nmf: Initialization of the right hand matrix (RHM) in NMF (unstacked tensor), may be empty
nc: NTF rank
tolerance: Convergence threshold
precision: Replaces 0-values in multiplicative update rules
LogIter: Log results through iterations
NTFUnimodal: Apply unimodal constraint on factoring vectors
NTFLeftComponents: Apply unimodal/smooth constraint on left hand matrix
NTFRightComponents: Apply unimodal/smooth constraint on right hand matrix
NTFBlockComponents: Apply unimodal/smooth constraint on block hand matrix
NBlocks: Number of NTF blocks
init_type: integer, default 0
  init_type = 0: NMF initialization applied on the reshaped matrix [1st dim x vectorized (2nd & 3rd dim)]
  init_type = 1: NMF initialization applied on the reshaped matrix [vectorized (1st & 2nd dim) x 3rd dim]
Output
Mt: Left hand matrix
Mw: Right hand matrix
Mb: Block hand matrix
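The two init_type options differ only in how the tensor is flattened before the NMF initialization. A minimal NumPy sketch, assuming the unfolded layout described for non_negative_tensor_factorization (shape (n_samples, n_features x n_blocks)) with the blocks placed side by side, X = [X_1 | X_2 | ... | X_n_blocks]; the helper name unfold_for_init is hypothetical.

```python
import numpy as np


def unfold_for_init(X, n_blocks, init_type=0):
    """Hypothetical helper: matrix on which the NMF initialization would run.

    Assumes X has shape (n_samples, n_features * n_blocks) with the blocks
    laid side by side: X = [X_1 | X_2 | ... | X_n_blocks].
    """
    X = np.asarray(X)
    _, n_cols = X.shape
    if n_cols % n_blocks:
        raise ValueError("the number of columns must be a multiple of n_blocks")
    n_features = n_cols // n_blocks
    if init_type == 0:
        # [1st dim x vectorized (2nd & 3rd dim)]: the unfolded matrix itself
        return X
    if init_type == 1:
        # [vectorized (1st & 2nd dim) x 3rd dim]: one column per block
        blocks = [X[:, k * n_features:(k + 1) * n_features] for k in range(n_blocks)]
        return np.column_stack([block.ravel() for block in blocks])
    raise ValueError("init_type must be 0 or 1")
```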
def nmf_permutation_test_score(estimator, y, n_permutations=100, verbose=0)
Perform a permutation test to assess the association between ordered samples and a covariate.
Parameters
estimator
:dictionary, as returned by non_negative_factorization() and nmf_predict()
y
:array-like
- Group to be predicted.
n_permutations
:integer, default: 100
- Number of permutations.
verbose
:integer, default: 0
- The verbosity level (0/1).
Returns
Completed estimator with the following entries:
score
:float
- The true score without permuting targets.
pvalue
:float
- The p-value, which approximates the probability that the score would be obtained by chance.
CS
:array-like, shape(n_components)
- The size of each cluster
CP
:array-like, shape(n_components)
- The p-value of the most significant group within each cluster
CG
:array-like, shape(n_components)
- The index of the most significant group within each cluster
CN
:array-like, shape(n_components, n_groups)
- The size of each group within each cluster
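A permutation test of this kind can be sketched as below. The score used by nmf_permutation_test_score is not documented here, so this sketch assumes a chi-square statistic on the cluster x group contingency table (the counterpart of CN), takes cluster labels such as WC from nmf_predict, and uses the common (1 + count) / (1 + n_permutations) p-value convention; the function name is hypothetical.

```python
import numpy as np


def permutation_test_sketch(clusters, y, n_permutations=100, seed=0):
    """Hypothetical sketch: association between cluster labels and groups y."""
    rng = np.random.default_rng(seed)
    _, c_idx = np.unique(np.asarray(clusters), return_inverse=True)
    _, g_idx = np.unique(np.asarray(y), return_inverse=True)
    c_idx, g_idx = c_idx.ravel(), g_idx.ravel()
    n_c, n_g = c_idx.max() + 1, g_idx.max() + 1

    def chi2(groups):
        # Contingency table: size of each group within each cluster (like CN)
        table = np.zeros((n_c, n_g))
        np.add.at(table, (c_idx, groups), 1)
        expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / table.sum()
        return ((table - expected) ** 2 / expected).sum(), table

    score, cn = chi2(g_idx)
    # Null distribution: shuffle the group labels, keep the clusters fixed
    null = np.array([chi2(rng.permutation(g_idx))[0] for _ in range(n_permutations)])
    pvalue = (1 + np.sum(null >= score)) / (1 + n_permutations)
    return {"score": score, "pvalue": pvalue, "CS": cn.sum(axis=1), "CN": cn}
```

Shuffling y while keeping the clustering fixed preserves both margins of the table, so the null distribution reflects only the pairing between samples and groups.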
def nmf_predict(estimator, leverage='robust', blocks=None, cluster_by_stability=False, custom_order=False, verbose=0)
Derive ordered sample and feature indexes for later use in ordered heatmaps.
Parameters
estimator
:dictionary, as returned by non_negative_factorization()
leverage
:None | 'standard' | 'robust', default: 'robust'
- Calculate leverage of W and H rows on each component.
blocks
:array-like, shape (n_blocks), default: None
- Size of each block (if any) in ordered heatmap.
cluster_by_stability
:boolean, default: False
- Use stability instead of leverage to assign samples/features to clusters.
custom_order
:boolean, default: False
- If False, samples/features with the highest leverage or stability appear at the top of each cluster. If True, the within-cluster ordering is modified to suggest a continuum between adjacent clusters.
verbose
:integer, default: 0
- The verbosity level (0/1).
Returns
Completed estimator with the following entries:
WL
:array-like, shape (n_samples, n_components)
- Sample leverage on each component
HL
:array-like, shape (n_features, n_components)
- Feature leverage on each component
QL
:array-like, shape (n_blocks, n_components)
- Block leverage on each component (NTF only)
WR
:vector-like, shape (n_samples)
- Ranked sample indexes (by cluster and leverage or stability), used to produce ordered heatmaps
HR
:vector-like, shape (n_features)
- Ranked feature indexes (by cluster and leverage or stability), used to produce ordered heatmaps
WN
:vector-like, shape (n_components)
- Sample cluster bounds in ordered heatmap
HN
:vector-like, shape (n_components)
- Feature cluster bounds in ordered heatmap
WC
:vector-like, shape (n_samples)
- Sample assigned cluster
HC
:vector-like, shape (n_features)
- Feature assigned cluster
QC
:vector-like, shape (size(blocks))
- Block assigned cluster (NTF only)
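How ranked indexes and cluster bounds might be derived from a leverage matrix (WL or HL) is sketched below. It assumes each sample or feature goes to the component where its leverage is largest and that the cluster bounds are cumulative cluster sizes; the leverage computation itself and custom_order are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def order_by_leverage(leverage):
    """Hypothetical sketch: return (ranked, bounds, clusters) from a leverage matrix."""
    leverage = np.asarray(leverage, dtype=float)
    clusters = leverage.argmax(axis=1)                      # counterpart of WC / HC
    top = leverage[np.arange(leverage.shape[0]), clusters]  # leverage on the assigned component
    # Sort by cluster first, then by decreasing leverage within each cluster
    ranked = np.lexsort((-top, clusters))                   # counterpart of WR / HR
    # Assumed meaning of WN / HN: end position of each cluster in the ordered heatmap
    bounds = np.cumsum(np.bincount(clusters, minlength=leverage.shape[1]))
    return ranked, bounds, clusters
```

Reindexing the data with X[ranked] then places the samples of each cluster in contiguous rows, with the cluster boundaries at the positions given by bounds.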
def non_negative_factorization(X, W=None, H=None, n_components=None, update_W=True, update_H=True, beta_loss='frobenius', use_hals=False, n_bootstrap=None, tol=1e-06, max_iter=150, max_iter_mult=20, regularization=None, sparsity=0, leverage='standard', convex=None, kernel='linear', skewness=False, null_priors=False, random_state=None, verbose=0)
Compute Non-negative Matrix Factorization (NMF)
Find two non-negative matrices (W, H) such that X = W @ H.T + Error. This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction.
The objective function is minimized with an alternating minimization of W and H.
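As an illustration of alternating minimization under the X ≈ W @ H.T convention, the sketch below applies the classic Lee-Seung multiplicative updates for the Frobenius loss. It is not the module's solver (the documented options are projected gradient with a multiplicative warm-up, or HALS); the function name, the random initialization and the eps guard are assumptions.

```python
import numpy as np


def nmf_multiplicative(X, n_components, max_iter=150, tol=1e-6, seed=0, eps=1e-12):
    """Hypothetical sketch: Frobenius NMF, X ~ W @ H.T, by multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components))
    H = rng.random((n_features, n_components))
    err = np.linalg.norm(X - W @ H.T)
    for _ in range(max_iter):
        prev = err
        # Update W with H fixed, then H with W fixed
        W *= (X @ H) / (W @ (H.T @ H) + eps)
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)
        err = np.linalg.norm(X - W @ H.T)
        # Stop when the relative decrease of the error falls below tol
        if prev - err <= tol * prev:
            break
    return W, H, err
```

Each update multiplies by a non-negative ratio, so W and H stay non-negative without any explicit projection.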
Parameters
X
:array-like, shape (n_samples, n_features)
- Constant matrix.
W
:array-like, shape (n_samples, n_components)
- Prior W. If update_W == False, it is used as a constant, to solve for H only.
H
:array-like, shape (n_features, n_components)
- Prior H. If update_H == False, it is used as a constant, to solve for W only.
n_components
:integer
- Number of components. If n_components is not set, n_components = min(n_samples, n_features).
update_W
:boolean, default: True
- Update or keep W fixed.
update_H
:boolean, default: True
- Update or keep H fixed.
beta_loss
:string, default: 'frobenius'
- String must be in {'frobenius', 'kullback-leibler'}. Beta divergence to be minimized, measuring the distance between X and the product W @ H.T. Note that values other than 'frobenius' (or 2) and 'kullback-leibler' (or 1) lead to significantly slower fits. For beta_loss == 'kullback-leibler', the input matrix X cannot contain zeros.
use_hals
:boolean, default: False
- If True, use the HALS algorithm (the convex and Kullback-Leibler loss options are not supported); if False, use projected gradient.
n_bootstrap
:integer, default: None
- Number of bootstrap runs.
tol
:float, default: 1e-6
- Tolerance of the stopping condition.
max_iter
:integer, default: 150
- Maximum number of iterations.
max_iter_mult
:integer, default: 20
- Maximum number of iterations in the multiplicative warm-up to projected gradient (beta_loss = 'frobenius' only).
regularization
:None | 'components' | 'transformation', default: None
- Select whether the regularization affects the components (H), the transformation (W) or none of them.
sparsity
:float, default: 0
- Sparsity target with 0 <= sparsity < 1, representing either:
  - the % of rows in W or H set to 0 (when use_hals = False), or
  - the mean % of rows per column in W or H set to 0 (when use_hals = True).
  sparsity == 1: adaptive sparsity through hard thresholding and hhi.
leverage
:None | 'standard' | 'robust', default: 'standard'
- Calculate leverage of W and H rows on each component.
convex
:None | 'components' | 'transformation', default: None
- Apply convex constraint on W or H.
kernel
:'linear' | 'quadratic' | 'radial', default: 'linear'
- Can be set if convex = 'transformation'.
null_priors
:boolean, default: False
- Cells of H whose prior value is 0 will not be updated. Can be set only if prior H has been defined.
skewness
:boolean, default: False
- When solving mixture problems, columns of X at the extremities of the convex hull will be given the largest weights. The column weight is a function of the skewness and its sign. The expected sign of the skewness is based on the skewness of the W components, as returned by the first pass of a two-step convex NMF. Thus, during the first pass, skewness must be set to False. Can be set only if convex = 'transformation' and prior W and H have been defined.
random_state
:int, RandomState instance or None, optional, default: None
- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose
:integer, default: 0
- The verbosity level (0/1).
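For use_hals=True, the column-wise HALS (hierarchical alternating least squares) updates can be sketched as follows for X ≈ W @ H.T. This is a minimal illustration without regularization, sparsity, priors or convex constraints; the function name, the random initialization and the eps guard are assumptions.

```python
import numpy as np


def nmf_hals(X, n_components, max_iter=150, seed=0, eps=1e-12):
    """Hypothetical sketch: Frobenius NMF, X ~ W @ H.T, by HALS column updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components))
    H = rng.random((X.shape[1], n_components))
    for _ in range(max_iter):
        # Update each column of W in turn: exact non-negative least squares in that column
        XH, HtH = X @ H, H.T @ H
        for k in range(n_components):
            W[:, k] = np.maximum(0.0, W[:, k] + (XH[:, k] - W @ HtH[:, k]) / (HtH[k, k] + eps))
        # Same update for the columns of H, with W fixed
        XtW, WtW = X.T @ W, W.T @ W
        for k in range(n_components):
            H[:, k] = np.maximum(0.0, H[:, k] + (XtW[:, k] - H @ WtW[:, k]) / (WtW[k, k] + eps))
    return W, H, np.linalg.norm(X - W @ H.T)
```

Unlike multiplicative rules, each HALS step solves its one-column subproblem exactly, which usually gives faster convergence per iteration.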
Returns
Estimator (dictionary) with the following entries:
W
:array-like, shape (n_samples, n_components)
- Solution to the non-negative least squares problem.
H
:array-like, shape (n_features, n_components)
- Solution to the non-negative least squares problem.
volume
:scalar
- Volume occupied by W and H.
WB
:array-like, shape (n_samples, n_components)
- Percent consistently clustered rows for each component. Only if n_bootstrap > 0.
HB
:array-like, shape (n_features, n_components)
- Percent consistently clustered columns for each component. Only if n_bootstrap > 0.
B
:array-like, shape (n_samples, n_components) or (n_features, n_components)
- Only if a convex variant is active: H.T = B.T @ X or W = X @ B.
diff
- Objective minimum achieved.
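The WB and HB entries can be read as assignment frequencies across bootstrap runs. The sketch below assumes that interpretation: each bootstrap run yields its own W (with components already matched across runs), every row is assigned to its largest component, and WB[i, k] is the percentage of runs assigning row i to component k. The module's exact definition may differ, and the function name is hypothetical.

```python
import numpy as np


def bootstrap_consistency(W_runs):
    """Hypothetical sketch: percent of bootstrap runs assigning each row to each component."""
    W_runs = np.asarray(W_runs, dtype=float)   # shape (n_runs, n_samples, n_components)
    n_runs, _, n_components = W_runs.shape
    assigned = W_runs.argmax(axis=2)           # cluster of each row in each run
    # One-hot encode the assignments and count them over runs
    counts = (assigned[..., None] == np.arange(n_components)).sum(axis=0)
    return 100.0 * counts / n_runs
```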
def non_negative_tensor_factorization(X, n_blocks, W=None, H=None, Q=None, n_components=None, update_W=True, update_H=True, update_Q=True, fast_hals=True, n_iter_hals=2, n_shift=0, regularization=None, sparsity=0, unimodal=False, smooth=False, apply_left=False, apply_right=False, apply_block=False, n_bootstrap=None, tol=1e-06, max_iter=150, leverage='standard', random_state=None, init_type=0, verbose=0)
Compute Non-negative Tensor Factorization (NTF)
Find three non-negative matrices (W, H, Q) such that X = W @@ H @@ Q + Error (@@ = tensor product). This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction.
The objective function is minimized with an alternating minimization of W, H and Q.
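With X unfolded to shape (n_samples, n_features x n_blocks), the tensor product can be written block by block. Assuming the blocks are placed side by side, block k of X is approximated by W @ diag(Q[k]) @ H.T; equivalently, the whole unfolded matrix is W @ KR(Q, H).T, where KR is the column-wise Kronecker (Khatri-Rao) product. The sketch checks both forms; the function names are hypothetical.

```python
import numpy as np


def khatri_rao(Q, H):
    """Column-wise Kronecker product: column r is kron(Q[:, r], H[:, r])."""
    n_blocks, r = Q.shape
    n_features = H.shape[0]
    # Entry [k, f, r] = Q[k, r] * H[f, r]; flattened row index is k * n_features + f
    return (Q[:, None, :] * H[None, :, :]).reshape(n_blocks * n_features, r)


def ntf_reconstruct(W, H, Q):
    """Hypothetical sketch: unfolded W @@ H @@ Q, blocks side by side."""
    return W @ khatri_rao(Q, H).T
```

In this form, the fit of W has the same structure as the NMF problem X ≈ W @ H.T with H replaced by KR(Q, H), which is why NMF-type updates for W carry over to NTF.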
Parameters
X
:array-like, shape (n_samples, n_features x n_blocks)
- Constant matrix. X is a tensor with shape (n_samples, n_features, n_blocks), unfolded along the 2nd and 3rd dimensions.
n_blocks
:integer
- Number of blocks (size of the 3rd tensor dimension).
W
:array-like, shape (n_samples, n_components)
- prior W
H
:array-like, shape (n_features, n_components)
- prior H
Q
:array-like, shape (n_blocks, n_components)
- prior Q
n_components
:integer
- Number of components. If n_components is not set, n_components = min(n_samples, n_features).
update_W
:boolean, default: True
- Update or keep W fixed.
update_H
:boolean, default: True
- Update or keep H fixed.
update_Q
:boolean, default: True
- Update or keep Q fixed.
fast_hals
:boolean, default: True
- Use fast implementation of HALS.
n_iter_hals
:integer, default: 2
- Number of HALS iterations prior to fast HALS.
n_shift
:integer, default: 0
- Maximum shift in convolutional NTF.
regularization
:None | 'components' | 'transformation', default: None
- Select whether the regularization affects the components (H), the transformation (W) or none of them.
sparsity
:float, default: 0
- Sparsity target with 0 <= sparsity <= 1, representing the mean % of rows per column in W or H set to 0.
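unimodal
:boolean, default: False
- Apply unimodal constraint on factoring vectors.
smooth
:boolean, default: False
- Apply smooth constraint on factoring vectors.
apply_left
:boolean, default: False
- Apply unimodal/smooth constraint on the left hand matrix (W).
apply_right
:boolean, default: False
- Apply unimodal/smooth constraint on the right hand matrix (H).
apply_block
:boolean, default: False
- Apply unimodal/smooth constraint on the block hand matrix (Q).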
n_bootstrap
:integer, default: None
- Number of bootstrap runs.
tol
:float, default: 1e-6
- Tolerance of the stopping condition.
max_iter
:integer, default: 150
- Maximum number of iterations.
leverage
:None | 'standard' | 'robust', default: 'standard'
- Calculate leverage of W and H rows on each component.
random_state
:int, RandomState instance or None, optional, default: None
- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
init_type
:integer, default: 0
- init_type = 0: NMF initialization applied on the reshaped matrix [1st dim x vectorized (2nd & 3rd dim)]; init_type = 1: NMF initialization applied on the reshaped matrix [vectorized (1st & 2nd dim) x 3rd dim].
verbose
:integer, default: 0
- The verbosity level (0/1).
Returns
Estimator (dictionary) with the following entries:
W
:array-like, shape (n_samples, n_components)
- Solution to the non-negative least squares problem.
H
:array-like, shape (n_features, n_components)
- Solution to the non-negative least squares problem.
Q
:array-like, shape (n_blocks, n_components)
- Solution to the non-negative least squares problem.
volume
:scalar
- Volume occupied by W and H.
WB
:array-like, shape (n_samples, n_components)
- Percent consistently clustered rows for each component. Only if n_bootstrap > 0.
HB
:array-like, shape (n_features, n_components)
- Percent consistently clustered columns for each component. Only if n_bootstrap > 0.
Reference
A. Cichocki, A.-H. Phan (2009). Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 92 (3), 708–721.
def rNMFSolve(M, Mmis, Mt0, Mw0, nc, tolerance, precision, LogIter, MaxIterations, NMFAlgo, NMFFixUserLHE, NMFFixUserRHE, NMFMaxInterm, NMFSparseLevel, NMFRobustResampleColumns, NMFRobustNRuns, NMFCalculateLeverage, NMFUseRobustLeverage, NMFFindParts, NMFFindCentroids, NMFKernel, NMFReweighColumns, NMFPriors, myStatusBox)
Estimate left and right hand matrices (robust version)
Input
M: Input matrix
Mmis: Define missing values (0 = missing cell, 1 = real cell)
Mt0: Initial left hand matrix
Mw0: Initial right hand matrix
nc: NMF rank
tolerance: Convergence threshold
precision: Replaces 0-values in multiplicative update rules
LogIter: Log results through iterations
MaxIterations: Max iterations
NMFAlgo: = 1, 3: Divergence; = 2, 4: Least squares
NMFFixUserLHE: = 1 => fixed left hand matrix columns
NMFFixUserRHE: = 1 => fixed right hand matrix columns
NMFMaxInterm: Max iterations for warm-up multiplicative rules
NMFSparseLevel: Requested sparsity in terms of relative number of rows with 0 values in right hand matrix
NMFRobustResampleColumns: Resample columns during bootstrap
NMFRobustNRuns: Number of bootstrap runs
NMFCalculateLeverage: Calculate leverages
NMFUseRobustLeverage: Calculate leverages based on robust max across factoring columns
NMFFindParts: Enforce convexity on left hand matrix
NMFFindCentroids: Enforce convexity on right hand matrix
NMFKernel: Type of kernel used; 1: linear; 2: quadratic; 3: radial
NMFReweighColumns: Reweigh columns in 2nd step of parts-based NMF
NMFPriors: Priors on right hand matrix
Output
Mt: Left hand matrix
Mw: Right hand matrix
MtPct: Percent robust clustered rows
MwPct: Percent robust clustered columns
diff: Objective minimum achieved
Mh: Convexity matrix
flagNonconvex: Updated non-convexity flag on left hand matrix
def rNTFSolve(M, Mmis, Mt0, Mw0, Mb0, nc, tolerance, precision, LogIter, MaxIterations, NMFFixUserLHE, NMFFixUserRHE, NMFFixUserBHE, NMFAlgo, NMFRobustNRuns, NMFCalculateLeverage, NMFUseRobustLeverage, NTFFastHALS, NTFNIterations, NMFSparseLevel, NTFUnimodal, NTFSmooth, NTFLeftComponents, NTFRightComponents, NTFBlockComponents, NBlocks, NTFNConv, NMFPriors, myStatusBox)
Estimate NTF matrices (robust version)
Input
M: Input matrix
Mmis: Define missing values (0 = missing cell, 1 = real cell)
Mt0: Initial left hand matrix
Mw0: Initial right hand matrix
Mb0: Initial block hand matrix
nc: NTF rank
tolerance: Convergence threshold
precision: Replaces 0-values in multiplicative update rules
LogIter: Log results through iterations
MaxIterations: Max iterations
NMFFixUserLHE: Fix left hand matrix columns: = 1, else = 0
NMFFixUserRHE: Fix right hand matrix columns: = 1, else = 0
NMFFixUserBHE: Fix block hand matrix columns: = 1, else = 0
NMFAlgo: = 5: Non-robust version; = 6: Robust version
NMFRobustNRuns: Number of bootstrap runs
NMFCalculateLeverage: Calculate leverages
NMFUseRobustLeverage: Calculate leverages based on robust max across factoring columns
NTFFastHALS: Use fast HALS (does not handle missing values or convolution)
NTFNIterations: Warm-up iterations for fast HALS
NMFSparseLevel: Sparsity level (as defined by Hoyer); +/- = make RHE/LHE sparse
NTFUnimodal: Apply unimodal constraint on factoring vectors
NTFSmooth: Apply smooth constraint on factoring vectors
NTFLeftComponents: Apply unimodal/smooth constraint on left hand matrix
NTFRightComponents: Apply unimodal/smooth constraint on right hand matrix
NTFBlockComponents: Apply unimodal/smooth constraint on block hand matrix
NBlocks: Number of NTF blocks
NTFNConv: Half-size of the convolution window on the 3rd dimension of the tensor
NMFPriors: Elements in Mw that should be updated (others remain 0)
Output
Mt_conv: Convolutional left hand matrix
Mt: Left hand matrix
Mw: Right hand matrix
Mb: Block hand matrix
MtPct: Percent robust clustered rows
MwPct: Percent robust clustered columns
diff: Objective minimum achieved
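NMFSparseLevel refers to Hoyer's sparseness measure. For a vector x of length n, Hoyer (2004) defines sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), which is 0 for a constant vector and 1 for a vector with a single non-zero entry. A small sketch of the measure itself (not of how rNTFSolve enforces it); the function name is hypothetical.

```python
import numpy as np


def hoyer_sparseness(x):
    """Hoyer's sparseness: 0 for a constant vector, 1 for a single non-zero entry."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    if n < 2:
        raise ValueError("sparseness is undefined for fewer than 2 entries")
    l1, l2 = np.abs(x).sum(), np.sqrt((x ** 2).sum())
    if l2 == 0:
        raise ValueError("sparseness is undefined for the zero vector")
    # Ratio of L1 to L2 norm ranges from 1 (one non-zero) to sqrt(n) (constant)
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)
```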
def rSVDSolve(M, Mmis, nc, tolerance, LogIter, LogTrials, Status0, MaxIterations, SVDAlgo, SVDCoverage, SVDNTrials, myStatusBox)
Estimate SVD matrices (robust version)
Input
M: Input matrix
Mmis: Define missing values (0 = missing cell, 1 = real cell)
nc: SVD rank
tolerance: Convergence threshold
LogIter: Log results through iterations
LogTrials: Log results through trials
Status0: Initial displayed status to be updated during iterations
MaxIterations: Max iterations
SVDAlgo: = 1: Non-robust version; = 2: Robust version
SVDCoverage: Coverage of non-outliers (robust version)
SVDNTrials: Number of trials (robust version)
Output
Mt: Left hand matrix
Mev: Scaling factors
Mw: Right hand matrix
Mmis: Matrix of missing/flagged outliers
Mmsr: Vector of residual SSQ
Mmsr2: Vector of residual variance
Reference
L. Liu et al. (2003). Robust singular value decomposition analysis of microarray data. PNAS, vol. 100, no. 23, November 11, 2003, pp. 13167–13172.
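The robust SVD of Liu et al. fits each component by alternating L1 (least absolute deviation) regressions, and each L1 sub-problem reduces to a weighted median. A minimal rank-one sketch under that reading, with no missing values, coverage or trials handling (further components would be obtained by deflation); the function names and the column-median initialization are assumptions.

```python
import numpy as np


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]


def robust_rank1(X, max_iter=50, tol=1e-9):
    """Hypothetical sketch: rank-one fit X ~ outer(u, v) by alternating L1 regressions."""
    X = np.asarray(X, dtype=float)
    v = np.median(X, axis=0)
    u = np.zeros(X.shape[0])
    for _ in range(max_iter):
        u_old, v_old = u.copy(), v.copy()
        # u_i minimizes sum_j |x_ij - u_i v_j|: weighted median of x_ij / v_j, weights |v_j|
        nz = v != 0
        u = np.array([weighted_median(row[nz] / v[nz], np.abs(v[nz])) for row in X])
        # v_j minimizes sum_i |x_ij - u_i v_j|: weighted median of x_ij / u_i, weights |u_i|
        nz = u != 0
        v = np.array([weighted_median(col[nz] / u[nz], np.abs(u[nz])) for col in X.T])
        if np.allclose(np.outer(u, v), np.outer(u_old, v_old), atol=tol):
            break
    return u, v
```

Because each sub-problem uses a median rather than a mean, a single gross outlier in a row or column does not pull the fitted component toward it, which is the property the robust version relies on.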