Documentation for TopOGraph

topo.models.TopOGraph

Convenient TopOMetry class for building, clustering and visualizing n-order topological graphs.

From data, builds a topologically-oriented basis with optimized diffusion maps or a continuous k-nearest-neighbors Laplacian Eigenmap, and from this basis learns a topological graph (using a new diffusion process or a continuous kNN kernel). Depending on the user setup, this model approximates the Laplace-Beltrami Operator in several different ways. The topological graph can then be visualized in two or three dimensions with Minimum Distortion Embeddings, which also allow for flexible setup and domain adaptation. Alternatively, users can explore multiple classes for graph layout optimization in topo.layout.
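A minimal usage sketch (assuming the class is importable as topo.models.TopOGraph and that scikit-learn is available for toy data; parameter values are illustrative):

    import numpy as np
    from sklearn.datasets import load_digits
    from topo.models import TopOGraph

    # Toy data: 1,797 samples by 64 features
    data = load_digits().data.astype(np.float32)

    tg = TopOGraph(basis='diffusion', graph='diff')  # default setup
    tg.fit(data)                # builds the topological basis
    graph = tg.transform(None)  # learns the topological graph (sparse matrix)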

Parameters


base_knn : int (optional, default 10). Number of k-nearest-neighbors to compute the Diffusor base operator on. The adaptive kernel will normalize distances by each sample's distance to its median neighbor. Nonetheless, this hyperparameter remains a user input setting the minimal sample neighborhood resolution that drives the computation of the diffusion metrics. For practical purposes, it is the minimum number of samples one would expect to constitute a neighborhood of its own. Increasing k can generate more globally comprehensive metrics and maps, up to a certain extent, at the expense of fine-grained resolution. More generally, consider this a calculus discretization threshold.

graph_knn : int (optional, default 10). Number of k-nearest-neighbors to compute the graph operator on. The same considerations described for base_knn apply: the adaptive kernel normalizes distances by each sample's distance to its median neighbor, and this hyperparameter sets the minimal sample neighborhood resolution, trading global comprehensiveness against fine-grained resolution as k increases.

n_eigs : int (optional, default 50). Number of components to compute. This number can be varied to obtain views of the data at distinct spectral resolutions. If basis is set to 'diffusion', this is the number of computed diffusion components. If basis is set to 'continuous', this is the number of computed eigenvectors of the Laplacian Eigenmaps from the continuous affinity matrix.

basis : 'diffusion' or 'continuous' (optional, default 'diffusion'). Which topological basis to build from data. If 'diffusion', performs an optimized, anisotropic, adaptive diffusion mapping (default). If 'continuous', computes affinities from continuous k-nearest-neighbors and builds a topological basis from the Laplacian Eigenmaps of that metric.

graph : 'diff' or 'cknn' (optional, default 'diff'). Which topological graph to learn from the built basis. If 'diff', uses a second-order diffusion process to learn similarities and transition probabilities. If 'cknn', uses the continuous k-nearest-neighbors algorithm. Both approaches learn graph-oriented topological metrics from the learned basis.

ann : bool (optional, default True). Whether to use approximate nearest neighbors for graph construction. If False, uses the scikit-learn default implementation.

base_metric : str (optional, default 'cosine'). Distance metric for building an approximate kNN graph. Defaults to 'cosine'. Users are encouraged to explore different metrics, such as 'cosine' and 'jaccard'. The 'hamming' and 'jaccard' distances are also available for string vectors. Accepted metrics include NMSLib metrics and sklearn metrics. Some examples are:

- 'sqeuclidean'
- 'euclidean'
- 'l1'
- 'lp' (requires setting the parameter ``p``)
- 'cosine'
- 'angular'
- 'negdotprod'
- 'levenshtein'
- 'hamming'
- 'jaccard'
- 'jansen-shan'

graph_metric : str (optional, default 'cosine'). Exactly the same as base_metric, but used for building the topological graph.

p : int or float (optional, default 11/16). P for the Lp metric, when metric='lp'. Can be fractional. The default 11/16 approximates an astroid norm with some computational efficiency (powers of 2 are less painstakingly slow to compute).
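For instance, a fractional Lp norm could be requested as below (a sketch; whether a given fractional p is supported depends on the installed NMSLib backend):

    # Hypothetical setup: fractional Minkowski ('astroid-like') norm
    tg = TopOGraph(base_metric='lp', p=11/16)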

transitions : bool (optional, default False). Whether to decompose the diffusion transitions graph when fitting the diffusion basis. If True, maps a basis encoding neighborhood transition probabilities during eigendecomposition; if False (default), maps the diffusion kernel.

alpha : int or float (optional, default 1). Alpha in the diffusion maps literature. Controls how much the results are biased by data distribution. Defaults to 1, which is suitable for normalized data.

kernel_use : str (optional, default 'decay_adaptive'). Which type of kernel to use in the diffusion approach. Four are implemented, combining adaptive decay and neighborhood expansion: 'simple', 'decay', 'simple_adaptive' and 'decay_adaptive'. The first, 'simple', is a locally-adaptive kernel similar to that proposed by Nadler et al. (https://doi.org/10.1016/j.acha.2005.07.004) and implemented in Setty et al. (https://doi.org/10.1038/s41587-019-0068-4). The 'decay' option applies an adaptive decay rate, but no neighborhood expansion. The options suffixed with '_adaptive' apply the neighborhood expansion process. The default and recommended option is 'decay_adaptive'. The neighborhood expansion can impact runtime, although this is usually not significant for datasets under 10e6 samples.

n_jobs : int. Number of threads to use in calculations. Defaults to all but one.

verbose : bool (optional, default False). Controls verbosity.

cache : bool (optional, default True). Whether to cache nearest-neighbors (before fit) and to store diffusion matrices after mapping (before transform).
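As a sketch, a continuous setup using the parameters documented above might look like:

    # Continuous-kNN basis with a cknn graph learned on top of it
    tg = TopOGraph(basis='continuous', graph='cknn',
                   base_knn=15, graph_knn=15,
                   n_eigs=30, verbose=True)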

fit(self, data)

Learn topological distances with diffusion harmonics and continuous metrics. Computes affinity operators that approximate the Laplace-Beltrami operator.

Parameters

data : High-dimensional data matrix. Currently supports only data of a single type (i.e. all bool or all float).

Returns

TopoGraph instance with several slots, populated as per user settings. If basis=diffusion, populates TopoGraph.MSDiffMap with a multiscale diffusion mapping of data, and TopoGraph.DiffBasis with a fitted topo.tpgraph.diff.Diffusor() class containing diffusion metrics and transition probabilities, respectively stored in TopoGraph.DiffBasis.K and TopoGraph.DiffBasis.T

If basis=continuous, populates TopoGraph.CLapMap with a continuous Laplacian Eigenmapping of data, and TopoGraph.ContBasis with a continuous-k-nearest-neighbors model, containing continuous metrics and adjacency, respectively stored in TopoGraph.ContBasis.K and TopoGraph.ContBasis.A.
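After fitting, the populated slots can be inspected directly; a sketch assuming the default diffusion basis:

    tg = TopOGraph(basis='diffusion').fit(data)
    print(tg.MSDiffMap.shape)  # (n_samples, n_eigs) multiscale diffusion map
    K = tg.DiffBasis.K         # diffusion kernel
    T = tg.DiffBasis.T         # transition probabilities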

Source code in topo/models.py
def fit(self, data):
    """
    Learn topological distances with diffusion harmonics and continuous metrics. Computes affinity operators
    that approximate the Laplace-Beltrami operator.

    Parameters
    ----------
    data :
        High-dimensional data matrix. Currently supports only data of a single type (i.e. all bool or all float).

    Returns
    -------

    TopoGraph instance with several slots, populated as per user settings.
    If `basis=diffusion`, populates `TopoGraph.MSDiffMap` with a multiscale diffusion mapping of data, and
            `TopoGraph.DiffBasis` with a fitted `topo.tpgraph.diff.Diffusor()` class containing diffusion metrics
            and transition probabilities, respectively stored in TopoGraph.DiffBasis.K and TopoGraph.DiffBasis.T

    If `basis=continuous`, populates `TopoGraph.CLapMap` with a continuous Laplacian Eigenmapping of data, and
            `TopoGraph.ContBasis` with a continuous-k-nearest-neighbors model, containing continuous metrics and
            adjacency, respectively stored in `TopoGraph.ContBasis.K` and `TopoGraph.ContBasis.A`.

    """
    self.N = data.shape[0]
    self.M = data.shape[1]
    if self.random_state is None:
        # default to a fresh numpy RandomState (numpy imported as np)
        self.random_state = np.random.RandomState()
    print('Building topological basis...')
    if self.basis == 'diffusion':
        start = time.time()
        self.DiffBasis = Diffusor(n_components=self.n_eigs,
                                  n_neighbors=self.base_knn,
                                  alpha=self.alpha,
                                  n_jobs=self.n_jobs,
                                  ann=self.ann,
                                  metric=self.base_metric,
                                  p=self.p,
                                  M=self.M,
                                  efC=self.efC,
                                  efS=self.efS,
                                  kernel_use=self.kernel_use,
                                  norm=self.norm,
                                  transitions=self.transitions,
                                  eigengap=self.eigengap,
                                  verbose=self.verbose,
                                  plot_spectrum=self.plot_spectrum,
                                  cache=self.cache_base)
        self.MSDiffMap = self.DiffBasis.fit_transform(data)
        end = time.time()
        print('Topological basis fitted with diffusion mappings in %f (sec)' % (end - start))

    elif self.basis == 'continuous':
        start = time.time()
        self.ContBasis = cknn_graph(data,
                                    n_neighbors=self.base_knn,
                                    delta=self.delta,
                                    metric=self.base_metric,
                                    t=self.t,
                                    include_self=True,
                                    is_sparse=True,
                                    return_instance=True)

        self.CLapMap = spt.LapEigenmap(
            self.ContBasis.K,
            self.n_eigs,
            self.random_state,
        )
        # Rescale to a fixed range and add small Gaussian jitter to the eigenmap
        expansion = 10.0 / np.abs(self.CLapMap).max()
        self.CLapMap = (self.CLapMap * expansion).astype(
            np.float32
        ) + self.random_state.normal(
            scale=0.0001, size=[self.ContBasis.K.shape[0], self.n_eigs]
        ).astype(
            np.float32
        )
        end = time.time()
        print('Topological basis fitted with continuous mappings in %f (sec)' % (end - start))

    return self

MAP(self, data, graph, dims=2, min_dist=0.3, spread=1.2, initial_alpha=1, n_epochs=500, metric='cosine', metric_kwds={}, output_metric='euclidean', output_metric_kwds={}, gamma=1.2, negative_sample_rate=10, init='spectral', random_state=None, euclidean_output=True, parallel=True, njobs=-1, verbose=False, densmap=False, densmap_kwds={}, output_dens=False)

""

Manifold Approximation and Projection, as proposed by Leland McInnes with a uniform distribution assumption in the seminal UMAP algorithm. Performs a fuzzy simplicial set embedding, using a specified initialization method and then minimizing the fuzzy set cross-entropy between the 1-skeletons of the high- and low-dimensional fuzzy simplicial sets. The fuzzy simplicial set embedding was proposed and implemented by Leland McInnes in UMAP (see umap-learn <https://github.com/lmcinnes/umap>). Here it is used only for the projection (layout optimization), by minimizing the cross-entropy between a phenotypic map (i.e. data, TopOMetry latent mappings) and its graph topological representation.

Parameters

!!! data "array of shape (n_samples, n_features)" The source data to be embedded by UMAP. !!! graph "sparse matrix" The 1-skeleton of the high dimensional fuzzy simplicial set as represented by a graph for which we require a sparse matrix for the (weighted) adjacency matrix. !!! n_components "int" The dimensionality of the euclidean space into which to embed the data. !!! initial_alpha "float" Initial learning rate for the SGD. !!! a "float" Parameter of differentiable approximation of right adjoint functor !!! b "float" Parameter of differentiable approximation of right adjoint functor !!! gamma "float" Weight to apply to negative samples. !!! negative_sample_rate "int (optional, default 5)" The number of negative samples to select per positive sample in the optimization process. Increasing this value will result in greater repulsive force being applied, greater optimization cost, but slightly more accuracy. !!! n_epochs "int (optional, default 0)" The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If 0 is specified a value will be selected based on the size of the input dataset (200 for large datasets, 500 for small). !!! init "string" How to initialize the low dimensional embedding. Options are: * 'spectral': use a spectral embedding of the fuzzy 1-skeleton * 'random': assign initial embedding positions at random. * A numpy array of initial embedding positions. !!! random_state "numpy RandomState or equivalent" A state capable being used as a numpy random state. !!! metric "string or callable" The metric used to measure distance in high dimensional space; used if multiple connected components need to be layed out. !!! metric_kwds "dict" Key word arguments to be passed to the metric function; used if multiple connected components need to be layed out. !!! densmap "bool" Whether to use the density-augmented objective function to optimize the embedding according to the densMAP algorithm. !!! densmap_kwds "dict" Key word arguments to be used by the densMAP optimization. !!! output_dens "bool" Whether to output local radii in the original data and the embedding. !!! output_metric "function" Function returning the distance between two points in embedding space and the gradient of the distance wrt the first argument. !!! output_metric_kwds "dict" Key word arguments to be passed to the output_metric function. !!! euclidean_output "bool" Whether to use the faster code specialised for euclidean output metrics !!! parallel "bool (optional, default False)" Whether to run the computation using numba parallel. Running in parallel is non-deterministic, and is not used if a random seed has been set, to ensure reproducibility. !!! return_init "bool , (optional, default False)" Whether to also return the multicomponent spectral initialization. !!! verbose "bool (optional, default False)" Whether to report information on the current progress of the algorithm. Returns


!!! embedding "array of shape (n_samples, n_components)" The optimized of graph into an n_components dimensional euclidean space. !!! aux_data "dict" Auxiliary dictionary output returned with the embedding. aux_data['Y_init']: array of shape (n_samples, n_components) The spectral initialization of graph into an n_components dimensional euclidean space.

When densMAP extension is turned on, this dictionary includes local radii in the original
data (``aux_data['rad_orig']``) and in the embedding (``aux_data['rad_emb']``).
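A sketch of a typical call, pairing the learned basis with the graph returned by transform (assuming MAP returns the embedding followed by aux_data, as documented above):

    tg = TopOGraph(basis='diffusion', graph='diff').fit(data)
    graph = tg.transform(None)                      # topological graph (sparse)
    emb, aux = tg.MAP(tg.MSDiffMap, graph, dims=2)  # 2-D layout plus aux_data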
Source code in topo/models.py
def MAP(self, data, graph,
        dims=2,
        min_dist=0.3,
        spread=1.2,
        initial_alpha=1,
        n_epochs=500,
        metric='cosine',
        metric_kwds={},
        output_metric='euclidean',
        output_metric_kwds={},
        gamma=1.2,
        negative_sample_rate=10,
        init='spectral',
        random_state=None,
        euclidean_output=True,
        parallel=True,
        njobs=-1,
        verbose=False,
        densmap=False,
        densmap_kwds={},
        output_dens=False,
        ):
    """""

    Manifold Approximation and Projection, as proposed by Leland McInnes with a uniform distribution assumption in
    the seminal [UMAP algorithm](https://umap-learn.readthedocs.io/en/latest/index.html). Perform a fuzzy simplicial set embedding, using a
    specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high
    and low dimensional fuzzy simplicial sets. The fuzzy simplicial set embedding was proposed and implemented by
    Leland McInnes in UMAP (see `umap-learn <https://github.com/lmcinnes/umap>`). Here we're using it only for the
    projection (layout optimization) by minimizing the cross-entropy between a phenotypic map (i.e. data, TopOMetry latent mappings)
    and its graph topological representation.


    Parameters
    ----------
    data: array of shape (n_samples, n_features)
        The source data to be embedded by UMAP.
    graph: sparse matrix
        The 1-skeleton of the high dimensional fuzzy simplicial set as
        represented by a graph for which we require a sparse matrix for the
        (weighted) adjacency matrix.
    n_components: int
        The dimensionality of the euclidean space into which to embed the data.
    initial_alpha: float
        Initial learning rate for the SGD.
    a: float
        Parameter of differentiable approximation of right adjoint functor
    b: float
        Parameter of differentiable approximation of right adjoint functor
    gamma: float
        Weight to apply to negative samples.
    negative_sample_rate: int (optional, default 10)
        The number of negative samples to select per positive sample
        in the optimization process. Increasing this value will result
        in greater repulsive force being applied, greater optimization
        cost, but slightly more accuracy.
    n_epochs: int (optional, default 500)
        The number of training epochs to be used in optimizing the
        low dimensional embedding. Larger values result in more accurate
        embeddings. If 0 is specified a value will be selected based on
        the size of the input dataset (200 for large datasets, 500 for small).
    init: string
        How to initialize the low dimensional embedding. Options are:
            * 'spectral': use a spectral embedding of the fuzzy 1-skeleton
            * 'random': assign initial embedding positions at random.
            * A numpy array of initial embedding positions.
    random_state: numpy RandomState or equivalent
        A state capable of being used as a numpy random state.
    metric: string or callable
        The metric used to measure distance in high dimensional space; used if
        multiple connected components need to be laid out.
    metric_kwds: dict
        Key word arguments to be passed to the metric function; used if
        multiple connected components need to be laid out.
    densmap: bool
        Whether to use the density-augmented objective function to optimize
        the embedding according to the densMAP algorithm.
    densmap_kwds: dict
        Key word arguments to be used by the densMAP optimization.
    output_dens: bool
        Whether to output local radii in the original data and the embedding.
    output_metric: function
        Function returning the distance between two points in embedding space and
        the gradient of the distance wrt the first argument.
    output_metric_kwds: dict
        Key word arguments to be passed to the output_metric function.
    euclidean_output: bool
        Whether to use the faster code specialised for euclidean output metrics
    parallel: bool (optional, default True)
        Whether to run the computation using numba parallel.
        Running in parallel is non-deterministic, and is not used
        if a random seed has been set, to ensure reproducibility.
    return_init: bool , (optional, default False)
        Whether to also return the multicomponent spectral initialization.
    verbose: bool (optional, default False)
        Whether to report information on the current progress of the algorithm.
    Returns
    -------
    embedding: array of shape (n_samples, n_components)
        The optimized embedding of ``graph`` into an ``n_components`` dimensional
        euclidean space.
    aux_data: dict
        Auxiliary dictionary output returned with the embedding.
        ``aux_data['Y_init']``: array of shape (n_samples, n_components)
            The spectral initialization of ``graph`` into an ``n_components`` dimensional
            euclidean space.

        When densMAP extension is turned on, this dictionary includes local radii in the original
        data (``aux_data['rad_orig']``) and in the embedding (``aux_data['rad_emb']``).


    """""

    start = time.time()
    results = uni.fuzzy_embedding(data, graph,
                                  n_components=dims,
                                  initial_alpha=initial_alpha,
                                  min_dist=min_dist,
                                  spread=spread,
                                  n_epochs=n_epochs,
                                  metric=metric,
                                  metric_kwds=metric_kwds,
                                  output_metric=output_metric,
                                  output_metric_kwds=output_metric_kwds,
                                  gamma=gamma,
                                  negative_sample_rate=negative_sample_rate,
                                  init=init,
                                  random_state=random_state,
                                  euclidean_output=euclidean_output,
                                  parallel=parallel,
                                  njobs=njobs,
                                  verbose=verbose,
                                  a=None,
                                  b=None,
                                  densmap=densmap,
                                  densmap_kwds=densmap_kwds,
                                  output_dens=output_dens)

    end = time.time()
    print('Fuzzy layout optimization embedding in = %f (sec)' % (end - start))
    return results

MDE(self, target, data=None, dim=2, n_neighbors=None, type='isomorphic', constraint='standardized', init='quadratic', attractive_penalty=penalties.Log1p, repulsive_penalty=penalties.Log, loss=losses.Absolute, repulsive_fraction=None, max_distance=None, device='cpu', verbose=False)

This function constructs an MDE problem for preserving the structure of the original data. This MDE problem is well-suited for visualization (using dim 2 or 3), but can also be used to generate features for machine learning tasks (with dim = 10, 50, or 100, for example). It yields embeddings in which similar items are near each other, and dissimilar items are not near each other. The original data can either be a data matrix or a graph. Data matrices should be torch Tensors, NumPy arrays, or scipy sparse matrices; graphs should be instances of pymde.Graph. The MDE problem uses distortion functions derived from weights (i.e., penalties). To obtain an embedding, call the embed method on the returned MDE object. To plot it, use pymde.plot.

Parameters

data : torch.Tensor, numpy.ndarray, scipy.sparse matrix or pymde.Graph. The original data, a data matrix of shape (n_items, n_features) or a graph. Neighbors are computed using Euclidean distance if the data is a matrix, or the shortest-path metric if the data is a graph.

dim : int. The embedding dimension. Use 2 or 3 for visualization.

attractive_penalty : pymde.Function class (or factory). Callable that constructs a distortion function, given positive weights. Typically one of the classes from pymde.penalties, such as pymde.penalties.Log1p, pymde.penalties.Huber, or pymde.penalties.Quadratic.

repulsive_penalty : pymde.Function class (or factory). Callable that constructs a distortion function, given negative weights. (If None, only positive weights are used.) For example, pymde.penalties.Log or pymde.penalties.InversePower.

constraint : str (optional, default 'standardized'). Constraint to use when optimizing the embedding. Options are 'standardized', 'centered', None or a pymde.constraints.Constraint() function.

n_neighbors : int (optional). The number of nearest neighbors to compute for each row (item) of data. A sensible value is chosen by default, depending on the number of items.

repulsive_fraction : float (optional). How many repulsive edges to include, relative to the number of attractive edges. 1 means as many repulsive edges as attractive edges. The higher this number, the more uniformly spread out the embedding will be. Defaults to 0.5 for standardized embeddings, and 1 otherwise. (If repulsive_penalty is None, this argument is ignored.)

max_distance : float (optional). If not None, neighborhoods are restricted to have a radius no greater than max_distance.

init : str or np.ndarray (optional, default 'quadratic'). Initialization strategy; np.ndarray, 'quadratic' or 'random'.

device : str (optional). Device for the embedding (e.g., 'cpu', 'cuda').

verbose : bool. If True, print verbose output.

Returns

pymde.MDE. A pymde.MDE problem object, based on the original data.
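A sketch of typical use, following the advice above to call embed on the returned MDE problem (parameter values are illustrative):

    tg = TopOGraph(basis='diffusion', graph='diff').fit(data)
    graph = tg.transform(None)
    mde_problem = tg.MDE(graph, dim=2)  # build the MDE problem from the graph
    embedding = mde_problem.embed()     # optimize and return the coordinates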

Source code in topo/models.py
def MDE(self, target, data=None,
        dim=2,
        n_neighbors=None,
        type='isomorphic',
        constraint='standardized',
        init='quadratic',
        attractive_penalty=penalties.Log1p,
        repulsive_penalty=penalties.Log,
        loss=losses.Absolute,
        repulsive_fraction=None,
        max_distance=None,
        device='cpu',
        verbose=False
        ):
    """
    This function constructs an MDE problem for preserving the
    structure of original data. This MDE problem is well-suited for
    visualization (using ``dim`` 2 or 3), but can also be used to
    generate features for machine learning tasks (with ``dim`` = 10,
    50, or 100, for example). It yields embeddings in which similar items
    are near each other, and dissimilar items are not near each other.
    The original data can either be a data matrix, or a graph.
    Data matrices should be torch Tensors, NumPy arrays, or scipy sparse
    matrices; graphs should be instances of ``pymde.Graph``.
    The MDE problem uses distortion functions derived from weights (i.e.,
    penalties).
    To obtain an embedding, call the ``embed`` method on the returned ``MDE``
    object. To plot it, use ``pymde.plot``.




    Parameters
    ----------
    data : torch.Tensor, numpy.ndarray, scipy.sparse matrix or pymde.Graph.
        The original data, a data matrix of shape ``(n_items, n_features)`` or
        a graph. Neighbors are computed using Euclidean distance if the data is
        a matrix, or the shortest-path metric if the data is a graph.
    dim : int.
        The embedding dimension. Use 2 or 3 for visualization.
    attractive_penalty : pymde.Function class (or factory).
        Callable that constructs a distortion function, given positive
        weights. Typically one of the classes from ``pymde.penalties``,
        such as ``pymde.penalties.Log1p``, ``pymde.penalties.Huber``, or
        ``pymde.penalties.Quadratic``.
    repulsive_penalty : pymde.Function class (or factory).
        Callable that constructs a distortion function, given negative
        weights. (If ``None``, only positive weights are used.) For example,
        ``pymde.penalties.Log`` or ``pymde.penalties.InversePower``.
    constraint : str (optional), default 'standardized'.
        Constraint to use when optimizing the embedding. Options are 'standardized',
        'centered', `None` or a `pymde.constraints.Constraint()` function.
    n_neighbors : int (optional)
        The number of nearest neighbors to compute for each row (item) of
        ``data``. A sensible value is chosen by default, depending on the
        number of items.
    repulsive_fraction : float (optional)
        How many repulsive edges to include, relative to the number
        of attractive edges. ``1`` means as many repulsive edges as attractive
        edges. The higher this number, the more uniformly spread out the
        embedding will be. Defaults to ``0.5`` for standardized embeddings, and
        ``1`` otherwise. (If ``repulsive_penalty`` is ``None``, this argument
        is ignored.)
    max_distance : float (optional)
        If not None, neighborhoods are restricted to have a radius
        no greater than ``max_distance``.
    init : str or np.ndarray (optional, default 'quadratic')
        Initialization strategy; np.ndarray, 'quadratic' or 'random'.
    device : str (optional)
        Device for the embedding (eg, 'cpu', 'cuda').
    verbose : bool
        If ``True``, print verbose output.

    Returns
    -------
    pymde.MDE
        A ``pymde.MDE`` problem object, based on the original data.
    """
    graph = Graph(target)

    if init == 'spectral':
        if data is None:
            print('Spectral initialization requires input data as argument. Falling back to quadratic...')
            init = 'quadratic'
        else:
            # spectral_layout expects (data, target, dim); pass the target graph through
            init = self.spectral_layout(data, target, dim)
    if constraint == 'standardized':
        constraint_use = constraints.Standardized()
    elif constraint == 'centered':
        constraint_use = constraints.Centered()
    elif isinstance(constraint, constraints.Constraint):  # check against the class, not an instance
        constraint_use = constraint
    else:
        constraint_use = None
    if type == 'isomorphic':
        emb = mde.IsomorphicMDE(graph,
                                attractive_penalty=attractive_penalty,
                                repulsive_penalty=repulsive_penalty,
                                embedding_dim=dim,
                                constraint=constraint_use,
                                n_neighbors=n_neighbors,
                                repulsive_fraction=repulsive_fraction,
                                max_distance=max_distance,
                                init=init,
                                device=device,
                                verbose=verbose)

    elif type == 'isometric':
        if max_distance is None:
            max_distance = 5e7
        emb = mde.IsometricMDE(graph,
                               embedding_dim=dim,
                               loss=loss,
                               constraint=constraint_use,
                               max_distances=max_distance,
                               device=device,
                               verbose=verbose
                               )

    else:
        raise ValueError("`type` must be either 'isomorphic' or 'isometric'")
    return emb  # the pymde.MDE problem; call .embed() to optimize

spectral_layout(self, data, target, dim=2)

Performs a multicomponent spectral layout of the data and the target similarity matrix.

Parameters

data : input data.

target : scipy.sparse.csr.csr_matrix. Target similarity matrix.

dim : int (optional, default 2). Number of dimensions to embed into.

Returns

np.ndarray containing the resulting embedding.
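A brief sketch (note that, in the source shown below, the layout is derived from the cached basis operators, so target may be passed as None):

    tg = TopOGraph(basis='diffusion').fit(data)
    init = tg.spectral_layout(data, target=None, dim=2)  # (n_samples, 2) layout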

Source code in topo/models.py
def spectral_layout(self, data, target, dim=2):
    """

    Performs a multicomponent spectral layout of the data and the target similarity matrix.

    Parameters
    ----------
    data :
        input data
    target : scipy.sparse.csr.csr_matrix.
        target similarity matrix.
    dim : int (optional, default 2)
        number of dimensions to embed into.

    Returns
    -------
    np.ndarray containing the resulting embedding.

    """



    if self.basis == 'diffusion':
        spt_layout = spt.spectral_layout(
            data,
            self.DiffBasis.T,
            dim,
            self.random_state,
            metric="precomputed",
        )
        expansion = 10.0 / np.abs(spt_layout).max()
        spt_layout = (spt_layout * expansion).astype(
            np.float32
        ) + self.random_state.normal(
            scale=0.0001, size=[self.DiffBasis.T.shape[0], dim]
        ).astype(
            np.float32
        )
    elif self.basis == 'continuous':
        spt_layout = spt.LapEigenmap(
            self.ContBasis.K,
            dim,
            self.random_state,
            metric="precomputed",
        )
        expansion = 10.0 / np.abs(spt_layout).max()
        spt_layout = (spt_layout * expansion).astype(
            np.float32
        ) + self.random_state.normal(
            scale=0.0001, size=[self.ContBasis.K.shape[0], dim]
        ).astype(
            np.float32
        )
    return spt_layout

transform(self, base)

Learns new affinity and topological operators from the chosen basis.

Parameters

self : TopOGraph instance.

base : str, optional. Basis to use when building the topological graph. Defaults to the active basis (TopOGraph.basis).

Returns

scipy.sparse.csr.csr_matrix, containing the similarity matrix that encodes the topological graph.
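A sketch of typical use after fit (when cache_graph is enabled the result is also stored on the instance):

    tg = TopOGraph(basis='diffusion', graph='diff').fit(data)
    graph = tg.transform('diffusion')  # csr_matrix encoding the topological graph
    # with graph='diff' and caching enabled, also available as tg.DiffGraph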

Source code in topo/models.py
def transform(self, base):
    """
    Learns new affinity and topological operators from the chosen basis.

    Parameters
    ----------
    self :
        TopOGraph instance.

    base : str, optional.
        Basis to use when building the topological graph. Defaults to the active basis (`TopOGraph.basis`).


    Returns
    -------
    scipy.sparse.csr.csr_matrix, containing the similarity matrix that encodes the topological graph.

    """
    if base is not None:
        self.basis = base
    print('Building topological graph...')
    start = time.time()
    if self.basis == 'continuous':
        use_basis = self.CLapMap
    elif self.basis == 'diffusion':
        use_basis = self.MSDiffMap
    else:
        raise ValueError("`basis` must be either 'diffusion' or 'continuous'")
    if self.graph == 'diff':
        DiffGraph = Diffusor(n_neighbors=self.graph_knn,
                             alpha=self.alpha,
                             n_jobs=self.n_jobs,
                             ann=self.ann,
                             metric=self.graph_metric,
                             p=self.p,
                             M=self.M,
                             efC=self.efC,
                             efS=self.efS,
                             kernel_use='simple',
                             norm=self.norm,
                             transitions=self.transitions,
                             eigengap=self.eigengap,
                             verbose=self.verbose,
                             plot_spectrum=self.plot_spectrum,
                             cache=False
                             ).fit(use_basis)
        if self.cache_graph:
            self.DiffGraph = DiffGraph.T
    if self.graph == 'cknn':
        CknnGraph = cknn_graph(use_basis,
                               n_neighbors=self.graph_knn,
                               delta=self.delta,
                               metric=self.graph_metric,
                               t=self.t,
                               include_self=True,
                               is_sparse=True)
        if self.cache_graph:
            self.CknnGraph = CknnGraph
    end = time.time()
    print('Topological graph extracted in = %f (sec)' % (end - start))
    if self.graph == 'diff':
        return DiffGraph.T
    elif self.graph == 'cknn':
        return CknnGraph
    else:
        return self