sobolev_alignment package#
Submodules#
sobolev_alignment.data_normalisation module#
sobolev_alignment.feature_analysis module#
FEATURE_ANALYSIS
- sobolev_alignment.feature_analysis.basis(x, k, gamma)#
-
Compute the basis function for a single gene, excluding the offset term.
- Parameters
-
- x: np.array
-
Column vector (each row corresponds to a sample).
- k: int
-
Order to compute.
- gamma: float
-
Parameter of Matérn kernel.
- Returns
-
- np.array
-
Value of the higher order feature.
- sobolev_alignment.feature_analysis.combinatorial_product(x, idx, gamma)#
-
Compute the product of basis functions for a combination of genes, excluding the offset term.
- Parameters
-
- x: np.array
-
Data matrix with samples in the rows and genes in the columns.
- idx: tuple
-
Combination, i.e., the tuple of features to take into account.
- gamma: float
-
Parameter of Matérn kernel.
- Returns
-
- scipy.sparse.csc_matrix
-
Values of the higher order feature.
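Example (hedged usage sketch of the two helpers above; treating idx as a tuple of column indices and passing dense NumPy inputs are assumptions for illustration, not confirmed by the signatures beyond np.array):

import numpy as np
from sobolev_alignment.feature_analysis import basis, combinatorial_product

gamma = 0.1
x = np.random.rand(20, 1)                 # one gene: column vector, 20 samples
order2 = basis(x, k=2, gamma=gamma)       # order-2 basis feature for this gene

data = np.random.rand(20, 4)              # 4 genes
# Hypothetical interaction feature combining columns 0 and 2.
interaction = combinatorial_product(data, idx=(0, 2), gamma=gamma)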
- sobolev_alignment.feature_analysis.higher_order_contribution(d: int, data: numpy.array, sample_offset: numpy.array, gene_names: list, gamma: float, n_jobs: int = 1, return_matrix: bool = False)#
-
Compute the features corresponding to the Taylor expansion of the kernel.
Compute the features corresponding to the Taylor expansion of the kernel, i.e. $x_j \exp(-\gamma x x^T)$ for linear features. Returns a sparse pandas DataFrame containing all the features (columns) by samples (rows). We here critically rely on the sparsity of the data matrix to speed up computations. The current implementation is relevant in two cases: when dimensionality is small, and when the data is sparse.
High-dimensional and dense data matrices would lead to a significant overhead without computational gains, and could benefit from another implementation strategy.
- Parameters
-
- d: int
-
Order of the features to compute, e.g. 1 for linear, 2 for interaction terms.
- data: np.array
-
Data to compute features on, samples in the rows and genes (features) in the columns.
- sample_offset: np.array
-
Offset of each sample from data.
- gene_names: list
-
Name of each column in data; corresponds to feature naming.
- gamma: float
-
Value of the gamma parameter for Matérn kernel.
- n_jobs: int, default to 1
-
Number of concurrent threads to use. -1 will use all available CPU cores. WARNING: for d >= 2 and a large number of genes, the routine can be memory-intensive and a high n_jobs could lead to a crash.
- return_matrix: bool, default to False
-
If True, returns only the feature matrix, without feature names. When feature names are not relevant (e.g. computing the proportion of non-linearities), return_matrix=True can help speed up the process.
- Returns
-
- pd.DataFrame
-
Sparse DataFrame with samples in the rows and named features in the columns. For instance, when d=1, returns each column of data scaled by the RKHS normalisation factor and multiplied by the offset value.
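Example (hedged usage sketch; argument names follow the signature above, but the way sample_offset is computed below, a Gaussian-kernel offset exp(-gamma * ||x||^2), is an assumption for illustration only):

import numpy as np
from sobolev_alignment.feature_analysis import higher_order_contribution

gamma = 0.1
data = np.random.rand(50, 5)                         # 50 samples, 5 genes
gene_names = [f"gene_{i}" for i in range(5)]
# Assumed offset term of the kernel expansion: exp(-gamma * ||x||^2) per sample.
sample_offset = np.exp(-gamma * np.square(data).sum(axis=1))

linear_features = higher_order_contribution(
    d=1,
    data=data,
    sample_offset=sample_offset,
    gene_names=gene_names,
    gamma=gamma,
    n_jobs=1,
)
# Sparse DataFrame: one row per sample, one named column per linear feature.
print(linear_features.shape)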
sobolev_alignment.generate_artificial_sample module#
GENERATE ARTIFICIAL SAMPLE
@author: Soufiane Mourragui
Generate samples using the scVI decoder from multivariate Gaussian noise. This module generates the training data used to approximate the VAE encoding functions by Matérn kernel machines.
- sobolev_alignment.generate_artificial_sample.generate_samples(sample_size: int, batch_names: list, covariates_values: list, lib_size: dict, model: scvi.model._scvi.SCVI, batch_key_dict: dict, return_dist: bool = False)#
-
Generates artificial gene expression profiles.
Note to developers: this method has been designed to be used with scvi-tools classes. Other VAE implementations may break here.
- Parameters
-
- sample_size: int
-
Number of samples to generate.
- batch_names: list or np.ndarray, default to None
-
List or array with sample_size str values indicating the batch of each sample.
- covariates_values: list or np.ndarray, default to None
-
List or array with sample_size float values indicating the covariate values of each sample to generate (as for training scVI model).
- lib_size
-
Dictionary of mean library size per batch.
- model
-
scVI model whose decoder is used to generate samples.
- batch_key_dict
-
Dictionary linking the values of the batch (scVI) and the key used in scVI.
- return_dist: bool, default to False
-
Whether to return the distribution parameters (True) or samples from this distribution (False).
- Returns
-
- If return_dist is False: torch.Tensor (on CPU) with artificial samples in the rows.
- If return_dist is True: one torch.Tensor with distribution parameters (following scVI order) and one torch.Tensor with artificial samples in the rows (on CPU).
- sobolev_alignment.generate_artificial_sample.parallel_generate_samples(sample_size, batch_names, covariates_values, lib_size, model, batch_key_dict: Optional[dict] = None, return_dist: bool = False, batch_size=1000, n_jobs=1)#
-
Generates artificial gene expression profiles.
Wrapper parallelizing generate_samples, running several threads in parallel. Note to developers: this function needs to be changed if applied to VAE models other than scVI.
- Parameters
-
- sample_size: int
-
Number of samples to generate.
- batch_names: list or np.ndarray, default to None
-
List or array with sample_size str values indicating the batch of each sample.
- covariates_values: list or np.ndarray, default to None
-
List or array with sample_size float values indicating the covariate values of each sample to generate (as for training scVI model).
- lib_size
-
Dictionary of mean library size per batch.
- model
-
scVI model whose decoder is used to generate samples.
- batch_key_dict
-
Dictionary linking the values of the batch (scVI) and the key used in scVI.
- return_dist: bool, default to False
-
Whether to return the distribution parameters (True) or samples from this distribution (False).
- batch_size: int, default to 10**3
-
Number of samples to generate per batch.
- n_jobs: int, default to 1
-
Number of threads to launch. n_jobs=-1 will launch as many threads as there are CPUs available.
- Returns
-
- If return_dist is False: torch.Tensor (on CPU) with artificial samples in the rows.
- If return_dist is True: one torch.Tensor with distribution parameters (following scVI order) and one torch.Tensor with artificial samples in the rows (on CPU).
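Example (hedged usage sketch; the model path, batch names, library sizes and batch_key_dict mapping are placeholder assumptions, only the argument names come from the signature above):

import numpy as np
import scvi
from sobolev_alignment.generate_artificial_sample import parallel_generate_samples

model = scvi.model.SCVI.load("path/to/trained_scvi_model")      # hypothetical path
batch_names = np.random.choice(["batch_A", "batch_B"], size=10_000)
lib_size = {"batch_A": 10_000.0, "batch_B": 12_000.0}           # mean library size per batch
batch_key_dict = {"batch_A": 0, "batch_B": 1}                   # assumed batch-to-key mapping

artificial_samples = parallel_generate_samples(
    sample_size=10_000,
    batch_names=batch_names,
    covariates_values=None,
    lib_size=lib_size,
    model=model,
    batch_key_dict=batch_key_dict,
    batch_size=1000,
    n_jobs=4,
)
# torch.Tensor on CPU with one artificial expression profile per row.
print(artificial_samples.shape)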
sobolev_alignment.interpolated_features module#
- sobolev_alignment.interpolated_features.compute_optimal_tau(PV_number, pv_projections, principal_angles, n_interpolation=100)#
-
Compute the optimal interpolation step for each PV (Grassmann interpolation).
- sobolev_alignment.interpolated_features.project_on_interpolate_PV(angle, PV_number, tau_step, pv_projections)#
-
Project data on interpolated PVs.
sobolev_alignment.kernel_operations module#
- sobolev_alignment.kernel_operations.mat_inv_sqrt(M, threshold=1e-06)#
-
Compute the inverse square root of a symmetric matrix M by SVD.
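A minimal sketch of what an SVD-based inverse square root can look like for a symmetric positive semi-definite matrix; the thresholding of small singular values is an assumption about how the threshold argument is used:

import numpy as np

def mat_inv_sqrt_sketch(M, threshold=1e-06):
    """Sketch: inverse square root of a symmetric PSD matrix via SVD."""
    U, s, Vt = np.linalg.svd(M)
    # Drop near-zero singular values to keep the pseudo-inverse numerically stable.
    s_inv_sqrt = np.where(s > threshold, 1.0 / np.sqrt(s), 0.0)
    return U @ np.diag(s_inv_sqrt) @ Vt

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = mat_inv_sqrt_sketch(A)
print(np.round(B @ A @ B, 6))   # close to the identity for a well-conditioned A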
sobolev_alignment.krr_approx module#
Encoder approximation by Kernel Ridge Regression
@author: Soufiane Mourragui
This module trains a Kernel Ridge Regression (KRR) model on a pair of samples (x_hat) and embeddings (z_hat) using two possible implementations: scikit-learn (deterministic, but limited in memory and time efficiency) and Falkon (stochastic Nyström approximation, faster both in memory and computation time, optimised for multiple GPUs).
References
Mourragui et al, Identifying commonalities between cell lines and tumors at the single cell level using Sobolev Alignment of deep generative models, BioRxiv, 2022.
Meanti et al, Kernel methods through the roof: handling billions of points efficiently, NeurIPS, 2020.
Pedregosa et al, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 2011.
- class sobolev_alignment.krr_approx.KRRApprox(method: str = 'sklearn', kernel: str = 'rbf', M: int = 100, kernel_params: Optional[dict] = None, penalization: float = 1e-06, maxiter: int = 20, falkon_options: Optional[dict] = None, mean_center: bool = False, unit_std: bool = False)#
-
Bases:
object
Kernel Ridge Regression approximation.
This class contains the functions used to approximate the encoding functions of a Variational Auto-Encoder (VAE) by kernel machines by means of Kernel Ridge Regression (KRR). This class takes training data as input and executes the learning process. The generation of artificial samples and the subsequent computation of embeddings are not part of this class.
Methods
anchors(): Return anchor points used in KRR.
fit(X, y): Train a regression model (KRR) between X and all columns of Y.
load(): Load a KRRApprox instance.
save([folder]): Save the instance.
transform(X): Apply the trained KRR models to a given data.
- anchors()#
-
Return anchor points used in KRR.
- fit(X: torch.Tensor, y: torch.Tensor)#
-
Train a regression model (KRR) between X and all columns of Y.
- Parameters
-
- X: torch.Tensor
-
Tensor containing the artificial input (x_hat), with samples in the rows.
- y: torch.Tensor
-
Tensor containing the artificial embedding (z_hat). Called y for compliance with sklearn functions.
- Returns
-
- self: fitted KRRApprox instance.
- load()#
-
Load a KRRApprox instance.
- Parameters
-
- folder: str, default to ‘.’
-
Folder path where the instance is located.
- Returns
-
- KRRApprox: instance saved at the folder location.
- save(folder: str = '.')#
-
Save the instance.
- Parameters
-
- folder: str, default to ‘.’
-
Folder path to use for saving the instance.
- Returns
-
- True if the instance was properly saved.
- transform(X: torch.Tensor)#
-
Apply the trained KRR models to given data. This corresponds to the out-of-sample extension.
- Parameters
-
- X: torch.Tensor
-
Tensor containing gene expression profiles, with samples in the rows. WARNING: genes (features) must follow the same order as in the training data.
- Returns
-
- torch.Tensor with predicted values for each of the encoding functions. Samples are in the rows and encoding functions (embedding) in the columns.
- default_kernel_params = {'falkon': {'gaussian': {'sigma': 1}, 'laplacian': {'sigma': 1}, 'matern': {'nu': 0.5, 'sigma': 1}, 'rbf': {'sigma': 1}}, 'sklearn': {'gaussian': {}, 'laplacian': {}, 'matern': {}, 'rbf': {}}}#
- falkon_kernel = {'gaussian': <class 'falkon.kernels.distance_kernel.GaussianKernel'>, 'laplacian': <class 'falkon.kernels.distance_kernel.LaplacianKernel'>, 'matern': <class 'falkon.kernels.distance_kernel.MaternKernel'>, 'rbf': <class 'falkon.kernels.distance_kernel.GaussianKernel'>}#
- sklearn_kernel = {'gaussian': 'wrapper', 'laplacian': 'wrapper', 'matern': <class 'sklearn.gaussian_process.kernels.Matern'>, 'rbf': 'wrapper'}#
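Example (hedged usage sketch of KRRApprox with the scikit-learn backend; the tensor shapes are arbitrary and the kernel_params keys follow scikit-learn's Matern kernel, which is an assumption about how they are forwarded):

import torch
from sobolev_alignment.krr_approx import KRRApprox

x_hat = torch.randn(500, 2000)      # artificial gene expression profiles
z_hat = torch.randn(500, 10)        # corresponding scVI embeddings

krr = KRRApprox(
    method="sklearn",
    kernel="matern",
    kernel_params={"length_scale": 1.0, "nu": 0.5},   # assumed parameter names
    penalization=1e-06,
)
krr.fit(x_hat, z_hat)

# Out-of-sample extension: predicted embeddings for new expression profiles,
# with samples in the rows and encoding functions in the columns.
z_pred = krr.transform(torch.randn(100, 2000))
print(z_pred.shape)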
sobolev_alignment.krr_model_selection module#
Kernel Ridge Regression (KRR) model search
@author: Soufiane Mourragui
Pipeline to perform model selection for the Kernel Ridge Regression (KRR) models, employing the protocol presented in the paper: selecting sigma as the value yielding an average of 0.5 for the Gaussian kernel, and selecting the model with the lowest training error on input data (models being trained on artificial data).
- sobolev_alignment.krr_model_selection.model_alignment_penalization(X_data: anndata._core.anndata.AnnData, data_source: str, sobolev_alignment_clf, sigma: float, optimal_nu: float, M: int = 250)#
-
Penalization selection given $\sigma$ and $\nu$.
Select the optimal penalization parameter given $\sigma$ and $\nu$ by aligning the data_source model to itself and measuring the principal angles. Intuitively, aligning a model to itself must yield high principal angles; low values indicate over-fitting of the KRR.
- Parameters
-
- X_data: AnnData
-
Dataset to employ.
- data_source: str, ‘source’ or ‘target’
-
Name of the data stream in SobolevAlignment parameters.
- sobolev_alignment_clf: SobolevAlignment
-
SobolevAlignment instance with scVI models trained. Used to find the optimal $\nu$ parameter for the KRR regression step.
- sigma: float
-
$\sigma$ parameter in KRR.
- optimal_nu: float
-
Value of $\nu$ (Falkon) to be used in the optimization. Can be established using model_selection_nu.
- M: int, default to 250
-
Number of anchor points to use in the KRR approximation. A larger M typically improves the prediction, but at the cost of longer compute time and higher memory usage.
- Returns
-
- DataFrame with principal angles between the same models.
- sobolev_alignment.krr_model_selection.model_selection_nu(X_source: anndata._core.anndata.AnnData, X_target: anndata._core.anndata.AnnData, sobolev_alignment_clf, sigma: float, M: int = 250, test_error_size: int = -1)#
-
Select the optimal $\nu$ parameter.
Select the optimal $\nu$ parameter (Matérn kernel) by measuring the Spearman correlation for different values of $\nu$ and penalization, and selecting the $\nu$ with the highest correlation.
- Parameters
-
- X_source: AnnData
-
Source dataset.
- X_target: AnnData
-
Target dataset.
- sobolev_alignment_clf: SobolevAlignment
-
SobolevAlignment instance with scVI models trained. Used to find the optimal $\nu$ parameter for the KRR regression step.
- sigma: float
-
$\sigma$ parameter in KRR.
- M: int, default to 250
-
Number of anchor points to use in the KRR approximation. A larger M typically improves the prediction, but at the cost of longer compute time and higher memory usage.
- test_error_size: float, default to -1
-
Number of input points to be considered when computing the error. The input data (X_source and X_target) are not used to train the KRR (artificial points are) and act as a proxy for a validation set. Setting test_error_size=-1 leads to using the complete input data.
- Returns
-
- DataFrame with Spearman correlations on source and target data for various hyper-parameter values.
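Example (hedged sketch of the selection protocol described above, chaining model_selection_nu and model_alignment_penalization; the file paths, sigma, nu and the way the SobolevAlignment instance is prepared are illustrative assumptions):

import anndata
from sobolev_alignment.sobolev_alignment import SobolevAlignment
from sobolev_alignment.krr_model_selection import (
    model_alignment_penalization,
    model_selection_nu,
)

X_source = anndata.read_h5ad("cell_lines.h5ad")   # hypothetical files
X_target = anndata.read_h5ad("tumors.h5ad")

# Assumed preparation: an instance whose scVI models are already trained.
clf = SobolevAlignment()
clf.fit(X_source, X_target, fit_vae=True, krr_approx=False)

# Select the Matérn nu with the highest Spearman correlation for a given sigma.
nu_df = model_selection_nu(
    X_source=X_source, X_target=X_target,
    sobolev_alignment_clf=clf, sigma=5.0, M=250,
)

# Check the penalization by aligning the source model to itself;
# low principal angles would indicate over-fitting of the KRR.
angles_df = model_alignment_penalization(
    X_data=X_source, data_source="source",
    sobolev_alignment_clf=clf, sigma=5.0, optimal_nu=0.5, M=250,
)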
sobolev_alignment.multi_krr_approx module#
- class sobolev_alignment.multi_krr_approx.MultiKRRApprox#
-
Bases:
object
Multi Kernel Ridge Regression approximation.
This class contains a wrapper around KRRApprox to serialise the approximation of latent factors. Several experiments show that such an approach does not yield any advantage.
Methods
add_clf(clf): Add a classifier.
anchors(): Return anchors.
predict(X): Predict latent factor values given a tensor.
process_clfs(): Process the different classifiers.
transform(X): Predict latent factor values given a tensor.
- add_clf(clf)#
-
Add a classifier.
- anchors()#
-
Return anchors.
- predict(X: torch.Tensor)#
-
Predict latent factor values given a tensor.
- process_clfs()#
-
Process the different classifiers.
- transform(X: torch.Tensor)#
-
Predict latent factor values given a tensor.
sobolev_alignment.scvi_model_search module#
scVI model search
@author: Soufiane Mourragui
Pipeline to perform model selection for the scVI model.
- sobolev_alignment.scvi_model_search.make_objective_function(train_data_an, test_data_an, batch_key=None, model=<class 'scvi.model._scvi.SCVI'>)#
-
Generate Hyperopt objective function.
Generate the hyperopt objective function which, for one set of hyper-parameters, performs the training, evaluates on test data, and gathers all the results in a dictionary usable by Hyperopt.
- Parameters
-
- train_data_an: AnnData
-
AnnData containing the train samples.
- test_data_an: AnnData
-
AnnData containing the test samples.
- batch_key: str, default to None
-
Name of the batch key to be used in scVI.
- model: default to scvi.model.SCVI
-
Model from scvi-tools to be used.
- Returns
-
- function which can be called using a dictionary of parameters.
- sobolev_alignment.scvi_model_search.model_selection(data_an: anndata._core.anndata.AnnData, batch_key: typing.Optional[str] = None, model=<class 'scvi.model._scvi.SCVI'>, space={'dispersion': <hyperopt.pyll.base.Apply object>, 'dropout_rate': <hyperopt.pyll.base.Apply object>, 'early_stopping': <hyperopt.pyll.base.Apply object>, 'gene_likelihood': <hyperopt.pyll.base.Apply object>, 'lr': <hyperopt.pyll.base.Apply object>, 'n_hidden': <hyperopt.pyll.base.Apply object>, 'n_latent': <hyperopt.pyll.base.Apply object>, 'n_layers': <hyperopt.pyll.base.Apply object>, 'reduce_lr_on_plateau': <hyperopt.pyll.base.Apply object>, 'weight_decay': <hyperopt.pyll.base.Apply object>}, max_eval=100, test_size=0.1, save=None)#
-
Model selection for scVI instances (hyper-parameter search).
Perform model selection on an scVI model by dividing a dataset into training and testing, and subsequently performing Bayesian Optimisation on the test data.
- Parameters
-
- data_an: AnnData
-
Dataset to be used in the model selection.
- batch_key: str, default to None
-
Name of the batch key to be used in scVI.
- model: default to scvi.model.SCVI
-
Model from scvi-tools to be used.
- space: dict, default to DEFAULT_HYPEROPT_SPACE
-
Dictionary with hyper-parameter space to be used in Bayesian optimisation.
- max_eval: int, default to 100
-
Number of iterations in the Bayesian optimisation procedures, i.e., number of models assessed.
- test_size: float, default to 0.1
-
Proportion of samples (cells) to be taken inside the test data.
- save: str, default to None
-
Path to save Bayesian optimisation results to. Must be a csv file. If set to None, then results are not saved.
- Returns
-
- Tuple containing:
-
- Best model given by hyperopt.
-
- DataFrame with Bayesian optimisation results.
-
- Trials instance from hyperopt.
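Example (hedged usage sketch of model_selection; the file name and batch key are placeholders):

import anndata
from sobolev_alignment.scvi_model_search import model_selection

adata = anndata.read_h5ad("single_cell_data.h5ad")   # hypothetical file
best_model, results_df, trials = model_selection(
    data_an=adata,
    batch_key="batch",                                # assumed obs column
    max_eval=100,
    test_size=0.1,
    save="scvi_model_search_results.csv",
)
# results_df summarises the Bayesian optimisation; trials is the hyperopt Trials object.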
- sobolev_alignment.scvi_model_search.split_dataset(data_an, test_size=0.1)#
-
Split the dataset between training and testing.
sobolev_alignment.sobolev_alignment module#
Sobolev Alignment
@author: Soufiane Mourragui
References
Mourragui et al, Identifying commonalities between cell lines and tumors at the single cell level using Sobolev Alignment of deep generative models, BioRxiv, 2022.
Lopez et al, Deep generative modeling for single-cell transcriptomics, Nature Methods, 2018.
Meanti et al, Kernel methods through the roof: handling billions of points efficiently, NeurIPS, 2020.
- class sobolev_alignment.sobolev_alignment.SobolevAlignment(source_batch_name: Optional[str] = None, target_batch_name: Optional[str] = None, continuous_covariate_names: Optional[list] = None, source_scvi_params: Optional[dict] = None, target_scvi_params: Optional[dict] = None, source_krr_params: Optional[dict] = None, target_krr_params: Optional[dict] = None, n_artificial_samples: int = 1000000, n_samples_per_sample_batch: int = 1000000, frac_save_artificial: float = 0.1, save_mmap: Optional[str] = None, log_input: bool = True, n_krr_clfs: int = 1, no_posterior_collapse=True, mean_center: bool = False, unit_std: bool = False, frob_norm_source: bool = False, lib_size_norm: bool = False, n_jobs=1)#
-
Bases:
object
Sobolev Alignment implementation.
Main class for Sobolev Alignment, which wraps all the different operations of the Sobolev Alignment procedure: model selection (scVI and KRR), scVI model training, artificial sample generation, KRR approximation, and alignment of the KRR models.
Methods
compute_consensus_features(X_input, n_similar_pv): Project data on interpolated consensus features.
compute_error([size]): Compute the error of the KRR approximation on the input data (used for VAE training) and on the data used for KRR.
compute_random_direction_(K_X, K_Y, K_XY): Sample two vectors randomly and compute their cosine similarity.
feature_analysis([max_order, gene_names]): Launch feature analysis for a trained scVI model.
fit(X_source, X_target[, fit_vae, ...]): Run the complete Sobolev Alignment workflow between a source (e.g. cell line) and a target (e.g. tumor) dataset.
krr_model_selection(X_source, X_target[, M, ...]): Hyper-parameter selection for KRR.
load([with_krr, with_model]): Load a Sobolev Alignment instance.
null_model_similarity([n_iter, quantile, ...]): Compute the null model for PV similarities.
plot_cosine_similarity([folder, absolute_cos]): Plot cosine similarity.
plot_training_metrics([folder]): Plot the different training metrics for the source and target scVI modules.
sample_random_vector_(data_source, K): Sample a vector randomly for either source or target.
save([folder, with_krr, with_model]): Save the Sobolev Alignment model.
scvi_model_selection(X_source, X_target[, ...]): Hyperparameter selection for scVI models.
- compute_consensus_features(X_input: dict, n_similar_pv: int, fit: bool = True, return_anndata=False)#
-
Project data on interpolated consensus features.
Project the data on interpolated features, i.e., a linear combination of source and target SPVs which best balances the effect of source and target data.
- Parameters
-
- X_input: dict
-
Dictionary of data (AnnData) to project. Two keys are needed: ‘source’ and ‘target’.
- n_similar_pv: int
-
Number of top SPVs to project the data on.
- fit: bool, default to True
-
Whether the interpolated times must be computed. If False, will use previously computed times, but will return an error if not previously fitted.
- return_anndata: bool, default to False
-
Whether the projected consensus features must be formatted as an AnnData with overlapping indices in obs. This allows downstream analysis. By default, return a DataFrame.
- Returns
-
- interpolated_proj_df: pd.DataFrame or sc.AnnData
-
DataFrame or AnnData of concatenated source and target samples after projection on consensus features.
- compute_error(size=-1)#
-
Compute the error of the KRR approximation on the input data (used for VAE training) and on the data used for KRR.
- compute_random_direction_(K_X, K_Y, K_XY)#
-
Sample two vectors randomly and compute their cosine similarity.
- feature_analysis(max_order: int = 1, gene_names: Optional[list] = None)#
-
Launch feature analysis for a trained scVI model.
Computes the gene contributions (feature weights) associated with the KRRs which approximate the latent factors and the SPVs. Technically, given the kernel machine which approximates a latent factor (KRR), this method computes the weights associated with the orthonormal basis in the Gaussian-kernel associated Sobolev space.
- Parameters
-
- max_order: int, default to 1
-
Order of the features to compute: 1 corresponds to linear features (genes), 2 to interaction terms.
- gene_names: list of str, default to None
-
Names of the genes passed as input to Sobolev Alignment. WARNING: must be in the same order as the input to SobolevAlignment.fit.
- fit(X_source: anndata._core.anndata.AnnData, X_target: anndata._core.anndata.AnnData, fit_vae: bool = True, krr_approx: bool = True, sample_artificial: bool = True)#
-
Runs the complete Sobolev Alignment workflow between a source (e.g. cell line) and a target (e.g. tumor) dataset.
Source and target data should be passed as AnnData and potential batch names (source_batch_name, target_batch_name) should be part of the “obs” element of X_source and X_target.
- Parameters
-
- X_source: AnnData
-
Source data.
- X_target: AnnData
-
Target data.
- fit_vae: bool, default to True
-
Whether a scVI model (VAE) should be trained. If pre-trained VAEs are available, setting scvi_models to these models and using fit_vae=False allows them to be used directly.
- krr_approx: bool, default to True
-
Whether the KRR approximation should be performed for source and target scVI models.
- sample_artificial: bool, default to True
-
Whether artificial points should be sampled. If artificial samples have already been sampled and saved, setting sample_artificial=False allows these points to be used without re-sampling.
- Returns
-
- self: fitted Sobolev Alignment instance.
- krr_model_selection(X_source: anndata._core.anndata.AnnData, X_target: anndata._core.anndata.AnnData, M: int = 1000, same_model_alignment_thresh: float = 0.9)#
-
Hyper-parameter selection for KRR.
Routine to perform hyper-parameter selection for the KRR models (source and target), following the protocol described in sobolev_alignment.krr_model_selection. Can be called prior to fit; the selected parameters are stored on the instance.
- Parameters
-
- X_source: AnnData
-
Source dataset.
- X_target: AnnData
-
Target dataset.
- M: int, default to 1000
-
Number of anchor points to use. Larger values of M lead to a better approximation of the latent factors, but come at the price of higher computational time and memory usage.
- same_model_alignment_thresh: float, default to 0.9
-
Minimum top principal angles used during same-model alignment, i.e., when source or target models are aligned to themselves.
- Returns
-
- SobolevAlignment instance.
- load(with_krr: bool = True, with_model: bool = True)#
-
Load a Sobolev Alignment instance.
- Parameters
-
- folder: str, default to ‘.’
-
Folder path where the instance is located.
- with_krr: bool, default to True
-
Whether KRR approximations must be loaded.
- with_model: bool, default to True
-
Whether scvi models (VAEs) must be loaded.
- Returns
-
- SobolevAlignment: instance saved at the folder location.
- null_model_similarity(n_iter=100, quantile=0.95, return_all=False, n_jobs=1)#
-
Compute the null model for PV similarities.
- plot_cosine_similarity(folder: str = '.', absolute_cos: bool = False)#
-
Plot cosine similarity.
- plot_training_metrics(folder: str = '.')#
-
Plot the different training metrics for the source and target scVI modules.
- sample_random_vector_(data_source, K)#
-
Sample a vector randomly for either source or target.
- save(folder: str = '.', with_krr: bool = True, with_model: bool = True)#
-
Save the Sobolev Alignment model.
- scvi_model_selection(X_source: anndata._core.anndata.AnnData, X_target: anndata._core.anndata.AnnData, source_batch_name: typing.Optional[str] = None, target_batch_name: typing.Optional[str] = None, model=<class 'scvi.model._scvi.SCVI'>, space: dict = {'dispersion': <hyperopt.pyll.base.Apply object>, 'dropout_rate': <hyperopt.pyll.base.Apply object>, 'early_stopping': <hyperopt.pyll.base.Apply object>, 'gene_likelihood': <hyperopt.pyll.base.Apply object>, 'lr': <hyperopt.pyll.base.Apply object>, 'n_hidden': <hyperopt.pyll.base.Apply object>, 'n_latent': <hyperopt.pyll.base.Apply object>, 'n_layers': <hyperopt.pyll.base.Apply object>, 'reduce_lr_on_plateau': <hyperopt.pyll.base.Apply object>, 'weight_decay': <hyperopt.pyll.base.Apply object>}, max_eval: int = 100, test_size: float = 0.1)#
-
Hyperparameter selection for scVI models.
Routine to perform Bayesian hyper-parameter optimisation for the scVI models (source and target). Can be called prior to fit. Best parameters will be saved in self.scvi_params.
- Parameters
-
- X_source: AnnData
-
Source dataset.
- X_target: AnnData
-
Target dataset.
- source_batch_name: str, default to None
-
Batch key to use in scVI for the source dataset. If None, no native batch-effect correction is performed in the source scVI model.
- target_batch_name: str, default to None
-
Batch key to use in scVI for the target dataset. If None, no native batch-effect correction is performed in the target scVI model.
- model: default to scvi.model.SCVI
-
scvi-tools model to be used in the analysis.
- space: dict, default to DEFAULT_HYPEROPT_SPACE
-
Hyper-parameter space to be used in Bayesian Optimisation. Default is provided in sobolev_alignment.scvi_model_search.
- max_eval: int, default to 100
-
Number of iterations in the Bayesian optimisation procedures, i.e., number of models assessed.
- test_size: float, default to 0.1
-
Proportion of samples (cells) to be taken inside the test data.
- Returns
-
- SobolevAlignment instance.
- default_scvi_params = {'model': {}, 'plan': {}, 'train': {}}#
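Example (hedged end-to-end sketch of the SobolevAlignment workflow documented above; the file paths, the 'batch' obs column and the chosen hyper-parameter values are illustrative assumptions):

import anndata
from sobolev_alignment.sobolev_alignment import SobolevAlignment

X_source = anndata.read_h5ad("cell_lines.h5ad")     # hypothetical paths
X_target = anndata.read_h5ad("tumors.h5ad")

sobolev_alignment_clf = SobolevAlignment(
    source_batch_name="batch",                       # assumed obs column in X_source
    target_batch_name="batch",                       # assumed obs column in X_target
    n_artificial_samples=10**6,
    n_jobs=4,
)

# Train the scVI models, generate artificial samples, approximate the encoders
# by KRR and align the resulting kernel machines.
sobolev_alignment_clf.fit(X_source, X_target)

# Gene-level (linear) contributions of the kernel approximations.
sobolev_alignment_clf.feature_analysis(max_order=1, gene_names=list(X_source.var_names))

# Project both datasets on interpolated consensus features.
consensus_df = sobolev_alignment_clf.compute_consensus_features(
    X_input={"source": X_source, "target": X_target},
    n_similar_pv=5,
)

# Persist the fitted instance for later re-use.
sobolev_alignment_clf.save(folder="./sobolev_alignment_model", with_krr=True, with_model=True)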