snpio.plotting package

Submodules

snpio.plotting.plotting module

class snpio.plotting.plotting.Plotting(genotype_data, show=None, plot_format=None, dpi=None, plot_fontsize=None, plot_title_fontsize=None, despine=None, verbose=None, debug=None)[source]

Bases: object

Class containing various methods for generating plots based on genotype data.

genotype_data

Initialized GenotypeData object containing necessary data.

Type:

GenotypeData

prefix

Prefix string for output directories and files.

Type:

str

output_dir

Output directory for saving plots.

Type:

Path

show

Whether to display the plots.

Type:

bool

plot_format

Format in which to save the plots.

Type:

str

dpi

Resolution of the saved plots.

Type:

int

plot_fontsize

Font size for the plot labels.

Type:

int

plot_title_fontsize

Font size for the plot titles.

Type:

int

despine

Whether to remove the top and right plot axis spines.

Type:

bool

verbose

Whether to enable verbose logging.

Type:

bool

debug

Whether to enable debug logging.

Type:

bool

logger

Logger object for logging messages.

Type:

logging.Logger

boolean_filter_methods

List of boolean filter methods.

Type:

list

missing_filter_methods

List of missing data filter methods.

Type:

list

maf_filter_methods

List of MAF filter methods.

Type:

list

mpl_params

Default Matplotlib parameters for the plots.

Type:

dict

plot_sankey_filtering_report()[source]

Plot a Sankey diagram for the filtering report.

plot_pca()[source]

Plot a PCA scatter plot with 2 or 3 dimensions, colored by missing data proportions, and labeled by population with symbols for each sample.

plot_summary_statistics()[source]

Plot summary statistics per sample and per population on the same figure. The summary statistics are plotted as lines for each statistic (Ho, He, Pi, Fst).

plot_dapc()[source]

Plot a DAPC scatter plot with 2 or 3 dimensions, colored by population, and labeled by population with symbols for each sample.

plot_sfs()[source]

Plot a heatmap for the 2D SFS between two given populations and bar plots for the 1D SFS of each population. Not yet implemented.

plot_joint_sfs_grid()[source]

Plot the joint SFS between all possible pairs of populations in the popmap file in a grid layout. Not yet implemented.

_set_logger()

Set the logger object based on the debug attribute. If debug is True, the logger will log debug messages.

_get_attribute_value()[source]

Determine the value for an attribute based on the provided argument, genotype_data attribute, or default value. If a value is provided during initialization, it is used. Otherwise, the genotype_data attribute is used if available. If neither is available, the default value is used.

_plot_summary_statistics_per_sample()[source]

Plot summary statistics per sample. If an axis is provided, the plot is drawn on that axis.

_plot_summary_statistics_per_population()[source]

Plot summary statistics per population. If an axis is provided, the plot is drawn on that axis.

_plot_summary_statistics_per_population_grid()[source]

Plot summary statistics per population using a Seaborn PairGrid plot. Not yet implemented.

_plot_summary_statistics_per_sample_grid()[source]

Plot summary statistics per sample using a Seaborn PairGrid plot. Not yet implemented.

_plot_dapc_cv()[source]

Plot the DAPC cross-validation results. Not yet implemented.
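
The fallback order described for _get_attribute_value() (explicit argument, then the genotype_data attribute, then a default) can be sketched as below. This is an illustration only, not the library's implementation: the helper name resolve_attribute, the stand-in object, and the default values are assumptions made for the example.

```python
from types import SimpleNamespace


def resolve_attribute(value, genotype_data, attr, default):
    """Hypothetical helper mirroring the documented fallback order."""
    # 1. A value passed explicitly at initialization wins.
    if value is not None:
        return value
    # 2. Otherwise use the attribute on the genotype data object, if set.
    inherited = getattr(genotype_data, attr, None)
    if inherited is not None:
        return inherited
    # 3. Otherwise fall back to the default.
    return default


# Stand-in for an initialized GenotypeData object (attributes are assumed).
gd = SimpleNamespace(plot_format="pdf", dpi=None)

fmt_explicit = resolve_attribute("svg", gd, "plot_format", "png")
fmt_inherited = resolve_attribute(None, gd, "plot_format", "png")
dpi_default = resolve_attribute(None, gd, "dpi", 300)
```

This order would explain why every optional constructor argument defaults to None: None signals that the value should be taken from genotype_data or from the default.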

plot_dapc(dapc, alignment, popmap, dimensions=2)[source]

Plot a DAPC scatter plot.

This method plots a DAPC scatter plot with 2 or 3 dimensions, colored by population, and labeled by population with symbols for each sample. The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/dapc_plot.{plot_format}. If the show attribute is True, the plot is also displayed.

Parameters:
  • dapc (sklearn.discriminant_analysis.LinearDiscriminantAnalysis) – The fitted DAPC object used for dimensionality reduction and transformation.

  • alignment (numpy.ndarray) – The genotype data in the form of a numpy array.

  • popmap (pd.DataFrame) – The DataFrame containing the population mapping information, with columns “SampleID” and “PopulationID”.

  • dimensions (int, optional) – Number of dimensions to plot (2 or 3). Defaults to 2.

Returns:

The DAPC scatter plot is saved to a file.

Return type:

None

Raises:

ValueError – Raised if the dimensions argument is neither 2 nor 3.

Note

  • The DAPC object must be fitted before calling this method, using the genotype data provided in the alignment argument.

  • The popmap DataFrame must contain the population mapping information with columns “SampleID” and “PopulationID”.

  • The dimensions argument must be either 2 or 3.

  • The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/dapc_plot.{plot_format} and is displayed if the show attribute is True.

  • The plot is colored by population and labeled by population with symbols for each sample.
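
The documented output location and the check on the dimensions argument can be sketched as follows. The helper names dapc_plot_path and check_dimensions are hypothetical and only illustrate the documented behavior; the library may build the path and validate the argument differently.

```python
from pathlib import Path


def dapc_plot_path(prefix, plot_format):
    """Compose the documented output location (hypothetical helper)."""
    return Path(f"{prefix}_output") / "gtdata" / "plots" / f"dapc_plot.{plot_format}"


def check_dimensions(dimensions):
    """Accept only 2 or 3, matching the documented ValueError."""
    if dimensions not in (2, 3):
        raise ValueError(f"dimensions must be 2 or 3, got {dimensions!r}")
    return dimensions


out_path = dapc_plot_path("example", "png")
n_dims = check_dimensions(3)
```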

plot_filter_report(df)[source]

Plot the filter report.

This method plots the filter report data. The filter report data contains the proportion of loci removed and kept for each filtering threshold. The plot is saved to a file.

Parameters:

df (pd.DataFrame) – The dataframe containing the filter report data.

Returns:

A plot is saved to a file.

Return type:

None

Raises:

ValueError – Raised if the input dataframe is empty.

Note

  • The input dataframe must contain the filter report data: the filtering method and filtering step, the filtering thresholds (missing data, MAF, MAC, and boolean), and the counts and proportions of loci removed and kept.
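
As a rough illustration of the removed and kept proportions mentioned above, the sketch below derives them from per-threshold counts. The row keys ("method", "threshold", "removed", "kept") and the numbers are assumptions made for this example; the actual column names of the filter report are not given here.

```python
# Hypothetical filter report rows; the real column names may differ.
report = [
    {"method": "filter_missing", "threshold": 0.50, "removed": 120, "kept": 880},
    {"method": "filter_missing", "threshold": 0.75, "removed": 40, "kept": 960},
    {"method": "filter_maf", "threshold": 0.05, "removed": 250, "kept": 750},
]


def add_proportions(rows):
    """Return copies of the rows with removed/kept proportions added."""
    if not rows:
        raise ValueError("the filter report is empty")
    out = []
    for row in rows:
        total = row["removed"] + row["kept"]
        if total == 0:
            raise ValueError(f"{row['method']}: no loci to summarize")
        out.append(
            {
                **row,
                "removed_prop": row["removed"] / total,
                "kept_prop": row["kept"] / total,
            }
        )
    return out


with_props = add_proportions(report)
```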

plot_gt_distribution(df, annotation_size=15)[source]

Plot the distribution of genotype counts.

Parameters:
  • df (pd.DataFrame) – The input dataframe containing the genotype counts.

  • annotation_size (int, optional) – The font size for count annotations. Defaults to 15.

Returns:

A plot is saved to a file.

Return type:

None

Raises:

ValueError – Raised if the input dataframe is empty.

Note

  • The input dataframe must contain the genotype counts.

  • The input dataframe must have the genotype counts as columns.

  • The input dataframe must have the genotype counts as rows.

  • The input dataframe must have the genotype counts as values.

  • The input dataframe must have the genotype counts as integers.

  • The input dataframe must have the genotype counts as strings.

  • The input dataframe must have the genotype counts as IUPAC codes.
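
One plausible way to tabulate genotype counts from IUPAC-coded data is sketched below. The toy alignment and its layout (rows as samples, columns as loci) are assumptions for illustration; the exact layout this method expects for df is not specified here.

```python
from collections import Counter

# Toy alignment: rows are samples, columns are loci, values are IUPAC codes.
# "N" marks a missing genotype; "R" and "W" are heterozygous ambiguity codes.
alignment = [
    ["A", "R", "N", "T"],
    ["A", "G", "C", "T"],
    ["W", "G", "N", "T"],
]

# Count how often each genotype code occurs across all samples and loci.
genotype_counts = Counter(code for sample in alignment for code in sample)
total_genotypes = sum(genotype_counts.values())
```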

plot_joint_sfs_grid(pop_gen_stats, populations, savefig=True)[source]

Plot the joint SFS between all possible pairs of populations in the popmap file in a grid layout.

This method plots the joint SFS between all possible pairs of populations in the popmap file in a grid layout. The joint SFS for each pair is calculated using the calculate_2d_sfs method and plotted as a heatmap. The heatmaps use a custom colormap with 50 colors, created from the colors “white”, “green”, “yellow”, “orange”, and “red”, which is also displayed in the colorbar. The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/joint_sfs_grid.{plot_format}. If the show attribute is True, the plot is also displayed.

Note

  • This method is not yet implemented.

Parameters:
  • pop_gen_stats (PopGenStatistics) – An instance of the PopGenStatistics class.

  • populations (List[Union[str, int]]) – A list of population names.

  • savefig (bool, optional) – Whether to save the figure to a file. Defaults to True. If True, the figure will be saved to a file.

Return type:

None
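
Since this method is documented as not yet implemented, the following is only a sketch of how "all possible pairs of populations" could be enumerated and assigned to grid cells. The population names and the three-column grid are assumptions for the example.

```python
from itertools import combinations
from math import ceil

# Hypothetical population names.
populations = ["EA", "GU", "TT", "ON"]

# Every unordered pair appears exactly once.
pairs = list(combinations(populations, 2))

# Lay the pairs out row by row in a grid with an assumed column count.
n_cols = 3
n_rows = ceil(len(pairs) / n_cols)
grid = {pair: divmod(i, n_cols) for i, pair in enumerate(pairs)}
```

With k populations there are k(k-1)/2 pairs, so the grid grows quadratically with the number of populations.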

plot_pca(pca, alignment, popmap, dimensions=2)[source]

Plot a PCA scatter plot.

This method plots a PCA scatter plot with 2 or 3 dimensions, colored by population, and labeled by population with symbols for each sample. The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/pca_plot.{plot_format}. If the show attribute is True, the plot is also displayed.

Note

  • The PCA object must be fitted before calling this method, using the genotype data provided in the alignment argument.

  • The popmap DataFrame must contain the population mapping information with columns “SampleID” and “PopulationID”.

  • The dimensions argument must be either 2 or 3.

  • The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/pca_plot.{plot_format} and is displayed if the show attribute is True.

  • The plot is colored by population and labeled by population with symbols for each sample.

Parameters:
  • pca (sklearn.decomposition.PCA) – The fitted PCA object used for dimensionality reduction and transformation.

  • alignment (numpy.ndarray) – The genotype data used for PCA, in the form of a numpy array.

  • popmap (pd.DataFrame) – The DataFrame containing the population mapping information, with columns “SampleID” and “PopulationID”.

  • dimensions (int, optional) – Number of dimensions to plot (2 or 3). Defaults to 2.

Raises:

ValueError – Raised if the dimensions argument is neither 2 nor 3.

Returns:

The PCA scatter plot is saved to a file.

Return type:

None

plot_performance(resource_data, color='#8C56E3', figsize=(18, 10))[source]

Plots the performance metrics: CPU Load, Memory Footprint, and Execution Time using boxplots.

This function takes a dictionary of performance data and plots the metrics for each method using boxplots to show variability. The resulting plots are saved in a file of the specified format. The plot shows the CPU Load, Memory Footprint, and Execution Time for each method. The plot is colored based on the specified color.

Parameters:
  • resource_data (Dict[str, List[Dict[str, Union[int, float]]]]) – Dictionary with performance data. Keys are method names, and values are lists of dictionaries with keys ‘cpu_load’, ‘memory_footprint’, and ‘execution_time’.

  • color (str, optional) – Color to be used in the plot. Should be a valid color string. Defaults to “#8C56E3”.

  • figsize (Tuple[int, int], optional) – Size of the figure. Should be a tuple of 2 integers. Defaults to (18, 10).

Return type:

None

Returns:

None. The function saves the plot to a file.

Raises:

ValueError – Raised if the input data is not a dictionary.

Note

  • The performance data should be in the format of a dictionary with method names as keys and lists of dictionaries as values. Each dictionary should have keys ‘cpu_load’, ‘memory_footprint’, and ‘execution_time’.

  • The plot will be saved in the ‘<prefix>_output/gtdata/plots/performance’ directory.

  • Supported image formats include: “pdf”, “svg”, “png”, and “jpeg” (or “jpg”).
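
The expected shape of resource_data can be illustrated with a small example. The method names and measurements below are made up, and the helpers validate_resource_data and metric_values are hypothetical; they only show how such a dictionary could be checked and regrouped per metric before drawing boxplots.

```python
# Made-up performance data: one list of per-run measurements per method.
resource_data = {
    "filter_missing": [
        {"cpu_load": 12.5, "memory_footprint": 210.0, "execution_time": 0.8},
        {"cpu_load": 14.0, "memory_footprint": 215.0, "execution_time": 0.9},
    ],
    "filter_maf": [
        {"cpu_load": 9.0, "memory_footprint": 190.0, "execution_time": 0.4},
    ],
}

REQUIRED_KEYS = {"cpu_load", "memory_footprint", "execution_time"}


def validate_resource_data(data):
    """Check the documented structure (hypothetical helper)."""
    if not isinstance(data, dict):
        raise ValueError("resource_data must be a dictionary")
    for method, runs in data.items():
        for run in runs:
            missing = REQUIRED_KEYS - run.keys()
            if missing:
                raise ValueError(f"{method}: missing keys {sorted(missing)}")
    return data


def metric_values(data, metric):
    """Collect one metric per method, e.g. as input for one boxplot panel."""
    return {method: [run[metric] for run in runs] for method, runs in data.items()}


cpu_by_method = metric_values(validate_resource_data(resource_data), "cpu_load")
```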

plot_pop_counts(populations)[source]

Plot the population counts.

This function takes a series of population data and plots the counts and proportions of each population ID. The plot is colored based on the median count and proportion and is saved to a file of the specified format.

Parameters:

populations (pd.Series) – The series containing population data.

Returns:

A plot is saved to a file.

Return type:

None

Raises:

ValueError – Raised if the input data is not a pandas Series.

Note

  • The population data should be in the format of a pandas Series.

  • The plot will be saved in the ‘<prefix>_output/gtdata/plots’ directory.

  • Supported image formats include: “pdf”, “svg”, “png”, and “jpeg” (or “jpg”).

  • The plot will show the counts and proportions of each population ID, colored based on the median count and proportion.
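
The quantities this plot summarizes (counts, proportions, and their median) can be computed as in the sketch below. The population IDs are hypothetical, and a plain list stands in for the pandas Series that the method expects.

```python
from collections import Counter
from statistics import median

# Hypothetical population ID per sample.
population_ids = ["EA", "EA", "EA", "GU", "GU", "TT", "TT", "TT", "TT", "ON"]

counts = Counter(population_ids)
total = sum(counts.values())
proportions = {pop: n / total for pop, n in counts.items()}

# Medians across populations, e.g. as reference values for coloring.
median_count = median(counts.values())
median_proportion = median(proportions.values())
```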

plot_sankey_filtering_report(df, search_mode=False)[source]

Plot a Sankey diagram for the filtering report.

This method plots a Sankey diagram for the filtering report. The Sankey diagram shows the flow of loci through the filtering steps. The loci are filtered based on the missing data proportion, MAF, MAC, and other filtering thresholds. The Sankey diagram shows the number of loci kept and removed at each step. The Sankey diagram is saved to a file. If the show attribute is True, the plot is displayed. The plot is saved to the output_dir directory with the filename: <prefix>_output/nremover/plots/sankey_plot_{thresholds}.{plot_format}. The plot is saved in the format specified by the plot_format attribute.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing the filtering report.

  • search_mode (bool, optional) – Whether the Sankey diagram is being plotted in search mode. Defaults to False.

Returns:

A plot is saved to a file.

Return type:

None

Raises:
  • ValueError – Raised if the input DataFrame is empty.

  • ValueError – Raised if multiple threshold combinations are detected when attempting to plot the Sankey diagram.

Note

  • The Sankey diagram shows the flow of loci through the filtering steps (missing data proportion, MAF, MAC, and other filtering thresholds), with edge labels showing the number of loci kept and removed at each step.

  • The kept loci are colored green, and the removed loci are colored red.

  • The diagram starts from a common “Unfiltered” node for the initial unfiltered loci. The loci removed at each step flow into a common “Removed” node, and the loci kept flow through a common “Kept” node to the next filter.

  • The diagram is plotted using the hv.Sankey method from Holoviews with the Bokeh extension.

  • The diagram is saved in the format specified by the plot_format attribute as <prefix>_output/nremover/plots/sankey_plot_{thresholds}.{plot_format} and is displayed if the show attribute is True.
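
The flow of loci described above can be sketched as a list of (source, target, value) edges, which is the general form a Sankey diagram takes as input. The helper sankey_edges, the step names, and the counts are assumptions for illustration, and the node naming is simplified relative to the “Kept” nodes described in the note.

```python
def sankey_edges(total_loci, removed_per_step):
    """Build (source, target, value) edges for a chain of filters.

    Hypothetical helper: each step sends its removed loci to a shared
    "Removed" node and passes the remaining loci on to the next step.
    """
    edges = []
    source, remaining = "Unfiltered", total_loci
    for step, removed in removed_per_step:
        if not 0 <= removed <= remaining:
            raise ValueError(f"{step}: cannot remove {removed} of {remaining} loci")
        kept = remaining - removed
        edges.append((source, "Removed", removed))
        edges.append((source, step, kept))
        source, remaining = step, kept
    return edges, remaining


edges, final_kept = sankey_edges(
    1000,
    [("filter_missing", 150), ("filter_mac", 200), ("filter_biallelic", 50)],
)
```

Every locus is accounted for: the values flowing into "Removed" plus the final kept count add up to the initial total.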

Example

>>> from snpio import VCFReader, NRemover
>>> gd = VCFReader(filename="example.vcf", popmapfile="popmap.txt")
>>> nrm = NRemover(gd)
>>> nrm.filter_missing(0.75).filter_mac(2).filter_missing_pop(0.5).filter_singletons(exclude_heterozygous=True).filter_monomorphic().filter_biallelic().resolve()
>>> nrm.plot_sankey_filtering_report()

plot_search_results(df_combined)[source]

Plot and save the filtering results based on the available data.

This method plots the filtering results based on the available data. The filtering results are plotted for the per-sample and per-locus missing data proportions, MAF, and boolean filtering thresholds. The plots are saved to files in the output directory. If the show attribute is True, the plots are displayed. The plots are saved in the format specified by the plot_format attribute into the output_dir directory in the format: <prefix>_output/gtdata/plots/filtering_results_{method}.{plot_format}.

Parameters:

df_combined (pd.DataFrame) – The input dataframe containing the filtering results.

Returns:

Plots are saved to files.

Return type:

None

Raises:

ValueError – Raised if the input dataframe is empty.

Note

  • The input dataframe must contain the filtering results for the per-sample and per-locus missing data proportions and for the MAF and boolean filtering thresholds.

  • For each result, the input dataframe must contain the filtering method, the filtering step, the filtering thresholds, and the counts and proportions of loci removed and kept.

plot_sfs(pop_gen_stats, population1, population2, savefig=True)[source]

Plot a heatmap for the 2D SFS between two given populations and bar plots for the 1D SFS of each population.

Note

  • This method is not yet implemented.

  • The 1D SFS for each population is calculated using the calculate_1d_sfs method and plotted as a bar plot.

  • The 2D SFS between the two populations is calculated using the calculate_2d_sfs method and plotted as a heatmap.

  • The 2D SFS heatmap uses a custom colormap with 50 colors, created from the colors “white”, “green”, “yellow”, “orange”, and “red”, which is also displayed in the colorbar.

  • The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/sfs_{population1}_{population2}.{plot_format} and is displayed if the show attribute is True.

Parameters:
  • pop_gen_stats (PopGenStatistics) – An instance of the PopGenStatistics class.

  • population1 (str) – The name of the first population.

  • population2 (str) – The name of the second population.

  • savefig (bool, optional) – Whether to save the figure to a file. Defaults to True. If True, the figure will be saved to a file.

Returns:

A plot is saved to a file.

Return type:

None

Raises:

NotImplementedError – Raised if the method is not yet implemented.

plot_summary_statistics(summary_statistics_df)[source]

Plot summary statistics per sample and per population on the same figure.

This method plots the summary statistics per sample and per population on the same figure. The summary statistics are plotted as lines for each statistic (Ho, He, Pi, Fst). The x-axis represents the locus, and the y-axis represents the value of the summary statistic. The plot is saved to a file. If the show attribute is True, the plot is displayed. The plot is saved to the output_dir directory with the filename: <prefix>_output/gtdata/plots/summary_statistics.{plot_format}. The plot is saved in the format specified by the plot_format attribute.

Parameters:

summary_statistics_df (pd.DataFrame) – The DataFrame containing the summary statistics to be plotted.

Returns:

The summary statistics plot is saved to a file.

Return type:

None

Raises:

ValueError – Raised if the dimensions argument is neither 2 nor 3.

Note

  • The summary statistics per sample and per population are plotted on the same figure, grouped by population.

  • The summary statistics are plotted as lines for each statistic (Ho, He, Pi, Fst).

  • The x-axis represents the locus, and the y-axis represents the value of the summary statistic.

  • The plot is saved in the format specified by the plot_format attribute as <prefix>_output/gtdata/plots/summary_statistics.{plot_format} and is displayed if the show attribute is True.

  • The plot is colored by population and labeled by population with symbols for each sample.

run_pca(n_components=None, center=True, scale=False, n_axes=2, point_size=15, bottom_margin=0, top_margin=0, left_margin=0, right_margin=0, width=1088, height=700)[source]

Runs PCA and makes scatterplot with colors showing missingness.

Genotypes are plotted as separate shapes per population and colored according to missingness per individual.

This function is run at the end of each imputation method, but can be run independently to change plot and PCA parameters such as n_axes=3 or scale=True for full customization. Setting n_axes=3 will make a 3D PCA plot.

PCA (principal component analysis) scatterplot can have either two or three axes, set with the n_axes parameter.

The plot is saved as both an interactive HTML file and as a static image. Each population is represented by point shapes. The interactive plot has associated metadata when hovering over the points.

Files are saved to a reports directory as <prefix>_output/imputed_pca.<plot_format|html>. Supported image formats include: “pdf”, “svg”, “png”, and “jpeg” (or “jpg”).

Parameters:
  • n_components (int, optional) – Number of principal components to include in the PCA. Defaults to None (all components).

  • center (bool, optional) – If True, centers the genotypes to the mean before doing the PCA. If False, no centering is done. Defaults to True.

  • scale (bool, optional) – If True, scales the genotypes to unit variance before doing the PCA. If False, no scaling is done. Defaults to False.

  • n_axes (int, optional) – Number of principal component axes to plot. Must be set to either 2 or 3. If set to 3, a 3-dimensional plot will be made. Defaults to 2.

  • point_size (int, optional) – Point size for scatterplot points. Defaults to 15.

  • bottom_margin (int, optional) – Adjust bottom margin. If whitespace cuts off some of your plot, lower the corresponding margins. The default corresponds to that of plotly update_layout(). Defaults to 0.

  • top_margin (int, optional) – Adjust top margin. If whitespace cuts off some of your plot, lower the corresponding margins. The default corresponds to that of plotly update_layout(). Defaults to 0.

  • left_margin (int, optional) – Adjust left margin. If whitespace cuts off some of your plot, lower the corresponding margins. The default corresponds to that of plotly update_layout(). Defaults to 0.

  • right_margin (int, optional) – Adjust right margin. If whitespace cuts off some of your plot, lower the corresponding margins. The default corresponds to that of plotly update_layout(). Defaults to 0.

  • width (int, optional) – Width of plot space. If your plot is cut off at the edges, even after adjusting the margins, increase the width and height. Try to keep the aspect ratio similar. Defaults to 1088.

  • height (int, optional) – Height of plot space. If your plot is cut off at the edges, even after adjusting the margins, increase the width and height. Try to keep the aspect ratio similar. Defaults to 700.
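
The effect of the center and scale options can be illustrated with a small standard-library sketch. It assumes genotypes encoded as 0/1/2 and uses the population standard deviation for scaling; the helper center_and_scale is hypothetical and is not the preprocessing code used by this method.

```python
from statistics import fmean, pstdev


def center_and_scale(matrix, center=True, scale=False):
    """Center and/or scale each column (locus) of a samples-by-loci matrix."""
    result_cols = []
    for col in zip(*matrix):
        mean = fmean(col) if center else 0.0
        sd = pstdev(col) if scale else 1.0
        if sd == 0:
            sd = 1.0  # leave constant (monomorphic) columns unscaled
        result_cols.append([(x - mean) / sd for x in col])
    # Transpose back to samples-by-loci.
    return [list(row) for row in zip(*result_cols)]


# Toy genotype matrix: 3 samples, 3 loci, encoded 0/1/2 (assumed encoding).
genotypes = [[0, 2, 1], [1, 2, 0], [2, 2, 2]]

centered = center_and_scale(genotypes)  # defaults: center=True, scale=False
standardized = center_and_scale(genotypes, scale=True)
```

Centering alone keeps loci with larger variance more influential in the PCA, while scaling to unit variance gives every polymorphic locus equal weight.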

Note

The PCA is run on the genotype data. Missing data is imputed using K-nearest-neighbors (per-sample) before running the PCA. The PCA is run using the sklearn.decomposition.PCA class.

The PCA data is saved as a numpy array with shape (n_samples, n_components).

The PCA object is saved as a sklearn.decomposition.PCA object. Any of the sklearn.decomposition.PCA attributes can be accessed from this object. See the sklearn documentation.

The explained variance ratio can be calculated from the PCA object.

Returns:

numpy.ndarray: PCA data as a numpy array with shape (n_samples, n_components).

sklearn.decomposition.PCA: Scikit-learn PCA object. Any of the sklearn.decomposition.PCA attributes can be accessed from this object. See the sklearn documentation.

Return type:

Tuple[numpy.ndarray, sklearn.decomposition.PCA]

Raises:

ValueError – If n_axes is not set to 2 or 3.

Example

>>> from snpio import Plotting, VCFReader
>>>
>>> gd = VCFReader("snpio/example_data/vcf_files/phylogen_subset14K_sorted.vcf.gz", popmapfile="snpio/example_data/popmaps/phylogen_nomx.popmap", force_popmap=True, verbose=True)
>>>
>>> # Define the plotting object
>>> plotting = Plotting(gd)
>>>
>>> # Run the PCA and get the components and PCA object
>>> components, pca = plotting.run_pca()
>>>
>>> # Calculate and print explained variance ratio
>>> explvar = pca.explained_variance_ratio_
>>> print(explvar)
>>> # Output: [0.123, 0.098, 0.087, ...]

visualize_missingness(df, prefix=None, zoom=False, horizontal_space=0.6, vertical_space=0.6, bar_color='gray', heatmap_palette='magma')[source]

Make multiple plots to visualize missing data.

This method makes multiple plots to visualize missing data. The plots include per-individual and per-locus missing data proportions, per-population + per-locus missing data proportions, per-population missing data proportions, and per-individual and per-population missing data proportions.

Note

  • The plots are saved in the ‘<prefix>_output/gtdata/plots’ directory.

  • Supported image formats include: “pdf”, “svg”, “png”, and “jpeg” (or “jpg”).

  • The heatmap plot uses the seaborn library. The heatmap palette can be set using the heatmap_palette parameter. The default palette is ‘magma’.

  • The barplots use the matplotlib library. The color of the bars can be set using the bar_color parameter. The default color is ‘gray’.

Parameters:
  • df (pandas.DataFrame) – DataFrame with SNPs to visualize.

  • prefix (str, optional) – Prefix to use for the output files. If None, the prefix is set to the input filename. Defaults to None.

  • zoom (bool, optional) – If True, zooms in to the missing proportion range on some of the plots. If False, the plot range is fixed at [0, 1]. Defaults to False.

  • horizontal_space (float, optional) – Set width spacing between subplots. If your plots are overlapping horizontally, increase horizontal_space. If your plots are too far apart, decrease it. Defaults to 0.6.

  • vertical_space (float, optional) – Set height spacing between subplots. If your plots are overlapping vertically, increase vertical_space. If your plots are too far apart, decrease it. Defaults to 0.6.

  • bar_color (str, optional) – Color of the bars on the non-stacked barplots. Can be any color supported by matplotlib. See the matplotlib.colors documentation. Defaults to ‘gray’.

  • heatmap_palette (str, optional) – Palette to use for heatmap plot. Can be any palette supported by seaborn. See seaborn documentation. Defaults to ‘magma’.

Returns:

Returns the missing data proportions for per-individual, per-locus, per-population + per-locus, per-population, and per-individual + per-population.

Return type:

Tuple

Raises:

ValueError – If the input data is not a pandas DataFrame.
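
The missing data proportions that this method reports can be sketched on a toy matrix. The encoding (None for a missing genotype), the population labels, and the helper missing_prop are assumptions for illustration; the method itself works on a pandas DataFrame.

```python
# Toy genotype matrix: rows are samples, columns are loci, None is missing.
genotypes = [
    [0, None, 2, 1],
    [None, None, 1, 0],
    [1, 0, 2, None],
    [2, 1, 0, 1],
]
# Hypothetical population label for each sample (row).
sample_pops = ["EA", "EA", "GU", "GU"]


def missing_prop(values):
    """Proportion of missing entries in a sequence of genotypes."""
    values = list(values)
    return sum(v is None for v in values) / len(values)


per_individual = [missing_prop(row) for row in genotypes]
per_locus = [missing_prop(col) for col in zip(*genotypes)]

# Pool all genotypes of the samples belonging to each population.
per_population = {}
for pop in sorted(set(sample_pops)):
    cells = [v for row, p in zip(genotypes, sample_pops) if p == pop for v in row]
    per_population[pop] = missing_prop(cells)
```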

Module contents