Spatial Coherence Pipeline
- main.compute_shortest_paths(graph, args)
Step 1.5: Compute the shortest paths and store them in args. This is done only once.
- main.load_and_initialize_graph()
Step 1: Load the graph using the provided arguments and perform initial checks.
- main.main(graph, args)
Main function: graph loading, processing, and analysis.
- main.network_correlation_dimension(args)
Step 3: Predict the dimension of the graph.
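The docstring does not say how the dimension is predicted. One common estimator, shown here as a hypothetical sketch rather than the repository's implementation, is a correlation dimension computed from shortest-path distances. C(r) is the fraction of node pairs within distance r. If C(r) grows like r^d, the slope of log C(r) against log r estimates d. The function name correlation_dimension and the input format (a dense all-pairs distance matrix as nested lists) are assumptions.

```python
import math


def correlation_dimension(dist, radii):
    """Estimate a dimension from an all-pairs shortest-path matrix.

    C(r) is the fraction of ordered node pairs (i != j) whose distance is
    at most r; the least-squares slope of log C(r) on log r estimates d.
    """
    n = len(dist)
    n_pairs = n * (n - 1)
    xs, ys = [], []
    for r in radii:
        count = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and dist[i][j] <= r
        )
        if count > 0:  # log C(r) is undefined when no pair is that close
            xs.append(math.log(r))
            ys.append(math.log(count / n_pairs))
    if len(xs) < 2:
        raise ValueError("need at least two radii with nonzero C(r)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Usage: a 200-node path graph (hop distance |i - j|) and a 20 x 20 grid graph
# (hop distance equals Manhattan distance).
path = [[abs(i - j) for j in range(200)] for i in range(200)]
coords = [(x, y) for x in range(20) for y in range(20)]
grid = [[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in coords] for a in coords]
d_path = correlation_dimension(path, range(1, 11))
d_grid = correlation_dimension(grid, range(1, 5))
print(d_path, d_grid)
```

For the long path the estimate is close to 1. For the grid it is clearly larger than 1 but below 2, because small radii and boundary effects bias the slope downward. In practice the radii should stay well below the graph's diameter, where C(r) saturates at 1.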
- main.plot_and_analyze_graph(graph, args)
Plots the original graph and analyzes its properties.
- main.plot_profiling_results(args)
- main.profile(func)
- main.rank_matrix_analysis(args)
Step 4: Analyze the rank matrix.
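The docstring does not define the rank matrix. If the analysis concerns the rank of a distance matrix, which is an assumption here, the relevant fact is that the matrix of squared Euclidean distances between points in d dimensions has rank at most d + 2. A low rank therefore hints at a low-dimensional spatial embedding. The sketch below uses hypothetical helper names and checks this exactly with rational arithmetic. With floating-point data one would use a tolerance-based rank such as numpy.linalg.matrix_rank instead.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero pivot.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def squared_distance_matrix(points):
    """Matrix of squared Euclidean distances between integer points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


line = [(0,), (1,), (3,), (7,)]                   # d = 1
plane = [(0, 0), (1, 0), (0, 1), (2, 3), (5, 1)]  # d = 2
print(matrix_rank(squared_distance_matrix(line)))
print(matrix_rank(squared_distance_matrix(plane)))
```

Three or more distinct points on a line give rank 3 (d = 1). Planar points that are neither collinear nor concyclic give rank 4 (d = 2).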
- main.reconstruct_graph(graph, args)
Reconstructs the graph if the args object requests it. Reconstruction may involve converting the graph to the format required by the reconstruction process. Whether ground truth is treated as available depends on the reconstruction mode and related settings in args.
Reconstruction runs only when the reconstruct flag in args is set. In that case, the function also determines whether ground truth is available and runs the reconstruction process accordingly.
- Parameters:
graph – The graph to potentially reconstruct. It should be compatible with the reconstruction process and might be converted to a different format as part of the reconstruction.
args – An object containing configuration options and flags for the graph analysis and reconstruction process. This includes:
  - reconstruct (bool): Whether the graph should be reconstructed.
  - reconstruction_mode (str): The reconstruction mode to apply.
  - proximity_mode (str): The proximity mode used for the graph; affects ground truth availability.
  - large_graph_subsampling (bool): Whether subsampling for large graphs is enabled; also affects ground truth availability.
Note
The function directly prints updates regarding the reconstruction process, including the mode of reconstruction and whether ground truth is considered available.
- main.spatial_constant_analysis(graph, args, false_edge_list=None)
Step 2: Analyze the spatial constant.
- main.subsample_graph_if_necessary(graph, args)
Subsamples the graph if it is too large for efficient processing.
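The size threshold and sampling strategy are not documented here. A minimal sketch, assuming breadth-first growth from a seed node (the strategy run_simulation_subgraph_sampling documents for its subgraphs) and a hypothetical max_nodes threshold, keeps the sample connected. Shortest-path distances inside the sample therefore stay finite, which uniform random node sampling does not guarantee.

```python
from collections import deque


def bfs_subsample(adjacency, max_nodes, start=None):
    """Node set of a connected subgraph grown by BFS, capped at max_nodes."""
    if len(adjacency) <= max_nodes:
        return set(adjacency)  # small enough: keep the whole graph
    if start is None:
        start = next(iter(adjacency))
    visited = {start}
    queue = deque([start])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
                if len(visited) == max_nodes:
                    break
    return visited


def induced_edges(adjacency, nodes):
    """Undirected edges with both endpoints in nodes, listed once as (u, v), u < v."""
    return {(u, v) for u in nodes for v in adjacency[u] if v in nodes and u < v}


path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}
sample = bfs_subsample(path, 10, start=0)
print(sorted(sample), len(induced_edges(path, sample)))
```

If the seed's connected component has fewer than max_nodes nodes, the sketch returns that whole component.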
- data_analysis.plot_graph_properties(args, igraph_graph)
Plots various graph properties including clustering coefficients, degree distributions, and shortest path distributions. It supports both unipartite and bipartite graphs. For bipartite graphs, properties are computed and plotted separately for each set.
This function also computes and stores spatial constant results based on the mean shortest path and average degree of the graph.
- Parameters:
args – An object containing configuration parameters and options for the graph analysis. This object should include fields for bipartite graph checks (is_bipartite), directory mappings (directory_map), graph titles (args_title), and placeholders for results (mean_clustering_coefficient, average_degree, etc.).
igraph_graph – An igraph graph object. If the graph is not already in igraph format, it is converted within the function.
- Outputs:
Plots of clustering coefficient distributions and degree distributions saved to specified directories.
A CSV file containing spatial constant results saved in the specified directory.
- Side Effects:
Modifies the args object by setting various properties such as mean_clustering_coefficient, average_degree, and others depending on whether the graph is bipartite or not.
Generates and saves plots to the filesystem.
Saves spatial constant results as a CSV file to the filesystem.
Note
The function relies on several helper functions (convert_graph_type, bipartite_clustering_coefficient_optimized, get_bipartite_degree_distribution, plot_clustering_coefficient_distribution, plot_degree_distribution, get_local_clustering_coefficients, get_degree_distribution, get_mean_shortest_path, plot_shortest_path_distribution, get_spatial_constant_results) to perform its tasks.
Ensure that all necessary fields are present in the args object before calling this function.
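The docstring says only that the spatial constant is based on the mean shortest path and the average degree. The sketch below assumes the definition S = mean_sp / (N / avg_degree) ** (1 / dim), where N is the number of nodes and dim the assumed spatial dimension. That formula and the helper names are assumptions, not taken from this documentation. The sketch also uses plain BFS on an adjacency dict instead of igraph.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spatial_constant(adjacency, dim):
    """Return (mean shortest path, average degree, spatial constant).

    Assumed definition: S = mean_sp / (N / avg_degree) ** (1 / dim).
    """
    nodes = list(adjacency)
    n = len(nodes)
    total, pairs = 0, 0
    for s in nodes:
        d = bfs_distances(adjacency, s)
        total += sum(d.values())  # the source itself contributes 0
        pairs += len(d) - 1       # reachable nodes other than the source
    mean_sp = total / pairs
    avg_degree = sum(len(adjacency[u]) for u in nodes) / n
    return mean_sp, avg_degree, mean_sp / (n / avg_degree) ** (1 / dim)


# Usage: a 10-node cycle, treated as a 1-dimensional spatial network.
cycle = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(spatial_constant(cycle, dim=1))
```

For the 10-node cycle the mean shortest path is 25/9 and N / avg_degree = 5, so S = 5/9. Only pairs reachable from each source contribute, so on a disconnected graph the mean is effectively taken within components.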
- plots.plot_original_or_reconstructed_image(args, image_type='original', edges_df=None, position_filename=None, plot_weights_against_distance=False)
Plots the original or reconstructed image based on the provided arguments.
This function handles plotting of either the original or the reconstructed graph image, optionally plotting edge weights against distance. It sets up the plot based on the dimensionality specified in args, retrieves edge data, and plots node positions with optional coloring.
- Parameters:
args – An object containing configuration and graph arguments, including directory paths, dimensionality (dim), color mapping, and more.
image_type (str) – The type of image to plot: “original”, “mst”, or “reconstructed”. Defaults to “original”.
edges_df (pandas.DataFrame, optional) – DataFrame containing edge data. If None, the edge data is loaded from a file specified in args. Defaults to None.
position_filename (str, optional) – Filename of the position data CSV file. If None, the position data is loaded based on image_type and configurations in args. Defaults to None.
plot_weights_against_distance (bool) – Flag to enable plotting of weights against distance for edges. Defaults to False.
- Raises:
ValueError – If image_type is not one of the expected options (“original”, “reconstructed”, “mst”).
ValueError – If the edge list numbering is not valid, indicating a mismatch between edge data and node positions.
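The exact validity rule is not spelled out. The sketch below shows one plausible check. It assumes node IDs are contiguous integers 0..N-1 that index rows of the position file, and it raises ValueError for any edge endpoint without a matching position. The function name validate_edge_numbering is hypothetical.

```python
def validate_edge_numbering(edges, n_positions):
    """Raise ValueError if an edge endpoint has no matching position row.

    Assumes node IDs are integers 0..n_positions-1 indexing the position data.
    """
    for source, target in edges:
        for node in (source, target):
            if not 0 <= node < n_positions:
                raise ValueError(
                    f"edge ({source}, {target}) references node {node}, but only "
                    f"{n_positions} positions exist (valid IDs: 0..{n_positions - 1})"
                )


validate_edge_numbering([(0, 1), (1, 2)], n_positions=3)  # passes silently
try:
    validate_edge_numbering([(0, 3)], n_positions=3)
except ValueError as err:
    print(err)
```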
- algorithms.compute_shortest_path_matrix_sparse_graph(sparse_graph, args=None)
Computes the shortest path matrix for a given sparse graph. If args is provided and contains a precomputed shortest path matrix, that matrix is returned instead of recomputing it. Otherwise, the shortest path matrix is computed from the sparse graph, and if args is provided, the computed matrix and its mean are stored in args.
The function therefore works without an args object, and when args is supplied it reuses precomputed values to avoid redundant computation.
- Parameters:
sparse_graph – A sparse graph representation for which the shortest path matrix will be computed. The graph should be compatible with the shortest_path function requirements from scipy’s csgraph module.
args (Optional[object]) – An optional object that may hold a precomputed shortest path matrix and can store the computed shortest path matrix and its mean. If used, this object should have shortest_path_matrix and mean_shortest_path attributes.
- Returns:
A numpy array representing the shortest path matrix of the given sparse graph.
- Return type:
numpy.ndarray
- Side Effects:
If args is provided and does not contain a precomputed shortest path matrix, the computed shortest path matrix and its mean are stored in args.shortest_path_matrix and args.mean_shortest_path, respectively.
Note
The function relies on convert_graph_type from a utils module to ensure the sparse graph is in the desired format for computation.
The shortest path computation is performed using the shortest_path function from scipy’s csgraph module, assuming an undirected graph.
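The caching behaviour can be sketched with the standard library. This is an illustrative stand-in, not the documented implementation. It runs BFS over an adjacency dict with nodes numbered 0..N-1 instead of calling scipy's shortest_path on a sparse matrix; for an unweighted, undirected graph both give hop counts. It also assumes the stored mean is taken over reachable off-diagonal pairs, which the docstring does not specify. Unreachable pairs are inf, as in scipy's output.

```python
from collections import deque
from types import SimpleNamespace

INF = float("inf")


def shortest_path_matrix(adjacency, args=None):
    """All-pairs hop distances for nodes 0..N-1, reusing a matrix cached in args."""
    cached = getattr(args, "shortest_path_matrix", None)  # None when args is None
    if cached is not None:
        return cached
    n = len(adjacency)
    matrix = [[INF] * n for _ in range(n)]
    for s in range(n):
        matrix[s][s] = 0
        queue = deque([s])
        while queue:  # BFS from s; unreachable nodes keep INF
            u = queue.popleft()
            for v in adjacency[u]:
                if matrix[s][v] == INF:
                    matrix[s][v] = matrix[s][u] + 1
                    queue.append(v)
    if args is not None:
        finite = [
            d for i, row in enumerate(matrix) for j, d in enumerate(row)
            if i != j and d != INF
        ]
        args.shortest_path_matrix = matrix
        args.mean_shortest_path = sum(finite) / len(finite) if finite else float("nan")
    return matrix


args = SimpleNamespace(shortest_path_matrix=None, mean_shortest_path=None)
first = shortest_path_matrix({0: [1], 1: [0, 2], 2: [1]}, args)
second = shortest_path_matrix({0: [1], 1: [0, 2], 2: [1]}, args)  # served from args
print(first, args.mean_shortest_path, first is second)
```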
- data_analysis.run_simulation_subgraph_sampling(args, graph, size_interval=100, n_subgraphs=10, add_false_edges=False, add_mst=False, parallel=True, false_edge_list=[0, 1, 2, 3, 4], plot_spatial_constant_against_false_edges=False)
Runs a simulation that samples subgraphs from a given graph (using BFS) to analyze various properties, optionally adding minimum spanning trees (MST) and/or false edges to the graph before sampling. The function supports parallel processing to speed up computations.
- Parameters:
args – An object containing configuration parameters and options for the graph analysis, including directory mappings and proximity mode settings.
graph – The graph to be analyzed, typically an igraph graph object. It is converted to igraph format if it is not already in that format.
size_interval (int) – The interval size for subgraph sampling, determining the range of subgraph sizes to analyze. Defaults to 100.
n_subgraphs (int) – The number of subgraphs to sample at each size interval. Defaults to 10.
add_false_edges (bool) – Whether to add false edges to the graph before sampling. Defaults to False.
add_mst (bool) – Whether to compute and analyze the minimum spanning tree of the graph. Defaults to False.
parallel (bool) – Whether to use parallel processing to speed up computations. Defaults to True.
false_edge_list (list of int) – A list specifying the numbers of false edges to add for each simulation run. Only relevant if add_false_edges is True. Defaults to [0,1,2,3,4].
plot_spatial_constant_against_false_edges (bool) – Whether to plot the spatial constant against the number of false edges added. Only relevant if add_false_edges is True. Defaults to False.
- Returns:
- A DataFrame containing the aggregated results of the subgraph sampling simulation, including spatial constant calculations for various subgraph sizes and configurations.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If args does not contain the necessary configuration for the simulation.
Note
This function modifies the args object by updating it with results from the simulation, such as mean shortest path and clustering coefficients. Ensure that args is properly configured before calling this function. It also generates the spatial constant plot.
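How false edges are chosen is not documented. The sketch below assumes they are drawn uniformly at random among node pairs that are not already connected, with no self-loops. The function name and the seed parameter are hypothetical. Returning the false edges separately makes them easy to track, for example when sweeping over the default false_edge_list=[0, 1, 2, 3, 4].

```python
import random


def add_random_false_edges(edges, n_nodes, n_false, seed=None):
    """Return (edges plus false edges, false edges) for an undirected graph.

    False edges join distinct nodes 0..n_nodes-1 that are not already adjacent.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    max_edges = n_nodes * (n_nodes - 1) // 2
    if len(existing) + n_false > max_edges:
        raise ValueError("not enough non-adjacent node pairs for the requested false edges")
    false_edges = []
    while len(false_edges) < n_false:
        u, v = rng.sample(range(n_nodes), 2)  # two distinct nodes, so no self-loops
        pair = frozenset((u, v))
        if pair not in existing:
            existing.add(pair)
            false_edges.append((min(u, v), max(u, v)))
    return list(edges) + false_edges, false_edges


# Usage: one run per entry of a false_edge_list-style sweep on a 10-node path.
base = [(i, i + 1) for i in range(9)]
for k in [0, 1, 2, 3, 4]:
    combined, added = add_random_false_edges(base, 10, k, seed=k)
    print(k, added)
```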
- spatial_constant_analysis.run_reconstruction(args, sparse_graph, node_embedding_mode='ggvec', manifild_learning_mode='UMAP', ground_truth_available=False)
Performs reconstruction of a graph from a sparse matrix representation, employing node embedding and manifold learning techniques. The process includes inferring node positions, plotting the reconstructed graph, and computing quality metrics related to the reconstruction accuracy.
- Parameters:
args – An object containing configuration parameters and options for the analysis, including the dimensionality (dim), directory mappings (directory_map), and graph analysis settings.
sparse_graph – A sparse matrix representation of the graph to be reconstructed.
node_embedding_mode (str) – The mode of node embedding to use for initial node position inference. Defaults to ‘ggvec’. Other options include ‘landmark_isomap’.
manifild_learning_mode (str) – The manifold learning technique to apply for dimensionality reduction. Defaults to ‘UMAP’.
ground_truth_available (bool) – Indicates whether ground truth data is available for evaluating the reconstruction quality. Defaults to False.
- Returns:
- A tuple containing:
reconstructed_points (numpy array): The inferred positions of nodes in the target dimensionality.
metrics (dict): A dictionary of quality metrics assessing the reconstruction, with keys for ground truth-based metrics (“ground_truth”) and metrics in the absence of ground truth (“gta”).
- Return type:
tuple
Note
The function modifies the args object by appending the node_embedding_mode to args_title for identification.
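The docstring does not list which quality metrics are computed. As a hypothetical example of a ground-truth metric, one can correlate pairwise Euclidean distances in the original and reconstructed layouts. A Pearson correlation of 1.0 means distances are preserved up to a global scale factor. The score ignores rotation, reflection and translation, which a reconstruction cannot be expected to recover. The helper names are assumptions, and this illustrates only the ground-truth kind of metric, not the "gta" metrics.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def distance_preservation_score(original, reconstructed):
    """Pearson correlation between original and reconstructed pairwise distances.

    Points must be listed in the same node order in both layouts.
    """
    a = pairwise_distances(original)
    b = pairwise_distances(reconstructed)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


original = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 5)]
# A copy rotated by 90 degrees and doubled in size preserves distances up to scale.
reconstructed = [(-2 * y, 2 * x) for x, y in original]
print(distance_preservation_score(original, reconstructed))
```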