Metadata-Version: 2.2
Name: tpsimilarity
Version: 0.6.1
Summary: Package to compute TP similarities between nodes in a network.
Author: Sadamori Kojaku, Attila Varga
Author-email: "Filipi N. Silva" <filsilva@iu.edu>
License: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: python-igraph
Requires-Dist: cxrandomwalk
Requires-Dist: networkx
Requires-Dist: fastnode2vec
Requires-Dist: toml

# TP Similarity

**TP Similarity** is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers various methods to estimate these similarities, including exact computation, estimation via random walks, shortest paths, and node2vec-based cosine similarity.

[![Build Status](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml/badge.svg)](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml)
[![Coverage Status](https://coveralls.io/repos/github/filipinascimento/tpsimilarity/badge.svg?branch=main)](https://coveralls.io/github/filipinascimento/tpsimilarity?branch=main)

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Features](#features)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Importing the Package](#importing-the-package)
  - [Example Usage](#example-usage)
    - [Compute Exact Transition Probabilities (TP)](#1-compute-exact-transition-probabilities-tp)
    - [Compute Estimated Transition Probabilities](#2-compute-estimated-transition-probabilities)
    - [Compute Node2Vec Similarity](#3-compute-node2vec-similarity)
    - [Compute Shortest Paths Transition Probabilities](#4-compute-shortest-paths-transition-probabilities)
  - [Parameters](#parameters)
- [Examples](#examples)
- [Authors](#authors)
- [License](#license)

## Overview

TP similarity measures how similar papers or authors are by simulating a literature search on a citation network. Inspired by information retrieval concepts, the approach does not rely on curated classification systems, avoids the complications of clustering, and yields a continuous similarity score between nodes. With it, researchers can approximate how similar the research interests of individual scientists are, using only publication-level information.

The package accompanies the paper:

**Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities."** *arXiv preprint arXiv:2409.18240* (2024). [Available on arXiv](https://arxiv.org/abs/2409.18240)

## Installation

Install the package using pip:

```bash
pip install tpsimilarity
```

## Features

- **Exact Transition Probabilities (TP):** Computes the exact transition probabilities between nodes in a graph.
- **Estimated Transition Probabilities:** Estimates transition probabilities using random walks.
- **Shortest Paths Transition Probabilities:** Computes transition probabilities along the shortest paths.
- **Node2Vec Similarity:** Computes cosine similarity between node embeddings generated by node2vec.

## Getting Started

### Prerequisites

- **Python**: Version 3.6 or higher
- **Dependencies**:
  - `numpy`
  - `scipy`
  - `gensim`
  - `tqdm`
  - `joblib`
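  - `igraph`
  - (optional) `networkx`

Running `pip install tpsimilarity` installs the package's declared dependencies automatically. To install the libraries listed above manually, use: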

```bash
pip install numpy scipy networkx gensim tqdm joblib igraph
```

### Importing the Package

```python
from tpsimilarity import similarity
```

### Example Usage

#### 1. Compute Exact Transition Probabilities (TP)

```python
import networkx as nx
import igraph as ig
from tpsimilarity import similarity

# Create or load your graph (here, a NetworkX example graph)
nx_graph = nx.karate_club_graph()

# Convert the NetworkX graph to an igraph Graph
G = ig.Graph.from_networkx(nx_graph)

# Define sources and targets
sources = [0, 1, 2]  # Source nodes
targets = [3, 4, 5]  # Target nodes

# Compute exact TP similarities
tp_sim = similarity.TP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)

# tp_sim holds the similarities in the format selected by return_type ("matrix" by default)
```
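
For intuition about what an exact computation involves, the following is a minimal sketch of t-step transition probabilities for a simple random walk on a tiny graph. It is not the package's implementation: the helper names (`transition_matrix`, `matmul`, `step_probabilities`) are hypothetical, and `similarity.TP` may define or aggregate the probabilities differently (for example, with degree normalization).

```python
# Illustrative only: t-step transition probabilities of a simple random walk.
# All helper names are hypothetical and not part of tpsimilarity.

def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / degree(i)."""
    matrix = []
    for row in adjacency:
        degree = sum(row)
        matrix.append([value / degree if degree else 0.0 for value in row])
    return matrix

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

def step_probabilities(adjacency, steps):
    """Return P**steps: entry [i][j] is the probability of being at j after `steps` moves from i."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    p = transition_matrix(adjacency)
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result

# Path graph 0 - 1 - 2
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
two_step = step_probabilities(path_graph, steps=2)
# From node 0, two steps end at node 0 or node 2 with probability 0.5 each.
```

Each row of the result sums to 1, since it is a probability distribution over end nodes for one start node.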

#### 2. Compute Estimated Transition Probabilities

```python
# Estimate TP similarities using random walks
# (reuses G, sources, and targets from example 1)
estimated_tp = similarity.estimatedTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5,
    walks_per_source=1000,
    batch_size=100,
    return_type="matrix",
    degreeNormalization=True,
    progressBar=True
)
```
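
The idea behind estimating transition probabilities with random walks can be sketched in plain Python: run many walks from a source and count where they end. This is a sketch under assumptions, not how `estimatedTP` is implemented; the function `estimate_step_probabilities` and its arguments are hypothetical, and the package may count visits within a window rather than only end points.

```python
import random

# Illustrative only: Monte Carlo estimate of where a simple random walk ends.
# The function name and signature are hypothetical, not part of tpsimilarity.

def estimate_step_probabilities(neighbors, source, steps, walks, seed=0):
    """Estimate the probability of ending at each node after `steps` moves from `source`."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    counts = {node: 0 for node in neighbors}
    for _ in range(walks):
        node = source
        for _ in range(steps):
            if not neighbors[node]:
                break  # dead end: the walk stays where it is
            node = rng.choice(neighbors[node])
        counts[node] += 1
    return {node: count / walks for node, count in counts.items()}

# Path graph 0 - 1 - 2 as an adjacency list
neighbors = {0: [1], 1: [0, 2], 2: [1]}
estimate = estimate_step_probabilities(neighbors, source=0, steps=2, walks=2000)
```

More walks per source reduce the sampling noise of the estimate at the cost of run time, which is the trade-off a parameter such as `walks_per_source` controls.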

#### 3. Compute Node2Vec Similarity

```python
# Compute node2vec-based cosine similarities
node2vec_sim = similarity.node2vec(
    graph=G,
    sources=sources,
    targets=targets,
    dimensions=64,
    window_length=40,
    context_size=5,
    workers=4,
    batch_walks=100,
    return_type="matrix",
    progressBar=True
)
```
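
The similarity step of this method is a cosine similarity between embedding vectors. The sketch below shows that step alone, using made-up three-dimensional vectors in place of real node2vec embeddings; the names `cosine_similarity`, `embeddings`, and `similarity_matrix` are hypothetical and not part of the package.

```python
import math

# Illustrative only: cosine similarity between source and target embeddings.
# The vectors below are invented stand-ins for node2vec output.

def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [2.0, 0.0, 0.0],  # same direction as "a"
    "c": [0.0, 3.0, 0.0],  # orthogonal to "a"
}
source_nodes = ["a"]
target_nodes = ["b", "c"]

# One row per source, one column per target
similarity_matrix = [
    [cosine_similarity(embeddings[s], embeddings[t]) for t in target_nodes]
    for s in source_nodes
]
```

Because cosine similarity ignores vector length, "a" and "b" are maximally similar even though their magnitudes differ.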

#### 4. Compute Shortest Paths Transition Probabilities

```python
# Compute TP similarities along shortest paths
sp_tp = similarity.shortestPathsTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)
```
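
This README does not spell out how the shortest-path variant is defined. One plausible reading, assumed here purely for illustration, is the probability that a simple random walk follows a given shortest path, which is the product of `1 / degree` over the nodes it leaves. The helpers `shortest_path` and `path_probability` are hypothetical, and `shortestPathsTP` may compute something different.

```python
from collections import deque

# Illustrative only, under the assumption stated above.
# Helper names are hypothetical and not part of tpsimilarity.

def shortest_path(neighbors, source, target):
    """Return one shortest path from source to target via breadth-first search, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def path_probability(neighbors, path):
    """Probability that a simple random walk starting at path[0] follows `path` exactly."""
    probability = 1.0
    for node in path[:-1]:
        probability /= len(neighbors[node])
    return probability

# Path graph 0 - 1 - 2 as an adjacency list
neighbors = {0: [1], 1: [0, 2], 2: [1]}
path = shortest_path(neighbors, source=0, target=2)
probability = path_probability(neighbors, path)
```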

### Parameters

- **graph** (`networkx.Graph` or `igraph.Graph`): The graph on which to compute the similarities.
- **sources** (`list`): List of source node indices.
- **targets** (`list`): List of target node indices.
- **window_length** (`int`): The length of the random walks.
- **return_type** (`str`, optional): The format of the output (`"list"`, `"matrix"`, or `"dict"`). Default is `"matrix"`.
- **degreeNormalization** (`bool`, optional): Whether to normalize by the degree of the target node. Default is `True`.
- **dimensions** (`int`, optional): Number of dimensions for node embeddings in node2vec. Default is `64`.
- **context_size** (`int`, optional): Context size for the node2vec algorithm. Default is `10`.
- **workers** (`int`, optional): Number of parallel workers for node2vec. Default is `4`.
- **batch_walks** (`int`, optional): Number of walks per batch for node2vec. Default is `10000`.
- **progressBar** (`bool` or `tqdm`, optional): Whether to display a progress bar during computation. Default is `True`.

## Examples

You can find more examples and tutorials in the [examples directory](examples/) or in the [Jupyter notebooks](notebooks/) provided.

## Authors

- **Attila Varga**
- **Sadamori Kojaku**
- **Filipi N. Silva**

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
