Metadata-Version: 2.4
Name: NEExT
Version: 0.2.12
Summary: Network Embedding Experimentation Toolkit - A powerful framework for graph analysis, embedding computation, and machine learning on graph-structured data
Project-URL: Homepage, https://github.com/ashdehghan/NEExT
Project-URL: Documentation, https://neext.readthedocs.io
Project-URL: Repository, https://github.com/ashdehghan/NEExT
Project-URL: Issues, https://github.com/ashdehghan/NEExT/issues
Author-email: Ash Dehghan <ash.dehghan@gmail.com>
Maintainer-email: Ash Dehghan <ash.dehghan@gmail.com>
License: MIT
License-File: LICENSE
Keywords: embedding,graph,graph-ml,machine-learning,network,network-analysis
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: <3.13,>=3.9
Requires-Dist: cairocffi>=1.3.0
Requires-Dist: imbalanced-learn<0.13,>=0.11.0
Requires-Dist: joblib>=1.3.0
Requires-Dist: llvmlite>=0.41.0
Requires-Dist: networkx<4,>=3.0
Requires-Dist: numba>=0.58.0
Requires-Dist: numpy<2.0,>=1.24.0
Requires-Dist: pandas<3,>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-igraph>=0.10.0
Requires-Dist: requests>=2.25.0
Requires-Dist: scikit-learn<1.5,>=1.3.0
Requires-Dist: scipy>=1.13.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: vectorizers<0.3,>=0.2
Requires-Dist: xgboost>=1.7.0
Provides-Extra: advanced
Requires-Dist: matplotlib>=3.5.0; extra == 'advanced'
Requires-Dist: optuna>=3.0.0; extra == 'advanced'
Requires-Dist: plotly>=5.0.0; extra == 'advanced'
Requires-Dist: seaborn>=0.12.0; extra == 'advanced'
Requires-Dist: umap-learn!=0.5.9.post2,>=0.5.7; extra == 'advanced'
Provides-Extra: all
Requires-Dist: black>=23.0.0; extra == 'all'
Requires-Dist: ipykernel>=6.0.0; extra == 'all'
Requires-Dist: isort>=5.12.0; extra == 'all'
Requires-Dist: jupyter>=1.0.0; extra == 'all'
Requires-Dist: matplotlib>=3.5.0; extra == 'all'
Requires-Dist: mypy>=1.0.0; extra == 'all'
Requires-Dist: myst-parser>=2.0.0; extra == 'all'
Requires-Dist: nbsphinx>=0.9.0; extra == 'all'
Requires-Dist: notebook>=7.0.0; extra == 'all'
Requires-Dist: optuna>=3.0.0; extra == 'all'
Requires-Dist: plotly>=5.0.0; extra == 'all'
Requires-Dist: pre-commit>=3.0.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest-mock>=3.10.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: seaborn>=0.12.0; extra == 'all'
Requires-Dist: sphinx-autodoc-typehints>=1.25.0; extra == 'all'
Requires-Dist: sphinx-copybutton>=0.5.0; extra == 'all'
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == 'all'
Requires-Dist: sphinx>=7.0.0; extra == 'all'
Requires-Dist: umap-learn!=0.5.9.post2,>=0.5.7; extra == 'all'
Provides-Extra: all-dl
Requires-Dist: black>=23.0.0; extra == 'all-dl'
Requires-Dist: dgl<3.0,>=1.1.0; extra == 'all-dl'
Requires-Dist: ipykernel>=6.0.0; extra == 'all-dl'
Requires-Dist: isort>=5.12.0; extra == 'all-dl'
Requires-Dist: jupyter>=1.0.0; extra == 'all-dl'
Requires-Dist: matplotlib>=3.5.0; extra == 'all-dl'
Requires-Dist: mypy>=1.0.0; extra == 'all-dl'
Requires-Dist: myst-parser>=2.0.0; extra == 'all-dl'
Requires-Dist: nbsphinx>=0.9.0; extra == 'all-dl'
Requires-Dist: notebook>=7.0.0; extra == 'all-dl'
Requires-Dist: optuna>=3.0.0; extra == 'all-dl'
Requires-Dist: plotly>=5.0.0; extra == 'all-dl'
Requires-Dist: pre-commit>=3.0.0; extra == 'all-dl'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all-dl'
Requires-Dist: pytest-mock>=3.10.0; extra == 'all-dl'
Requires-Dist: pytest>=7.0.0; extra == 'all-dl'
Requires-Dist: ruff>=0.1.0; extra == 'all-dl'
Requires-Dist: seaborn>=0.12.0; extra == 'all-dl'
Requires-Dist: sphinx-autodoc-typehints>=1.25.0; extra == 'all-dl'
Requires-Dist: sphinx-copybutton>=0.5.0; extra == 'all-dl'
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == 'all-dl'
Requires-Dist: sphinx>=7.0.0; extra == 'all-dl'
Requires-Dist: torch<3.0,>=2.0.0; extra == 'all-dl'
Requires-Dist: umap-learn!=0.5.9.post2,>=0.5.7; extra == 'all-dl'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: isort>=5.12.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: dgl
Requires-Dist: dgl<3.0,>=1.1.0; extra == 'dgl'
Requires-Dist: torch<3.0,>=2.0.0; extra == 'dgl'
Provides-Extra: docs
Requires-Dist: myst-parser>=2.0.0; extra == 'docs'
Requires-Dist: nbsphinx>=0.9.0; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=1.25.0; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5.0; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == 'docs'
Requires-Dist: sphinx>=7.0.0; extra == 'docs'
Provides-Extra: experiments
Requires-Dist: ipykernel>=6.0.0; extra == 'experiments'
Requires-Dist: jupyter>=1.0.0; extra == 'experiments'
Requires-Dist: notebook>=7.0.0; extra == 'experiments'
Description-Content-Type: text/markdown

# NEExT: Network Embedding Experimentation Toolkit

NEExT is a powerful Python framework for graph analysis, embedding computation, and machine learning on graph-structured data. It provides a unified interface for working with different graph backends (NetworkX and iGraph), computing node features, generating graph embeddings, and training machine learning models.

## 📚 Documentation

Detailed documentation is available in the `docs` directory. Build it locally or visit the online documentation at [NEExT Documentation](https://neext.readthedocs.io/en/latest/).

## 🌟 Features

- **Flexible Graph Handling**
  - Support for both NetworkX and iGraph backends
  - Automatic graph reindexing and largest component filtering
  - Node sampling capabilities for large graphs
  - Rich attribute support for nodes and edges

- **Comprehensive Node Features**
  - PageRank
  - Degree Centrality
  - Closeness Centrality
  - Betweenness Centrality
  - Eigenvector Centrality
  - Clustering Coefficient
  - Local Efficiency
  - LSME (Local Structural Motif Embeddings)

- **Graph Embeddings**
  - Approximate Wasserstein
  - Exact Wasserstein
  - Sinkhorn Vectorizer
  - Customizable embedding dimensions

- **Machine Learning Integration**
  - Classification and regression support
  - Dataset balancing options
  - Cross-validation with customizable splits
  - Feature importance analysis

### Custom Node Feature Functions

NEExT allows you to define and compute your own custom node feature functions alongside the built-in ones. This provides great flexibility for experimenting with novel graph metrics.

**Defining a Custom Feature Function:**

Your custom feature function must adhere to the following structure:

1.  **Input**: It must accept a single argument, which will be a `graph` object. This object provides access to the graph's structure (nodes, edges) and properties (e.g., `graph.nodes`, `graph.graph_id`, `graph.G` which is the underlying NetworkX or iGraph object).
2.  **Output**: It must return a `pandas.DataFrame` with the following specific columns in order:
    *   `"node_id"`: Identifiers for the nodes for which features are computed.
    *   `"graph_id"`: The identifier of the graph to which these nodes belong.
    *   One or more feature columns: These columns should contain the computed feature values. The naming convention for these columns should ideally follow the pattern `your_feature_name_0`, `your_feature_name_1`, etc., if your feature has multiple components or is expanded over hops (though a single feature column like `your_feature_name` is also acceptable).

**Example:**

Here's how you can define a simple custom feature function and use it:

```python
import networkx as nx
import pandas as pd

# 1. Define your custom feature function
# Works from scripts, modules, and Jupyter notebook cells — the function is
# shipped to workers via cloudpickle, so top-level notebook definitions are fine.
# Avoid closing over unpicklable objects (open file handles, live DB connections, etc.).
def my_node_degree_squared(graph):
    nodes = list(graph.nodes) # or range(graph.G.vcount()) for igraph if nodes are 0-indexed
    graph_id = graph.graph_id
    
    if hasattr(graph.G, 'degree'): # Handles both NetworkX and iGraph
        if isinstance(graph.G, nx.Graph): # NetworkX
            degrees = [graph.G.degree(n) for n in nodes]
        else: # iGraph
            degrees = graph.G.degree(nodes)
    else:
        raise TypeError("Graph object does not have a degree method.")
        
    degree_squared_values = [d**2 for d in degrees]
    
    df = pd.DataFrame({
        'node_id': nodes,
        'graph_id': graph_id,
        'degree_sq_0': degree_squared_values
    })
    # Ensure the correct column order
    return df[['node_id', 'graph_id', 'degree_sq_0']]

# 2. Prepare the list of custom feature methods
my_feature_methods = [
    {"feature_name": "my_degree_squared", "feature_function": my_node_degree_squared}
]

# 3. Pass it to compute_node_features
# Initialize NEExT and load your graph_collection as shown in the Quick Start
# nxt = NEExT()
# graph_collection = nxt.read_from_csv(...)

features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["page_rank", "my_degree_squared"], # Include your custom feature name
    feature_vector_length=3, # Applies to built-in features that use it
    my_feature_methods=my_feature_methods
)

print(features.features_df.head())
```

When you include `"my_degree_squared"` in the `feature_list` and provide `my_feature_methods`, NEExT will automatically register and compute your custom function. If `"all"` is in `feature_list`, your custom registered function will also be included in the computation.

**Parallel execution controls:**

By default, `compute_node_features()` uses `n_jobs=-1` and `parallel_backend="loky"` for parallel execution. The `loky` process backend is the safer default for custom functions defined in Jupyter notebooks because joblib can serialize them with cloudpickle, but it may spend substantial time serializing large graph objects and feature functions.

For serialization-heavy workloads, try `parallel_backend="threading"`. Threads avoid sending graph objects to worker processes, which can be faster when pickling dominates runtime, but GIL-bound Python code may still scale poorly. Benchmark expensive custom features with `n_jobs=1`, `2`, `4`, and `-1` before choosing production settings.

Advanced joblib options can be passed through with `joblib_kwargs`, for example `joblib_kwargs={"idle_worker_timeout": 120}` for the loky backend or `joblib_kwargs={"timeout": 300}`. These are tuning controls for scheduling and worker behavior, not guaranteed fixes for memory pressure.

## 📦 Installation

### Basic Installation
```bash
pip install NEExT
```

### Development Installation
```bash
# Clone the repository
git clone https://github.com/ashdehghan/NEExT.git
cd NEExT

# Install with development dependencies
pip install -e ".[dev]"
```

### Additional Components
```bash
# For running tests
pip install -e ".[test]"

# For building documentation
pip install -e ".[docs]"

# For running experiments
pip install -e ".[experiments]"

# Install all components
pip install -e ".[dev,test,docs,experiments]"
```

## 🚀 Quick Start

### Basic Usage

```python
from NEExT import NEExT

# Initialize the framework
nxt = NEExT()
nxt.set_log_level("INFO")

# Load graph data
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_label_path="graph_labels.csv",
    reindex_nodes=True,
    filter_largest_component=True,
    graph_type="igraph"
)

# Compute node features
features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["all"],
    feature_vector_length=3
)

# Compute graph embeddings
embeddings = nxt.compute_graph_embeddings(
    graph_collection=graph_collection,
    features=features,
    embedding_algorithm="approx_wasserstein",
    embedding_dimension=3
)

# Train a classifier
model_results = nxt.train_ml_model(
    graph_collection=graph_collection,
    embeddings=embeddings,
    model_type="classifier",
    sample_size=50
)
```

### Working with Large Graphs

NEExT supports node sampling for handling large graphs:

```python
# Load graphs with 70% of nodes
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    node_sample_rate=0.7  # Use 70% of nodes
)
```

### Feature Importance Analysis

```python
# Compute feature importance
importance_df = nxt.compute_feature_importance(
    graph_collection=graph_collection,
    features=features,
    feature_importance_algorithm="supervised_fast",
    embedding_algorithm="approx_wasserstein"
)
```

## 📊 Experiments

NEExT includes several pre-built experiments in the `examples/experiments` directory:

### Node Sampling Experiment
Investigates the effect of node sampling on classifier accuracy:
```bash
cd examples/experiments
python node_sampling_experiments.py
```

## 📝 Input File Formats

### edges.csv
```csv
src_node_id,dest_node_id
0,1
1,2
...
```

### node_graph_mapping.csv
```csv
node_id,graph_id
0,1
1,1
2,2
...
```

### graph_labels.csv
```csv
graph_id,graph_label
1,0
2,1
...
```

## 🛠️ Development

### Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=NEExT

# Run specific test file
pytest tests/test_node_sampling.py
```

### Building Documentation
```bash
cd docs
make html
```

### Code Style
The project uses several tools for code quality:
```bash
# Format code
black .

# Sort imports
isort .

# Check style
flake8 .

# Type checking
mypy .
```

### Publishing to PyPI

NEExT uses direct local publication through the root `Makefile`. GitHub Releases
do not publish the package.

Before publishing:

1. Update `__version__` in `NEExT/__init__.py`. `pyproject.toml` and the docs
   derive the package version from that file.
2. Commit all release changes on `main`.
3. Create `.env` with `PYPI_API_TOKEN=pypi-...`.
4. Run the validation and publish flow:

```bash
make release-check
make deploy
```

`make deploy` verifies the working tree and token, builds with `uv build`, pushes
`main`, creates and pushes the `release_v_<version>` tag, and publishes with
`uv publish`.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 👥 Authors

- Ash Dehghan - [ash.dehghan@gmail.com](mailto:ash.dehghan@gmail.com)

## 🙏 Acknowledgments

- NetworkX team for the graph algorithms
- iGraph team for the efficient graph operations
- Scikit-learn team for machine learning components

## 📧 Contact

For questions and support:
- Email: ash@anomalypoint.com
- GitHub Issues: [NEExT Issues](https://github.com/ashdehghan/NEExT/issues)

## 🔄 Version History

- 0.1.0
  - Initial release
  - Basic graph operations
  - Node feature computation
  - Graph embeddings
  - Machine learning integration
