Metadata-Version: 2.4
Name: python-gfm
Version: 0.1.10
Summary: An AI copilot for graph data and models (Under active development).
Author-email: BUAA SKLCCSE <your.email@example.com>
License: Apache-2.0
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20
Requires-Dist: pyyaml>=6
Provides-Extra: torch
Requires-Dist: torch==2.8.0; extra == "torch"
Requires-Dist: torch-geometric>=2.3.0; extra == "torch"
Requires-Dist: torch-scatter==2.1.2; extra == "torch"
Requires-Dist: torch-sparse==0.6.18; extra == "torch"
Requires-Dist: torch-cluster==1.6.3; extra == "torch"
Requires-Dist: torch-spline-conv==1.2.2; extra == "torch"
Requires-Dist: ogb==1.3.6; extra == "torch"
Requires-Dist: geoopt==0.5.1; extra == "torch"
Requires-Dist: deeprobust>=0.2.4; extra == "torch"
Requires-Dist: scipy>=1.9.0; extra == "torch"
Requires-Dist: transformers>=4.36.0; extra == "torch"
Requires-Dist: joblib>=1.3.0; extra == "torch"
Requires-Dist: scikit-learn>=1.3.0; extra == "torch"
Requires-Dist: ftfy>=6.1.0; extra == "torch"
Requires-Dist: regex>=2023.0.0; extra == "torch"
Requires-Dist: accelerate>=0.26.0; extra == "torch"
Requires-Dist: fschat>=0.2.36; extra == "torch"
Provides-Extra: full
Requires-Dist: numpy==2.3.2; extra == "full"
Requires-Dist: scipy==1.17.1; extra == "full"
Requires-Dist: pandas==3.0.2; extra == "full"
Requires-Dist: scikit-learn==1.8.0; extra == "full"
Requires-Dist: joblib==1.5.3; extra == "full"
Requires-Dist: networkx==3.5; extra == "full"
Requires-Dist: numba==0.65.0; extra == "full"
Requires-Dist: matplotlib==3.10.5; extra == "full"
Requires-Dist: PyYAML==6.0.2; extra == "full"
Requires-Dist: hydra-core==1.3.2; extra == "full"
Requires-Dist: omegaconf==2.3.0; extra == "full"
Requires-Dist: easydict==1.13; extra == "full"
Requires-Dist: rich==15.0.0; extra == "full"
Requires-Dist: tqdm==4.67.3; extra == "full"
Requires-Dist: requests==2.33.1; extra == "full"
Requires-Dist: httpx==0.28.1; extra == "full"
Requires-Dist: pydantic==2.13.0; extra == "full"
Requires-Dist: psutil==7.0.0; extra == "full"
Requires-Dist: regex==2026.4.4; extra == "full"
Requires-Dist: ftfy==6.3.1; extra == "full"
Requires-Dist: huggingface-hub==1.10.2; extra == "full"
Requires-Dist: openai==2.31.0; extra == "full"
Requires-Dist: anthropic>=0.40.0; extra == "full"
Requires-Dist: fastapi>=0.115.0; extra == "full"
Requires-Dist: uvicorn[standard]>=0.32.0; extra == "full"
Requires-Dist: shortuuid>=1.0.0; extra == "full"
Requires-Dist: nano-vectordb==0.0.4.3; extra == "full"
Requires-Dist: gdown==5.2.0; extra == "full"
Requires-Dist: einops==0.8.2; extra == "full"
Provides-Extra: llm
Requires-Dist: transformers==5.5.4; extra == "llm"
Requires-Dist: tokenizers==0.22.2; extra == "llm"
Requires-Dist: safetensors==0.7.0; extra == "llm"
Requires-Dist: accelerate==1.13.0; extra == "llm"
Requires-Dist: bitsandbytes==0.49.2; extra == "llm"
Requires-Dist: peft==0.19.0; extra == "llm"
Requires-Dist: datasets==4.8.4; extra == "llm"
Requires-Dist: sentence-transformers==5.4.1; extra == "llm"
Requires-Dist: fschat>=0.2.36; extra == "llm"
Provides-Extra: serve
Requires-Dist: gradio>=5.0.0; extra == "serve"
Provides-Extra: distributed
Requires-Dist: ray[default]>=2.40.0; extra == "distributed"
Provides-Extra: trackers
Requires-Dist: wandb>=0.19.0; extra == "trackers"
Requires-Dist: swanlab<0.8,>=0.7.11; extra == "trackers"
Provides-Extra: gcp
Requires-Dist: google-cloud-aiplatform>=1.38.0; extra == "gcp"
Provides-Extra: deepspeed
Requires-Dist: deepspeed==0.18.9; extra == "deepspeed"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"

<div align="center">
<img src="https://raw.githubusercontent.com/RingBDStack/pygfm/main/assets/LOGO.png" style="width:30%; display:block; margin:0 auto;" alt="LOGO">

[![PyPI version](https://img.shields.io/pypi/v/python-gfm?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/python-gfm/)
[![Python](https://img.shields.io/pypi/pyversions/python-gfm?logo=python&logoColor=white)](https://pypi.org/project/python-gfm/)
[![License](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)
[![PyPI Downloads](https://img.shields.io/pypi/dm/python-gfm?color=orange)](https://pypi.org/project/python-gfm/)

[Installation](#installation) · [Quick Start](#quick-start) · [Supported Baselines](#supported-baselines) · [Documentation](#baseline-documentation)

</div>

---

`pygfm` is a unified Python toolkit for **Graph Foundation Model (GFM)** research. It integrates **19 state-of-the-art baseline methods** under a single, pip-installable package with shared utilities, standardized interfaces, and fully reproducible experiment pipelines.

Developed by **Beihang University · School of Computer Science and Engineering · ACT Lab · MAGIC GROUP**.

## Framework Overview

<div align="center">
  <img src="https://raw.githubusercontent.com/RingBDStack/pygfm/main/assets/framework.png" alt="PyGFM Framework Overview" width="90%">
</div>

PyGFM is organized into four stacked layers — **Graph Data Abstraction → Alignment & Fusion Bridge → Representation Backbones → Task Heads & Orchestration** — with a unified CLI, model recipes, and an auto-experiment tracker sitting on top.

## Highlights

- **One package, 19 baselines** — prompt-based GFMs, structure-aware models, LLM-integrated approaches, and retrieval-augmented methods all available via a single `pip install`.
- **Reproducible pipelines** — every baseline ships with YAML-driven experiment configs, training scripts, and evaluation helpers.
- **Shared backbone library** — common GNN encoders, loss functions, and data utilities are factored out and reused across all baselines, reducing code duplication.
- **CLI-first design** — launch pre-training, fine-tuning, and evaluation jobs directly from the command line without writing any boilerplate.
- **LLM-ready** — first-class support for LLM-integrated GFMs (GraphGPT, GraphText, LLaGA, OneForAll) with HuggingFace-compatible YAML configs.

## Installation

### Minimal install (utilities only)

```bash
pip install python-gfm
```

### With PyTorch + PyG (recommended for running experiments)

```bash
# 1. Install PyTorch with CUDA 12.8 support
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 2. Install pygfm with the torch extra (the -f page supplies prebuilt PyG extension wheels)
pip install "python-gfm[torch]" -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
```

> **CPU-only machines:** replace the CUDA index URLs with `https://download.pytorch.org/whl/cpu` and `https://data.pyg.org/whl/torch-2.8.0+cpu.html` respectively.
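
After installing, a quick check can confirm which PyTorch build was actually picked up. The snippet below is a minimal sketch that uses only standard PyTorch attributes; `describe_torch` is a hypothetical helper name and is not part of `pygfm`.

```python
import importlib.util


def describe_torch() -> dict:
    """Report the installed PyTorch build, or note that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch  # imported lazily so the check also runs without torch

    return {
        "installed": True,
        "version": torch.__version__,        # wheels from the PyTorch index carry a suffix such as +cu128
        "cuda_build": torch.version.cuda,    # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

If `cuda_build` is `None` on a GPU machine, the CPU wheel was most likely resolved and the index URLs are worth rechecking.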

### Full stack (split extras)

**`[full]`** pins a **lightweight** science and tooling set (numpy/scipy/sklearn, Hydra, HTTP clients, etc.) and **does not** include PyTorch or PyG. Use **`[torch]`** for the GPU graph stack, **`[llm]`** for Hugging Face–style models and quantization helpers, and add other extras only when you need them.

| Extra | Includes |
|-------|----------|
| `full` | Pinned lightweight deps (no torch, no DeepSpeed, no Ray/Gradio/loggers) |
| `torch` | PyTorch, PyG, scatter/sparse/cluster/spline, ogb/geoopt/deeprobust, and the loose pins listed under `[torch]` in `pyproject.toml` |
| `llm` | transformers, tokenizers, safetensors, accelerate, bitsandbytes, peft, datasets, sentence-transformers, fschat |
| `serve` | Gradio |
| `distributed` | Ray |
| `trackers` | Weights & Biases, SwanLab |
| `gcp` | google-cloud-aiplatform |
| `deepspeed` | DeepSpeed only (add when you run GraphGPT/FastChat LoRA paths that import DeepSpeed) |

**Recommended:** use **PyPI** as the default index; add PyTorch with **`--extra-index-url`**, not as **`--index-url`** (making the PyTorch index primary often makes resolution slow or flaky). Add **`-f`** for PyG wheels when you install `[torch]`, same as in the [PyTorch + PyG](#with-pytorch--pyg-recommended-for-running-experiments) section above.

```bash
python -m pip install -U pip setuptools wheel
export PIP_DEFAULT_TIMEOUT=120   # optional: reduce ReadTimeout errors

# Structure / GNN experiments
pip install "python-gfm[torch,full]" \
  --extra-index-url https://download.pytorch.org/whl/cu128 \
  -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
```

```bash
# Plus the LLM stack (GraphGPT, LLaGA, etc.)
pip install "python-gfm[torch,full,llm]" \
  --extra-index-url https://download.pytorch.org/whl/cu128 \
  -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
```

Append extras as needed: `serve`, `distributed`, `trackers`, `gcp`, `deepspeed` (for example `python-gfm[torch,full,llm,deepspeed]`).

**Mirrors:** you may point the **default PyPI** index to a mirror (for example `-i https://pypi.tuna.tsinghua.edu.cn/simple`); keep **`--extra-index-url`** and **`-f`** for PyTorch and PyG so CUDA wheels still resolve.

On Windows, replace the line-ending backslashes with backticks (`` ` ``) in PowerShell or carets (`^`) in `cmd.exe`, or put the command on one line.

### Development install (full checkout with experiment scripts)

```bash
git clone <repo-url> && cd pygfm
pip install -e ".[torch,dev]"
# For a local "everything" development install, use instead:
#   pip install -e ".[torch,full,llm,dev]"
# (pass the same --extra-index-url / -f options as above whenever [torch] is included)
```

The `dev` extra adds `pytest` and `ruff` for testing and linting.

## Quick Start

```python
import pygfm

print(pygfm.__version__)
```

Run pre-training and downstream fine-tuning jobs from the CLI:

```bash
# SA2GFM contrastive pre-training
gfm-sa2gfm-pretrain -c scripts/sa2gfm/configs/pretrain.yaml

# SA2GFM downstream fine-tuning
gfm-sa2gfm-downstream -c scripts/sa2gfm/configs/downstream.yaml
```

## Package Structure

```
pygfm/
├── src/pygfm/
│   ├── baseline_models/   # 19 GFM baseline implementations
│   ├── public/            # Shared utilities, losses, and backbone encoders
│   │   ├── backbone_models/
│   │   ├── utils/
│   │   └── cli/
│   ├── private/           # Core encoders and internal data generation
│   └── cli/               # Console entry points
└── scripts/               # Per-baseline experiment scripts and configs
    ├── <baseline>/
    │   ├── README.md
    │   ├── configs/
    │   ├── pretrain.py / downstream.py / ...
    │   └── eval_script/
```

## Supported Baselines

| Category | Methods |
|---|---|
| **Prompt-based GFM** | MDGPT, SAMGPT, MDGFM, GraphPrompt, HGPrompt, MultiGPrompt, GCoT |
| **Structure-aware GFM** | SA2GFM, Bridge, GraphKeeper, GraphMore, Graver, BIM-GFM |
| **LLM-integrated GFM** | GraphGPT, GraphText, LLaGA, OneForAll |
| **Retrieval-augmented GFM** | RAG-GFM |
| **Classic Baseline** | Classic GNN |

## Running Experiments

All scripts are under `scripts/<baseline>/` and should be run from the repository root.

```bash
# Prompt-based: MDGPT pre-training
python scripts/mdgpt/pretrain.py

# Structure-aware: SA2GFM downstream fine-tuning
python scripts/sa2gfm/downstream.py

# Prompt-based: GCoT full pipeline
python scripts/gcot/pretrain.py
python scripts/gcot/finetune.py
python scripts/gcot/finetune_graph.py

# LLM-integrated: GraphGPT (YAML-driven HuggingFace-style training)
python scripts/graphgpt/run_with_config.py -c scripts/graphgpt/configs/train_mem_template.yaml
```
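
Multi-stage pipelines such as the GCoT one can be chained from Python so that a failing stage stops the run. This is only a sketch under the layout described above (`scripts/<baseline>/<stage>.py`, run from the repository root); `stage_commands` and `run_pipeline` are hypothetical helpers, not part of `pygfm`.

```python
import subprocess
import sys
from pathlib import Path


def stage_commands(baseline: str, stages: list[str]) -> list[list[str]]:
    """Return one argv list per stage script under scripts/<baseline>/."""
    return [
        [sys.executable, (Path("scripts") / baseline / f"{stage}.py").as_posix()]
        for stage in stages
    ]


def run_pipeline(repo_root: Path, baseline: str, stages: list[str]) -> None:
    """Run the stage scripts in order, using the repository root as cwd.

    `check=True` raises CalledProcessError on the first failing stage,
    so later stages do not run after a failure.
    """
    for argv in stage_commands(baseline, stages):
        subprocess.run(argv, cwd=repo_root, check=True)


# The three GCoT stages listed above, in order.
for argv in stage_commands("gcot", ["pretrain", "finetune", "finetune_graph"]):
    print(argv[1])
```

A call such as `run_pipeline(Path("."), "gcot", ["pretrain", "finetune", "finetune_graph"])` from the repository root would then execute the stages sequentially.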

## Console Commands

After installation, the following CLI entry points are registered:

| Command | Description |
|---|---|
| `pygfm` / `gfm` | Generic YAML-driven runner (`-c <config.yaml>`) |
| `gfm-sa2gfm-pretrain` | SA2GFM contrastive pre-training |
| `gfm-sa2gfm-downstream` | SA2GFM MoE downstream fine-tuning |

## Configuration

All experiment hyperparameters are stored as YAML files under `scripts/<baseline>/configs/`. Pass configs via the `-c` flag:

```bash
python scripts/<baseline>/pretrain.py -c scripts/<baseline>/configs/default.yaml
```

**API keys:** baselines that call external LLM APIs (e.g., GraphText) read credentials from a local env file. **Never commit API keys to the repository.** Copy the example template and fill in your keys:

```bash
cp scripts/graphtext/config/user/env.yaml.example scripts/graphtext/config/user/env.yaml
# Then edit env.yaml and add your API key
```
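
The copy step can be made idempotent so that rerunning setup never overwrites keys that were already filled in. The helper below is a hypothetical sketch; only the file names `env.yaml` and `env.yaml.example` come from the command above.

```python
import shutil
from pathlib import Path


def ensure_env_file(user_dir: Path) -> Path:
    """Create env.yaml from env.yaml.example if it does not exist yet.

    An existing env.yaml is left untouched, so filled-in keys are kept.
    """
    env_file = user_dir / "env.yaml"
    example = user_dir / "env.yaml.example"
    if not env_file.exists():
        if not example.exists():
            raise FileNotFoundError(f"missing template: {example}")
        shutil.copyfile(example, env_file)
    return env_file


# Usage, from the repository root:
#   ensure_env_file(Path("scripts/graphtext/config/user"))
```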

## Baseline Documentation

The baselines listed below each ship a dedicated README with setup instructions, data preparation steps, and evaluation notes:

| Baseline | Docs |
|---|---|
| MDGPT | [scripts/mdgpt/README.md](scripts/mdgpt/README.md) |
| SA2GFM | [scripts/sa2gfm/README.md](scripts/sa2gfm/README.md) |
| SAMGPT | [scripts/samgpt/README.md](scripts/samgpt/README.md) |
| MDGFM | [scripts/mdgfm/README.md](scripts/mdgfm/README.md) |
| GraphPrompt | [scripts/graphprompt/README.md](scripts/graphprompt/README.md) |
| HGPrompt | [scripts/hgprompt/README.md](scripts/hgprompt/README.md) |
| MultiGPrompt | [scripts/multigprompt/README.md](scripts/multigprompt/README.md) |
| GCoT | [scripts/gcot/README.md](scripts/gcot/README.md) |
| Graver | [scripts/graver/README.md](scripts/graver/README.md) |
| GraphMore | [scripts/graphmore/README.md](scripts/graphmore/README.md) |
| Bridge | [scripts/bridge/README.md](scripts/bridge/README.md) |
| GraphKeeper | [scripts/graphkeeper/README.md](scripts/graphkeeper/README.md) |
| GraphGPT | [scripts/graphgpt/README.md](scripts/graphgpt/README.md) |
| GraphText | [scripts/graphtext/README.md](scripts/graphtext/README.md) |
| LLaGA | [scripts/llaga/README.md](scripts/llaga/README.md) |
| OneForAll | [scripts/oneforall/README.md](scripts/oneforall/README.md) |
| RAG-GFM | [scripts/rag_gfm/README.md](scripts/rag_gfm/README.md) |

## Requirements

| Dependency | Version |
|---|---|
| Python | ≥ 3.12 |
| PyTorch | 2.8.0 (CUDA 12.8 recommended) |
| PyTorch Geometric | ≥ 2.3.0 |
| Transformers | ≥ 4.36.0 |
| Accelerate | ≥ 0.26.0 |

See [`pyproject.toml`](pyproject.toml) for the full dependency specification.

## License

This project is licensed under the **[Apache License 2.0](LICENSE)**.

## Team

**MAGIC GROUP** — Beihang University, School of Computer Science and Engineering, ACT Lab.

---

<div align="center">
<sub>If you find this toolkit useful in your research, please consider starring the repository ⭐</sub>
</div>
