Metadata-Version: 2.4
Name: llm2graph
Version: 0.3.2
Summary: LLM2Graph: Dynamic Knowledge Graph Construction via LLM-only elicitation
Author: Raj Sanjay Shah
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.7
Requires-Dist: tqdm>=4.66
Requires-Dist: networkx>=3.2
Requires-Dist: tenacity>=8.2
Requires-Dist: typer>=0.12
Requires-Dist: openai>=1.37
Provides-Extra: gemini
Requires-Dist: google-generativeai>=0.7; extra == "gemini"
Provides-Extra: hf-local
Requires-Dist: transformers>=4.44; extra == "hf-local"
Requires-Dist: accelerate>=0.33; extra == "hf-local"
Requires-Dist: sentencepiece>=0.2; extra == "hf-local"
Requires-Dist: einops>=0.7; extra == "hf-local"


# LLM2Graph - Dynamic Knowledge Graph Construction & Evaluation

This package implements the graph-based methodology from the COLM 2025 paper:

> **The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning**

It provides an **LLM-only** pipeline for:
1. **Graph construction** via entity-centric elicitation and triple extraction.
2. **Query generation** with **multi-hop**, **alias-perturbed**, **paraphrased** questions, and optional **distractors**.
3. **Evaluation** of **pre** vs **post** (unlearned) models, including a **residual knowledge** analysis.

If any step returns an unexpected format, the package **raises** `LLMError`.


---

## Quick Start (End-to-End)

```bash
# 0) Install (choose providers you need)
pip install -e .
# Optionals:
pip install -e '.[gemini]'      # Gemini support
pip install -e '.[hf-local]'    # HuggingFace local LLMs

# 1) Build a graph from an entity
export OPENAI_API_KEY=sk-...
llm2graph entity --seed "Stephen King" --max-depth 2 \
  --provider openai --model gpt-5-mini --out graph.json

# 2) Generate multi-hop queries with alias/paraphrase perturbations + distractors
llm2graph gen-queries --graph graph.json --target "Stephen King" \
  --hops 2 --num-paths 50 --aliases 3 --paraphrases 2 --distractors 2 \
  --provider openai --model gpt-5-mini --out queries.json

# 3) Evaluate pre vs post models; point --post-model at the unlearned model (optionally use a judge model for equivalence)
llm2graph eval --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json
```

The **evaluation report** includes accuracies by bucket (single/multi-hop, alias, paraphrase) and a **residual_rate** capturing when gold phrasing fails but a perturbation still succeeds.


---

## Installation & Providers

### Base
```bash
pip install -e .
```

### OpenAI (default)
```bash
export OPENAI_API_KEY=sk-...      # required for provider=openai
```

### Gemini
```bash
pip install -e '.[gemini]'
export GEMINI_API_KEY=...         # required for provider=gemini
```

### Local HuggingFace
```bash
pip install -e '.[hf-local]'
# Ensure PyTorch is installed and you have a compatible GPU (recommended).
# Example model:
llm2graph entity --seed "Ada Lovelace" --provider hf-local \
  --model mistralai/Mistral-7B-Instruct-v0.3 --max-depth 1 --out graph.json
```

All providers share the same strict prompting/validation; non-conforming outputs raise `LLMError`.


---

## 1) Graph Construction (Entity --> Graph)

**Command**
```bash
llm2graph entity \
  --seed "Stephen King" \
  --max-depth 2 \
  --provider openai \
  --model gpt-5-mini \
  --out graph.json
```

**What happens**
- **Elicitation**: The LLM writes a compact factual paragraph about the current node.
- **Triple extraction**: The LLM returns strictly formatted triples: `(subject ; relation ; object)`.
- **Strict checks**: The subject must equal the current node; malformed lines raise `LLMError`.
- **Expansion (BFS)**: Extracted objects are added as next-depth nodes, up to `--max-depth`.
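
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's implementation: `parse_triples`, `build_graph`, the `extract` callable (a stand-in for the LLM call), and the local `LLMError` class are hypothetical names, and the exact depth semantics of `--max-depth` are an assumption.

```python
import re
from collections import deque


class LLMError(Exception):
    """Hypothetical stand-in for the package's error type."""


# One triple per line: (subject ; relation ; object), with no stray semicolons.
TRIPLE_RE = re.compile(r"^\(\s*([^;]+?)\s*;\s*([^;]+?)\s*;\s*([^;]+?)\s*\)$")


def parse_triples(text, node):
    """Parse triple lines for `node`; raise on any malformed line."""
    triples = []
    for line in text.strip().splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            raise LLMError(f"malformed triple line: {line!r}")
        subject, relation, obj = match.groups()
        if subject != node:
            raise LLMError(f"subject {subject!r} != current node {node!r}")
        triples.append((subject, relation, obj))
    return triples


def build_graph(seed, max_depth, extract):
    """BFS expansion; `extract(node)` returns the raw triple text for a node."""
    nodes, edges, seen = [seed], [], {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:  # assumed cutoff: nodes at max depth are not expanded
            continue
        for subject, relation, obj in parse_triples(extract(node), node):
            edges.append({"subject": subject, "relation": relation, "object": obj})
            if obj not in seen:
                seen.add(obj)
                nodes.append(obj)
                queue.append((obj, depth + 1))
    return {"seed": seed, "nodes": nodes, "edges": edges}
```

Passing a canned `extract` function (for example a dictionary lookup) makes the traversal easy to exercise without calling any model.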

**Advanced (programmatic kwargs in `GraphBuilder`)**
- `use_relevance: bool` - score candidates with the LLM on a 0-10 scale; those below the threshold are filtered out.
- `relevance_threshold: float` - default 3.0.
- `decay: float in [0.1, 1.0]` - limits breadth as depth grows.
- `max_nodes_per_depth: Optional[int]` - hard cap per depth.
- `alias_merge: bool` - LLM-judged canonicalization of new nodes (YES/NO).

**Output format (`graph.json`)**
```jsonc
{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine", "..."],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}
```
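
Because the output is plain JSON, it can be inspected with the standard library alone. The sketch below assumes only the `seed` and `edges` fields shown above; `load_adjacency` is a hypothetical helper, not part of the package.

```python
import json
from collections import defaultdict


def load_adjacency(graph_json):
    """Return the seed and a map from subject to (relation, object) pairs."""
    graph = json.loads(graph_json)
    adjacency = defaultdict(list)
    for edge in graph["edges"]:
        adjacency[edge["subject"]].append((edge["relation"], edge["object"]))
    return graph["seed"], dict(adjacency)


# Inline example mirroring the format above; in practice read graph.json from disk.
example = """
{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine"],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}
"""
seed, adjacency = load_adjacency(example)
for relation, obj in adjacency[seed]:
    print(f"{seed} --{relation}--> {obj}")
```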


---

## 2) Query Generation (Multi-hop, Aliases, Paraphrases, Distractors)

**Command**
```bash
llm2graph gen-queries \
  --graph graph.json \
  --target "Stephen King" \
  --hops 2 \
  --num-paths 50 \
  --aliases 3 \
  --paraphrases 2 \
  --distractors 2 \
  --provider openai \
  --model gpt-5-mini \
  --out queries.json
```

**What happens**
- Samples `--hops`-length paths from the graph.
- Synthesizes a **single** question per path; the final node is the gold answer.
- Generates **paraphrases** and **alias-perturbed** variants.
- Optionally generates **distractors**.
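
The path-sampling step might look roughly like the sketch below. It assumes that paths start at the `--target` node, contain exactly `--hops` edges, and do not revisit nodes; the package may sample differently. `enumerate_paths` and `sample_paths` are hypothetical names.

```python
import random


def enumerate_paths(edges, target, hops):
    """All simple paths of exactly `hops` edges that start at `target`."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge["subject"], []).append(edge)

    paths = []

    def walk(node, path, visited):
        if len(path) == hops:
            paths.append(list(path))
            return
        for edge in outgoing.get(node, []):
            if edge["object"] in visited:  # avoid cycles
                continue
            path.append({"s": edge["subject"], "r": edge["relation"], "o": edge["object"]})
            walk(edge["object"], path, visited | {edge["object"]})
            path.pop()

    walk(target, [], {target})
    return paths


def sample_paths(edges, target, hops, num_paths, seed=0):
    """Sample up to `num_paths` distinct paths; the last object is the gold answer."""
    paths = enumerate_paths(edges, target, hops)
    rng = random.Random(seed)
    return rng.sample(paths, min(num_paths, len(paths)))
```

Each returned path uses the same `s`/`r`/`o` keys as the `path` entries in `queries.json`.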

**Output (`queries.json`)**
```jsonc
{
  "meta": {"hops": 2, "num_paths": 50, "aliases": 3, "paraphrases": 2, "distractors": 2},
  "queries": [{
    "path": [{"s": "A", "r": "rel1", "o": "B"}, {"s": "B", "r": "rel2", "o": "C"}],
    "q_gold": "Which work by the 'King of Horror' features ...?",
    "q_variants": ["... paraphrase1", "... paraphrase2"],
    "q_alias_variants": ["... alias-perturbed phrasing ..."],
    "answer": "C",
    "distractors": ["X","Y"]
  }]
}
```

**Difficulty control**
- **Hop length** (`--hops`) raises reasoning depth.
- **Distractors** increase choice difficulty.
- **Aliases/Paraphrases** stress alias-robustness and surface-form robustness.


---

## 3) Evaluation (Pre vs Post, with Residual Knowledge)

**Command**
```bash
llm2graph eval \
  --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json
```

**What happens**
- Asks **pre** and **post** models the gold question.
- Asks the **post** model every variant (paraphrase/alias).
- If a judge model is given (`--judge-provider`/`--judge-model`), answer equivalence is decided by a strict `"YES"`/`"NO"` judgment; otherwise exact string equality is used.
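
The correctness check described in the last bullet can be sketched as follows. `is_correct`, the `judge` callable, and the local `LLMError` class are hypothetical names used for illustration; the real judge prompt and normalization are not shown here.

```python
class LLMError(Exception):
    """Hypothetical stand-in for the package's error type."""


def is_correct(prediction, gold, judge=None):
    """Exact string equality by default; strict YES/NO verdict when a judge is given."""
    if judge is None:
        return prediction == gold
    verdict = judge(prediction, gold)
    if verdict == "YES":
        return True
    if verdict == "NO":
        return False
    # No heuristic fallback: anything other than YES/NO is treated as format drift.
    raise LLMError(f"judge returned unexpected verdict: {verdict!r}")
```

Note that without a judge, answers differing only in case or punctuation count as incorrect under exact equality.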

**Residual Knowledge (paper-aligned)**
- An item is marked **residual** if the **post** model answers the **gold** question incorrectly **but** answers at least one alias or paraphrase variant correctly.
- Summarized via `residual_rate` and `residual_count`.
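
As a minimal sketch of this rule, the function below derives the flags from a list of per-variant predictions carrying the `type` and `correct` fields used in the report. The function names are hypothetical, and taking `residual_rate` as the residual count divided by the number of items is an assumption about the denominator.

```python
def residual_flags(predictions):
    """Flag an item as residual when gold fails but some variant succeeds."""
    gold_correct = any(p["correct"] for p in predictions if p["type"] == "gold")
    alias_any = any(p["correct"] for p in predictions if p["type"] == "alias")
    para_any = any(p["correct"] for p in predictions if p["type"] == "paraphrase")
    return {
        "residual": (not gold_correct) and (alias_any or para_any),
        "gold_correct": gold_correct,
        "alias_any": alias_any,
        "para_any": para_any,
    }


def residual_summary(items):
    """Aggregate residual flags over items (assumed denominator: all items)."""
    count = sum(1 for item in items if residual_flags(item["predictions"])["residual"])
    rate = count / len(items) if items else 0.0
    return {"residual_count": count, "residual_rate": rate}
```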

**Output (`eval_report.json`)**
```jsonc
{
  "summary": {
    "all":         {"total": N, "correct": k, "accuracy": 0.xx},
    "single_hop":  {"total": ..., ...},
    "multi_hop":   {"total": ..., ...},
    "alias":       {"total": ..., ...},
    "paraphrase":  {"total": ..., ...},
    "residual_rate": 0.xx,
    "residual_count": M,
    "num_items": N_items
  },
  "items": [
    {
      "path": [...],
      "predictions": [
        {"variant": "gold", "type": "gold", "pre": "…", "post": "…", "correct": true/false},
        {"variant": "paraphrase", "type": "paraphrase", "pre": null, "post": "…", "correct": ...},
        {"variant": "alias", "type": "alias", "pre": null, "post": "…", "correct": ...}
      ],
      "residual_flags": {
        "residual": true/false,
        "gold_correct": true/false,
        "alias_any": true/false,
        "para_any": true/false
      }
    }
  ]
}
```
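
The per-bucket accuracies could be recomputed from `items` along the lines below. This is a sketch under stated assumptions: each prediction counts once toward `all` and toward its hop bucket (a path of length 1 is taken as `single_hop`), and alias and paraphrase predictions also count toward their own buckets. The package's actual bucketing may differ, and `summarize` is a hypothetical name.

```python
from collections import defaultdict


def summarize(items):
    """Recompute total/correct/accuracy per bucket from report items."""
    buckets = defaultdict(lambda: {"total": 0, "correct": 0})
    for item in items:
        hop_bucket = "single_hop" if len(item["path"]) == 1 else "multi_hop"
        for pred in item["predictions"]:
            names = ["all", hop_bucket]
            if pred["type"] in ("alias", "paraphrase"):
                names.append(pred["type"])
            for name in names:
                buckets[name]["total"] += 1
                buckets[name]["correct"] += int(bool(pred["correct"]))
    summary = {
        name: {**counts, "accuracy": counts["correct"] / counts["total"]}
        for name, counts in buckets.items()
    }
    summary["num_items"] = len(items)
    return summary
```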


---

## Implementation Notes

- **Strict parsing**: Triple lines must be exactly `(subject ; relation ; object)`; subject must equal the current node.
- **Alias canonicalization**: Node merging uses `canonical_same(a,b)` --> strict `"YES"/"NO"` from an LLM.
- **Relevance scoring**: 0-10 numeric, LLM-only; thresholded filtering (optional).
- **HF local chat templates**: If the tokenizer provides a chat template, it is applied via `.apply_chat_template`; otherwise a minimal structured prompt is used.
- **No heuristic fallbacks**: Any format drift raises `LLMError`.


---

## Troubleshooting

- **LLMError**: The model did not follow the strict format. Retry with a different model or lower temperature.
- **Model access**: Ensure `OPENAI_API_KEY`/`GEMINI_API_KEY` is set; confirm the `--model` exists for that provider.
- **HF OOM**: Choose a smaller HF model; reduce the maximum number of generated tokens; consider 4/8-bit loading (extend the loader as needed).


---

## Citation

If you use this package, please cite:

Shah, Raj Sanjay, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, and Diyi Yang. *The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning.* Second Conference on Language Modeling. 2025.
