Metadata-Version: 2.4
Name: manifold-scorer
Version: 0.1.2
Summary: Local acceptance-manifold scoring for implicit RLHF preference pair generation
Author-email: Patrick Gerard <pgerard@isi.edu>
License: MIT
Project-URL: Homepage, https://github.com/patrikgerard/manifold-scorer
Keywords: RLHF,DPO,preference learning,alignment,density estimation,NLP
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Requires-Dist: scikit-learn>=1.2
Provides-Extra: embed
Requires-Dist: sentence-transformers>=2.2; extra == "embed"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# manifold-scorer

Local acceptance-manifold scoring for implicit RLHF/DPO preference pair generation.

Implementation of the local density scorer from Gerard & Volkova (2026), *"Density-Guided Response Optimization."*

The idea: posts that a community has accepted (upvoted, engaged with, or simply allowed to persist) cluster in coherent, high-density regions of embedding space. That structure encodes community preference without explicit labels. You fit the scorer on those posts, then use it to rank candidate responses and generate DPO pairs.
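
To make the density intuition concrete, here is a minimal sketch of a k-nearest-neighbour log-density estimate over accepted-post embeddings. It is an illustration only, not the implementation inside `ManifoldScorer`: the function name `knn_log_density` is hypothetical, the data is synthetic, and the sketch ignores the prompt embedding that the real scorer also takes.

```python
import numpy as np


def knn_log_density(accepted_embs, query_embs, k=10):
    """Hypothetical helper: k-NN log-density, up to an additive constant.

    Uses the classic estimate p(x) ~ k / (N * r_k(x)^d), where r_k(x) is the
    distance from x to its k-th nearest accepted post.
    """
    n, d = accepted_embs.shape
    # Pairwise Euclidean distances between queries and accepted posts: (M, N).
    diffs = query_embs[:, None, :] - accepted_embs[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Distance to the k-th nearest accepted post for each query.
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    r_k = np.maximum(r_k, 1e-12)  # guard against log(0)
    return np.log(k) - np.log(n) - d * np.log(r_k)


rng = np.random.default_rng(0)
accepted = rng.normal(loc=0.0, scale=0.1, size=(500, 8))  # synthetic dense cluster
inside = np.zeros((1, 8))        # a point in the middle of the cluster
outside = np.full((1, 8), 3.0)   # a point far from every accepted post
scores = knn_log_density(accepted, np.vstack([inside, outside]), k=10)
```

The point inside the cluster gets the higher log-density, which is the sense in which "close to what the community already accepted" translates into a score.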

---

## Install

```bash
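pip install manifold-scorer

# Optional: also pulls in sentence-transformers, which the embedding examples below use
pip install "manifold-scorer[embed]"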
```

---

## Usage

### 1. Embed your community posts

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# community_posts: a list of strings (upvoted comments, replies, whatever members wrote).
# Their continued existence is the acceptance signal, so no labels are needed.
post_embs = model.encode(community_posts, normalize_embeddings=True)  # (N, d)
```

### 2. Find a good k (optional but recommended)

```python
from manifold import tune_k

result = tune_k(post_embs)
# prints:
#      k  consistency (rho)
# --------------------------
#     25              0.61
#     50              0.68
#    100              0.71  ←
#    150              0.70
#    200              0.68

best_k = result["best_k"]
```
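
The table does not say how the consistency score is computed. One plausible reading, offered purely as an assumption, is rank agreement between density scores obtained from two disjoint subsets of the accepted posts: if both subsets rank the same held-out posts similarly, the chosen `k` is giving a stable signal. The sketch below uses hypothetical helpers (`kth_neighbor_dist`, `spearman_rho`, `split_consistency`) and synthetic data; it is not what `tune_k` is documented to do.

```python
import numpy as np


def kth_neighbor_dist(ref, query, k):
    """Distance from each query row to its k-th nearest row in ref."""
    dists = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def spearman_rho(a, b):
    """Spearman correlation via rank transform (assumes no tied values)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


def split_consistency(embs, k, seed=0):
    """Assumed metric: do two disjoint reference sets rank held-out posts alike?"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(embs))
    third = len(embs) // 3
    ref_a = embs[idx[:third]]
    ref_b = embs[idx[third:2 * third]]
    probes = embs[idx[2 * third:]]
    # A smaller k-th neighbour distance means higher local density.
    score_a = -np.log(kth_neighbor_dist(ref_a, probes, k))
    score_b = -np.log(kth_neighbor_dist(ref_b, probes, k))
    return spearman_rho(score_a, score_b)


rng = np.random.default_rng(0)
embs = rng.normal(size=(300, 4))  # synthetic stand-in for post embeddings
results = {k: split_consistency(embs, k) for k in (5, 10, 20)}
picked_k = max(results, key=results.get)
```

Under this reading, a very small `k` gives noisy density estimates and a very large `k` smooths away local structure, which would match the rise-then-fall pattern in the table.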

### 3. Fit the scorer

```python
from manifold import ManifoldScorer

scorer = ManifoldScorer(k=best_k)
scorer.fit(post_embs)
```

### 4. Generate DPO pairs

```python
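# prompts: list of N strings
# candidates: list of N lists, each holding the same number (n_cands) of strings
N, n_cands = len(prompts), len(candidates[0])

# encode() expects a flat list of strings, so flatten the candidates and reshape afterwards
flat_candidates = [text for cands in candidates for text in cands]
prompt_embs    = model.encode(prompts, normalize_embeddings=True)  # (N, d)
candidate_embs = model.encode(flat_candidates, normalize_embeddings=True).reshape(N, n_cands, -1)  # (N, n_cands, d)

pairs = scorer.make_pairs(prompt_embs, candidate_embs, margin_threshold=0.5)

# pairs["mask"] tells you which pairs had strong enough signal
chosen_texts   = [candidates[i][pairs["chosen_idx"][i]]
                  for i in range(N) if pairs["mask"][i]]
rejected_texts = [candidates[i][pairs["rejected_idx"][i]]
                  for i in range(N) if pairs["mask"][i]]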

# feed chosen_texts / rejected_texts into your DPO trainer
```
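
For intuition about what a margin threshold does, the sketch below selects pairs from a plain score matrix: the highest-scoring candidate is chosen, the lowest is rejected, and the pair is kept only if the score gap clears the threshold. This is an assumption about the general technique, not a description of `make_pairs` internals. `select_pairs` and the `"margin"` key are hypothetical; the other dictionary keys mirror the ones used above. The `prompt`/`chosen`/`rejected` record layout is a common DPO dataset shape, so check what your trainer expects.

```python
import numpy as np


def select_pairs(scores, margin_threshold=0.5):
    """Hypothetical helper. scores: (N, n_cands) array, higher = more accepted."""
    chosen_idx = scores.argmax(axis=1)
    rejected_idx = scores.argmin(axis=1)
    rows = np.arange(scores.shape[0])
    margin = scores[rows, chosen_idx] - scores[rows, rejected_idx]
    return {
        "chosen_idx": chosen_idx,
        "rejected_idx": rejected_idx,
        "margin": margin,
        "mask": margin >= margin_threshold,  # keep only well-separated pairs
    }


# Toy scores for two prompts with three candidates each.
scores = np.array([[0.2, 1.5, -0.3],
                   [0.1, 0.3, 0.2]])
pairs = select_pairs(scores, margin_threshold=0.5)

prompts = ["p0", "p1"]
candidates = [["a", "b", "c"], ["d", "e", "f"]]
records = [
    {
        "prompt": prompts[i],
        "chosen": candidates[i][pairs["chosen_idx"][i]],
        "rejected": candidates[i][pairs["rejected_idx"][i]],
    }
    for i in range(len(prompts))
    if pairs["mask"][i]
]
```

Here the second prompt is dropped because its best and worst candidates differ by only 0.2, below the 0.5 threshold. Raising the threshold trades dataset size for cleaner preference signal.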

### 5. Score a single candidate

```python
score = scorer.score(prompt_emb, candidate_emb)
# log-density — higher means more aligned with community norms
```

### 6. Rank candidates for a prompt

```python
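# Here candidates is the list of candidate strings for this one prompt,
# and candidate_embs holds their embeddings in the same order.
ranked_indices = scorer.rank_candidates(prompt_emb, candidate_embs)
best = candidates[ranked_indices[0]]  # first index is treated as the top-ranked candidate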
```

### 7. Save / load

```python
scorer.save("my_community.npz")
scorer = ManifoldScorer.load("my_community.npz")
```

---

## Citation

```bibtex
@article{gerard2026dgro,
  title   = {Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals},
  author  = {Gerard, Patrick and Volkova, Svitlana},
  journal = {ACM FAccT},
  year    = {2026},
}
```
