Metadata-Version: 2.4
Name: manifold-scorer
Version: 0.1.1
Summary: Local acceptance-manifold scoring for implicit RLHF preference pair generation
Author-email: Patrick Gerard <pgerard@isi.edu>
License: MIT
Project-URL: Homepage, https://github.com/patrikgerard/manifold-scorer
Keywords: RLHF,DPO,preference learning,alignment,density estimation,NLP
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Requires-Dist: scikit-learn>=1.2
Requires-Dist: scipy>=1.10
Provides-Extra: embed
Requires-Dist: sentence-transformers>=2.2; extra == "embed"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# manifold-scorer

Local acceptance-manifold scoring for implicit RLHF/DPO preference pair generation.

Implementation of the local density scorer from Gerard & Volkova (2026), *"Density-Guided Response Optimization."*

The idea: responses a community accepts tend to cluster in coherent, high-density regions of embedding space. Local density, conditioned on nearby conversation histories, reliably recovers human preference ordering, so it can substitute for explicit preference annotations when building DPO training pairs.
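To make "local density conditioned on nearby conversation histories" concrete, here is a minimal NumPy sketch of one way such a score could be computed. It is not the package's implementation. The function name `local_log_density`, the cosine-similarity neighbor search, the Gaussian kernel, and the `bandwidth` parameter are all illustrative assumptions. The sketch retrieves the `k` histories most similar to the query history, then measures how densely the accepted responses from those conversations surround the candidate.

```python
import numpy as np


def local_log_density(query_hist, candidate, hist_embs, resp_embs, k=10, bandwidth=0.5):
    """Hypothetical local density score (illustrative only, not the library's code).

    query_hist: (d,) history embedding for the query
    candidate:  (d,) embedding of the candidate response
    hist_embs:  (N, d) history embeddings of the reference corpus
    resp_embs:  (N, d) accepted responses, row-aligned with hist_embs
    Assumes rows are L2-normalized, so a dot product is cosine similarity.
    """
    # 1. Condition on context: keep the k histories closest to the query.
    sims = hist_embs @ query_hist
    neighbor_idx = np.argsort(-sims)[:k]
    local_resps = resp_embs[neighbor_idx]

    # 2. Gaussian kernel between the candidate and each local accepted response.
    sq_dists = np.sum((local_resps - candidate) ** 2, axis=1)
    log_kernels = -sq_dists / (2.0 * bandwidth**2)

    # 3. Numerically stable log-mean-exp. The kernel's normalizing constant is
    #    dropped, so scores are comparable only for a fixed bandwidth and d.
    m = log_kernels.max()
    return float(m + np.log(np.mean(np.exp(log_kernels - m))))
```

A candidate that sits close to the replies accepted in similar conversations gets a higher score than one far from them, which is the ordering the pair-generation step relies on.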

---

## Install

```bash
pip install manifold-scorer

# if you want the built-in Embedder helper too:
pip install "manifold-scorer[embed]"
```

---

## Usage

### 1. Fit on accepted community responses

```python
from manifold import ManifoldScorer

# hist_embs: embeddings of conversation histories, shape (N, d)
# resp_embs: embeddings of the accepted responses, shape (N, d);
#            row i is the reply to history i
# These come from your unlabeled community posts/replies; no preference labels needed.

scorer = ManifoldScorer(k=150)
scorer.fit(hist_embs, resp_embs)
```
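Because `fit` takes two row-aligned arrays, shape mismatches are the most likely input bug. The helper below is a hypothetical pre-flight check, not part of the package: it casts to float32 and verifies that both arrays are 2-D, finite, and the same shape. The L2 normalization step is an assumption that mirrors the `normalize_embeddings=True` setting in the sentence-transformers example; pass `normalize=False` if your embeddings are already normalized or you prefer raw vectors.

```python
import numpy as np


def prepare_fit_inputs(hist_embs, resp_embs, normalize=True):
    """Hypothetical pre-flight check for ManifoldScorer.fit inputs."""
    hist = np.asarray(hist_embs, dtype=np.float32)
    resp = np.asarray(resp_embs, dtype=np.float32)

    if hist.ndim != 2 or resp.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (N, d)")
    if hist.shape != resp.shape:
        # Row i of resp_embs must be the accepted reply to row i of hist_embs.
        raise ValueError(f"shape mismatch: {hist.shape} vs {resp.shape}")
    if not (np.isfinite(hist).all() and np.isfinite(resp).all()):
        raise ValueError("embeddings contain NaN or inf")

    if normalize:
        # Scale rows to unit length; the small floor avoids division by zero.
        hist = hist / np.maximum(np.linalg.norm(hist, axis=1, keepdims=True), 1e-12)
        resp = resp / np.maximum(np.linalg.norm(resp, axis=1, keepdims=True), 1e-12)
    return hist, resp
```

Usage: `hist_embs, resp_embs = prepare_fit_inputs(hist_embs, resp_embs)`, then call `scorer.fit(hist_embs, resp_embs)` as usual.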

### 2. Generate DPO pairs from candidates

```python
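import numpy as np

# For each query you have multiple candidate responses
# query_hist_embs: shape (N_queries, d)
# cand_embs:       shape (N_queries, n_candidates, d)

pairs = scorer.make_pairs(query_hist_embs, cand_embs)

# pairs["chosen_emb"]   : shape (N_queries, d), highest-density candidate
# pairs["rejected_emb"] : shape (N_queries, d), lowest-density candidate
# pairs["margin"]       : score gap; filter on this for data quality

# Keep only pairs with a clear density gap. The median cutoff is an
# illustrative choice, not a recommended value; tune it for your data.
threshold = np.median(pairs["margin"])
high_quality = pairs["margin"] > threshold
dpo_chosen   = pairs["chosen_emb"][high_quality]
dpo_rejected = pairs["rejected_emb"][high_quality]
# DPO trainers typically consume text, so map these pairs back to the
# corresponding candidate responses before training.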
```
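To see what best-versus-worst selection amounts to, or to reproduce it from scores you computed yourself, the sketch below builds the same three fields from a score matrix of shape `(N_queries, n_candidates)`. It is a hypothetical helper, not `make_pairs` itself. It also returns the chosen and rejected candidate indices (the `chosen_idx` and `rejected_idx` keys belong to this sketch only), which is what you need to map each pair back to its response text.

```python
import numpy as np


def pairs_from_scores(scores, cand_embs):
    """Hypothetical helper: best-vs-worst pairs from per-candidate scores.

    scores:    (N_queries, n_candidates) log-density scores
    cand_embs: (N_queries, n_candidates, d) candidate embeddings
    """
    scores = np.asarray(scores, dtype=float)
    rows = np.arange(scores.shape[0])
    chosen_idx = scores.argmax(axis=1)    # highest-density candidate per query
    rejected_idx = scores.argmin(axis=1)  # lowest-density candidate per query
    return {
        "chosen_idx": chosen_idx,
        "rejected_idx": rejected_idx,
        "chosen_emb": cand_embs[rows, chosen_idx],
        "rejected_emb": cand_embs[rows, rejected_idx],
        # Score gap between chosen and rejected; always >= 0.
        "margin": scores[rows, chosen_idx] - scores[rows, rejected_idx],
    }
```

With the candidate texts in a parallel list of lists (say a hypothetical `cand_texts`), `cand_texts[i][p["chosen_idx"][i]]` recovers the chosen response for query `i`. When every candidate for a query has the same score, the margin is 0, so any positive margin threshold drops that query.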

### 3. Score a single candidate

```python
score = scorer.score(history_emb, candidate_emb)
# log-density; higher = more aligned with community norms
```
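Because the score is a log-density, the difference between two scores is a log density ratio, which is often easier to interpret than the raw values. A small illustrative helper (hypothetical name, standard library only):

```python
import math


def density_ratio(score_a, score_b):
    """How many times denser candidate a's location is than candidate b's.

    Uses exp(log p(a) - log p(b)) = p(a) / p(b). Any additive constant
    shared by both scores (for example a dropped normalizer) cancels out.
    """
    return math.exp(score_a - score_b)
```

Comparing two candidates for the same history is the safe case. Whether scores are comparable across different histories depends on how the scorer normalizes them, which is not specified here.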

### 4. Rank candidates for a query

```python
# candidates_emb: shape (n_candidates, d)
ranked_indices = scorer.rank_candidates(history_emb, candidates_emb)
best = candidates_emb[ranked_indices[0]]  # first index = highest-scoring candidate
```
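Ranking amounts to sorting candidates by score in descending order. The sketch below shows this with any scoring callable; `rank_by_score` and the toy negative-distance score are hypothetical stand-ins (with the package you could pass something like `lambda h, c: scorer.score(h, c)`). A stable sort keeps tied candidates in their original order, so rankings are reproducible.

```python
import numpy as np


def rank_by_score(score_fn, history_emb, candidates_emb):
    """Hypothetical re-implementation of ranking: best candidate first."""
    scores = np.array([score_fn(history_emb, c) for c in candidates_emb])
    # Negate for descending order; "stable" keeps tied candidates in input order.
    return np.argsort(-scores, kind="stable")


# Toy stand-in score (not a density): closer to the history vector = higher.
def toy_score(h, c):
    return -float(np.linalg.norm(h - c))


history = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
order = rank_by_score(toy_score, history, cands)  # order is [1, 2, 0]
best = cands[order[0]]
```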

### 5. Save / load

```python
scorer.save("my_community_scorer.npz")
scorer2 = ManifoldScorer.load("my_community_scorer.npz")
```
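The `.npz` extension suggests NumPy's archive format. If you also want to persist your own arrays alongside a scorer (for example the training embeddings, so you can refit later with a different `k`), `np.savez` and `np.load` round-trip them. This is a generic NumPy sketch with hypothetical helper names; it does not describe what `ManifoldScorer.save` actually stores.

```python
import numpy as np


def save_fit_data(path, hist_embs, resp_embs, k):
    # Named arrays in one uncompressed archive (np.savez_compressed trades speed for size).
    # NumPy appends ".npz" to a string path that lacks it.
    np.savez(path, hist_embs=hist_embs, resp_embs=resp_embs, k=np.array(k))


def load_fit_data(path):
    # The context manager closes the file; indexing reads each array into memory.
    with np.load(path) as data:
        return data["hist_embs"], data["resp_embs"], int(data["k"])
```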

---

## Embedding your text

The scorer is embedding-agnostic: pass any float32 arrays. For a quick start with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

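model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# histories, responses: parallel lists of strings; responses[i] is the reply to histories[i].
# encode() returns float32 NumPy arrays by default.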
hist_embs = model.encode(histories, normalize_embeddings=True)
resp_embs = model.encode(responses, normalize_embeddings=True)
```
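`histories` and `responses` must be parallel lists, with `responses[i]` being the accepted reply to `histories[i]`. If your community data is stored as reply threads, one way to build them is to treat every message after the first as a response and the messages before it as its history. The helper name, the newline separator, and the `max_turns` truncation are illustrative assumptions; the package does not prescribe a format.

```python
def thread_to_pairs(thread, max_turns=None, sep="\n"):
    """Hypothetical: turn one reply thread into (history, response) pairs.

    thread:    list of message strings in posting order
    max_turns: if set, keep only the most recent max_turns messages of history
    """
    histories, responses = [], []
    for i in range(1, len(thread)):
        context = thread[:i]
        if max_turns is not None:
            # max(0, ...) avoids the context[-0:] pitfall and over-long windows.
            context = context[max(0, len(context) - max_turns):]
        histories.append(sep.join(context))
        responses.append(thread[i])
    return histories, responses


histories, responses = thread_to_pairs(
    ["Any tips for a first marathon?", "Start slow and build mileage.", "Thanks, will do!"]
)
# histories[1] == "Any tips for a first marathon?\nStart slow and build mileage."
# responses    == ["Start slow and build mileage.", "Thanks, will do!"]
```

Flatten every thread this way, concatenate the lists, and pass them to `model.encode` as shown above.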

---

## Citation

```bibtex
@inproceedings{gerard2026dgro,
  title     = {Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals},
  author    = {Gerard, Patrick and Volkova, Svitlana},
  booktitle = {ACM Conference on Fairness, Accountability, and Transparency (FAccT)},
  year      = {2026},
}
```
