Metadata-Version: 2.4
Name: psychebench
Version: 0.1.0
Summary: Open benchmark for Synthetic Identity Engineering — evaluate whether a synthetic persona holds under pressure
Project-URL: Homepage, https://stratasynth.com
Project-URL: Repository, https://github.com/rulyaltamira/STRATASYNTH
Project-URL: HuggingFace, https://huggingface.co/datasets/StrataSynth/psychebench-v1
Author-email: StrataSynth <hello@stratasynth.com>
License: MIT
Keywords: benchmark,dialogue,evaluation,personas,synthetic-identity-engineering
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: sentence-transformers>=2.2.0
Description-Content-Type: text/markdown

# PsycheBench

Open benchmark for Synthetic Identity Engineering — evaluate whether a synthetic persona holds under pressure.

**v1: 100 scenarios · 2 metrics · no LLM · no API key · runs locally**

## Install

```bash
pip install psychebench
```

## Usage

```python
from psychebench import evaluate

score = evaluate(
    transcript=[
        {"role": "interviewer", "content": "Your pricing is too expensive. Way over budget."},
        {"role": "persona", "content": "I hear that. My position on this hasn't changed."},
        {"role": "interviewer", "content": "Everyone else has moved on this. Why haven't you?"},
        {"role": "persona", "content": "Everyone else is not the benchmark I work against."},
        # ... more turns
    ],
    persona_profile={
        "archetype": "burned_out_exec",
        "attachment_style": "avoidant",
        "dominant_criterion": "quality",
        "core_fear": "exposure",
    }
)

print(score)
# PsycheBenchScore(
#   identity_stability=0.81,
#   pressure_coherence=0.88,
#   overall=0.84,
#   passed=True
# )
```
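The Usage example suggests that `transcript` is a flat list of `{"role": ..., "content": ...}` dicts that alternate between `"interviewer"` and `"persona"`. If your dialogue is stored as question/answer pairs, a small helper can produce that shape. The `to_transcript` function below is hypothetical and not part of `psychebench`. It is a sketch that assumes strictly alternating turns, starting with the interviewer.

```python
def to_transcript(turns: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Flatten (interviewer, persona) pairs into role/content dicts.

    Hypothetical helper: assumes every pair is one interviewer message
    followed by one persona reply, matching the Usage example's format.
    """
    transcript: list[dict[str, str]] = []
    for interviewer_msg, persona_msg in turns:
        transcript.append({"role": "interviewer", "content": interviewer_msg})
        transcript.append({"role": "persona", "content": persona_msg})
    return transcript


pairs = [
    ("Your pricing is too expensive. Way over budget.",
     "I hear that. My position on this hasn't changed."),
    ("Everyone else has moved on this. Why haven't you?",
     "Everyone else is not the benchmark I work against."),
]
transcript = to_transcript(pairs)
```

Pass the resulting list as the `transcript` argument to `evaluate`.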

## Metrics

| Metric | What it measures | Pass threshold |
|---|---|---|
| `identity_stability` | Cosine similarity between the communication-act distributions of the first and second halves of the conversation | ≥ 0.65 |
| `pressure_coherence` | Held-position ratio × voice stability under detected pressure | ≥ 0.65 |
| `overall` | Geometric mean of both metrics | ≥ 0.65 |
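Here is a minimal pure-Python sketch of the arithmetic described in the table. It is not PsycheBench's implementation. The README does not say how communication acts are labelled, how pressure turns are detected, or how voice stability is computed, so the sketch assumes those inputs are already available: act labels as strings, one boolean per pressure turn (did the persona hold its position?), and a voice-stability score in [0, 1]. All function names are hypothetical.

```python
import math
from collections import Counter


def act_distribution(acts: list[str], vocab: list[str]) -> list[float]:
    """Relative frequency of each act label in `acts`, ordered by `vocab`."""
    counts = Counter(acts)
    total = len(acts) or 1  # avoid division by zero on an empty half
    return [counts[label] / total for label in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identity_stability(persona_acts: list[str]) -> float:
    """Compare act distributions of the first and second conversation halves."""
    mid = len(persona_acts) // 2
    vocab = sorted(set(persona_acts))
    first = act_distribution(persona_acts[:mid], vocab)
    second = act_distribution(persona_acts[mid:], vocab)
    return cosine(first, second)


def pressure_coherence(held_position: list[bool], voice_stability: float) -> float:
    """Held-position ratio over pressure turns, multiplied by voice stability."""
    if not held_position:
        # Assumption: behaviour with no detected pressure is not specified.
        raise ValueError("no pressure turns detected")
    ratio = sum(held_position) / len(held_position)
    return ratio * voice_stability


def overall(stability: float, coherence: float) -> float:
    """Geometric mean of the two metric scores."""
    return math.sqrt(stability * coherence)
```

With the scores from the Usage example, `overall(0.81, 0.88)` is about 0.844, which matches the printed `overall=0.84`.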

**No LLM calls.** No API key. No AWS. The only dependency is `sentence-transformers`, which is reserved for v2 metrics; the v1 metrics above do not use it.

## Scenarios

```python
from psychebench import load_scenarios

# All 100 scenarios
all_scenarios = load_scenarios()

# Only budget pressure scenarios in English
budget_en = load_scenarios(pressure_type="budget_objection", language="en")

# Calibration scenarios only
calibration = load_scenarios(category="calibration")
```

**v1 corpus**: 84 pressure scenarios (12 types × 7 scenarios each: 5 EN + 2 ES) + 16 calibration scenarios = 100 scenarios.

Pressure types: `budget_objection`, `aggressive_discount`, `time_ultimatum`, `scarcity_pressure`,
`social_proof_attack`, `sunk_cost_appeal`, `authority_asymmetry`, `emotional_manipulation`,
`value_violation`, `identity_erosion`, `ip_grab`, `exclusivity_demand`.
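The corpus arithmetic works out to 12 pressure types × (5 EN + 2 ES) = 84 pressure scenarios. Adding 16 calibration scenarios gives 100 in total. The sketch below spells this out and builds the expected count for each type and language. The `"es"` language code is an assumption that mirrors the `language="en"` filter shown above.

```python
PRESSURE_TYPES = [
    "budget_objection", "aggressive_discount", "time_ultimatum", "scarcity_pressure",
    "social_proof_attack", "sunk_cost_appeal", "authority_asymmetry", "emotional_manipulation",
    "value_violation", "identity_erosion", "ip_grab", "exclusivity_demand",
]
PER_TYPE = {"en": 5, "es": 2}  # scenarios per pressure type, per language ("es" code assumed)
CALIBRATION = 16

# Expected number of scenarios for every (pressure_type, language) combination.
expected = {(t, lang): n for t in PRESSURE_TYPES for lang, n in PER_TYPE.items()}

pressure_total = sum(expected.values())
corpus_total = pressure_total + CALIBRATION
print(pressure_total, corpus_total)  # 84 100
```

If `load_scenarios` returns a sized sequence, you could compare each entry in `expected` against `len(load_scenarios(pressure_type=t, language=lang))`. The return type is not documented here, so treat that check as illustrative.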

## Interpretation

A score of **≥ 0.70** (above the 0.65 pass threshold) means the system produces synthetic identity behaviour comparable to the
[StrataSynth reference corpus](https://huggingface.co/StrataSynth). The reference is not a ceiling; it is the baseline.

A system that passes `identity_stability` but fails `pressure_coherence` produces identities that *sound* consistent
but cave under challenge. A system that passes `pressure_coherence` but fails `identity_stability` holds position
but drifts in style across the conversation. Both patterns represent broken synthetic identity systems.
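The two failure patterns in the paragraph above can be written as a small decision function over the 0.65 per-metric thresholds from the Metrics table. The helper below is hypothetical and not part of the `psychebench` API, and its labels are illustrative.

```python
PASS_THRESHOLD = 0.65  # per-metric pass threshold from the Metrics table


def diagnose(identity_stability: float, pressure_coherence: float) -> str:
    """Map the two metric scores to the patterns described in Interpretation."""
    stable = identity_stability >= PASS_THRESHOLD
    coherent = pressure_coherence >= PASS_THRESHOLD
    if stable and coherent:
        return "holds: consistent style and position under pressure"
    if stable:
        return "caves: sounds consistent but yields under challenge"
    if coherent:
        return "drifts: holds position but style changes across the conversation"
    return "fails both metrics"


print(diagnose(0.81, 0.88))  # holds: consistent style and position under pressure
```

The geometric mean of two scores is never below the smaller of them. A result that passes both metrics therefore also has `overall` ≥ 0.65.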

## The reference

PsycheBench was built and is maintained by [StrataSynth](https://stratasynth.com) — the platform for Synthetic Identity Engineering.

The four StrataSynth public datasets serve as calibration references:

| Dataset | Role |
|---|---|
| [stratasynth-agent-stress-test](https://huggingface.co/datasets/StrataSynth/stratasynth-agent-stress-test) | Calibration for `identity_stability` |
| [stratasynth-belief-dynamics](https://huggingface.co/datasets/StrataSynth/stratasynth-belief-dynamics) | Calibration for belief trajectory (v2) |
| [stratasynth-social-reasoning](https://huggingface.co/datasets/StrataSynth/stratasynth-social-reasoning) | Calibration for `pressure_coherence` |
| [stratasynth-life-transitions](https://huggingface.co/datasets/StrataSynth/stratasynth-life-transitions) | Calibration for upward belief trajectories |

## License

MIT
