Metadata-Version: 2.4
Name: anek
Version: 0.1.0
Summary: Persona-based testing framework for AI agents
Project-URL: Homepage, https://github.com/souratendu/anek
Project-URL: Repository, https://github.com/souratendu/anek
Project-URL: Issues, https://github.com/souratendu/anek/issues
Author-email: souratendu@gmail.com
License: MIT License
        
        Copyright (c) 2026 Anek Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: agents,ai,bdd,chatbot,evaluation,gherkin,llm,personas,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.97.0
Requires-Dist: click>=8.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: httpx>=0.27.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Provides-Extra: testbed
Requires-Dist: fastapi>=0.111.0; extra == 'testbed'
Requires-Dist: openai>=1.30.0; extra == 'testbed'
Requires-Dist: uvicorn>=0.30.0; extra == 'testbed'
Description-Content-Type: text/markdown

# Anek

**Persona-based testing framework for AI agents.**

Agents are not APIs. A structured input/output test sends the same fixed input whoever the user is, so it can pass while your agent fails Maria Garcia, who writes *"hola can you help me i need change my password"*, and succeeds for the neutral English baseline. Anek finds those failures before your users do.

Anek simulates realistic user personas talking to your agent, with scenarios written as Gherkin `.feature` files, and uses Claude as a judge to evaluate the outcomes. Writing tests requires no code.

```gherkin
Feature: Password Reset Flow

  Background:
    Given the agent is available at "http://localhost:8080/health"

  @all_personas
  Scenario: User initiates password reset
    When the user says "I can't log into my account"
    Then the response should contain "password"
    And  the response should not contain "SSO"

    When the user says "I forgot my password and need to reset it"
    Then the response should contain "email"
    And  the response time should be under 5000ms

    When the user says "I got the reset email, what do I do now?"
    Then the goal should be achieved with "Agent guided the user through password reset without jargon"
```

Run it:

```
$ anek run features/password_reset.feature --agent http://localhost:8080/chat

Feature: Password Reset Flow
  ──────────────────────────────────────────────────────────────

  Scenario: User initiates password reset  |  persona: maria_garcia  |  PASS
    Background
    Given  the agent is available at "http://localhost:8080/health"  ✓
    When   the user says "I can't log into my account"
           → persona: "hola no puedo entrar a mi cuenta ayuda"
           ← agent:   "Hi! I can help you get back into your account..."
    Then   the response should contain "password"              ✓
    And    the response should not contain "SSO"               ✓
    ...
    Then   the goal should be achieved with "..."
           ✓ goal_achieved (confidence=92%): Agent clearly guided...

  Scenario: User initiates password reset  |  persona: elderly_user  |  FAIL
    ...
    Then   the response should not contain "SSO"               ✗
           ✗ "SSO" unexpectedly present in response

  ─────────────────────────────────────────────────────────────
  Persona              Status    Turns  Failed Steps
  ─────────────────────────────────────────────────
  maria_garcia         ✓ PASS    3      —
  elderly_user         ✗ FAIL    3      "SSO" unexpectedly present
  gen_z_user           ✓ PASS    3      —
  indian_english       ✓ PASS    3      —
  neutral_baseline     ✓ PASS    3      —
  ─────────────────────────────────────────────────
  Passed: 4/5  Failed: 1/5
```

---

## Installation

```bash
pip install anek
```

Requires Python 3.11+ and an [Anthropic API key](https://console.anthropic.com) (for persona simulation and LLM-as-judge evaluation).

---

## Quick Start

### 1. Set your API key

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

### 2. Start the demo testbed agent

The repository ships a demo agent (TechCorp Support Bot) that runs on [Groq's free API](https://console.groq.com); no credit card is needed.

```bash
pip install "anek[testbed]"
export GROQ_API_KEY=gsk_...
python -m anek.testbed.agent
# → TechCorp Support Bot running at http://localhost:8080
```

### 3. Run a feature file

```bash
anek run features/password_reset.feature --agent http://localhost:8080/chat
```

Results are saved to `./anek-results/` as JSON.

---

## Writing Tests

Tests are standard Gherkin `.feature` files. You don't write step definitions: Anek has built-in handlers for every supported step pattern.

### Feature file structure

```gherkin
Feature: Name of the feature being tested

  Background:
    Given the agent is available at "http://localhost:8080/health"

  @all_personas
  Scenario: Descriptive scenario name
    When the user says "base message in plain English"
    Then the response should contain "expected keyword"
    And  the response should not contain "forbidden word"
    And  the response time should be under 3000ms

    When the user says "follow-up message"
    Then the goal should be achieved with "description of success"
    And  the sentiment should be positive
```
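In the examples, each `When` step opens a new conversational turn (the summary table counts three turns for three `When` steps), and the `Then`/`And` steps under it check that turn. As a rough mental model, grouping a scenario's steps into turns could look like the sketch below. It is hypothetical, not Anek's `feature_parser.py`; `Turn` and `group_turns` are illustrative names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One conversational turn: the base message plus the checks that follow it."""
    message: str
    checks: list[str] = field(default_factory=list)


# Hypothetical pattern for the When step; Anek's parser may be more permissive.
WHEN_RE = re.compile(r'^When\s+the user says "(.*)"$')


def group_turns(step_lines: list[str]) -> list[Turn]:
    """Split a scenario's step lines into turns, one per `When the user says`."""
    turns: list[Turn] = []
    for raw in step_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if match := WHEN_RE.match(line):
            turns.append(Turn(message=match.group(1)))
        elif line.startswith(("Then ", "And ", "But ")):
            if not turns:
                raise ValueError(f"check before any When step: {line!r}")
            # Drop the keyword and any alignment spaces ("And  the ...").
            _, _, rest = line.partition(" ")
            turns[-1].checks.append(rest.strip())
        # Given steps (e.g. from Background) are ignored in this sketch.
    return turns


turns = group_turns([
    'When the user says "I can\'t log into my account"',
    'Then the response should contain "password"',
    'And  the response should not contain "SSO"',
    'When the user says "I got the reset email, what do I do now?"',
    'Then the goal should be achieved with "Agent guided the user"',
])
```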

### Persona tags

Control which personas run a scenario by placing `@` tags on the line directly above the `Scenario` keyword:

| Tag | Behaviour |
|-----|-----------|
| `@all_personas` | Run against every persona in the personas directory (`personas_dir`) |
| `@persona:maria_garcia` | Run against one specific persona |
| `@personas:elderly_user,gen_z_user` | Run against a named subset |

A scenario with several `@persona:X` tags runs once for each tagged persona.
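The tag rules can be expressed as a small resolution function. This is a hypothetical sketch (`resolve_personas` is not part of Anek's API), and it assumes that unknown persona names are an error, that unrelated tags such as `@smoke` are ignored, and that a persona selected twice runs once.

```python
def resolve_personas(tags: list[str], available: list[str]) -> list[str]:
    """Map a scenario's tags to the persona names it should run against."""
    selected: list[str] = []
    for tag in tags:
        tag = tag.lstrip("@")
        if tag == "all_personas":
            names = list(available)
        elif tag.startswith("persona:"):
            names = [tag.removeprefix("persona:")]
        elif tag.startswith("personas:"):
            names = [n.strip() for n in tag.removeprefix("personas:").split(",") if n.strip()]
        else:
            continue  # unrelated tag, e.g. @smoke
        for name in names:
            if name not in available:
                raise ValueError(f"unknown persona: {name}")
            if name not in selected:  # keep first-seen order, drop duplicates
                selected.append(name)
    return selected
```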

### The `When` step — how personas work

The message in a `When` step is a **base intent in plain English**. Before sending it to your agent, Anek asks Claude to rewrite it the way the persona would naturally type it, so the feature file stays readable while the persona transformation happens at runtime.

```gherkin
When the user says "I can't log in"
# maria_garcia might send: "hola no puedo entrar ayuda!!"
# elderly_user might send: "Good afternoon. I am unable to access my account, I'm afraid."
# gen_z_user   might send: "cant log in?? pls 😭"
```
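To make the transformation concrete, here is a hypothetical sketch of how a rewrite instruction could be assembled from the persona YAML fields described under Personas. The wording and the `build_rewrite_prompt` name are assumptions; Anek's real prompt, presumably in `anek/llm.py`, may differ.

```python
def build_rewrite_prompt(persona: dict, base_message: str) -> str:
    """Assemble an instruction asking an LLM to restate `base_message` in a persona's voice."""
    traits = "\n".join(f"- {t}" for t in persona.get("traits", []))
    samples = "\n".join(f'- "{s}"' for s in persona.get("sample_phrases", []))
    return (
        f"You are role-playing this user: {persona['description']}.\n"
        f"English proficiency: {persona.get('english_proficiency', 'native')}.\n"
        f"Writing traits:\n{traits}\n"
        f"Example messages they have written:\n{samples}\n\n"
        "Rewrite the following message the way this user would naturally type it "
        "to a support chatbot. Keep the same intent. Reply with the message only.\n\n"
        f"Message: {base_message}"
    )
```

The resulting string would be sent as the user message of a Claude request (for example through the Anthropic SDK's `client.messages.create`), and the reply used as the persona's message. Because the rewrite is generated, its exact wording can vary between runs.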

### Built-in verifiers (Then steps)

| Step pattern | What it checks |
|---|---|
| `the response should contain "text"` | Case-insensitive substring match |
| `the response should not contain "text"` | Absence check |
| `the response time should be under Nms` | Latency threshold |
| `the sentiment should be positive\|neutral\|negative` | Claude judges sentiment |
| `the goal should be achieved with "description"` | Claude judges full transcript against goal |
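The three deterministic checks are simple enough to sketch directly. This hypothetical `check_step` matches the step text with regular expressions and returns a pass flag plus a failure message modelled on the sample output. The patterns are assumptions, case-insensitivity for `should not contain` is assumed to mirror `should contain`, and the two Claude-judged steps are omitted.

```python
import re

# Hypothetical step patterns mirroring the table above; Anek's own may differ.
CONTAINS = re.compile(r'^the response should contain "(.*)"$')
NOT_CONTAINS = re.compile(r'^the response should not contain "(.*)"$')
LATENCY = re.compile(r"^the response time should be under (\d+)ms$")


def check_step(step: str, response: str, latency_ms: float) -> tuple[bool, str]:
    """Evaluate one deterministic Then step against a single agent response."""
    if m := CONTAINS.match(step):
        needle = m.group(1)
        ok = needle.lower() in response.lower()  # case-insensitive substring match
        return ok, "" if ok else f'"{needle}" not found in response'
    if m := NOT_CONTAINS.match(step):
        needle = m.group(1)
        ok = needle.lower() not in response.lower()  # assumed case-insensitive too
        return ok, "" if ok else f'"{needle}" unexpectedly present in response'
    if m := LATENCY.match(step):
        limit = int(m.group(1))
        ok = latency_ms < limit
        return ok, "" if ok else f"took {latency_ms:.0f}ms, limit {limit}ms"
    raise ValueError(f"unsupported step in this sketch: {step!r}")
```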

---

## Personas

Personas live as YAML files in the personas directory (`./personas` by default, set by `personas_dir`). Five starter personas are included:

| Name | Description |
|---|---|
| `neutral_baseline` | Standard American English — control group |
| `maria_garcia` | Native Spanish speaker, intermediate English, Spanglish |
| `elderly_user` | Formal, verbose, confused by technical terms |
| `gen_z_user` | Lowercase, terse, slang, emojis |
| `indian_english` | Formal Indian English, "kindly revert", "do the needful" |

### Persona YAML format

```yaml
name: maria_garcia
description: Native Spanish speaker, mid-30s, uses Spanglish occasionally
language_background: spanish_native
english_proficiency: intermediate  # native | fluent | advanced | intermediate | basic
traits:
  - omits articles occasionally ("I need help with account")
  - mixes Spanish words naturally mid-sentence
  - minimal punctuation, mostly lowercase
sample_phrases:
  - "hola can you help me i need change my password"
  - "the app no work for me since yesterday"
```

Validate a persona file:

```bash
anek personas validate personas/maria_garcia.yaml
```
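`anek personas validate` performs the real validation. As an illustration only, checks over a persona already parsed from YAML (for example with PyYAML's `yaml.safe_load`) might look like this hypothetical `validate_persona`. The set of required fields is a guess; the proficiency levels come from the comment in the format above.

```python
# Levels listed in the english_proficiency comment of the persona format.
PROFICIENCY_LEVELS = {"native", "fluent", "advanced", "intermediate", "basic"}
# Assumption: which fields are mandatory is a guess, not Anek's schema.
REQUIRED_FIELDS = ("name", "description", "english_proficiency")


def validate_persona(data: dict) -> list[str]:
    """Return a list of problems found in a parsed persona mapping (empty if valid)."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if not data.get(key)]
    level = data.get("english_proficiency")
    if level and level not in PROFICIENCY_LEVELS:
        errors.append(f"invalid english_proficiency: {level!r}")
    for key in ("traits", "sample_phrases"):
        value = data.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{key} must be a list of strings")
    return errors
```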

---

## CLI Reference

```bash
# Run a feature file against all tagged personas
anek run features/password_reset.feature --agent http://localhost:8080/chat

# Run with specific personas only
anek run features/password_reset.feature \
  --agent http://localhost:8080/chat \
  --personas elderly_user,maria_garcia

# Custom request/response fields (agent expects {"query": "..."} and returns {"data": {"text": "..."}})
anek run features/test.feature \
  --agent http://myagent.com/chat \
  --response-path data.text \
  --message-field query

# List available personas
anek personas list

# Validate a persona file
anek personas validate personas/my_persona.yaml

# Skip saving JSON results
anek run features/test.feature --agent http://... --no-save
```
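`--message-field` and `--response-path` describe how a turn maps onto your agent's JSON. Assuming the path is a plain dot-separated key path, as the `data.text` example suggests, the mapping might work like this hypothetical sketch; treating numeric segments as list indices is an additional assumption.

```python
from typing import Any


def build_request_body(message: str, message_field: str = "message") -> dict[str, Any]:
    """JSON body for the outgoing user turn, e.g. {"query": "..."} with --message-field query."""
    return {message_field: message}


def extract_reply(payload: Any, response_path: str = "reply") -> str:
    """Follow a dot-separated path such as "data.text" into a decoded JSON response."""
    node = payload
    for key in response_path.split("."):
        if isinstance(node, list):
            node = node[int(key)]  # assumption: numeric segments index into lists
        elif isinstance(node, dict) and key in node:
            node = node[key]
        else:
            raise KeyError(f"response path {response_path!r} failed at {key!r}")
    return str(node)
```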

### Exit codes

- `0` — all scenarios passed
- `1` — one or more scenarios failed

---

## Configuration

Copy `anek.config.yaml.example` to `anek.config.yaml`:

```yaml
anthropic_api_key: ${ANTHROPIC_API_KEY}
default_agent_endpoint: http://localhost:8080/chat
response_path: reply          # dot-separated path to the reply in the agent's JSON response
message_field: message        # JSON field for outgoing user message
personas_dir: ./personas
results_dir: ./anek-results
```

Environment variables in `${VAR}` syntax are expanded automatically. The CLI `--agent` flag overrides `default_agent_endpoint`.

---

## Testbed Agent

`testbed/agent.py` is a self-contained FastAPI agent for testing Anek itself. It plays the role of **TechCorp Support Bot** — a deliberately scoped agent that handles password reset, account inquiries, and billing, and refuses anything outside that scope.

```bash
pip install "anek[testbed]"
export GROQ_API_KEY=gsk_...   # free at console.groq.com
python testbed/agent.py
```

It uses [Groq's free tier](https://console.groq.com) (`llama-3.1-8b-instant`) by default. Any OpenAI-compatible API works: set `LLM_BASE_URL`, plus `LLM_MODEL` as needed. For example, to use a local Ollama server:

```bash
LLM_BASE_URL=http://localhost:11434/v1 LLM_MODEL=llama3.1 python testbed/agent.py
```
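If you want to wire up a suite before choosing an LLM at all, a stdlib-only echo agent that follows the default request and response shapes (`message` in, `reply` out) could stand in for the testbed. This is a hypothetical sketch, not part of the repository; serving `/health` with a `GET` is an assumption about what the Background step expects.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoAgent(BaseHTTPRequestHandler):
    """GET /health for the availability check, POST /chat for each turn."""

    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/chat":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        message = str(payload.get("message", ""))
        # A real agent would call an LLM here; echoing keeps the sketch offline.
        self._send_json(200, {"reply": f"You said: {message}"})

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) the server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), EchoAgent)


if __name__ == "__main__":
    print("Echo agent running at http://localhost:8080")
    serve().serve_forever()
```

Run it and pass `--agent http://localhost:8080/chat`. Against an echo, only string and latency checks are meaningful; goal and sentiment judgements need a real agent.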

---

## Project Structure

```
anek/
├── anek/
│   ├── cli.py              # Click CLI entry point
│   ├── feature_parser.py   # Gherkin .feature file parser
│   ├── persona.py          # Persona model + loader
│   ├── llm.py              # Claude API wrapper (simulation + judge)
│   ├── simulator.py        # Test orchestration engine
│   ├── verifiers.py        # Built-in step verifiers
│   ├── reporter.py         # Rich CLI + JSON output
│   └── drivers/
│       ├── base.py         # AgentDriver protocol
│       └── http.py         # HTTP REST driver
├── personas/               # Starter persona YAML files
├── features/               # Example .feature files
├── testbed/                # Demo agent (TechCorp Support Bot)
├── tests/                  # Anek's own test suite
└── pyproject.toml
```
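Drivers adapt Anek to different agent transports. The real protocol is defined in `drivers/base.py`; the sketch below only illustrates the general idea with a hypothetical `ChatDriver` protocol and a `ScriptedDriver` that replays canned replies, which can be handy for exercising verifiers offline. Both names and the `send` method are assumptions, not Anek's `AgentDriver` interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ChatDriver(Protocol):
    """Hypothetical driver shape: send one user message, get the agent's reply text."""

    def send(self, message: str) -> str: ...


class ScriptedDriver:
    """Offline driver that returns canned replies in order and records what was sent."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = list(replies)
        self.sent: list[str] = []

    def send(self, message: str) -> str:
        self.sent.append(message)
        if not self._replies:
            raise RuntimeError("no scripted replies left")
        return self._replies.pop(0)
```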

---

## Contributing

```bash
git clone https://github.com/souratendu/anek
cd anek
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/
```

Pull requests welcome. If you build a new persona, verifier type, or driver, open a PR.

---

## License

MIT — see [LICENSE](LICENSE).
