Metadata-Version: 2.4
Name: claude-harness
Version: 3.2.2
Summary: Production-ready autonomous coding harness using Claude Code SDK
Home-page: https://github.com/nirmalarya/claude-harness
Author: Nirmalarya
Author-email: Nirmalarya <hello@nirmalarya.com>
License: MIT
Project-URL: Homepage, https://github.com/nirmalarya/claude-harness
Project-URL: Documentation, https://github.com/nirmalarya/claude-harness/blob/main/README.md
Project-URL: Repository, https://github.com/nirmalarya/claude-harness
Project-URL: Issues, https://github.com/nirmalarya/claude-harness/issues
Keywords: claude,autonomous,coding,agent,ai,sdk
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: claude-code-sdk>=0.0.25
Requires-Dist: pyyaml>=6.0
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# claude-harness

Production-ready autonomous coding harness using Claude Code SDK. Build complete applications autonomously with a two-agent pattern (initializer + coding agents).

**Proven Success:** Built [SHERPA v1.0](https://github.com/nirmalarya/sherpa) - 165 features, production-ready, A- grade quality.

## Prerequisites

**Required:** Install the latest versions of both the Claude Code CLI and the Claude Code SDK (`claude-code-sdk`):

```bash
# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code

# Install Python dependencies
pip install -r requirements.txt
```

Verify your installations:
```bash
claude --version  # Should be latest version
pip show claude-code-sdk  # Check SDK is installed
```

**OAuth Token:** Generate and set your Claude Code OAuth token:
```bash
# Generate the token using Claude Code CLI
claude setup-token

# Set the environment variable
export CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'
```

## Installation

```bash
# Install from source
cd /path/to/claude-harness
pip install -e .

# Verify installation
claude-harness --version
```

## Quick Start

```bash
# Set OAuth token (required)
export CLAUDE_CODE_OAUTH_TOKEN='your-token-here'

# Build a new app
claude-harness --project-dir ./my_project

# Test with limited iterations
claude-harness --project-dir ./my_project --max-iterations 3

# Enhancement mode (existing projects)
claude-harness --mode enhancement --project-dir ./existing-app --spec ./features.txt
```

📖 **[Read the full User Guide →](USER_GUIDE.md)**

## What's New in v3.1.0

✅ **Production Reliability Features:**
- **Triple Timeout Protection** - 15/10/120 min timeouts prevent infinite hangs
- **Retry + Skip Logic** - Auto-retry failed features (3 attempts), skip after max failures
- **Loop Detection** - Prevents infinite loops and repeated file reads
- **Comprehensive Error Logging** - Structured error logs in `.claude/errors.json`
- **E2E Validation Enforcement** - Commits blocked without E2E tests (CRITICAL BUG FIX)
- **MCP Auto-Configuration** - Context7 and Puppeteer servers pre-configured
- **Security Hooks** - Secrets scanning, bash allowlist, filesystem restrictions

📖 **[Full v3.1.0 changelog →](CHANGELOG_v3.1.0.md)**
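
The retry-and-skip behavior listed above can be illustrated with a minimal sketch. This is not the harness's actual implementation: `run_with_retries` and the `attempt` callback are hypothetical names, and the only detail taken from this README is the default of 3 attempts.

```python
from typing import Callable

MAX_RETRIES = 3  # mirrors the documented --max-retries default


def run_with_retries(
    feature: str,
    attempt: Callable[[str], bool],
    max_retries: int = MAX_RETRIES,
) -> str:
    """Try a feature up to max_retries times; return 'passed' or 'skipped'.

    `attempt` is a hypothetical callback that returns True when the feature
    was implemented and verified, and False (or raises) when it failed.
    """
    for attempt_number in range(1, max_retries + 1):
        try:
            if attempt(feature):
                return "passed"
        except Exception as exc:
            # A real harness would record this in a structured error log.
            print(f"{feature}: attempt {attempt_number} raised {exc!r}")
    # All attempts failed: skip the feature so the run can move on.
    return "skipped"
```

Returning a status instead of raising keeps one failing feature from stopping the whole run, which is the point of skipping after the maximum number of failures.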

## Important Timing Expectations

> **Warning: This demo takes a long time to run!**

- **First session (initialization):** The agent generates a `feature_list.json` with 200 test cases. This takes several minutes and may appear to hang; that is normal while the agent writes out all the features.

- **Subsequent sessions:** Each coding iteration can take **5-15 minutes** depending on complexity.

- **Full app:** Building all 200 features typically requires **many hours** of total runtime across multiple sessions.

**Tip:** The 200-feature target in the prompts is designed for comprehensive coverage. For a faster demo, edit `prompts/initializer_prompt.md` to reduce the feature count (e.g., 20-50 features).

## How It Works

### Two-Agent Pattern

1. **Initializer Agent (Session 1):** Reads `app_spec.txt`, creates `feature_list.json` with 200 test cases, sets up project structure, and initializes git.

2. **Coding Agent (Sessions 2+):** Picks up where the previous session left off, implements features one by one, and marks them as passing in `feature_list.json`.
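
As a rough illustration of the two-agent split, the sketch below picks a prompt based on whether `feature_list.json` already exists in the project directory. Treating that file's presence as the signal for "first session or not" is an assumption made for illustration, and `choose_prompt` is a hypothetical helper, not a function from this package.

```python
from pathlib import Path


def choose_prompt(project_dir: Path) -> str:
    """Return the prompt file a session would use, based on persisted state.

    Assumption: a project without feature_list.json has not been initialized
    yet, so it gets the initializer prompt; otherwise the coding prompt.
    """
    feature_list = project_dir / "feature_list.json"
    if feature_list.exists():
        return "coding_prompt.md"
    return "initializer_prompt.md"
```

Because the decision depends only on files on disk, re-running the same command after an interruption would naturally continue with the coding prompt.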

### Session Management

- Each session runs with a fresh context window
- Progress is persisted via `feature_list.json` and git commits
- The agent auto-continues between sessions (3 second delay)
- Press `Ctrl+C` to pause; run the same command to resume
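
Since progress lives in `feature_list.json`, it can be inspected between sessions. The sketch below assumes the file is a JSON array of objects with a boolean `passes` field; that schema is an assumption and should be checked against the file the initializer actually generates. `count_passing` is a hypothetical helper.

```python
import json
from pathlib import Path


def count_passing(project_dir: Path) -> tuple[int, int]:
    """Return (passing, total) from feature_list.json, or (0, 0) if absent.

    Assumed schema: a JSON array of objects, each with a boolean "passes" key.
    """
    path = project_dir / "feature_list.json"
    if not path.exists():
        return 0, 0
    features = json.loads(path.read_text(encoding="utf-8"))
    # Count only entries explicitly marked as passing.
    passing = sum(1 for feature in features if feature.get("passes") is True)
    return passing, len(features)


if __name__ == "__main__":
    done, total = count_passing(Path("./my_project"))
    print(f"{done}/{total} features passing")
```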

## Security Model

This demo uses a defense-in-depth security approach (see `security.py` and `client.py`):

1. **OS-level Sandbox:** Bash commands run in an isolated environment
2. **Filesystem Restrictions:** File operations restricted to the project directory only
3. **Bash Allowlist:** Only specific commands are permitted:
   - File inspection: `ls`, `cat`, `head`, `tail`, `wc`, `grep`
   - Node.js: `npm`, `node`
   - Version control: `git`
   - Process management: `ps`, `lsof`, `sleep`, `pkill` (dev processes only)

Commands not in the allowlist are blocked by the security hook.
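
A minimal sketch of an allowlist check is shown below. It is not the code in `security.py`: the `is_allowed` function is hypothetical, the set contents are copied from the list above, and the sketch does not implement the extra "dev processes only" restriction on `pkill`. It tokenizes the command line with the standard `shlex` module and requires every segment of a pipeline or chain to start with an allowlisted program.

```python
import shlex

# Copied from the README's allowlist; the real ALLOWED_COMMANDS may differ.
ALLOWED_COMMANDS = {
    "ls", "cat", "head", "tail", "wc", "grep",
    "npm", "node", "git",
    "ps", "lsof", "sleep", "pkill",
}

# Operators that start a new command segment.
SEPARATORS = {"|", "||", "&&", ";", "&"}


def is_allowed(command: str) -> bool:
    """Return True only if every command segment starts with an allowed program."""
    # Conservatively reject command substitution instead of parsing it.
    if "$(" in command or "`" in command:
        return False
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes
        return False

    expect_program = True
    seen_program = False
    for token in tokens:
        if token in SEPARATORS:
            expect_program = True
            continue
        if expect_program:
            if token not in ALLOWED_COMMANDS:
                return False
            expect_program = False
            seen_program = True
    return seen_program
```

Checking each segment rather than only the first word means a chain such as `git status && rm -rf /` is rejected even though it begins with an allowed command.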

## Project Structure

```
claude-harness/
├── autonomous_agent.py       # Main entry point
├── agent.py                  # Agent session logic
├── client.py                 # Claude SDK client configuration
├── security.py               # Bash command allowlist and validation
├── progress.py               # Progress tracking utilities
├── prompts.py                # Prompt loading utilities
├── prompts/
│   ├── app_spec.txt          # Application specification
│   ├── initializer_prompt.md # First session prompt
│   └── coding_prompt.md      # Continuation session prompt
└── requirements.txt          # Python dependencies
```

## Generated Project Structure

After running, your project directory will contain:

```
my_project/
├── feature_list.json         # Test cases (source of truth)
├── app_spec.txt              # Copied specification
├── init.sh                   # Environment setup script
├── claude-progress.txt       # Session progress notes
├── .claude_settings.json     # Security settings
└── [application files]       # Generated application code
```

## Running the Generated Application

After the agent completes (or pauses), you can run the generated application:

```bash
cd generations/my_project

# Run the setup script created by the agent
./init.sh

# Or manually (typical for Node.js apps):
npm install
npm run dev
```

The application will typically be available at `http://localhost:3000` or similar (check the agent's output or `init.sh` for the exact URL).

## Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--project-dir` | Directory for the project | `./autonomous_demo_project` |
| `--mode` | Mode: greenfield/enhancement/bugfix | `greenfield` |
| `--spec` | Specification file path | None |
| `--max-iterations` | Max agent iterations | Unlimited |
| `--model` | Claude model to use | `claude-sonnet-4-5-20250929` |
| `--session-timeout` | Session timeout (minutes) | 120 |
| `--stall-timeout` | Stall timeout (minutes) | 10 |
| `--max-retries` | Max retry attempts per feature | 3 |
| `--version` | Show version and exit | - |
| `--help` | Show help and exit | - |

📖 **[Full command reference in User Guide →](USER_GUIDE.md#command-reference)**
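
For readers who want to script around the CLI, the table above maps naturally onto an `argparse` definition. The sketch below is only an illustration of the documented options and defaults; the actual parser in the package may declare them differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="claude-harness")
    parser.add_argument("--project-dir", default="./autonomous_demo_project")
    parser.add_argument(
        "--mode",
        choices=["greenfield", "enhancement", "bugfix"],
        default="greenfield",
    )
    parser.add_argument("--spec", default=None)
    # None stands in for "unlimited" iterations in this sketch.
    parser.add_argument("--max-iterations", type=int, default=None)
    parser.add_argument("--model", default="claude-sonnet-4-5-20250929")
    parser.add_argument("--session-timeout", type=int, default=120)  # minutes
    parser.add_argument("--stall-timeout", type=int, default=10)  # minutes
    parser.add_argument("--max-retries", type=int, default=3)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```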

## Customization

### Changing the Application

Edit `prompts/app_spec.txt` to specify a different application to build.

### Adjusting Feature Count

Edit `prompts/initializer_prompt.md` and change the "200 features" requirement to a smaller number for faster demos.

### Modifying Allowed Commands

Edit `security.py` to add or remove commands from `ALLOWED_COMMANDS`.

## Troubleshooting

**"Appears to hang on first run"**
This is normal. The initializer agent is generating 200 detailed test cases, which takes several minutes. Watch for `[Tool: ...]` output to confirm the agent is working.

**"Command blocked by security hook"**
The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to `ALLOWED_COMMANDS` in `security.py`.

**"OAuth token not set"**
Run `claude setup-token` to generate your token, then ensure `CLAUDE_CODE_OAUTH_TOKEN` is exported in your shell environment.

## License

MIT, as declared in the package metadata.
