Metadata-Version: 2.4
Name: iflow-mcp_zilbonn-owasp-wstg-rag
Version: 0.1.0
Summary: A Retrieval-Augmented Generation (RAG) system that indexes the OWASP Web Security Testing Guide (WSTG) into a vector database, providing instant access to security testing methodologies via MCP (Model Context Protocol) for Claude Code integration.
Project-URL: Homepage, https://github.com/zilbonn/OWASP-WSTG-Rag
Project-URL: Repository, https://github.com/zilbonn/OWASP-WSTG-Rag
Project-URL: Documentation, https://github.com/zilbonn/OWASP-WSTG-Rag#readme
Project-URL: Issues, https://github.com/zilbonn/OWASP-WSTG-Rag/issues
Author-email: zilbonn <zilbonn@example.com>
License: Creative Commons Attribution-ShareAlike 4.0
Keywords: mcp,owasp,rag,security,web-security-testing,wstg
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: chromadb>=0.4.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: mcp>=1.0.0
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: flake8; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: openai>=1.0.0; extra == 'embeddings'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'embeddings'
Description-Content-Type: text/markdown

# OWASP WSTG RAG

A Retrieval-Augmented Generation (RAG) system that indexes the **OWASP Web Security Testing Guide (WSTG)** into a vector database, providing instant access to security testing methodologies via REST API and MCP (Model Context Protocol) for Claude Code integration.

## Features

- **Complete WSTG Coverage** - All 12 WSTG testing categories indexed and searchable
- **Semantic Search** - Find relevant testing methodologies using natural language queries
- **MCP Integration** - Direct integration with Claude Code for AI-assisted penetration testing
- **REST API** - HTTP endpoints for programmatic access
- **WSTG ID Lookup** - Retrieve complete test cases by WSTG identifier (e.g., `WSTG-INPV-05`)

## WSTG Categories

| Category | WSTG ID | Description |
|----------|---------|-------------|
| Information Gathering | WSTG-INFO | Fingerprinting, enumeration, mapping |
| Configuration | WSTG-CONF | Server/platform configuration testing |
| Identity Management | WSTG-IDNT | User registration, account provisioning |
| Authentication | WSTG-ATHN | Login, password policy, MFA testing |
| Authorization | WSTG-ATHZ | Privilege escalation, IDOR, access control |
| Session Management | WSTG-SESS | Session tokens, cookies, fixation |
| Input Validation | WSTG-INPV | SQLi, XSS, command injection, SSTI |
| Error Handling | WSTG-ERRH | Error messages, stack traces |
| Cryptography | WSTG-CRYP | TLS, encryption, hashing |
| Business Logic | WSTG-BUSL | Workflow bypass, file upload |
| Client-Side | WSTG-CLNT | DOM XSS, clickjacking, WebSockets |
| API Testing | WSTG-APIT | REST, GraphQL, API security |
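
Individual test cases extend these prefixes with a number, as in `WSTG-INPV-05`. The sketch below validates an ID and maps its prefix back to a category from the table. The `parse_wstg_id` helper, the `CATEGORIES` dictionary, and the assumed two-digit numbering are illustrative only and are not part of this project's code.

```python
import re

# Category prefixes taken from the table above.
CATEGORIES = {
    "INFO": "Information Gathering",
    "CONF": "Configuration",
    "IDNT": "Identity Management",
    "ATHN": "Authentication",
    "ATHZ": "Authorization",
    "SESS": "Session Management",
    "INPV": "Input Validation",
    "ERRH": "Error Handling",
    "CRYP": "Cryptography",
    "BUSL": "Business Logic",
    "CLNT": "Client-Side",
    "APIT": "API Testing",
}

# Assumed ID shape: WSTG-<4 letters>-<2 digits>, e.g. WSTG-INPV-05.
_WSTG_ID = re.compile(r"^WSTG-([A-Z]{4})-(\d{2})$")


def parse_wstg_id(wstg_id):
    """Return (category name, test number) or raise ValueError."""
    match = _WSTG_ID.match(wstg_id.strip().upper())
    if match is None:
        raise ValueError(f"not a WSTG test ID: {wstg_id!r}")
    prefix, number = match.groups()
    if prefix not in CATEGORIES:
        raise ValueError(f"unknown WSTG category prefix: {prefix}")
    return CATEGORIES[prefix], int(number)
```

Checking an ID this way before calling `/wstg/{id}` avoids a round trip to the server for obviously malformed input.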

## Quick Start

### 1. Install Dependencies

```bash
cd RAG_runner
pip install -r requirements.txt
```

### 2. Build the Database

```bash
python3 build_database.py
```

This will:
- Parse all OWASP WSTG HTML files
- Create semantic chunks for retrieval
- Build the ChromaDB vector database
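
To give a feel for the parse and chunk steps, here is a minimal sketch that splits an HTML page into one chunk per heading. It is not the project's `wstg_parser.py` or `chunker.py`: it uses the standard library's `html.parser` instead of BeautifulSoup to stay self-contained, and both the heading-based splitting strategy and the names `SectionSplitter` and `chunk_html` are assumptions made for illustration.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect the text under each h1-h3 heading as one chunk (hypothetical strategy)."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.chunks = []            # finished {"heading", "text"} dicts
        self._heading = "Preamble"  # label for text before the first heading
        self._in_heading = False
        self._buffer = []

    def _flush(self):
        # Normalize whitespace and store the section if it has any text.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.chunks.append({"heading": self._heading, "text": text})
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._flush()           # a new heading closes the previous section
            self._heading = ""
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._heading = " ".join(self._heading.split())
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._heading += data
        else:
            self._buffer.append(data)


def chunk_html(html):
    """Return a list of heading-delimited chunks for one HTML document."""
    splitter = SectionSplitter()
    splitter.feed(html)
    splitter.close()
    splitter._flush()               # store the final section
    return splitter.chunks
```

Each resulting dictionary could then be embedded and stored in the vector database, with the heading kept as metadata for filtering.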

### 3. Start the Server

```bash
python3 -m server.http_server
```

The server listens on `http://localhost:5004`.

### 4. Test the API

```bash
# Health check
curl http://localhost:5004/health

# Search for SQL injection testing
curl -X POST http://localhost:5004/search \
  -H "Content-Type: application/json" \
  -d '{"query": "SQL injection testing methodology"}'

# Get specific WSTG test case
curl http://localhost:5004/wstg/WSTG-INPV-05
```

## REST API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/info` | GET | Database statistics |
| `/list` | GET | List all documents |
| `/categories` | GET | List categories and WSTG IDs |
| `/doc/{id}` | GET | Get document by ID |
| `/wstg/{id}` | GET | Get all chunks for WSTG ID |
| `/search` | POST | Semantic search |
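
The GET endpoints can be wrapped in a few small helpers. The sketch below uses only the standard library and assumes that every endpoint returns a JSON body; the function names `endpoint_url`, `get_json`, and `get_test_case` are hypothetical and not part of this project.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5004"  # default address from the Quick Start


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path, percent-encoding the path."""
    return base_url.rstrip("/") + "/" + urllib.parse.quote(path.lstrip("/"))


def get_json(path, base_url=BASE_URL, timeout=10):
    """Send a GET request and decode the response (assumed to be JSON)."""
    url = endpoint_url(path, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def get_test_case(wstg_id, base_url=BASE_URL):
    """Fetch all chunks for one WSTG ID via GET /wstg/{id}."""
    return get_json(f"/wstg/{wstg_id}", base_url)
```

With the server running, `get_json("/health")` or `get_test_case("WSTG-INPV-05")` would issue the corresponding request from the table above.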

### Search Request Body

```json
{
  "query": "SQL injection testing",
  "n_results": 5,
  "category": "input_validation",
  "wstg_id": "WSTG-INPV-05"
}
```
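
Other examples in this README send only `query`, or `query` with `n_results`, so the filter fields appear to be optional. A minimal sketch of building the body under that assumption follows; the `build_search_body` helper and its default of 5 results are illustrative and not taken from the server code.

```python
def build_search_body(query, n_results=5, category=None, wstg_id=None):
    """Build the JSON body for POST /search, omitting unset filters."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if n_results < 1:
        raise ValueError("n_results must be at least 1")
    body = {"query": query.strip(), "n_results": n_results}
    # Only include filters the caller actually set.
    if category is not None:
        body["category"] = category
    if wstg_id is not None:
        body["wstg_id"] = wstg_id
    return body
```

The returned dictionary can be passed as the `json=` argument of an HTTP client such as httpx.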

## Claude Code Integration (MCP)

Add the following entry to `~/.claude.json`:

```json
{
  "mcpServers": {
    "owasp-wstg-rag": {
      "command": "python3",
      "args": ["/path/to/OWASP_WSTG_Rag/RAG_runner/server/mcp_client.py"],
      "env": {
        "WSTG_RAG_URL": "http://localhost:5004"
      }
    }
  }
}
```
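
If `~/.claude.json` already contains other settings, the entry has to be merged rather than pasted over the file. The sketch below shows one way to do that on an already-loaded configuration dictionary; `add_wstg_server` is a hypothetical helper, and it assumes the file is plain JSON with an optional top-level `mcpServers` object.

```python
import copy


def add_wstg_server(config, script_path, rag_url="http://localhost:5004"):
    """Return a copy of `config` with the owasp-wstg-rag MCP entry added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers["owasp-wstg-rag"] = {
        "command": "python3",
        "args": [script_path],
        "env": {"WSTG_RAG_URL": rag_url},
    }
    return updated
```

In practice the dictionary would be read with `json.load`, passed through this function with the real path to `mcp_client.py`, and written back with `json.dump`.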

### MCP Tools

| Tool | Description |
|------|-------------|
| `search_wstg` | Search WSTG for testing methodologies |
| `search_test_methodology` | Search for how-to testing guides |
| `search_test_objectives` | Search for test objectives |
| `get_wstg_test_case` | Get complete test case by WSTG ID |
| `get_wstg_document` | Get document by ID |
| `list_wstg_categories` | List all categories and WSTG IDs |
| `wstg_health` | Health check |
| `wstg_info` | Database statistics |

### Example Usage in Claude Code

```python
# Search for SQL injection testing methodology
search_wstg("SQL injection testing methodology")

# Get specific test case
get_wstg_test_case("WSTG-INPV-05")

# Search within a category
search_wstg("authentication bypass", category_filter="authentication")

# Get test objectives for IDOR
search_test_objectives("IDOR insecure direct object reference")
```

## Project Structure

```
OWASP_WSTG_Rag/
├── README.md
├── CLAUDE.md                    # Claude Code project guide
├── raw_data/                    # OWASP WSTG HTML source files
│   ├── 01-Information_Gathering/
│   ├── 02-Configuration_and_Deployment_Management_Testing/
│   ├── 03-Identity_Management_Testing/
│   ├── 04-Authentication_Testing/
│   ├── 05-Authorization_Testing/
│   ├── 06-Session_Management_Testing/
│   ├── 07-Input_Validation_Testing/
│   ├── 08-Testing_for_Error_Handling/
│   ├── 09-Testing_for_Weak_Cryptography/
│   ├── 10-Business_Logic_Testing/
│   ├── 11-Client-side_Testing/
│   └── 12-API_Testing/
└── RAG_runner/
    ├── build_database.py        # Main build pipeline
    ├── requirements.txt
    ├── parsers/
    │   └── wstg_parser.py       # HTML parser for WSTG
    ├── chunking/
    │   └── chunker.py           # Semantic chunking
    ├── server/
    │   ├── vector_store.py      # ChromaDB wrapper
    │   ├── http_server.py       # REST API server
    │   └── mcp_client.py        # MCP tools for Claude Code
    └── data/
        ├── processed/           # Intermediate JSON files
        └── chroma_db/           # Vector database
```

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    OWASP WSTG HTML Files                        │
│                      (raw_data/*.html)                          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                     wstg_parser.py                              │
│              Parse HTML → Structured JSON                       │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       chunker.py                                │
│              Create Semantic Chunks for RAG                     │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   ChromaDB Vector Store                         │
│                 (data/chroma_db/)                               │
└────────────────────────────┬────────────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              ▼                             ▼
┌──────────────────────────┐   ┌──────────────────────────┐
│    http_server.py        │   │    mcp_client.py         │
│    REST API :5004        │   │    MCP for Claude Code   │
│                          │   │                          │
│  GET  /health            │   │  search_wstg()           │
│  GET  /info              │   │  get_wstg_test_case()    │
│  GET  /wstg/{id}         │   │  search_test_methodology │
│  POST /search            │   │  list_wstg_categories()  │
└──────────────────────────┘   └──────────────────────────┘
```

## Use Cases

### AI-Assisted Penetration Testing

Integrate with Claude Code to get instant access to OWASP testing methodologies during security assessments:

```
User: "How do I test for SQL injection?"

Claude: [Queries WSTG RAG]
→ Returns WSTG-INPV-05 methodology with:
  - Test objectives
  - Step-by-step testing procedures
  - Example payloads
  - Tools to use
```

### Automated Security Testing

Use the REST API to integrate WSTG methodologies into automated security pipelines:

```python
import httpx  # already a declared dependency of this project

# Get the testing methodology for the current test
response = httpx.post('http://localhost:5004/search', json={
    'query': 'session fixation testing',
    'n_results': 3
})
response.raise_for_status()  # fail early on HTTP errors
methodology = response.json()['results']
```

### Security Training

Use the search tools as a quick reference for security testing methodologies during training or CTF challenges.

## Requirements

- Python 3.8+
- ChromaDB
- BeautifulSoup4
- httpx
- MCP SDK (for Claude Code integration)

## License

This project uses content from the [OWASP Web Security Testing Guide](https://owasp.org/www-project-web-security-testing-guide/), which is licensed under [Creative Commons Attribution-ShareAlike 4.0](https://creativecommons.org/licenses/by-sa/4.0/).

## Related Projects

- [OWASP WSTG](https://owasp.org/www-project-web-security-testing-guide/) - Source material
- [Claude Code](https://claude.ai/claude-code) - AI coding assistant with MCP support
- [ChromaDB](https://www.trychroma.com/) - Vector database for embeddings
