Metadata-Version: 2.4
Name: repoguide-ai
Version: 0.1.0
Summary: Scan any codebase and generate onboarding docs automatically
Project-URL: Homepage, https://github.com/M33p5t3r/codescout-py
Project-URL: Repository, https://github.com/M33p5t3r/codescout-py
Author: Abram Manaka
License-Expression: MIT
License-File: LICENSE
Keywords: ai,cli,codebase,documentation,mcp,onboarding
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Documentation
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: openai>=1.0.0
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40.0; extra == 'anthropic'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Description-Content-Type: text/markdown

# RepoGuide

A CLI tool that scans any local codebase and generates a detailed onboarding document — the kind of walkthrough a senior dev would write for a new team member, produced automatically.

Works with **Anthropic (Claude)**, **OpenAI (GPT)**, or **Ollama (local models, no API key needed)**.

---

## What This Demonstrates

**Business problem solved:** Onboarding to a new codebase is slow. Architecture lives in someone's head, entry points aren't obvious, and setup instructions are outdated or missing. RepoGuide reads the repo and produces a structured markdown guide covering architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity.

**Why this matters for AI engineering:** This is AI applied to a real developer workflow — not a chatbot demo. It demonstrates MCP tool orchestration, structured LLM output generation, a provider abstraction that makes the LLM swappable, and a two-pass analysis strategy that keeps token usage efficient by scanning first and reading selectively.

---

## How It Works

RepoGuide uses a two-pass architecture:

**Pass 1 — Structural scan (no LLM, no tokens spent):**
An MCP server scans the directory tree, detects the tech stack from config files (package.json, requirements.txt, Cargo.toml, etc.), counts file types, and identifies likely entry points based on framework-specific knowledge.
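
The structural scan described above could look roughly like the sketch below. It is an illustration only, not the code in `repo_server.py`: the function name `scan_tree`, the `DEFAULT_IGNORES` set, and the shape of the returned dict are all assumptions.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical defaults; the real ignore list is not shown in this README.
DEFAULT_IGNORES = frozenset({".git", "node_modules", "__pycache__", ".venv"})


def scan_tree(root: str, ignores=DEFAULT_IGNORES) -> dict:
    """Walk `root`, skip ignored folders, and summarise the files found."""
    counts = Counter()
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored folders.
        dirnames[:] = sorted(d for d in dirnames if d not in ignores)
        for name in sorted(filenames):
            files.append(Path(dirpath, name).relative_to(root).as_posix())
            counts[Path(name).suffix or "(no extension)"] += 1
    return {"files": files, "file_types": dict(counts)}
```

Nothing here calls a model, which is why this pass costs no tokens.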

**Pass 2 — Selective deep read (targeted, token-efficient):**
Based on what Pass 1 found, the agent reads only the files that matter — entry points, config files, READMEs. The LLM receives the full tree structure plus contents of key files, not the entire codebase.
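
One way to express "read only the files that matter" is a small selection function. This is a sketch under assumptions: `select_files`, `CONFIG_NAMES`, and the `limit` parameter are hypothetical names, and the actual selection rules may differ.

```python
from pathlib import PurePosixPath

# Hypothetical set, taken from the config files named in Pass 1.
CONFIG_NAMES = {"package.json", "requirements.txt", "Cargo.toml"}


def select_files(all_files, entry_points, limit=20):
    """Pick entry points first, then READMEs and config files, without duplicates."""
    present = set(all_files)
    picked = []

    def add(path):
        # Only keep paths that exist in the scan, are new, and fit the budget.
        if path in present and path not in picked and len(picked) < limit:
            picked.append(path)

    for path in entry_points:
        add(path)
    for path in all_files:
        name = PurePosixPath(path).name
        if name.lower().startswith("readme") or name in CONFIG_NAMES:
            add(path)
    return picked
```

Capping the list with `limit` keeps the prompt size bounded even for very large repositories.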

**Generation:**
Everything gathered is sent to the configured LLM provider with a structured system prompt. The output is a comprehensive `ONBOARDING.md` written in a direct, specific tone with actual file names, function references, and code patterns.
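
As a rough illustration of what "everything gathered" might look like as a single message, the sketch below joins the file tree and the selected file contents. The function name `build_user_message` and the section headings are assumptions; the real prompt layout is not documented here.

```python
def build_user_message(tree, contents):
    """Combine the full file tree with the contents of the selected files."""
    parts = ["## Repository tree", *tree, "", "## Key files"]
    for path, text in contents.items():
        # One labelled section per file so the model can cite real file names.
        parts += [f"### {path}", text.rstrip(), ""]
    return "\n".join(parts)
```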

---

## Quick Start

### 1. Install

**From PyPI:**
```bash
pip install repoguide-ai
```

**From source:**
```bash
git clone https://github.com/M33p5t3r/codescout-py.git
cd codescout-py
pip install .
```

### 2. Run RepoGuide

**With Anthropic (default):**
```bash
# Windows (cmd.exe)
set ANTHROPIC_API_KEY=sk-ant-...
# Mac/Linux
export ANTHROPIC_API_KEY=sk-ant-...
repoguide
```

**With OpenAI:**
```bash
# Windows (cmd.exe)
set OPENAI_API_KEY=sk-...
# Mac/Linux
export OPENAI_API_KEY=sk-...
repoguide --provider openai
```

**With Ollama (local, no API key):**
```bash
ollama pull llama3.1                      # Download a model first
repoguide --provider ollama
repoguide --provider ollama --model mistral   # Use a different model
```

### 3. Follow the Prompts

You'll be asked for:
- **Repository path** — absolute path to any local repo
- **Ignore patterns** — optional comma-separated folder names to skip

RepoGuide scans the repo, shows what it detected, asks you to confirm, then generates and saves `ONBOARDING.md` in the target repo's root.

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                  CLI (cli.py)                        │
│                                                     │
│  1. Get repo path from user                         │
│  2. Connect to MCP server                           │
│  3. Call scan_repo → display detected stack          │
│  4. Confirm with user                               │
│  5. Call detect_entry_points → get reading list      │
│  6. Call read_file on each entry point               │
│  7. Send everything to LLM → generate markdown       │
│  8. Save ONBOARDING.md to the target repo            │
└────────────┬──────────────────────┬─────────────────┘
             │ MCP Protocol         │ LLM Call
             ▼                      ▼
┌────────────────────────┐  ┌──────────────────────────┐
│  MCP Server            │  │  Provider (providers.py)  │
│  (repo_server.py)      │  │                          │
│                        │  │  AnthropicProvider       │
│  scan_repo             │  │  OpenAIProvider          │
│  read_file             │  │  OllamaProvider          │
│  detect_entry_points   │  │                          │
└────────────────────────┘  └──────────────────────────┘
```

### MCP Server Tools

| Tool | Purpose | Token Cost |
|------|---------|-----------|
| `scan_repo` | Walk directory tree, detect stack from config files, count file types | Zero (pure Python) |
| `read_file` | Read a specific file, capped at a maximum number of lines | Proportional to file size |
| `detect_entry_points` | Suggest key files based on detected frameworks | Zero (pure Python) |
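
The line guard on `read_file` can be sketched as follows. This is an assumed shape, not the server's actual tool: the name `read_file_guarded`, the default of 200 lines, and the truncation notice are all hypothetical.

```python
from pathlib import Path


def read_file_guarded(path, max_lines=200):
    """Return at most `max_lines` lines, noting how many were left out."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    # Tell the model the file was cut so it does not assume it saw everything.
    return "\n".join(lines[:max_lines]) + f"\n... [{omitted} more lines omitted]"
```

The guard is what keeps the token cost of a single read bounded, even when an entry point turns out to be a very long file.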

### LLM Providers

| Provider | Command | API Key Required | Best For |
|----------|---------|-----------------|----------|
| Anthropic | `--provider anthropic` | Yes (`ANTHROPIC_API_KEY`) | Best output quality (default) |
| OpenAI | `--provider openai` | Yes (`OPENAI_API_KEY`) | Alternative cloud provider |
| Ollama | `--provider ollama` | No | Offline use, privacy, free |

Override the default model with `--model`:
```bash
repoguide --provider anthropic --model claude-opus-4-6
repoguide --provider openai --model gpt-4o-mini
repoguide --provider ollama --model codellama
```

### Supported Frameworks

The stack detection recognizes: Next.js, React, Vue, Nuxt, SvelteKit, Express, NestJS, FastAPI, Flask, Django, Streamlit, Astro, Remix, Gatsby, and generic Node/Python projects. Adding a new framework means adding entries to the detection maps — no logic changes needed.
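
A detection map in this style might look like the sketch below. The map names, their contents, and both helper functions are assumptions made for illustration; the real maps live in the MCP server and may be structured differently.

```python
import json
import re

# Hypothetical maps: dependency name -> framework label.
NODE_FRAMEWORKS = {"next": "Next.js", "react": "React", "vue": "Vue", "express": "Express"}
PYTHON_FRAMEWORKS = {"fastapi": "FastAPI", "flask": "Flask", "django": "Django"}


def detect_node(package_json_text):
    """Match dependencies in a package.json against the Node map."""
    data = json.loads(package_json_text)
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    return [label for dep, label in NODE_FRAMEWORKS.items() if dep in deps]


def detect_python(requirements_text):
    """Match package names in a requirements.txt against the Python map."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line:
            # Keep the package name; drop version specifiers and extras.
            names.add(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower())
    return [label for dep, label in PYTHON_FRAMEWORKS.items() if dep in names]
```

With this layout, supporting another framework is a one-line change such as `NODE_FRAMEWORKS["astro"] = "Astro"`, and the two functions stay untouched.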

---

## Example Output

Run RepoGuide against any local repo to generate a full `ONBOARDING.md`. The output covers architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity — written in a direct, specific tone with actual file names and function references.

---

## Adding a New Provider

1. Create a class in `providers.py` that inherits from `LLMProvider`
2. Implement `generate(system_prompt, user_message, max_tokens) -> str`
3. Implement `validate_config() -> str | None`
4. Add it to the `PROVIDERS` dict

The MCP server, scanning logic, and output format are all provider-agnostic.

---

## Design Principles

RepoGuide uses the MCP client/server pattern (FastMCP framework) applied to a developer tooling problem. The provider abstraction demonstrates clean separation between orchestration logic and model calls — the LLM is a swappable component, not the product.
