Metadata-Version: 2.4
Name: docpilot-cli
Version: 1.0.4
Summary: A local-first RAG pipeline CLI tool
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bs4>=0.0.2
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: langchain>=0.2.0
Requires-Dist: langchain-chroma>=0.1.0
Requires-Dist: langchain-ollama>=0.1.0
Requires-Dist: ollama>=0.2.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: typer>=0.12.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: tomli-w>=1.0.0
Requires-Dist: pypdf>=5.0.0
Requires-Dist: pyfiglet>=0.8.0
Requires-Dist: rich>=13.0.0
Dynamic: license-file

# 🚀 [DocPilot CLI](https://pypi.org/project/docpilot-cli/)

[![PyPI - Version](https://img.shields.io/pypi/v/docpilot-cli?color=blue&style=flat-square)](https://pypi.org/project/docpilot-cli/)
![Python Version](https://img.shields.io/badge/python-3.12%2B-blue?style=flat-square)
![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)

**DocPilot** is a lightning-fast, local-first CLI for document ingestion and interactive question-answering. Powered by [Ollama](https://ollama.com/) and Chroma, it lets you ingest websites, PDFs, and CSVs directly from your terminal and chat with your documents, while keeping all of your data on your own machine.

It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.

---

## ✨ Features

- **🔒 100% Local**: No data ever leaves your machine. Powered by Ollama.
- **⚡ Interactive Setup Wizard**: Get up and running instantly with smart model auto-detection.
- **🌐 Universal Ingestion**: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
- **🚀 Concurrent Processing**: Lightning-fast crawling and multi-threaded document chunking.
- **🎛️ Performance Profiles**: Switch between `fast`, `balanced`, and `quality` inference speeds on the fly.
- **🎨 Beautiful Terminal UI**: Rich markdown rendering, ASCII art, and intuitive progress bars.

---

## 📦 Installation

DocPilot is available on PyPI! You can install it globally using `pip`, `uv`, or `pipx`.

```bash
# Recommended: install as an isolated tool with uv or pipx (choose one)
uv tool install docpilot-cli
pipx install docpilot-cli

# Or via standard pip
pip install docpilot-cli

# PDF parsing (pypdf) is a core dependency, so no extra is required
```

### Prerequisites

1. **Python 3.12+**
2. **[Ollama](https://ollama.com/)**: Installed and running in the background.
3. Pull your preferred models:

```bash
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m
```
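
To confirm that both models are available before the first run, a small check against Ollama's local HTTP API can help. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and reads the `/api/tags` endpoint, which lists locally pulled models. The helper names `missing_models` and `check_ollama` are illustrative and are not part of DocPilot.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama runs on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODELS = ["qwen2.5:latest", "mxbai-embed-large:335m"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {model.get("name") for model in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def check_ollama(url: str = OLLAMA_TAGS_URL) -> list[str]:
    """Query the local Ollama server and report which required models still need pulling."""
    with urlopen(url, timeout=5) as response:
        payload = json.load(response)
    return missing_models(payload, REQUIRED_MODELS)
```

Calling `check_ollama()` with Ollama running should return an empty list once both `ollama pull` commands have completed.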

---

## 🛠️ Quick Start

The very first time you run a DocPilot command, it will launch the **Interactive Setup Wizard** to help you configure your chat and embedding models.

### 1. Ingest Knowledge

Point DocPilot to any documentation site, sitemap, PDF, or CSV:

```bash
# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16

# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"

# Ingest a CSV
docpilot ingest "./data/faq.csv"
```
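
To make the `--max-pages` and `--workers` options more concrete, here is a minimal sketch of how a bounded, concurrent crawl could work. It is not DocPilot's actual crawler: the `crawl` function and the `fetch` callback signature are hypothetical, and the network layer is injected so the logic can be read (and tested) on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical callback: fetch(url) -> (page_text, outgoing_links)
Fetch = Callable[[str], tuple[str, list[str]]]


def crawl(start_url: str, fetch: Fetch, max_pages: int = 100, workers: int = 16) -> dict[str, str]:
    """Breadth-first crawl that fetches each frontier level in parallel."""
    pages: dict[str, str] = {}
    seen = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            # Never schedule more fetches than the remaining page budget.
            batch = frontier[: max_pages - len(pages)]
            next_frontier: list[str] = []
            for url, (text, links) in zip(batch, pool.map(fetch, batch)):
                pages[url] = text
                for link in links:
                    if link not in seen:  # skip URLs already queued or fetched
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return pages
```

The `seen` set prevents link cycles from being fetched twice, and the worker count caps how many requests are in flight at once.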

> [!TIP]
> If you installed via standard `pip` and get a "command not found" error because the install location isn't on your `PATH`, you can always run DocPilot by prefixing commands with `python -m docpilot` (e.g., `python -m docpilot ingest ...`).

### 2. Ask Questions

Query your newly created local knowledge base:

```bash
docpilot ask "How do I create a virtual environment?"
```
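
Conceptually, answering a question means embedding it, ranking stored chunks by similarity, and passing the best matches to the chat model as context. The sketch below illustrates that idea with the standard library only. The word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for the vector database, and the prompt wording is an assumption rather than DocPilot's actual system prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"
```

In a real run, the resulting prompt would be sent to the configured Ollama chat model.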

---

## 🧰 CLI Command Reference

Manage your configuration, models, and local database with ease.

### `docpilot setup`

Re-run the interactive setup wizard at any time to change your default models.

### `docpilot project`

Switch to a different project or check the active project. Each project has its own isolated vector database.

```bash
# Show the active project
docpilot project

# Switch to a different project
docpilot project <project-name>
```

### `docpilot clear`

Wipe your local Chroma vector database to start fresh. Prompts for safety confirmation.

### `docpilot speed [profile]`

Adjust the retrieval and generation settings for your desired use case.

- `fast`: Lower latency, shorter context limits.
- `balanced`: Default trade-off.
- `quality`: Larger context, more comprehensive answers, slower inference.
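
One way to picture the profiles is as named bundles of retrieval and generation settings. The sketch below is purely illustrative: the `Profile` fields and every numeric value are hypothetical, chosen only to show the latency-versus-context trade-off described above, and do not reflect DocPilot's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    top_k: int    # number of chunks retrieved per question
    num_ctx: int  # context window size requested from the model


# Hypothetical values for illustration only; not DocPilot's real settings.
PROFILES = {
    "fast": Profile(top_k=2, num_ctx=2048),
    "balanced": Profile(top_k=4, num_ctx=4096),
    "quality": Profile(top_k=8, num_ctx=8192),
}


def resolve_profile(name: str = "balanced") -> Profile:
    """Look up a profile by name, defaulting to the balanced trade-off."""
    try:
        return PROFILES[name.lower()]
    except KeyError:
        raise ValueError(f"unknown profile {name!r}; choose from {sorted(PROFILES)}") from None
```

Retrieving fewer chunks and using a smaller context window generally lowers latency, at the cost of giving the model less material to draw on.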

### `docpilot model`

Manually manage your Ollama models without the interactive setup wizard.

```bash
# List available models
docpilot model list

# Set the chat model
docpilot model set <chat-model>

# Set the embedding model
docpilot model setembed <embedding-model>
```

### `docpilot render`

Parse and beautifully render any markdown file or text string directly in your terminal.

### `docpilot show`

Display your current project version and configuration in beautiful ASCII art.

---

## 🏗️ Architecture Under the Hood

DocPilot employs an optimized RAG (Retrieval-Augmented Generation) pipeline:

1. **Ingestion**: Native Python extractors (BeautifulSoup4, `csv`, `pypdf`) parse the raw data.
2. **Chunking**: Multi-threaded chunkers slice the documents into semantically coherent pieces.
3. **Embedding**: `langchain-ollama` creates local vector embeddings via Ollama.
4. **Storage**: Chroma (via `langchain-chroma`) stores vectors persistently on disk at `~/.docpilot/chroma_langchain_db`.
5. **Retrieval**: User queries are embedded, matched via similarity search, and fed into a system prompt.
6. **Generation**: The designated Ollama chat model generates the final response, which is streamed to the terminal and rendered with `rich`.
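
The chunking step can be sketched with the standard library alone. The example below uses a simple fixed-size splitter with overlap, run across documents in a thread pool. It is an assumption-laden illustration: DocPilot's own chunker may split on different boundaries, and the names `chunk_text` and `chunk_documents`, along with the default sizes, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_documents(docs: dict[str, str], workers: int = 4, **kwargs) -> dict[str, list[str]]:
    """Chunk every document in a thread pool, keyed by document name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda text: chunk_text(text, **kwargs), docs.values())
        return dict(zip(docs.keys(), results))
```

The overlap means the text at each window boundary appears in both neighboring chunks, so a sentence cut by the boundary is less likely to be lost at retrieval time.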

---

## 🤝 Contributing

Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.

To set up a local development environment:

```bash
git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest
```

---

## 📄 License

This project is licensed under the MIT License.
