Metadata-Version: 2.4
Name: pgagent-llm
Version: 0.1.0
Summary: Lightweight multi-agent runtime for Postgres teams with planning and resilience
Author-email: Siddharth7786 <siddharthbalaji99@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-genai>=0.3.0
Requires-Dist: asyncpg>=0.30.0
Requires-Dist: sqlparse>=0.5.0
Requires-Dist: sqlglot>=25.0.0
Requires-Dist: redis>=5.0.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: numexpr>=2.9.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: mypy>=1.10.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: safety>=3.2.0; extra == "dev"
Requires-Dist: twine>=5.0.0; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Dynamic: license-file

# PG Agent

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/siddharth7786/pg-agent/actions/workflows/ci.yml/badge.svg)](https://github.com/siddharth7786/pg-agent/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/pgagent-llm.svg)](https://pypi.org/project/pgagent-llm/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

PG Agent is a multi-agent runtime for PostgreSQL that turns natural language into secure, executable query plans. It supports multi-turn conversations, schema-aware RAG, self-correcting execution, and result critique — all with zero infrastructure required beyond a Postgres database.

---

## Architecture

```
User Question
     │
     ▼
 Supervisor (coordinator.py)
     │
     ├── RAG Pipeline ──► LongTermMemory (pgvector)
     │       └── Retrieves relevant schema context
     │
     ├── PlanningAgent ──► Gemini 2.5 Flash
     │       └── Decomposes question into typed steps (SQL / ANALYZE / SEARCH / VERIFY)
     │
     ├── ExecutionAgent
     │       ├── SecureSQLTool   — validates + runs SELECT queries
     │       ├── AnalyticsTool   — pandas/numexpr for math & aggregation
     │       └── SearchTool      — web search for external context
     │
     └── CriticAgent ──► Gemini 2.5 Flash
             └── Post-execution result validation
```

**Key properties:**
- **Multi-turn chat** — conversation history is passed to the planner on every turn
- **Schema-aware RAG** — your table/column schema is embedded in pgvector and retrieved per query
- **Self-correcting** — failed SQL steps are re-planned up to 2 times automatically
- **Read-only enforcement** — all queries are parsed with sqlglot; `DROP`, `DELETE`, `UPDATE`, etc. are blocked at the tool level
- **No LIMIT required** — the SQL validator automatically adds `LIMIT 1000` to unconstrained queries

---

## Quick Start

### 1. Install

```bash
pip install pgagent-llm
```

Or from source:

```bash
git clone https://github.com/siddharth7786/pg-agent
cd pg-agent
pip install -e .
```

### 2. Configure Environment

Create a `.env` file or export these variables:

```env
# Required
GEMINI_API_KEY=your_gemini_api_key
DATABASE_URL=postgresql://user:password@localhost:5432/your_db

# Optional — in-memory state is used if not set
REDIS_URL=redis://localhost:6379
```

### 3. First-Time Setup

Run once to create the `long_term_memory` table and enable the required PostgreSQL extensions (`pgvector`, `pg_trgm`):

```bash
pg-agent setup
```

### 4. Index Your Schema

Scan your database schema and store it in pgvector so the planner can write accurate SQL:

```bash
pg-agent --index ask "What tables do I have?"
```

Re-run with `--index` any time your schema changes.

---

## CLI Reference

### Commands

| Command | Description |
| :--- | :--- |
| `pg-agent setup` | One-time setup: enables pgvector/pg_trgm and creates memory tables |
| `pg-agent ask "<question>"` | Ask a single question and get a result |
| `pg-agent chat` | Start an interactive multi-turn session |

### Global Flags

| Flag | Description |
| :--- | :--- |
| `--index` | Re-index the database schema before running |
| `--verbose`, `-v` | Show full internal logs (planning, execution, critique) |

### `ask` Flags

| Flag | Default | Description |
| :--- | :--- | :--- |
| `--pattern` | `SEQUENTIAL` | Orchestration pattern: `SEQUENTIAL` or `PARALLEL` |
| `--retries` | `1` | Max planning retries on execution failure |
| `--redis` | `$REDIS_URL` | Override Redis URL |

### `chat` Flags

| Flag | Default | Description |
| :--- | :--- | :--- |
| `--pattern` | `SEQUENTIAL` | Orchestration pattern: `SEQUENTIAL` or `PARALLEL` |
| `--redis` | `$REDIS_URL` | Override Redis URL |

### Examples

```bash
# Single question
pg-agent ask "Who are the top 5 customers by total payments?"

# Index schema and ask in one shot
pg-agent --index ask "What tables exist and how are they related?"

# With verbose logging
pg-agent -v ask "What is the total revenue per store?"

# Interactive chat
pg-agent chat

# Chat with verbose output
pg-agent -v chat

# Override Redis for this session
pg-agent chat --redis redis://localhost:6379
```

### Chat Session Commands

Inside `pg-agent chat`:

| Input | Action |
| :--- | :--- |
| Any question | Plan and execute against your database |
| `index` | Re-index the database schema |
| `exit` or `quit` | End the session |
| `Ctrl+C` | End the session |

---

## Python Library Usage

Embed PG Agent directly in your application:

```python
import asyncio
from pg_agent.orchestration.coordinator import Supervisor, CoordinationPattern

async def main():
    agent = Supervisor(
        api_key="your-gemini-key",
        db_url="postgresql://user:pass@localhost:5432/db",
        redis_url="redis://localhost:6379",  # optional
    )

    # Index schema once (or after migrations)
    await agent.index_database_schema()

    # Single question
    state = await agent.orchestrate(
        session_id="session-123",
        pattern=CoordinationPattern.SEQUENTIAL,
        question="What is the total revenue per store?"
    )

    print(state.plan.reasoning)
    for step in state.plan.steps:
        print(f"[{step.id}] {step.description}: {state.results.get(step.id)}")

asyncio.run(main())
```

For multi-turn conversations, reuse the same `session_id` across calls — the supervisor persists and passes conversation history automatically.

---

## How It Works

### Planning

The `PlanningAgent` sends your question (plus schema context from RAG) to Gemini and receives a structured JSON plan with typed steps:

| Step Type | Tool | When used |
| :--- | :--- | :--- |
| `SQL` | `SecureSQLTool` | Any database query |
| `ANALYZE` | `AnalyticsTool` | Math, aggregation, combining results |
| `SEARCH` | `SearchTool` | External web context |
| `VERIFY` | Internal | Checking prior step results |

### Execution

The `ExecutionAgent` runs steps with dependency resolution — steps with `depends_on` wait for their dependencies before starting. A circuit breaker halts execution after 5 failures in 60 seconds.

### Self-Correction

If a SQL step fails, the planner receives the error message and the failed query, and generates a corrected plan. This happens up to 2 times per question.

### Critique

After execution, the `CriticAgent` asks Gemini whether the results actually answer the original question. If not, a warning is logged (results are still returned).

### Schema RAG

When you run `--index`, `fetch_schema()` reads `information_schema` for all public tables, columns, and foreign key relationships. Each table and relationship is embedded with `gemini-embedding-001` (768 dimensions) and stored in pgvector. On each question, the top-10 most relevant schema chunks are retrieved and injected into the planner's system prompt.

### State Persistence

Session state (plan, results, conversation history) is stored as JSON, either in Redis (if `REDIS_URL` is set) or in-memory. The same `session_id` is reused across turns in `chat` mode, enabling multi-turn context while resetting the plan and results for each new question.

---

## Database Requirements

| Requirement | Notes |
| :--- | :--- |
| PostgreSQL 14+ | Required |
| pgvector extension | Installed by `pg-agent setup`; requires superuser on managed DBs |
| pg_trgm extension | Installed by `pg-agent setup`; used for hybrid keyword search |

### Manual Extension Install (if `setup` lacks permissions)

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
```

---

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/

# Type check
mypy src/

# Security scan
bandit -r src/
```

---

## License

MIT — see [LICENSE](LICENSE) for full text.
