Metadata-Version: 2.4
Name: datascience-agent
Version: 0.4.0
Summary: AI Agent with dynamic planning and persistent Jupyter kernel execution for data analysis
Project-URL: Homepage, https://github.com/nmlemus/dsagent
Project-URL: Documentation, https://github.com/nmlemus/dsagent#readme
Project-URL: Repository, https://github.com/nmlemus/dsagent
Author: DSAgent Contributors
License-Expression: MIT
Keywords: agent,ai,autonomous-agent,data-analysis,datascience-agent,dsagent,jupyter,llm,machine-learning,planner
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: ipykernel>=6.0.0
Requires-Dist: jupyter-client>=8.0.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pycaret>=3.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: seaborn>=0.12.0
Requires-Dist: statsmodels>=0.14.0
Provides-Extra: api
Requires-Dist: fastapi>=0.100.0; extra == 'api'
Requires-Dist: sse-starlette>=1.0.0; extra == 'api'
Requires-Dist: uvicorn>=0.20.0; extra == 'api'
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# DSAgent

An AI-powered autonomous agent for data analysis with dynamic planning and persistent Jupyter kernel execution.

## Features

- **Dynamic Planning**: The agent creates and follows a plan, marking each step done (`[x]`) or pending (`[ ]`)
- **Persistent Execution**: Code runs in a Jupyter kernel with variable persistence
- **Multi-Provider LLM**: Supports OpenAI, Anthropic, Google, Ollama via LiteLLM
- **Notebook Generation**: Automatically generates clean, runnable Jupyter notebooks
- **Event Streaming**: Real-time events for UI integration
- **Comprehensive Logging**: Full execution logs for debugging and ML retraining
- **Session Management**: State persistence for multi-user scenarios
- **Human-in-the-Loop**: Configurable checkpoints for human approval and feedback
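
To make the `[x]`/`[ ]` step tracking concrete, here is a minimal sketch that parses a plan written in the numbered checkbox style used elsewhere in this README (for example `1. [ ] Better step`). The names `PlanStep`, `parse_plan`, and `progress` are hypothetical and are not part of the dsagent API; the package's own `PlanParser` may behave differently.

```python
import re
from dataclasses import dataclass

# Matches lines such as "1. [x] Load the data" or "2. [ ] Train a model".
STEP_PATTERN = re.compile(r"^\s*(\d+)\.\s*\[( |x|X)\]\s*(.+?)\s*$")


@dataclass
class PlanStep:
    """One numbered step of a plan (hypothetical helper type)."""
    number: int
    done: bool
    text: str


def parse_plan(raw_text: str) -> list[PlanStep]:
    """Extract numbered checkbox steps; lines in any other shape are ignored."""
    steps = []
    for line in raw_text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            number, mark, text = match.groups()
            steps.append(PlanStep(int(number), mark.lower() == "x", text))
    return steps


def progress(steps: list[PlanStep]) -> str:
    """Summarize completion as 'done/total'."""
    return f"{sum(step.done for step in steps)}/{len(steps)}"


plan = "1. [x] Load sales_data.csv\n2. [ ] Rank products by revenue"
print(progress(parse_plan(plan)))  # 1/2
```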

## Installation

Using pip:
```bash
pip install datascience-agent
```

With FastAPI support:
```bash
pip install "datascience-agent[api]"
```

Using uv (recommended):
```bash
uv pip install datascience-agent
uv pip install "datascience-agent[api]"  # with FastAPI
```

For development:
```bash
git clone https://github.com/nmlemus/dsagent
cd dsagent
uv sync --all-extras
```

## Quick Start

### Basic Usage

```python
from dsagent import PlannerAgent

# Create agent
with PlannerAgent(model="gpt-4o", workspace="./workspace") as agent:
    result = agent.run("Analyze sales_data.csv and identify top performing products")

    print(result.answer)
    print(f"Notebook: {result.notebook_path}")
```

### With Streaming

```python
from dsagent import PlannerAgent, EventType

agent = PlannerAgent(model="claude-3-sonnet-20240229")
agent.start()

try:
    for event in agent.run_stream("Build a predictive model for customer churn"):
        if event.type == EventType.PLAN_UPDATED:
            print(f"Plan: {event.plan.raw_text if event.plan else ''}")
        elif event.type == EventType.CODE_SUCCESS:
            print("Code executed successfully")
        elif event.type == EventType.CODE_FAILED:
            print("Code execution failed")
        elif event.type == EventType.ANSWER_ACCEPTED:
            print(f"Answer: {event.message}")

    # Get result with notebook after streaming
    result = agent.get_result()
    print(f"Notebook: {result.notebook_path}")
finally:
    # Shut the agent down even if streaming raises
    agent.shutdown()
```

### FastAPI Integration

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from uuid import uuid4
from dsagent import PlannerAgent

app = FastAPI()

@app.post("/analyze")
async def analyze(task: str):
    # A plain (sync) generator: Starlette iterates it in a worker thread,
    # so the blocking run_stream() loop does not stall the event loop.
    def event_stream():
        agent = PlannerAgent(
            model="gpt-4o",
            session_id=str(uuid4()),
        )
        agent.start()

        try:
            for event in agent.run_stream(task):
                yield f"data: {event.to_sse()}\n\n"
        finally:
            agent.shutdown()

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

## Command Line Interface

The package includes a CLI for quick analysis from the terminal:

```bash
dsagent "Analyze this dataset and create visualizations" --data ./my_data.csv
```

### CLI Options

| Option | Short | Description |
|--------|-------|-------------|
| `--data` | `-d` | Path to data file or directory (required) |
| `--model` | `-m` | LLM model to use (default: gpt-4o) |
| `--workspace` | `-w` | Output directory (default: ./workspace) |
| `--run-id` | | Custom run ID for this execution |
| `--max-rounds` | `-r` | Max iterations (default: 30) |
| `--quiet` | `-q` | Suppress verbose output |
| `--no-stream` | | Disable streaming output |

### CLI Examples

```bash
# Basic analysis
dsagent "Find trends and patterns" -d ./sales.csv

# With specific model
dsagent "Build ML model" -d ./dataset -m claude-3-sonnet-20240229

# Custom output directory
dsagent "Create charts" -d ./data -w ./output

# With custom run ID
dsagent "Analyze" -d ./data --run-id my-analysis-001

# Quiet mode
dsagent "Analyze" -d ./data -q
```
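
When the CLI is driven from a script, it can help to build the argument list programmatically. The sketch below uses only the options listed in the table above. `build_dsagent_command` is a hypothetical helper, not something the package provides, and it prints the command rather than running it.

```python
import shlex


def build_dsagent_command(task, data, model=None, workspace=None,
                          run_id=None, max_rounds=None,
                          quiet=False, stream=True):
    """Build an argv list for the dsagent CLI from keyword options."""
    cmd = ["dsagent", task, "--data", str(data)]
    if model:
        cmd += ["--model", model]
    if workspace:
        cmd += ["--workspace", str(workspace)]
    if run_id:
        cmd += ["--run-id", run_id]
    if max_rounds is not None:
        cmd += ["--max-rounds", str(max_rounds)]
    if quiet:
        cmd.append("--quiet")
    if not stream:
        cmd.append("--no-stream")
    return cmd


cmd = build_dsagent_command("Find trends and patterns", "./sales.csv", quiet=True)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would then run the analysis without going through a shell, so the task text needs no extra quoting.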

### Output Structure

Each run creates an isolated workspace:
```
workspace/
└── runs/
    └── {run_id}/
        ├── data/          # Input data (copied)
        ├── notebooks/     # Generated notebooks
        ├── artifacts/     # Images, charts, outputs
        └── logs/
            ├── run.log        # Human-readable log
            └── events.jsonl   # Structured events for ML
```
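
Because `events.jsonl` is a JSON Lines file, it can be inspected with the standard library alone. The sketch below assumes each line is a JSON object with a `type` field. That field name is a guess, since the event schema is not documented here, so adjust `key` to match the real files.

```python
import json
from collections import Counter
from pathlib import Path


def parse_events(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


def load_events(path):
    """Read an events.jsonl file into a list of dicts."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_events(handle)


def count_by_type(events, key="type"):
    """Count events per value of `key` (assumed field name)."""
    return Counter(str(event.get(key, "unknown")) for event in events)


# Stand-in lines; a real run would use load_events() on logs/events.jsonl.
sample = ['{"type": "code_success"}', "", '{"type": "code_failed"}']
print(count_by_type(parse_events(sample)))
```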

## Configuration

```python
from dsagent import PlannerAgent, RunContext

# With automatic run isolation
context = RunContext(workspace="./workspace")
agent = PlannerAgent(
    model="gpt-4o",           # Any LiteLLM-supported model
    context=context,          # Run context for isolation
    max_rounds=30,            # Max agent iterations
    max_tokens=4096,          # Max tokens per response
    temperature=0.2,          # LLM temperature
    timeout=300,              # Code execution timeout (seconds)
    verbose=True,             # Print to console
    event_callback=None,      # Callback for events
)
```
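
The `event_callback` parameter is not described further here. As a minimal sketch, assuming the callback is called with a single event object that has a `type` attribute (as the events yielded by `run_stream` do), a small wrapper can forward only the events of interest. `make_filtered_callback` is a hypothetical helper, and the stand-in events below replace real dsagent events.

```python
from types import SimpleNamespace


def make_filtered_callback(sink, wanted):
    """Return a callback that forwards only events whose type name is in `wanted`."""
    wanted = set(wanted)

    def callback(event):
        # Enum members expose .name; plain strings fall back to str().
        name = getattr(event.type, "name", str(event.type))
        if name in wanted:
            sink(event)

    return callback


received = []
callback = make_filtered_callback(received.append, {"CODE_FAILED", "AGENT_ERROR"})

# Stand-in events; with dsagent the agent would call the callback itself.
callback(SimpleNamespace(type="CODE_SUCCESS"))
callback(SimpleNamespace(type="CODE_FAILED"))
print(len(received))  # 1
```

Under the same assumption, the wrapper would be passed as `event_callback=callback` when constructing the agent.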

## Human-in-the-Loop (HITL)

Control agent autonomy with configurable HITL modes:

```python
from dsagent import PlannerAgent, HITLMode, EventType

# Create agent with HITL enabled
agent = PlannerAgent(
    model="gpt-4o",
    hitl=HITLMode.PLAN_ONLY,  # Pause for plan approval
)
agent.start()

try:
    # Run with streaming to handle HITL events
    for event in agent.run_stream("Analyze sales data"):
        if event.type == EventType.HITL_AWAITING_PLAN_APPROVAL:
            print(f"Plan proposed:\n{event.plan.raw_text}")
            # Approve the plan
            agent.approve()
            # Or reject: agent.reject("Bad plan")
            # Or modify: agent.modify_plan("1. [ ] Better step")

        elif event.type == EventType.ANSWER_ACCEPTED:
            print(f"Answer: {event.message}")
finally:
    # Shut down even if the run raises or is aborted
    agent.shutdown()
```

### HITL Modes

| Mode | Description |
|------|-------------|
| `HITLMode.NONE` | Fully autonomous (default) |
| `HITLMode.PLAN_ONLY` | Pause after plan generation for approval |
| `HITLMode.ON_ERROR` | Pause when code execution fails |
| `HITLMode.PLAN_AND_ANSWER` | Pause after plan generation and before the final answer |
| `HITLMode.FULL` | Pause before every code execution |

### HITL Actions

```python
# Approve current pending item
agent.approve("Looks good!")

# Reject and abort
agent.reject("This approach won't work")

# Modify the plan
agent.modify_plan("1. [ ] New step\n2. [ ] Another step")

# Modify code before execution (FULL mode)
agent.modify_code("import pandas as pd\ndf = pd.read_csv('data.csv')")

# Skip current step
agent.skip()

# Send feedback to guide the agent
agent.send_feedback("Try using a different algorithm")
```

### HITL Events

```python
EventType.HITL_AWAITING_PLAN_APPROVAL    # Waiting for plan approval
EventType.HITL_AWAITING_CODE_APPROVAL    # Waiting for code approval (FULL mode)
EventType.HITL_AWAITING_ERROR_GUIDANCE   # Waiting for error guidance
EventType.HITL_AWAITING_ANSWER_APPROVAL  # Waiting for answer approval
EventType.HITL_FEEDBACK_RECEIVED         # Human feedback was received
EventType.HITL_PLAN_APPROVED             # Plan was approved
EventType.HITL_PLAN_MODIFIED             # Plan was modified
EventType.HITL_PLAN_REJECTED             # Plan was rejected
EventType.HITL_EXECUTION_ABORTED         # Execution was aborted
```
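
One way to respond to the awaiting events is to keep the approval policy in a plain function, separate from the streaming loop. The sketch below is purely illustrative: `decide`, its step limit, and its messages are hypothetical, and events are identified by name strings so that the example runs without dsagent installed.

```python
def decide(event_name: str, plan_text: str = "", max_steps: int = 8) -> tuple[str, str]:
    """Return an (action, message) pair for a HITL event name."""
    if event_name == "HITL_AWAITING_PLAN_APPROVAL":
        # Count checkbox lines to estimate the size of the proposed plan.
        steps = [line for line in plan_text.splitlines()
                 if "[ ]" in line or "[x]" in line]
        if len(steps) > max_steps:
            return ("reject", f"Plan has {len(steps)} steps; limit is {max_steps}")
        return ("approve", "Plan accepted")
    if event_name == "HITL_AWAITING_ERROR_GUIDANCE":
        return ("send_feedback", "Check column names and dtypes before retrying")
    if event_name.startswith("HITL_AWAITING_"):
        # Code and answer approvals pass through in this example policy.
        return ("approve", "")
    # Any other event needs no human action.
    return ("ignore", "")


print(decide("HITL_AWAITING_PLAN_APPROVAL", "1. [ ] Load data\n2. [ ] Plot trends"))
```

Inside a `run_stream` loop, the returned action would map onto `agent.approve(...)`, `agent.reject(...)`, or `agent.send_feedback(...)`. Keeping the policy as a pure function makes it easy to unit-test.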

## Supported Models

Any model supported by [LiteLLM](https://docs.litellm.ai/docs/providers) can be used, for example:

- OpenAI: `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo`
- Anthropic: `claude-3-opus-20240229`, `claude-3-sonnet-20240229`
- Google: `gemini-pro`, `gemini-1.5-pro`
- Ollama: `ollama/llama3`, `ollama/codellama`
- And many more...

## Event Types

```python
from dsagent import EventType

EventType.AGENT_STARTED       # Agent started processing
EventType.AGENT_FINISHED      # Agent finished
EventType.AGENT_ERROR         # Error occurred
EventType.ROUND_STARTED       # New iteration round
EventType.ROUND_FINISHED      # Round completed
EventType.LLM_CALL_STARTED    # LLM call started
EventType.LLM_CALL_FINISHED   # LLM response received
EventType.PLAN_CREATED        # Plan was created
EventType.PLAN_UPDATED        # Plan was updated
EventType.CODE_EXECUTING      # Code execution started
EventType.CODE_SUCCESS        # Code execution succeeded
EventType.CODE_FAILED         # Code execution failed
EventType.ANSWER_ACCEPTED     # Final answer generated
EventType.ANSWER_REJECTED     # Answer rejected (plan incomplete)
```
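
These event types can be tallied to summarize a run, for instance to see how many code executions failed. The sketch below uses a stand-in enum, `StubEventType`, with a subset of the members listed above, so it runs without dsagent. With the real package, the `type` values of streamed events would be passed in instead.

```python
from collections import Counter
from enum import Enum, auto


class StubEventType(Enum):
    """Stand-in for a subset of dsagent.EventType (illustration only)."""
    CODE_SUCCESS = auto()
    CODE_FAILED = auto()
    ANSWER_ACCEPTED = auto()


def summarize(event_types):
    """Tally code executions and whether a final answer was accepted."""
    counts = Counter(event_types)
    failed = counts[StubEventType.CODE_FAILED]
    return {
        "executed": counts[StubEventType.CODE_SUCCESS] + failed,
        "failed": failed,
        "answered": counts[StubEventType.ANSWER_ACCEPTED] > 0,
    }


run = [
    StubEventType.CODE_FAILED,
    StubEventType.CODE_SUCCESS,
    StubEventType.CODE_SUCCESS,
    StubEventType.ANSWER_ACCEPTED,
]
print(summarize(run))  # {'executed': 3, 'failed': 1, 'answered': True}
```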

## Architecture

```
dsagent/
├── agents/
│   └── base.py          # PlannerAgent - main user interface
├── core/
│   ├── context.py       # RunContext - workspace management
│   ├── engine.py        # AgentEngine - main loop
│   ├── executor.py      # JupyterExecutor - code execution
│   ├── hitl.py          # HITLGateway - human-in-the-loop
│   └── planner.py       # PlanParser - response parsing
├── schema/
│   └── models.py        # Pydantic models
└── utils/
    ├── logger.py        # AgentLogger - console logging
    ├── run_logger.py    # RunLogger - comprehensive logging
    └── notebook.py      # NotebookBuilder - notebook generation
```

## License

MIT
