Metadata-Version: 2.4
Name: datasight
Version: 0.3.0
Summary: AI-powered data exploration with natural language
Project-URL: Repository, https://github.com/dsgrid/datasight
Project-URL: Documentation, https://dsgrid.github.io/datasight/
Project-URL: Issues, https://github.com/dsgrid/datasight/issues
Author-email: Daniel Thom <daniel.thom@nlr.gov>
License-Expression: BSD-3-Clause
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: adbc-driver-flightsql<2,>=1.0
Requires-Dist: anthropic<1,>=0.40
Requires-Dist: chevron<1,>=0.14
Requires-Dist: click<9,>=8.0
Requires-Dist: duckdb<2,>=1.0
Requires-Dist: fastapi<1,>=0.110
Requires-Dist: loguru<1,>=0.7
Requires-Dist: openai<3,>=2
Requires-Dist: pandas<4,>=2.0
Requires-Dist: plotly<6,>=5.0
Requires-Dist: psycopg[binary]<4,>=3.1
Requires-Dist: pyarrow<19,>=14.0
Requires-Dist: python-dotenv<2,>=1.0
Requires-Dist: pyyaml<7,>=6.0
Requires-Dist: rich-click<2,>=1.8
Requires-Dist: rich<15,>=13.0
Requires-Dist: sqlglot<27,>=26
Requires-Dist: truststore>=0.10.4
Requires-Dist: uvicorn<1,>=0.29
Provides-Extra: dev
Requires-Dist: prek<1,>=0.2; extra == 'dev'
Requires-Dist: pytest-asyncio<1,>=0.23; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest<9,>=8.0; extra == 'dev'
Requires-Dist: ruff<1,>=0.15; extra == 'dev'
Requires-Dist: ty>=0.0.26; extra == 'dev'
Requires-Dist: zensical>=0.0.31; extra == 'dev'
Provides-Extra: export
Requires-Dist: kaleido<1,>=0.2; extra == 'export'
Description-Content-Type: text/markdown

# datasight

> **Status: early and evolving.** This project is in active development and the
> code is changing rapidly — APIs, CLI flags, and behavior may shift between
> commits. Feedback and bug reports from users are very welcome; please open an
> issue on GitHub.

AI-powered data exploration with natural language.

datasight connects an AI agent to your database and provides a web UI
where you can ask questions in natural language. The agent writes SQL, runs
queries, and generates interactive Plotly visualizations.
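
The query-to-chart flow described above can be sketched with the standard library alone. This is an illustration, not datasight's implementation: the `sales` table, its rows, and the hard-coded SQL (standing in for what the agent would write) are all hypothetical, and the figure is a plain dictionary in the `data`/`layout` shape that Plotly figures use.

```python
import json
import sqlite3

# Hypothetical sample data; a real project would point at its own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2022, 10.0), (2022, 5.0), (2023, 20.0)],
)

# Stand-in for SQL an agent might write for "Show total sales by year".
sql = "SELECT year, SUM(amount) AS total FROM sales GROUP BY year ORDER BY year"
rows = conn.execute(sql).fetchall()

# Build a Plotly-style figure specification as JSON-serializable data.
figure = {
    "data": [{"type": "bar", "x": [r[0] for r in rows], "y": [r[1] for r in rows]}],
    "layout": {"title": {"text": "Total sales by year"}},
}
print(json.dumps(figure, indent=2))
```

The same dictionary could be handed to Plotly for rendering; the sketch stops at the specification so that it runs without third-party packages.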

Supports **DuckDB**, **PostgreSQL**, **SQLite**, and **Flight SQL** databases.
Supports **Anthropic Claude** (default), **OpenAI**, **GitHub Models** (open source),
and **Ollama** (local) as LLM backends.

## Quick start

```bash
uv tool install "datasight @ git+https://github.com/dsgrid/datasight.git"

# Create a new project
mkdir my-project && cd my-project
datasight init

# Edit .env with your API key and database path
# Edit schema_description.md to describe your data
# Edit queries.yaml with example questions

# Run the web UI
datasight run
```

Open http://localhost:8084 and start asking questions.

Or ask from the command line without starting a server:

```bash
datasight ask "What are the top 10 records?"
datasight ask "Show trends by year" --chart-format html -o chart.html
datasight profile
datasight quality --format markdown -o quality.md
datasight ask --file questions.txt --output-dir batch-output
```

## Features

- **Natural language queries** — ask questions in English, get SQL + results
- **Interactive charts** — Plotly visualizations with chart-type switching
- **Multiple databases** — DuckDB, PostgreSQL, SQLite, and Flight SQL
- **Headless CLI** — `datasight ask` runs queries without a web server
- **Deterministic CLI workflows** — profile, quality, dimension, trend, and recipe commands that do not require an LLM
- **Schema browser** — sidebar with tables, columns, and example queries
- **Schema auto-discovery** — tables, columns, and types detected automatically
- **Domain context** — describe your data in Markdown for better AI understanding
- **Example queries** — seed the AI with question/SQL pairs
- **Reusable prompt recipes** — project-specific analysis prompts derived from the schema
- **Multi-chart dashboard** — pin results, filter cards, and configure layouts
- **Session export** — export conversations as shareable HTML pages
- **Keyboard shortcuts** — `?` to see all shortcuts, `/` to focus input
- **Streaming responses** — real-time SSE streaming from the LLM
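
To make the "example queries" feature above concrete, here is a minimal sketch of how question/SQL pairs might be rendered into prompt text. The `ExampleQuery` class, the `render_examples` helper, and the sample pairs are hypothetical; the actual layout of `queries.yaml` and the way datasight builds its prompts may differ.

```python
from dataclasses import dataclass


@dataclass
class ExampleQuery:
    """One question/SQL pair (hypothetical structure)."""

    question: str
    sql: str


def render_examples(examples: list[ExampleQuery]) -> str:
    """Format question/SQL pairs as a text block for a system prompt."""
    parts = [f"Question: {ex.question}\nSQL:\n{ex.sql.strip()}" for ex in examples]
    return "\n\n".join(parts)


# Hypothetical pairs for an imaginary `sales` table.
examples = [
    ExampleQuery("How many rows are in sales?", "SELECT COUNT(*) FROM sales"),
    ExampleQuery(
        "What is total revenue by year?",
        "SELECT year, SUM(amount) FROM sales GROUP BY year",
    ),
]
prompt_section = render_examples(examples)
print(prompt_section)
```

Pairs like these give the model worked examples of the schema's naming and SQL dialect, which is the usual motivation for seeding a text-to-SQL agent.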

## Architecture

datasight pairs a FastAPI backend with a Svelte 5 + TypeScript + Tailwind CSS
frontend built with Vite. It supports multiple LLM backends — Anthropic
(default), OpenAI, GitHub Models, and Ollama — selectable via `LLM_PROVIDER` in `.env`.

```
datasight run / datasight ask / datasight profile / datasight quality
  → LLM provider (Anthropic / OpenAI / GitHub Models / Ollama)
    → DuckDB / PostgreSQL / SQLite / Flight SQL
    → Plotly chart generator
  → Web UI (SSE streaming) or CLI output
```

## Documentation

```bash
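# Install the dev extra, which includes zensical
uv sync --extra dev
. .venv/bin/activate

# Preview the docs locally, or build the static site
zensical serve
zensical build

# Generate the CLI reference
python scripts/generate_cli_reference.py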
```

## Development Tests

```bash
# Build frontend assets for FastAPI serving after a clean checkout
bash scripts/build-frontend.sh

# Python test suite
pytest

# CI-safe Python test suite, excluding tests that need local Ollama
pytest -m "not integration"

# Frontend unit tests (Vitest); the subshell keeps your working directory unchanged
(cd frontend && npm test)

# Frontend E2E tests (Playwright; requires a running `datasight run` server)
(cd frontend && npm run test:e2e)

# Rebuild frontend for FastAPI serving after frontend changes
bash scripts/build-frontend.sh
```

Generated web assets under `src/datasight/web/static/` and
`src/datasight/web/templates/index.html` are not checked in. Run
`bash scripts/build-frontend.sh` before using `datasight run` from a clean
checkout when you want FastAPI to serve the production UI.

Ollama-backed CLI tests are marked `integration` because they require a running
local Ollama server with the `qwen3:8b` model available. CI runs
`pytest -m "not integration"`; run `pytest -m integration` locally when you want
to exercise the live LLM path.
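
Before running the integration tests, it can help to confirm that something is listening on Ollama's port. The sketch below assumes Ollama's default port, 11434, on the local machine; `ollama_reachable` is a hypothetical helper, not part of datasight, and it only checks for a TCP listener, not that the `qwen3:8b` model is present.

```python
import socket


def ollama_reachable(
    host: str = "127.0.0.1", port: int = 11434, timeout: float = 0.5
) -> bool:
    """Return True if a TCP connection to the assumed Ollama address succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Ollama reachable:", ollama_reachable())
```

A check like this could also serve as the condition for `pytest.mark.skipif`, so that integration tests skip cleanly when no server is running.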

## Software Record

datasight is developed under NLR Software Record SWR-26-045.
