Metadata-Version: 2.4
Name: lacon
Version: 1.0.2
Summary: Token-lean agent↔data query interface — curated DuckDB primitives, not raw SQL passthrough.
Author-email: Andrii Suruhov <andrii.suruhov@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Andrii Suruhov
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/andrii-su/lacon
Project-URL: Repository, https://github.com/andrii-su/lacon
Project-URL: Issues, https://github.com/andrii-su/lacon/issues
Keywords: llm,agent,duckdb,data,csv,parquet,mcp
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: duckdb>=1.0.0
Requires-Dist: sqlglot>=25.0.0
Provides-Extra: dev
Requires-Dist: pytest>=9.0.3; extra == "dev"
Requires-Dist: pre-commit>=4.6.0; extra == "dev"
Provides-Extra: tokens
Requires-Dist: tiktoken>=0.13.0; extra == "tokens"
Provides-Extra: mcp
Requires-Dist: mcp>=1.27.1; extra == "mcp"
Dynamic: license-file

# lacon

**Token-lean data query interface for agents.** Query CSV, Parquet, and JSON files
via curated DuckDB primitives — instead of dumping files into context — and get back
only the answer.

```bash
lacon describe sales.csv --pretty
# → schema, row count, file size. Zero data rows.

lacon aggregate sales.csv --group-by country --metrics revenue:sum --pretty
# → {"op": "aggregate", "schema": [...], "rows": [...], "shown": 3, "~tokens": 62}
```

## Not another DuckDB-over-MCP

SQL-passthrough DuckDB MCP servers already exist. Lacon is the opposite bet:

- **Curated primitives** — `describe` / `sample` / `count` / `profile` / `aggregate` / `filter` / `distinct` / `find-duplicates` / `query`. An agent can't get them syntactically wrong.
- **Progressive disclosure** — cheap-first: `describe` → `sample` → targeted primitive. Agents rarely need `SELECT *`.
- **Guardrails baked in** — read-only, auto-LIMIT (max 1000), SQL validated via sqlglot.
- **Token-shaped output** — responses carry a `~tokens` estimate (with the `tokens` extra installed), so the agent knows what each one costs.
- **Human-in-the-loop (HITL) for `query`** — preview the SQL with `--show-sql` before executing. Curated primitives need no confirmation.

North star: **minimize what data costs an LLM.**
Sibling to [`datoon`](https://github.com/andrii-su/datoon) (cheaper representation in-prompt) — Lacon's lever is to not send the data at all.

## Install

```bash
pip install lacon            # CLI + DuckDB + sqlglot
pip install "lacon[tokens]"  # + tiktoken for ~tokens estimates (quoted so shells like zsh don't glob the brackets)
```

Or from source:

```bash
git clone https://github.com/andrii-su/lacon
cd lacon
uv run lacon --version
```

## Claude Code skill

```bash
claude skill install https://github.com/andrii-su/lacon/releases/latest/download/lacon.skill
```

Once installed, Claude automatically uses lacon when you mention a data file — no manual invocation needed. Raw `cat`/`Read` on `.csv`/`.parquet`/`.json` files is replaced by `lacon describe` → curated query.

## Primitives

| Command | What it answers |
|---|---|
| `describe` | schema, row count, file size — always start here |
| `sample` | first / random N rows |
| `count` | row count with optional WHERE |
| `profile` | per-column stats: nulls, distinct, min/max/mean or top-k |
| `distinct` | unique values for a column |
| `aggregate` | GROUP BY with sum / avg / min / max / count |
| `filter` | rows matching WHERE, with column projection |
| `find-duplicates` | duplicate groups + counts |
| `query` | escape hatch — arbitrary read-only SQL, HITL required |

All commands accept `--pretty` for human-readable output.

## Quick examples

```bash
# What's in the file?
lacon describe data.csv --pretty

# First 5 rows
lacon sample data.csv --n 5 --pretty

# How many orders from Ukraine?
lacon count orders.csv --where "country = 'Ukraine'"

# Revenue by country
lacon aggregate sales.csv --group-by country --metrics revenue:sum --pretty

# Duplicate emails?
lacon find-duplicates users.csv --columns email --pretty

# Column stats
lacon profile users.csv --column age --pretty

# Rows matching filter, projected columns
lacon filter sales.csv --where "revenue > 5000" --columns name country revenue --pretty

# Custom SQL — HITL: preview first, then execute
lacon query sales.csv "SELECT year, SUM(revenue) FROM {file} GROUP BY year" --show-sql --pretty
lacon query sales.csv "SELECT year, SUM(revenue) FROM {file} GROUP BY year" --pretty
```

## Output envelope

Every response is a shaped JSON object:

```json
{
  "op": "aggregate",
  "schema": ["country", "sum_revenue"],
  "rows": [["UA", 4500.25], ["US", 2001.25], ["DE", 2200.0]],
  "shown": 3,
  "~tokens": 62
}
```

- `schema` — always present, agent never guesses shape
- `shown` — how many rows returned (honest truncation)
- `~tokens` — estimated token cost of this response (requires `lacon[tokens]`)
- `query` results also include `sql` — what actually ran
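
Because `schema` and `rows` travel separately, a consumer will usually want to pair them up. The following is a minimal Python sketch, assuming `schema` is a flat list of column names as in the example above (other primitives may shape it differently). The example envelope stands in for real lacon output, and `rows_as_dicts` is a hypothetical helper, not part of lacon:

```python
import json

# Example envelope copied from above; real input would come from lacon's stdout.
raw = """{
  "op": "aggregate",
  "schema": ["country", "sum_revenue"],
  "rows": [["UA", 4500.25], ["US", 2001.25], ["DE", 2200.0]],
  "shown": 3,
  "~tokens": 62
}"""


def rows_as_dicts(envelope: dict) -> list[dict]:
    """Pair each positional row with the column names from `schema`."""
    columns = envelope["schema"]
    return [dict(zip(columns, row)) for row in envelope["rows"]]


envelope = json.loads(raw)
records = rows_as_dicts(envelope)

print(records[0])               # first row keyed by column name
print(envelope.get("~tokens"))  # .get(): assumes the key may be absent without the tokens extra
```

Keeping rows positional in the envelope and expanding them only on the consumer side avoids repeating column names in every row, which fits the token-lean goal.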

## Human-in-the-loop for `query`

The `query` escape hatch runs arbitrary read-only SQL, so preview it before executing:

```bash
# Step 1 — see what will run
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --show-sql --pretty
# → {"op": "query", "sql": "SELECT ... FROM read_csv('data.csv') ... LIMIT 50", "will_execute": false}

# Step 2 — confirm, then execute
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --pretty
```

Curated primitives (`describe`, `filter`, etc.) need no confirmation — their SQL is fully determined by the parameters.

## Safety

- **Read-only** — no writes, no DDL, no COPY, no INSTALL
- **SQL validation** — sqlglot parses the SQL of every `query` call and rejects anything that is not a SELECT statement
- **Auto-LIMIT** — injected if missing, capped at 1000
- **Path escaping** — single quotes in paths are escaped before passing to DuckDB
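
To illustrate the path-escaping point: the standard SQL way to embed a single quote in a string literal is to double it. The sketch below shows that technique in isolation. It is an assumption about the general approach, not a copy of lacon's implementation, and `quote_sql_string` and `csv_source` are hypothetical names:

```python
def quote_sql_string(value: str) -> str:
    """Return `value` as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def csv_source(path: str) -> str:
    """Build a read_csv(...) table expression for a file path."""
    return f"read_csv({quote_sql_string(path)})"


print(csv_source("data.csv"))           # read_csv('data.csv')
print(csv_source("o'brien/sales.csv"))  # read_csv('o''brien/sales.csv')
```

Without the doubling, a quote in a file name would end the literal early and let the rest of the path be parsed as SQL.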

## Stack

Python 3.12+, [DuckDB](https://duckdb.org), [sqlglot](https://github.com/tobymao/sqlglot), [tiktoken](https://github.com/openai/tiktoken) (optional).

## License

MIT
