Metadata-Version: 2.3
Name: csv-ingest-mcp
Version: 0.1.0
Summary: MCP server that ingests CSV/TSV/Parquet/Excel/JSON files into any SQLAlchemy database
Keywords: mcp,model-context-protocol,csv,ingest,sqlalchemy,claude,llm
Author: priyak.salot
Author-email: priyak.salot <priyank.salot@dbcorp.in>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: celery>=5.6.3
Requires-Dist: mcp[cli]>=1.27.0
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pandas>=3.0.2
Requires-Dist: pyarrow>=24.0.0
Requires-Dist: pydantic>=2.13.3
Requires-Dist: redis>=7.4.0
Requires-Dist: sqlalchemy>=2.0.49
Requires-Dist: watchdog>=6.0.0
Requires-Python: >=3.13
Description-Content-Type: text/markdown

# csv-ingest-mcp

[![PyPI](https://img.shields.io/pypi/v/csv-ingest-mcp.svg)](https://pypi.org/project/csv-ingest-mcp/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

MCP server for ingesting **CSV / TSV / Parquet / Excel / JSON / JSONL** files into any SQLAlchemy-supported database (Postgres, MySQL, SQLite, MSSQL, ...). Plug into Claude Desktop, Cursor, Cline, or Continue and let the LLM load tabular data on demand.

## Install

```bash
uvx csv-ingest-mcp        # one-shot, no global install
# or
pip install csv-ingest-mcp
```

## Quick start (Claude Desktop)

Add the server to `~/.config/Claude/claude_desktop_config.json` (Linux) or `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS):

```json
{
  "mcpServers": {
    "csv-ingest": {
      "command": "uvx",
      "args": ["csv-ingest-mcp"],
      "env": {
        "CSV_MCP_ALLOWED_DIRS": "/home/you/data",
        "CSV_MCP_DB_URL": "postgresql://user:pass@localhost/mydb"
      }
    }
  }
}
```

Restart Claude Desktop. The model now has six tools.

## Tools

| Tool | Purpose |
|------|---------|
| `ingest_file(path, table, if_exists, sheet, auto_alter, dry_run)` | Load file into table |
| `upsert_file(path, table, key_columns, sheet, dedupe_source)` | Key-based replace + insert |
| `list_tables()` | Database table names |
| `describe_table(table)` | Schema + row count |
| `ingest_async(path, table, if_exists, auto_alter)` | Queue large file via Celery |
| `job_status(job_id)` | Poll Celery job |
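
The "key-based replace + insert" behaviour of `upsert_file` can be pictured with a small standalone sketch. This is not the package's implementation: the helper name `upsert_rows`, the use of `sqlite3`, and the delete-then-insert strategy are assumptions chosen only to illustrate the idea that rows matching on `key_columns` are replaced and all other incoming rows are inserted.

```python
import sqlite3


def upsert_rows(conn, table, key_columns, rows):
    """Hypothetical helper: replace rows that match on key_columns, insert the rest.

    `table` and the column names are interpolated into SQL, so they are
    assumed to have been validated as safe identifiers beforehand.
    Values are always passed as bound parameters.
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())
    where = " AND ".join(f'"{k}" = ?' for k in key_columns)
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    delete_sql = f'DELETE FROM "{table}" WHERE {where}'
    insert_sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'
    with conn:  # single transaction: commit on success, roll back on error
        for row in rows:
            conn.execute(delete_sql, [row[k] for k in key_columns])
            conn.execute(insert_sql, [row[c] for c in columns])
    return len(rows)
```

Called with `key_columns=["id"]`, an incoming row whose `id` already exists overwrites the stored row, a row with a new `id` is appended, and stored rows that do not appear in the file are left untouched.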

## Configuration

| Env var | Default | Purpose |
|---------|---------|---------|
| `CSV_MCP_ALLOWED_DIRS` | (required) | `:`-separated path allowlist |
| `CSV_MCP_DB_URL` | `sqlite:///./csv_ingest.db` | SQLAlchemy DSN |
| `CSV_MCP_MAX_ROWS` | `1000000` | Per-file row cap |
| `CSV_MCP_CHUNK_SIZE` | `50000` | Streaming chunk rows |
| `CSV_MCP_BROKER_URL` | `redis://localhost:6379/0` | Celery broker |
| `CSV_MCP_RESULT_BACKEND` | `redis://localhost:6379/1` | Celery backend |
| `CSV_MCP_WATCH_IF_EXISTS` | `append` | Watcher write mode |
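
As a rough illustration of how these variables might be read, the sketch below parses four of them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names, not part of the package; only the variable names, the `:` separator, and the defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    allowed_dirs: tuple[Path, ...]
    db_url: str
    max_rows: int
    chunk_size: int


def load_settings(env=None) -> Settings:
    """Hypothetical loader; defaults mirror the configuration table."""
    env = os.environ if env is None else env
    raw_dirs = env.get("CSV_MCP_ALLOWED_DIRS", "")
    if not raw_dirs.strip():
        # The table marks this variable as required.
        raise ValueError("CSV_MCP_ALLOWED_DIRS is required")
    # ':'-separated allowlist; empty segments are ignored.
    dirs = tuple(Path(p) for p in raw_dirs.split(":") if p)
    return Settings(
        allowed_dirs=dirs,
        db_url=env.get("CSV_MCP_DB_URL", "sqlite:///./csv_ingest.db"),
        max_rows=int(env.get("CSV_MCP_MAX_ROWS", "1000000")),
        chunk_size=int(env.get("CSV_MCP_CHUNK_SIZE", "50000")),
    )
```

Passing a plain dict instead of relying on `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"CSV_MCP_ALLOWED_DIRS": "/data:/srv/in"})`.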

## Cursor / Cline / Continue

```json
{
  "mcp": {
    "servers": {
      "csv-ingest": {
        "command": "uvx",
        "args": ["csv-ingest-mcp"],
        "env": { "CSV_MCP_ALLOWED_DIRS": "/abs/data" }
      }
    }
  }
}
```

## Async + watcher (optional)

```bash
# 1. Redis
docker run -d -p 6379:6379 redis:7

# 2. Celery worker
CSV_MCP_ALLOWED_DIRS=/data CSV_MCP_DB_URL=postgresql://... \
  uv run celery -A csv_ingest_mcp.tasks worker -l info

# 3. File watcher (drop a CSV in /data/inbox and the table is auto-created)
CSV_MCP_ALLOWED_DIRS=/data uv run csv-ingest-watch
```

## Docker (self-host)

```bash
docker compose up -d
```

See `docker-compose.yml` (server + worker + redis).

## Security

- Path allowlist (`CSV_MCP_ALLOWED_DIRS`): only files under the listed directories can be read
- SQL identifier regex: table and column names are validated before use
- Parameterized inserts (no string concatenation)
- Row cap and chunked streaming keep memory bounded
- JSON-only Celery serializer: no pickle-based RCE
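
The first two checks in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: the function names and the exact identifier pattern (a letter or underscore followed by up to 62 letters, digits, or underscores) are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern; the real server may accept a different character set or length.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def is_safe_identifier(name: str) -> bool:
    """Accept only names that match the whole pattern (no quotes, spaces, or ';')."""
    return IDENTIFIER_RE.fullmatch(name) is not None


def is_path_allowed(path: str, allowed_dirs: list[str]) -> bool:
    """True if the resolved path lies inside one of the allowed directories."""
    # resolve() collapses '..' segments and follows symlinks, so traversal
    # tricks are normalised before the comparison.
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(Path(d).resolve()) for d in allowed_dirs)
```

Comparing resolved paths component by component, rather than with a string prefix test, means `/database/a.csv` is not mistaken for a file under `/data`.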

Disclose security issues to: **priyank.salot@dbcorp.in**

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for component map, request flows, and publishing approach.

## Development

```bash
git clone https://github.com/priyak-salot/csv-ingest-mcp
cd csv-ingest-mcp
uv sync
uv run pytest
```

## License

[MIT](LICENSE)
