Metadata-Version: 2.4
Name: tusk-cluster
Version: 0.1.0
Summary: Cluster mode plugin for Tusk - distributed queries with DataFusion
Project-URL: Homepage, https://github.com/tuskdata/tusk-cluster
Project-URL: Repository, https://github.com/tuskdata/tusk-cluster
Author: Jearel Alcantara
License-Expression: MIT
License-File: LICENSE
Keywords: cluster,datafusion,distributed,plugin,tusk
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Web Environment
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Requires-Python: >=3.11
Requires-Dist: datafusion>=43.0.0
Requires-Dist: pyarrow>=18.0.0
Requires-Dist: tuskdata>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# tusk-cluster

A cluster-mode plugin for Tusk that runs distributed queries with DataFusion and Arrow Flight.

## Installation

```bash
pip install tusk-cluster
```

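For development, install in editable mode from the plugin directory (use `pip install -e ".[dev]"` instead to also pull in pytest):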

```bash
cd plugins/tusk-cluster
pip install -e .
```

## Usage

### CLI Commands

```bash
# Start a local dev cluster (scheduler + workers)
tusk cluster-dev --workers 3

# Or start components separately:
tusk cluster-scheduler --port 8814
tusk cluster-worker --scheduler localhost:8814 --port 8815
```
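
When starting components separately, a small script can assemble the commands. The sketch below only builds argument lists from the subcommands and flags shown above. The helper name `build_cluster_commands` and the choice of consecutive worker ports are assumptions made for illustration, not part of the plugin.

```python
import shlex


def build_cluster_commands(scheduler_port=8814, workers=2, host="localhost"):
    """Return argv lists for one scheduler followed by `workers` workers.

    Assumption: each worker listens on its own port, numbered
    consecutively after the scheduler port.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    scheduler = ["tusk", "cluster-scheduler", "--port", str(scheduler_port)]
    worker_cmds = [
        [
            "tusk", "cluster-worker",
            "--scheduler", f"{host}:{scheduler_port}",
            "--port", str(scheduler_port + 1 + i),
        ]
        for i in range(workers)
    ]
    return [scheduler, *worker_cmds]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in build_cluster_commands(workers=2):
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.Popen` as-is once the commands look right.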

### Web UI

Once the plugin is installed, a "Cluster" tab appears in Tusk Studio, where you can:

- Connect to remote schedulers
- Start/stop local clusters
- Monitor workers and jobs
- Submit distributed queries

## Architecture

- **Scheduler**: Coordinates job distribution using Arrow Flight
- **Worker**: Executes queries using DataFusion
- **Communication**: Arrow Flight for efficient data transfer
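
The division of roles can be pictured with a toy in-process model. It involves neither Arrow Flight nor DataFusion, and this README does not say how the scheduler assigns work, so the round-robin policy and the `Scheduler` and `Worker` classes below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Worker:
    name: str
    completed: list = field(default_factory=list)

    def execute(self, task: str) -> str:
        # A real worker would run the query fragment with DataFusion here.
        self.completed.append(task)
        return f"{self.name} ran {task}"


@dataclass
class Scheduler:
    workers: list

    def submit(self, tasks: list) -> list:
        # Round-robin assignment is an assumed policy, chosen for simplicity.
        if not self.workers:
            raise RuntimeError("no workers registered")
        return [w.execute(t) for w, t in zip(cycle(self.workers), tasks)]


scheduler = Scheduler([Worker("worker-0"), Worker("worker-1")])
results = scheduler.submit(["scan part 0", "scan part 1", "scan part 2"])
```

In the real plugin, the call to `execute` would cross a process boundary, with Arrow Flight handling the transfer.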

## Requirements

- Python 3.11 or newer
- `tuskdata>=0.1.0`
- `datafusion>=43.0.0`
- `pyarrow>=18.0.0`
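
To confirm that an environment meets these minimums, something like the following sketch may help. It uses only the standard library and compares leading numeric version components, which is a deliberate simplification. Pre-release suffixes are ignored, so a full PEP 440 comparison would need the `packaging` library. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above.
MINIMUMS = {"tuskdata": "0.1.0", "datafusion": "43.0.0", "pyarrow": "18.0.0"}


def version_tuple(version: str) -> tuple:
    """Keep only leading numeric components: '18.1.0.dev3' -> (18, 1, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '44' compares like '44.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS) -> dict:
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report
```

Calling `check_requirements()` returns one status per package, which can be printed or asserted on in a setup script.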
