Metadata-Version: 2.4
Name: seeknal
Version: 2.8.2
Summary: All-in-one platform for data and AI/ML engineering
Author: Fitra Kacamarga
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: croniter>=3.0.0
Requires-Dist: cryptography>=46.0.7
Requires-Dist: ddgs>=9.0
Requires-Dist: duckdb>=1.1.3
Requires-Dist: httpx>=0.28.1
Requires-Dist: ipykernel>=6.29.5
Requires-Dist: jinja2>=3.1.0
Requires-Dist: libsql-experimental>=0.0.41
Requires-Dist: mack>=0.5.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: orjson>=3.11.6
Requires-Dist: pandas>=1.3.0
Requires-Dist: pendulum>=3.0.0
Requires-Dist: psycopg2-binary>=2.9.0
Requires-Dist: pyarrow>=18.1.0
Requires-Dist: pyasn1>=0.6.3
Requires-Dist: pydantic-ai-slim[google,openai]>=1.71.0
Requires-Dist: pydantic-deep>=0.3.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pygments>=2.20.0
Requires-Dist: pyiceberg>=0.8.1
Requires-Dist: pymysql>=1.1.0
Requires-Dist: python-box>=7.3.0
Requires-Dist: python-decouple>=3.8
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: python-telegram-bot>=21.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: redis>=4.5.0
Requires-Dist: rich>=13.0.0
Requires-Dist: ruamel-yaml>=0.18.0
Requires-Dist: s3fs>=2024.12.0
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: scipy>=1.14.0
Requires-Dist: sqlalchemy-libsql>=0.1.0
Requires-Dist: sqlalchemy>=1.4.0
Requires-Dist: sqlglot~=25.0.0
Requires-Dist: sqlmodel>=0.0.22
Requires-Dist: starlette>=0.40.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: tenacity>=9.0.0
Requires-Dist: typer>=0.13.1
Requires-Dist: uvicorn>=0.30.0
Provides-Extra: all
Requires-Dist: aiohttp>=3.13.4; extra == 'all'
Requires-Dist: aiohttp>=3.9.0; extra == 'all'
Requires-Dist: black>=26.3.1; extra == 'all'
Requires-Dist: delta-spark==3.2.0; extra == 'all'
Requires-Dist: icecream>=2.1.3; extra == 'all'
Requires-Dist: prefect<4.0,>=3.1.10; extra == 'all'
Requires-Dist: pyspark>=3.0.0; extra == 'all'
Requires-Dist: pytest>=8.3.4; extra == 'all'
Requires-Dist: quinn>=0.10.3; extra == 'all'
Requires-Dist: temporalio>=1.9.0; extra == 'all'
Requires-Dist: tornado>=6.5.5; extra == 'all'
Provides-Extra: ask
Requires-Dist: ddgs>=9.0; extra == 'ask'
Requires-Dist: openpyxl>=3.1; extra == 'ask'
Requires-Dist: pydantic-ai-slim[google,openai]>=1.71.0; extra == 'ask'
Requires-Dist: pydantic-deep>=0.3.0; extra == 'ask'
Requires-Dist: python-multipart>=0.0.9; extra == 'ask'
Requires-Dist: redis>=4.5.0; extra == 'ask'
Requires-Dist: ruamel-yaml>=0.18.0; extra == 'ask'
Requires-Dist: scikit-learn>=1.5.0; extra == 'ask'
Requires-Dist: scipy>=1.14.0; extra == 'ask'
Requires-Dist: starlette>=0.40.0; extra == 'ask'
Requires-Dist: uvicorn>=0.30.0; extra == 'ask'
Provides-Extra: dev
Requires-Dist: black>=26.3.1; extra == 'dev'
Requires-Dist: icecream>=2.1.3; extra == 'dev'
Requires-Dist: pytest>=8.3.4; extra == 'dev'
Provides-Extra: prefect
Requires-Dist: aiohttp>=3.13.4; extra == 'prefect'
Requires-Dist: prefect<4.0,>=3.1.10; extra == 'prefect'
Requires-Dist: tornado>=6.5.5; extra == 'prefect'
Provides-Extra: report-server
Requires-Dist: starlette>=0.40.0; extra == 'report-server'
Requires-Dist: uvicorn>=0.30.0; extra == 'report-server'
Provides-Extra: spark
Requires-Dist: delta-spark==3.2.0; extra == 'spark'
Requires-Dist: pyspark>=3.0.0; extra == 'spark'
Requires-Dist: quinn>=0.10.3; extra == 'spark'
Provides-Extra: telegram
Requires-Dist: python-telegram-bot>=21.0; extra == 'telegram'
Provides-Extra: temporal
Requires-Dist: aiohttp>=3.9.0; extra == 'temporal'
Requires-Dist: temporalio>=1.9.0; extra == 'temporal'
Description-Content-Type: text/markdown

<div align="center">
    <picture>
      <source media="(prefers-color-scheme: dark)" srcset="docs/assets/logos/seeknal-mark-dark.svg">
      <source media="(prefers-color-scheme: light)" srcset="docs/assets/logos/seeknal-mark.svg">
      <img src="docs/assets/logos/seeknal-mark-dark.svg" alt="Seeknal" width="96" height="96">
    </picture>
    <h1>Seeknal</h1>
    <p><strong>Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.</strong></p>
    <p>
        <a href="https://pypi.org/project/seeknal/"><img src="https://img.shields.io/pypi/v/seeknal.svg" alt="PyPI version"></a>
        <a href="https://pypi.org/project/seeknal/"><img src="https://img.shields.io/pypi/pyversions/seeknal.svg" alt="Python versions"></a>
        <a href="LICENSE"><img src="https://img.shields.io/github/license/mta-tech/seeknal.svg" alt="License"></a>
        <a href="https://github.com/mta-tech/seeknal/actions"><img src="https://img.shields.io/github/actions/workflow/status/mta-tech/seeknal/release.yml" alt="CI"></a>
    </p>
</div>

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe `draft → dry-run → apply` workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.

## Quick Start

```bash
pip install seeknal
# Optional, only for distributed Spark execution:
# pip install "seeknal[spark]"

seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply
```

Explore your data interactively or search docs from the terminal:

```bash
seeknal repl          # Interactive SQL on pipeline outputs
seeknal docs query    # Search documentation from the CLI
```

```sql
SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;
```

## Key Features

**Dual Pipeline Authoring** — Write pipelines in YAML, Python decorators, or both:

```python
from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()
```

**Multi-Target Materialization** — Write to PostgreSQL and Iceberg from a single node:

```yaml
materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
```

**Environment Management** — Isolated namespaces with per-environment profiles:

```bash
seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev
```

**Feature Store** — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. Supports offline (batch) and online (real-time) serving.

```yaml
# seeknal/feature_groups/customer_features.yml
kind: feature_group
name: customer_features
entity:
  name: customer
  join_keys: ["customer_id"]
materialization:
  event_time_col: latest_order_date
  offline: { enabled: true, format: parquet }
  online: { enabled: false, ttl: 7d }
features:
  total_orders: { dtype: integer }
  total_spent: { dtype: float }
  avg_order_value: { dtype: float }
inputs:
  - ref: transform.customer_orders
```

```python
# Or use Python decorators
@feature_group(name="customer_rfm", entity="customer")
def customer_rfm(ctx):
    df = ctx.ref("transform.clean_transactions")
    return ctx.duckdb.sql("""
        SELECT CustomerID, COUNT(DISTINCT InvoiceNo) as frequency,
               SUM(TotalAmount) as monetary_value
        FROM df GROUP BY CustomerID
    """).df()
```

```bash
seeknal entity list                           # Cross-feature-group consolidation
seeknal entity show customer                  # Inspect entity schema and feature groups
```

**Interactive SQL REPL** — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.

**AI-Powered Thinking Partner** — `seeknal ask chat` is your collaborative partner for data work. The agent uses 16 tools for fast data access and 11 built-in skills for multi-step workflows like report generation, pipeline building, and data profiling — all loaded on demand to keep responses fast:

```bash
seeknal ask chat                        # Start a brainstorm / build session
seeknal ask "What are the top 5 customers by revenue?"  # Quick one-shot question
seeknal ask report "customer analysis"  # Generate interactive HTML dashboard
seeknal ask chat --web                  # Enable web search for benchmarks
```

Ask it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted **Seeknal Report Server** and share them with your team via a URL.

```bash
seeknal report-server start             # Host published reports
seeknal gateway start                   # Expose ask as an API (WebSocket/SSE/REST)
```

Supports Google Gemini (default) and Ollama (local). Use `--provider ollama` for fully local, private analysis.

## Documentation

| | |
|---|---|
| **[Getting Started](docs/index.md)** | Installation, configuration, first pipeline |
| **[CLI Reference](docs/reference/cli.md)** | All commands and flags |
| **[YAML Schema](docs/reference/yaml-schema.md)** | Pipeline YAML reference |
| **[CLI Docs Search](docs/cli/docs.md)** | Search documentation from the terminal (`seeknal docs`) |
| **Tutorials** | [YAML Pipelines](docs/tutorials/yaml-pipeline-tutorial.md) · [Python Pipelines](docs/tutorials/python-pipelines-tutorial.md) · [Mixed](docs/tutorials/mixed-yaml-python-pipelines.md) · [Seeknal Ask Agent](docs/tutorials/seeknal-ask-agent.md) · [Report Exposures](docs/tutorials/report-exposures.md) |
| **Guides** | [Python Pipelines](docs/guides/python-pipelines.md) · [Testing & Audits](docs/guides/testing-and-audits.md) · [Iceberg Materialization](docs/iceberg-materialization.md) · [Training to Serving](docs/guides/training-to-serving.md) |
| **Servers** | [Gateway Server](docs/cli/gateway.md) · [Report Server](docs/cli/report-server.md) |
| **Concepts** | [Point-in-Time Joins](docs/concepts/point-in-time-joins.md) · [Virtual Environments](docs/concepts/virtual-environments.md) · [Exposures](docs/concepts/exposures.md) · [Glossary](docs/concepts/glossary.md) |

## Changelog

### v2.8.0 (April 2026)

**OpenAI/Anthropic providers + SQL safety + context files** — Adds two new LLM provider families, execution guards on `execute_sql`, a pre-execution `preview_query` tool, persistent context files, and durable preferences.

- **OpenAI + Anthropic support**: `gpt-4o`, `claude-*`, Azure OpenAI, Together, Groq, vLLM, LM Studio, and any OpenAI-compatible proxy via `SEEKNAL_ASK_OPENAI_BASE_URL` / `SEEKNAL_ASK_ANTHROPIC_BASE_URL`
- **`execute_sql` guards**: rows capped at 500, columns at 50, per-cell length at 200 chars, 50 KB markdown budget — every truncation emits an actionable notice with accurate total row count
- **`preview_query` tool**: four pre-execution safety probes (row count, column count, JOIN fan-out, dry-run reachability) — blocks queries returning ≥100k rows; pure aggregations auto-skip
- **Context files**: `list_context_files` and `write_project_file` tools scan/write `{project}/context/` with path-traversal guards
- **Durable preferences**: `save_preference` appends to `preferences.yml`; preferences are injected into the system prompt on every session

### v2.7.1 (April 2026)

**Gateway pairing + `execute_uv_script` + pipeline runtime helpers** — Additive batch combining gateway Telegram pairing, a new agent tool for running uv-managed scripts, and lightweight per-node runtime helpers.

- **Gateway pairing**: `FilePairingStore`, `TelegramLinkStore`, `PublicSessionStore` wired into lifespan; `/pair` Telegram command for admin-generated codes
- **`execute_uv_script` tool**: run arbitrary uv-managed Python scripts from the agent with full dependency isolation
- **Pipeline runtime**: `ctx.llm` (Ask-aligned text/JSON generation) and `ctx.state` (lightweight per-node persistent state) helpers available inside `@transform` functions
- **Config discovery**: `find_agent_config_path()` locates `seeknal_agent.yml` under project root or `seeknal/` directory

### v2.6.0 (April 2026)

**Skills-Powered Agent + Report Server** — The ask agent now uses a thin-tools/fat-skills architecture: 16 lean tools for fast data access, 11 built-in skills for multi-step workflows (reports, pipelines, profiling, metrics, publishing). Skills load on demand via progressive disclosure, keeping the agent's context lean.

- **Seeknal Report Server** (`seeknal report-server start`): self-hosted server for publishing and sharing reports via unique URLs — publish from the chat TUI or the agent tool
- **11 built-in skills**: report generation, pipeline building, data profiling, Python analysis, semantic model bootstrap, metric query/save, report exposure codification, Proof Editor publishing
- **Chat enhancements**: `--style` (concise/explanatory/formal/conversational), `--budget` (USD cap), `--web` (DuckDuckGo search), `--session`/`--name` (named session resume)
- **Gateway improvements**: cloud-only backend mode, standalone workers, Redis multi-replica, split topology
- **Auto `.env` loading**: `--project <path>` loads `<path>/.env` automatically
- **Error UX**: network errors classified with actionable hints; error logs saved to `~/.seeknal/logs/`

## Install from Source

For development or contributing:

```bash
git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"
```

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, code style, testing, and PR guidelines.

## License

Seeknal is [Apache 2.0 licensed](LICENSE).
