Metadata-Version: 2.4
Name: peppermint-lang
Version: 0.3.2
Summary: A pipe-first language for data and ML work, running on top of Python
Author-email: "Chayapatr (Pub) Archiwaranguprok" <pub@mit.edu>
License-Expression: MIT
Project-URL: Repository, https://github.com/chayapatr/peppermint
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: data
Requires-Dist: pandas; extra == "data"
Provides-Extra: ml
Requires-Dist: pandas; extra == "ml"
Requires-Dist: scikit-learn; extra == "ml"
Requires-Dist: umap-learn; extra == "ml"
Requires-Dist: openai; extra == "ml"
Provides-Extra: viz
Requires-Dist: pandas; extra == "viz"
Requires-Dist: matplotlib; extra == "viz"
Requires-Dist: seaborn; extra == "viz"
Provides-Extra: lsp
Requires-Dist: pygls; extra == "lsp"
Provides-Extra: all
Requires-Dist: pandas; extra == "all"
Requires-Dist: scikit-learn; extra == "all"
Requires-Dist: umap-learn; extra == "all"
Requires-Dist: openai; extra == "all"
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: seaborn; extra == "all"
Requires-Dist: pygls; extra == "all"
Dynamic: license-file

# Peppermint

A pipe-first language for data and ML work, running on top of Python. Every operation is a pipeline step, and errors propagate automatically through the pipeline. The Python ecosystem (pandas, scikit-learn, or your own code) is accessible from within the language.

## Install

```sh
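pip install peppermint-lang            # core language
pip install "peppermint-lang[all]"     # with extras; also available: data, ml, viz, lsp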
```

## Run

```sh
pep file.pep  # run a file
pep           # interactive REPL
```

## Examples

### Transform

```
load("employees.csv")
  |> filter(it.age > 18)
  |> add(tax: it.salary * 0.2)
  |> sort(by: "salary", dir: "desc")
  |> print()
```

Each step prints a live summary:

```
|> filter    → List  843 rows × 5 cols  (157 dropped)
|> add       → List  843 rows × 6 cols  (+tax)
|> sort      → List  843 rows × 6 cols
```
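Here is a minimal plain-Python sketch of the same pipeline over a list of dicts, showing what each step does. It is not Peppermint's implementation. The helper names (`filter_rows`, `add_column`, `sort_rows`, `summary`) and the sample rows are hypothetical, and the summary format only imitates the output above.

```python
# A hypothetical plain-Python sketch of the Transform pipeline; not Peppermint's code.
from typing import Any, Callable

Row = dict[str, Any]


def filter_rows(rows: list[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """Keep rows where pred(row) is true, like filter(it.age > 18)."""
    return [r for r in rows if pred(r)]


def add_column(rows: list[Row], name: str, fn: Callable[[Row], Any]) -> list[Row]:
    """Copy each row with one extra computed column, like add(tax: ...)."""
    return [{**r, name: fn(r)} for r in rows]


def sort_rows(rows: list[Row], by: str, desc: bool = False) -> list[Row]:
    """Sort rows by a single column, like sort(by: "salary", dir: "desc")."""
    return sorted(rows, key=lambda r: r[by], reverse=desc)


def summary(step: str, before: list[Row], after: list[Row]) -> str:
    """Imitate the one-line step summary: row/column counts, drops, new columns."""
    cols = list(after[0]) if after else []
    old_cols = set(before[0]) if before else set()
    line = f"|> {step:<10}→ List  {len(after)} rows × {len(cols)} cols"
    notes = []
    dropped = len(before) - len(after)
    if dropped > 0:
        notes.append(f"{dropped} dropped")
    notes += [f"+{c}" for c in cols if c not in old_cols]
    if notes:
        line += "  (" + ", ".join(notes) + ")"
    return line


# Made-up sample rows standing in for employees.csv.
employees = [
    {"name": "Ana", "age": 34, "salary": 5000},
    {"name": "Ben", "age": 17, "salary": 1200},
    {"name": "Cy", "age": 45, "salary": 7000},
]
adults = filter_rows(employees, lambda r: r["age"] > 18)
print(summary("filter", employees, adults))
taxed = add_column(adults, "tax", lambda r: r["salary"] * 0.2)
print(summary("add", adults, taxed))
ranked = sort_rows(taxed, by="salary", desc=True)
print(summary("sort", taxed, ranked))
```

Each helper returns new rows instead of mutating its input. That is what lets `summary` compare the state before and after every step.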

### Aggregate

```
load("sales.csv")
  |> collapse(by: "region",
      avg: mean(col.revenue),
      n:   count()
  )
  |> sort(by: "avg", dir: "desc")
  |> print()
```
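Judging from the example, `collapse` appears to reduce each group to a single row. As a rough plain-Python analogue (not the library's code), the sketch below groups made-up sales rows by region and computes the mean revenue and row count per group. It then sorts the result by the average in descending order. The function name `collapse_by` and the fixed output names `avg` and `n` are assumptions for illustration.

```python
# A hypothetical plain-Python analogue of collapse(by: "region", ...); not Peppermint's code.
from collections import defaultdict
from statistics import mean
from typing import Any

Row = dict[str, Any]


def collapse_by(rows: list[Row], key: str, value: str) -> list[Row]:
    """Return one row per distinct `key` with the mean of `value` and a row count."""
    groups: defaultdict[Any, list[Any]] = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return [{key: k, "avg": mean(vals), "n": len(vals)} for k, vals in groups.items()]


# Made-up rows standing in for sales.csv.
sales = [
    {"region": "north", "revenue": 100},
    {"region": "south", "revenue": 300},
    {"region": "north", "revenue": 200},
]
# Equivalent of sort(by: "avg", dir: "desc").
result = sorted(collapse_by(sales, "region", "revenue"), key=lambda r: r["avg"], reverse=True)
for row in result:
    print(row)
```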

### Top N per group

```
load("sales.csv")
  |> each(by: "region",
      |> add(rank: rank(col.revenue, dir: "desc"))
      |> filter(it.rank <= 3)
      |> drop("rank")
  )
  |> print()
```
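The nested steps run once for each region. Below is a plain-Python sketch of the same idea. It assumes that `rank(..., dir: "desc")` numbers rows 1, 2, 3, ... starting from the highest revenue. Under that assumption, sorting each group and keeping its first three rows gives the same result, without ever materializing a rank column. How Peppermint's `rank` treats ties is not stated here; this sketch simply keeps input order for equal values. `top_n_per_group` is a hypothetical helper.

```python
# A hypothetical plain-Python sketch of "top N per group"; not Peppermint's code.
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def top_n_per_group(rows: list[Row], by: str, col: str, n: int = 3) -> list[Row]:
    """Keep the n rows with the largest `col` inside each `by` group."""
    groups: defaultdict[Any, list[Row]] = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r)
    out: list[Row] = []
    for members in groups.values():
        # Highest value first; sorted() is stable, so tied rows keep input order.
        ranked = sorted(members, key=lambda r: r[col], reverse=True)
        # Slicing stands in for add(rank: ...) + filter(it.rank <= 3) + drop("rank").
        out.extend(ranked[:n])
    return out


# Made-up rows standing in for sales.csv.
sales = [
    {"region": "north", "revenue": 5},
    {"region": "north", "revenue": 9},
    {"region": "north", "revenue": 7},
    {"region": "north", "revenue": 1},
    {"region": "south", "revenue": 4},
]
for row in top_n_per_group(sales, by="region", col="revenue"):
    print(row)
```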

### ML pipeline

```
use ml
use viz
use env

load("data.csv")
  |> ml.embed(
      on: "text", out: "embedding",
      source: "deepinfra", model: "Qwen/Qwen3-Embedding-4B",
      apikey: env.get("DEEPINFRA_TOKEN"))
  |> ml.kmeans(k: 2..8, on: "embedding", out: "cluster")
  |> ml.umap(dims: 2, on: "embedding", out: "umap")
  |> viz.scatter(x: "umap1", y: "umap2", color: "cluster", label: "text", display: ["labels", "legend"])
```

### Error handling

```
result = load("data.csv")
  |> filter(it.score > 0.5)

match(result,
  Ok(data): data |> print(),
  Err(msg):  print(msg)
)
```

---

See [docs/language.md](docs/language.md) for the full language reference and [examples/](examples/) for more examples.
