Metadata-Version: 2.4
Name: Grimmerie
Version: 0.1.7
Summary: Functions for Prototyping, QOL and Sanity checking
Author: Joe Petrecca
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: transformers>=4.38
Requires-Dist: adapters>=1.0
Requires-Dist: numpy>=1.23
Requires-Dist: sentencepiece
Requires-Dist: scikit-learn>=1.2
Requires-Dist: pandas>=1.5
Provides-Extra: nlp
Requires-Dist: spacy>=3.0; extra == "nlp"

# Grimmerie

A spellbook for Python.

Grimmerie is a collection of high-level utilities (“spells”) for rapid prototyping, sanity checking, and removing friction from common ML and NLP workflows.

Each spell compresses a multi-step pipeline into a **single call**.

---

## Installation

```bash
pip install grimmerie
```

---

## Core Idea

Instead of wiring pipelines manually:

```python
# lots of setup...
```

You do:

```python
from grimmerie import specterize

embeddings = specterize(data)
```

Behind that one call:
- Input normalization (strings, dicts, lists, pandas Series)
- Model loading and caching
- Tokenization
- Adapter activation
- Output formatting

You get vectors. Immediately.

---

## Input Philosophy

All spells accept:

- `str`
- `dict`
- `list`
- any iterable
- **pandas Series (first-class support)**

Example:

```python
df["title"] + " " + df["abstract"]
```

goes straight in.
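
The normalization step is not spelled out here, so the following is only a minimal sketch of how such inputs could be flattened into one string per row. The helper names `normalize_inputs` and `_to_text` are hypothetical, and joining dict values with a space is an assumption, not Grimmerie's documented behavior.

```python
from collections.abc import Iterable, Mapping


def _to_text(item, sep):
    """Hypothetical helper: turn one record into one string."""
    # Assumption: a dict's values are joined in insertion order.
    if isinstance(item, Mapping):
        return sep.join(str(v) for v in item.values())
    return str(item)


def normalize_inputs(data, sep=" "):
    """Hypothetical helper: coerce supported inputs into a list of strings."""
    # A single string or dict is one document, not a collection of rows.
    if isinstance(data, (str, Mapping)):
        return [_to_text(data, sep)]
    # pandas Series and NumPy arrays expose .tolist(), which keeps row order.
    if hasattr(data, "tolist"):
        data = data.tolist()
    # Lists, tuples, generators and other iterables: one string per item.
    if isinstance(data, Iterable):
        return [_to_text(item, sep) for item in data]
    raise TypeError(f"unsupported input type: {type(data).__name__}")


rows = normalize_inputs([{"title": "BERT", "abstract": "We introduce a new model"}])
```

Because the output is always a list with one entry per input row, downstream steps can rely on a single shape regardless of what the caller passed in.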

---

## Spells

## `specterize`

Generate SPECTER2 embeddings using Hugging Face Transformers + adapters.

```python
from grimmerie import specterize

texts = [
    {"title": "BERT", "abstract": "We introduce a new model"},
    {"title": "Attention", "abstract": "Transformers dominate NLP"},
]

emb = specterize(texts, return_type="numpy")
```

### Return Types

```python
# return_type accepts one of: "list", "numpy", "tensor"
```

- "list" → list[list[float]]
- "numpy" → np.ndarray (n, 768)
- "tensor" → torch.Tensor
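
To make the three options concrete, here is a rough sketch of how a `return_type` switch could be written. The function `format_output` is hypothetical and the `float32` dtype is an assumption; the only library calls used are `np.asarray` and `torch.tensor`.

```python
def format_output(rows, return_type="list"):
    """Hypothetical formatter: convert rows of numbers to the requested container."""
    if return_type == "list":
        return [[float(x) for x in row] for row in rows]
    if return_type == "numpy":
        import numpy as np  # imported lazily so the "list" path needs no extras

        return np.asarray(rows, dtype=np.float32)  # dtype is an assumption
    if return_type == "tensor":
        import torch  # imported lazily for the same reason

        return torch.tensor(rows, dtype=torch.float32)  # dtype is an assumption
    raise ValueError(
        f"return_type must be 'list', 'numpy', or 'tensor', got {return_type!r}"
    )


as_list = format_output([[1, 2], [3, 4]], return_type="list")
```

Rejecting unknown values with a `ValueError` surfaces typos such as `"array"` immediately instead of silently falling back to a default.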

---

## `tfidfize`

Generate TF-IDF vectors using scikit-learn.

```python
from grimmerie import tfidfize

X = tfidfize(df["title"] + " " + df["abstract"], return_type="array")
```

### Return Types

```python
# return_type accepts one of: "sparse" (default), "array", "list", "frame"
```

- "sparse" → scipy sparse matrix (default, best for large data)
- "array" → np.ndarray (n, d)
- "list" → list[list[float]]
- "frame" → pandas DataFrame (columns = vocab)

---

## Common Pattern

Both spells preserve **row alignment**:

```python
X = tfidfize(df["title"], return_type="array")
E = specterize(df["title"], return_type="numpy")

# row i ↔ df.iloc[i]
```
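
If you want to verify alignment before joining vectors back onto a DataFrame, a small check along these lines may help. `assert_row_aligned` is a hypothetical helper, not part of Grimmerie, and plain lists stand in for real spell outputs.

```python
def assert_row_aligned(n_rows, **outputs):
    """Hypothetical check: every output must have exactly n_rows rows."""
    for name, out in outputs.items():
        # .shape[0] works for NumPy arrays, tensors and SciPy sparse matrices
        # (len() raises on sparse matrices); len() covers plain lists.
        rows = out.shape[0] if hasattr(out, "shape") else len(out)
        if rows != n_rows:
            raise ValueError(f"{name} has {rows} rows, expected {n_rows}")
    return True


# Stand-in data: two input rows and two outputs with one row each per input.
X_demo = [[0.1, 0.0], [0.0, 0.7]]
E_demo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
aligned = assert_row_aligned(2, X=X_demo, E=E_demo)
```

With real data the call would look like `assert_row_aligned(len(df), X=X, E=E)`.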

---

## Saving Outputs

Dense:

```python
import numpy as np

X = tfidfize(df["title"], return_type="array")
np.save("tfidf.npy", X)
```

Sparse (recommended for TF-IDF):

```python
from scipy import sparse

X = tfidfize(df["title"], return_type="sparse")
sparse.save_npz("tfidf.npz", X)
```

---

## Design Principles

### 1. One-call workflows
You should not need to think about setup.

### 2. Strong defaults
Everything is preconfigured to “just work”.

### 3. Hidden complexity
Spells handle the annoying parts so you can focus on ideas.

### 4. Consistent interfaces
Same input patterns across spells.

---

## When to Use Grimmerie

- Rapid experimentation
- Prototyping NLP pipelines
- Testing ideas quickly
- Building demos

---

## When Not to Use It

- You need full control over every step
- You care about exact pipeline reproducibility
- You are debugging low-level model behavior

---

## Notes

- First call may download models
- Models are cached automatically
- Inputs are normalized internally
- TF-IDF output is sparse by default; request a dense type only when the full matrix fits in memory
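
How the cache is implemented is not stated here (in memory, on disk, or both). As an illustration of the in-memory case only, a loader can be memoized with the standard library's `functools.lru_cache`; `get_model` and the model name below are hypothetical placeholders.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(name):
    """Hypothetical loader: the body runs once per distinct name."""
    print(f"loading {name} ...")  # stands in for an expensive download or load
    return {"name": name}  # placeholder object instead of a real model


first = get_model("example-model")
second = get_model("example-model")  # served from the cache, no second load
```

Both calls return the same object, so repeated spell invocations in one session would pay the loading cost only once under this scheme.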

---

## Direction

Grimmerie is evolving toward a unified set of spells for:

- Vectorization
- Dimensionality reduction
- Visualization
- Data inspection

All following the same idea:

```python
result = spell(data)
```

---

## Minimal Mental Model

```text
data → normalize → compute → return vectors
```

That’s it.
