Metadata-Version: 2.4
Name: pyllmr
Version: 0.1.0
Summary: A library for spec-based, validated R execution from Python via a bridge (rpy2), with optional LLM planning and CLI commands.
Author-email: Rodolphe Priam <rpriam@gmail.com>
Maintainer-email: Rodolphe Priam <rpriam@gmail.com>
License-Expression: GPL-2.0-or-later
Project-URL: Homepage, https://github.com/rpriam/pyllmr
Project-URL: Repository, https://github.com/rpriam/pyllmr
Project-URL: Issues, https://github.com/rpriam/pyllmr/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.0.0
Requires-Dist: rpy2>=3.6.0
Requires-Dist: pandas>=2.0.0
Dynamic: license-file

# pyllmr

Define and validate call specs in Python and execute them in R through a bridge (rpy2): search and install packages, inspect docs and functions, and run reproducible calls, with an optional LLM planner and a CLI.

---

## What it does

- Search **R packages** on CRAN by keyword(s)
- List and find installed R packages, show their documentation, and list their exported functions
- Install / update R packages from a chosen CRAN repo
- Inspect installed **Python packages** (local environment metadata only)
- Plan a reproducible **execution spec** from natural language (LLM)
- Execute a spec deterministically (no LLM required)
- Emit a Python runner script to replay a call/spec

The `pyllmr` library can serve as a structured call/execution layer for automated LLM-driven reporting (e.g., “plan, review, execute” pipelines), but this version does not guarantee the correctness of model outputs. It can also help Python users learn about R functions, and it can be used to test structured runs from Python code or the CLI. It was developed iteratively using modern tooling for coding, docstrings, and debugging. It requires a working R installation (available on PATH or via `R_HOME`).

This project is provided **"as is"**, without warranty of any kind. You are responsible for reviewing any generated spec/code and for running it in a safe environment. The package provides a deterministic local runtime to validate and execute R call specs. LLM-based planning is provided for convenience: review the generated plan before execution (“plan, review, execute”), or write the spec by hand if required.

---

## Install

### From PyPI
```bash
pip install pyllmr
```

### From source
```bash
pip install -e .
```

### Compatibility (v0.1.0)

- Currently tested on **Windows 11 only**.
- **Linux and macOS are not supported yet**.

### Quick sanity

```bash
pyllmr --version
pyllmr --checkenv
```

---

## CLI overview

```bash
pyllmr --help
```

The CLI is organized as:
- **R discovery & maintenance** (CRAN + installed R)
- **Python discovery** (installed env only)
- **Repro flows** (spec JSON → `--exec-spec`, optional `--emit-py`)
- **LLM flows** (`--plan`, `--call`, optional `--explain`)
- **I/O & formatting** (`--json`, `--tsv`, `--out`, `--out-spec`, `--file`)
- **Diagnostics** (`--checkenv`, `--verbose`)

---

## 1) R: search, list, find, docs, functions

### 1.1 Search CRAN
```bash
pyllmr --search-r excel
pyllmr --search-r excel --limit 15
pyllmr --search-r excel --json
pyllmr --search-r excel --tsv
pyllmr --search-r excel --installed
pyllmr --search-r excel --installed-only
pyllmr --search-r clustering --limit 30 --installed
```

### 1.2 List installed R packages
```bash
pyllmr --list-r
pyllmr --list-r --json
```

### 1.3 Find by exact name (basic metadata)
```bash
pyllmr --find-r readxl
pyllmr --find-r readxl --json
```

### 1.4 Show R package documentation
```bash
pyllmr --doc dplyr
```

### 1.5 List exported R functions
```bash
pyllmr --funs-r dplyr
pyllmr --funs-r stats
```

---

## 2) R: check / install

### 2.1 Check presence/versions
```bash
pyllmr --check dplyr readxl ggplot2
```

### 2.2 Install R packages (CRAN)
```bash
pyllmr --install-r readxl
pyllmr --install-r dplyr tidyr ggplot2
pyllmr --install-r readxl --repos https://cloud.r-project.org
```

---

## 3) Python: installed-only discovery

`--search-py` searches **installed packages in the current environment** using local env metadata (not online PyPI).

### 3.1 Search installed Python packages by keywords
```bash
pyllmr --search-py excel
pyllmr --search-py excel --limit 20
pyllmr --search-py excel --json --limit 20
pyllmr --search-py excel --tsv --limit 20
```

### 3.2 List installed Python packages
```bash
pyllmr --list-py
```

### 3.3 List public functions for a Python module
```bash
pyllmr --funs-py pandas
pyllmr --funs-py scipy.stats
pyllmr --funs-py json
```

---

## 4) Reproducible execution (NO LLM): spec JSON → --exec-spec

This is the “manual plan” path:
1) you create a **spec JSON**
2) `pyllmr --exec-spec` executes it locally via R

### 4.1 Minimal spec: t.test
Create the spec:
```bash
python -c "import json;json.dump({'package':'stats','function':'t.test','inputs':{'x':{'value':[1,2,3]},'y':{'value':[2,3,4]}},'kwargs':{}},open('spec_ttest.json','w'))"
```

Run it:
```bash
pyllmr --exec-spec spec_ttest.json
pyllmr --exec-spec spec_ttest.json --json
pyllmr --exec-spec spec_ttest.json --verbose
```
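
The one-liner above is compact but hard to review. A minimal sketch of the same spec written as a short script follows. It uses only the standard library; `build_ttest_spec` and `write_spec` are hypothetical helper names chosen for illustration and are not part of `pyllmr`.

```python
import json
from pathlib import Path


def build_ttest_spec() -> dict:
    """Return the same spec dict as the one-liner in 4.1."""
    return {
        "package": "stats",
        "function": "t.test",
        "inputs": {"x": {"value": [1, 2, 3]}, "y": {"value": [2, 3, 4]}},
        "kwargs": {},
    }


def write_spec(spec: dict, path: str) -> Path:
    """Write the spec as indented JSON so it is easy to inspect and diff."""
    out = Path(path)
    out.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Produces the file consumed by: pyllmr --exec-spec spec_ttest.json
    print(write_spec(build_ttest_spec(), "spec_ttest.json"))
```

Keeping the spec as an indented JSON file makes the review step easier than reading a single-line `python -c` command.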

### 4.2 Matrix spec: chisq.test
Create the spec:
```bash
python -c "import json;json.dump({'package':'stats','function':'chisq.test','inputs':{'x':{'matrix':{'data':[[10,20,30],[15,25,35]],'byrow':True}}},'kwargs':{}},open('spec_chisq_matrix.json','w'))"
```

Run it:
```bash
pyllmr --exec-spec spec_chisq_matrix.json --json
```
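
When the matrix comes from Python data, it may help to check that it is rectangular before writing the spec. The sketch below builds the same `matrix` input form as above; `matrix_input` is a hypothetical helper (not a `pyllmr` function), and whether `pyllmr` performs a similar check itself is not stated here.

```python
import json


def matrix_input(rows, byrow=True):
    """Wrap equal-length rows in the 'matrix' input form used in 4.2."""
    if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("matrix rows must be non-empty and of equal length")
    return {"matrix": {"data": [list(r) for r in rows], "byrow": byrow}}


# Same spec as the chisq.test one-liner, built step by step.
spec = {
    "package": "stats",
    "function": "chisq.test",
    "inputs": {"x": matrix_input([[10, 20, 30], [15, 25, 35]])},
    "kwargs": {},
}
print(json.dumps(spec, indent=2))
```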

### 4.3 Literal expression spec (if supported in your build)
Create the spec:
```bash
python -c "import json;json.dump({'package':'base','function':'sqrt','inputs':{'x':{'expr':'1+2*3'}},'kwargs':{}},open('spec_expr.json','w'))"
```

Run it:
```bash
pyllmr --exec-spec spec_expr.json --json
```

---

## 5) Files: whitelist local files with --file

When a spec references a file path (e.g. through the `csv` input wrapper), pass each referenced file with `--file`.

Create the spec:
```bash
python -c "import json;json.dump({'package':'utils','function':'head','inputs':{'x':{'csv':{'path':'iris.csv','sep':',','encoding':'utf-8'}}},'kwargs':{'n':6}},open('spec_readcsv.json','w'))"
```

Execute (with file attached):
```bash
pyllmr --file iris.csv --exec-spec spec_readcsv.json --json
```
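
Before running a spec, you may want to confirm that every file it references is among the files you intend to attach. The sketch below illustrates that idea for the `csv` wrapper shown above. It is not how `pyllmr` enforces the whitelist internally; `csv_paths` and `missing_files` are hypothetical helpers, and only the `csv` input form from this section is handled.

```python
def csv_paths(spec: dict) -> set[str]:
    """Collect every path used by a 'csv' input wrapper in the spec."""
    paths = set()
    for value in spec.get("inputs", {}).values():
        if isinstance(value, dict) and "csv" in value:
            paths.add(value["csv"]["path"])
    return paths


def missing_files(spec: dict, attached: list[str]) -> set[str]:
    """Return referenced paths that are not in the list passed via --file."""
    return csv_paths(spec) - set(attached)


spec = {
    "package": "utils",
    "function": "head",
    "inputs": {"x": {"csv": {"path": "iris.csv", "sep": ",", "encoding": "utf-8"}}},
    "kwargs": {"n": 6},
}
print(missing_files(spec, attached=["iris.csv"]))   # nothing missing
print(missing_files(spec, attached=["other.csv"]))  # iris.csv is missing
```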

---

## 6) LLM planning/execution: --plan / --call

> Requires your model/provider configuration (e.g. `OPENAI_API_KEY`).

### 6.1 Plan only (returns a spec)
```bash
pyllmr --plan stats "Run a t-test between x=[1,2,3] and y=[2,3,4]"
pyllmr --plan stats "Compute mean of x=[1,2,3,4,5]"
pyllmr --plan stats "Compute sd of x=[1,2,3,4,5]"
pyllmr --plan stats "Compute quantiles of x=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30] with probs=[0.25,0.5,0.75]"
```

Save the generated spec to a file:
```bash
pyllmr --plan stats "Run a t-test between x=[1,2,3] and y=[2,3,4]" --out-spec spec_from_llm.json
pyllmr --exec-spec spec_from_llm.json
```

### 6.2 Call (plan + execute)
```bash
pyllmr --unsafe --call stats "Run a t-test between x=[1,2,3] and y=[2,3,4]"
pyllmr --unsafe --call stats "Compute mean of x=[1,2,3,4,5]"
pyllmr --unsafe --call stats "Compute sd of x=[1,2,3,4,5]"
pyllmr --unsafe --call stats "Compute Pearson correlation between x=[1,2,3,4,5] and y=[2,1,4,3,5]"
pyllmr --unsafe --call stats "Fit a linear model y=[3.2,3.9,5.1,5.8,7.2] as a function of x=[1,2,3,4,5]"
pyllmr --unsafe --call stats "Run one-way ANOVA for y=[5,6,7,5,6,8,9,7,6] across group=['A','A','A','B','B','B','C','C','C']"
pyllmr --unsafe --call stats "Fit logistic regression for y=[0,0,1,0,1,1,0,1] with predictors x=[1,2,3,4,5,6,7,8]"
```

### 6.3 Show generated R code without executing
```bash
pyllmr --unsafe --call stats "Run a t-test between x=[1,2,3] and y=[2,3,4]" --show-r
pyllmr --unsafe --call stats "chisq test with matrix x=[[10,20,30],[15,25,35]]" --show-r
```

### 6.4 Explain (optional)
```bash
pyllmr --unsafe --call stats "Run a t-test between x=[1,2,3] and y=[2,3,4]" --explain
```

### 6.5 Override models
```bash
pyllmr --plan stats "Compute mean of x=[1,2,3,4,5]" --model gpt-4o-mini
pyllmr --unsafe --call stats "Compute mean of x=[1,2,3,4,5]" --model gpt-4o-mini
pyllmr --unsafe --call stats "Run a t-test between x=[1,2,3] and y=[2,3,4]" --explain --model-explain gpt-4o-mini
```

---

## 7) Emit a reproducible Python runner: --emit-py

Emit a runner script (replays the call/spec):
```bash
pyllmr --plan stats "Run a t-test between x=[1,2,3] and y=[2,3,4]" --emit-py run_call_ttest.py
pyllmr --unsafe --call stats "chisq test with matrix x=[[10,20,30],[15,25,35]]" --emit-py run_call_chisq.py
```

---

## 8) Output / formats / I/O flags

### 8.1 Machine-friendly output
```bash
pyllmr --search-r excel --json
pyllmr --search-r excel --tsv
```

### 8.2 Write outputs to files
```bash
pyllmr --search-r excel --json --out out.json
pyllmr --plan stats "Compute mean of x=[1,2,3]" --out-spec spec.json
```

### 8.3 Redaction (best-effort)
```bash
pyllmr --redact --plan stats "Run a t-test between x=[1,2,3] and y=[2,3,4]"
```

### 8.4 Attach local files as context
```bash
pyllmr --file notes.txt --plan stats "Run a t-test between x=[1,2,3] and y=[2,3,4]"
pyllmr --unsafe --file data.csv --file schema.json --call stats "Compute correlation between x and y using data.csv"
```

---

## 9) Diagnostics / debugging

### 9.1 Environment diagnostics
```bash
pyllmr --checkenv
pyllmr --checkenv --verbose
```

### 9.2 Verbose logs
```bash
pyllmr --verbose --search-r excel
```

---

## 10) LLM provider/model overrides (advanced)

This section shows how to override the LLM **model** and **provider** explicitly. Use it when you want fast planning, a different model for explanations, or a local provider (e.g., Ollama).
LLM planning currently uses the OpenAI Python client and also supports **OpenAI-compatible endpoints** (e.g. local gateways) via environment variables like `OPENAI_BASE_URL` and `OPENAI_API_KEY`.

### 10.1 Override models per command

```bash
pyllmr --plan stats "mean of x=[1,2,3]" --model gpt-4o-mini --out-spec plan_mean.json
pyllmr --exec-spec plan_mean.json
pyllmr --unsafe --call stats "mean of x=[1,2,3]" --model gpt-4o-mini --show-r
pyllmr --unsafe --call stats "mean of x=[1,2,3]" --model gpt-4o-mini --explain --model-explain gpt-4o
```

### 10.2 Set default models via environment variables (PowerShell)

```powershell
$env:PYLLMR_MODEL="gpt-4o-mini"
$env:PYLLMR_MODEL_EXPLAIN="gpt-4o"
pyllmr --checkenv
```

### 10.3 OpenAI provider (PowerShell)

```powershell
$env:PYLLMR_PROVIDER="openai"
$env:OPENAI_API_KEY="..."
pyllmr --checkenv
```

### 10.4 OpenAI-compatible endpoint (proxy/self-host) (PowerShell)

```powershell
$env:PYLLMR_PROVIDER="openai"
$env:PYLLMR_BASE_URL="http://localhost:1234/v1"
$env:OPENAI_API_KEY="..."
pyllmr --checkenv
```

### 10.5 Ollama (local) example (PowerShell)

Example with a local Ollama server. Adjust the model name to one you have pulled (e.g., `llama3.1:8b`).

```powershell
$env:PYLLMR_PROVIDER="ollama"
$env:OLLAMA_HOST="http://localhost:11434"
$env:PYLLMR_MODEL="llama3.1:8b"
$env:PYLLMR_MODEL_EXPLAIN="llama3.1:8b"
pyllmr --checkenv

pyllmr --plan stats "mean of x=[1,2,3]" --out-spec plan_mean_ollama.json
pyllmr --exec-spec plan_mean_ollama.json
pyllmr --unsafe --call stats "mean of x=[1,2,3]" --show-r
```

Notes:
- `--model` controls the model used for `--plan` / `--call`.
- `--model-explain` controls the model used for `--explain`.
- `pyllmr --checkenv` prints the detected provider/model configuration (it does not change them).
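
The PowerShell snippets above set variables for the whole session. As a sketch, the same `PYLLMR_*` variables can be set for a single child process from Python using only the standard library. `pyllmr_env` and `check_env` are hypothetical helpers for illustration, and the call is skipped when `pyllmr` is not on PATH.

```python
import os
import shutil
import subprocess


def pyllmr_env(provider, model, model_explain=None, base=None):
    """Return a copy of the environment with the PYLLMR_* overrides applied."""
    env = dict(os.environ if base is None else base)
    env["PYLLMR_PROVIDER"] = provider
    env["PYLLMR_MODEL"] = model
    # Fall back to the planning model when no explain model is given.
    env["PYLLMR_MODEL_EXPLAIN"] = model_explain or model
    return env


def check_env(env):
    """Run `pyllmr --checkenv` with the given environment, if pyllmr is installed."""
    if shutil.which("pyllmr") is None:
        print("pyllmr not found on PATH; skipping --checkenv")
        return None
    return subprocess.run(["pyllmr", "--checkenv"], env=env, check=False).returncode


if __name__ == "__main__":
    check_env(pyllmr_env("ollama", "llama3.1:8b"))
```

Passing a modified copy through `env=` leaves the parent shell untouched, which is convenient when switching between providers in scripts.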

---

## 11) Security model and recommended workflow (avoid executing a spec without reviewing it first)

- `--exec-spec` runs **local execution** (R) from a JSON spec. Treat specs as code: only run specs you trust.
- `--file` is a whitelist: attach only files you want the tool to access.
- `--redact` is best-effort redaction before sending context to the model. Do not rely on it for strict secrecy.
- LLM features can generate code/specs. Review with `--show-r` and/or use `--out-spec` + `--exec-spec` for a controlled workflow.

Recommended safe workflow:
1) `--plan ... --out-spec spec.json`
2) inspect spec (and optionally `--show-r`)
3) run `--exec-spec spec.json`
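
Step 2 of the workflow above can be partly automated. The following sketch is a minimal pre-execution review of a spec dict: it checks the four keys used by every spec in this README and compares the target function against an allowlist in the `{"package": ["function", ...]}` shape used in the Python examples. `review_spec` is a hypothetical helper and does not replace `validate_spec` from `pyllmr` or a manual read of the spec.

```python
REQUIRED_KEYS = ("package", "function", "inputs", "kwargs")


def review_spec(spec: dict, allowlist: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    pkg, fn = spec.get("package"), spec.get("function")
    if fn not in allowlist.get(pkg, []):
        problems.append(f"{pkg}::{fn} is not in the allowlist")
    return problems


allowlist = {"stats": ["t.test", "cor.test"]}
good = {
    "package": "stats",
    "function": "t.test",
    "inputs": {"x": {"value": [1, 2, 3]}, "y": {"value": [2, 3, 4]}},
    "kwargs": {},
}
bad = {"package": "base", "function": "system", "inputs": {}}

for name, spec in (("good", good), ("bad", bad)):
    print(name, review_spec(spec, allowlist))
```

In practice the spec would be loaded with `json.load` from the file written by `--out-spec`, and `--exec-spec` would be run only when the list of problems is empty.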

---

## 12) Troubleshooting

### R not found / rpy2 errors
- Make sure R itself is installed and accessible from the same shell where you run `pyllmr` (RStudio is only an IDE and needs a separate R installation).
- If needed, set `R_HOME` to your R installation directory.
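
A quick way to see what Python can find is to look for the `R` executable on PATH and inspect `R_HOME`. This is a rough standard-library sketch that complements `pyllmr --checkenv` and does not replace it; `find_r` is a hypothetical helper name.

```python
import os
import shutil


def find_r() -> dict:
    """Report where (if anywhere) R is visible from this Python process."""
    r_home = os.environ.get("R_HOME")
    return {
        "r_on_path": shutil.which("R"),  # full path to the R executable, or None
        "r_home": r_home,                # value of R_HOME, or None if unset
        "r_home_exists": bool(r_home) and os.path.isdir(r_home),
    }


if __name__ == "__main__":
    for key, value in find_r().items():
        print(f"{key}: {value}")
```

If `r_on_path` is `None` and `r_home_exists` is `False`, R is likely not visible from the current shell.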

### Windows tips
- Prefer running in a clean venv.
- If you see encoding errors, run with `--verbose` and verify your console/codepage settings.


---

## Python API examples

```python
import json
from pyllmr import RCallSpec, validate_spec, execute_spec

def main() -> None:
    spec = RCallSpec(
        package="stats",
        function="t.test",
        inputs={"x": {"value": [1, 2, 3]}, "y": {"value": [2, 3, 4]}},
        kwargs={},
    )
    allowlist = {"stats": ["t.test"]}
    validate_spec(spec, files=set(), allowlist=allowlist)
    out = execute_spec(spec, file_paths={}, verbose=False)
    print(json.dumps(out, ensure_ascii=False, indent=2))

if __name__ == "__main__":
    main()
```

```python
import json
from pyllmr import run

def main() -> None:
    allowlist = {"stats": ["cor.test", "cor", "t.test", "lm"]}
    out = run(
        prompt="Compute Pearson correlation between x=[1,2,3,4,5] and y=[2,1,4,3,5].",
        file_paths={},
        allowlist=allowlist,
        model="gpt-4o-mini",
    )
    print(json.dumps(out, ensure_ascii=False, indent=2))

if __name__ == "__main__":
    main()
```

---

## Requirements

- Python 3.10 or later
- R installed and available on PATH (or configured with `R_HOME`)
- For LLM features (`--plan`, `--call`, `--explain`): provider env vars (e.g. `OPENAI_API_KEY`)

> This project **requires R** (optionally with RStudio). Without R, the tool is mostly useless by design.

## Warning (Safety note)
The recommended workflow is **plan → review → execute** (`--plan --out-spec ...` then `--exec-spec ...`).<br>
Direct execution via `--call` requires `--unsafe`. Prefer `--plan` + inspect the file/spec JSON + run `--exec-spec`.

## TO DO NEXT

- Extend file and variable inputs as first-class arguments for R function calls.
- Publish a JSON Schema for `RCallSpec` and validate every spec against it.
- Add a small catalog of schemas for common R functions (discoverable + retrievable).
- Support YAML specs with lossless conversion to/from JSON for human-friendly reviews.
