Metadata-Version: 2.4
Name: semiformal-py
Version: 0.2.0
Summary: Runtime semiformal system: @semiformal and semi() for semantically underspecified logic
Author-email: Ryan Yen <ryanyen2@mit.edu>
License: MIT
Keywords: ai,code-generation,llm,semiformal,pydantic-ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pydantic-ai>=0.0.54
Requires-Dist: rich>=13.0
Provides-Extra: jupyter
Requires-Dist: ipywidgets>=8.1; extra == "jupyter"
Provides-Extra: pdf
Requires-Dist: liteparse>=1.0; extra == "pdf"
Requires-Dist: llama-cloud>=1.6.0; extra == "pdf"
Provides-Extra: e2b
Requires-Dist: e2b-code-interpreter>=2.4.1; extra == "e2b"
Provides-Extra: cocoindex
Requires-Dist: cocoindex>=0.3.19; extra == "cocoindex"
Provides-Extra: search
Requires-Dist: firecrawl-py>=1.0; extra == "search"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# semipy

Write Python logic in natural language. `semipy` generates, validates, and caches real Python functions from your specs — no boilerplate, no hardcoded rules.

```python
from semipy import semiformal, semi, configure

configure(openai_api_key="sk-...")  # or set OPENAI_API_KEY in environment

@semiformal
def parse_log_line(line: str) -> dict:
    #> extract timestamp, level, and message from an Apache log line
    result = ...
    return result

# First call: agent generates a Python function and caches it.
# All later calls: load the cached implementation instantly.
row = parse_log_line('127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /index.html" 200')
```

## Install

```bash
pip install semipy
```

For Jupyter notebook display:

```bash
pip install "semipy[jupyter]"
```

For PDF input materialization:

```bash
pip install "semipy[pdf]"
```

## API key

semipy uses the OpenAI API for code generation. Set your key in one of three ways:

**Environment variable** (recommended for scripts):

```bash
export OPENAI_API_KEY=sk-...
```

**`.env` file** at the project root (loaded automatically):

```
OPENAI_API_KEY=sk-...
```

**In code** (overrides env):

```python
from semipy import configure
configure(openai_api_key="sk-...")
```

## Configuration

```python
from semipy import configure

configure(
    openai_api_key="sk-...",       # defaults to OPENAI_API_KEY env var
    openai_model="gpt-4o",         # generation model (default: gpt-5.4)
    verbose=True,                  # rich terminal output during generation (default: True)
    cache_dir=".semiformal",       # where portal JSON and dispatch modules are stored
    max_retries=3,                 # agent retry limit on validation failure
    session_source=None,           # pin portal identity (useful for Jupyter; see below)
)
```

All fields are optional — call `configure()` with only what you want to override.

## Terminal output

When `verbose=True` (the default), semipy prints a live Rich panel showing the agent's reasoning, tool calls, and generated code as they stream in. This works in both terminal and Jupyter.

```
 Implementing code...
 ─────────────────────────────────────────────────
  Reasoning  The function needs to parse a standard Apache
             Combined Log Format line...
 ─────────────────────────────────────────────────
  Tool  build_and_run_gist  passed
 ─────────────────────────────────────────────────
  Reusing cached implementation; runtime verify passed.
  parse_log_line  GENERATE  a1b2c3d4  examples/logs.py:12
```

To silence all output: `configure(verbose=False)`.

To see full prompt, decision, and tool-call dumps, set the environment variable:

```bash
export SEMIPY_PIPELINE_TRACE=1
```

## Usage patterns

### `@semiformal` with `#>` spec blocks

```python
from semipy import semiformal

@semiformal
def extract_fields(record: str) -> dict:
    #> extract date, sender, and subject from an email header
    result = ...
    return result
```

The `#>` block is the spec. The `result = ...` is the slot anchor — the agent fills it in.

### Inline `semi()` for expressions

```python
from semipy import semiformal, semi

@semiformal
def classify_rows(rows):
    return [r for r in rows if semi(f"is {repr(r['status'])} a client error?")]
```

### Standalone `semi()` in any function

```python
from semipy import semi

def process(text):
    label = semi(f"classify '{text}' as positive, negative, or neutral")
    return label
```

### Multiple slots in one function

```python
from semipy import semiformal

@semiformal
def analyze(entry: str) -> dict:
    #> extract the IP address from the log entry
    ip = ...

    #> determine if the HTTP status code in the entry indicates an error
    is_error = ...

    return {"ip": ip, "error": is_error}
```

## Reasoning surface (`#<` lines)

After each generation, semipy writes a small set of `#<` comment lines around each slot anchor. These are not part of your spec — they are system-managed traces that describe what the generated implementation decided and why, so you can read and steer it without digging into the cache.

```python
@semiformal
def infer_datetime_formatter(date_str: str) -> str:
    #< intent: infer strptime pattern from observed date text
    #< by: probing a regex-gated candidate table; because the table covers all observed formats
    #< unless: empty or unmatched input raises ValueError
    #> verified: 'Mar 2025' -> {'input_pattern':'%b %Y'}, return error if no match
    input_pattern = ... #> infer the input date regex/strptime pattern from the observed string format in this session.
    output_pattern = "%b %Y"
    return datetime.strptime(str(date_str), input_pattern).strftime(output_pattern)
```

### Placement

`#<` lines appear in two zones around the slot anchor:

- **Zone P (provenance, above the anchor):** `intent`, `given`, `by`, `unless` — why this implementation exists, what it assumes, and how it handles failure.
- **Zone E (effect, below the anchor):** `yields`, `verified` — what the generated code produces and what was observed at runtime.

### Keywords

| Key | Zone | Meaning |
|---|---|---|
| `intent` | above | One-phrase task summary (emitted only when the spec is long or ambiguous). |
| `given` | above | Input-shape assumptions beyond the signature (multi-param slots only). |
| `by` | above | Strategy/mechanism this implementation uses. Embed the reason inline when the choice is non-obvious: `<strategy>; because <reason>`. Always present. |
| `unless` | above | Fallback or exceptional path (emitted only when the generated code has a raise/except). May repeat for distinct failure modes. |
| `yields` | below | Output shape beyond the return annotation (skipped for simple builtins like `str`, `int`). |
| `verified` | below | Sample input → observed output, derived from the validation run (never LLM-generated). |

### Steering

To change what the next generation produces, edit the `#< by:` line (strategy) or the `#< unless:` line (exceptional path). On the next run where the implementation needs to change, semipy reads the override and adapts accordingly.

### Promoting a constraint

To lock an inference note into the contract permanently, flip `#<` to `#>` on the same line. This extends the spec text, causing a new ADAPT on the next run, and suppresses the duplicate `#<` line from being re-emitted.

```python
# Before: system-managed trace
#< by: probing a regex-gated candidate table; because the table covers all observed formats

# After: promoted to user contract — fixes the strategy for future runs
#> by: regex-gated candidate table
```

`#<` lines are stable across runs: semipy only rewrites them when the generated implementation changes. A no-op re-run produces a byte-identical file.

## Caching and reuse

Generated functions are stored in `.semiformal/` relative to your working directory:

- `.semiformal/<session>.portal.json` — versioned DAG of all commits, branches, and decisions.
- `.semiformal/runtime/<module>.semi.py` — compiled Python implementations for import.

On subsequent runs, semipy loads the cached implementation without calling the LLM. It re-verifies when it detects new input shapes, and runs ADAPT (re-generation from the prior commit) when verification fails.

To force regeneration, delete `.semiformal/` or the relevant portal file.

## Jupyter

In Jupyter notebooks, semipy detects the `ipykernel` environment automatically. The portal is keyed to `os.getcwd()` so one portal persists across kernel restarts.

Install the optional display extras for inline Rich output:

```bash
pip install "semipy[jupyter]"
```

If multiple notebooks share a working directory and need separate caches:

```python
configure(session_source="/path/to/my_notebook.ipynb")
```

## VS Code extension

The [Semipy VS Code extension](https://marketplace.visualstudio.com/items?itemName=semipy.semipy-vscode) adds:

- Syntax highlighting for `#>` spec lines (teal) and `#<` reasoning lines (green)
- Slot history tree in the Explorer panel
- Split-view to inspect generated `.semi.py` alongside your source
- Inlay hints and CodeLens showing commit id and decision

## Examples

See the `examples/` directory:

- `examples/apache_log_semiformal_stages.py` — staged walkthrough from simple extraction to INSTANTIATE and promotion workflow
- `examples/datetime_test.py` — datetime format inference with observed samples and steered `#<` surface
- `examples/fasta_header_metadata.py` — FASTA header parsing with multiple slots, nested anchors, and standalone `semi()` calls
- `examples/use_contract_intelligence.py` — contract field extraction from PDF
- `examples/use_sponsorship_canonicalizer.py` — entity canonicalization pipeline

## License

MIT
