Metadata-Version: 2.4
Name: efrog
Version: 0.4.0
Summary: Universal decompiler for the EML substrate — Forge spelled backwards. Anything → EML.
Author-email: "Arturo R. Almaguer" <almaguer1986@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://monogateforge.com
Project-URL: Source, https://github.com/agent-maestro/efrog
Keywords: eml,decompiler,monogate,forge,symbolic,ast
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Compilers
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pycparser>=2.21
Requires-Dist: esprima>=4.0
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Provides-Extra: mic
Requires-Dist: numpy>=1.21; extra == "mic"
Requires-Dist: sounddevice>=0.4; extra == "mic"

# eFrog — universal decompiler for the EML substrate

> *Forge compiles math into everything. eFrog compiles everything into math.*

`pip install efrog` *(pre-release)*

eFrog reads source files (and eventually sound, sensors, and shaders)
and extracts the underlying mathematical structure as
[EML](https://monogateforge.com) — the same intermediate language
[Monogate Forge](https://github.com/agent-maestro/monogate-forge)
emits *into*. eFrog is the reverse direction: every input becomes an
EML tree you can profile, classify against a corpus of canonical
math patterns, optimize, and re-compile to any of Forge's targets.

## What's shipped (E1 + E2 + E2.5 + early E3 + E5 partial + E4 scaffolding)

```bash
efrog gaussian.py              # Python AST → EML
efrog gaussian.c               # C (math.h) → EML, via pycparser
efrog gaussian.js              # JavaScript / TypeScript → EML, via esprima
efrog gaussian.rs              # Rust → EML, hand-rolled parser
efrog gaussian.m               # MATLAB / Octave → EML, hand-rolled parser

efrog --profile  gaussian.py   # per-fn chain order, drift risk, node count
efrog --verify   gaussian.py   # round-trip the EML, sample N inputs, check
                               # agreement with the original (1e-9 relative)
efrog --genome   gaussian.py   # classify each fn against a small corpus
                               # (gaussian / sigmoid / softplus / polynomial …)
efrog --lean     gaussian.py   # emit Lean 4 theorem skeletons
efrog --optimize gaussian.py   # conservative algebraic simplifier
                               # (x+0, x*1, exp(0), x**2 → x*x, …)
efrog --mic                    # capture audio, FFT, emit single-sine EML
```

A typical extraction looks like this:

```eml
module gaussian;

fn gaussian(mu: Real, sigma: Real, x: Real) -> Real
    where chain_order <= 1
{
    let dx = x - mu;
    exp(-dx * dx / (2.0 * sigma * sigma)) / sigma
}
```

### E1 — Python pure math

- `math.exp/log/sqrt/sin/cos/tan/asin/acos/atan/sinh/cosh/tanh/pow/fabs`
  → EML builtins
- `math.pi`, `math.e`, `math.tau` → exact-repr numeric literals
- `**` (power), `+`, `-`, `*`, `/`, unary `-` → EML operators
- `def f(x: float, ...) -> float` → `fn f(x: Real, ...) -> Real`
- `let` bindings via local `name = expr` lines, then `return expr`
- Top-level `MODULE_NAME = "..."` overrides the inferred module name

### E2 — Loops, conditionals, NumPy, C

- **Fixed-iteration loop unrolling** — `for i in range(N):` with
  literal N (≤ 64) expands to a flat let-chain
- **Augmented assigns** — `x *= y`, `x += y` etc. lower to
  `let x = x * y;`
- **Conditional flattening** — `if cond: A else B` and ternary
  `A if cond else B` become `lerp(B, A, step01(cond))`. Since EML has
  no native conditional, eFrog emits a `step01` shim into the module
  preamble (`clamp(x * 1e30, 0, 1)`). Boolean composition: `and` →
  product of selectors, `or` → 1 − product of complements, `not` →
  `1 - sel`.
- **NumPy element-wise** — `np.exp/sin/...` map to the same EML
  builtins as `math.*`; aliases like `np.maximum/minimum/arcsin/...`
  resolve to `max/min/asin/...`; `np.pi/np.e/np.tau` inlined
- **C decompiler** — `double f(double x) { ... }` style; `math.h`
  calls; `M_PI`/`M_E`/`M_SQRT2`/etc. constants; compound assigns
  (`y *= x`); cast strip (`(double) n`); `f(void)` parameter lists.
  Uses pycparser; no preprocessor required (we strip `#`-lines and
  comments)
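
The conditional flattening above can be sketched in plain Python, with `step01` and `lerp` as described and ReLU as the worked case. How comparisons themselves lower is not spelled out here, so treating `x > 0` as the signed difference `x - 0` is an assumption of this sketch:

```python
def clamp(x, lo, hi):
    return min(max(x, lo), hi)

def step01(c):
    # the preamble shim: ~0 for c <= 0, saturates to 1 for any positive c
    return clamp(c * 1e30, 0.0, 1.0)

def lerp(b, a, t):
    # selects b at t = 0 and a at t = 1
    return b + (a - b) * t

def relu(x):
    # `x if x > 0 else 0.0`, flattened branch-free; lowering the
    # comparison to the difference `x - 0.0` is assumed, not documented
    return lerp(0.0, x, step01(x - 0.0))
```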

### E2.5 — JavaScript, Rust, MATLAB

- **JavaScript / TypeScript** — `function f(x) { ... }` and arrow
  forms `const f = (x) => …`. `Math.exp/sin/...` calls strip the
  `Math.` namespace; `Math.PI`/`E`/`SQRT2`/... inlined; ternaries
  flatten branch-free. Pure-Python esprima parser, no native deps.
- **Rust** — `fn f(x: f64) -> f64 { … }` with explicit `return` or
  trailing tail expression; `let` (incl. `let mut`) bindings;
  compound assigns; method-call lowering (`x.exp()` → `exp(x)`,
  `x.powf(2.0)` → `pow(x, 2.0)`); associated-function form
  (`f64::sqrt(x)`); path constants (`std::f64::consts::PI`).
- **MATLAB / Octave** — `function y = f(x) ... end`; output-var
  binding becomes the function's tail expression; `pi`/`e` inlined;
  `.*`/`.^` treated as scalar; `%` and `#` comments; `...` line
  continuation.
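
The Rust method-call lowering can be illustrated with a toy rewriter. This is a regex sketch over flat, non-nested calls, not the hand-rolled parser, and the name map is a made-up subset:

```python
import re

# illustrative subset of the f64 method -> EML builtin map
METHOD_MAP = {"exp": "exp", "ln": "log", "sqrt": "sqrt",
              "powf": "pow", "sin": "sin", "cos": "cos"}

_CALL = re.compile(r"(\w+)\.(\w+)\(([^()]*)\)")

def lower_method_calls(expr: str) -> str:
    """Rewrite flat `recv.method(args)` into prefix form, e.g.
    x.exp() -> exp(x) and x.powf(2.0) -> pow(x, 2.0)."""
    def repl(m):
        recv, meth, args = m.groups()
        fn = METHOD_MAP.get(meth, meth)
        return f"{fn}({recv}, {args})" if args.strip() else f"{fn}({recv})"
    return _CALL.sub(repl, expr)
```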

### E3 partial — Numerical round-trip + Lean scaffolding + per-fn profile

- `--profile` — per-fn chain order, transcendental count, node
  count, drift-risk hint (`low/medium/high`), and flags for
  `div`/`sub` (the two ops most commonly responsible for fp64
  cancellation).
- `--verify` — re-emits the decompiled EML as runnable Python via
  a self-contained primitive shim (no Forge install needed), samples
  `--samples N` random inputs per parameter using sane per-name
  domains (sigma → positive, omega → \[0, 2π], ...), runs both the
  original and the round-trip on every sample, and reports max
  relative error. PASS if every sample agrees to within 1e-9
  relative. The NumPy-using examples work without numpy installed
  thanks to a `sys.modules['numpy']` shim.
- `--lean` — emits a Lean 4 module per source: a `def` translating
  the EML body into `Real.exp/sin/...` calls (Mathlib-style), plus
  two `theorem` skeletons per function (`<name>_chain_order`,
  `<name>_eml_consistent`). Bodies are deliberately `sorry` /
  `trivial` — these are the scaffolds the MonogateEML proof sprint
  discharges.
- `--genome` — classifies every decompiled function against a small
  curated corpus of canonical math landmarks (gaussian, sigmoid,
  softplus, ReLU, sinusoid, polynomial, …) with a structural
  similarity score (Jaccard over transcendentals + helpers + binops,
  weighted with a chain-order penalty). The full SuperBEST corpus
  ships separately.
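
The heart of the `--verify` loop fits in a few lines: sample inputs, run both versions, report the worst relative error. The domains and the hand-written "round-trip" function below are illustrative stand-ins for what eFrog generates, with ranges kept narrow so both sides stay well inside normal fp64 range:

```python
import math, random

def gaussian(mu, sigma, x):
    # the "original" user function
    dx = x - mu
    return math.exp(-dx * dx / (2.0 * sigma ** 2)) / sigma

def gaussian_roundtrip(mu, sigma, x):
    # stand-in for the Python re-emitted from the decompiled EML
    dx = x - mu
    return math.exp(-dx * dx / (2.0 * sigma * sigma)) / sigma

def max_rel_error(f, g, samples=256, seed=0):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        mu = rng.uniform(-3.0, 3.0)
        sigma = rng.uniform(0.5, 3.0)   # sigma -> positive, as described
        x = rng.uniform(-3.0, 3.0)
        a, b = f(mu, sigma, x), g(mu, sigma, x)
        worst = max(worst, abs(a - b) / max(abs(a), abs(b), 1e-300))
    return worst

err = max_rel_error(gaussian, gaussian_roundtrip)  # PASS when <= 1e-9
```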

### E5 partial — Algebraic simplifier

- `--optimize` — conservative bottom-up rewriter with fixed-point
  iteration. Safe identities only: `x + 0 → x`, `x * 1 → x`,
  `x * 0 → 0`, `-(-x) → x`, `x + x → 2*x`, `pow(x, 2) → x*x`,
  `exp(0) → 1`, `log(1) → 0`, `sin(0) → 0`, `cos(0) → 1`,
  `sqrt(0) → 0`, `sqrt(1) → 1`, plus constant folding for
  two-literal binops.
  Pair with `--verify` to confirm the rewrite preserved every value.
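
The shape of such a rewriter, on a toy tuple IR. The node encodings here are assumptions for illustration, not eFrog's real tree:

```python
def simplify(node):
    """One bottom-up pass of safe identities over a tuple IR:
    ("+", a, b) etc. for binops, ("exp", a) for calls, leaves otherwise."""
    if not isinstance(node, tuple):
        return node  # variable name or numeric literal
    node = (node[0],) + tuple(simplify(c) for c in node[1:])
    op, args = node[0], node[1:]
    if op == "+" and args[1] == 0.0:
        return args[0]                       # x + 0 -> x
    if op == "*" and args[1] == 1.0:
        return args[0]                       # x * 1 -> x
    if op == "*" and args[1] == 0.0:
        return 0.0                           # x * 0 -> 0
    if op == "pow" and args[1] == 2.0:
        return ("*", args[0], args[0])       # pow(x, 2) -> x * x
    if op == "exp" and args[0] == 0.0:
        return 1.0                           # exp(0) -> 1
    if op in "+-*/" and all(isinstance(a, float) for a in args):
        return {"+": args[0] + args[1], "-": args[0] - args[1],
                "*": args[0] * args[1], "/": args[0] / args[1]}[op]
    return node

def optimize(node):
    """Iterate simplify to a fixed point, as --optimize does."""
    prev = None
    while node != prev:
        prev, node = node, simplify(node)
    return node
```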

### E4 scaffolding — The math microphone

- `--mic` — captures `--mic-duration` seconds from the default input
  device, runs an FFT, picks the dominant non-DC bin, and emits a
  single-sine EML fit `mic_signal(t) = A * sin(2π f t + φ)` plus
  amplitude / phase / SNR diagnostics. Multi-tone, harmonic, and
  envelope decomposition land in full E4. `pip install efrog[mic]`
  pulls in `numpy` + `sounddevice`.
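
The fit itself is a small amount of math. A dependency-free sketch with a plain DFT standing in for the FFT (exact only for a tone sitting on a bin; `fit_single_sine` is a hypothetical helper, not the shipped `--mic` code):

```python
import cmath, math

def fit_single_sine(samples, fs):
    """Pick the dominant non-DC DFT bin and recover (A, f, phi) for
    x(t) = A * sin(2*pi*f*t + phi)."""
    n = len(samples)
    best_k, best_x = 1, 0j
    for k in range(1, n // 2):  # skip the DC bin k = 0
        xk = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                 for i in range(n))
        if abs(xk) > abs(best_x):
            best_k, best_x = k, xk
    amp = 2.0 * abs(best_x) / n
    freq = best_k * fs / n
    phase = cmath.phase(best_x) + math.pi / 2.0  # cosine -> sine convention
    return amp, freq, phase

# synthesize a clean on-bin tone and recover its parameters
fs, n = 8000.0, 256
a, f, phi = 0.5, 8 * fs / n, 0.3  # bin k = 8, i.e. 250 Hz
sig = [a * math.sin(2 * math.pi * f * i / fs + phi) for i in range(n)]
A, F, PHI = fit_single_sine(sig, fs)
```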

### Coming

| Phase | What | When |
|-------|------|------|
| E3-full | Lean theorem generation (BFS proof attacks)    | month 4–6 |
| E4-full | Multi-tone / harmonic / envelope decomposition | month 6–8 |
| E5-full | Universal optimizer pipe (CSE, trig identities) | month 6–8 |
| E6 | Sensor expansion (camera / stethoscope / ...)       | month 8+ |

Full roadmap: `monogate-research/products/software/efrog/ROADMAP.md`.

## Status

174 tests green. Five source languages (Python, C, JavaScript, Rust,
MATLAB). Loops, ternaries, branch-free conditionals, NumPy aliases,
per-function profiling, numerical round-trip verification, Lean 4
scaffolding, genome classification, algebraic simplification, and
single-sine audio fit all working. `while` loops, comprehensions,
classes, real vector operations, pointers, arrays, structs, and
multi-output MATLAB functions still raise honest "not supported,
see ROADMAP.md" errors.

## License

Apache 2.0.
