Metadata-Version: 2.4
Name: efrog
Version: 0.7.0
Summary: Supported source frontends to Forge-compatible EML — eFrog is Forge spelled backwards.
Author-email: "Arturo R. Almaguer" <almaguer1986@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://monogateforge.com
Project-URL: Source, https://github.com/agent-maestro/efrog
Keywords: eml,decompiler,monogate,forge,symbolic,ast
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Compilers
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pycparser>=2.21
Requires-Dist: esprima>=4.0
Requires-Dist: javalang>=0.13
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Provides-Extra: mic
Requires-Dist: numpy>=1.21; extra == "mic"
Requires-Dist: sounddevice>=0.4; extra == "mic"

# eFrog — supported source frontends for the EML substrate

> *Forge compiles EML into targets. eFrog lifts supported source inputs into EML.*

`pip install efrog` *(pre-release)*

eFrog reads supported source files and extracts their mathematical structure as
[EML](https://monogateforge.com) — the same intermediate language
[Monogate Forge](https://github.com/agent-maestro/monogate-forge)
emits *into*. eFrog is the reverse direction for covered frontends:
supported inputs become EML trees you can profile, classify against a
corpus of canonical math patterns, optimize, and compile through Forge
targets.

Current capability boundary:

- Local package: 12 source frontends — Python, C, JavaScript, Rust,
  MATLAB, Java, Go, Kotlin, GDScript, Lua, Julia, Solidity.
- Hosted `efrog.dev` / MCP: 5 source frontends — Python, C,
  JavaScript, Rust, MATLAB.
- Forge contract: all 12 local frontends pass the small-fixture
  matrix for EML emission, Forge format check, Forge Python compile,
  and Python bytecode compile.
- This is small-fixture compatibility evidence, not a claim that
  arbitrary source programs decompile perfectly.

## What's shipped (E1 + E2 + E2.5 + early E3 + E4 scaffolding + E5 + E6)

```bash
efrog gaussian.py              # Python AST → EML
efrog gaussian.c               # C (math.h) → EML, via pycparser
efrog gaussian.js              # JavaScript / TypeScript → EML, via esprima
efrog gaussian.rs              # Rust → EML, hand-rolled parser
efrog gaussian.m               # MATLAB / Octave → EML, hand-rolled parser
efrog gaussian.java            # Java → EML, via javalang
efrog gaussian.go              # Go → EML, hand-rolled parser
efrog gaussian.kt              # Kotlin → EML, hand-rolled parser
efrog gaussian.gd              # GDScript (Godot) → EML, hand-rolled parser
efrog gaussian.lua             # Lua → EML, hand-rolled parser
efrog gaussian.jl              # Julia → EML, hand-rolled parser
efrog gaussian.sol             # Solidity (pure/view fns) → EML, hand-rolled parser

efrog --profile  gaussian.py   # per-fn chain order, drift risk, node count
efrog --verify   gaussian.py   # round-trip the EML, sample N inputs, and
                               # compare numerically with the original
efrog --genome   gaussian.py   # classify each fn against a small corpus
                               # (gaussian / sigmoid / softplus / polynomial …)
efrog --lean     gaussian.py   # emit Lean 4 theorem skeletons
efrog --prove    gaussian.py   # run BFS prover; closes chain_order
                               # via concrete Nat reduction (--prove
                               # implies --lean)
efrog --optimize gaussian.py   # conservative algebraic simplifier
                               # (x+0, x*1, exp(0), x**2 → x*x, …)
efrog --mic                    # capture audio, FFT, emit single-sine EML
```

A typical extraction looks like this:

```eml
module gaussian;

fn gaussian(mu: Real, sigma: Real, x: Real) -> Real
    where chain_order <= 1
{
    let dx = x - mu;
    exp(-dx * dx / (2.0 * sigma * sigma)) / sigma
}
```

### E1 — Python pure math

- `math.exp/log/sqrt/sin/cos/tan/asin/acos/atan/sinh/cosh/tanh/pow/fabs`
  → EML builtins
- `math.pi`, `math.e`, `math.tau` → exact-repr numeric literals
- `**` (power), `+`, `-`, `*`, `/`, unary `-` → EML operators
- `def f(x: float, ...) -> float` → `fn f(x: Real, ...) -> Real`
- `let` bindings via local `name = expr` lines, then `return expr`
- Top-level `MODULE_NAME = "..."` overrides the inferred module name

### E2 — Loops, conditionals, NumPy, C

- **Fixed-iteration loop unrolling** — `for i in range(N):` with
  literal N (≤ 64) expands to a flat let-chain
- **Augmented assigns** — `x *= y`, `x += y` etc. lower to
  `let x = x * y;`
- **Conditional flattening** — `if cond: A else B` and ternary
  `A if cond else B` become `lerp(B, A, step01(cond))`. Since EML has
  no native conditional, eFrog emits a `step01` shim into the module
  preamble (`clamp(x * 1e30, 0, 1)`). Boolean composition: `and` →
  product of selectors, `or` → 1 − product of complements, `not` →
  `1 - sel`.
- **NumPy element-wise** — `np.exp/sin/...` map to the same EML
  builtins as `math.*`; aliases like `np.maximum/minimum/arcsin/...`
  resolve to `max/min/asin/...`; `np.pi/np.e/np.tau` inlined
- **C decompiler** — `double f(double x) { ... }` style; `math.h`
  calls; `M_PI`/`M_E`/`M_SQRT2`/etc. constants; compound assigns
  (`y *= x`); cast strip (`(double) n`); `f(void)` parameter lists.
  Uses pycparser; no preprocessor required (we strip `#`-lines and
  comments)

### E5 — Java

- **Java** — `public static double f(double x) { ... }` style methods
  in one or more top-level classes; `Math.exp/sin/...` calls strip
  the `Math.` namespace; `Math.PI`/`Math.E` inlined; `Math.pow(x, y)`
  lowers to the EML `^` operator; numeric literal suffixes
  (`f`/`F`/`d`/`D`/`l`/`L`) and `_` separators stripped; compound
  assigns (`acc *= x`) lower to a re-bound `let`. Instance methods
  and `void` returns are skipped/rejected with a clear message.
  Pure-Python parser via `javalang` — no JDK required.

### E2.5 — JavaScript, Rust, MATLAB

- **JavaScript / TypeScript** — `function f(x) { ... }` and arrow
  forms `const f = (x) => …`. `Math.exp/sin/...` calls strip the
  `Math.` namespace; `Math.PI`/`E`/`SQRT2`/... inlined; ternaries
  flatten branch-free. Pure-Python esprima parser, no native deps.
- **Rust** — `fn f(x: f64) -> f64 { … }` with explicit `return` or
  trailing tail expression; `let` (incl. `let mut`) bindings;
  compound assigns; method-call lowering (`x.exp()` → `exp(x)`,
  `x.powf(2.0)` → `pow(x, 2.0)`); associated-function form
  (`f64::sqrt(x)`); path constants (`std::f64::consts::PI`).
- **MATLAB / Octave** — `function y = f(x) ... end`; output-var
  binding becomes the function's tail expression; `pi`/`e` inlined;
  `.*`/`.^` treated as scalar; `%` and `#` comments; `...` line
  continuation.

### E3 partial — Numerical round-trip + Lean scaffolding + per-fn profile

- `--profile` — per-fn chain order, transcendental count, node
  count, drift-risk hint (`low/medium/high`), and flags for
  `div`/`sub` (the two ops most commonly responsible for fp64
  cancellation).
- `--verify` — re-emits the decompiled EML as runnable Python via
  a self-contained primitive shim (no Forge install needed), samples
  `--samples N` random inputs per parameter using sane per-name
  domains (sigma → positive, omega → \[0, 2π], ...), runs both the
  original and the round-trip on every sample, and reports max
  relative error. PASS if every sample agrees to within 1e-9
  relative. The NumPy-using examples work without numpy installed
  thanks to a `sys.modules['numpy']` shim.
- `--lean` — emits a Lean 4 module per source: a `def` translating
  the EML body into `Real.exp/sin/...` calls (Mathlib-style), plus
  two `theorem` skeletons per function (`<name>_chain_order`,
  `<name>_eml_consistent`). Bodies are deliberately `sorry` /
  `trivial` — these are the scaffolds the MonogateEML proof sprint
  discharges.
- `--genome` — classifies every decompiled function against a small
  curated corpus of canonical math landmarks (gaussian, sigmoid,
  softplus, ReLU, sinusoid, polynomial, …) with a structural
  similarity score (Jaccard over transcendentals + helpers + binops,
  weighted with a chain-order penalty). The full SuperBEST corpus
  ships separately.

### E6 — Six more languages (v0.5.0)

- **Go** — `func f(x, sigma float64) float64 { … }` with both shared
  (`x, y float64`) and per-param (`x int, y float64`) parameter
  shapes; `math.Exp/Sin/Pow/...` with `Math.` namespace strip;
  `math.Pi`/`math.E`/`math.Sqrt2` inlined; `:=` and `var x = …`
  bindings; numeric suffix-free literals.
- **Kotlin** — both expression-bodied `fun f(x: Double): Double = expr`
  and block-bodied `fun f(...): Double { … }` forms;
  `kotlin.math.exp/...` strip; `Math.PI` / `kotlin.math.PI` and
  bare `PI`/`E` from `import kotlin.math.*` resolved; `val`/`var`
  bindings.
- **GDScript (Godot)** — indent-based parser for `func f(x: float)
  -> float:` signatures with body parsed as a token stream; Godot
  globals (`PI`, `TAU`, `E`) inlined; `pow(x, n)`, `sqrt`, `sin`,
  `cos`, `tan`, `exp`, `log` as built-ins; `var x = expr` lowered
  to a `let`.
- **Lua** — both `function name(...) … end` and `local function
  name(...) … end`; `^` exponentiation lowered to `pow(a, b)`
  (right-associative); `math.exp/sin/...` table-dispatch + `math.pi`
  / `math.huge` constants; `local x = expr` bindings.
- **Julia** — both block form (`function f(x) … end`) and one-liner
  assignment form (`f(x) = expr`); Unicode π / ℯ supported in the
  identifier regex; `exp/sin/...` and `Base.MathConstants.pi`
  resolved; `let` re-bindings.
- **Solidity** — `pragma solidity ^0.8.x` accepted; only `pure` /
  `view` functions decompiled (state-mutating ones silently skipped);
  `require` / `assert` / `revert` and `unchecked { … }` blocks
  pass through; ternary `a > b ? a : b` lowered to `max(a, b)` and
  `a < b ? a : b` to `min(a, b)`. Useful for verifying that
  on-chain math kernels match an EML reference.

### E3-full — BFS Lean prover (v0.6.0 + Phase 2 in v0.7.0)

`--prove` runs a structural BFS over a small tactic ladder and
discharges *both* the chain-order and consistency theorems
`--lean` was previously emitting as `True := by trivial`. The
prover:

1. Mirrors the decompiled body to a concrete `EML` inductive AST in
   the Lean preamble (the `inductive EML` plus a `chainOrder : EML
   → Nat` recursor land beside Mathlib imports). v0.7 adds an
   `eval : EML → (String → ℝ) → ℝ` evaluator and an `evalCall`
   atlas covering the 18 known transcendentals.
2. Emits `def <name>_eml : EML := …` per function — a structural
   lift of the body into the `add` / `sub` / `mul` / `div` / `neg`
   / `call` constructors. Integer-power expansion (`x ** n` →
   `x * x * … * x`) for `n ∈ [1, 8]`. **v0.7 inlines let
   bindings** so the AST never has free occurrences of let-bound
   names.
3. Replaces `theorem <name>_chain_order : True := by trivial` with
   `theorem <name>_chain_order : chainOrder <name>_eml ≤ N := by
   decide` — `decide` reduces a closed `Nat` inequality.
4. **v0.7** — Replaces the consistency theorem's `True` body with
   a real equality:
   `theorem <name>_eml_consistent : ∀ args, eval <name>_eml
   <env_of args> = <name> args`. Tactic chosen by body shape:
    - **Polynomial body** (no calls) → `simp [eval, fn,
      fn_eml]; ring` (confidence: `proven`).
    - **Transcendental body** (has calls) → `simp [eval,
      evalCall, fn, fn_eml]` (confidence: `likely` — we predict
      closure but don't run Lean).
    - **Var-only body** (`f x = x`) → `simp [eval, fn, fn_eml]`
      reduces by definitional unfolding (confidence: `proven`).
    - **Unsupported shape** → `True := by trivial` fallback.
5. `--prove-report` prints a per-function pass/fail summary to
   stderr.

For the bundled gaussian / softplus / quadratic demo: 6/6 theorems
emitted with non-trivial propositions — the polynomial quadratic
closes via `ring`, gaussian + softplus go to `simp [eval,
evalCall, ...]`. Bodies the AST mirror can't represent (NaN
literals, comparison ops in conditionals, etc.) fall back to the
`True := by trivial` scaffold with `confidence = unknown`.

### E5 partial — Algebraic simplifier

- `--optimize` — conservative bottom-up rewriter with fixed-point
  iteration. Safe identities only: `x + 0 → x`, `x * 1 → x`,
  `x * 0 → 0`, `-(-x) → x`, `x + x → 2*x`, `pow(x, 2) → x*x`,
  `exp(0) → 1`, `log(1) → 0`, `sin(0) → 0`, `cos(0) → 1`,
  `sqrt(0/1) → 0/1`, plus constant folding for two-literal binops.
  Pair with `--verify` to confirm the rewrite preserved every value.

### E4 scaffolding — The math microphone

- `--mic` — captures `--mic-duration` seconds from the default input
  device, runs an FFT, picks the dominant non-DC bin, and emits a
  single-sine EML fit `mic_signal(t) = A * sin(2π f t + φ)` plus
  amplitude / phase / SNR diagnostics. Multi-tone, harmonic, and
  envelope decomposition land in full E4. `pip install efrog[mic]`
  pulls in `numpy` + `sounddevice`.

### Coming

| Phase | What | When |
|-------|------|------|
| E3-Lean | Hosted Lean runner — actually invoke `lake build` to confirm `likely` proofs close | month 4–6 |
| E4-full | Multi-tone / harmonic / envelope decomposition | month 6–8 |
| E5-full | Broader optimizer pipe (CSE, trig identities) | month 6–8 |
| E7 | Sensor expansion (camera / stethoscope / ...)       | month 8+ |

Full roadmap: `monogate-research/products/software/efrog/ROADMAP.md`.

## Status

249 tests green. Twelve local source frontends (Python, C, JavaScript,
Rust, MATLAB, Java, Go, Kotlin, GDScript, Lua, Julia, Solidity).
Loops, ternaries, branch-free conditionals, NumPy aliases,
per-function profiling, sample-based numerical round-trip, Lean 4
scaffolding plus BFS prover recipes for chain-order theorem shapes,
genome classification, algebraic simplification, and single-sine
audio fit all working. `while` loops, comprehensions, classes,
real vector operations, pointers, arrays, structs, multi-output
MATLAB functions, state-mutating Solidity, and non-pure functions
in any language still raise honest "not supported, see
ROADMAP.md" errors.

## License

Apache 2.0.
