Metadata-Version: 2.4
Name: span-lea
Version: 0.1.0
Summary: Align a math paper (LaTeX) with its Lean formalization (the .lea ledger).
Author: Lior Pachter
License-Expression: Apache-2.0
Keywords: lean,latex,formalization,blueprint,leanblueprint
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Text Processing :: Markup :: LaTeX
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# span

`span` bridges a LaTeX mathematics paper and its Lean 4 formalization. It
indexes raw Lean declarations, parses the paper's standard `\label{...}` commands, and
uses a **ledger** (`build/<paper>.lea`, JSON) to align paper objects
and Lean declarations. The ledger records, for every object, which Lean declarations realize it and whether they are formalized.

Optional `span:` comments in Lean can provide hints, but they are not required:
the core model is raw Lean facts + paper objects + a curated ledger. For
compatibility, `span` can also read a
[leanblueprint](https://github.com/PatrickMassot/leanblueprint) `content.tex`.

It is both a CLI and a Python library.

## Project layout it expects

```
<project>/
  latex/   main.tex            # paper, indexed by \label{...}
  lean/    *.lean              # Lean source; span indexes declarations directly
  build/   <paper>.lea         # curated/generated ledger (+ spines)
  span_notes.json             # optional per-entry notes, keyed by align id
```

`span` auto-detects the project root (the nearest ancestor with `latex/` and
`lean/`); override with `--project/--paper/--lean-source/--blueprint/--out`.
`--paper`, `--lean-source`, and `--blueprint` are repeatable. With no explicit
Lean input, `span` indexes `lean/` directly and, when `build/<paper>.lea`
already exists, uses that ledger as the curated alignment.

## CLI

```
span build              # reconcile -> build/main.lea (+ spines) and run checks
span init-ledger        # create a starter ledger from paper ids and raw Lean declarations
span build --lean-source lean
                        # preferred: index raw Lean declarations directly
span build --lean-source lean --ledger build/main.lea
                        # validate/update using an existing curated alignment ledger
span build --full-lean-coverage
                        # include every unaligned raw Lean declaration as a lean half-span
span build --paper latex/main.tex --paper latex/appendix.tex
                        # reconcile a corpus of multiple labeled paper roots
span build --blueprint lean/blueprint/src/content.tex --blueprint other/content.tex
                        # reconcile against multiple leanblueprint roots
span build --latex      # also run a pdflatex compile check (C8)
span build --decls      # also check named lean decls exist in source (C13)
span check [--strict]   # run the verification suite (C1-C15); nonzero exit on errors
span check --decls      # add C13: cross-check \lean{...} names against the Lean source
span check --deps       # add C14: cross-check \uses vs the elaborator's statement deps (runs Lean)
span check --deps --deps-strict   # make missing dependency edges hard errors
span compare --ledger build/main.lea
                        # print the curated paper<->Lean alignment table
span compare --against build/main.lea   # diff the entry set against a saved ledger
span stats [--json]     # statistics from the paper, Lean spine, and alignment
span graph [--overlay] [--numbered] [--statements] [--source paper|lean|both]
           [--render] [--dot g.dot]
                        # Lean dependency graph (Graphviz), filled by kind.
                        # --overlay: paint the paper side on; --numbered: paper reading-order
                        # numbers; --statements: label nodes with the human statement;
                        # --source: take that statement from the paper, the Lean spine, or
                        # both stacked for comparison (default paper); --render: typeset with LaTeX
span order              # paper statement order vs a dependency-respecting order (C12 detail)
span lean | span paper  # emit either spine as JSON
span vocabulary [--json] # print the controlled vocabulary: kinds, statuses, checks
span --version
```

## Minimal Example

The directory `examples/one_theorem/` is a complete ledger-first
project aligning a human-readable proof the 1=1 with a Lean 4 formalization of this statement.:

```
examples/one_theorem/
  latex/main.tex
  lean/One.lean
  build/main.lea
```

From that directory, the normal workflow is:

```bash
span build
span check
span compare
span stats
span graph --overlay --statements --dot build/one.dot
```

To initialize the ledger from the paper and raw Lean source:

```bash
rm build/main.lea
span init-ledger
```

The paper label names the paper object:

```tex
\begin{theorem}[One equals one]\label{thm:one}
\(1=1\).
\end{theorem}
```

The ledger supplies the Lean declaration:

```json
"lean_decls": ["One.one_eq_one"]
```

For corpus builds, the output ledger is `build/corpus.lea` unless `--out` points
elsewhere. The paper side is the coproduct of the input paper summands, the Lean
side is the coproduct of the input Lean summands, and the ledger records the
alignment span between them. If a ledger entry needs to point to a Lean object
whose id is duplicated across Lean summands, qualify the ref with the source path
recorded by `span`, the basename, or the basename stem. Ambiguous unqualified
refs are reported by C15 as ambiguity across coproduct summands.

## Ledger-first Alignment

Paper labels identify paper objects:

```tex
\begin{theorem}[One equals one]\label{thm:one}
...
\end{theorem}
```

The ledger supplies the Lean declarations:

```json
{
  "id": "align:one",
  "kind": "theorem",
  "title": "One equals one",
  "paper_ids": ["thm:one"],
  "lean_decls": ["One.one_eq_one"]
}
```

`span check --lean-source lean --ledger build/main.lea` then deterministically
checks that the named paper objects and Lean declarations still exist and remain
kind-compatible. The ledger can be created by hand, by an LLM-assisted matching
workflow, or by starting from blueprint output and editing it.

## Optional Lean Source Hints

Direct Lean mode also understands inert `span:` comments immediately before
declarations. These are optional hints; they are not required when a ledger
supplies the alignment.

Single-line form:

```lean
-- span: id=thm:one kind=theorem title="One equals one" uses={def:one}
theorem one_eq_one ... := by
  ...
```

Doc-comment form:

```lean
/-- span:
id=thm:one
kind=theorem
title=One equals one
uses=def:one
proof_uses=lem:reflexivity
statement=The distinguished object equals itself.
-/
theorem one_eq_one ... := by
  ...
```

Supported keys are `id`, `kind`, `title`, `uses`, `proof_uses`, and
`statement`. The declaration itself supplies the fully-qualified Lean
declaration name. If no `span:` comment is present, the declaration is still
indexed under its fully-qualified name.

## Library

```python
import span
lean    = span.parse_lean_sources(["lean"])
paper   = span.parse_paper("latex/main.tex")
entries = span.reconcile(lean, paper)
ledger  = span.build_ledger_dict(lean, paper, entries, "main.tex", "lean")
for f in span.run_checks(lean, paper, entries):
    print(f.severity, f.code, f.message)
```

Recognized paper/Lean spine kinds are:

```
definition, lemma, theorem, proposition, corollary, claim, conjecture,
remark, example, notation, convention, assumption, constant
```

`constant` is normally a paper-side label kind.  Run `span vocabulary` for the
full compatibility table, span statuses, and check families.

## Alignment Model

The span tool is, at the top level, the one span that
bridges the paper side and the Lean side:

```
Paper  <---  span  --->  Lean
```

That global span is assembled from one small span per aligned concept. The
structure is self-similar, not a name clash: you *run* `span` (the tool) and
*read* spans (the entries) in the **ledger** (`.lea`) it writes.

- **Each ledger entry is an alignment span**: an apex (the aligned concept) with
  a *paper leg* and a *lean leg*. The canonical `span_status` is the span's
  shape: `coherent`, `paper_half`, `lean_half`, or `incoherent`.
- **The checks are grouped by alignment family**, shown in every report:
  *object* (well-defined and kind-preserving: C1-C4, C8, C10, C13),
  *arrow* (`\uses` dependencies: C9, C14), *coherence* (dependency order: C12),
  *coverage* (paper and Lean half-spans: C6, C7), and *fiber* (shared
  formalizations and fan-out: C5).
- **Fan-out is a fiber** (`largest_fiber` in `span stats`), and
  `build`/`compare`/`stats` print an **alignment verdict** naming each
  obstruction.

## Checks

C1 id uniqueness · C2 dangling refs · C3 kind compatibility · C4 stated results
in labeled environments · C5 duplicate/shared formalizations · C6 lean half-span
coverage · C7 partial formalization · C8 LaTeX compile check · C9 Lean
dependency integrity · C10 compatible-kind notes · C12 statement order is a
linear extension of the dependency DAG · C13 (opt-in, `--decls`) named
`\lean{...}` decls exist in the Lean source with a matching kind · C14 (opt-in,
`--deps`) declared `\uses` matches the elaborator's statement dependencies · C15
corpus id ambiguity/source qualification. See `CHECKS.md`.

C13 is the strongest *machine-checkable* paper<->Lean consistency check that
needs no proof-assistant run and no LLM: a heuristic source scan confirms every
named declaration exists and that a definition node is backed by a `def` (a
theorem node by a `theorem`/`lemma`).

C14 goes one level deeper by actually running Lean (`lake env lean`): it
traverses the constants each declaration's *statement* (type) references --
through project-internal helpers, stopping at spine-owned decls -- to build
the true statement-level dependency graph, and reports edges the spine omits
(`warn`, or `error` with `--deps-strict`) or declares without type-level support
(`info`). It needs the Lean toolchain and a built project. Scope: *statement*
dependencies only -- Lean drops theorem proof terms from oleans, so proof-`\uses`
cannot be validated this way.

Neither check can verify that the two statements *mean* the same thing -- that
semantic faithfulness is the specification gap, which only a human or
(heuristically) an LLM can judge.
