Metadata-Version: 2.4
Name: g2n
Version: 1.1.0
Summary: Optimize and run PyTorch models: an open-core compiler (fusion, buffer planning, persistent compile cache), a quantum circuit simulator (g2n.quantum), plus a license-gated serving platform that runs your models behind an inference server.
Author: g2n
Maintainer: g2n
License: Apache-2.0
Project-URL: Homepage, https://g2n.dev
Project-URL: Documentation, https://g2n.dev/docs
Project-URL: Source, https://github.com/your-org/g2n
Project-URL: Issues, https://github.com/your-org/g2n/issues
Keywords: pytorch,compiler,triton,gpu,inference,serving,model-server,torch.compile,quantum,quantum-simulator,statevector
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Compilers
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: cryptography>=41.0
Provides-Extra: torch
Requires-Dist: torch>=2.4; extra == "torch"
Provides-Extra: triton
Requires-Dist: triton>=2.2; extra == "triton"
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5; extra == "dev"
Requires-Dist: pytest>=7; extra == "dev"

# g2n

**Optimize and run PyTorch models.** `g2n` is the open-core compiler at the
center of the g2n platform: pointwise fusion, buffer-reuse planning, and a
persistent cross-run compile cache so repeat builds skip recompilation. A
license unlocks the enhanced planner, the persistent cache, and a full
**serving** layer that *runs* your models behind an HTTP inference server.

```bash
pip install g2n
```

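```python
import torch
import torch.nn as nn
import g2n

# Any nn.Module works; this small MLP is a stand-in for your own model.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4)).eval()

compiled = g2n.compile(model)                 # optimize directly
# or use g2n as a torch.compile backend:
compiled = torch.compile(model, backend="g2n")
```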
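
As a rough picture of what a persistent cross-run compile cache buys you, here is a minimal sketch of the general technique. It is not g2n's implementation: `cached_compile`, `fake_compile`, `COMPILER_VERSION`, and the on-disk layout are all hypothetical names chosen for illustration. The idea is to key a stored artifact by a hash of the graph plus the compiler version, and skip the expensive step on a hit.

```python
import hashlib
import tempfile
from pathlib import Path

COMPILER_VERSION = "0.0-sketch"   # hypothetical; in the key so upgrades invalidate old entries
calls = {"compile": 0}            # counts how often the expensive step actually runs


def fake_compile(graph_src: str) -> str:
    """Stand-in for an expensive compile step."""
    calls["compile"] += 1
    return graph_src.upper()


def cached_compile(graph_src: str, cache_dir: Path) -> str:
    """Return the compiled artifact, reusing an on-disk copy when one exists."""
    key = hashlib.sha256(f"{COMPILER_VERSION}\n{graph_src}".encode()).hexdigest()
    entry = cache_dir / f"{key}.txt"
    if entry.exists():                      # cache hit: skip recompilation
        return entry.read_text()
    artifact = fake_compile(graph_src)      # cache miss: compile and persist
    entry.write_text(artifact)
    return artifact


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_compile("add(mul(x, y), z)", cache)
    second = cached_compile("add(mul(x, y), z)", cache)   # served from disk
```

Because the key includes the compiler version, a new release naturally misses the old entries instead of loading stale artifacts.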

## Two halves, one license

| | Community (free) | Pro | Enterprise |
|---|:--:|:--:|:--:|
| **Optimize** — fusion, JIT pointwise codegen, CPU fallback | ✓ | ✓ | ✓ |
| Enhanced buffer planner + persistent compile cache | | ✓ | ✓ |
| **Run** — model registry + inference server (`g2n.serve()`) | | ✓ | ✓ |
| **Simulate** — quantum circuits up to 24 qubits (`g2n.quantum`) | ✓ | ✓ | ✓ |
| Quantum: unlimited qubits + circuit-fusion optimizer | | ✓ | ✓ |
| Quantum: batched parameter sweeps (VQE/QML loops) | | | ✓ |
| Dynamic batching, multi-accelerator routing, model-zoo | | | ✓ |

Activate a license to enable the paid tiers. The code path is the same either
way: with a valid license the gated features turn on, and without one `g2n`
falls back to the open-core path.

```bash
g2n activate G2N-XXXX-XXXX-XXXX
```

## Run your models (Pro+)

The enterprise client (`pip install g2n-enterprise`) adds the serving platform:

```python
import torch
import g2n_enterprise as g2n

g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
                   precision="auto", cuda_graph=True, max_batch=16)
g2n.serve(port=8900)        # POST /v1/models/resnet/predict

sample = torch.randn(1, 3, 224, 224)   # example input; use a tensor shaped like your real requests
res = g2n.benchmark("resnet", sample, rounds=200)   # eager vs optimized, measured on your box
```

Serving applies real inference techniques: `inference_mode`, fp16/bf16/int8
precision, CUDA-graph capture/replay (which removes the launch overhead that
often leaves compiled code merely tied with eager on small GPUs), and a VRAM
residency manager so a small card can serve more models than fit in memory at
once. Speedups are hardware-dependent: benchmark on your own GPU rather than
trusting a quoted number.
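
To make the residency idea concrete, here is a minimal sketch under one stated assumption: that eviction is least-recently-used within a fixed byte budget. The README does not say which policy g2n uses, and `ResidencyManager`, `acquire`, and the model sizes below are hypothetical.

```python
from collections import OrderedDict


class ResidencyManager:
    """Keeps models resident within a byte budget, evicting the least recently used."""

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.resident = OrderedDict()   # name -> size in bytes, oldest first
        self.evicted = []               # eviction log, for illustration only

    def acquire(self, name: str, size: int) -> None:
        if size > self.budget:
            raise ValueError(f"{name} ({size} B) exceeds the budget")
        if name in self.resident:
            self.resident.move_to_end(name)      # mark as most recently used
            return
        # Evict least recently used models until the new one fits.
        while sum(self.resident.values()) + size > self.budget:
            victim, _ = self.resident.popitem(last=False)
            self.evicted.append(victim)          # a real manager would offload the weights here
        self.resident[name] = size


GiB = 1024 ** 3
mgr = ResidencyManager(budget_bytes=4 * GiB)     # hypothetical 4 GiB card
mgr.acquire("resnet", 1 * GiB)
mgr.acquire("bert", 2 * GiB)
mgr.acquire("resnet", 1 * GiB)                   # touch: resnet becomes most recent
mgr.acquire("whisper", 2 * GiB)                  # does not fit: evicts bert, the LRU model
```

Three models totalling 5 GiB are served from a 4 GiB budget; the cost is a reload whenever an evicted model is requested again.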

## Quantum circuit simulator (new in 1.0)

`g2n.quantum` is a statevector **simulator** — classical hardware, torch
tensor contractions, CPU or CUDA. It is not a quantum computer and doesn't
pretend to be; it's for developing, testing, and teaching quantum algorithms.

```python
import g2n.quantum as qf

c = qf.Circuit(2).h(0).cnot(0, 1)   # Bell state
c.measure_all(shots=1000)           # {'00': ~500, '11': ~500}
c.expectation("ZZ")                 # tensor(1.)
```
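
For readers new to statevector simulation, the same Bell-state computation can be sketched in plain Python. This is a teaching sketch, not how `g2n.quantum` is implemented (that uses torch tensor contractions); `apply_1q` and `apply_cnot` are hypothetical helpers, and the bit ordering (qubit 0 is the leftmost bit) is an assumption made for this example.

```python
import math


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit statevector (qubit 0 = leftmost bit)."""
    out = list(state)
    bit = 1 << (n - 1 - q)
    for i in range(len(state)):
        if (i & bit) == 0:                    # i has qubit q = 0; j is its partner with q = 1
            j = i | bit
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out


def apply_cnot(state, control, target, n):
    """Swap amplitude pairs that differ in the target bit wherever the control bit is 1."""
    out = list(state)
    cbit, tbit = 1 << (n - 1 - control), 1 << (n - 1 - target)
    for i in range(len(state)):
        if (i & cbit) and not (i & tbit):
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


n = 2
h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]

state = [1 + 0j, 0j, 0j, 0j]                  # start in |00>
state = apply_1q(state, H, 0, n)              # Hadamard on qubit 0
state = apply_cnot(state, 0, 1, n)            # entangle qubit 1 with qubit 0

probs = {format(i, f"0{n}b"): abs(a) ** 2 for i, a in enumerate(state)}
# <ZZ> = sum over bitstrings of p(bits) * (+1 for even parity, -1 for odd parity)
zz = sum(p * (-1) ** bits.count("1") for bits, p in probs.items())
```

The probabilities concentrate on `00` and `11`, and the parity-weighted sum gives the `ZZ` expectation, matching the comments in the `g2n.quantum` example above.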

Free up to 24 qubits. Pro removes the qubit cap (memory-bound) and adds
`optimize()`, a circuit-fusion pass (measured 1.2–1.8× on layered circuits —
run `benchmarks/quantum_bench.py` on your own machine). Enterprise adds
vectorized batched parameter sweeps for variational loops (measured 15.7× vs
a Python loop on a 64-point sweep). Full API: https://g2n.dev/docs
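
"Memory-bound" follows from the size of a dense statevector, which doubles with every added qubit. The arithmetic below assumes 8 bytes per amplitude (complex64); the dtype g2n actually uses is not stated here, so treat the figures as an estimate, and `statevector_bytes` as a hypothetical helper.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory for a dense statevector: 2**n amplitudes (8 B each for complex64)."""
    return (2 ** n_qubits) * bytes_per_amplitude


MiB, GiB = 1024 ** 2, 1024 ** 3
sizes = {n: statevector_bytes(n) for n in (24, 30, 34)}
for n, size in sizes.items():
    print(f"{n} qubits: {size / GiB:.3f} GiB")
```

Under that assumption, 24 qubits need 128 MiB, 30 qubits need 8 GiB, and 34 qubits need 128 GiB, before counting any scratch buffers the simulator allocates.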

## Custom kernels (Pro / Enterprise)

With a licensed tier, the `g2n` backend runs a real custom compile pass: it fuses
`LayerNorm` (and a trailing `GELU`) into a Triton kernel via a `torch.library`
custom op, then hands the rest of the graph to TorchInductor. See
`ARCHITECTURE.md`. Correctness is covered by `tests/test_layernorm.py`.

**The fusion is inference-only.** The fused kernel is forward-only, so the pass
skips any differentiable (training) graph and lets stock lowering handle it —
training compiles correctly, just unfused. Inference under
`torch.no_grad()` / `torch.inference_mode()` (which the serving runtime always
uses) gets the fused kernel. Benchmark on your own GPU before quoting a speedup.

Docs: https://g2n.dev/docs · Pricing: https://g2n.dev/pricing
