Metadata-Version: 2.4
Name: qufold
Version: 0.2.1
Summary: Fermionic Hamiltonian downfolding (CFU/UCC/PUCC/CCSD) and projective quantum eigensolvers (PQE), input-file driven.
Author: Mohammad Reza Jangrouei, Artur F. Izmaylov
License: MIT
Project-URL: Homepage, https://github.com/mjangrou/qufold
Project-URL: Documentation, https://github.com/mjangrou/qufold/tree/main/docs
Keywords: quantum chemistry,downfolding,coupled cluster,quantum computing,effective Hamiltonian,CFU,UCC,PQE,openfermion
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.26
Requires-Dist: scipy>=1.8
Requires-Dist: openfermion>=1.7.1
Requires-Dist: pyscf>=2.3
Provides-Extra: cpu
Requires-Dist: pybind11>=2.10; extra == "cpu"
Provides-Extra: gpu
Requires-Dist: cupy-cuda12x>=13.0; extra == "gpu"
Provides-Extra: pqe
Requires-Dist: qforte; extra == "pqe"
Provides-Extra: dmrg
Requires-Dist: block2>=0.5; extra == "dmrg"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: mkdocs>=1.5; extra == "dev"
Requires-Dist: mkdocs-material; extra == "dev"
Requires-Dist: pybind11>=2.10; extra == "dev"
Dynamic: license-file

# qufold

**Fermionic Hamiltonian downfolding + projective quantum eigensolvers, driven by
a single Q-Chem-style input file.**

`qufold` builds compact, accurate *effective* active-space Hamiltonians from a
full electronic-structure problem, so that the expensive part of a quantum
simulation (qubitized QPE, FCI, a projective eigensolver) runs on a small,
strongly-correlated active space instead of the full orbital set. Its flagship
method is **CFU** — *Closed-Form Unitary* downfolding — which applies a sequence
of **exact** single-generator unitary similarity transforms to the Hamiltonian
before projecting the external orbitals onto Hartree–Fock. "Closed-form" means
each rotation `e^{-θA} H e^{θA}` is evaluated exactly (no Baker–Campbell–Hausdorff
truncation), so every step is rigorously unitary and norm-preserving.
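
To make "closed-form" concrete, here is a minimal sketch (not qufold's implementation) for a toy generator. It assumes the generator satisfies `A³ = −A`, in which case the exponential series collapses to `e^{θA} = I + sin θ · A + (1 − cos θ) · A²`. The 3×3 matrices and all helper names below are illustrative only.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose a matrix stored as nested lists."""
    return [list(row) for row in zip(*X)]

def closed_form_exp(A, theta):
    """exp(theta*A) = I + sin(theta)*A + (1 - cos(theta))*A^2, valid if A^3 = -A."""
    n = len(A)
    A2 = matmul(A, A)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * A[i][j] + c * A2[i][j]
             for j in range(n)] for i in range(n)]

def frobenius_norm(X):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in X for x in row))

# Toy real antisymmetric generator with eigenvalues {0, +i, -i}, so A^3 = -A.
A = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 0.0]]

# Toy symmetric "Hamiltonian" (made-up numbers, for illustration only).
H = [[1.0,  0.5, 0.2],
     [0.5, -1.0, 0.3],
     [0.2,  0.3, 2.0]]

theta = 0.37
U = closed_form_exp(A, theta)                 # e^{theta A}, no series truncation
# For real antisymmetric A, e^{-theta A} is the transpose of e^{theta A}.
H_rot = matmul(matmul(transpose(U), H), U)    # e^{-theta A} H e^{theta A}

print("norm before:", frobenius_norm(H))
print("norm after: ", frobenius_norm(H_rot))
```

Because `U` is built from three exact terms rather than a truncated commutator expansion, it is orthogonal up to rounding, and the norm of the rotated matrix matches the original.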

You write one text file and run one command:

```
qufold examples/cfu/h4_cfu.in
```

The architecture is deliberately layered so the physics is auditable and the
performance is swappable: every method is written **once** against a small
operator-algebra interface (`commutator`, `multiply`, `similarity_transform`,
`rotate_single`, `hf_expectation`, `gradient`, `optimal_theta`) and then runs
unchanged on three interchangeable **backends** — a transparent reference
implementation, a fast compiled C++ kernel, and a GPU Majorana kernel.

---

## Status

`qufold` is research software under active development (alpha). All five methods
run end-to-end on the `openfermion` reference backend and the fast `cpu` (C++
kernel) backend, with identical results. The `gpu` (CuPy Majorana) backend is
wired against the same contract and its kernel has been validated on an NVIDIA
H100; when no GPU or CuPy is present it falls back to the reference backend with
a warning. The table below is an **honest** map of what runs today.

### Methods × backends

| Method                                    | openfermion (reference) | cpu (C++ kernel) | gpu (CuPy Majorana) |
|-------------------------------------------|:-----------------------:|:----------------:|:-------------------:|
| **cfu**  — Closed-Form Unitary            | ✅ implemented          | ✅ implemented    | ✅ kernel-validated¹ |
| **ucc** — Double Unitary CC              | ✅ implemented          | ✅ implemented    | ✅ kernel-validated¹ |
| **pucc** — Projective UCC                 | ✅ implemented          | ✅ implemented    | ✅ kernel-validated¹ |
| **ccsd_downfold** — non-unitary CCSD      | ✅ implemented          | ✅ implemented    | ✅ kernel-validated¹ |
| **pqe**  — Projective Quantum Eigensolver | ✅ implemented          | ✅ implemented    | ✅ kernel-validated¹ |

Legend: **✅ implemented** = runs and is validated. The `cpu` backend reproduces
the `openfermion` reference bit-for-bit on the operator algebra and gives
identical energies on every method. **✅ kernel-validated¹** = the `gpu` backend's
CuPy Majorana algebra (product, commutator, and transform, i.e. the GPU-specific
code) was checked on an **NVIDIA H100** and matches the NumPy reference to machine
precision (`max|Δ| ≈ 1e-16`). The method orchestration above it is
backend-agnostic shared code, already validated on `openfermion`/`cpu`, so the GPU
path is expected to give the same results.
No method is a stub: every method is implemented, and only *unsupported options*
(e.g. PUCC triples, `pucc_order≠2`) raise a clear error. Requesting `gpu` with no
GPU/CuPy present falls back to `openfermion` with a warning rather than crashing.
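
The fallback behaviour described above can be sketched as follows. This is a hypothetical illustration, not the contents of `qufold/backends/registry.py`: the names `select_backend`, `cupy_installed`, and `gpu_ready` are assumptions, and the probe only checks that the `cupy` package is importable, not that a CUDA device is actually present.

```python
import importlib.util
import warnings

KNOWN_BACKENDS = ("openfermion", "cpu", "gpu")

def cupy_installed():
    """True if the 'cupy' package can be found (does not probe for a device)."""
    return importlib.util.find_spec("cupy") is not None

def select_backend(requested, gpu_ready=cupy_installed):
    """Return the backend name to use, falling back to the reference backend.

    `gpu_ready` is a zero-argument callable so the probe can be swapped out,
    for example in tests or for a stricter device check.
    """
    if requested not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {requested!r}; expected one of {KNOWN_BACKENDS}"
        )
    if requested == "gpu" and not gpu_ready():
        warnings.warn(
            "gpu backend unavailable; falling back to 'openfermion'",
            RuntimeWarning,
        )
        return "openfermion"
    return requested

print(select_backend("cpu"))
```

Keeping the probe injectable means the warning path can be exercised on a machine without CuPy or a GPU.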

> ¹ **GPU env note (resolved in 0.2.1):** a full GPU run needs `openfermion` (for the
> problem build / reference) and `cupy` in the same process. This used to be
> impossible because OpenFermion pulled an old `cirq` (≤1.4) that pinned `numpy<2`,
> while CuPy needs `numpy≥2`. As of `cirq≥1.5` / `openfermion≥1.7.1` that pin is gone
> — both happily share a single `numpy≥2` environment — so `pip install qufold[gpu]`
> now resolves cleanly and a full method runs end-to-end on the GPU. (If you have an
> older `cirq==1.4.*` lying around, `pip install -U "cirq-core>=1.5"` clears it.) The
> GPU *kernel* itself was validated independently on an NVIDIA H100 either way.

Why keep the reference backend front-and-centre? It is the auditable truth: each
primitive maps one-to-one to the math via OpenFermion/NumPy. The `cpu` (pybind11
fermionic kernel, ~7–10× faster on normal-ordered commutators/products) and `gpu`
(CuPy bit-packed Majorana kernel — operators as 128-bit bitmasks, products are
XOR + popcount) backends are validated *against* it.
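
The "XOR + popcount" idea can be illustrated in plain Python. This is a sketch of the general technique, not the CuPy kernel: it assumes the convention `γ_i² = 1` and `γ_i γ_j = −γ_j γ_i` for `i ≠ j`, stores each Majorana string as an integer bitmask with factors in ascending index order, and uses unbounded Python integers where a GPU kernel would use fixed-width words. Function names are illustrative.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def majorana_product(a, b):
    """Multiply two Majorana strings given as bitmasks.

    Bit i set means gamma_i is present; factors are ordered by ascending index.
    Returns (sign, mask) such that gamma_a * gamma_b = sign * gamma_mask.
    """
    swaps = 0
    rest = b
    while rest:
        low = rest & -rest                 # lowest set bit of b not yet processed
        j = low.bit_length() - 1           # index of that factor
        swaps += popcount(a >> (j + 1))    # factors of a that gamma_j must hop over
        rest ^= low
    sign = -1 if swaps & 1 else 1
    return sign, a ^ b                     # shared indices cancel (gamma_i^2 = 1)

def majorana_commute(a, b):
    """True if the two strings commute, False if they anticommute."""
    return (popcount(a) * popcount(b) - popcount(a & b)) % 2 == 0

# (gamma_0 gamma_1) * gamma_0 and gamma_0 * (gamma_0 gamma_1)
print(majorana_product(0b11, 0b01))
print(majorana_product(0b01, 0b11))
print(majorana_commute(0b11, 0b01))
```

The resulting mask is always a single XOR; only the sign needs the bit counting, which is why the representation maps well onto bitwise hardware instructions.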

### Validation snapshot

- **Exact downfold:** the `K=0` (no-transform) downfold followed by FCI reproduces
  the full FCI / PySCF CASCI energy to `~1e-12` Ha. The active-space projection
  itself is exact, so any recovery comes from the transforms rather than from
  projection error.
- **CFU on H4** recovers **~87 %** of the CASSCF→FCI correlation gap.
- **All five methods run end-to-end** on H4 CAS(2,2) on both the `openfermion`
  and `cpu` backends, giving **identical energies** (e.g. CFU −2.14068 Ha,
  PUCC −2.16049 Ha, CCSD-downfold −2.16536 Ha on both backends).
- **`cpu` backend** reproduces the reference operator algebra **bit-for-bit**
  (commutator max\|Δ\| = 0).
- **`gpu` backend** (CuPy Majorana kernel) validated on an **NVIDIA H100**:
  product and commutator match the numpy reference to `max|Δ| ≈ 1e-16`.
- **Test suite:** 25/25 `pytest` passing (parser, downfold-exactness, CFU).

### Solvers, truncation & extrapolation (new in 0.2.0)

The downfolded (and full, un-downfolded) Hamiltonian can be solved beyond exact
FCI, and the CFU transform can be truncated by a controllable error budget:

- **DMRG solver** (`downfold_solver = dmrg`, requires `qufold[dmrg]` → block2):
  solves active spaces too large for sparse FCI, with arbitrary operator body
  rank, over a general spin-orbital MPO.
- **Discarded-weight extrapolation** (`dmrg_extrapolate`, several
  `dmrg_bond_dims`): linear `E(δϵ) → E(0)` fit with a reported 95 % CI, the
  standard route to the zero-discarded-weight (infinite-bond-dimension) DMRG limit.
- **Full-space DMRG reference** (`solve_full_dmrg`): an independent DMRG solve of
  the *un-downfolded* active Hamiltonian, for benchmarking the downfolding gap.
- **Budget truncation** (`trunc_criterion = budget`, `trunc_budget`): drops the
  smallest terms up to a cumulative-ℓ1 cap per transform, so the discarded
  operator obeys ‖ΔH‖ ≤ budget (triangle bound) — a rigorous, far more effective
  knob than the flat per-term `trunc_eps` threshold.
- **Fastest path:** the `cpu` backend's pybind11 C++ kernel (`fast_comm`/`fast_mul`,
  ~7–10× over pure NumPy) is the default fast route for production-scale transforms.
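
The budget criterion listed above can be sketched in a few lines. This is an illustrative reading of the rule, not qufold's code: `truncate_by_budget` is a hypothetical name, the coefficients are made up, and the bound `‖ΔH‖ ≤ budget` additionally assumes every individual term has operator norm at most 1.

```python
def truncate_by_budget(op, budget):
    """Drop the smallest-magnitude terms while their summed |coeff| stays <= budget.

    `op` maps a term key to a (possibly complex) coefficient.
    Returns (kept_terms, discarded_l1).
    """
    kept = dict(op)
    discarded = 0.0
    for term in sorted(op, key=lambda t: abs(op[t])):   # smallest first
        weight = abs(op[term])
        if discarded + weight > budget:
            break            # every remaining term is at least this large
        discarded += weight
        del kept[term]
    return kept, discarded

# Illustrative operator in the OpenFermion term convention (made-up coefficients).
H = {
    ((0, 1), (0, 0)): -1.25,
    ((1, 1), (1, 0)): -0.47,
    ((0, 1), (1, 1), (1, 0), (0, 0)): 0.03,
    ((2, 1), (0, 0)): 0.004,
    ((3, 1), (1, 0)): 0.001,
}

kept, dropped_l1 = truncate_by_budget(H, budget=0.01)
print(len(kept), "terms kept; discarded l1 weight =", dropped_l1)
```

Unlike a flat per-term threshold, the cap here applies to the accumulated discarded weight, so the total error introduced by one truncation is controlled directly.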

---

## Install

`qufold` works out of the box with only the reference backend's dependencies
(NumPy, SciPy, OpenFermion, PySCF):

```bash
pip install qufold                 # reference (openfermion) backend
pip install "qufold[cpu]"          # + compile the bundled C++ fermionic kernel
pip install "qufold[gpu]"          # + GPU Majorana backend (needs CuPy/CUDA)
pip install "qufold[pqe]"          # + QForte bridge for the PQE methods
pip install "qufold[dev]"          # + tests / docs / build tooling
```

From source (development; requires access to the repository):

```bash
git clone https://github.com/mjangrou/qufold
cd qufold
pip install -e ".[dev]"
```

The fast `cpu` backend ships its compiled kernel as package data
(`qufold/_vendor/*.so`); public **wheels** carry a prebuilt kernel so most users
never compile anything (see *Distribution model* below).

---

## Quickstart

```bash
# 1. activate an environment with the deps (see Install)
# 2. run the bundled H4 CFU example
qufold examples/cfu/h4_cfu.in
# 3. read the result block printed to stdout (reference energy, downfolded
#    energy, recovery in mHa, generator count, trajectory)
# 4. or drive it programmatically:
python -c "import qufold; print(qufold.run_calculation('examples/cfu/h4_cfu.in').energy)"
```

### The input file (`$molecule` / `$rem`)

A calculation is fully specified by one human-friendly, Q-Chem/Psi4-flavoured
keyword file: a `$molecule` block (geometry + charge/multiplicity) and a `$rem`
("remarks") keyword block. Lines are `keyword  value`; `#` starts a comment;
sections are `$section … $end`.

```
$molecule
0 1                          # charge  spin-multiplicity
H   0.0000  0.0000  0.0000
H   0.0000  0.0000  0.7400
H   0.0000  0.0000  2.0000
H   0.0000  0.0000  2.7400
$end

$rem
method        cfu            # cfu | ucc | pucc | ccsd_downfold | pqe
backend       openfermion    # openfermion | cpu | gpu
basis         sto-3g
active_space  full           # avas | manual | full
orbital_opt   true           # CASSCF-optimize before downfolding
n_generators  40             # CFU: number of greedy-ADAPT steps K
generator_pool ccsd_screened # screened doubles (singles excluded by design)
pool_tol      1e-3
solve_fci     true           # exactly diagonalize the downfolded Hamiltonian
$end
```
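
For readers who want to see how little machinery the format needs, here is a minimal parsing sketch. It is not the parser in `qufold/input/`; the function names are hypothetical, values are kept as raw strings, and no keyword validation is done.

```python
def parse_input(text):
    """Parse `$section ... $end` blocks into {section_name: [lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and outer whitespace
        if not line:
            continue
        if line.lower() == "$end":
            current = None
        elif line.startswith("$"):
            current = line[1:].lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

def rem_keywords(sections):
    """Turn the `$rem` lines into a {keyword: value-string} dict.

    Assumes every `$rem` line has both a keyword and a value.
    """
    return dict(line.split(None, 1) for line in sections.get("rem", []))

example = """
$molecule
0 1                          # charge  spin-multiplicity
H   0.0000  0.0000  0.0000
H   0.0000  0.0000  0.7400
$end

$rem
method        cfu            # cfu | ucc | pucc | ccsd_downfold | pqe
backend       openfermion
n_generators  40
$end
"""

sections = parse_input(example)
rem = rem_keywords(sections)
print(sections["molecule"])
print(rem)
```

A real parser would go on to convert each value to its declared type and reject unknown keywords.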

Every keyword — its type, default, allowed values, and a one-paragraph help
string — lives in `qufold/input/keywords.py`, the single source of truth from
which the keyword reference (`docs/keywords.md`) is generated, so the manual can
never drift from the code.

---

## How it fits together

```
input file  ─►  Calculation  ─►  Problem (SCF / active space / CASSCF-OO)
            ─►  Method.run(backend)  ─►  Result  ─►  analysis (FCI, λ)  ─►  output
```

- **`qufold/input/`** — parser + keyword registry (the manual-in-code).
- **`qufold/core/`** — `Problem` builder (`hamiltonian.py`), CC amplitudes and
  the screened-doubles generator pool (`amplitudes.py`), FCI / LCU-1-norm
  analysis (`analysis.py`).
- **`qufold/methods/`** — thin orchestration over backend primitives. Each
  method subclasses `Method` and sets `name=<keyword>`. CFU is the worked
  template (`cfu.py`).
- **`qufold/backends/`** — the operator algebra. `base.py` is the contract;
  `openfermion_backend.py` is the reference impl; `registry.py` selects with
  graceful failover.
- **`qufold/downfold/`** — the backend-independent Majorana-space projection
  (`projector.downfold_to_active(H, problem)`), exact for any body rank.
- **`qufold/drivers/driver.py`** — top-level orchestration (the CLI calls this;
  also the programmatic `qufold.run_calculation`).

Operators cross every API boundary as a plain dict
`FermionOp = { ((p,1),(q,0),…): complex }` (OpenFermion term convention,
`1` = creation, `0` = annihilation), so methods stay backend-agnostic even when a
backend uses a faster internal representation (e.g. Majorana bitmasks).
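
As a small illustration of working with this dict form directly, the sketch below takes the Hermitian conjugate of an operator and builds an anti-Hermitian combination `T − T†`. The helper names are hypothetical and not part of qufold's API, and no normal ordering is applied.

```python
def hermitian_conjugate(op):
    """Adjoint of a FermionOp dict: reverse each term, flip 1<->0, conjugate."""
    out = {}
    for term, coeff in op.items():
        dag = tuple((index, 1 - action) for index, action in reversed(term))
        out[dag] = out.get(dag, 0) + complex(coeff).conjugate()
    return out

def anti_hermitian_part(op):
    """Return op - op^dagger as a FermionOp dict, dropping exact zeros."""
    out = dict(op)
    for term, coeff in hermitian_conjugate(op).items():
        out[term] = out.get(term, 0) - coeff
    return {term: coeff for term, coeff in out.items() if coeff != 0}

# a_2^dagger a_0 with an illustrative coefficient
T = {((2, 1), (0, 0)): 0.1j}
A = anti_hermitian_part(T)

print(hermitian_conjugate(T))
print(A)
```

Because the keys are plain tuples and the values plain complex numbers, helpers like these need no backend at all, which is what keeps the method layer backend-agnostic.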

---

## Documentation

- **`docs/keywords.md`** — the full keyword reference (generated from
  `qufold/input/keywords.py`).
- **`docs/methods/`** — the theory and contract for each method (CFU, UCC,
  PUCC, CCSD downfolding, PQE).
- **`docs/tutorials/`** — worked end-to-end examples.
- **`examples/`** — ready-to-run input files (`examples/cfu/h4_cfu.in`, …).
- **`CONTRIBUTING.md`** — how to add a method, a keyword, or a backend.

---

## Distribution model

`qufold` is developed in a **private** repository and released publicly as
**source-hidden binary wheels** on PyPI for tagged versions. Every module —
including the package `__init__` files — is **Cython-compiled to a platform
`.so`** and the original `.py` is excluded from the wheel, so the published
package contains **no readable Python source and no `.pyc` bytecode**, only
symbol-stripped machine code plus the compiled C++ kernel. `pip install qufold`
then gives users the full functionality (and the fast `cpu` backend) without a
compiler and without the source.

This is strong *practical* obfuscation for a pip package: recovering the
original Python from a Cython `.so` is impractical (it is compiled C, not
bytecode). It is not cryptographic secrecy, since any runnable executable can in
principle be reverse-engineered, but it defeats casual reading and decompilation.

The `.github/workflows/wheels.yml` pipeline builds these compiled wheels with
`cibuildwheel` across Linux/macOS/Windows × CPython 3.10–3.13 on every `vX.Y.Z`
tag and publishes them to PyPI via Trusted Publishing (no API token). **Wheels
only — no source distribution (`sdist`) is ever published**, since an sdist would
ship the source. (Trade-off: users on a platform/Python without a matching wheel
cannot install; the wheel matrix is kept broad to cover common setups.)
See `setup.py` for the build and `.github/workflows/wheels.yml` for the one-time
PyPI Trusted-Publisher setup.

---

## Citation

If `qufold` is useful in your work, please cite it (a `CITATION.cff` will ship
with the first tagged release). Authors: **Mohammad Reza Jangrouei**,
**Artur F. Izmaylov**.

## License

MIT — see [`LICENSE`](LICENSE).
