Metadata-Version: 2.4
Name: linearscript
Version: 3.1.1
Summary: SCRIPT V3.1: A deterministic, RDKit-independent molecular notation with 100% round-trip stereo parity, materials science extensions, biopolymer support, and formal LALR grammar.
Author-email: SCRIPT Development Team <script@example.com>
Maintainer-email: SCRIPT Development Team <script@example.com>
License: MIT
Project-URL: Homepage, https://github.com/script-notation/script
Project-URL: Documentation, https://script-notation.readthedocs.io
Project-URL: Repository, https://github.com/script-notation/script.git
Project-URL: Bug Tracker, https://github.com/script-notation/script/issues
Keywords: chemistry,cheminformatics,materials-science,smiles,molecular-notation,biopolymer,stereochemistry,alloys,crystallography
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lark>=1.1.0
Provides-Extra: rdkit
Requires-Dist: rdkit>=2023.3.1; extra == "rdkit"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=5.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: rdkit>=2023.3.1; extra == "all"
Dynamic: license-file

# SCRIPT: Structural Chemical Representation In Plain Text

**SCRIPT** is a deterministic molecular notation system with an RDKit-independent core engine. Every molecule has exactly one canonical SCRIPT string. No ambiguity. No post-hoc sanitization.

```
Aspirin in SMILES:  CC(=O)Oc1ccccc1C(=O)O  (one of many valid forms)
Aspirin in SCRIPT:  CC(=O)OC:C:C:C:C:C&6:C(=O)O  (always and only this)
```


## Install

```bash
# Core (no RDKit required)
pip install linearscript

# With RDKit bridge for SMILES interop
pip install "linearscript[rdkit]"
```
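
Because RDKit is an optional extra, code that uses the bridge only when it is present can probe for it first. The sketch below uses only the standard library; `rdkit_available` is a hypothetical helper name, not part of SCRIPT.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the optional RDKit package can be imported."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    print("RDKit found: the SMILES bridge can be used")
else:
    print("RDKit not found: core parsing only")
```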

---

## Key Features

| Feature | SMILES | SCRIPT |
|---------|--------|--------|
| Canonical | No (toolkit-dependent) | Yes (DFS + Morgan) |
| Human-readable | Yes | Yes |
| Validation on parse | No | Yes (Sandhi state machine) |
| Organometallics | Partial | Full (dative, haptic, coordinate) |
| Alloys / Fractional occupancy | No | Yes (`<~0.9>`) |
| Crystallographic context | No | Yes (`[[Rutile]]`) |
| Surface chemistry | No | Yes (`|`) |
| Electronic / excited states | No | Yes (`<s:3>`, `<*>`) |
| Biopolymers (peptide / nucleic) | No | Yes (`{A.G.S}`) |
| Query atoms (SMARTS-style) | No | Yes (`[#6]`, `[R]`, `[v3]`) |
| Polymers / stochastic chains | No | Yes (`{[CC]}n`) |
| Reactions | Partial | Yes (3-part `R>A>P`) |
| RDKit-free core | No | Yes |
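
The "DFS + Morgan" entry names the canonicalization approach without describing it. For orientation, here is a minimal sketch of the classic Morgan extended-connectivity refinement on a plain adjacency list. It is a general illustration of the technique, not SCRIPT's implementation, and `morgan_ranks` is a hypothetical name.

```python
from typing import Dict, List


def morgan_ranks(adjacency: Dict[int, List[int]]) -> Dict[int, int]:
    """Morgan-style extended-connectivity value for each atom.

    Start from each atom's degree, then repeatedly replace every value with
    the sum of its neighbours' values. Stop as soon as an iteration no longer
    increases the number of distinct values, and keep the previous values.
    """
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        new_values = {
            atom: sum(values[n] for n in nbrs)
            for atom, nbrs in adjacency.items()
        }
        new_classes = len(set(new_values.values()))
        if new_classes <= n_classes:
            return values
        values, n_classes = new_values, new_classes


# Heavy-atom graph of n-pentane: 0-1-2-3-4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_ranks(pentane))   # {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
```

Symmetry-equivalent atoms (the two ends, the two inner carbons) receive equal values, while the central atom gets a unique one. A canonical writer would typically use such values to choose the start atom and break ties during the depth-first traversal.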

---

## Quick Start

### Parse and canonicalize

```python
from script.parser import SCRIPTParser
from script.canonical import SCRIPTCanonicalizer

parser = SCRIPTParser()
result = parser.parse("CC(=O)Oc1ccccc1C(=O)O")  # aspirin from SMILES-style input

mol = result["molecule"]
print(len(mol.atoms))   # 13
print(len(mol.bonds))   # 13

canon = SCRIPTCanonicalizer().canonicalize_core(mol)
print(canon)   # CC(=O)OC:C:C:C:C:C&6:C(=O)O
```

### Stereochemistry

```python
result = parser.parse("C[C@H](O)C(=O)O")   # L-lactic acid
mol = result["molecule"]
# Chirality is stored in mol.chiral_centers (DFS-invariant, CIP-verified)
```

### Reactions

```python
result = parser.parse("[C:1]OCO>>[C:1]O")  # reaction with atom mapping
rxn = result["molecule"]                    # Reaction object
print(rxn.reactants, rxn.products)
```

### Materials Science

```python
# Alloy with fractional site occupancy
result = parser.parse("Ti<~0.9>N<~0.1>")
mol = result["molecule"]
print(mol.atoms[0].occupancy)   # 0.9

# Crystallographic phase
result = parser.parse("[[Rutile]] Ti(O)2")
print(result["molecule"].macroscopic_context)   # "Rutile"

# Surface adsorption
result = parser.parse("[[Pt_111]] | >C=O")
print(result["success"])   # True

# Triplet oxygen
result = parser.parse("O=O<s:3>")
print(result["molecule"].atoms[-1].spin)   # 3
```
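
The surface example above reads `result["success"]`. Assuming every parse result carries that key alongside `"molecule"` (this README does not say how failures are reported), a small guard can turn a failed parse into an exception. `parse_or_raise` and `StubParser` are hypothetical names; the stub only stands in for `SCRIPTParser` so the sketch runs on its own.

```python
from typing import Any, Dict


def parse_or_raise(parser: Any, text: str) -> Any:
    """Parse `text` and return the molecule, raising if the parse failed."""
    result: Dict[str, Any] = parser.parse(text)
    # Assumption: a failed parse is reported as success == False.
    if not result.get("success", False):
        raise ValueError(f"could not parse {text!r}")
    return result["molecule"]


class StubParser:
    """Stand-in with the same result shape; NOT the real SCRIPT parser."""

    def parse(self, text: str) -> Dict[str, Any]:
        ok = bool(text.strip())
        return {"success": ok, "molecule": text if ok else None}


print(parse_or_raise(StubParser(), "Ti<~0.9>N<~0.1>"))   # Ti<~0.9>N<~0.1>
```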

### Biopolymers

```python
# Peptide chain (expands to atomic graph)
result = parser.parse("{A.G.S}")

# DNA oligonucleotide
result = parser.parse("{dA.dG.dC.dT}")

# Nucleotide modifications
result = parser.parse("{m5C.m6A.psU}")
```

### RDKit interop

```python
from rdkit import Chem
from script.rdkit_bridge import SCRIPTFromMol, MolFromSCRIPT

# SMILES -> SCRIPT
mol = Chem.MolFromSmiles("CN1CCC[C@H]1c2cccnc2")   # Nicotine
script_str = SCRIPTFromMol(mol)

# SCRIPT -> RDKit mol (InChI parity verified on the 97-compound benchmark set)
mol_back = MolFromSCRIPT(script_str)
```

---

## Benchmark

Tested on a diverse 97-compound set (alkanes, rings, aromatics, stereocenters, drugs, natural products):

- **100% InChI round-trip parity** (SCRIPT -> RDKit -> InChI matches original)
- **100% native round-trip** (SCRIPT -> CoreMolecule -> SCRIPT)
- **22/22 Materials Science tests passing** (Alloys, Surfaces, Excited States)
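
A native round-trip check of the kind listed above can be sketched as a small harness that serializes, re-parses, and serializes again, reporting any string that changes. This is an illustrative sketch, not the project's benchmark code: `round_trip_failures` and the toy parse/serialize functions are hypothetical. In practice the two callables might wrap `SCRIPTParser().parse(...)["molecule"]` and `SCRIPTCanonicalizer().canonicalize_core` from the Quick Start.

```python
from typing import Callable, Iterable, List, Tuple


def round_trip_failures(
    inputs: Iterable[str],
    parse: Callable[[str], object],
    serialize: Callable[[object], str],
) -> List[Tuple[str, str]]:
    """Return (first_pass, second_pass) pairs where re-serializing changed the string."""
    failures = []
    for text in inputs:
        first = serialize(parse(text))
        second = serialize(parse(first))
        if first != second:
            failures.append((first, second))
    return failures


# Toy stand-ins so the sketch runs without SCRIPT installed:
# a "molecule" is just a sorted tuple of dot-separated fragments.
def toy_parse(text: str) -> tuple:
    return tuple(sorted(text.split(".")))


def toy_serialize(mol: tuple) -> str:
    return ".".join(mol)


print(round_trip_failures(["O.C", "C.O"], toy_parse, toy_serialize))   # []
```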


---

## License

MIT with Commons Clause. Free for academic and non-commercial use.
Commercial licensing available separately.

---

Developed by **SCRIPT Development Team**.

GitHub: [sangeet01/script](https://github.com/sangeet01/script.git)
