Metadata-Version: 2.4
Name: jucrypt
Version: 0.3.0
Summary: A Fully Parameterised, Story-Key Driven Experimental SPN Cipher
Author-email: "I. Nabil" <w3nabil@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/w3nabil/jucrypt
Project-URL: Repository, https://github.com/w3nabil/jucrypt
Project-URL: Issues, https://github.com/w3nabil/jucrypt/issues
Keywords: cryptography,symmetric encryption,educational crypto,experimental cipher,story-based key derivation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Topic :: Security :: Cryptography
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: experiment
Requires-Dist: numpy>=1.23; extra == "experiment"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# Ju's Story (STORY)

> **Your story is your key.**
> A story-key driven Substitution-Permutation Network cipher.

> **Pre-review research cipher.** No external cryptanalysis or formal peer review has been conducted yet. We are actively seeking feedback from the research community. See [Current Limitations and Open Issues](#current-limitations-and-open-issues) before use.

[![Python](https://img.shields.io/badge/Python-3.9%2B-blue?logo=python)](https://www.python.org/)
[![License](https://img.shields.io/badge/Apache-2.0-green)](LICENSE)
[![PyPI](https://img.shields.io/badge/PyPI-jucrypt-orange?logo=pypi)](https://pypi.org/project/jucrypt/)

STORY is an experimental symmetric block cipher in which the secret key is a natural-language narrative — a sentence, a paragraph, or any Unicode text — rather than a raw byte string. It operates on 128-bit blocks using a Substitution-Permutation Network (SPN), and provides authenticated encryption through CTR mode combined with HMAC-SHA-256 in an Encrypt-then-MAC construction.

The project is published as `jucrypt` on PyPI. The cipher, key derivation pipeline, and test suite are entirely open. We are conducting empirical security testing and would genuinely appreciate researchers using, testing, and critiquing the design.

---

## Contents

- [Design Overview](#design-overview)
- [Key Derivation Pipeline](#key-derivation-pipeline)
- [Installation](#installation)
- [Python API](#python-api)
- [Command-Line Interface](#command-line-interface)
- [Empirical Test Results](#empirical-test-results)
- [Current Limitations and Open Issues](#current-limitations-and-open-issues)
- [Custom S-box Pool](#custom-s-box-pool)
- [For Researchers](#for-researchers)
- [Citation](#citation)

---

## Design Overview

| Property | Value |
|---|---|
| Block size | 128 bits (16 bytes) |
| Key material | Any Unicode narrative string |
| Derived key width | 256 bits (enc\_key) + 256 bits (mac\_key) |
| Round structure | SPN: ARK → SubBytes → Permute → MDS Mix |
| Round count | 8–15 per message (base 8–12 key-derived, offset 0–3 from round salt) |
| S-box | 8-bit bijection, key-selected from a pre-validated pool |
| Diffusion layer | 16×16 Cauchy MDS matrix over GF(2⁸), branch number = 17 |
| Permutation | 16-byte key-derived permutation (Fisher-Yates, rejection sampling) |
| Mode | CTR (counter mode) |
| Authentication | HMAC-SHA-256 over nonce ‖ round\_salt ‖ ciphertext |
| Construction | Encrypt-then-MAC (EtM), targeting IND-CCA2 |
| Implementation | Pure Python (`STORY`) + optional C extension (`STORYC`, ~12 MB/s) |

The cipher is defined entirely in `story.py`. The C-accelerated variant in `storyc.py` and `story_core.c` is a drop-in replacement with an identical API.

---

## Key Derivation Pipeline

This is the part of STORY that distinguishes it from conventional designs. The story string is the only secret. All cipher parameters — S-box selection, byte permutation, round count, round keys, whitening key, and MAC key — are derived deterministically from it.

```
Story string (Unicode)
  │
  ├─ NFC normalise  →  UTF-16-LE encode
  │
  └─ SHAKE-256  →  32-byte IKM
       │
       ├─ HMAC-SHA256(IKM, "enc||story_v1_master")  →  enc_key (32 B)
       │    │
       │    ├─ SHAKE-256("story_v1_sbox||" + enc_key)    →  S-box index (rejection sampling)
       │    ├─ SHAKE-256("story_v1_perm||" + enc_key)    →  16-byte permutation
       │    ├─ SHAKE-256("story_v1_rounds||" + enc_key)  →  base round count [8–12]
       │    ├─ HMAC-SHA256(enc_key, "story_v1_round||r") →  rk[r]  (one per round)
       │    └─ HMAC-SHA256(enc_key, "story_v1_whitening||") →  whitening key
       │
       └─ HMAC-SHA256(IKM, "mac||story_v1_master")  →  mac_key (32 B)
```
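
The permutation branch of the diagram can be sketched as a Fisher-Yates shuffle driven by a SHAKE-256 byte stream, with rejection sampling to avoid modulo bias. This is a minimal illustration only: the function name `derive_permutation`, the stream length, and the exact way bytes are consumed are assumptions, and the output will not necessarily match what `story.py` produces.

```python
import hashlib

def derive_permutation(enc_key: bytes, n: int = 16) -> list:
    """Hypothetical helper: key-derived permutation of range(n)."""
    # Assumption: the label is prepended to enc_key exactly as drawn in the diagram.
    stream = hashlib.shake_256(b"story_v1_perm||" + enc_key).digest(256)
    pos = 0
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        bound = i + 1                    # we need j uniform in [0, i]
        limit = 256 - (256 % bound)      # largest multiple of bound that is <= 256
        while True:
            if pos >= len(stream):
                raise RuntimeError("byte stream exhausted")
            byte = stream[pos]
            pos += 1
            if byte < limit:             # reject bytes that would bias j
                break
        j = byte % bound
        perm[i], perm[j] = perm[j], perm[i]
    return perm

perm = derive_permutation(bytes(32))
print(perm)
```

Rejecting bytes at or above `limit` keeps every index equally likely, which a plain `byte % bound` would not.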

The NFC normalisation step ensures that the same story entered on different platforms produces the same key material regardless of the Unicode form used by the operating system. UTF-16-LE encoding maps the normalised text to a canonical byte sequence.
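
A minimal sketch of the first two stages, using only the standard library, shows why normalisation matters. The function name `story_to_ikm` is hypothetical, and the sketch assumes the encoded story is fed to SHAKE-256 with no prefix or framing, which the diagram does not specify.

```python
import hashlib
import unicodedata

def story_to_ikm(story: str) -> bytes:
    """Hypothetical helper: story string -> 32-byte IKM."""
    normalised = unicodedata.normalize("NFC", story)   # canonical composed form
    encoded = normalised.encode("utf-16-le")           # canonical byte sequence
    # Assumption: SHAKE-256 is applied to the encoded story alone.
    return hashlib.shake_256(encoded).digest(32)

composed = "caf\u00e9"       # 'é' as one code point
decomposed = "cafe\u0301"    # 'e' followed by a combining acute accent

print(composed == decomposed)                               # the raw strings differ
print(story_to_ikm(composed) == story_to_ikm(decomposed))   # the derived IKM matches
```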

The enc\_key and mac\_key are domain-separated HMAC outputs from the same IKM and are therefore computationally independent. Round keys are further domain-separated by round index, so recovering enc\_key from any set of round keys requires inverting HMAC-SHA-256 — a 2²⁵⁶-work preimage problem under standard assumptions.
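
The domain separation described above can be illustrated with the standard `hmac` module. The labels are taken from the diagram. The function names, the choice of IKM as the HMAC key with the label as the message, and the single-byte encoding of the round index `r` are assumptions made for this sketch.

```python
import hashlib
import hmac

def derive_master_keys(ikm: bytes):
    """Hypothetical helper: split one IKM into enc_key and mac_key."""
    # Assumption: IKM is the HMAC key and the label is the message.
    enc_key = hmac.new(ikm, b"enc||story_v1_master", hashlib.sha256).digest()
    mac_key = hmac.new(ikm, b"mac||story_v1_master", hashlib.sha256).digest()
    return enc_key, mac_key

def derive_round_key(enc_key: bytes, r: int) -> bytes:
    """Hypothetical helper: one round key per round index."""
    # Assumption: the round index is appended as a single byte.
    label = b"story_v1_round||" + bytes([r])
    return hmac.new(enc_key, label, hashlib.sha256).digest()

enc_key, mac_key = derive_master_keys(bytes(32))
round_keys = [derive_round_key(enc_key, r) for r in range(15)]
print(enc_key.hex())
print(mac_key.hex())
```

Because each output uses a distinct label, knowing one derived value does not directly reveal another without attacking HMAC-SHA-256 itself.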

Each encryption also draws a 1-byte random `round_salt`. The salt modifies the actual round count as `actual_rounds = base_rounds + (round_salt[0] % 4)`, so the round count varies from message to message even under the same story key. The salt is transmitted openly alongside the ciphertext and is covered by the authentication tag.
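
The round-count rule is small enough to show directly. The function name `actual_round_count` and its argument checks are illustrative and not part of the published API.

```python
import os

def actual_round_count(base_rounds: int, round_salt: bytes) -> int:
    """Hypothetical helper applying the formula from the text."""
    if not 8 <= base_rounds <= 12:
        raise ValueError("base_rounds must be in [8, 12]")
    if len(round_salt) != 1:
        raise ValueError("round_salt must be exactly one byte")
    return base_rounds + (round_salt[0] % 4)

round_salt = os.urandom(1)            # fresh salt for each message
print(actual_round_count(10, round_salt))
```

With a base of 8 to 12 and an offset of 0 to 3, the result always falls in the 8 to 15 range given in the design table.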

---

## Installation

```bash
pip install jucrypt
```

If the C extension is not present, `STORYC` falls back to pure Python automatically with no change to the API or output.

---

## Python API

Both `STORY` (pure Python) and `STORYC` (C-accelerated) expose an identical static-method API.

```python
from jucrypt import STORYC as STORY   # C-accelerated, falls back to pure Python
# from jucrypt import STORY            # pure Python only
```

### Encrypt

```python
ciphertext, nonce, tag, round_salt = STORY.encrypt(
    "The quick brown fox jumps over the lazy dog",
    "Once upon a time in a kingdom by the sea, there lived a cryptographer"
)
```

`encrypt` accepts `str`, `bytes`, or `bytearray` as plaintext and returns four `bytes` objects. All four must be stored and transmitted together to allow decryption.

### Decrypt to bytes

```python
plaintext_bytes = STORY.decrypt(
    ciphertext,
    "Once upon a time in a kingdom by the sea, there lived a cryptographer",
    nonce,
    tag,
    round_salt,
)
```

If the authentication tag does not verify, `decrypt` raises `ValueError` immediately. No partial plaintext is ever returned on authentication failure.
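
The verify-before-decrypt ordering can be sketched with the standard library. This is not the package's internal code: the helper names are hypothetical, `decrypt_fn` stands in for the CTR decryption step, and plain concatenation of nonce, round salt, and ciphertext (with no length prefixes) is an assumption.

```python
import hashlib
import hmac

def compute_tag(mac_key: bytes, nonce: bytes, round_salt: bytes, ciphertext: bytes) -> bytes:
    # Assumption: the three fields are concatenated directly, in this order.
    return hmac.new(mac_key, nonce + round_salt + ciphertext, hashlib.sha256).digest()

def verify_then_decrypt(mac_key, nonce, round_salt, ciphertext, tag, decrypt_fn):
    """Check the tag first; only call decrypt_fn if it verifies."""
    expected = compute_tag(mac_key, nonce, round_salt, ciphertext)
    # compare_digest avoids leaking the position of the first mismatching byte.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication tag mismatch")
    return decrypt_fn(ciphertext)
```

Since the tag is checked before any decryption work, a tampered message never reaches the cipher, which is why no partial plaintext can be returned.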

### Decrypt to string

```python
plaintext_str = STORY.decrypt_str(
    ciphertext,
    "Once upon a time in a kingdom by the sea, there lived a cryptographer",
    nonce,
    tag,
    round_salt,
    encoding="utf-16-le",   # default; change if you encrypted raw bytes
)
```

### Hex string parameters

All four ciphertext parameters accept either `bytes` or a hex-encoded `str`:

```python
pt = STORY.decrypt(
    "a3f1...",     # ciphertext as hex string
    story,
    "8c2d4f1a...", # nonce as hex string
    "e9b0...",     # tag as hex string
    "03",          # round_salt as hex string (1 byte = 2 hex chars)
)
```
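
Because all four values must travel together and hex strings are accepted on input, one convenient storage form is a small JSON record. The helpers below are a sketch only. Their names and the JSON field names are hypothetical and not part of `jucrypt`.

```python
import json

FIELDS = ("ciphertext", "nonce", "tag", "round_salt")

def bundle_to_json(ciphertext: bytes, nonce: bytes, tag: bytes, round_salt: bytes) -> str:
    """Hypothetical helper: pack the four outputs of encrypt into one record."""
    values = (ciphertext, nonce, tag, round_salt)
    return json.dumps({name: value.hex() for name, value in zip(FIELDS, values)})

def bundle_from_json(text: str) -> tuple:
    """Hypothetical helper: recover the four values as bytes, in the same order."""
    record = json.loads(text)
    return tuple(bytes.fromhex(record[name]) for name in FIELDS)

record = bundle_to_json(b"\xa3\xf1", b"\x8c\x2d", b"\xe9\xb0", b"\x03")
print(record)
```

The hex strings stored in the record could also be passed to `decrypt` without converting them back to `bytes`.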

### Checking C extension availability

```python
from jucrypt import STORYC
print(STORYC.C_AVAILABLE)   # True if story_core.so / .pyd was compiled
```

---

## Command-Line Interface

The CLI is available after installation via `python storyc.py` or `python -m jucrypt.storyc`.

### Encrypt

```bash
# Inline text
python storyc.py --enc "Hello world" --story "My secret story"

# From .txt files
python storyc.py --enc message.txt --story story.txt

# Save output to file
python storyc.py --enc message.txt --story story.txt --out cipher.txt
```

Output format:

```
CHUNK : $$<ciphertext_hex>$$<nonce_hex>$$<tag_hex>
SALT  : <round_salt_hex>
```

Both `CHUNK` and `SALT` are required for decryption. They can be stored in the same file.
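
For scripts that post-process CLI output, the format above can be parsed in a few lines. The function `parse_cli_output` is a hypothetical helper, and it assumes the output contains exactly the two labelled lines shown, with hex fields separated by `$$`.

```python
def parse_cli_output(text: str) -> tuple:
    """Hypothetical helper: return (ciphertext, nonce, tag, round_salt) as bytes."""
    chunk = salt = None
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue                      # skip lines without a "NAME : value" shape
        name, value = name.strip().upper(), value.strip()
        if name == "CHUNK":
            chunk = value
        elif name == "SALT":
            salt = value
    if chunk is None or salt is None:
        raise ValueError("both CHUNK and SALT lines are required")
    parts = chunk.split("$$")             # leading "$$" yields an empty first item
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("CHUNK must look like $$<ct>$$<nonce>$$<tag>")
    ciphertext, nonce, tag = (bytes.fromhex(p) for p in parts[1:])
    return ciphertext, nonce, tag, bytes.fromhex(salt)

example = "CHUNK : $$a3f1$$8c2d$$e9b0\nSALT  : 03\n"
print(parse_cli_output(example))
```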

### Decrypt

```bash
# Inline chunk and salt (single quotes keep the shell from expanding $$ to its PID)
python storyc.py --dec '$$aa..$$bb..$$cc..' --story "My secret story" --r 03

# From files — chunk and salt in the same output file
python storyc.py --dec cipher.txt --story story.txt --r cipher.txt

# Save decrypted output
python storyc.py --dec cipher.txt --story story.txt --r cipher.txt --out plain.txt
```

### Backend selection

```bash
python storyc.py --enc "text" --story "key" --impl auto    # default: C if available
python storyc.py --enc "text" --story "key" --impl c       # force C extension
python storyc.py --enc "text" --story "key" --impl python  # force pure Python
```

---

## Empirical Test Results

The following results are from our internal test suite run across 2,000 randomly sampled story keys using the C-accelerated implementation. They are **not peer-reviewed**. All numbers are reported as-is, including the ones that fall short of ideal.

### Diffusion and confusion

| Metric | Mean | Std | Min | Max | Ideal |
|---|---|---|---|---|---|
| Avalanche (%) | 49.992 | 0.395 | 48.68 | 51.30 | 50.0 |
| Key sensitivity (%) | 49.997 | 0.275 | 48.99 | 50.90 | 50.0 |
| Shannon entropy (bits/byte) | 7.992 | 0.001 | 7.989 | 7.994 | 8.0 |

### S-box and diffusion layer properties

These are constant across all story keys — S-boxes are validated before inclusion in the pool, and the MDS matrix is fixed.

| Metric | STORY | AES reference |
|---|---|---|
| Differential uniformity (DDT max) | 4 | 4 |
| Nonlinearity (NL) | 120 | 112 |
| Max LAT bias | 8 | 16 |
| Linear correlation ε | 0.0625 | 0.125 |
| Algebraic degree | 7 | 7 |
| MDS branch number | 17 | — (AES uses 4×4, BN=5) |

### Statistical uniformity (n = 2,000)

| Test | Pass rate | Note |
|---|---|---|
| Roundtrip correctness | 100.0% | |
| Chi-squared byte uniformity | 98.95% | Expected ~99% at α = 0.01 |
| NIST SP 800-22 monobit | 98.85% | Expected ~99% at α = 0.01 |

The ~1% failure rates on chi-squared and NIST monobit are consistent with the expected false-positive rate of a correctly uniform distribution tested at α = 0.01. They are not evidence of cipher weakness.
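
The size of the expected sampling fluctuation can be checked with a short calculation. The sketch assumes the 2,000 story keys behave as independent trials, each passing with probability 0.99. The function name `pass_rate_z` is illustrative.

```python
import math

def pass_rate_z(observed: float, expected: float = 0.99, n: int = 2000) -> float:
    """Distance of an observed pass rate from the expected one, in standard errors."""
    standard_error = math.sqrt(expected * (1.0 - expected) / n)
    return (observed - expected) / standard_error

for name, rate in (("chi-squared", 0.9895), ("monobit", 0.9885)):
    print(name, round(pass_rate_z(rate), 2))
```

Under these assumptions the standard error is roughly 0.22 percentage points, so both observed rates sit within one standard error of 99%.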

### IND-CPA / IND-CCA2 (n = 2,000)

| Sub-test | Pass rate |
|---|---|
| Ciphertext distribution | 98.8% |
| Semantic security | 100.0% |
| Key-change indistinguishability | 99.0% |
| Length leakage | 100.0% |
| Prefix indistinguishability | 100.0% |
| **IND-CPA composite** | **97.7%** |
| MAC tamper rejection | 100.0% (144,000 / 144,000 attempts) |
| Bit-flip rejection | 100.0% (256,000 / 256,000 attempts) |
| Truncation rejection | 100.0% (26,000 / 26,000 attempts) |
| Replay rejection | 100.0% |
| **IND-CCA2 composite** | **100.0%** |
| **Overall pass** | **97.7%** |

The 97.7% overall rate is driven entirely by the IND-CPA composite. See Issue 2 below for the current status of the investigation.

---

## Current Limitations and Open Issues

We are disclosing all known issues in full. This is a pre-review cipher and transparency is more useful to the community than silence.

**Issue 1 — No formal peer review**

The cipher design, key derivation pipeline, and all empirical results above have not undergone external cryptanalysis or formal peer review. STORY should be treated as a research prototype. We are actively seeking differential, linear, algebraic, and structural cryptanalysis. If you attempt an attack, successful or not, we would like to hear about it.

**Issue 2 — IND-CPA composite pass rate of 97.7%**

Three of the five IND-CPA sub-tests (semantic security, length leakage, prefix indistinguishability) pass at 100%. The composite failures originate in the ciphertext distribution test (98.8%) and the key-change indistinguishability test (99.0%). Approximately 1% false failures are expected under α = 0.01, but the remaining ~1.3% excess is currently under investigation. No root cause has been confirmed yet.
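
One way to size the excess is to ask what composite pass rate two statistical sub-tests would produce by chance alone. The sketch below assumes that both the distribution test and the key-change test are run at α = 0.01 and fail independently of each other. Neither assumption is confirmed here, and the function name `composite_z` is illustrative.

```python
import math

def composite_z(observed_pass: float, per_test_alpha: float = 0.01,
                n_tests: int = 2, n_keys: int = 2000):
    """Expected composite pass rate and the observed deviation in standard errors."""
    # Assumption: the sub-tests fail independently at the same alpha.
    expected_pass = (1.0 - per_test_alpha) ** n_tests
    standard_error = math.sqrt(expected_pass * (1.0 - expected_pass) / n_keys)
    return expected_pass, (observed_pass - expected_pass) / standard_error

expected, z = composite_z(0.977)
print(round(expected, 4), round(z, 2))
```

Under those assumptions the chance-only baseline is about 98.0% rather than 99%, and the observed 97.7% lies about one standard error below it. If the sub-tests are correlated or use a different α, the baseline changes accordingly.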

**Issue 3 — SAC measurement error in test suite prior to v4.2.0**

In `story_basic.py` versions prior to v4.2.0, the `sac_avg` column was numerically identical to `avalanche_pct / 100` — a redundant column carrying no independent information. The SAC figures in CSV files up to `story_basic_7.csv` should be read as a restatement of the avalanche figure, not an independent measurement. Fixed in v4.2.0, which now reports genuine per-output-bit SAC variance (`sac_std`, `sac_min_bit`, `sac_max_bit`).

**Issue 4 — BIC implementation error in test suite prior to v4.2.0**

The `_bic()` function in `story_deep.py` prior to v4.2.0 measured per-input-bit avalanche rate rather than the pairwise output-bit independence criterion defined by Webster and Tavares (1986). The BIC columns in `story_deep` CSV files prior to v4.2.0 are mislabelled and should be disregarded. Fixed in v4.2.0, which now computes Pearson correlation across all C(128, 2) = 8,128 output-bit pairs.
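
The pairwise measurement can be illustrated on a small scale. The sketch below is not the `story_deep.py` code. It uses truncated SHA-256 as a stand-in for the cipher, flips one input bit, and inspects only 16 output bits to keep the run short. All function names and parameters are assumptions.

```python
import hashlib
import math
import random

def pearson(xs, ys) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def stand_in_cipher(block: bytes) -> bytes:
    # Assumption: a hash replaces the real 128-bit block function.
    return hashlib.sha256(block).digest()[:16]

def bic_max_abs_corr(f, input_bit: int, out_bits: int = 16,
                     samples: int = 256, seed: int = 0) -> float:
    """Largest |correlation| between pairs of avalanche bits for one flipped input bit."""
    rng = random.Random(seed)
    columns = [[] for _ in range(out_bits)]
    for _ in range(samples):
        x = bytearray(rng.getrandbits(8) for _ in range(16))
        y0 = f(bytes(x))
        x[input_bit // 8] ^= 1 << (input_bit % 8)      # flip a single input bit
        y1 = f(bytes(x))
        diff = int.from_bytes(bytes(a ^ b for a, b in zip(y0, y1)), "big")
        for j in range(out_bits):
            columns[j].append((diff >> j) & 1)         # avalanche bit j
    return max(abs(pearson(columns[j], columns[k]))
               for j in range(out_bits) for k in range(j + 1, out_bits))

print(math.comb(128, 2))                     # number of pairs for a full 128-bit block
print(bic_max_abs_corr(stand_in_cipher, input_bit=0))
```

A full measurement would cover all 128 output bits and every input bit, which is where the 8,128 pairs per input bit come from.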

**Issue 5 — Variable round count increases timing variance**

Actual rounds per message vary from 8 to 15. This is intentional, since it adds per-message variability, but it causes a higher coefficient of variation in timing measurements than a fixed-round cipher would produce. The API does not currently expose a fixed-round mode, so timing benchmarks reflect an average over the round-count distribution.

**Issue 6 — C extension requires manual compilation**

The `story_core.c` extension provides a roughly 10× throughput improvement but must be compiled manually. A pip-installable binary wheel is not yet available. The pure-Python fallback is automatic.

**Issue 7 — No custom S-box generation tooling yet**

A tool to generate validated custom S-box pools with verified DDT and LAT properties is planned but not yet released.

---

## Custom S-box Pool

STORY supports user-supplied S-boxes. If a file exists at `customju/sboxes.json` relative to `story.py`, it takes precedence over the default pool. Format:

```json
{
  "0": "1,200,87,...",
  "1": "43,11,..."
}
```

Each value is a comma-separated list of 256 integers. Values are stored 1-indexed in the JSON (add 1 to each actual S-box value when writing), so the stored integers run from 1 to 256 and map to a bijection of 0–255 once the offset is removed. The loader validates each entry as a permutation of 0–255 and raises `ValueError` if the check fails.
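
A loader following this description might look like the sketch below. It is not the code in `story.py`. The function name `load_custom_pool` is hypothetical, and the sketch assumes the stored values are the actual S-box values plus one, so the offset is subtracted before validation.

```python
import json

def load_custom_pool(json_text: str) -> dict:
    """Hypothetical loader: pool index -> S-box as a list of 256 ints in 0..255."""
    raw = json.loads(json_text)
    pool = {}
    for index, csv in raw.items():
        stored = [int(v) for v in csv.split(",")]
        # Assumption: file values are 1-indexed, so subtract 1 to get real values.
        sbox = [v - 1 for v in stored]
        if sorted(sbox) != list(range(256)):
            raise ValueError(f"S-box {index} is not a permutation of 0-255")
        pool[int(index)] = sbox
    return pool

# Identity S-box written in the 1-indexed file form (values 1..256).
identity_entry = ",".join(str(v + 1) for v in range(256))
pool = load_custom_pool(json.dumps({"0": identity_entry}))
print(pool[0][:4])
```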

Any custom pool should be validated for DDT max ≤ 4 and NL ≥ 112 before deployment. Pools with weaker properties will reduce the security margins reported in the test results above.
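
Until dedicated tooling is available, both thresholds can be checked with a short pure-Python script. The sketch below computes the maximum DDT entry by direct counting and the nonlinearity through a fast Walsh-Hadamard transform. The function names are illustrative, and `sbox` is assumed to be a list of 256 integers in 0–255 (after any file offset has been removed).

```python
def ddt_max(sbox) -> int:
    """Largest DDT entry over all non-zero input differences."""
    best = 0
    for a in range(1, 256):
        counts = [0] * 256
        for x in range(256):
            counts[sbox[x] ^ sbox[x ^ a]] += 1
        best = max(best, max(counts))
    return best

def nonlinearity(sbox) -> int:
    """Minimum nonlinearity over all non-zero output masks."""
    worst = 0
    for b in range(1, 256):
        # Component function parity(b & S(x)), encoded as +1 / -1.
        w = [1 - 2 * (bin(b & sbox[x]).count("1") & 1) for x in range(256)]
        h = 1
        while h < 256:                       # in-place Walsh-Hadamard butterflies
            for i in range(0, 256, 2 * h):
                for j in range(i, i + h):
                    u, v = w[j], w[j + h]
                    w[j], w[j + h] = u + v, u - v
            h *= 2
        worst = max(worst, max(abs(c) for c in w))
    return 128 - worst // 2

def meets_pool_thresholds(sbox) -> bool:
    # Thresholds taken from the text: DDT max <= 4 and NL >= 112.
    return ddt_max(sbox) <= 4 and nonlinearity(sbox) >= 112

identity = list(range(256))
print(ddt_max(identity), nonlinearity(identity), meets_pool_thresholds(identity))
```

The identity map is the worst case (every difference maps to itself, and every component function is linear), so it makes a quick sanity check that the script rejects weak candidates.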

---

## For Researchers

**Source files**

| File | Contents |
|---|---|
| `story.py`            | Pure-Python reference implementation, fully commented |
| `storyc.py`           | C-accelerated variant and CLI |
| `story_core.c`        | C extension: GF(2⁸) multiply table, MDS Mix, CTR-mode kernel |
| `default_sboxes.py`   | Default S-box pool |

**Reproducing the test results**

```bash
git clone https://github.com/w3nabil/jucrypt
cd jucrypt/analyse
pip install numpy scipy pulp
python story_basic.py --workers 4
python story_ind.py --workers 4
python formal.py --quick
```

**Attack surfaces we have not fully explored**

- Algebraic attacks exploiting the HMAC-based key schedule structure
- Related-story attacks (stories differing by a single character or punctuation mark)
- Timing side-channels in the pure-Python execution path
- The root cause of the 2.3% IND-CPA composite failure rate, which is not yet confirmed
- Invariant subspace attacks using the full round function under key-derived parameters

If you find a weakness — or confirm the absence of one — please open an issue or contact us directly. We would rather know.

---

## Citation

If you reference STORY in research, please cite:

```
Islam, N. (2026). STORY: A Fully Parameterised, Story-Key Driven SPN Cipher.
DOI: <coming_soon>
```
