Metadata-Version: 2.4
Name: live-casi
Version: 0.8.3
Summary: Universal structural analysis: encryption validation, AI text detection (GPT-4o AUC=0.871), author fingerprinting (R@1=76.3%), model collapse prevention. 
Author-email: David Tom Foss <david@foss.com.de>
License: MIT
Project-URL: Homepage, https://github.com/DT-Foss/live-casi
Project-URL: Documentation, https://github.com/DT-Foss/live-casi#readme
Project-URL: Repository, https://github.com/DT-Foss/live-casi
Project-URL: Issues, https://github.com/DT-Foss/live-casi/issues
Keywords: cryptography,cryptanalysis,encryption,security,testing,validation,CASI,cipher,black-box,pcap,network,tls,knowledge-graph,causal,channel-probe,inference,post-quantum,pqc,ml-kem,kyber,dilithium,hqc,ai-detection,academia,dissertation,plagiarism,chatgpt,deepl,text-analysis,authorship
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20
Provides-Extra: network
Requires-Dist: scapy>=2.5; extra == "network"
Provides-Extra: causal
Requires-Dist: dotcausal>=0.3.1; extra == "causal"
Provides-Extra: pqc
Requires-Dist: pqcrypto>=0.4.0; extra == "pqc"
Provides-Extra: academia
Requires-Dist: transformers>=4.20; extra == "academia"
Provides-Extra: all
Requires-Dist: scapy>=2.5; extra == "all"
Requires-Dist: dotcausal>=0.3.1; extra == "all"
Requires-Dist: pqcrypto>=0.4.0; extra == "all"
Requires-Dist: transformers>=4.20; extra == "all"
Dynamic: license-file

# live-casi

**Universal Structural Analysis Engine** — encryption validation, AI text detection, author fingerprinting, and model collapse prevention.

One metric, four domains. CASI measures the structural integrity of any data stream: bytes, text, model weights, or network traffic.

**Author:** David Tom Foss

## Install

```bash
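pip install live-casi                # Core (encryption + text detection)
pip install "live-casi[academia]"    # + PDF analysis with HuggingFace tokenizers
pip install "live-casi[network]"     # + pcap/network analysis (scapy)
pip install "live-casi[pqc]"         # + post-quantum analysis (pqcrypto)
pip install "live-casi[causal]"      # + .causal knowledge graphs (dotcausal)
pip install "live-casi[all]"         # Everything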
```

**Requirements:** Python 3.8+, NumPy. Optional: transformers, scapy, dotcausal, pqcrypto.

---

## What is CASI?

CASI (Causal Amplification Security Index) measures how much *structure* exists in data compared to random noise:

- **CASI ~ 1.0** → No detectable structure → random/secure
- **CASI > 2.0** → Structural patterns detected → anomalous
- **CASI > 10.0** → Strong structural bias → broken/artificial

Works on: encrypted bytes, generated text, neural network weights, network traffic.
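To make the "~1.0 means random" scale concrete, here is a toy ratio in the same spirit: Pearson's chi-square statistic on byte frequencies divided by its expected value under uniform randomness. This is an illustrative assumption only, not the CASI formula used by `live_casi`; `structure_ratio` is a hypothetical name.

```python
import os
from collections import Counter


def structure_ratio(data: bytes) -> float:
    """Toy "observed / expected-under-randomness" ratio (NOT the real CASI).

    Computes Pearson's chi-square statistic of the byte histogram against a
    uniform distribution and divides it by its expected value under the null
    (255 degrees of freedom). Uniformly random bytes give a ratio near 1.0;
    structured data gives a much larger ratio.
    """
    if not data:
        raise ValueError("need at least one byte")
    counts = Counter(data)
    expected = len(data) / 256
    chi2 = sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))
    return chi2 / 255


random_ratio = structure_ratio(os.urandom(256_000))
text_ratio = structure_ratio(b"the quick brown fox jumps over the lazy dog " * 5000)
print(f"random: {random_ratio:.2f}, text: {text_ratio:.1f}")
```

Random input lands close to 1.0, while English text (only a few dozen distinct byte values) scores orders of magnitude higher, mirroring the thresholds listed above.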

---

## 1. AI Text Detection

### Is this text written by AI?

```python
from live_casi.academia import analyze_text

text = open("essay.txt").read()
result = analyze_text(text, language='de')  # or 'en'

print(result.verdict)        # CLEAN / LOW_RISK / SUSPICIOUS / HIGH_RISK
print(result.ai_probability) # 0-99%
print(result.flags)          # ['ChatGPT-Starters', 'Hedging', ...]
```

### Analyze a PDF (dissertation, thesis, paper)

```python
from live_casi.academia import analyze_pdf

result = analyze_pdf("dissertation.pdf", language='de')
print(f"{result.verdict}: {result.ai_probability}% AI probability")
print(f"Flags: {', '.join(result.flags)}")

# Detailed layer results
for layer, data in result.layers.items():
    print(f"  {layer}: {data}")
```

**5-layer analysis:**
1. **Token residuum** — statistical distribution of token IDs mod K
2. **CASI profile** — 26-strategy structural fingerprint
3. **ChatGPT markers** — typical phrases, sentence starters, hedging patterns
4. **DeepL markers** — translationese patterns, anglicisms
5. **Consistency** — intra-document style breaks (mixed human+AI)

**Validated:** AUC=0.989 (GPT-4 German), AUC=0.871 (GPT-4o English), paraphrasing-robust (d=-2.17).
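Layer 1 (token residuum) can be pictured as comparing residue histograms. A minimal sketch, assuming the layer bins token IDs by `id % K` and compares the result with a human reference using total variation distance; `residue_histogram`, `total_variation`, and `K = 16` are illustrative choices, not the library's internals.

```python
from collections import Counter
from typing import List, Sequence


def residue_histogram(token_ids: Sequence[int], k: int = 16) -> List[float]:
    """Normalized histogram of token IDs modulo k (hypothetical helper)."""
    if not token_ids:
        raise ValueError("need at least one token")
    counts = Counter(t % k for t in token_ids)
    n = len(token_ids)
    return [counts.get(r, 0) / n for r in range(k)]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions of equal length."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Token IDs would normally come from a tokenizer (e.g. a HuggingFace one).
reference_ids = list(range(1000))                # residues spread evenly
suspect_ids = [16 * i + 3 for i in range(1000)]  # every ID has residue 3 mod 16

ref_hist = residue_histogram(reference_ids)
sus_hist = residue_histogram(suspect_ids)
same = total_variation(ref_hist, ref_hist)
diff = total_variation(ref_hist, sus_hist)
print(same, round(diff, 3))
```

A distance near 0 means the residue distribution matches the human reference; a skewed distribution pushes it toward 1.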

### Detect specific AI tools

```python
from live_casi.tool_signatures import detect_tools

result = detect_tools(text, language='de')
print(result['flags'])      # ['ChatGPT-Phrases', 'DeepL-Markers']
print(result['chatgpt'])    # {phrase_density: 1.2, starter_pct: 2.1, ...}
print(result['deepl'])      # {marker_density: 0.8, ttr: 0.72, ...}
```

### Raw token-residuum detection

```python
from live_casi.text_detect import compute_reference, detect_ai, detect_segments

# Placeholders: replace with your own data
human_texts = ["...", "..."]   # known human-written texts (list of str)
suspicious_text = "..."        # text to check

# Build reference from known human texts
ref = compute_reference(human_texts)

# Document-level detection
result = detect_ai(suspicious_text, reference=ref)
print(f"Distance: {result['distance']}, Flagged: {result['flagged']}")

# Segment-level: pinpoint which passages are AI
segs = detect_segments(suspicious_text, reference=ref)
print(f"{segs['pct_flagged']}% of segments flagged")
print(f"Heatmap: [{segs['heatmap']}]")
```
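Segment-level output such as `pct_flagged` and `heatmap` can be produced by scoring fixed windows independently. A minimal sketch, assuming word-count windows and a pluggable per-segment score; `segment_scores`, `toy_score`, the window size, and the threshold are hypothetical and do not mirror `detect_segments` internals.

```python
from typing import Callable, Dict, List


def segment_scores(
    text: str,
    score_fn: Callable[[str], float],
    window: int = 50,
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Split text into fixed-size word windows, score each, and flag outliers.

    score_fn stands in for any per-segment detector (hypothetical here).
    Returns per-segment flags, the percentage flagged, and a heatmap string
    with one character per segment ('#' = flagged, '.' = clean).
    """
    words = text.split()
    segments = [" ".join(words[i:i + window]) for i in range(0, len(words), window)]
    flags: List[bool] = [score_fn(seg) > threshold for seg in segments]
    pct = 100.0 * sum(flags) / len(flags) if flags else 0.0
    heatmap = "".join("#" if f else "." for f in flags)
    return {"flags": flags, "pct_flagged": pct, "heatmap": heatmap}


def toy_score(segment: str) -> float:
    """Toy scorer: fraction of words equal to "furthermore" (illustrative)."""
    words = segment.lower().split()
    return words.count("furthermore") / max(len(words), 1)


doc = "plain human words " * 50 + "furthermore " * 100
result = segment_scores(doc, toy_score, window=50)
print(result["heatmap"], result["pct_flagged"])
```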

---

## 2. Author Fingerprinting

### Who wrote this text?

```python
from live_casi.fingerprint import Fingerprinter

fp = Fingerprinter(vector_type='fusion')
fp.add_author("alice", alice_texts)
fp.add_author("bob", bob_texts)

matches = fp.identify(unknown_texts, top_k=5)
print(f"Most likely author: {matches[0][0]}")
# → "alice" with similarity score
```

### Raw vectors for custom matching

```python
from live_casi.fingerprint import byte_vector, full_vector

v1 = byte_vector(texts)   # 256-dim byte frequency
v2 = full_vector(texts)   # 456-dim (bytes + words + sentences + punctuation + bigrams)
```
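The idea behind a 256-dim byte-frequency fingerprint can be sketched with the standard library: normalize byte counts per author, then rank candidates by cosine similarity. A minimal sketch, assuming cosine similarity as the matching rule; `byte_freq_vector` and `rank_authors` are hypothetical and are not the `Fingerprinter` internals.

```python
import math
from typing import Dict, List, Tuple


def byte_freq_vector(texts: List[str]) -> List[float]:
    """256-dim relative byte frequencies over UTF-8 encoded texts."""
    counts = [0] * 256
    for t in texts:
        for b in t.encode("utf-8"):
            counts[b] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_authors(profiles: Dict[str, List[float]],
                 unknown: List[float]) -> List[Tuple[str, float]]:
    """Return (author, similarity) pairs, most similar first."""
    return sorted(((a, cosine(p, unknown)) for a, p in profiles.items()),
                  key=lambda pair: pair[1], reverse=True)


profiles = {
    "alice": byte_freq_vector(["Well... I think so!!! Really!!!"]),
    "bob": byte_freq_vector(["zzz xyz qqq zzz xyz qqq"]),
}
ranking = rank_authors(profiles, byte_freq_vector(["I think so!!! Well..."]))
print(ranking[0][0])
```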

**Validated:** R@1 = 76.3% (correct author ranked first) on Reddit re-identification, beating [Lermen et al.](https://arxiv.org/abs/2602.16800) (55.2%) at $0 cost.

---

## 3. Model Collapse Detection

### Is my model collapsing?

```python
from live_casi.collapse import measure_collapse_risk

risk = measure_collapse_risk(
    generated_texts=model_outputs,
    reference_texts=human_texts,
)
print(f"{risk['verdict']}: risk={risk['risk_score']}")
# → "EARLY_WARNING: risk=25"
```

### Live training monitor

```python
from live_casi.collapse import CollapseMonitor

monitor = CollapseMonitor(reference_texts=training_data)

for step in range(num_steps):
    # ... training step ...
    if step % 100 == 0:
        outputs = model.generate(prompts)  # list of generated strings (decode first if your model returns token IDs)
        alert = monitor.check(outputs, step=step)
        if alert['should_stop']:
            print(f"Collapse detected at step {step}!")
            break
```

### Track collapse across generations

```python
from live_casi.collapse import detect_collapse

generations = [real_texts, gen1_texts, gen2_texts, gen3_texts]
result = detect_collapse(generations)
print(f"Collapse detected: {result['collapse_detected']}")
for g in result['generations']:
    print(f"  Gen {g['generation']}: CASI={g['casi']}")
```
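As a sanity check alongside CASI, falling n-gram diversity across generations is a common, simple collapse signal. This sketch swaps in the distinct-n metric, a different technique from CASI; `distinct_n` and the toy generations are illustrative assumptions.

```python
from typing import List


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Fraction of word n-grams that are unique across a set of texts."""
    ngrams = []
    for t in texts:
        words = t.split()
        ngrams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


generations = [
    ["the cat sat on the mat", "a dog ran in the park"],   # diverse
    ["the cat sat on the mat", "the cat sat on the rug"],  # converging
    ["the cat sat on the mat", "the cat sat on the mat"],  # collapsed
]
scores = [distinct_n(g) for g in generations]
for i, s in enumerate(scores):
    print(f"Gen {i}: distinct-2={s:.2f}")
```

A steadily shrinking score across generations indicates the outputs are converging on the same phrasing.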

**Detects collapse ~150 steps before perplexity degrades**, as measured in GPT-2 self-training experiments.

---

## 4. Neural Network Analysis

### Analyze model weights

```python
from live_casi.nn_analysis import model_casi_profile, layer_casi

# Full model profile
profile = model_casi_profile(model)
for layer_name, score in profile.items():
    print(f"{layer_name}: CASI={score}")

# Single layer
score = layer_casi(model.fc1.weight)
```

### Activation & gradient analysis

```python
from live_casi.nn_analysis import activation_casi, gradient_casi

# During forward pass
activations = model.hidden_layer(x)
print(f"Activation CASI: {activation_casi(activations)}")

# During backward pass
loss.backward()
print(f"Gradient CASI: {gradient_casi(model.fc1.weight.grad)}")
```

### Universal matrix analysis

```python
import numpy as np
from live_casi import compute_casi_score
from live_casi.nn_analysis import global_quantile_matrix

# Any 2D matrix → CASI score
matrix = np.random.randn(1000, 256)
q = global_quantile_matrix(matrix)
result = compute_casi_score(q)
print(f"CASI: {result['casi']}")
```
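The name `global_quantile_matrix` suggests that a real-valued matrix is discretized by quantiles before scoring. A minimal sketch of that idea, assuming equal-frequency bins computed over all entries jointly; `global_quantile_bins` and `k` are hypothetical and may differ from the library's function.

```python
from typing import List


def global_quantile_bins(matrix: List[List[float]], k: int = 4) -> List[List[int]]:
    """Map every entry to a bin 0..k-1 using quantiles over the WHOLE matrix.

    Ranks all values jointly (ties broken by position), so each bin holds
    roughly the same number of entries regardless of the original scale.
    """
    flat = [(v, r, c) for r, row in enumerate(matrix) for c, v in enumerate(row)]
    n = len(flat)
    out = [[0] * len(row) for row in matrix]
    for rank, (_, r, c) in enumerate(sorted(flat)):
        out[r][c] = rank * k // n
    return out


m = [[0.1, 5.0, -2.0, 3.3],
     [7.7, -1.0, 0.0, 2.2]]
binned = global_quantile_bins(m, k=4)
print(binned)
```

Using global rather than per-row quantiles keeps differences in scale between rows visible in the binned output.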

---

## 5. Encryption Validation

### Is this data actually encrypted?

```bash
# Command line
cat encrypted.bin | live-casi
live-casi --file output.bin
live-casi --test  # Self-test: 26 cipher scenarios

# CI/CD: exit 1 if CASI > 2.0
./encrypt | live-casi --quiet --exit-code 2.0
```

```python
from live_casi import LiveCASI
import os

engine = LiveCASI(key_size=32, window_keys=10000)
engine.feed(os.urandom(320000))  # 10,000 keys x 32 bytes = one full window
engine.force_update()
print(f"CASI: {engine.current_casi}")  # ~1.0 for random
```
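The parameters above suggest the engine analyzes the stream as fixed-size keys inside a sliding window (320,000 bytes = 10,000 keys × 32 bytes). A minimal sketch of such buffering, assuming that interpretation; `KeyWindow` is hypothetical and is not `LiveCASI`'s implementation.

```python
from collections import deque
from typing import Deque


class KeyWindow:
    """Hypothetical rolling buffer: split a byte stream into fixed-size keys
    and keep only the most recent `window_keys` of them."""

    def __init__(self, key_size: int = 32, window_keys: int = 10_000) -> None:
        self.key_size = key_size
        self.keys: Deque[bytes] = deque(maxlen=window_keys)
        self._pending = b""

    def feed(self, data: bytes) -> None:
        buf = self._pending + data
        full = len(buf) - len(buf) % self.key_size
        for i in range(0, full, self.key_size):
            self.keys.append(buf[i:i + self.key_size])  # oldest key drops out
        self._pending = buf[full:]  # leftover bytes wait for the next feed


win = KeyWindow(key_size=32, window_keys=10_000)
win.feed(bytes(320_000 + 10))  # 10,000 full keys plus 10 leftover bytes
win.feed(bytes(40))            # 10 + 40 = 50 bytes: one more key, 18 pending
print(len(win.keys), len(win._pending))
```

The bounded `deque` keeps memory constant on an endless stream, which is what makes continuous ("live") scoring practical.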

### Blind cipher identification

```python
from live_casi import identify_cipher

with open("encrypted.bin", "rb") as f:
    data = f.read()

result = identify_cipher(data)
print(f"Cipher: {result['cipher']}, Confidence: {result['confidence']}%")
```

### Network/pcap analysis

```bash
pip install "live-casi[network]"
live-casi --pcap capture.pcap --problems-only
```

```python
from live_casi import analyze_pcap
connections = analyze_pcap('capture.pcap')
for conn in connections:
    print(f"{conn['src']} → {conn['dst']}: {conn['verdict']}")
```

### Firmware/binary scanner

```python
from live_casi import scan_binary
regions = scan_binary('firmware.bin')
for r in regions:
    print(f"0x{r['offset']:x}: {r['label']} (CASI={r['casi']:.1f})")
```
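For intuition about region labeling, a common baseline for firmware triage is a block-wise Shannon entropy scan. This sketch swaps in entropy, a simpler measure than CASI; `entropy_scan`, the block size, and the 1.0 / 7.5 bits-per-byte thresholds are illustrative assumptions, not `scan_binary`'s behavior.

```python
import math
import os
from collections import Counter
from typing import List, Tuple


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    n = len(block)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(block).values())


def entropy_scan(data: bytes, block: int = 4096) -> List[Tuple[int, float, str]]:
    """Label fixed-size blocks by entropy (thresholds are illustrative)."""
    regions = []
    for off in range(0, len(data), block):
        h = shannon_entropy(data[off:off + block])
        label = "high-entropy" if h > 7.5 else "padding" if h < 1.0 else "structured"
        regions.append((off, h, label))
    return regions


firmware = bytes(4096) + b"0123456789abcdef" * 256 + os.urandom(4096)
regions = entropy_scan(firmware)
for off, h, label in regions:
    print(f"0x{off:x}: {label} (H={h:.2f})")
```

Entropy alone cannot tell good encryption from compression, which is one motivation for structural measures like CASI.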

### Post-quantum cryptography

```bash
pip install "live-casi[pqc]"
```

```python
from live_casi import pqc_analyze, pqc_compare
result = pqc_analyze('mlkem', 768)   # ML-KEM-768 (Kyber)
report = pqc_compare()               # All 9 PQC configs vs classical
```

---

## Supported Ciphers

| Cipher | Frontier | Notes |
|--------|----------|-------|
| ChaCha20 | R3 | Diagonal quarter-rounds prevent earlier detection |
| Salsa20 | R4 | Amplified 3-pass inference |
| AES-128 | R3 | SPN architecture |
| Speck 32/64 | R7 | Chosen-plaintext rotational differentials |
| Blowfish | R3 | Key-dependent S-boxes |
| 3DES | R2 | Fixed S-boxes |
| Camellia | R6 | FL function weakness at R7 |
| RC4 | drop=0 | KSA bias detection |

## 26 Detection Strategies

26 strategies in three groups (4 cryptanalytic, 14 deep, 8 implementation) cover the main attack families, including avalanche, differential, linear, rotational, boomerang, integral, division-property, and algebraic-degree analysis.
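The avalanche family is the easiest to illustrate: flip one input bit and count how many output bits change; a well-mixed primitive flips about half. A minimal sketch, assuming SHA-256 as a stand-in for a strong primitive and a toy XOR "cipher" as a broken one; `avalanche_fraction` and `weak_xor` are hypothetical and are not the library's strategy code.

```python
import hashlib
import os
from typing import Callable


def avalanche_fraction(fn: Callable[[bytes], bytes], msg: bytes, trials: int = 256) -> float:
    """Average fraction of output bits that flip when one input bit flips."""
    base = fn(msg)
    out_bits = len(base) * 8
    total = 0
    for t in range(trials):
        bit = t % (len(msg) * 8)
        flipped = bytearray(msg)
        flipped[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
        diff = int.from_bytes(base, "big") ^ int.from_bytes(fn(bytes(flipped)), "big")
        total += bin(diff).count("1")        # count changed output bits
    return total / (trials * out_bits)


def sha256(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()


def weak_xor(m: bytes) -> bytes:
    # Deliberately weak "cipher": XOR with a fixed key byte, no diffusion.
    return bytes(b ^ 0x5A for b in m)


msg = os.urandom(32)
strong = avalanche_fraction(sha256, msg)
weak = avalanche_fraction(weak_xor, msg)
print(f"SHA-256: {strong:.3f}")
print(f"XOR:     {weak:.5f}")
```

SHA-256 lands near 0.5, while the XOR toy flips exactly one output bit per input bit (1/256), a strong structural signal.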

## Architecture

```
live_casi/
├── core.py            — Engine, 26 strategies, CLI
├── academia.py        — AI text detection (5-layer PDF analysis)
├── fingerprint.py     — Author re-identification
├── text_detect.py     — Token-residuum patent method
├── tool_signatures.py — ChatGPT/DeepL/paraphraser detection
├── collapse.py        — Model collapse detection & monitoring
├── nn_analysis.py     — Weight/activation/gradient CASI
├── ciphers.py         — 8 cipher implementations
├── identify.py        — Blind cipher identification
├── scanner.py         — Binary/firmware scanner
├── network.py         — pcap/live network analysis
├── pqc.py             — Post-quantum cryptography
├── probe.py           — Channel probe generator/verifier
├── causal.py          — .causal knowledge graph generation
├── benchmark.py       — Frontier benchmarking
├── frontier.py        — Detection boundary search
└── nist_compare.py    — CASI vs NIST SP 800-22
```

## License

MIT
