Metadata-Version: 2.4
Name: wfw-ai
Version: 1.0.0
Summary: WANI — Wave-Attractor Neural Inference Framework
Author: Zain Ali
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: llama-cpp-python>=0.2.56

# WANI — Wave-Attractor Neural Inference
**Powered by Zain Ali**

Run any GGUF model locally. Drop model in folder, run one command.

---

## Folder Structure

```
WANI/
├── wani.py          ← Main tool (run this)
├── requirements.txt
├── models/          ← Put your .gguf files HERE
│   └── (empty)
├── logs/
│   └── wani.log     ← Auto-created
└── core/
    ├── config.py
    ├── scanner.py
    ├── loader.py
    └── __init__.py
```

---

## Setup (3 steps)

**Step 1 — Install Python 3.8+**
```
python --version
```

**Step 2 — Install backend**
```bash
# CPU only (works on all systems including ARM/Android)
pip install llama-cpp-python

# OR with GPU support (Vulkan — works on Adreno 610/650)
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python --force-reinstall
```

**Step 3 — Add your model**
```
Download any GGUF model from:
  https://huggingface.co/models?search=gguf

Put the .gguf file inside:  WANI/models/

Example models to try:
  • Llama-3.2-1B-Instruct-Q4_K_M.gguf    (~700 MB)
  • Llama-3.2-3B-Instruct-Q4_K_M.gguf    (~1.8 GB)
  • Mistral-7B-Instruct-v0.3.Q4_K_M.gguf (~4.1 GB)
  • Phi-3-mini-4k-instruct-Q4_K_M.gguf   (~2.2 GB)
```

---

## Usage

```bash
# Interactive mode (auto-scan + pick model)
python wani.py

# List all models
python wani.py --list

# Load specific model, start chat
python wani.py --model Llama-3.2-1B-Instruct-Q4_K_M.gguf --chat

# Single prompt
python wani.py --model yourmodel.gguf --prompt "Pakistan ki capital kya hai?"

# With custom settings
python wani.py --model yourmodel.gguf --threads 4 --ctx 4096 --temp 0.8
```

---

## Options

| Flag | Default | Description |
|------|---------|-------------|
| `--model FILE` | auto-pick | Model filename from models/ |
| `--prompt TEXT` | — | Single prompt, then exit |
| `--chat` | — | Force chat mode |
| `--ctx N` | 2048 | Context window (tokens) |
| `--threads N` | 4 | CPU threads |
| `--gpu N` | 0 | GPU offload layers (0=CPU) |
| `--temp F` | 0.7 | Temperature (creativity) |
| `--max N` | 512 | Max tokens to generate |
| `--list` | — | List models and exit |
| `--install` | — | Show install instructions |

---

## Recommended Models by RAM

| RAM | Model | Download |
|-----|-------|----------|
| 1 GB | Llama-3.2-1B Q4 | HuggingFace |
| 2 GB | Phi-3-mini Q4 | HuggingFace |
| 3 GB | Llama-3.2-3B Q4 | HuggingFace |
| 4 GB | Mistral-7B Q4 | HuggingFace |
| 8 GB | LLaMA-13B Q4 | HuggingFace |

---

## Snapdragon 685 Tips

```bash
# Use 4 big cores only
python wani.py --threads 4

# Small context saves RAM
python wani.py --ctx 1024

# 1B or 3B models run best
# Use Q4_K_M quantization (best quality/size ratio)
```

---

## Chat Commands

While in chat mode:
- `stats` — show session stats (TPS, tokens generated)
- `clear` — reset conversation history
- `help` — show commands
- `quit` — exit

---

*WANI Framework v1.0.0 — Powered by Zain Ali*

