Metadata-Version: 2.4
Name: texthumanizer
Version: 1.1.0
Summary: Offline AI-text humanizer that preserves images, tables, and research context.
Author-email: Rejaul Karim <reja86305@gmail.com>
License: MIT
Project-URL: Homepage, https://pypi.org/project/texthumanizer/
Project-URL: Issues, https://github.com/reja273/texthumanizer/issues
Keywords: humanizer,paraphrase,plagiarism,AI text,NLP,research,students,text processing
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: tqdm>=4.64
Provides-Extra: ml
Requires-Dist: transformers>=4.36; extra == "ml"
Requires-Dist: torch>=2.0; extra == "ml"
Requires-Dist: sentencepiece>=0.1.99; extra == "ml"
Provides-Extra: docx
Requires-Dist: python-docx>=1.1; extra == "docx"
Provides-Extra: all
Requires-Dist: texthumanizer[docx,ml]; extra == "all"

# texthumanizer 📝

**Offline AI-text humanizer & plagiarism reducer** for students and researchers.
No internet needed after model download. Preserves research context, citations, and semantic meaning.

---

## ✨ Features

| Feature | Detail |
|---|---|
| **Humanize AI text** | Rewrites ChatGPT / Claude / Gemini output to sound natural |
| **Plagiarism reduction** | Paraphrase-based, not just synonym swap |
| **Semantic preservation** | Meaning, tone, and argument structure kept intact |
| **Research-aware** | Citations `[1]`, abbreviations `DNA`, units `95%`, `et al.` — all preserved |
| **DOCX support** | Paragraph-level processing; headings untouched |
| **100% offline** | T5-based model runs locally after first download (~250 MB) |
| **Lightweight** | CPU-friendly, no GPU required |

---

## 📦 Installation

### Full install (recommended)
```bash
pip install "texthumanizer[all]"
```

### Minimal (ML only, no DOCX)
```bash
pip install "texthumanizer[ml]"
```

### ML + DOCX support
```bash
pip install "texthumanizer[ml,docx]"
```

> **First run** downloads the T5 model (~250 MB) from HuggingFace once and caches it locally.

---

## 🚀 Quick Start

### 1. Humanize pasted text

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()

ai_text = """
Artificial intelligence has rapidly transformed numerous sectors of society,
demonstrating unprecedented capabilities in natural language processing,
computer vision, and decision-making systems.
"""

result = th.humanize_text(ai_text)
print(result)
```

### 2a. Humanize a .docx → save new .docx

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()

# Saves "humanized_my_essay.docx" next to the original
output_path = th.humanize_doc("my_essay.docx", output="doc")
print(f"Saved: {output_path}")

# Custom output path
th.humanize_doc("my_essay.docx", output="doc", output_path="D:/final_essay.docx")
```

### 2b. Humanize a .docx → get plain text back

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()
text = th.humanize_doc("my_essay.docx", output="text")
print(text)
```

---

## ⚙️ Configuration

```python
th = TextHumanizer(
    diversity=0.7,    # 0.0 = minimal changes, 1.0 = maximum rewriting (default: 0.7)
    device=-1,        # -1 = CPU, 0 = GPU (default: -1)
    verbose=True,     # Show progress (default: True)
)
```

| Parameter | Range | Effect |
|---|---|---|
| `diversity=0.3` | Low | Light rewording, very safe for technical papers |
| `diversity=0.7` | Medium | Balanced — good for essays and reports ✅ |
| `diversity=0.9` | High | Heavy rewriting — good for blog posts or general text |

---

## 🖥️ CLI Usage

### Interactive mode
```bash
python -m texthumanizer.cli
```

### Direct text
```bash
python -m texthumanizer.cli text "Your AI-generated text here" --diversity 0.7
```

### Pipe from file
```bash
cat essay.txt | python -m texthumanizer.cli text
```

### DOCX → humanized DOCX
```bash
python -m texthumanizer.cli doc essay.docx --output doc
```

### DOCX → print text
```bash
python -m texthumanizer.cli doc essay.docx --output text
```

---

## 🔬 How It Works

```
Input Text
    │
    ▼
[Mask technical terms]         ← citations, abbreviations, units, years
    │
    ▼
[Split into sentences]         ← smart splitter (handles abbreviations)
    │
    ▼
[T5 Paraphrasing model]        ← humarin/chatgpt_paraphraser_on_T5_base
    │                            temperature + top-k + top-p sampling
    ▼
[Restore masked terms]         ← [1], DNA, 2023 put back exactly
    │
    ▼
Output Text
```
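
The mask → paraphrase → restore flow above can be sketched with plain regular expressions. This is a simplified illustration, not the library's actual code: the pattern set covers only a subset of protected terms, and the helper names (`mask`, `restore`, `PROTECT`) are hypothetical.

```python
import re

# Illustrative subset of protected patterns: [1,2]-style citations,
# four-digit years, percentages, and all-caps abbreviations.
PROTECT = re.compile(r"\[\d+(?:,\s*\d+)*\]|\b\d{4}\b|\b\d+(?:\.\d+)?%|\b[A-Z]{2,}\b")

def mask(text):
    """Swap protected spans for placeholders; return masked text and a lookup table."""
    table = {}
    def repl(m):
        key = f"__T{len(table)}__"
        table[key] = m.group(0)
        return key
    return PROTECT.sub(repl, text), table

def restore(text, table):
    """Put the original spans back verbatim after paraphrasing."""
    for key, value in table.items():
        text = text.replace(key, value)
    return text

masked, table = mask("DNA methods reached 95% accuracy in 2023 [1,2].")
# The paraphraser sees only placeholders, so it cannot alter DNA / 95% / 2023 / [1,2].
roundtrip = restore(masked, table)
```

Because the model never sees the protected spans, they come back character-for-character identical no matter how aggressively the sentence around them is rewritten.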

**Why T5 and not GPT-style?**
T5 is an encoder–decoder model trained specifically on paraphrase tasks. It is:
- Much smaller (~250 MB vs multi-GB GPT models)
- CPU-friendly and fast
- Better at preserving meaning than decoder-only models

---

## 📋 What Gets Preserved

| Type | Example | Preserved? |
|---|---|---|
| Academic citations | `[1]`, `[1,2,3]` | ✅ |
| Author citations | `(Smith et al., 2021)` | ✅ |
| Abbreviations | `DNA`, `AI`, `COVID`, `LSTM` | ✅ |
| Years | `2023`, `1990` | ✅ |
| Percentages | `95%`, `3.5%` | ✅ |
| Scientific units | `kg`, `MHz`, `nm`, `kcal` | ✅ |
| Figure/Table refs | `Fig. 3`, `Table 1` | ✅ |
| DOIs / URLs | `doi:10.xxx`, `https://...` | ✅ |
| Latin abbreviations | `et al.`, `e.g.`, `i.e.` | ✅ |
| Headings (in .docx) | Section titles | ✅ untouched |

---

## 🧪 Example Output

**Input (AI-generated):**
> The utilization of machine learning algorithms has demonstrated significant efficacy in the domain of medical diagnosis, achieving accuracy rates exceeding 95% in several clinical trials [1,2].

**Output (humanized):**
> Using machine learning methods has shown strong results in medical diagnosis, reaching accuracy levels above 95% in a number of clinical studies [1,2].

---

## 💡 Tips for Best Results

- **Research papers**: Use `diversity=0.4` – `0.6` to keep technical accuracy
- **Essays / assignments**: Use `diversity=0.7` (default)
- **Blog posts / creative writing**: Use `diversity=0.8` – `0.9`
- Process section-by-section for long papers for best control
- GPU users: set `device=0` for ~5× speed improvement
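
For the section-by-section tip, a minimal driver could look like the sketch below. The blank-line splitting heuristic and the `humanize` callable are assumptions for illustration; in practice you would pass `TextHumanizer().humanize_text`.

```python
def humanize_by_section(text, humanize):
    """Split on blank lines and process each section independently,
    so a long paper never reaches the model as one giant input."""
    sections = [s for s in text.split("\n\n") if s.strip()]
    return "\n\n".join(humanize(s) for s in sections)

doc = "Section one.\n\nSection two."
combined = humanize_by_section(doc, str.upper)  # str.upper stands in for the paraphraser
```

Processing per section also lets you mix diversity levels, e.g. a lower setting for the methods section and a higher one for the discussion.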

---

## 📑 DOCX In-place Replacement
Unlike humanizers that strip away formatting, **texthumanizer** uses an in-place, run-level replacement strategy:
1. It creates a temporary copy of your `.docx`.
2. It identifies text-bearing "runs" within each paragraph.
3. It humanizes the text while skipping runs that contain images or drawings.
4. It injects the new text back into the original XML structure, keeping your layout 100% intact.
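
The skip logic in steps 2–3 can be sketched as follows. This is a simplified model that uses `(text, has_drawing)` tuples in place of real python-docx run objects; `rewrite_runs` is a hypothetical name, and `str.upper` stands in for the paraphraser.

```python
def rewrite_runs(runs, humanize):
    """Replace text run-by-run, leaving image-bearing and empty runs untouched.

    `runs` is a list of (text, has_drawing) pairs standing in for docx runs.
    """
    out = []
    for text, has_drawing in runs:
        if has_drawing or not text.strip():
            out.append((text, has_drawing))  # preserve images and whitespace exactly
        else:
            out.append((humanize(text), has_drawing))
    return out

runs = [("Intro text", False), ("", True), ("More text", False)]
result = rewrite_runs(runs, str.upper)
```

Because only the text of eligible runs is replaced, run-level properties such as bold, font, and embedded drawings survive untouched.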

## 📄 License

MIT License — free for personal and academic use.

---

## ⚠️ Disclaimer & Ethics
This tool is designed to assist researchers in improving the readability of their own writing. It is NOT intended for academic dishonesty or bypassing plagiarism checks for unoriginal work. Use responsibly and always cite your AI assistance if required by your institution.
