Metadata-Version: 2.4
Name: makecontextsimple
Version: 0.1.1
Summary: Convert documents to semantic HTML optimized for LLM context - reduces token congestion
Project-URL: Homepage, https://github.com/makecontextsimple/makecontextsimple
Project-URL: Repository, https://github.com/makecontextsimple/makecontextsimple
Project-URL: Issues, https://github.com/makecontextsimple/makecontextsimple/issues
Project-URL: Documentation, https://github.com/makecontextsimple/makecontextsimple#readme
Author: MakeContextSimple Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: ai,context,converter,documents,html,llm,rag,tokens
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: requests>=2.31.0
Provides-Extra: all
Requires-Dist: openpyxl>=3.1.0; extra == 'all'
Requires-Dist: pdfminer-six>=20221105; extra == 'all'
Requires-Dist: pdfplumber>=0.10.0; extra == 'all'
Requires-Dist: pillow>=10.0.0; extra == 'all'
Requires-Dist: python-docx>=1.0.0; extra == 'all'
Requires-Dist: python-pptx>=0.6.21; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.0.0; extra == 'docx'
Provides-Extra: image
Requires-Dist: pillow>=10.0.0; extra == 'image'
Provides-Extra: pdf
Requires-Dist: pdfminer-six>=20221105; extra == 'pdf'
Requires-Dist: pdfplumber>=0.10.0; extra == 'pdf'
Provides-Extra: pptx
Requires-Dist: python-pptx>=0.6.21; extra == 'pptx'
Provides-Extra: xlsx
Requires-Dist: openpyxl>=3.1.0; extra == 'xlsx'
Description-Content-Type: text/markdown

# MakeContextSimple

Convert documents to semantic HTML optimized for LLM context consumption.

## Overview

MakeContextSimple is a Python utility that converts various document formats into clean, semantic HTML optimized for large language model (LLM) consumption. Unlike Markdown-based converters, MakeContextSimple produces HTML that is:

- **Token-efficient**: Less syntax overhead than Markdown for complex structures
- **Semantically rich**: HTML tags convey meaning without extra markers
- **Machine-parseable**: Standard HTML parsers work reliably
- **Browser-viewable**: Output can be directly viewed in any browser
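To illustrate the machine-parseable point, here is a minimal sketch that reads table cells back out of HTML using only Python's standard-library `html.parser`. `TableExtractor` and `extract_rows` are illustrative names, not part of MakeContextSimple, and the sample markup is hand-written rather than real converter output.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row (illustrative helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = (
    "<table>"
    "<tr><th>Name</th><th>Age</th></tr>"
    "<tr><td>Alice</td><td>30</td></tr>"
    "</table>"
)
print(extract_rows(sample))  # [['Name', 'Age'], ['Alice', '30']]
```

The same idea applies to headings, lists, or any other tag: a standard parser walks explicit start and end tags, so no format-specific regexes are needed.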

## Supported Formats

| Category | Formats |
|----------|---------|
| Documents | PDF, DOCX, Markdown |
| Office | PPTX, XLSX |
| Web | HTML, XML, RSS |
| Data | CSV, JSON |
| Text | Plain text, Code files, Config files |
| Images | JPG, PNG, GIF, WebP, BMP |

## Installation

### Basic Installation

```bash
pip install makecontextsimple
```

### With Optional Dependencies

```bash
# Quote the extras so shells such as zsh don't treat [...] as a glob pattern

# For PDF support
pip install "makecontextsimple[pdf]"

# For Office document support
pip install "makecontextsimple[docx,pptx,xlsx]"

# For image support
pip install "makecontextsimple[image]"

# For all formats
pip install "makecontextsimple[all]"
```
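If you are unsure which extras ended up in an environment, a quick standard-library check is to look for each extra's import name. This is a sketch, not a feature of the package; the extra-to-module mapping below is an assumption derived from the `Requires-Dist` lines (for example, `python-docx` imports as `docx`, `pdfminer-six` as `pdfminer`, and `pillow` as `PIL`).

```python
import importlib.util

# Assumed import names for each optional extra (from the Requires-Dist entries)
EXTRA_MODULES = {
    "pdf": ["pdfplumber", "pdfminer"],
    "docx": ["docx"],
    "pptx": ["pptx"],
    "xlsx": ["openpyxl"],
    "image": ["PIL"],
}


def installed_extras():
    """Return {extra: bool}; True only if every module for that extra is importable."""
    return {
        extra: all(importlib.util.find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


for extra, ok in installed_extras().items():
    print(f"{extra:6} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the modules without importing them, so the check is cheap and has no side effects.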

### From Source

```bash
git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[all]"
```

### Docker

```bash
# Build image
docker build -t makecontextsimple .

# Convert a file
docker run --rm -v "$(pwd)":/data makecontextsimple document.pdf -o /data/output.html

# LLM-optimized output
docker run --rm -v "$(pwd)":/data makecontextsimple document.pdf --llm -o /data/context.html
```

### Docker Compose

```bash
# Single file conversion
docker compose run convert

# LLM-optimized conversion
docker compose run convert-llm

# Batch convert all PDFs in input/ folder
docker compose run batch
```

## Usage

### Command Line

```bash
# Convert a file to HTML (output to stdout)
makecontextsimple document.pdf

# Convert with custom output file
makecontextsimple document.pdf -o output.html

# Generate minimal HTML for LLM context
makecontextsimple document.pdf --llm

# List supported formats
makecontextsimple --list-formats
```
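The CLI converts one file per call. To convert a whole folder without Docker Compose, a thin Python wrapper can loop over files and invoke the commands above. This sketch assumes the `makecontextsimple` executable is on `PATH`; `batch_commands` and `run_batch` are hypothetical helpers, not part of the package.

```python
import subprocess
from pathlib import Path


def batch_commands(input_dir, output_dir, llm=False, pattern="*.pdf"):
    """Build one makecontextsimple invocation per file matching `pattern`."""
    commands = []
    for src in sorted(Path(input_dir).glob(pattern)):
        dst = Path(output_dir) / f"{src.stem}.html"
        cmd = ["makecontextsimple", str(src), "-o", str(dst)]
        if llm:
            cmd.append("--llm")  # minimal HTML for LLM context
        commands.append(cmd)
    return commands


def run_batch(input_dir, output_dir, llm=False):
    """Create output_dir, then run each command; stop at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in batch_commands(input_dir, output_dir, llm=llm):
        subprocess.run(cmd, check=True)
```

For example, `run_batch("input", "output", llm=True)` would write `output/<name>.html` for every PDF in `input/`. Separating command construction from execution keeps the wrapper easy to dry-run.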

### Python API

```python
from makecontextsimple import MakeContextSimple

# Initialize converter
converter = MakeContextSimple()

# Convert a file
result = converter.convert("document.pdf")

# Get full HTML document
html = result.to_full_document()
print(html)

# Get minimal HTML for LLM context
llm_context = result.to_llm_context()

# Save directly to file
converter.convert_to_file("document.pdf", "output.html")

# Convert URL content
import requests
response = requests.get("https://example.com/page.html")
result = converter.convert(response)
```

### Custom Styles

```python
from makecontextsimple import MakeContextSimple

converter = MakeContextSimple()

# Use custom CSS in the full-document output
custom_css = """
body { font-family: Arial; max-width: 800px; margin: 0 auto; }
h1 { color: #333; }
"""
result = converter.convert("document.pdf")
html = result.to_full_document(styles=custom_css)
```
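To make the difference between the two outputs concrete, here is a sketch of what a result container could do with the `styles` argument. `SketchResult` is a hypothetical stand-in, not the real `HTMLResult`, and the exact markup MakeContextSimple emits may differ.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SketchResult:
    """Hypothetical stand-in for HTMLResult: a body fragment plus a title."""

    html: str
    title: str = ""

    def to_llm_context(self):
        # Minimal form: only the semantic body, no <head> and no CSS
        return self.html

    def to_full_document(self, styles=""):
        # Full form: wrap the body in a standalone page, inlining any CSS
        style_tag = f"<style>{styles}</style>" if styles else ""
        return (
            "<!DOCTYPE html>\n"
            '<html><head><meta charset="utf-8">'
            f"<title>{escape(self.title)}</title>{style_tag}</head>"
            f"<body>{self.html}</body></html>"
        )


result = SketchResult(html="<h1>Report</h1><p>Q3 results</p>", title="Q3 & Q4")
print(result.to_full_document(styles="h1 { color: #333; }"))
```

Keeping CSS out of the LLM-context form is the point of the split: styling helps a human reader in a browser but only costs tokens in a prompt.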

### Custom Converters

```python
from html import escape

from makecontextsimple import HTMLConverter, HTMLResult, MakeContextSimple

class MyCustomConverter(HTMLConverter):
    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        # Claim only files with the ".myformat" extension
        return extension == ".myformat"

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        content = file_stream.read().decode("utf-8")
        # Custom conversion logic; escape the text so <, > and & stay literal
        html = f"<pre>{escape(content)}</pre>"
        return HTMLResult(html=html, title="Custom Format")

# Register the custom converter
converter = MakeContextSimple()
converter.register_converter(MyCustomConverter(), priority=0)
```

## Architecture

MakeContextSimple follows a plugin-based converter architecture:

```
MakeContextSimple (orchestrator)
    ├── HTMLConverter (abstract base)
    │   ├── PDFConverter
    │   ├── DOCXConverter
    │   ├── PPTXConverter
    │   ├── XLSXConverter
    │   ├── ImageConverter
    │   ├── CSVConverter
    │   ├── JSONConverter
    │   ├── XMLConverter
    │   ├── HTMLConverter_Builtin
    │   ├── MarkdownConverter
    │   └── PlainTextConverter
    ├── HTMLBuilder (utilities)
    └── HTMLResult (output container)
```
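The tree suggests the usual plugin pattern: the orchestrator keeps a list of converters and hands each input to the first one whose `accepts()` returns true. A minimal sketch of that dispatch loop follows. Every class here is hypothetical, and both the default priority of 10 and the rule that lower `priority` values are tried first are assumptions, not documented behavior of `register_converter`.

```python
class SketchConverter:
    """Hypothetical base class mirroring the accepts()/convert() contract."""

    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        raise NotImplementedError

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        raise NotImplementedError


class CSVSketch(SketchConverter):
    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        return extension == ".csv"

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        return "<table>...</table>"  # placeholder output


class PlainTextSketch(SketchConverter):
    """Catch-all fallback, in the spirit of PlainTextConverter."""

    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        return True

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        return "<pre>...</pre>"  # placeholder output


class SketchOrchestrator:
    def __init__(self):
        self._registry = []  # list of (priority, converter) pairs

    def register_converter(self, converter, priority=10):
        # Assumption: lower priority values are tried first. sort() is stable,
        # so converters with equal priority keep their registration order.
        self._registry.append((priority, converter))
        self._registry.sort(key=lambda item: item[0])

    def convert(self, file_stream, extension=None):
        for _, converter in self._registry:
            if converter.accepts(file_stream, extension=extension):
                return converter.convert(file_stream, extension=extension)
        raise ValueError(f"No converter accepts extension {extension!r}")


orchestrator = SketchOrchestrator()
orchestrator.register_converter(PlainTextSketch(), priority=100)
orchestrator.register_converter(CSVSketch(), priority=0)
print(orchestrator.convert(None, extension=".csv"))  # <table>...</table>
print(orchestrator.convert(None, extension=".log"))  # <pre>...</pre>
```

Putting a catch-all converter last means unknown formats degrade to plain text instead of failing, while specific converters still win for the formats they recognize.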

### Key Components

- **MakeContextSimple**: Main orchestrator that manages converters and I/O
- **HTMLConverter**: Abstract base class for all format converters
- **HTMLBuilder**: Utility class for constructing semantic HTML
- **HTMLResult**: Container for conversion output with metadata

## Why HTML Over Markdown?

| Aspect | Markdown | HTML |
|--------|----------|------|
| Token Efficiency | Good | Better (15-20% fewer) |
| Table Syntax | `\|---\|` separators | `<table>` tags |
| Semantic Meaning | Relies on conventions | Explicit tags |
| Parsing | Regex/string ops | Standard parsers |
| Preview | Needs rendering | Native browser |

### Token Comparison Example

**Markdown (180 tokens):**
```markdown
| Name  | Age | City     |
|-------|-----|----------|
| Alice | 30  | New York |
```

**HTML (150 tokens):**
```html
<table>
<tr><td>Name</td><td>Age</td><td>City</td></tr>
<tr><td>Alice</td><td>30</td><td>New York</td></tr>
</table>
```
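Token counts depend on the tokenizer, so figures like these are best checked against the model you actually use. This sketch renders the same rows both ways so the two strings can be fed to your tokenizer of choice; the helper names are illustrative and the output is not byte-for-byte what MakeContextSimple produces.

```python
from html import escape


def to_markdown_table(rows):
    """Render rows as a pipe table; the first row is the header."""
    cells = [[c.replace("|", r"\|") for c in r] for r in rows]  # escape literal pipes
    header, *body = cells
    lines = ["| " + " | ".join(header) + " |",
             "|" + "|".join("---" for _ in header) + "|"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def to_html_table(rows):
    """Render rows as a compact <table> with escaped cell text."""
    trs = "".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in r) + "</tr>" for r in rows
    )
    return f"<table>{trs}</table>"


rows = [["Name", "Age", "City"], ["Alice", "30", "New York"]]
md, html_table = to_markdown_table(rows), to_html_table(rows)
print(md)
print(html_table)
# Character counts are only a rough proxy; measure tokens with your model's tokenizer
print(len(md), len(html_table))
```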

## Plugin System

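MakeContextSimple supports third-party plugins via Python entry points. Declare an entry point in your plugin's `pyproject.toml` that names a `register` function; that function receives the converter instance:

```python
# In your plugin's pyproject.toml (TOML, shown here as comments):
#   [project.entry-points."makecontextsimple.plugin"]
#   my_plugin = "my_package:register"

# In my_package/__init__.py:
from my_package.converters import MyConverter  # hypothetical module holding your HTMLConverter subclass

def register(converter_instance):
    converter_instance.register_converter(MyConverter(), priority=5)
```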

## Development

### Setup

```bash
git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest tests/
```

### Code Style

```bash
ruff check src/
ruff format src/
```

### Docker Development

```bash
# Build development image
docker build -t makecontextsimple:dev .

# Run tests in container
docker run --rm makecontextsimple:dev python -m pytest tests/

# Interactive shell
docker run --rm -it makecontextsimple:dev /bin/bash
```

### CI/CD

This project uses GitHub Actions for:

- **CI** (`.github/workflows/ci.yml`): Runs tests on push/PR
- **Publish** (`.github/workflows/publish.yml`): Publishes to PyPI and Docker Hub on release

#### Required Secrets

For publishing, add these repository secrets under **Settings > Secrets and variables > Actions**:

| Secret | Description |
|--------|-------------|
| `PYPI_API_TOKEN` | PyPI API token |
| `DOCKERHUB_USERNAME` | Docker Hub username |
| `DOCKERHUB_TOKEN` | Docker Hub access token |

### Publishing

#### Manual Publishing

```bash
# Build distribution
python -m build

# Check distribution
twine check dist/*

# Upload to PyPI
twine upload dist/*
```

#### Automated Publishing

Create a GitHub release to automatically publish to PyPI and Docker Hub.

```bash
# Create tag
git tag -a v0.1.0 -m "Release 0.1.0"
git push origin v0.1.0

# Create release on GitHub or use:
gh release create v0.1.0
```

## License

MIT License

## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
