Metadata-Version: 2.4
Name: makecontextsimple
Version: 0.1.0
Summary: Convert documents to semantic HTML optimized for LLM context - reduces token congestion
Project-URL: Homepage, https://github.com/makecontextsimple/makecontextsimple
Project-URL: Repository, https://github.com/makecontextsimple/makecontextsimple
Project-URL: Issues, https://github.com/makecontextsimple/makecontextsimple/issues
Project-URL: Documentation, https://github.com/makecontextsimple/makecontextsimple#readme
Author: MakeContextSimple Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: ai,context,converter,documents,html,llm,rag,tokens
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: requests>=2.31.0
Provides-Extra: all
Requires-Dist: openpyxl>=3.1.0; extra == 'all'
Requires-Dist: pdfminer-six>=20221105; extra == 'all'
Requires-Dist: pdfplumber>=0.10.0; extra == 'all'
Requires-Dist: pillow>=10.0.0; extra == 'all'
Requires-Dist: python-docx>=1.0.0; extra == 'all'
Requires-Dist: python-pptx>=0.6.21; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.0.0; extra == 'docx'
Provides-Extra: image
Requires-Dist: pillow>=10.0.0; extra == 'image'
Provides-Extra: pdf
Requires-Dist: pdfminer-six>=20221105; extra == 'pdf'
Requires-Dist: pdfplumber>=0.10.0; extra == 'pdf'
Provides-Extra: pptx
Requires-Dist: python-pptx>=0.6.21; extra == 'pptx'
Provides-Extra: xlsx
Requires-Dist: openpyxl>=3.1.0; extra == 'xlsx'
Description-Content-Type: text/markdown

# MakeContextSimple

Convert documents to semantic HTML optimized for LLM context consumption.

## Overview

MakeContextSimple is a Python utility that converts various document formats into clean, semantic HTML optimized for large language model (LLM) consumption. Unlike Markdown-based converters, MakeContextSimple produces HTML that is:

- **Token-efficient**: Less syntax overhead than Markdown for complex structures
- **Semantically rich**: HTML tags convey meaning without extra markers
- **Machine-parseable**: Standard HTML parsers work reliably
- **Browser-viewable**: Output can be directly viewed in any browser
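
To illustrate the "machine-parseable" point, here is a minimal sketch that reads table rows back out of HTML using only the standard library's `html.parser`. The `TableExtractor` class and `extract_rows` helper are hypothetical names for this example and are not part of MakeContextSimple.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Append text to the most recently opened cell.
        if self._in_cell and self._row:
            self._row[-1] += data


def extract_rows(html_text):
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


sample = (
    "<table>"
    "<tr><td>Name</td><td>Age</td><td>City</td></tr>"
    "<tr><td>Alice</td><td>30</td><td>New York</td></tr>"
    "</table>"
)
print(extract_rows(sample))
```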

## Supported Formats

| Category | Formats |
|----------|---------|
| Documents | PDF, DOCX, Markdown |
| Office | PPTX, XLSX |
| Web | HTML, XML, RSS |
| Data | CSV, JSON |
| Text | Plain text, Code files, Config files |
| Images | JPG, PNG, GIF, WebP, BMP |

## Installation

### Basic Installation

```bash
pip install makecontextsimple
```

### With Optional Dependencies

```bash
# For PDF support
pip install "makecontextsimple[pdf]"

# For Office document support
pip install "makecontextsimple[docx,pptx,xlsx]"

# For image support
pip install "makecontextsimple[image]"

# For all formats
pip install "makecontextsimple[all]"
```

### From Source

```bash
git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[all]"
```

## Usage

### Command Line

```bash
# Convert a file to HTML (output to stdout)
makecontextsimple document.pdf

# Convert with custom output file
makecontextsimple document.pdf -o output.html

# Generate minimal HTML for LLM context
makecontextsimple document.pdf --llm

# List supported formats
makecontextsimple --list-formats
```

### Python API

```python
from makecontextsimple import MakeContextSimple

# Initialize converter
converter = MakeContextSimple()

# Convert a file
result = converter.convert("document.pdf")

# Get full HTML document
html = result.to_full_document()
print(html)

# Get minimal HTML for LLM context
llm_context = result.to_llm_context()

# Save directly to file
converter.convert_to_file("document.pdf", "output.html")

# Convert URL content
import requests
response = requests.get("https://example.com/page.html")
result = converter.convert(response)
```
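
For batch jobs, the same two calls can be wrapped in a loop. The sketch below assumes only what the example above shows (`convert()` accepts a path and the result exposes `to_llm_context()`), plus the assumption that `to_llm_context()` returns a string. `convert_directory` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def convert_directory(converter, source_dir, target_dir, pattern="*.pdf"):
    """Convert every matching file and write its LLM-context HTML to target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(source_dir.glob(pattern)):
        result = converter.convert(str(path))
        out_path = target_dir / (path.stem + ".html")
        # Assumption: to_llm_context() returns the HTML as a str.
        out_path.write_text(result.to_llm_context(), encoding="utf-8")
        written.append(out_path)
    return written
```

With the package installed, this would be called as `convert_directory(MakeContextSimple(), "docs", "out")`.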

### Custom Styles

```python
# Use custom CSS
custom_css = """
body { font-family: Arial; max-width: 800px; margin: 0 auto; }
h1 { color: #333; }
"""
result = converter.convert("document.pdf")
html = result.to_full_document(styles=custom_css)
```

### Custom Converters

```python
from html import escape

from makecontextsimple import HTMLConverter, HTMLResult, MakeContextSimple

class MyCustomConverter(HTMLConverter):
    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        return extension == ".myformat"

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        content = file_stream.read().decode("utf-8")
        # Custom conversion logic; escape the text so "<" and "&" stay literal
        html = f"<pre>{escape(content)}</pre>"
        return HTMLResult(html=html, title="Custom Format")

# Register custom converter
converter = MakeContextSimple()
converter.register_converter(MyCustomConverter(), priority=0)
```

## Architecture

MakeContextSimple follows a plugin-based converter architecture:

```
MakeContextSimple (orchestrator)
    ├── HTMLConverter (abstract base)
    │   ├── PDFConverter
    │   ├── DOCXConverter
    │   ├── PPTXConverter
    │   ├── XLSXConverter
    │   ├── ImageConverter
    │   ├── CSVConverter
    │   ├── JSONConverter
    │   ├── XMLConverter
    │   ├── HTMLConverter_Builtin
    │   ├── MarkdownConverter
    │   └── PlainTextConverter
    ├── HTMLBuilder (utilities)
    └── HTMLResult (output container)
```

### Key Components

- **MakeContextSimple**: Main orchestrator that manages converters and I/O
- **HTMLConverter**: Abstract base class for all format converters
- **HTMLBuilder**: Utility class for constructing semantic HTML
- **HTMLResult**: Container for conversion output with metadata
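
To make the division of labor concrete, here is a self-contained sketch of how an orchestrator of this kind could dispatch to converters. It is not the library's implementation: all `Sketch*` names are stand-ins, and the rule that lower `priority` values are tried first is an assumption made for the example.

```python
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html import escape


@dataclass
class SketchResult:
    """Stand-in for an output container: HTML plus a little metadata."""
    html: str
    title: str = ""


class SketchConverter(ABC):
    """Stand-in for the abstract base: each converter answers two questions."""

    @abstractmethod
    def accepts(self, file_stream, extension=None):
        ...

    @abstractmethod
    def convert(self, file_stream, extension=None):
        ...


class PlainTextSketch(SketchConverter):
    def accepts(self, file_stream, extension=None):
        return extension == ".txt"

    def convert(self, file_stream, extension=None):
        text = file_stream.read().decode("utf-8")
        return SketchResult(html=f"<pre>{escape(text)}</pre>", title="Plain text")


class SketchOrchestrator:
    def __init__(self):
        self._converters = []  # list of (priority, converter) pairs

    def register_converter(self, converter, priority=10):
        self._converters.append((priority, converter))
        # Assumption: lower priority values are tried first.
        # The sort is stable, so equal priorities keep registration order.
        self._converters.sort(key=lambda item: item[0])

    def convert(self, file_stream, extension=None):
        for _, converter in self._converters:
            if converter.accepts(file_stream, extension=extension):
                return converter.convert(file_stream, extension=extension)
        raise ValueError(f"no converter accepts extension {extension!r}")


orchestrator = SketchOrchestrator()
orchestrator.register_converter(PlainTextSketch(), priority=0)
result = orchestrator.convert(io.BytesIO(b"a < b"), extension=".txt")
print(result.html)  # <pre>a &lt; b</pre>
```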

## Why HTML Over Markdown?

| Aspect | Markdown | HTML |
|--------|----------|------|
| Token Efficiency | Good | Better (15-20% fewer tokens) |
| Table Syntax | `\|---\|` separators | `<table>` tags |
| Semantic Meaning | Relies on conventions | Explicit tags |
| Parsing | Regex/string ops | Standard parsers |
| Preview | Needs rendering | Native browser |

### Token Comparison Example

**Markdown (180 tokens):**
```markdown
| Name  | Age | City     |
|-------|-----|----------|
| Alice | 30  | New York |
```

**HTML (150 tokens):**
```html
<table>
<tr><td>Name</td><td>Age</td><td>City</td></tr>
<tr><td>Alice</td><td>30</td><td>New York</td></tr>
</table>
```
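
Compact table markup of this shape is straightforward to generate from row data. The following is a minimal sketch using only the standard library. `rows_to_table` is a hypothetical helper and does not reflect how `HTMLBuilder` is implemented.

```python
from html import escape


def rows_to_table(rows):
    """Render a list of rows (each a list of cell values) as a compact <table>."""
    lines = ["<table>"]
    for row in rows:
        # Escape each cell so characters like "<" and "&" stay literal text.
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(rows_to_table([["Name", "Age", "City"], ["Alice", 30, "New York"]]))
```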

## Plugin System

MakeContextSimple supports third-party plugins via Python entry points. Declare the entry point in your plugin's `pyproject.toml`:

```toml
[project.entry-points."makecontextsimple.plugin"]
my_plugin = "my_package:register"
```

Then define the registration function in your plugin package:

```python
# MyConverter is your own HTMLConverter subclass
def register(converter_instance):
    converter_instance.register_converter(MyConverter(), priority=5)
```
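
On the host side, entry points in this group can be discovered with the standard library's `importlib.metadata`. The sketch below shows one way such discovery might look. `load_plugins` and its `candidates` parameter are hypothetical and are not taken from the MakeContextSimple source.

```python
from importlib.metadata import entry_points

PLUGIN_GROUP = "makecontextsimple.plugin"


def load_plugins(converter_instance, candidates=None):
    """Call each plugin's register function; return the loaded plugin names."""
    if candidates is None:
        # The group= keyword is available on Python 3.10 and later.
        candidates = entry_points(group=PLUGIN_GROUP)
    loaded = []
    for entry_point in candidates:
        register = entry_point.load()  # resolves "my_package:register"
        register(converter_instance)
        loaded.append(entry_point.name)
    return loaded
```

The `candidates` argument exists only so the function can be exercised without installing a real plugin.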

## Development

### Setup

```bash
git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest tests/
```

### Code Style

```bash
ruff check src/
ruff format src/
```

## License

Released under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
