Metadata-Version: 2.4
Name: text-summarizer-gi
Version: 1.0.0
Summary: Lightweight text summarizer with intelligent chunking and token counting for multiple LLM providers
Author-email: Dhivya <dhivyashankar27@example.com>
License: MIT
Keywords: summarizer,text,llm,nlp,tokens,chunking
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.0.0
Requires-Dist: tiktoken
Dynamic: license-file

# Text Summarizer - Minimal & Focused

A simple, production-ready text summarization library.

**Input:** Long text  
**Output:** Concise summary

## Quick Start

```python
from text_summarizer_gi import TextSummarizer

summarizer = TextSummarizer()
result = summarizer.summarize("Your long text here...")
print(result.summary)
```

## Setup

```bash
# 1. Install dependencies
pip install openai

# 2. Set Azure credentials
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
```

## Features

- ✓ Simple, focused API
- ✓ Intelligent text chunking (sentence-based)
- ✓ Token counting
- ✓ Multiple summary types: short, medium, detailed
- ✓ Multiple tones: neutral, formal, casual
- ✓ Compression tracking
- ✓ Error handling with fallbacks

## Basic Usage

### Simple Summarization
```python
from text_summarizer_gi import TextSummarizer

summarizer = TextSummarizer()

text = "Your long text..."

# Short summary (25% of original)
result = summarizer.summarize(text, summary_type="short")
print(result.summary)

# Medium summary (40% of original)
result = summarizer.summarize(text, summary_type="medium")
print(result.summary)

# Detailed summary (60% of original)
result = summarizer.summarize(text, summary_type="detailed")
print(result.summary)
```

### With Different Tones
```python
# Formal tone
result = summarizer.summarize(text, tone="formal")

# Casual tone
result = summarizer.summarize(text, tone="casual")

# Neutral tone (default)
result = summarizer.summarize(text, tone="neutral")
```

### Token Counting
```python
from text_summarizer_gi import count_tokens

tokens = count_tokens("Your text...")
print(f"Token count: {tokens}")
```

### Text Chunking
```python
from text_summarizer_gi import chunk_text, chunk_text_by_sentences

# Character-based chunking
chunks = chunk_text("Your text...", chunk_size=3000)

# Sentence-based chunking (better for summarization)
chunks = chunk_text_by_sentences("Your text...")
```

## Result Object

```python
result = summarizer.summarize(text)

result.summary              # The summarized text
result.input_tokens         # Input token count
result.output_tokens        # Output token count
result.compression_ratio    # Output reduction percentage
```

## Testing

```bash
# Run basic test
python test_basic.py

# Check imports
python -c "from text_summarizer_gi import TextSummarizer; print('✓ OK')"
```

## Project Structure

```
text_summarizer_gi/
├── __init__.py           # Exports
├── summarizer.py         # Main TextSummarizer class
├── prompts.py            # Summarization prompts
├── chunking.py           # Text chunking utilities
├── token_counter.py      # Token counting
└── utils.py              # Helper functions

test_basic.py            # Basic test
pyproject.toml           # Package config
README.md                # This file
LICENSE                  # MIT License
```

## How It Works

1. **Input Validation** - Check text is not empty
2. **Token Counting** - Count tokens in input
3. **Chunking** - Split large texts into manageable chunks (sentence-based)
4. **Summarization** - Send each chunk to Azure OpenAI with clear instructions
5. **Combination** - If multiple chunks, combine and re-summarize
6. **Output** - Return summary with compression statistics

## Key Features

### Smart Chunking
- Sentence-based chunking preserves context better than character-based
- Each chunk is processed independently for better quality
- Multiple summaries are combined and re-summarized

### Clear Prompts
- Explicit instruction to create REAL summaries, not copies
- Target length guidance (short/medium/detailed)
- Tone customization
- Lower temperature (0.5) for more focused output

### Compression Tracking
- Input and output token counts
- Compression ratio shows effectiveness
- Helps optimize summary type selection

## Error Handling

- Empty responses → fallback to first sentences
- API errors → logged with fallback
- Invalid input → clear error messages

## Dependencies

- `openai>=1.0.0` - For Azure OpenAI API

## License

MIT License - See LICENSE file

## Version

1.0.0 - Clean, focused implementation
