Metadata-Version: 2.3
Name: anyparser-llamaindex
Version: 0.0.2
Summary: LlamaIndex integration for Anyparser
License: Apache-2.0
Requires-Python: >=3.9
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
Requires-Dist: anyparser-core
Requires-Dist: llama-index-core (>=0.10.0,<0.11.0)
Project-URL: Homepage, https://github.com/anyparser/anyparser_llamaindex
Description-Content-Type: text/markdown

# Anyparser LlamaIndex: Seamless Integration of Anyparser with LlamaIndex

https://anyparser.com

**Use Anyparser's content extraction capabilities from within LlamaIndex.** This integration package makes Anyparser's document processing and data extraction features available inside your LlamaIndex applications, so you can use them when building AI pipelines.

## Installation

```bash
pip install anyparser-llamaindex
```

## Setup

Before running the examples, set your Anyparser API credentials as environment variables:

```bash
export ANYPARSER_API_KEY="your-api-key"
export ANYPARSER_API_URL="https://anyparserapi.com"
```
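
If you want a script to fail early when the credentials are missing, a small check like the following can help. This is a minimal sketch: the helper name `read_anyparser_env` is hypothetical and not part of this package; only the two variable names come from the setup step above.

```python
import os
from typing import Mapping

# Variable names taken from the setup step above.
REQUIRED_VARS = ("ANYPARSER_API_KEY", "ANYPARSER_API_URL")


def read_anyparser_env(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: return the required settings or raise if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_anyparser_env()` at the top of a script surfaces a missing variable before any document is processed.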

## Anyparser LlamaIndex Examples

The `examples` directory contains scripts that demonstrate different ways to use the Anyparser LlamaIndex integration. Run them from the directory that contains `examples`:

```bash
python examples/01_basic_usage.py
python examples/02_single_file_json.py
python examples/03_single_file_markdown.py
python examples/04_multiple_files_json.py
python examples/05_multiple_files_markdown.py
python examples/06_load_folder.py
python examples/07_ocr_markdown.py
python examples/08_ocr_json.py
python examples/09_web_crawler.py
```

## Features Demonstrated

### Document Processing
- Different output formats (markdown, JSON)
- Multiple file handling
- Folder processing
- Metadata handling

### Web Crawling
- Basic crawling with depth and scope control
- Advanced URL and content filtering
- Crawling strategies (BFS, LIFO)
- Rate limiting and robots.txt respect
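
To make the difference between the two crawling strategies concrete, here is a generic sketch of how BFS (oldest link first) and LIFO (newest link first) order visits under a depth limit. It illustrates the general technique only and is not Anyparser's crawler; the function `crawl_order` and the in-memory link graph are assumptions made for the example.

```python
from collections import deque


def crawl_order(links, start, strategy="BFS", max_depth=2):
    """Return the visit order over a link graph given as {url: [linked urls]}."""
    if strategy not in ("BFS", "LIFO"):
        raise ValueError(f"unknown strategy: {strategy}")
    frontier = deque([(start, 0)])
    seen = {start}
    order = []
    while frontier:
        # BFS takes the oldest entry (queue); LIFO takes the newest (stack).
        url, depth = frontier.popleft() if strategy == "BFS" else frontier.pop()
        order.append(url)
        if depth == max_depth:
            continue  # depth control: do not follow links any further
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return order


# Hypothetical site structure used only for illustration.
site = {"/": ["/a", "/b"], "/a": ["/a1"], "/b": ["/b1"]}
print(crawl_order(site, "/", "BFS"))   # ['/', '/a', '/b', '/a1', '/b1']
print(crawl_order(site, "/", "LIFO"))  # ['/', '/b', '/b1', '/a', '/a1']
```

BFS covers every page at one depth before going deeper, while LIFO follows the most recently discovered link first.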

## Notes

- All examples use async/await for better performance
- Error handling is included in all examples
- Each example includes detailed comments explaining the options used
- OCR examples support multiple languages
- Crawler examples demonstrate various filtering and control options
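
The async and error-handling pattern mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions: `parse_file` is a hypothetical stand-in coroutine, not the Anyparser API, and its list of accepted extensions is arbitrary.

```python
import asyncio


async def parse_file(path: str) -> str:
    """Hypothetical stand-in for a parser call; not the Anyparser API."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if not path.endswith((".pdf", ".docx")):  # arbitrary rule for the demo
        raise ValueError(f"unsupported file: {path}")
    return f"parsed {path}"


async def parse_all(paths):
    """Parse all paths concurrently; collect failures instead of raising."""
    results = await asyncio.gather(
        *(parse_file(p) for p in paths), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [(p, r) for p, r in zip(paths, results) if isinstance(r, Exception)]
    return ok, failed
```

Running `asyncio.run(parse_all([...]))` returns the successful results alongside the paths that failed, so one bad file does not abort the whole batch.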

## OCR and Extraction Features

- OCR capabilities with language support
- OCR performance presets
- Image extraction
- Table extraction

## License

Apache-2.0
