Metadata-Version: 2.2
Name: pdfxact
Version: 0.1.1
Summary: PDF Quiz Generator
Author: pdfxact contributors
License: MIT
Keywords: pdf,quiz,generator
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Requires-Python: <3.12,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: wheel>=0.38.0
Requires-Dist: setuptools>=65.5.1
Requires-Dist: numpy<1.25.0,>=1.23.5
Requires-Dist: torch<2.1.0,>=2.0.0
Requires-Dist: spacy==3.5.3
Requires-Dist: transformers==4.30.0
Requires-Dist: huggingface-hub==0.16.4
Requires-Dist: sentence-transformers==2.2.2
Requires-Dist: pypdf>=4.0.0
Requires-Dist: easyocr>=1.7.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: diskcache>=5.0.0
Requires-Dist: reportlab>=4.0.0
Requires-Dist: ollama>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: pylint>=2.17.0; extra == "dev"
Requires-Dist: autopep8>=2.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: bandit[toml]>=1.7.8; extra == "dev"
Requires-Dist: pycln>=2.4.0; extra == "dev"

# PDFxact

PDFxact is a PDF quiz generator: it extracts text from PDFs, generates educational questions from that text, and fact-checks the generated questions against the source content.

## Installation

```bash
pip install pdfxact
```

## Features

- PDF text extraction with OCR support
- AI-powered question generation from PDF content
- Automated fact-checking and validation
- Text preprocessing and quality analysis
- Batch processing for efficient question generation
- Customizable number of questions

## Usage

### Command Line Interface

```bash
# Generate questions from a PDF file (5 when --num_questions is not given)
qz path/to/your/document.pdf

# Generate a specific number of questions
qz path/to/your/document.pdf --num_questions 10

# Control batch processing size
qz path/to/your/document.pdf --num_questions 20 --batch_size 8
```
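To run the CLI over several PDFs from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of pdfxact: `build_qz_command` and `commands_for_directory` are hypothetical helper names, and only the `--num_questions` and `--batch_size` flags shown above are used.

```python
import shlex
from pathlib import Path


def build_qz_command(pdf_path, num_questions=None, batch_size=None):
    """Return the argument list for one `qz` call (hypothetical helper)."""
    command = ["qz", str(pdf_path)]
    # Flags are added only when a value is given, so the CLI defaults apply otherwise.
    if num_questions is not None:
        command += ["--num_questions", str(num_questions)]
    if batch_size is not None:
        command += ["--batch_size", str(batch_size)]
    return command


def commands_for_directory(directory, **options):
    """Build one command per PDF found directly in `directory`, sorted by name."""
    pdf_paths = sorted(Path(directory).glob("*.pdf"))
    return [build_qz_command(path, **options) for path in pdf_paths]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in commands_for_directory(".", num_questions=10):
        print(shlex.join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once `qz` is installed and on the `PATH`.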

### Python API

```python
from qz.text_extractor import TextExtractor
from qz.question_generator import QuestionGenerator

# Extract text from PDF
extractor = TextExtractor()
text = extractor.extract_text_from_pdf("path/to/your/document.pdf")

# Generate quiz questions
generator = QuestionGenerator()
questions = generator.generate_question_from_text(text, num_questions=5)

# Process the questions
for i, question in enumerate(questions, 1):
    print(f"Question {i}: {question['text']}")
    print("Options:", question['options'])
    print("Correct Answer:", question['options'][question['correct_answer']])
    print()
```
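The loop above implies that each question is a dict with a `text` string, an `options` list, and a `correct_answer` index into that list. Assuming that shape holds, a small formatter can render questions with lettered options. This is an illustrative sketch: `format_question` is a hypothetical helper, and the sample data is hand-written rather than real pdfxact output.

```python
import string


def format_question(number, question):
    """Render one question dict as plain text with lettered options."""
    index = question["correct_answer"]
    # Guard against an answer index that does not point at an option.
    if not 0 <= index < len(question["options"]):
        raise ValueError(f"correct_answer {index} is out of range")
    lines = [f"Question {number}: {question['text']}"]
    for letter, option in zip(string.ascii_uppercase, question["options"]):
        lines.append(f"  {letter}. {option}")
    lines.append(f"Answer: {string.ascii_uppercase[index]}")
    return "\n".join(lines)


# Hand-written sample in the assumed shape, not generated by pdfxact.
sample = {
    "text": "Which library does pdfxact use for OCR?",
    "options": ["reportlab", "EasyOCR", "diskcache"],
    "correct_answer": 1,
}
print(format_question(1, sample))
```

Keeping the formatting separate from generation makes it easy to reuse the same question dicts for console output, files, or tests.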

## Advanced Features

- **OCR Support**: Automatically extracts text from scanned PDFs using EasyOCR
- **Fact Checking**: Validates generated questions against source content
- **Quality Analysis**: Ensures generated questions meet educational standards
- **Batch Processing**: Efficiently processes multiple questions in parallel
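
As a rough illustration of what a batch size means in practice, the sketch below splits a list of work items into fixed-size groups. It is a generic batching pattern, not pdfxact's internal implementation, and `split_into_batches` is a hypothetical name.

```python
def split_into_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 20 placeholder work items grouped with a batch size of 8.
work_items = list(range(20))
batches = list(split_into_batches(work_items, 8))
print([len(batch) for batch in batches])  # [8, 8, 4]
```

The last batch is smaller whenever the item count is not a multiple of the batch size.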

## Requirements

- Python 3.11 (the package metadata requires `>=3.11,<3.12`, so 3.12 and later are not supported)
- Dependencies (automatically installed):
  - torch: Deep learning support
  - spacy: NLP processing
  - transformers: Question generation
  - pypdf: PDF processing
  - easyocr: OCR support
  - reportlab: PDF generation
  - ollama: LLM integration

See `pyproject.toml` for the full dependency list.
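
Because the package metadata pins Python to `>=3.11,<3.12`, it can be useful to check the interpreter before installing. The following is a minimal sketch; `is_supported_python` is a hypothetical helper and not part of pdfxact.

```python
import sys


def is_supported_python(version_info=None):
    """Return True when the version satisfies >=3.11,<3.12."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases of 3.11 are all accepted.
    major_minor = tuple(version_info[:2])
    return (3, 11) <= major_minor < (3, 12)


print(is_supported_python())
```

Called without arguments, the function checks the running interpreter; passing a tuple such as `(3, 12, 0)` checks an arbitrary version.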

## Development

1. Clone the repository
2. Install development dependencies:
   ```bash
   pip install -r requirements-dev.txt
   ```
3. Install pre-commit hooks:
   ```bash
   pre-commit install
   ```
4. Run tests:
   ```bash
   pytest
   ```

## License

MIT License. See the LICENSE file for details.
