Metadata-Version: 2.4
Name: papertuner
Version: 0.0.3
Summary: A package for creating ML research assistant models through paper dataset creation and model fine-tuning
Author-email: Your Name <your.email@example.com>
Project-URL: Homepage, https://github.com/yourusername/papertuner
Project-URL: Bug Tracker, https://github.com/yourusername/papertuner/issues
Project-URL: Documentation, https://github.com/yourusername/papertuner#readme
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: huggingface_hub==0.29.3
Requires-Dist: tenacity==9.0.0
Requires-Dist: PyMuPDF>=1.22.0
Requires-Dist: arxiv>=1.4.0
Requires-Dist: openai==1.66.3
Requires-Dist: tqdm==4.67.1
Requires-Dist: requests==2.32.3
Requires-Dist: datasets==3.4.1
Requires-Dist: sentence-transformers==3.4.1
Requires-Dist: trl==0.15.2
Requires-Dist: vllm==0.8.1
Provides-Extra: cu118-torch220
Requires-Dist: unsloth[cu118-torch220]==2025.3.18; extra == "cu118-torch220"
Provides-Extra: cu118-torch230
Requires-Dist: unsloth[cu118-torch230]==2025.3.18; extra == "cu118-torch230"
Provides-Extra: cu118-torch240
Requires-Dist: unsloth[cu118-torch240]==2025.3.18; extra == "cu118-torch240"
Provides-Extra: cu118-torch250
Requires-Dist: unsloth[cu118-torch250]==2025.3.18; extra == "cu118-torch250"
Provides-Extra: cu121-torch220
Requires-Dist: unsloth[cu121-torch220]==2025.3.18; extra == "cu121-torch220"
Provides-Extra: cu121-torch230
Requires-Dist: unsloth[cu121-torch230]==2025.3.18; extra == "cu121-torch230"
Provides-Extra: cu121-torch240
Requires-Dist: unsloth[cu121-torch240]==2025.3.18; extra == "cu121-torch240"
Provides-Extra: cu121-torch250
Requires-Dist: unsloth[cu121-torch250]==2025.3.18; extra == "cu121-torch250"
Provides-Extra: cu124-torch220
Requires-Dist: unsloth[cu124-torch220]==2025.3.18; extra == "cu124-torch220"
Provides-Extra: cu124-torch230
Requires-Dist: unsloth[cu124-torch230]==2025.3.18; extra == "cu124-torch230"
Provides-Extra: cu124-torch240
Requires-Dist: unsloth[cu124-torch240]==2025.3.18; extra == "cu124-torch240"
Provides-Extra: cu124-torch250
Requires-Dist: unsloth[cu124-torch250]==2025.3.18; extra == "cu124-torch250"
Provides-Extra: cu118-ampere-torch220
Requires-Dist: unsloth[cu118-ampere-torch220]==2025.3.18; extra == "cu118-ampere-torch220"
Provides-Extra: cu118-ampere-torch230
Requires-Dist: unsloth[cu118-ampere-torch230]==2025.3.18; extra == "cu118-ampere-torch230"
Provides-Extra: cu118-ampere-torch240
Requires-Dist: unsloth[cu118-ampere-torch240]==2025.3.18; extra == "cu118-ampere-torch240"
Provides-Extra: cu118-ampere-torch250
Requires-Dist: unsloth[cu118-ampere-torch250]==2025.3.18; extra == "cu118-ampere-torch250"
Provides-Extra: cu121-ampere-torch220
Requires-Dist: unsloth[cu121-ampere-torch220]==2025.3.18; extra == "cu121-ampere-torch220"
Provides-Extra: cu121-ampere-torch230
Requires-Dist: unsloth[cu121-ampere-torch230]==2025.3.18; extra == "cu121-ampere-torch230"
Provides-Extra: cu121-ampere-torch240
Requires-Dist: unsloth[cu121-ampere-torch240]==2025.3.18; extra == "cu121-ampere-torch240"
Provides-Extra: cu121-ampere-torch250
Requires-Dist: unsloth[cu121-ampere-torch250]==2025.3.18; extra == "cu121-ampere-torch250"
Provides-Extra: cu124-ampere-torch220
Requires-Dist: unsloth[cu124-ampere-torch220]==2025.3.18; extra == "cu124-ampere-torch220"
Provides-Extra: cu124-ampere-torch230
Requires-Dist: unsloth[cu124-ampere-torch230]==2025.3.18; extra == "cu124-ampere-torch230"
Provides-Extra: cu124-ampere-torch240
Requires-Dist: unsloth[cu124-ampere-torch240]==2025.3.18; extra == "cu124-ampere-torch240"
Provides-Extra: cu124-ampere-torch250
Requires-Dist: unsloth[cu124-ampere-torch250]==2025.3.18; extra == "cu124-ampere-torch250"

# PaperTuner

PaperTuner is a Python package for creating research assistant models: it processes academic papers into datasets and fine-tunes language models to provide guidance on research methodology and approaches.

## Features

- Automated extraction of research papers from arXiv
- Section extraction to identify problem statements, methodologies, and results
- Generation of high-quality question-answer pairs for research methodology
- Fine-tuning of language models with GRPO (Group Relative Policy Optimization)
- Integration with Hugging Face for dataset and model sharing

## Installation

```bash
pip install papertuner
```
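If you plan to fine-tune with `unsloth`, the package also declares optional extras that pin an `unsloth` build to a specific CUDA and PyTorch combination (see the `Provides-Extra` entries in the package metadata). Pick the extra that matches your environment, for example:

```shell
# Optional: install with the unsloth extra matching your CUDA/PyTorch build,
# e.g. CUDA 12.1 with PyTorch 2.4 (plain GPUs) or Ampere GPUs:
pip install "papertuner[cu121-torch240]"
pip install "papertuner[cu121-ampere-torch240]"
```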

## Basic Usage

### As a Command-Line Tool

#### 1. Create a dataset from research papers

```bash
# Set up your environment variables
export GEMINI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional, for uploading to HF

# Run the dataset creation
papertuner-dataset --max-papers 100
```

#### 2. Train a model

```bash
# Train using the created or an existing dataset
papertuner-train --model "Qwen/Qwen2.5-3B-Instruct" --dataset "densud2/ml_qa_dataset"
```

### As a Python Library

```python
from papertuner import ResearchPaperProcessor, ResearchAssistantTrainer

# Create a dataset
processor = ResearchPaperProcessor(
    api_key="your-api-key",
    hf_repo_id="your-username/dataset-name"
)
papers = processor.process_papers(max_papers=10)

# Train a model
trainer = ResearchAssistantTrainer(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    lora_rank=64,
    output_dir="./model_output"
)
results = trainer.train("your-username/dataset-name")

# Test the model
question = "How would you design a transformer model for time series forecasting?"
response = trainer.run_inference(
    results["model"],
    results["tokenizer"],
    question,
    results["lora_path"]
)
print(response)
```

## Configuration

You can configure the tool through environment variables or by passing arguments when initializing the classes:

- `GEMINI_API_KEY`: Gemini API key used to generate question-answer pairs
- `HF_TOKEN`: Hugging Face token for uploading datasets and models
- `HF_REPO_ID`: Hugging Face repository ID for the dataset
- `PAPERTUNER_DATA_DIR`: Custom directory for storing data (default: `~/.papertuner/data`)
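The precedence between constructor arguments and environment variables is not spelled out above; a minimal sketch of the usual convention (explicit arguments win, environment variables are the fallback) is shown below. The function name `resolve_config` and this exact precedence are illustrative assumptions, not papertuner's actual API:

```python
import os
from pathlib import Path

def resolve_config(api_key=None, hf_repo_id=None, data_dir=None):
    """Hypothetical config resolution: explicit arguments take
    precedence over environment variables, which fall back to defaults."""
    return {
        "api_key": api_key or os.environ.get("GEMINI_API_KEY"),
        "hf_token": os.environ.get("HF_TOKEN"),  # optional, upload only
        "hf_repo_id": hf_repo_id or os.environ.get("HF_REPO_ID"),
        "data_dir": Path(
            data_dir
            or os.environ.get("PAPERTUNER_DATA_DIR",
                              Path.home() / ".papertuner" / "data")
        ),
    }

os.environ["GEMINI_API_KEY"] = "your-api-key"
cfg = resolve_config(hf_repo_id="your-username/dataset-name")
print(cfg["api_key"])     # your-api-key (from the environment)
print(cfg["hf_repo_id"])  # your-username/dataset-name (explicit argument)
```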

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
