Metadata-Version: 2.4
Name: arxivflow
Version: 0.1.1
Summary: Automate arXiv paper tracking with LLM-powered metadata extraction and Google Sheets sync.
Author: Zhijie Zhao
License: MIT
Project-URL: Homepage, https://github.com/zjzhao/arXivFlow
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: arxiv>=3.0.0
Requires-Dist: pandas>=3.0.2
Requires-Dist: pymupdf>=1.27.2.3
Requires-Dist: gspread>=6.2.1
Requires-Dist: ollama>=0.6.1
Requires-Dist: numpy>=2.4.4
Requires-Dist: pydantic>=2.13.3
Requires-Dist: requests>=2.33.1
Requires-Dist: google-api-python-client>=2.194.0
Requires-Dist: google-auth-oauthlib>=1.3.1
Dynamic: license-file

# arXivFlow 🚀

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/)
[![Static Badge](https://img.shields.io/badge/pypi-0.1.1-blue)](https://pypi.org/project/arxivflow/)
[![Ollama](https://img.shields.io/badge/Ollama-Llama3.2-orange.svg)](https://ollama.ai/)
[![arXiv](https://img.shields.io/badge/arXiv-API-red.svg)](https://arxiv.org/help/api/index)

**arXivFlow** is a powerful Python-based automation tool designed to streamline the research paper discovery and tracking process. It autonomously fetches metadata from arXiv, performs local AI-driven analysis using **Ollama (Llama 3.2)**, and synchronizes the results with **Google Sheets** and local databases.

---

## ✨ Features

- **Automated Retrieval**: Fetch the latest papers from specific arXiv categories (e.g., `cs.AI`, `cs.LG`, `hep-ph`) within any date range.
- **Local AI Analysis**: Uses **Ollama (Llama 3.2)** to extract keywords and contact information (emails/affiliations) directly from PDF text. No cloud API costs or data privacy concerns.
- **Intelligent PDF Handling**: Automatically downloads PDFs and extracts text for deep analysis. Supports custom storage paths.
- **Multi-Format Export**: Save your research data to **CSV**, **JSON**, **Excel**, or **SQLite** for flexible offline analysis.
- **Google Sheets Sync**: Seamlessly push compiled research data to a shared Google Sheet for team collaboration.
- **Type-Safe & Modular**: Clean, documented Python code with full type hinting and a class-based architecture.

---

## 🛠️ Prerequisites

1. **Python 3.13+**: Ensure you have a modern Python environment.
2. **Ollama**: Install [Ollama](https://ollama.ai/) and download the required model:
   ```bash
   ollama pull llama3.2
   ```
3. **Google Cloud Credentials**:
   - Enable the **Google Sheets** and **Google Drive** APIs.
   - Create a **Service Account** and download the JSON key as `credentials.json`.
   - Ensure the service account has 'Editor' permissions on the sheet.

---

## 🚀 Installation

### From PyPI (Recommended)
```bash
pip install arxivflow
```

### From Source (For Development)

1. **Clone the repository**:
   ```bash
   git clone https://github.com/zjzhao/arXivFlow.git
   cd arXivFlow
   ```

2. **Set up virtual environment**:
   ```bash
   python -m venv .
   source bin/activate  # On Windows: Scripts\activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -e .
   ```

---

## 📖 Usage

### Quick Start

```python
from arxivflow import arXivFlow
import datetime

# 1. Initialize the flow
flow = arXivFlow(
    categories=["cs.AI", "cs.CV"], 
    ollama_model="llama3.2",
    max_results=20,
    start_date=datetime.datetime.now() - datetime.timedelta(days=7)
)

# 2. Fetch data & Extract info (Keywords/Contacts)
df = flow.get_arxiv_data(download_pdfs=True)

# 3. Save to your preferred formats
flow.save_to_csv("my_research.csv")
flow.save_to_sqlite("research.db")

# 4. Sync with Google Sheets
flow.save_to_google_sheet(
    sheet_id="YOUR_SHEET_ID", 
    credentials_file="credentials.json"
)
```

---

## 🏗️ Architecture

The project follows a modular structure for easy extension:

- `src/arxivflow/arxivflow.py`: The main orchestrator class (`arXivFlow`).
- `src/arxivflow/ollama_functions.py`: Local LLM interface using the Ollama API.

---

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🤝 Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
