Metadata-Version: 2.4
Name: langchain-google-classroom
Version: 0.1.0
Summary: An integration package connecting Google Classroom and LangChain
Project-URL: Homepage, https://github.com/ayanokojix21/langchain-google-classroom
Project-URL: Source, https://github.com/ayanokojix21/langchain-google-classroom
Project-URL: Documentation, https://github.com/ayanokojix21/langchain-google-classroom#readme
Author: Nishchal Chandel
License: MIT
License-File: LICENSE
Keywords: document-loader,education,google-classroom,langchain,rag
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: google-api-python-client<3.0.0,>=2.100.0
Requires-Dist: google-auth-httplib2<1.0.0,>=0.2.0
Requires-Dist: google-auth-oauthlib<2.0.0,>=1.2.0
Requires-Dist: google-auth<3.0.0,>=2.25.0
Requires-Dist: langchain-core<1.0.0,>=0.3.0
Provides-Extra: dev
Requires-Dist: mypy<2.0.0,>=1.10.0; extra == 'dev'
Requires-Dist: pypdf<5.0.0,>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock<4.0.0,>=3.10.0; extra == 'dev'
Requires-Dist: pytest<10.0.0,>=7.3.0; extra == 'dev'
Requires-Dist: python-docx<2.0.0,>=1.0.0; extra == 'dev'
Requires-Dist: ruff<1,>=0.5.0; extra == 'dev'
Provides-Extra: lint
Requires-Dist: ruff<1,>=0.5.0; extra == 'lint'
Provides-Extra: parsers
Requires-Dist: pypdf<5.0.0,>=4.0.0; extra == 'parsers'
Requires-Dist: python-docx<2.0.0,>=1.0.0; extra == 'parsers'
Provides-Extra: test
Requires-Dist: pytest-asyncio<2.0.0,>=0.21.1; extra == 'test'
Requires-Dist: pytest-mock<4.0.0,>=3.10.0; extra == 'test'
Requires-Dist: pytest-socket<1.0.0,>=0.7.0; extra == 'test'
Requires-Dist: pytest<10.0.0,>=7.3.0; extra == 'test'
Provides-Extra: typing
Requires-Dist: mypy<2.0.0,>=1.10.0; extra == 'typing'
Description-Content-Type: text/markdown

# 🎓 langchain-google-classroom

[![CI](https://github.com/ayanokojix21/langchain-google-classroom/actions/workflows/ci.yml/badge.svg)](https://github.com/ayanokojix21/langchain-google-classroom/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/langchain-google-classroom.svg)](https://pypi.org/project/langchain-google-classroom/)
[![Python](https://img.shields.io/pypi/pyversions/langchain-google-classroom.svg)](https://pypi.org/project/langchain-google-classroom/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A **LangChain** integration package that loads Google Classroom content — assignments, announcements, course materials, and Drive attachments — as `Document` objects for RAG pipelines, semantic search, AI teaching assistants, and course chatbots.

## ✨ Features

- **Full Classroom coverage** — assignments, announcements, and course materials
- **Drive attachments** — auto-download and parse PDF, DOCX, text, CSV, HTML files
- **Vision LLM image description** — embedded PDF images described by Gemini/GPT-4V
- **Pluggable parsers** — bring your own `BaseBlobParser` (PyMuPDF, Unstructured, etc.)
- **Retry/backoff** — exponential backoff with jitter on rate-limited API calls
- **Flexible auth** — service accounts, OAuth, cached tokens, or pre-built credentials
- **Rich metadata** — course info, timestamps, due dates, links on every Document
- **Lazy loading** — memory-efficient streaming via `lazy_load()`
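
The retry/backoff bullet above can be pictured with a small standard-library sketch. This is not the package's own code: `call_with_backoff` and all of its parameters are hypothetical names chosen for illustration, and the "full jitter" strategy shown is one common variant of exponential backoff.

```python
import random
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=32.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff and full jitter.

    Hypothetical helper for illustration; not part of langchain-google-classroom.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random duration in [0, cap] so that many
            # clients hitting the same rate limit do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

The `sleep` argument is injectable so the helper can be exercised in tests without real waiting.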

## 📦 Installation

```bash
pip install langchain-google-classroom
```

To also parse file attachments (PDF, DOCX), install the `parsers` extra:

```bash
pip install "langchain-google-classroom[parsers]"
```

## 🚀 Quickstart

```python
from langchain_google_classroom import GoogleClassroomLoader

# Load all accessible courses
loader = GoogleClassroomLoader()
docs = loader.load()

for doc in docs:
    print(doc.metadata["content_type"], "—", doc.metadata["title"])
    print(doc.page_content[:200])
    print()
```
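
For large courses, `lazy_load()` lets you stop early instead of materializing every document with `load()`. A minimal sketch, assuming only that the loader exposes the standard `lazy_load()` generator; `take_documents` is a hypothetical helper name, not part of the package:

```python
from itertools import islice


def take_documents(loader, limit):
    """Return at most `limit` documents, pulling them lazily from the loader."""
    # islice stops consuming the generator once `limit` items have been
    # yielded, so later items are never requested from the loader.
    return list(islice(loader.lazy_load(), limit))


# Usage with the loader from the Quickstart (requires valid credentials):
# preview = take_documents(GoogleClassroomLoader(), 5)
```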

## 🔐 Authentication

### Service Account (recommended for production)

```python
from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    service_account_file="service_account.json",
)
```

### OAuth User Credentials

```python
from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    client_secrets_file="credentials.json",
    token_file="token.json",
)
```

### Pre-built Credentials

```python
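from google.oauth2 import service_account

from langchain_google_classroom import GoogleClassroomLoader

creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    # This scope covers course listing only; add further read-only scopes
    # if you also load coursework, announcements, or Drive files.
    scopes=["https://www.googleapis.com/auth/classroom.courses.readonly"],
)
loader = GoogleClassroomLoader(credentials=creds)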
```

## 📎 Attachments & File Parsing

```python
from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,      # Download Drive files
    parse_attachments=True,     # Parse with BaseBlobParser
)
docs = loader.load()
# Returns: assignment docs + parsed PDF/DOCX/text attachment docs
```

### Custom Parser

```python
from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser

from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    file_parser_cls=PyMuPDFParser,
)
```
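
One way to picture how a custom parser could take precedence over per-MIME defaults is a simple lookup with an override. The sketch below is an assumption about the general shape, not the package's implementation: `DEFAULT_PARSERS`, `choose_parser`, and the MIME-to-parser mapping are hypothetical, with parser names borrowed from the module names listed under Architecture.

```python
# Hypothetical registry: MIME type -> parser module name.
# Which MIME types map to which parser is an assumption for illustration.
DEFAULT_PARSERS = {
    "application/pdf": "pdf_parser",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx_parser",
    "text/plain": "text_parser",
    "text/csv": "text_parser",
    "text/html": "text_parser",
}


def choose_parser(mime_type, override=None):
    """Pick a parser for an attachment.

    An explicit override (compare `file_parser_cls`) applies to every
    attachment; otherwise the MIME registry decides, and unknown types
    return None so the caller can skip them.
    """
    if override is not None:
        return override
    return DEFAULT_PARSERS.get(mime_type)
```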

## 🖼️ Vision LLM — Image Description

Extract and describe images embedded in PDFs using any vision-capable LLM:

```python
from langchain_google_genai import ChatGoogleGenerativeAI

from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,
    vision_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
)
docs = loader.load()
# PDF pages now include: "[Image: chart.png]\nA bar chart showing student grades..."
```

## 🎯 Selective Loading

```python
from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_assignments=True,
    load_announcements=False,
    load_materials=False,
    load_attachments=False,
)
```

## 📄 Document Structure

Each document includes rich metadata:

```python
Document(
    page_content="Assignment: Homework 3\n\nComplete exercises 1-5...",
    metadata={
        "source": "google_classroom",
        "course_id": "12345",
        "course_name": "Machine Learning",
        "content_type": "assignment",        # or "announcement", "material", "assignment_attachment"
        "title": "Homework 3",
        "item_id": "67890",
        "created_time": "2024-01-15T10:00:00Z",
        "updated_time": "2024-01-15T10:00:00Z",
        "due_date": "2024-01-22T23:59:00",   # assignments only
        "max_points": 100.0,                  # assignments only
        "alternate_link": "https://classroom.google.com/...",
    }
)
```
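
Because every document carries a `content_type` key, loaded documents are easy to bucket before indexing. A small sketch that relies only on the metadata shape shown above; `group_by_content_type` is a hypothetical helper, not part of the package:

```python
from collections import defaultdict


def group_by_content_type(docs):
    """Map each metadata["content_type"] value to the matching documents."""
    groups = defaultdict(list)
    for doc in docs:
        # Fall back to "unknown" if a document lacks the key.
        groups[doc.metadata.get("content_type", "unknown")].append(doc)
    return dict(groups)


# Usage (requires valid credentials):
# groups = group_by_content_type(loader.load())
# assignments = groups.get("assignment", [])
```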

## ⚙️ Configuration Reference

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `course_ids` | `list[str]` | `None` | Specific course IDs (`None` = all accessible) |
| `load_assignments` | `bool` | `True` | Load courseWork items |
| `load_announcements` | `bool` | `True` | Load announcements |
| `load_materials` | `bool` | `True` | Load courseWorkMaterials |
| `load_attachments` | `bool` | `True` | Download and process Drive attachments |
| `parse_attachments` | `bool` | `True` | Parse files with BaseBlobParser |
| `load_images` | `bool` | `False` | Process image MIME types |
| `vision_model` | `BaseChatModel` | `None` | Vision LLM for image description |
| `image_prompt` | `str` | `None` | Custom prompt for vision model |
| `file_parser_cls` | `type[BaseBlobParser]` | `None` | Custom parser for all attachments |
| `file_parser_kwargs` | `dict` | `None` | kwargs for custom parser |
| `credentials` | `Credentials` | `None` | Pre-built Google credentials |
| `service_account_file` | `str` | `None` | Service account key JSON path |
| `token_file` | `str` | `None` | Cached OAuth token path |
| `client_secrets_file` | `str` | `None` | OAuth client secrets path |
| `scopes` | `list[str]` | Read-only | API scopes to request |
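
The parameters above can be combined freely. The sketch below builds a keyword-argument dict with an explicit `scopes` list. The scope URLs are real Google API scopes, but which ones the package requests by default is not documented here, so treat this particular set as an assumption rather than the package's default.

```python
# Read-only Google API scopes. Whether this matches the package's default
# set is an assumption; adjust to the content types you actually load.
READONLY_SCOPES = [
    "https://www.googleapis.com/auth/classroom.courses.readonly",
    "https://www.googleapis.com/auth/classroom.coursework.students.readonly",
    "https://www.googleapis.com/auth/classroom.announcements.readonly",
    "https://www.googleapis.com/auth/classroom.courseworkmaterials.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]

# Keys are parameter names from the table above.
loader_kwargs = {
    "course_ids": ["123456789"],
    "load_announcements": False,
    "token_file": "token.json",
    "client_secrets_file": "credentials.json",
    "scopes": READONLY_SCOPES,
}

# loader = GoogleClassroomLoader(**loader_kwargs)
```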

## 🏗️ Architecture

```
GoogleClassroomLoader (BaseLoader)
├── _utilities.py         — auth, retry/backoff, guard_import
├── classroom_api.py      — paginated Classroom API fetcher
├── document_builder.py   — raw API → LangChain Document
├── drive_resolver.py     — Drive download/export
├── normalizer.py         — text cleanup (Unicode NFC, whitespace)
└── parsers/
    ├── __init__.py       — MIME registry + get_parser()
    ├── pdf_parser.py     — pypdf + vision LLM
    ├── docx_parser.py    — python-docx
    ├── text_parser.py    — built-in UTF-8
    └── image_parser.py   — vision LLM + base64 fallback
```
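
As an illustration of the text cleanup step attributed to `normalizer.py` (Unicode NFC, whitespace), here is a minimal standard-library sketch. `normalize_text` and its exact whitespace rules are assumptions for illustration; the package's normalizer may behave differently.

```python
import re
import unicodedata


def normalize_text(text):
    """Apply Unicode NFC normalization and tidy whitespace (illustrative)."""
    # Compose characters, e.g. "e" + combining acute accent -> "é".
    text = unicodedata.normalize("NFC", text)
    # Unify line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs within a line.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```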

## 🧪 Development

```bash
# Clone and install
git clone https://github.com/ayanokojix21/langchain-google-classroom.git
cd langchain-google-classroom
pip install -e ".[dev]"

# Run tests
pytest tests/unit/ -v

# Lint
ruff check langchain_google_classroom/ tests/
```

## 📝 License

MIT — see [LICENSE](LICENSE) for details.