Metadata-Version: 2.4
Name: openbatch
Version: 0.0.4
Summary: Create batch jobs for the OpenAI API with ease.
Author-email: Daniel Gomm <daniel.gomm@cwi.nl>
Project-URL: Homepage, https://github.com/daniel-gomm/openbatch
Project-URL: Issues, https://github.com/daniel-gomm/openbatch/issues
Project-URL: Documentation, https://daniel-gomm.github.io/openbatch/
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.11.9
Provides-Extra: test
Requires-Dist: pytest>=8.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.1.0; extra == "test"
Provides-Extra: dev
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.13.0; extra == "dev"
Requires-Dist: pre-commit>=4.0.0; extra == "dev"
Dynamic: license-file

# OpenBatch: Simplify OpenAI Batch Job Creation

[![PyPI version](https://badge.fury.io/py/openbatch.svg)](https://badge.fury.io/py/openbatch)
[![Python versions](https://img.shields.io/pypi/pyversions/openbatch.svg)](https://pypi.org/project/openbatch/)
[![Tests](https://github.com/daniel-gomm/openbatch/actions/workflows/test.yml/badge.svg)](https://github.com/daniel-gomm/openbatch/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/daniel-gomm/openbatch/branch/main/graph/badge.svg)](https://codecov.io/gh/daniel-gomm/openbatch)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub stars](https://img.shields.io/github/stars/daniel-gomm/openbatch.svg?style=social&label=Star)](https://github.com/daniel-gomm/openbatch)

**OpenBatch** is a lightweight Python utility designed to streamline the creation of JSONL files for the [OpenAI Batch API](https://platform.openai.com/docs/guides/batch). It provides a type-safe and intuitive interface using Pydantic models to construct requests for the `/v1/responses`, `/v1/chat/completions`, and `/v1/embeddings` endpoints.

For a detailed guide on using OpenBatch, please refer to the **[OpenBatch Documentation](https://openbatch.daniel-gomm.com/)**.

The library offers two distinct APIs to fit your workflow:

  * **`BatchCollector`**: A high-level, fluent API that mimics the official `openai` client. It's perfect for adding individual, distinct requests to a batch file with minimal setup.
  * **`BatchJobManager`**: A lower-level API designed for programmatically generating large batches of requests from templates and lists of inputs. It's ideal for scalable tasks like classification, data extraction, or bulk embeddings.

-----

## Installation

```bash
pip install openbatch
```

-----

## Quickstart: The `BatchCollector` API

The `BatchCollector` provides the simplest way to get started. You instantiate it with a file path, and then use its methods to add requests one by one. This example showcases calls to the Responses and Embeddings APIs.

```python
from pydantic import BaseModel, Field
from typing import List
from openbatch import BatchCollector, ReasoningConfig

# Define a Pydantic model for structured JSON output
class LogicalAnalysis(BaseModel):
    premise: str
    conclusion: str
    is_valid: bool = Field(description="Whether the conclusion logically follows from the premise.")

# 1. Initialize the collector with the desired output file path
BATCH_FILE = "responses_api_batch.jsonl"
collector = BatchCollector(batch_file_path=BATCH_FILE)

# 2. Add a standard request to the Responses API
collector.responses.create(
    custom_id="request-1-response",
    model="gpt-4o",
    instructions="You are a historian. Provide a concise summary.",
    input="What were the main causes of the French Revolution?",
    max_output_tokens=200
)

# 3. Add a structured request using a reasoning model.
# Note: Reasoning models may not support 'temperature', and it is omitted here.
collector.responses.parse(
    custom_id="request-2-reasoning",
    model="gpt-5-mini",  # Hypothetical reasoning model
    text_format=LogicalAnalysis,
    instructions="Analyze the logical argument provided by the user.",
    input="Premise: All birds can fly. A penguin is a bird. Conclusion: Therefore, a penguin can fly.",
    reasoning=ReasoningConfig(effort="high") # Configure the reasoning effort
)

# We need to create a separate collector for embeddings since the batch API requires one request type per file
EMBEDDINGS_BATCH_FILE = "embeddings_api_batch.jsonl"
embeddings_collector = BatchCollector(batch_file_path=EMBEDDINGS_BATCH_FILE)

# 4. Add an Embedding request
embeddings_collector.embeddings.create(
    custom_id="request-3-embedding",
    model="text-embedding-3-small",
    inp="OpenBatch simplifies creating batch jobs."
)

print(f"Batch file '{BATCH_FILE}' created successfully.")
```

-----

## Advanced Usage: The `BatchJobManager` API

For more complex or repetitive tasks, the `BatchJobManager` is the more appropriate tool. It excels at generating thousands of requests from a single template, for any supported API.

### Example 1: Batch Job from a Prompt Template (Responses API)

Imagine you want to generate marketing copy for 10,000 new products. Instead of creating each request manually, you can use a template with the Responses API.

```python
from openbatch import (
    BatchJobManager,
    PromptTemplate,
    Message,
    ResponsesRequest,
    PromptTemplateInputInstance
)

# 1. Define a prompt template with placeholders
copywriting_template = PromptTemplate(
    messages=[
        Message(role="system", content="You are a marketing copywriter. Generate a catchy, two-sentence description."),
        Message(role="user", content="Product: {product_name}, Features: {features}")
    ]
)

# 2. Define the common configuration for all requests
common_request_config = ResponsesRequest(
    model="gpt-4o-mini",
    temperature=0.8,
    max_output_tokens=100
)

# 3. Create a list of input instances
product_instances = [
    PromptTemplateInputInstance(
        id="prod_001",
        prompt_value_mapping={"product_name": "AeroGlide Drone", "features": "4K camera, 30-min flight"}
    ),
    PromptTemplateInputInstance(
        id="prod_002",
        prompt_value_mapping={"product_name": "HydroPure Bottle", "features": "Self-cleaning, insulated steel"}
    ),
    # ... add up to 9,998 more products
]

# 4. Use the manager to generate the batch file
manager = BatchJobManager()
manager.add_templated_instances(
    prompt=copywriting_template,
    common_request=common_request_config,
    input_instances=product_instances,
    save_file_path="copywriting_batch.jsonl"
)
```

### Example 2: Batch Embedding Requests

Similarly, you can easily create a batch job for generating embeddings for a large number of documents.

```python
from openbatch import BatchJobManager, EmbeddingsRequest, EmbeddingInputInstance

# 1. Define the common configuration for all embedding requests
common_embedding_config = EmbeddingsRequest(
    model="text-embedding-3-small",
    dimensions=512
)

# 2. Create a list of input instances
documents_to_embed = [
    EmbeddingInputInstance(id="doc_1", input="The sky is blue."),
    EmbeddingInputInstance(id="doc_2", input="Grass is green."),
    # ... add thousands more documents
]

# 3. Use the manager to generate the batch file
manager = BatchJobManager()
manager.add_embedding_requests(
    inputs=documents_to_embed,
    common_request=common_embedding_config,
    save_file_path="embeddings_batch.jsonl"
)
```

-----

## Configuring the Request

The `common_request` objects (`ResponsesRequest`, `EmbeddingsRequest`, etc.) are Pydantic models that expose all available API parameters. You can configure any parameter by passing it to the constructor.

```python
from openbatch import ResponsesRequest, ReasoningConfig

# Example of a more detailed configuration for the Responses API
detailed_config = ResponsesRequest(
    model="gpt-4o",
    service_tier="flex",
    reasoning=ReasoningConfig(effort="minimal"),
    max_output_tokens=500
)
```

You can also override any common setting on a per-instance basis by using the `instance_request_options` field.

-----

## What's Next?

`OpenBatch` helps you create the batch file. The next steps involve using that file with the OpenAI API:

1.  **Upload File**: Upload your generated `.jsonl` file to OpenAI.
2.  **Create Batch Job**: Create a new batch job pointing to your uploaded file.
3.  **Retrieve Results**: Monitor the job's status and, once completed, download the output file with the results.

For detailed instructions on these steps, please refer to the **[Official OpenAI Batch API Documentation](https://platform.openai.com/docs/api-reference/batch)**.

-----

## Validation

OpenBatch includes built-in [validation](https://openbatch.daniel-gomm.com/validation/) to catch errors before uploading to OpenAI.

```python
from openbatch import validate_batch_file

result = validate_batch_file("my_batch.jsonl")
if result.is_valid:
    print(f"Valid! {result.stats['total_requests']} requests")
```

-----

## Testing

OpenBatch includes a comprehensive test suite.

```bash
# Install with test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run with coverage report
pytest --cov=openbatch
```

The test suite includes:
- Unit tests for all core functionality
- Integration tests for end-to-end workflows
- Tests for structured outputs, reasoning models, and unicode handling

-----

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:

- Development setup
- Code quality standards
- Branch naming conventions (`feature/`, `fix/`, `documentation/`)
- Commit message conventions (using [gitmoji](https://gitmoji.dev/))
- Opening issues and pull requests

For bugs and feature requests, please [open an issue](https://github.com/TiepNguyen2003/OpenAIBatchJobBuilder/issues).
