Metadata-Version: 2.4
Name: langchain-maritaca
Version: 0.4.1
Summary: An integration package connecting Maritaca AI and LangChain for Brazilian Portuguese language models
Project-URL: Homepage, https://github.com/anderson-ufrj/langchain-maritaca
Project-URL: Documentation, https://github.com/anderson-ufrj/langchain-maritaca#readme
Project-URL: Repository, https://github.com/anderson-ufrj/langchain-maritaca
Project-URL: Issues, https://github.com/anderson-ufrj/langchain-maritaca/issues
Project-URL: Changelog, https://github.com/anderson-ufrj/langchain-maritaca/blob/main/CHANGELOG.md
Project-URL: Maritaca AI, https://www.maritaca.ai/
Author-email: Anderson Henrique da Silva <andersonhs27@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,brazilian,chatbot,langchain,llm,maritaca,nlp,portuguese,sabia
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <4.0,>=3.10
Requires-Dist: httpx<1.0.0,>=0.25.0
Requires-Dist: langchain-core<2.0.0,>=0.3.0
Provides-Extra: dev
Requires-Dist: mypy<2.0.0,>=1.10.0; extra == 'dev'
Requires-Dist: pre-commit<5.0.0,>=3.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio<1.0.0,>=0.21.1; extra == 'dev'
Requires-Dist: pytest-cov<6.0.0,>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock<4.0.0,>=3.10.0; extra == 'dev'
Requires-Dist: pytest<9.0.0,>=7.3.0; extra == 'dev'
Requires-Dist: respx<1.0.0,>=0.20.0; extra == 'dev'
Requires-Dist: ruff<1.0.0,>=0.8.0; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest-asyncio<1.0.0,>=0.21.1; extra == 'test'
Requires-Dist: pytest-cov<6.0.0,>=4.0.0; extra == 'test'
Requires-Dist: pytest-mock<4.0.0,>=3.10.0; extra == 'test'
Requires-Dist: pytest<9.0.0,>=7.3.0; extra == 'test'
Requires-Dist: respx<1.0.0,>=0.20.0; extra == 'test'
Provides-Extra: tokenizer
Requires-Dist: tiktoken<1.0.0,>=0.5.0; extra == 'tokenizer'
Description-Content-Type: text/markdown

# langchain-maritaca

[![PyPI version](https://img.shields.io/pypi/v/langchain-maritaca.svg)](https://pypi.org/project/langchain-maritaca/)
[![Python](https://img.shields.io/pypi/pyversions/langchain-maritaca.svg)](https://pypi.org/project/langchain-maritaca/)
[![Downloads](https://img.shields.io/pypi/dm/langchain-maritaca.svg)](https://pypi.org/project/langchain-maritaca/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![CI](https://github.com/anderson-ufrj/langchain-maritaca/actions/workflows/ci.yml/badge.svg)](https://github.com/anderson-ufrj/langchain-maritaca/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/anderson-ufrj/langchain-maritaca/graph/badge.svg)](https://codecov.io/gh/anderson-ufrj/langchain-maritaca)

[🇧🇷 Leia em Português](README.pt-br.md)

An integration package connecting [Maritaca AI](https://www.maritaca.ai/) and [LangChain](https://langchain.com/) for Brazilian Portuguese language models.

**Author:** Anderson Henrique da Silva
**Location:** Minas Gerais, Brasil
**GitHub:** [anderson-ufrj](https://github.com/anderson-ufrj)

## Overview

Maritaca AI provides state-of-the-art Brazilian Portuguese language models, including the Sabiá family of models. This integration allows you to use Maritaca's models seamlessly within the LangChain ecosystem.

### Available Models

| Model | Context | Input (R$/1M) | Output (R$/1M) | Vision |
|-------|---------|---------------|----------------|--------|
| `sabia-3.1` | 128k | R$5.00 | R$10.00 | Yes |
| `sabiazinho-4` | 128k | R$1.00 | R$4.00 | Yes |
| `sabiazinho-3.1` | 32k | R$1.00 | R$3.00 | Yes |

> **Note:** All models support vision/multimodal inputs (images).

## Installation

```bash
pip install langchain-maritaca
```

## Setup

Set your Maritaca API key as an environment variable:

```bash
export MARITACA_API_KEY="your-api-key"
```

Or pass it directly to the model:

```python
from langchain_maritaca import ChatMaritaca

model = ChatMaritaca(api_key="your-api-key")
```

## Usage

### Basic Usage

```python
from langchain_maritaca import ChatMaritaca

model = ChatMaritaca(
    model="sabia-3.1",
    temperature=0.7,
)

messages = [
    ("system", "Você é um assistente prestativo especializado em cultura brasileira."),
    ("human", "Quais são as principais festas populares do Brasil?"),
]

response = model.invoke(messages)
print(response.content)
```

### Streaming

```python
from langchain_maritaca import ChatMaritaca

model = ChatMaritaca(model="sabia-3.1", streaming=True)

for chunk in model.stream("Conte uma história sobre o folclore brasileiro"):
    print(chunk.content, end="", flush=True)
```

### Async Usage

```python
import asyncio
from langchain_maritaca import ChatMaritaca

async def main():
    model = ChatMaritaca(model="sabia-3.1")
    response = await model.ainvoke("Qual é a receita de pão de queijo?")
    print(response.content)

asyncio.run(main())
```

### With LangChain Expression Language (LCEL)

```python
from langchain_maritaca import ChatMaritaca
from langchain_core.prompts import ChatPromptTemplate

model = ChatMaritaca(model="sabia-3.1")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Você é um especialista em {topic}."),
    ("human", "{question}"),
])

chain = prompt | model

response = chain.invoke({
    "topic": "história do Brasil",
    "question": "Quem foi Tiradentes?"
})
print(response.content)
```

### With Tool Calling (Function Calling)

```python
from langchain_maritaca import ChatMaritaca
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"O clima em {city} está ensolarado, 25°C"

model = ChatMaritaca(model="sabia-3.1")
model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke("Como está o tempo em São Paulo?")
print(response)
```

### Vision / Multimodal (Images)

All Maritaca models support image inputs. You can send images via URL or base64:

```python
from langchain_maritaca import ChatMaritaca
from langchain_core.messages import HumanMessage

model = ChatMaritaca(model="sabiazinho-4")

# With image URL
response = model.invoke([
    HumanMessage(content=[
        {"type": "text", "text": "O que você vê nesta imagem?"},
        {"type": "image", "url": "https://example.com/image.jpg"}
    ])
])
print(response.content)

# With base64-encoded image
response = model.invoke([
    HumanMessage(content=[
        {"type": "text", "text": "Descreva esta imagem em detalhes"},
        {"type": "image", "base64": "iVBORw0KGgo...", "mime_type": "image/png"}
    ])
])
```

Also compatible with OpenAI's `image_url` format:

```python
response = model.invoke([
    HumanMessage(content=[
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ])
])
```

### With Caching

```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache
from langchain_maritaca import ChatMaritaca

# Enable caching globally
set_llm_cache(InMemoryCache())

model = ChatMaritaca(model="sabia-3.1")

# First call - hits the API
response1 = model.invoke("Qual é a capital do Brasil?")

# Second call - uses cache (instant, no API cost!)
response2 = model.invoke("Qual é a capital do Brasil?")
```

### With Callbacks for Observability

```python
from langchain_maritaca import ChatMaritaca, CostTrackingCallback, LatencyTrackingCallback

# Create callbacks for monitoring
cost_cb = CostTrackingCallback()
latency_cb = LatencyTrackingCallback()

model = ChatMaritaca(callbacks=[cost_cb, latency_cb])

# Make some calls
model.invoke("Hello!")
model.invoke("How are you?")

# Check metrics
print(f"Total cost: ${cost_cb.total_cost:.6f}")
print(f"Total tokens: {cost_cb.total_tokens}")
print(f"Average latency: {latency_cb.average_latency:.2f}s")
print(f"P95 latency: {latency_cb.p95_latency:.2f}s")
```

### Token Counting & Cost Estimation

```python
from langchain_maritaca import ChatMaritaca
from langchain_core.messages import HumanMessage

model = ChatMaritaca(model="sabia-3.1")

# Count tokens in text
tokens = model.get_num_tokens("Olá, como você está?")
print(f"Tokens: {tokens}")

# Estimate cost before making a request
messages = [HumanMessage(content="Tell me about Brazil")]
estimate = model.estimate_cost(messages, max_output_tokens=1000)
print(f"Estimated cost: ${estimate['total_cost']:.6f}")
```

> **Tip**: Install with `pip install langchain-maritaca[tokenizer]` for accurate token counting using tiktoken.

## Why Maritaca AI?

Maritaca AI models are specifically trained for Brazilian Portuguese, offering:

- **Native Portuguese Understanding**: Better comprehension of Brazilian idioms, expressions, and cultural context
- **Local Data Training**: Trained on diverse Brazilian Portuguese data sources
- **Cost-Effective**: Competitive pricing for Portuguese language tasks
- **Low Latency**: Servers located in Brazil for faster response times

## Used in Production

**[Cidadão.AI](https://cidadao-ai-frontend.vercel.app/pt)** - Brazilian government transparency platform powered by AI agents, handling 331K+ requests/month.

- Frontend: [github.com/anderson-ufrj/cidadao.ai-frontend](https://github.com/anderson-ufrj/cidadao.ai-frontend)
- Backend: [github.com/anderson-ufrj/cidadao.ai-backend](https://github.com/anderson-ufrj/cidadao.ai-backend)

> *Using this package in production? [Open an issue](https://github.com/anderson-ufrj/langchain-maritaca/issues) to get featured!*

## API Reference

### ChatMaritaca

Main class for interacting with Maritaca AI models.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | str | `"sabia-3.1"` | Model name to use |
| `temperature` | float | `0.7` | Sampling temperature (0.0-2.0) |
| `max_tokens` | int | None | Maximum tokens to generate |
| `top_p` | float | `0.9` | Top-p sampling parameter |
| `api_key` | str | None | Maritaca API key (or use env var) |
| `base_url` | str | `"https://chat.maritaca.ai/api"` | API base URL |
| `timeout` | float | `60.0` | Request timeout in seconds |
| `max_retries` | int | `2` | Maximum retry attempts |
| `retry_if_rate_limited` | bool | `True` | Auto-retry on rate limit (HTTP 429) |
| `retry_delay` | float | `1.0` | Initial delay between retries (seconds) |
| `retry_max_delay` | float | `60.0` | Maximum delay between retries (seconds) |
| `retry_multiplier` | float | `2.0` | Multiplier for exponential backoff |
| `streaming` | bool | `False` | Enable streaming responses |

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/anderson-ufrj/langchain-maritaca.git
cd langchain-maritaca

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .
ruff format .

# Run type checking
mypy langchain_maritaca
```

### Running Tests

```bash
# Unit tests only
pytest tests/unit_tests/

# Integration tests (requires MARITACA_API_KEY)
pytest tests/integration_tests/

# With coverage
pytest --cov=langchain_maritaca --cov-report=html
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a list of changes.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Related Projects

- [LangChain](https://github.com/langchain-ai/langchain) - Building applications with LLMs through composability
- [Maritaca AI](https://www.maritaca.ai/) - Brazilian Portuguese language models
