Metadata-Version: 2.4
Name: llama-index-llms-grok
Version: 0.1.1
Summary: LlamaIndex integration for xAI Grok models with full structured output support, token counting, and native xai-sdk integration
License: MIT
License-File: LICENSE
Keywords: llama-index,grok,xai,llm,ai,machine-learning,structured-outputs,pydantic,reasoning
Author: Jose Medina
Author-email: josemedina@gmail.com
Requires-Python: >=3.10
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: llama-index-core (>=0.14.8)
Requires-Dist: xai-sdk (>=1.4.0,<2.0.0)
Project-URL: Bug Tracker, https://github.com/josemedina/llama-index-llms-grok/issues
Project-URL: Documentation, https://github.com/josemedina/llama-index-llms-grok#readme
Project-URL: Homepage, https://github.com/josemedina/llama-index-llms-grok
Project-URL: Repository, https://github.com/josemedina/llama-index-llms-grok
Description-Content-Type: text/markdown

# llama-index-llms-grok

LlamaIndex integration for xAI's Grok models using the official `xai-sdk`.

This library provides native support for the latest Grok models (including Grok 4 and the Grok 4.1 fast models, with and without reasoning) through xAI's modern Chat API, rather than the older OpenAI-compatible completions endpoint.

## Installation

```bash
pip install llama-index-llms-grok
```

## Setup

Get your API key from [console.x.ai](https://console.x.ai/) and set it as an environment variable:

```bash
export XAI_API_KEY=your_api_key_here
```

## Quick Start

### Basic Chat

```python
from llama_index_llms_grok import Grok
from llama_index.core.llms import ChatMessage

# Initialize with default Grok 4.1 model
llm = Grok(api_key="your_api_key")  # or set XAI_API_KEY env var

# Chat
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Explain quantum computing briefly."),
]
response = llm.chat(messages)
print(response.message.content)
```

### Using Grok Fast (Non-Reasoning)

```python
from llama_index_llms_grok import GrokFast

llm = GrokFast()  # Uses grok-4-1-fast-non-reasoning model
response = llm.complete("What is the capital of France?")
print(response.text)
```

### Using Grok with Reasoning Mode

```python
from llama_index_llms_grok import GrokReasoning

# Reasoning models may take longer, so timeout is set to 3600s by default
llm = GrokReasoning(show_reasoning=True)  # Set to True to see thinking process
response = llm.complete("Solve this logic puzzle: ...")
print(response.text)
```

### Using Grok for Code

```python
from llama_index_llms_grok import GrokCode

llm = GrokCode()  # Uses grok-code-fast-1 model
response = llm.complete("Write a Python function to calculate fibonacci numbers.")
print(response.text)
```

### Using Grok Vision

```python
from llama_index_llms_grok import GrokVision

llm = GrokVision()  # Uses grok-2-vision-1212 model
# Multimodal model for image understanding (see the sketch below)
```
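
This package does not document its image-input format, but llama-index-core exposes content blocks on `ChatMessage`. Here is a minimal sketch, assuming the integration forwards `ImageBlock` content to the xAI vision API; the image path is a placeholder:

```python
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock
from llama_index_llms_grok import GrokVision

llm = GrokVision()

# Assumption: the integration maps LlamaIndex content blocks to xAI's
# multimodal chat format; "photo.jpg" is a hypothetical local file.
message = ChatMessage(
    role="user",
    blocks=[
        ImageBlock(path="photo.jpg"),
        TextBlock(text="Describe what you see in this image."),
    ],
)
response = llm.chat([message])
print(response.message.content)
```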

### Using Grok 3 Models

```python
from llama_index_llms_grok import Grok3, Grok3Mini

# Full Grok 3 model
llm = Grok3()

# Or lightweight Grok 3 Mini
llm_mini = Grok3Mini()
```

### Streaming

```python
from llama_index_llms_grok import Grok
from llama_index.core.llms import ChatMessage

llm = Grok()
messages = [ChatMessage(role="user", content="Tell me a story about AI.")]

for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
```

### Custom Parameters

```python
from llama_index_llms_grok import Grok

llm = Grok(
    model="grok-4-1-fast-reasoning",
    temperature=0.7,
    max_tokens=1024,
    timeout=600,
)
```

### Token Counting

Token usage is available in `response.additional_kwargs`:

```python
from llama_index_llms_grok import Grok

llm = Grok()
response = llm.complete("Hello, world!")

# Get token counts
print(f"Prompt tokens: {response.additional_kwargs.get('prompt_tokens')}")
print(f"Completion tokens: {response.additional_kwargs.get('completion_tokens')}")
print(f"Total tokens: {response.additional_kwargs.get('total_tokens')}")
```

See `examples/token_counting_example.py` for more examples.

### Structured Outputs

Grok now supports structured outputs with Pydantic models:

```python
from pydantic import BaseModel
from llama_index.core.prompts import PromptTemplate
from llama_index.core.program import LLMTextCompletionProgram
from llama_index_llms_grok import Grok

class Person(BaseModel):
    name: str
    age: int
    occupation: str

llm = Grok()

# Method 1: Using structured_predict
prompt = PromptTemplate("Extract person info: {text}")
person = llm.structured_predict(
    output_cls=Person,
    prompt=prompt,
    text="Alice is a 30-year-old engineer"
)

# Method 2: Using LLMTextCompletionProgram
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Person,
    llm=llm,
    prompt_template_str="Extract person info: {text}"
)
person = program(text="Bob is a 25-year-old designer")

# Method 3: Using as_structured_llm
structured_llm = llm.as_structured_llm(output_cls=Person)
response = structured_llm.complete("Extract: Charlie, 35, doctor")
person = response.raw  # Pydantic model instance
```

See `examples/structured_outputs_example.py` for comprehensive examples.

#### Streaming Structured Outputs

```python
from pydantic import BaseModel
from llama_index.core.prompts import PromptTemplate
from llama_index_llms_grok import Grok

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

llm = Grok()
prompt = PromptTemplate("Extract product info: {text}")

# Stream partial structured outputs as fields are filled in
for partial_product in llm.stream_structured_predict(
    output_cls=Product,
    prompt=prompt,
    text="iPhone 15 costs $999 and is in stock"
):
    print(f"Update: {partial_product.name}")
```

## Available Models

### Language Models

#### Grok 4.1 (Latest - 2M Context Window)
- **`grok-4-1-fast-reasoning`** - Fast model with reasoning (default)
- **`grok-4-1-fast-non-reasoning`** - Fast model without reasoning (`GrokFast`)

#### Grok 4 (2M Context Window)
- **`grok-4-fast-reasoning`** - Alternative fast model with reasoning
- **`grok-4-fast-non-reasoning`** - Alternative fast model without reasoning

#### Specialized Models
- **`grok-code-fast-1`** - Optimized for code (256K context) (`GrokCode`)
- **`grok-4-0709`** - Specific version (256K context)

#### Grok 3 (131K Context Window)
- **`grok-3`** - Standard Grok 3 model (`Grok3`)
- **`grok-3-mini`** - Lightweight Grok 3 (`Grok3Mini`)

#### Grok 2 
- **`grok-2-1212`** - Grok 2 from December 2024 (131K context)
- **`grok-2-vision-1212`** - Vision-enabled Grok 2 (32K context) (`GrokVision`)

### Image Generation Models
- **`grok-2-image-1212`** - Image generation (not yet supported in this package)

## Features

- ✅ Native xAI SDK integration using modern Chat API
- ✅ Support for all Grok models (2, 3, 4, 4.1)
- ✅ 2M context window support for Grok 4.1 models
- ✅ Specialized models: Code, Vision
- ✅ Reasoning and non-reasoning modes
- ✅ Streaming responses
- ✅ **Structured outputs** with `LLMTextCompletionProgram` and `as_structured_llm()`
- ✅ Token counting via `response.additional_kwargs`
- ✅ Automatic reasoning content handling
- ✅ Full LlamaIndex LLM interface compatibility
- ✅ Type hints and proper error handling
- ✅ Configurable timeouts for long-running reasoning tasks
- ✅ Async/await support (see the sketch below)
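
The async interface follows the standard LlamaIndex LLM surface (`acomplete`, `achat`, `astream_chat`), which the feature list above claims full compatibility with. A minimal sketch:

```python
import asyncio

from llama_index.core.llms import ChatMessage
from llama_index_llms_grok import Grok

async def main():
    llm = Grok()

    # Async completion
    response = await llm.acomplete("Summarize relativity in one sentence.")
    print(response.text)

    # Async streaming chat: await the generator, then iterate
    messages = [ChatMessage(role="user", content="Tell me a short story about AI.")]
    stream = await llm.astream_chat(messages)
    async for chunk in stream:
        print(chunk.delta, end="", flush=True)

asyncio.run(main())
```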

### Note on TokenCountingHandler

- ⚠️ **TokenCountingHandler** counts tokens client-side with its own tokenizer, so its numbers may not match Grok's actual usage - use `response.additional_kwargs` for reliable token counts

See [COMPATIBILITY_NOTES.md](COMPATIBILITY_NOTES.md) for details.

## Advanced Usage

### Accessing Reasoning Content

When using reasoning models with `show_reasoning=False` (the default), the thinking process is stripped from the response text but remains accessible via `response.message.additional_kwargs`:

```python
from llama_index_llms_grok import GrokReasoning
from llama_index.core.llms import ChatMessage

llm = GrokReasoning(show_reasoning=False)
response = llm.chat([ChatMessage(role="user", content="Complex question...")])

# Access reasoning if available
if "reasoning_content" in response.message.additional_kwargs:
    print("Thinking:", response.message.additional_kwargs["reasoning_content"])
print("Answer:", response.message.content)
```

### Integration with LlamaIndex

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index_llms_grok import Grok

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Initialize Grok for query-time synthesis
llm = Grok(model="grok-4-1-fast-reasoning")

# Build the index (indexing uses the configured embedding model, not the LLM)
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key points in these documents?")
print(response)
```

## Examples

The package includes comprehensive examples demonstrating all features:

### Available Example Files

1. **`examples/basic_usage.py`** - Basic usage of all Grok models
   - Chat and completion
   - Fast and reasoning models
   - Streaming responses
   - Code generation with GrokCode
   - Grok 3 and Grok 3 Mini usage
   - Vision model information

2. **`examples/token_counting_example.py`** - Token usage tracking
   - Basic token counting from responses
   - Multi-turn conversation tracking
   - Token counting with different models
   - Best practices for token tracking

3. **`examples/structured_outputs_example.py`** - Structured outputs (NEW!)
   - Using `structured_predict()` method
   - Using `LLMTextCompletionProgram`
   - Using `as_structured_llm()`
   - Complex nested Pydantic models
   - Streaming structured outputs
   - Integration with query engines
   - Before/after comparisons

### Running Examples

```bash
# Set your API key
export XAI_API_KEY=your_api_key_here

# Run basic examples
python examples/basic_usage.py

# Run token counting examples
python examples/token_counting_example.py

# Run structured outputs examples
python examples/structured_outputs_example.py
```

## Requirements

- Python >=3.10
- `xai-sdk>=1.4.0`
- `llama-index-core>=0.14.8`

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Documentation

### Main Documentation
- **README.md** (this file) - Quick start and basic usage
- **[COMPATIBILITY_NOTES.md](COMPATIBILITY_NOTES.md)** - Compatibility information

## Links

- [xAI Documentation](https://docs.x.ai/)
- [xAI Console](https://console.x.ai/) - Get your API key
- [LlamaIndex Documentation](https://docs.llamaindex.ai/)
- [GitHub Repository](https://github.com/josemedina/llama-index-llms-grok)

## Comparison with Other Providers

### Why Use This Integration?

This integration uses xAI's native SDK instead of OpenAI compatibility mode:

- ✅ **Latest Models**: Access to newest Grok models immediately
- ✅ **Native Features**: Full reasoning mode support with `<thinking>` tags
- ✅ **Structured Outputs**: Complete LLMTextCompletionProgram support
- ✅ **Large Context**: 2M-token context window for Grok 4.1 models
- ✅ **Specialized Models**: GrokCode, GrokVision, Grok3Mini
- ✅ **Token Counting**: Built-in token usage tracking
- ✅ **Future-Proof**: Native SDK ensures compatibility with new xAI features

### Feature Comparison

| Feature | Grok (This Package) | OpenAI | Anthropic | Gemini |
|---------|---------------------|--------|-----------|--------|
| Structured Outputs | ✅ | ✅ | ✅ | ✅ |
| LLMTextCompletionProgram | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Async Support | ✅ | ✅ | ✅ | ✅ |
| Token Counting | ✅ | ✅ | ✅ | ✅ |
| Reasoning Mode | ✅ | ❌ | ✅ | ❌ |
| 2M Context Window | ✅ | ❌ | ✅ | ✅ |
| Code-Optimized Model | ✅ | ✅ | ✅ | ✅ |

## Changelog

### Version 0.1.1 (2025-11-20)

#### Added
- ✅ **Full Structured Output Support**: `structured_predict()`, `LLMTextCompletionProgram`, `as_structured_llm()`
- ✅ **Streaming Structured Outputs**: `stream_structured_predict()` and async version
- ✅ **JSON Schema Generation**: Automatic Pydantic model to JSON schema conversion
- ✅ **Response Parsing**: Automatic markdown stripping and JSON validation

#### Fixed  
- ✅ Fixed `'Response' object is not iterable` error with `LLMTextCompletionProgram`
- ✅ Improved structured output reliability with better prompt engineering

#### Documentation
- ✅ Added comprehensive structured outputs guide
- ✅ Added structured outputs examples
- ✅ Updated README with all new features
- ✅ Added troubleshooting section

### Version 0.1.0 (2025-11-20)

#### Added
- ✅ **Structured Outputs**: Full support for `LLMTextCompletionProgram` and `as_structured_llm()`
- ✅ **Token Counting**: Token usage available in `response.additional_kwargs`
- ✅ **New Models**: Support for all Grok models (2, 3, 4, 4.1)
- ✅ **Convenience Classes**: GrokFast, GrokReasoning, GrokCode, GrokVision, Grok3, Grok3Mini
- ✅ **Streaming Support**: Full streaming for chat, completion, and structured outputs
- ✅ **Async Support**: All async methods implemented
- ✅ **Comprehensive Documentation**: 50+ pages of guides and examples

#### Fixed
- ✅ Fixed `'Response' object is not iterable` error with structured outputs
- ✅ Fixed Pydantic v2 compatibility for llama-index-core 0.14.8+
- ✅ Fixed model names to match official xAI API

#### Features
- ✅ 2M context window support for Grok 4.1 models
- ✅ Automatic reasoning content extraction
- ✅ Dynamic context window detection per model
- ✅ JSON schema generation from Pydantic models
- ✅ Automatic response parsing and validation

## Troubleshooting

### Common Issues

**Issue**: `ValueError: Trying to read the xAI API key from the XAI_API_KEY environment variable but it doesn't exist`

**Solution**: Set your API key:
```bash
export XAI_API_KEY=your_api_key_here
```

Or pass it directly:
```python
from llama_index_llms_grok import Grok

llm = Grok(api_key="your_api_key_here")
```

**Issue**: Token counting not working

**Solution**: Use `response.additional_kwargs` instead of `TokenCountingHandler`:
```python
response = llm.complete("...")
tokens = response.additional_kwargs.get('total_tokens', 0)
```

**Issue**: Structured outputs failing

**Solution**: Make sure you're using the latest version with structured output support:
```bash
pip install --upgrade llama-index-llms-grok
```

For more issues and solutions, see [COMPATIBILITY_NOTES.md](COMPATIBILITY_NOTES.md).

## Best Practices

### Choosing the Right Model

- **Fast Responses**: Use `GrokFast()` (grok-4-1-fast-non-reasoning)
- **Complex Reasoning**: Use `GrokReasoning()` (grok-4-1-fast-reasoning)
- **Code Generation**: Use `GrokCode()` (grok-code-fast-1)
- **Budget-Friendly**: Use `Grok3Mini()` (grok-3-mini)
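
If your application switches between tasks, a small factory keeps the model choice in one place. A minimal sketch using the convenience classes above; the task labels are illustrative:

```python
from llama_index_llms_grok import Grok3Mini, GrokCode, GrokFast, GrokReasoning

# Map illustrative task labels to the convenience classes above
MODEL_BY_TASK = {
    "fast": GrokFast,
    "reasoning": GrokReasoning,
    "code": GrokCode,
    "budget": Grok3Mini,
}

def make_llm(task: str):
    """Return an LLM instance suited to the given task label."""
    try:
        return MODEL_BY_TASK[task]()
    except KeyError:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(MODEL_BY_TASK)}")

llm = make_llm("code")
response = llm.complete("Write a Python function to reverse a string.")
print(response.text)
```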

### Token Management

```python
# Always check token usage
response = llm.complete("...")
if response.additional_kwargs:
    tokens = response.additional_kwargs.get('total_tokens', 0)
    print(f"Used {tokens} tokens")
```

### Structured Outputs

```python
# Use descriptive field names and types
from pydantic import BaseModel, Field

class Person(BaseModel):  
    """Person information."""  # Helps the LLM understand
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years", ge=0, le=150)
```

### Error Handling

```python
try:
    response = llm.complete("...")
except Exception as e:
    print(f"Error: {e}")
    # Handle error appropriately
```
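
For transient failures such as network errors or rate limits, a simple retry with exponential backoff is often enough. A minimal sketch, catching `Exception` broadly since this package does not document its exception types:

```python
import time

from llama_index_llms_grok import Grok

def complete_with_retry(llm, prompt: str, max_attempts: int = 3):
    """Retry a completion with exponential backoff on any error."""
    for attempt in range(max_attempts):
        try:
            return llm.complete(prompt)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; surface the original error
            time.sleep(2 ** attempt)  # Back off: 1s, 2s, 4s, ...

llm = Grok()
response = complete_with_retry(llm, "Hello, world!")
print(response.text)
```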

