Turn documents into structured data.
Multi-Media Support
Extract from documents, images, audio, and video. Smart detection routes to the right handler automatically.
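As a rough illustration of what routing by media type can look like, the sketch below classifies a URL by its guessed MIME type using only the standard library. The function name `guess_media_kind` and the fallback to "document" are assumptions made for this example; the library's actual detection logic is not shown on this page and may differ.

```python
import mimetypes


def guess_media_kind(url: str) -> str:
    """Classify a URL as image, audio, video, or document by its MIME type.

    Hypothetical helper for illustration; not part of open_xtract.
    """
    mime, _ = mimetypes.guess_type(url)
    if mime is None:
        # Assumed fallback: treat unknown types as documents.
        return "document"
    top_level = mime.split("/", 1)[0]
    if top_level in {"image", "audio", "video"}:
        return top_level
    # Everything else (PDF, HTML, plain text, ...) is handled as a document.
    return "document"


print(guess_media_kind("https://example.com/receipt.jpg"))
print(guess_media_kind("https://example.com/report.pdf"))
```

Guessing from the file extension is cheap but imprecise; a real router would likely also inspect the response's Content-Type header.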
Any LLM Provider
Built on pydantic-ai. Use OpenAI, Anthropic, Google, or any compatible provider with a single model string.
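The model strings used on this page follow a `provider:model-name` shape, for example `anthropic:claude-sonnet-4-5`. The sketch below shows how such a string splits into its two parts. `split_model_string` is a hypothetical helper written for this example; it is not how pydantic-ai or open_xtract resolves providers internally.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider:model-name' into (provider, model_name).

    Hypothetical helper for illustration only.
    """
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model-name', got {model!r}")
    return provider, name


provider, name = split_model_string("anthropic:claude-sonnet-4-5")
print(provider, name)
```

Because the provider is encoded in the string itself, switching providers means changing one argument rather than swapping client code.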
Type Safe
Leverage Pydantic schemas to ensure extracted data is validated, typed, and clean every time.
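To make the validation claim concrete, here is a minimal sketch using Pydantic v2 on its own, with no LLM call involved. The `LineItem` fields and the sample payloads are assumptions made up for this example. It shows numeric strings being coerced to floats and a malformed total being rejected.

```python
from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    # Illustrative fields; not taken from the library.
    description: str
    price: float


class Receipt(BaseModel):
    vendor: str
    items: list[LineItem]
    total: float


# Well-formed data: numeric strings are coerced to float in Pydantic's default mode.
raw = {
    "vendor": "Corner Cafe",
    "items": [{"description": "Espresso", "price": "3.50"}],
    "total": "3.50",
}
receipt = Receipt.model_validate(raw)
print(receipt.total)

# Malformed data: a non-numeric total fails validation instead of passing through.
bad = {"vendor": "Corner Cafe", "items": [], "total": "n/a"}
try:
    Receipt.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"Rejected with {exc.error_count()} error(s)")
```

The same schema class serves as both the extraction target and the validator, so downstream code can rely on the field types.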
Built for developers.
Stop writing regex. Just define your schema and let the LLM do the heavy lifting.
- Works with documents, images, audio, and video
- Structured error handling with typed exceptions
- Optional logfire instrumentation for tracing
receipt_parser.py
from pydantic import BaseModel
from open_xtract import extract, UrlFetchError, ModelError

class LineItem(BaseModel):
    # Illustrative fields; define whatever your receipts contain.
    description: str
    price: float

class Receipt(BaseModel):
    vendor: str
    items: list[LineItem]
    total: float

try:
    result = extract(
        schema=Receipt,
        model="anthropic:claude-sonnet-4-5",
        url="https://example.com/receipt.jpg",
        instructions="Extract receipt details",
    )
    print(f"Vendor: {result.vendor}, Total: ${result.total}")
except UrlFetchError as e:
    print(f"Failed to fetch: {e}")
except ModelError as e:
    print(f"Model error: {e}")
durable_extract.py
from open_xtract import extract, stop_temporal

# Article is a Pydantic model you define (not shown here).
# Just add durable=True for automatic
# Temporal workflow execution
result = extract(
    schema=Article,
    model="openai:gpt-5.2",
    url="https://example.com/report.pdf",
    instructions="Extract article info",
    durable=True,
    temporal_ui=True,  # Optional, default True
)
# Temporal UI: http://localhost:8080

# Optional: clean up when done
stop_temporal()
Powered by Temporal
Durable Execution.
Long-running extractions that survive failures. Just add durable=True and let Temporal handle the rest.
- Auto-starts Temporal via Docker
- PostgreSQL for persistent state
- Optional Temporal UI for monitoring
- Automatic retries and recovery
$ uv add "open-xtract[temporal]"