# Pareta

> Pareta is a marketplace and control plane for open-weights models. The `pareta` Python SDK lets you deploy task-specific open-weights endpoints (Pareta picks the GPU), run metered OpenAI-compatible inference, browse a per-task benchmark catalog, and evaluate models on your own data, then deploy the winner. Install with `pip install pareta`; authenticate with a `pareta_sk_` key from the dashboard or the `PARETA_API_KEY` environment variable.

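As a rough illustration of the authentication and OpenAI-compatible request shape summarized above, the sketch below builds a chat completion request with only the Python standard library and never sends it. It is not the `pareta` SDK. The `/chat/completions` route, the Bearer header, the key-prefix check, and the helper names `resolve_api_key` and `build_chat_request` are assumptions made for illustration; only the `https://api.pareta.ai` host, the `/v1/` prefix, the `pareta_sk_` key prefix, and `PARETA_API_KEY` come from the pages indexed here.

```python
import json
import os
import urllib.request

# From the HTTP API reference: JSON over HTTPS at this host, under /v1/.
BASE_URL = "https://api.pareta.ai/v1"


def resolve_api_key(explicit=None):
    """Return an explicit key if given, otherwise the PARETA_API_KEY value.

    Assumption: rejecting keys without the pareta_sk_ prefix is a local
    sanity check for this sketch, not documented SDK behavior.
    """
    key = explicit or os.environ.get("PARETA_API_KEY", "")
    if not key.startswith("pareta_sk_"):
        raise ValueError("expected an API key starting with 'pareta_sk_'")
    return key


def build_chat_request(endpoint_id, messages, api_key=None):
    """Build, but do not send, an OpenAI-style chat completion request.

    Assumptions: the /chat/completions route and Bearer authentication are
    inferred from the OpenAI-compatible framing, not confirmed by the docs.
    """
    body = json.dumps({"model": endpoint_id, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {resolve_api_key(api_key)}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The endpoint id goes in `model`, matching the inference guide's description. For real calls, the SDK's `chat.completions.create` is the documented path; this helper only shows the presumed wire shape.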

## Guide

- [Installation & authentication](https://docs.pareta.ai/guide/installation): The `pareta` package is the Python client for Pareta. It deploys open-weights endpoints, runs metered OpenAI-compatible inference, browses the benchmark catalog, and evaluates models on your own…
- [Quickstart](https://docs.pareta.ai/guide/quickstart): Deploy the recommended open-weights model for a task and run inference against it, end to end, in about a dozen lines. Pareta picks the GPU and serving config for you, so there is no hardware to…
- [Core concepts](https://docs.pareta.ai/guide/core-concepts): Pareta deploys open-weights models as endpoints, lets you evaluate them on your own data, and serves OpenAI-compatible inference. This page covers the handful of ideas the rest of the SDK assumes…
- [Running inference](https://docs.pareta.ai/guide/inference): Once you have a live endpoint, you call it through `chat.completions.create`, which has the same shape as the OpenAI chat completions API. Pass the endpoint id as `model`, a list of messages, and…
- [Deploying & operating endpoints](https://docs.pareta.ai/guide/deploying-endpoints): `client.endpoints` is the control plane for serving open-weights models. You hand it a task and a model; it deploys an OpenAI-compatible inference endpoint, hands you back a live `Endpoint`, and…
- [Finding the right model](https://docs.pareta.ai/guide/discovery): Before you deploy anything, you pick a **task** and a **model**. Pareta does both for you from the SDK.
- [Evaluating on your own data](https://docs.pareta.ai/guide/evaluation): Benchmarks tell you which model wins on someone else's data. This page is about the only number that matters: how the candidates score on *your* rows.
- [Errors, retries & timeouts](https://docs.pareta.ai/guide/errors-and-retries): Every failure the SDK can raise is a subclass of `ParetaError`, so one `except` clause catches everything, and a more specific clause catches exactly the case you care about. The client also…
- [Async usage](https://docs.pareta.ai/guide/async): `AsyncPareta` is the asyncio-native client. It mirrors the synchronous `Pareta` client method-for-method: same constructor, same resource namespaces (`chat`, `models`, `endpoints`, `tasks`,…
- [Configuration](https://docs.pareta.ai/guide/configuration): Every Pareta call goes through one client object. Configuration is just how you build that client: which API key it sends, which environment it points at, how patient it is on slow or flaky…

## Examples

- [Deploy a model and call it](https://docs.pareta.ai/examples/deploy-and-infer): This is the shortest path from "I have a task" to "I'm getting completions back": pick a task, deploy the recommended open-weights model for it, then call the live endpoint with OpenAI-compatible…
- [From a sentence to a deployed winner](https://docs.pareta.ai/examples/find-and-deploy-best-model): You have a job to do ("pull the key fields out of these contracts") and a pile of your own examples. You want the cheapest open-weights model that actually does the job well, serving live…
- [Benchmark models on your own data](https://docs.pareta.ai/examples/evaluate-on-your-data): A public leaderboard tells you which model wins on someone else's data. It does not tell you which model wins on *yours*. This page shows how to take your own labeled rows, score a slate of…
- [Document extraction (PDF/image)](https://docs.pareta.ai/examples/document-extraction): Pull structured fields out of PDFs and scanned images, then serve the model that does it best for the least money.
- [Streaming chat completions](https://docs.pareta.ai/examples/streaming-chat): Stream tokens as the model generates them instead of waiting for the whole response. Pass `stream=True` to `chat.completions.create(...)` and you get an iterator of `ChatCompletionChunk` objects,…
- [Concurrent calls with AsyncPareta](https://docs.pareta.ai/examples/concurrent-async): `AsyncPareta` lets you fire many requests at once instead of one at a time. When you have a batch of inference prompts to score, or several eval runs to kick off, running them concurrently turns a…
- [Cost & quality monitoring](https://docs.pareta.ai/examples/cost-and-metrics): Every dollar you spend on Pareta runs through one org balance, and every model you serve gets watched for drift. This page is about reading both: what a call or an eval run actually cost, how the…
- [Migrating from the OpenAI SDK](https://docs.pareta.ai/examples/migrate-from-openai): Pareta inference is OpenAI-compatible. If you already call `chat.completions.create(...)` through the `openai` SDK, you do not have to rewrite that code to run on Pareta. Point the OpenAI client…

## Reference

- [Client (`Pareta`, `AsyncPareta`)](https://docs.pareta.ai/reference/client): The client is the one object you build and the only thing that talks to the network. It holds your API key, the environment URL, the timeout and retry policy, and an HTTP connection pool. Every…
- [`chat.completions`](https://docs.pareta.ai/reference/chat): Run inference against a deployed endpoint. `chat.completions.create(...)` is the one call you make to get tokens out of a model you deployed on Pareta. It has the same shape as the OpenAI chat…
- [`models`](https://docs.pareta.ai/reference/models): `client.models` lists the models you can call right now. It is the OpenAI-compatible model index: `GET /v1/models` returning only your deployed, URL-bearing endpoints. Use it to discover the ids…
- [`endpoints`](https://docs.pareta.ai/reference/endpoints): `client.endpoints` is the control plane for serving open-weights models. Hand it a task and a model; it deploys an OpenAI-compatible inference endpoint, hands you back a live `Endpoint`, and lets…
- [`tasks`](https://docs.pareta.ai/reference/tasks): `client.tasks` is the catalog layer. Before you deploy or evaluate anything you need two things: a **task** (which benchmark you are solving) and a **model** (which model to deploy or measure).…
- [`evals`: evaluate models on your own data](https://docs.pareta.ai/reference/evals): `client.evals` runs the only benchmark that matters: how candidate models score on **your** rows. You hand Pareta a task and a list of labeled items, name a slate of open-weights candidates (and…
- [Exceptions](https://docs.pareta.ai/reference/exceptions): Every error the Pareta SDK raises is a subclass of `ParetaError`. That single base class is the contract: one `except ParetaError` catches anything the SDK can throw, and a narrower `except…
- [Response types](https://docs.pareta.ai/reference/types): Every method that talks to the API hands you back a typed object, not a bare dict. These objects give you attribute access and autocomplete over the shapes the API returns: a chat completion's…
- [Underlying HTTP API](https://docs.pareta.ai/reference/http-api): The Pareta Python SDK is a thin, typed wrapper over a plain JSON-over-HTTPS API served at `https://api.pareta.ai` under the `/v1/` prefix. Every method you call maps to exactly one route (a couple…

## Optional

- [OpenAPI spec](https://docs.pareta.ai/openapi.json): Machine-readable contract for the underlying `/v1` HTTP API that the SDK wraps.
- [llms-full.txt](https://docs.pareta.ai/llms-full.txt): The full documentation in a single file.
