# Pareta

> Pareta is a marketplace and control plane for open-weights models. Its SDKs let you deploy task-specific open-weights endpoints (Pareta picks the GPU), run metered OpenAI-compatible inference, browse a per-task benchmark catalog, and evaluate models on your own data, then deploy the winner. Authenticate with a `pareta_sk_` key from the dashboard or the `PARETA_API_KEY` environment variable.

Pareta ships one SDK per language, all sharing these docs and the same `/v1` HTTP API: Python (`pip install pareta`, `from pareta import Pareta`); TypeScript/JavaScript (`npm install pareta`, `import { Pareta } from "pareta"`).
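
As a rough illustration of what the SDKs do with your key, the sketch below resolves a `pareta_sk_` key and builds (without sending) a `GET /v1/models` request against `https://api.pareta.ai`, using only the Python standard library. The helper names `resolve_api_key` and `build_models_request`, the rule that an explicit key takes precedence over `PARETA_API_KEY`, and the `Authorization: Bearer` header are assumptions borrowed from the usual OpenAI-compatible convention, not confirmed details of the Pareta API.

```python
import os
import urllib.request

# Base URL and /v1 prefix as listed under "Underlying HTTP API" in this index.
BASE_URL = "https://api.pareta.ai/v1"


def resolve_api_key(explicit_key=None):
    """Hypothetical helper: use the explicit key, else fall back to PARETA_API_KEY."""
    key = explicit_key or os.environ.get("PARETA_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one or set PARETA_API_KEY")
    if not key.startswith("pareta_sk_"):
        raise ValueError("Pareta keys are expected to start with 'pareta_sk_'")
    return key


def build_models_request(api_key=None):
    """Hypothetical helper: build, but do not send, a GET /v1/models request."""
    key = resolve_api_key(api_key)
    # ASSUMPTION: Bearer-token auth, as in other OpenAI-compatible APIs.
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; in practice the `pareta` package is the intended route, and the installation and configuration pages describe how the client actually takes its key.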


## Guide

- [Installation & authentication](https://docs.pareta.ai/guide/installation): The `pareta` package is the official client for Pareta, available for **Python** (`pip install pareta`) and **TypeScript/JavaScript** (`npm install pareta`). It deploys open-weights endpoints,…
- [Quickstart](https://docs.pareta.ai/guide/quickstart): Deploy the recommended open-weights model for a task and run inference against it, end to end, in about a dozen lines. Pareta picks the GPU and serving config for you, so there is no hardware to…
- [Core concepts](https://docs.pareta.ai/guide/core-concepts): Pareta deploys open-weights models as endpoints, lets you evaluate them on your own data, and serves OpenAI-compatible inference. This page covers the handful of ideas the rest of the SDK assumes…
- [Running inference](https://docs.pareta.ai/guide/inference): Once you have a live endpoint, you call it through `chat.completions.create`, which has the same shape as the OpenAI chat completions API. Pass the endpoint id as `model`, a list of messages, and…
- [Deploying & operating endpoints](https://docs.pareta.ai/guide/deploying-endpoints): `client.endpoints` is the control plane for serving open-weights models. You hand it a task and a model; it deploys an OpenAI-compatible inference endpoint, hands you back a live `Endpoint`, and…
- [Finding the right model](https://docs.pareta.ai/guide/discovery): Before you deploy anything, you pick a **task** and a **model**. Pareta does both for you from the SDK.
- [Evaluating on your own data](https://docs.pareta.ai/guide/evaluation): Benchmarks tell you which model wins on someone else's data. This page is about the only number that matters: how the candidates score on *your* rows.
- [Errors, retries & timeouts](https://docs.pareta.ai/guide/errors-and-retries): Every failure the SDK can raise is a subclass of `ParetaError`, so one `except` clause catches everything, and a more specific clause catches exactly the case you care about. The client also…
- [Async usage](https://docs.pareta.ai/guide/async): `AsyncPareta` is the asyncio-native client. It mirrors the synchronous `Pareta` client method-for-method: same constructor, same resource namespaces (`chat`, `models`, `endpoints`, `tasks`,…
- [Configuration](https://docs.pareta.ai/guide/configuration): Every Pareta call goes through one client object. Configuration is just how you build that client: which API key it sends, which environment it points at, how patient it is on slow or flaky…

## Examples

- [Deploy a model and call it](https://docs.pareta.ai/examples/deploy-and-infer): This is the shortest path from "I have a task" to "I'm getting completions back": pick a task, deploy the recommended open-weights model for it, then call the live endpoint with OpenAI-compatible…
- [From a sentence to a deployed winner](https://docs.pareta.ai/examples/find-and-deploy-best-model): You have a job to do ("pull the key fields out of these contracts") and a pile of your own examples. You want the cheapest open-weights model that actually does the job well, serving live…
- [Benchmark models on your own data](https://docs.pareta.ai/examples/evaluate-on-your-data): A public leaderboard tells you which model wins on someone else's data. It does not tell you which model wins on *yours*. This page shows how to take your own labeled rows, score a slate of…
- [Document extraction (PDF/image)](https://docs.pareta.ai/examples/document-extraction): Pull structured fields out of PDFs and scanned images, then serve the model that does it best for the least money.
- [Streaming chat completions](https://docs.pareta.ai/examples/streaming-chat): Stream tokens as the model generates them instead of waiting for the whole response. Pass `stream=True` to `chat.completions.create(...)` and you get an iterator of `ChatCompletionChunk` objects,…
- [Concurrent calls with AsyncPareta](https://docs.pareta.ai/examples/concurrent-async): `AsyncPareta` lets you fire many requests at once instead of one at a time. When you have a batch of inference prompts to score, or several eval runs to kick off, running them concurrently turns a…
- [Cost & quality monitoring](https://docs.pareta.ai/examples/cost-and-metrics): Every dollar you spend on Pareta runs through one org balance, and every model you serve gets watched for drift. This page is about reading both: what a call or an eval run actually cost, how the…
- [Migrating from the OpenAI SDK](https://docs.pareta.ai/examples/migrate-from-openai): Pareta inference is OpenAI-compatible. If you already call `chat.completions.create(...)` through the `openai` SDK, you do not have to rewrite that code to run on Pareta. Point the OpenAI client…

## Reference

- [Client (`Pareta`, `AsyncPareta`)](https://docs.pareta.ai/reference/client): The client is the one object you build and the only thing that talks to the network. It holds your API key, the environment URL, the timeout and retry policy, and an HTTP connection pool. Every…
- [`chat.completions`](https://docs.pareta.ai/reference/chat): Run inference against a deployed endpoint. `chat.completions.create(...)` is the one call you make to get tokens out of a model you deployed on Pareta. It has the same shape as the OpenAI chat…
- [`models`](https://docs.pareta.ai/reference/models): `client.models` lists the models you can call right now. It is the OpenAI-compatible model index: `GET /v1/models` returning only your deployed, URL-bearing endpoints. Use it to discover the ids…
- [`endpoints`](https://docs.pareta.ai/reference/endpoints): `client.endpoints` is the control plane for serving open-weights models. Hand it a task and a model; it deploys an OpenAI-compatible inference endpoint, hands you back a live `Endpoint`, and lets…
- [`tasks`](https://docs.pareta.ai/reference/tasks): `client.tasks` is the catalog layer. Before you deploy or evaluate anything you need two things: a **task** (which benchmark you are solving) and a **model** (which model to deploy or measure).…
- [`evals`: evaluate models on your own data](https://docs.pareta.ai/reference/evals): `client.evals` runs the only benchmark that matters: how candidate models score on **your** rows. You hand Pareta a task and a list of labeled items, name a slate of open-weights candidates (and…
- [Exceptions](https://docs.pareta.ai/reference/exceptions): Every error the Pareta SDK raises is a subclass of `ParetaError`. That single base class is the contract: one `except ParetaError` catches anything the SDK can throw, and a narrower `except…
- [Response types](https://docs.pareta.ai/reference/types): Every method that talks to the API hands you back a typed object, not a bare dict. These objects give you attribute access and autocomplete over the shapes the API returns: a chat completion's…
- [Underlying HTTP API](https://docs.pareta.ai/reference/http-api): The Pareta SDKs (Python and TypeScript) are thin, typed wrappers over a plain JSON-over-HTTPS API served at `https://api.pareta.ai` under the `/v1/` prefix. Every method you call maps to exactly…

## Optional

- [OpenAPI spec](https://docs.pareta.ai/openapi.json): Machine-readable contract for the underlying `/v1` HTTP API the SDKs wrap.
- [llms-full.txt](https://docs.pareta.ai/llms-full.txt): The entire docs in one file.